de.tudarmstadt.ukp.jwktl.parser
Class WiktionaryEntryParser

java.lang.Object
  extended by de.tudarmstadt.ukp.jwktl.parser.WiktionaryEntryParser
All Implemented Interfaces:
IWiktionaryEntryParser
Direct Known Subclasses:
DEWiktionaryEntryParser, ENWiktionaryEntryParser, RUWiktionaryEntryParser

public abstract class WiktionaryEntryParser
extends Object
implements IWiktionaryEntryParser

Base implementation for parsing the textual contents of an article page in order to construct IWiktionaryEntry and IWiktionarySense instances. The parser is a finite state machine built on a set of block handlers, each of which is asked whether it wants to process the current line of text. A handler that accepts a line also processes the subsequent lines until the entire block has been consumed and the next line selects a different block handler. Because the individual Wiktionary language editions differ considerably, there should be one subclass of this parser for each language edition, which takes care of the language-specific adaptations and selects the block handlers to use.
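
The handler-driven state machine described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the JWKTL implementation: SketchBlockHandler, HeadlineCollector, and BlockParserSketch are hypothetical names, the handler methods (canHandle, processHead, processBody, finish) are invented for the sketch, and the sketch assumes that a block start no handler accepts causes the block to be skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified handler contract; NOT the real JWKTL IBlockHandler.
interface SketchBlockHandler {
    boolean canHandle(String line);   // is this handler willing to take the block starting here?
    void processHead(String line);    // first line of the block
    void processBody(String line);    // each following line of the same block
    void finish();                    // called once the block has ended
}

// Example handler: collects the body lines of every block with a given headline.
class HeadlineCollector implements SketchBlockHandler {
    private final String headline;
    private final List<String> current = new ArrayList<>();
    final List<String> blocks = new ArrayList<>();

    HeadlineCollector(String headline) { this.headline = headline; }

    public boolean canHandle(String line) {
        return line.trim().equals("==" + headline + "==");
    }
    public void processHead(String line) { current.clear(); }
    public void processBody(String line) { current.add(line.trim()); }
    public void finish() { blocks.add(String.join(" | ", current)); }
}

class BlockParserSketch {
    private final List<SketchBlockHandler> handlers = new ArrayList<>();

    // Handlers are consulted in registration order.
    void register(SketchBlockHandler handler) { handlers.add(handler); }

    // Returns the first handler willing to process the line, or null if none is.
    SketchBlockHandler selectHandler(String line) {
        for (SketchBlockHandler h : handlers) {
            if (h.canHandle(line)) return h;
        }
        return null;
    }

    // Assumption for this sketch: only headlines ("==...") start a new block.
    boolean isStartOfBlock(String line) { return line.trim().startsWith("=="); }

    void parse(String text) {
        SketchBlockHandler active = null;          // current state of the machine
        for (String line : text.split("\n")) {
            if (isStartOfBlock(line)) {
                if (active != null) active.finish();   // close the previous block
                active = selectHandler(line);          // may be null: block is skipped
                if (active != null) active.processHead(line);
            } else if (active != null) {
                active.processBody(line);
            }
        }
        if (active != null) active.finish();       // close the last block
    }

    public static void main(String[] args) {
        HeadlineCollector nouns = new HeadlineCollector("Noun");
        BlockParserSketch parser = new BlockParserSketch();
        parser.register(nouns);
        parser.parse("==Noun==\n# a plant\n# a diagram\n==Verb==\n# to grow");
        System.out.println(nouns.blocks);   // prints [# a plant | # a diagram]
    }
}
```

In this sketch the "state" is simply the currently active handler; a language-specific subclass would contribute its own isStartOfBlock logic and its own set of registered handlers.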

Author:
Christian M. Meyer, Christof Müller

Field Summary
protected static Pattern COMMENT_PATTERN
           
protected  long entryId
           
protected  List<IBlockHandler> handlers
           
protected static Pattern IMAGE_PATTERN
           
protected  ILanguage language
           
protected  String redirectTemplate
           
protected static Pattern REFERENCES_PATTERN
           
 
Constructor Summary
WiktionaryEntryParser(ILanguage language, String redirectName)
          Instantiates the entry parser for the given language.
 
Method Summary
protected  boolean checkForRedirect(WiktionaryPage page, String text)
          Checks whether the specified text is a redirect and, if so, sets the redirect target of the given Wiktionary page.
protected abstract  ParsingContext createParsingContext(WiktionaryPage page)
           
 ILanguage getLanguage()
          Returns the language of this parser's Wiktionary edition.
protected abstract  boolean isStartOfBlock(String line)
          Extension point (hot spot) for deciding whether the given line is a potential start of a new article constituent.
 void parse(WiktionaryPage page, String text)
          Creates Wiktionary word entry instances from the provided text, and adds them to the given article page.
protected  void register(IBlockHandler handler)
          Register the given handler that will be invoked during the parsing.
protected  IBlockHandler selectHandler(String line)
          Find a handler that is willing to handle the given line.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COMMENT_PATTERN

protected static final Pattern COMMENT_PATTERN

IMAGE_PATTERN

protected static final Pattern IMAGE_PATTERN

REFERENCES_PATTERN

protected static final Pattern REFERENCES_PATTERN

language

protected ILanguage language

redirectTemplate

protected String redirectTemplate

entryId

protected long entryId

handlers

protected List<IBlockHandler> handlers
Constructor Detail

WiktionaryEntryParser

public WiktionaryEntryParser(ILanguage language,
                             String redirectName)
Instantiates the entry parser for the given language.

Parameters:
language - the language of the Wiktionary edition handled by this parser.
redirectName - the language-specific prefix used for redirects.
Method Detail

parse

public void parse(WiktionaryPage page,
                  String text)
Description copied from interface: IWiktionaryEntryParser
Creates Wiktionary word entry instances from the provided text, and adds them to the given article page.

Specified by:
parse in interface IWiktionaryEntryParser

createParsingContext

protected abstract ParsingContext createParsingContext(WiktionaryPage page)

checkForRedirect

protected boolean checkForRedirect(WiktionaryPage page,
                                   String text)
Checks whether the specified text is a redirect and, if so, sets the redirect target of the given Wiktionary page.
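
As an illustration of what such a check might involve, the sketch below detects MediaWiki redirect markup of the form `#REDIRECT [[Target]]`, also accepting a localized keyword passed in the way redirectName is passed to the constructor. It is a hedged sketch only: RedirectSketch and findRedirectTarget are hypothetical names, the regular expression is an assumption rather than the pattern JWKTL uses, and the sketch returns the target instead of setting it on a WiktionaryPage.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RedirectSketch {
    /**
     * Returns the redirect target if the text starts with redirect markup,
     * or null otherwise. Hypothetical helper, not part of JWKTL.
     */
    static String findRedirectTarget(String text, String redirectName) {
        // "#REDIRECT" or the localized keyword, an optional colon, then "[[Target".
        // The target ends at "]]" or at a "|" separator.
        Pattern pattern = Pattern.compile(
                "^\\s*#(?:REDIRECT|" + Pattern.quote(redirectName) + ")\\s*:?\\s*\\[\\[([^\\]|]+)",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(findRedirectTarget("#REDIRECT [[tree]]", "WEITERLEITUNG"));
        System.out.println(findRedirectTarget("#WEITERLEITUNG [[Baum]]", "WEITERLEITUNG"));
        System.out.println(findRedirectTarget("==Noun==", "WEITERLEITUNG"));
    }
}
```

A method shaped like checkForRedirect could then return true and store the target on the page whenever the helper yields a non-null value.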


isStartOfBlock

protected abstract boolean isStartOfBlock(String line)
Extension point (hot spot) for deciding whether the given line is a potential start of a new article constituent. This may include headlines, templates, or other typographic variants.
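
A subclass might implement this decision roughly as sketched below. The rules are assumptions chosen for illustration (a line wrapped in `==` counts as a headline, a line wrapped in `{{ }}` counts as a template) and do not reproduce the logic of any actual JWKTL subclass; StartOfBlockSketch is a hypothetical name.

```java
class StartOfBlockSketch {
    // Hypothetical rule set: headlines such as "==Noun==" and
    // stand-alone templates such as "{{Bedeutungen}}" start a new block.
    static boolean isStartOfBlock(String line) {
        String trimmed = line.trim();
        boolean headline = trimmed.length() > 4
                && trimmed.startsWith("==") && trimmed.endsWith("==");
        boolean template = trimmed.length() > 4
                && trimmed.startsWith("{{") && trimmed.endsWith("}}");
        return headline || template;
    }

    public static void main(String[] args) {
        System.out.println(isStartOfBlock("==Noun=="));         // prints true
        System.out.println(isStartOfBlock("{{Bedeutungen}}"));  // prints true
        System.out.println(isStartOfBlock(":[1] a plant"));     // prints false
    }
}
```

Keeping this check cheap matters because it would run once per line of article text before any handler is consulted.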


selectHandler

protected IBlockHandler selectHandler(String line)
Find a handler that is willing to handle the given line.


register

protected void register(IBlockHandler handler)
Register the given handler that will be invoked during the parsing.


getLanguage

public ILanguage getLanguage()
Returns the language of this parser's Wiktionary edition.



Copyright © 2011-2013 Ubiquitous Knowledge Processing (UKP) Lab. All Rights Reserved.