java.lang.Object
  de.tudarmstadt.ukp.jwktl.parser.WiktionaryEntryParser
public abstract class WiktionaryEntryParser
Base implementation for parsing the textual contents of an article page in
order to construct IWiktionaryEntry and IWiktionarySense instances.
The parser is a finite state machine built on a set of block handlers.
Each handler is asked whether it wants to process the current line of
text. If so, that handler processes the subsequent lines until the entire
block has been consumed and the next line starts a block for a different
handler. Because the individual Wiktionary language editions differ
considerably, there should be one subclass of this parser for each
language edition, which takes care of language-specific adaptation and
the selection of the block handlers used.
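The handler-driven state machine described above can be sketched as follows. This is a minimal illustration, not the JWKTL implementation: `SketchHandler`, `SectionHandler`, and `BlockParserSketch` are hypothetical names, the handler interface is a simplified stand-in rather than the real `IBlockHandler`, and the `== Heading ==` block syntax is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified handler contract; NOT the JWKTL IBlockHandler API. */
interface SketchHandler {
    boolean canHandle(String line);  // asked whether it wants the current line
    void processLine(String line);   // receives the head and body lines of its block
    void finish();                   // called once the block is complete
}

/** Collects the lines below an assumed heading such as "== Etymology ==". */
class SectionHandler implements SketchHandler {
    private final String heading;
    private final List<String> sink;
    private final StringBuilder body = new StringBuilder();

    SectionHandler(String heading, List<String> sink) {
        this.heading = heading;
        this.sink = sink;
    }

    public boolean canHandle(String line) {
        return line.trim().equals("== " + heading + " ==");
    }

    public void processLine(String line) {
        // Skip the heading line itself; keep only the body lines.
        if (!canHandle(line)) {
            body.append(line.trim()).append(' ');
        }
    }

    public void finish() {
        sink.add(heading + ": " + body.toString().trim());
        body.setLength(0);
    }
}

/** Sketch of the parser loop: one active handler at a time. */
class BlockParserSketch {
    private final List<SketchHandler> handlers = new ArrayList<>();

    void register(SketchHandler handler) {
        handlers.add(handler);
    }

    /** Returns the first registered handler willing to take the line, or null. */
    SketchHandler selectHandler(String line) {
        for (SketchHandler handler : handlers) {
            if (handler.canHandle(line)) {
                return handler;
            }
        }
        return null;
    }

    /** Assumption for this sketch: every block starts with a "==" heading. */
    boolean isStartOfBlock(String line) {
        return line.trim().startsWith("==");
    }

    void parse(String text) {
        SketchHandler current = null;
        for (String line : text.split("\n")) {
            if (isStartOfBlock(line)) {
                if (current != null) {
                    current.finish();          // close the previous block
                }
                current = selectHandler(line); // may be null: block is skipped
            }
            if (current != null) {
                current.processLine(line);
            }
        }
        if (current != null) {
            current.finish();                  // close the last open block
        }
    }
}
```

In this sketch, a language-specific subclass would correspond to a different `isStartOfBlock` rule and a different set of `register` calls. Lines that belong to a block no handler accepts are dropped until the next block starts.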
Field Summary | |
---|---|
protected static Pattern | COMMENT_PATTERN |
protected long | entryId |
protected List<IBlockHandler> | handlers |
protected static Pattern | IMAGE_PATTERN |
protected ILanguage | language |
protected String | redirectTemplate |
protected static Pattern | REFERENCES_PATTERN |
Constructor Summary | |
---|---|
WiktionaryEntryParser(ILanguage language, String redirectName) | Instantiates the entry parser for the given language. |
Method Summary | |
---|---|
protected boolean | checkForRedirect(WiktionaryPage page, String text): Check if the specified text is a redirect and set the redirect target of the given Wiktionary page. |
protected abstract ParsingContext | createParsingContext(WiktionaryPage page) |
ILanguage | getLanguage(): Returns the language of this parser's Wiktionary edition. |
protected abstract boolean | isStartOfBlock(String line): Hotspot for deciding if the given line is a potential start of a new article constituent. |
void | parse(WiktionaryPage page, String text): Creates Wiktionary word entry instances from the provided text, and adds them to the given article page. |
protected void | register(IBlockHandler handler): Register the given handler that will be invoked during the parsing. |
protected IBlockHandler | selectHandler(String line): Find a handler that is willing to handle the given line. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static final Pattern COMMENT_PATTERN
protected static final Pattern IMAGE_PATTERN
protected static final Pattern REFERENCES_PATTERN
protected ILanguage language
protected String redirectTemplate
protected long entryId
protected List<IBlockHandler> handlers
Constructor Detail |
---|
public WiktionaryEntryParser(ILanguage language, String redirectName)
Parameters: redirectName - denotes the language-specific prefix used for redirections.
Method Detail |
---|
public void parse(WiktionaryPage page, String text)
Specified by: parse in interface IWiktionaryEntryParser
protected abstract ParsingContext createParsingContext(WiktionaryPage page)
protected boolean checkForRedirect(WiktionaryPage page, String text)
protected abstract boolean isStartOfBlock(String line)
protected IBlockHandler selectHandler(String line)
protected void register(IBlockHandler handler)
public ILanguage getLanguage()
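As an illustration of the redirect check that `checkForRedirect` is described as performing, the following sketch detects a redirect line and extracts its target. It is an assumption-laden example rather than the JWKTL code: `RedirectSketch` and `redirectTarget` are hypothetical names, the prefix passed to the constructor is assumed to include the leading `#`, and the `#REDIRECT [[target]]` form follows general MediaWiki markup. The real method additionally stores the target on the given `WiktionaryPage`, which is omitted here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of JWKTL. */
class RedirectSketch {
    private final Pattern redirectPattern;

    /**
     * @param redirectName assumed language-specific redirect prefix,
     *                     for example "#REDIRECT"
     */
    RedirectSketch(String redirectName) {
        // Matches the prefix at the very start of the text, followed by
        // "[[target", and stops the target at "]", "|" or "#".
        this.redirectPattern = Pattern.compile(
                "^\\s*" + Pattern.quote(redirectName) + "\\s*:?\\s*\\[\\[([^\\]|#]+)",
                Pattern.CASE_INSENSITIVE);
    }

    /** Returns the redirect target if the text is a redirect, else empty. */
    Optional<String> redirectTarget(String text) {
        Matcher matcher = redirectPattern.matcher(text);
        return matcher.find()
                ? Optional.of(matcher.group(1).trim())
                : Optional.empty();
    }
}
```

Passing the prefix as a constructor argument mirrors the `redirectName` parameter above, so each language edition could supply its own localized keyword.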