java.lang.Object
  de.tudarmstadt.ukp.jwktl.parser.XMLDumpParser
public abstract class XMLDumpParser
Implementation of IWiktionaryDumpParser
for processing XML dump files
downloaded from http://download.wikimedia.org/backup-index.html.
Specializations of this class can each focus on a certain
aspect of the dump, e.g., parsing the full text of the article pages and
creating an object structure from it, processing some aspects of
the user pages, or filtering the article pages. The base class itself
should remain generic.
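The hotspot design described above can be illustrated with a minimal, self-contained sketch. It does not use JWKTL; `SketchDumpParser` and `TitleCollector` are hypothetical stand-ins built on the JDK's SAX parser, and the hotspots receive the element text directly instead of an `XMLDumpParser.XMLDumpHandler`, whose accessors are not shown on this page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for a generic dump parser; NOT the JWKTL class. */
abstract class SketchDumpParser {

    /** Template method: runs a SAX parse and forwards events to the hotspots. */
    public void parse(InputStream xml) {
        onParserStart();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(xml, new DefaultHandler() {
                // Character data collected since the most recent tag.
                private final StringBuilder text = new StringBuilder();

                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    text.setLength(0);
                    onElementStart(qName);
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    onElementEnd(qName, text.toString());
                    text.setLength(0);
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        onParserEnd();
    }

    // Optional hotspots with empty defaults.
    protected void onParserStart() {}
    protected void onParserEnd() {}

    // Mandatory hotspots (simplified signatures, an assumption of this sketch).
    protected abstract void onElementStart(String name);
    protected abstract void onElementEnd(String name, String text);
}

/** A specialization that focuses on one aspect of the dump: page titles. */
class TitleCollector extends SketchDumpParser {
    final List<String> titles = new ArrayList<>();

    @Override
    protected void onElementStart(String name) {}

    @Override
    protected void onElementEnd(String name, String text) {
        if ("title".equals(name)) {
            titles.add(text.trim());
        }
    }
}
```

A caller would create a `TitleCollector`, pass an input stream to `parse`, and read `titles` afterwards. Other specializations would override the same hotspots and leave the parsing loop in the base class untouched.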
Nested Class Summary | |
---|---|
protected class | XMLDumpParser.XMLDumpHandler |
Field Summary | |
---|---|
static String | BZ2_FILE_EXTENSION: The file extension for bzip2 files that is used for the automatic detection of the file format. |
Constructor Summary | |
---|---|
XMLDumpParser() | |
Method Summary | |
---|---|
protected abstract void | onElementEnd(String name, XMLDumpParser.XMLDumpHandler handler): Hotspot that is invoked for each closing XML element. |
protected abstract void | onElementStart(String name, XMLDumpParser.XMLDumpHandler handler): Hotspot that is invoked for each opening XML element. |
protected void | onParserEnd(): Hotspot that is invoked when parsing finishes. |
protected void | onParserStart(): Hotspot that is invoked when the parser starts. |
void | parse(File dumpFile): Parses the given XML dump file. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface de.tudarmstadt.ukp.jwktl.parser.IWiktionaryDumpParser |
---|
getPageParsers, register |
Field Detail |
---|
public static final String BZ2_FILE_EXTENSION
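Extension-based format detection of the kind this constant supports might look like the sketch below. The class `DumpFormatSketch`, the method `looksLikeBzip2`, and the value `".bz2"` are assumptions made for illustration; this page does not state the constant's actual value or how the parser applies it.

```java
import java.io.File;
import java.util.Locale;

/** Hypothetical helper; not part of JWKTL. */
final class DumpFormatSketch {

    /** Assumed value; the real constant's value is not given on this page. */
    static final String BZ2_FILE_EXTENSION = ".bz2";

    private DumpFormatSketch() {}

    /** Returns true if the file name ends with the bzip2 extension (case-insensitive). */
    static boolean looksLikeBzip2(File dumpFile) {
        String name = dumpFile.getName().toLowerCase(Locale.ROOT);
        return name.endsWith(BZ2_FILE_EXTENSION);
    }
}
```

A parser could use such a check to decide whether to wrap the input in a bzip2 decompression stream before handing it to the XML parser. The JDK does not ship a bzip2 decoder, so that step would need a third-party library.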
Constructor Detail |
---|
public XMLDumpParser()
Method Detail |
---|
public void parse(File dumpFile) throws WiktionaryException
Specified by: parse in interface IWiktionaryDumpParser
Throws: WiktionaryException - in case of any parser errors.
protected void onParserStart()
protected abstract void onElementStart(String name, XMLDumpParser.XMLDumpHandler handler)
protected abstract void onElementEnd(String name, XMLDumpParser.XMLDumpHandler handler)
protected void onParserEnd()