java.lang.Object
  de.tudarmstadt.ukp.jwktl.parser.XMLDumpParser
    de.tudarmstadt.ukp.jwktl.parser.WiktionaryDumpParser
public class WiktionaryDumpParser
Extension of the XMLDumpParser that reads the different XML tags
of the Wiktionary XML dump file format and provides hotspots for each
type of information. A number of IWiktionaryPageParser instances can
be registered with this dump parser. The page parsers are called whenever
a piece of information has been read. Different page parsers can, for
example, handle different page types or namespaces.
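The registration and hotspot idea described above can be sketched with plain JDK classes. This is a minimal illustration, not the JWKTL implementation: MiniDumpParser and PageListener are hypothetical stand-ins, and the element names (page, title, text) are assumed from the MediaWiki dump layout.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical miniature dump parser; the names here are illustrative only
// and are not part of the JWKTL API.
final class MiniDumpParser extends DefaultHandler {

    /** Stand-in for a page parser: receives one callback per completed page. */
    interface PageListener {
        void onPage(String title, String text);
    }

    private final List<PageListener> registry = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean inPage;
    private String title;
    private String text;

    void register(PageListener listener) {
        registry.add(listener);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // start collecting character data for this element
        if ("page".equals(qName)) {
            inPage = true;
            title = null;
            text = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!inPage) {
            return; // ignore elements outside <page>, such as <siteinfo>
        }
        if ("title".equals(qName)) {
            title = buffer.toString();
        } else if ("text".equals(qName)) {
            text = buffer.toString();
        } else if ("page".equals(qName)) {
            inPage = false;
            for (PageListener listener : registry) {
                listener.onPage(title, text); // hotspot: the page is complete
            }
        }
    }

    /** Parses an in-memory XML string; a real dump would be streamed from a file. */
    void parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), this);
        } catch (Exception e) {
            throw new IllegalStateException("Unable to parse XML", e);
        }
    }

    public static void main(String[] args) {
        MiniDumpParser parser = new MiniDumpParser();
        parser.register((title, text) -> System.out.println(title + ": " + text));
        parser.parse("<mediawiki><page><title>cat</title>"
            + "<revision><text>A small feline.</text></revision></page></mediawiki>");
    }
}
```

Several listeners can be registered on the same parser, so one could, for example, handle article pages while another collects category pages, each seeing every completed page.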
Nested Class Summary |
---|
Nested classes/interfaces inherited from class de.tudarmstadt.ukp.jwktl.parser.XMLDumpParser |
---|
XMLDumpParser.XMLDumpHandler |
Field Summary | |
---|---|
protected DumpInfo | dumpInfo |
protected boolean | inPage |
protected List<IWiktionaryPageParser> | parserRegistry |
protected DateFormat | timestampFormat |
Fields inherited from class de.tudarmstadt.ukp.jwktl.parser.XMLDumpParser |
---|
BZ2_FILE_EXTENSION |
Constructor Summary | |
---|---|
WiktionaryDumpParser(IWiktionaryPageParser... pageParsers) | Initializes the dump parser and registers the given page parsers. |
Method Summary | | |
---|---|---|
protected void | addNamespace(String namespace) | |
IDumpInfo | getDumpInfo() | Returns information on the current dump file and its parsing progress. |
Iterable<IWiktionaryPageParser> | getPageParsers() | Returns the list of all registered IWiktionaryPageParser instances. |
protected void | onClose() | |
protected void | onElementEnd(String name, XMLDumpParser.XMLDumpHandler handler) | Hotspot that is invoked for each closing XML element. |
protected void | onElementStart(String name, XMLDumpParser.XMLDumpHandler handler) | Hotspot that is invoked for each opening XML element. |
protected void | onPageEnd() | |
protected void | onPageStart() | |
protected void | onParserEnd() | Hotspot that is invoked on finishing the parsing. |
protected void | onParserStart() | Hotspot that is invoked on starting the parser. |
protected void | onSiteInfoComplete() | |
void | parse(File dumpFile) | Parses the given XML dump file. |
protected Date | parseTimestamp(String dateString) | |
void | register(IWiktionaryPageParser pageParser) | Registers the given IWiktionaryPageParser. |
protected static ILanguage | resolveLanguage(String baseURL) | |
protected void | setAuthor(String author) | |
protected void | setBaseURL(String baseURL) | |
protected void | setPageId(long pageId) | |
protected void | setRevision(long revisionId) | |
protected void | setText(String text) | |
protected void | setTimestamp(Date timestamp) | |
protected void | setTitle(String title) | |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected List<IWiktionaryPageParser> parserRegistry
protected boolean inPage
protected DumpInfo dumpInfo
protected DateFormat timestampFormat
Constructor Detail |
---|
public WiktionaryDumpParser(IWiktionaryPageParser... pageParsers)
Method Detail |
---|
public void register(IWiktionaryPageParser pageParser)
Description copied from interface: IWiktionaryDumpParser
Registers the given IWiktionaryPageParser. The registered
parser will then be notified once a Wiktionary-related XML tag
has been processed.
public Iterable<IWiktionaryPageParser> getPageParsers()
Description copied from interface: IWiktionaryDumpParser
Returns the list of all registered IWiktionaryPageParser instances.
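public void parse(File dumpFile) throws WiktionaryException
Description copied from class: XMLDumpParser
Parses the given XML dump file.
Specified by: parse in interface IWiktionaryDumpParser
Overrides: parse in class XMLDumpParser
Throws: WiktionaryException - in case of any parser errors.
protected void onParserStart()
Description copied from class: XMLDumpParser
Hotspot that is invoked on starting the parser.
Overrides: onParserStart in class XMLDumpParser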
protected void onSiteInfoComplete()
protected void onParserEnd()
Description copied from class: XMLDumpParser
Hotspot that is invoked on finishing the parsing.
Overrides: onParserEnd in class XMLDumpParser
protected void onClose()
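protected void onElementStart(String name, XMLDumpParser.XMLDumpHandler handler)
Description copied from class: XMLDumpParser
Hotspot that is invoked for each opening XML element.
Overrides: onElementStart in class XMLDumpParser
protected void onElementEnd(String name, XMLDumpParser.XMLDumpHandler handler)
Description copied from class: XMLDumpParser
Hotspot that is invoked for each closing XML element.
Overrides: onElementEnd in class XMLDumpParser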
protected void onPageStart()
protected void onPageEnd()
protected void setBaseURL(String baseURL)
protected static ILanguage resolveLanguage(String baseURL)
protected void addNamespace(String namespace)
protected void setAuthor(String author)
protected void setRevision(long revisionId)
protected void setTimestamp(Date timestamp)
protected void setPageId(long pageId)
protected void setTitle(String title)
protected void setText(String text)
protected Date parseTimestamp(String dateString) throws ParseException
Throws: ParseException
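The page above does not document how parseTimestamp interprets its argument. As a rough sketch, assuming the input follows the ISO-8601 UTC form that MediaWiki dumps use in their timestamp elements (for example 2012-01-01T00:00:00Z), a DateFormat-based parser could look as follows. The class name TimestampSketch and the pattern string are assumptions, not taken from the JWKTL source.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper; the pattern is an assumption about the dump format.
final class TimestampSketch {

    /** Parses a timestamp such as 2012-01-01T00:00:00Z, interpreted as UTC. */
    static Date parseTimestamp(String dateString) throws ParseException {
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        DateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        // The trailing 'Z' is matched as a literal, so the time zone is set explicitly.
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.parse(dateString);
    }

    public static void main(String[] args) throws ParseException {
        Date parsed = parseTimestamp("2012-01-01T00:00:00Z");
        System.out.println(parsed.getTime()); // milliseconds since the Unix epoch
    }
}
```

Setting the time zone explicitly matters here: without it, the literal Z would be ignored and the value would be read in the JVM's default zone.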
public IDumpInfo getDumpInfo()
Returns information on the current dump file and its parsing progress,
or null if the parser has not yet been started (i.e., the parse(File)
method has not been called).