de.tudarmstadt.ukp.jwktl.parser.wikisaurus
Class WikisaurusArticleParser

java.lang.Object
  extended by de.tudarmstadt.ukp.jwktl.parser.wikisaurus.WikisaurusArticleParser
All Implemented Interfaces:
IWiktionaryPageParser

public class WikisaurusArticleParser
extends Object
implements IWiktionaryPageParser

Experimental parser for Wikisaurus entries (i.e., wiki pages in the Wikisaurus namespace that contain thesaurus-like information).
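To make "thesaurus-like information" concrete, the following sketch models one sense of a Wikisaurus page as a headword, a short sense definition, and a map from relation names to related words. The class `ThesaurusSenseSketch` and its members are hypothetical illustrations. They are not the API of JWKTL's `WikisaurusEntry`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of one Wikisaurus sense; not JWKTL's WikisaurusEntry.
class ThesaurusSenseSketch {
    final String headword;          // page title, e.g. "happy"
    final String senseDefinition;   // short definition identifying the sense
    // relation name (e.g. "Synonyms") -> related words, kept in page order
    final Map<String, List<String>> relations = new LinkedHashMap<>();

    ThesaurusSenseSketch(String headword, String senseDefinition) {
        this.headword = headword;
        this.senseDefinition = senseDefinition;
    }

    // Appends a target word to the list for the given relation name.
    void addRelation(String relationName, String target) {
        relations.computeIfAbsent(relationName, k -> new ArrayList<>()).add(target);
    }

    // Total number of related words across all relation names.
    int relationCount() {
        int n = 0;
        for (List<String> targets : relations.values()) {
            n += targets.size();
        }
        return n;
    }
}
```

Keying relations by plain heading text keeps the sketch small; a parser would likely map those headings to typed relations instead.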

Author:
Yevgen Chebotar, Christian M. Meyer

Field Summary
protected  String currentNamespace
           
protected  String currentTitle
           
protected  List<WikisaurusEntry> entryQueue
           
protected  Map<String,Integer> notFoundRelation
           
protected  Map<String,RelationType> relTypeMap
           
protected  IWritableWiktionaryEdition wiktionaryDB
           
 
Constructor Summary
WikisaurusArticleParser(IWritableWiktionaryEdition wiktionaryDB)
          Instantiates the parser for the given database.
 
Method Summary
protected  String[] extractRelTarget(String wsRel)
          Extracts the relation target and the target's sense definition (if one exists).
protected  String extractSenseDefinition(String wsSense)
          Extracts the sense definition from a Wikisaurus line.
 void onClose(IDumpInfo dumpInfo)
          Hotspot that is invoked after the parser has finished its work.
 void onPageEnd()
          Hotspot that is invoked upon finishing the current article page.
 void onPageStart()
          Hotspot that is invoked upon starting a new article page.
 void onParserEnd(IDumpInfo dumpInfo)
          Hotspot that is invoked when the parser is about to end.
 void onParserStart(IDumpInfo dumpInfo)
          Hotspot that is invoked upon starting the parser.
 void onSiteInfoComplete(IDumpInfo dumpInfo)
          Hotspot that is invoked after the siteinfo header has been read.
protected  Set<WikisaurusEntry> parseWikisaurusEntries(String title, String text)
           
protected  void saveWikisaurusEntry(WikisaurusEntry wikisaurusEntry, boolean allowCaching)
           
 void setAuthor(String author)
          Hotspot that is invoked after the current page's author is read.
 void setPageId(long pageId)
          Hotspot that is invoked after the current page's id is read.
 void setRevision(long revisionId)
          Hotspot that is invoked after the current page's revision id is read.
 void setText(String text)
          Hotspot that is invoked after the current page's text is read.
 void setTimestamp(Date timestamp)
          Hotspot that is invoked after the current page's timestamp is read.
 void setTitle(String title, String namespace)
          Hotspot that is invoked after the current page's title is read.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

wiktionaryDB

protected IWritableWiktionaryEdition wiktionaryDB

entryQueue

protected List<WikisaurusEntry> entryQueue

currentTitle

protected String currentTitle

currentNamespace

protected String currentNamespace

notFoundRelation

protected Map<String,Integer> notFoundRelation

relTypeMap

protected Map<String,RelationType> relTypeMap
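The field names suggest that `relTypeMap` translates Wikisaurus section headings into relation types and that `notFoundRelation` counts headings with no mapping. The sketch below is a guess at that pattern, not the actual implementation. `RelTypeSketch` is a hypothetical stand-in for JWKTL's `RelationType`, and the heading strings are assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for JWKTL's RelationType; the real enum may differ.
enum RelTypeSketch { SYNONYM, ANTONYM, HYPERNYM, HYPONYM }

class RelationLookupSketch {
    // Normalized section heading -> relation type, in the spirit of relTypeMap.
    final Map<String, RelTypeSketch> relTypeMap = new HashMap<>();
    // Unknown heading -> number of occurrences, in the spirit of notFoundRelation.
    final Map<String, Integer> notFoundRelation = new HashMap<>();

    RelationLookupSketch() {
        relTypeMap.put("synonyms", RelTypeSketch.SYNONYM);
        relTypeMap.put("antonyms", RelTypeSketch.ANTONYM);
        relTypeMap.put("hypernyms", RelTypeSketch.HYPERNYM);
        relTypeMap.put("hyponyms", RelTypeSketch.HYPONYM);
    }

    /** Returns the relation type for a heading, or null after counting it as unknown. */
    RelTypeSketch lookup(String heading) {
        String key = heading.trim().toLowerCase(Locale.ROOT);
        RelTypeSketch type = relTypeMap.get(key);
        if (type == null) {
            notFoundRelation.merge(key, 1, Integer::sum);
        }
        return type;
    }
}
```

Counting unmapped headings rather than failing lets a long dump run finish while still reporting which headings the mapping is missing.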
Constructor Detail

WikisaurusArticleParser

public WikisaurusArticleParser(IWritableWiktionaryEdition wiktionaryDB)
Instantiates the parser for the given database.

Method Detail

onParserStart

public void onParserStart(IDumpInfo dumpInfo)
Description copied from interface: IWiktionaryPageParser
Hotspot that is invoked upon starting the parser.

Specified by:
onParserStart in interface IWiktionaryPageParser

onSiteInfoComplete

public void onSiteInfoComplete(IDumpInfo dumpInfo)
Description copied from interface: IWiktionaryPageParser
Hotspot that is invoked after the siteinfo header has been read. At this point in time, the dump info contains all information, including dump language and namespaces.

Specified by:
onSiteInfoComplete in interface IWiktionaryPageParser

onClose

public void onClose(IDumpInfo dumpInfo)
Description copied from interface: IWiktionaryPageParser
Hotspot that is invoked after the parser has finished its work. This method is supposed to close and clean up any resources (e.g., closing a database connection). It is called after all IWiktionaryPageParser.onParserEnd(IDumpInfo) calls have been handled.

Specified by:
onClose in interface IWiktionaryPageParser
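The hotspot descriptions imply a calling order: the parser starts, the siteinfo header completes, each page is bracketed by `onPageStart()` and `onPageEnd()`, and `onClose(IDumpInfo)` runs only after `onParserEnd(IDumpInfo)`. The sketch simulates that order with a hypothetical interface mirroring a subset of the `IWiktionaryPageParser` hotspots, with the `IDumpInfo` arguments omitted. The order of the setters within a page is an assumption; in a real dump it follows the XML element order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of a subset of the IWiktionaryPageParser hotspots.
interface PageHotspotsSketch {
    void onParserStart();
    void onSiteInfoComplete();
    void onPageStart();
    void setTitle(String title, String namespace);
    void setText(String text);
    void onPageEnd();
    void onParserEnd();
    void onClose();
}

// Records each callback so the invocation order can be inspected.
class RecordingParser implements PageHotspotsSketch {
    final List<String> calls = new ArrayList<>();
    public void onParserStart() { calls.add("onParserStart"); }
    public void onSiteInfoComplete() { calls.add("onSiteInfoComplete"); }
    public void onPageStart() { calls.add("onPageStart"); }
    public void setTitle(String title, String namespace) { calls.add("setTitle"); }
    public void setText(String text) { calls.add("setText"); }
    public void onPageEnd() { calls.add("onPageEnd"); }
    public void onParserEnd() { calls.add("onParserEnd"); }
    public void onClose() { calls.add("onClose"); }
}

class DumpDriverSketch {
    // Simulates a dump reader; each page is {title, namespace, text}.
    static void run(PageHotspotsSketch parser, List<String[]> pages) {
        parser.onParserStart();
        parser.onSiteInfoComplete();      // after the siteinfo header
        for (String[] page : pages) {
            parser.onPageStart();
            parser.setTitle(page[0], page[1]);
            parser.setText(page[2]);
            parser.onPageEnd();
        }
        parser.onParserEnd();             // write pending data here
        parser.onClose();                 // release resources last
    }
}
```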

onPageStart

public void onPageStart()
Description copied from interface: IWiktionaryPageParser
Hotspot that is invoked upon starting a new article page.

Specified by:
onPageStart in interface IWiktionaryPageParser

onPageEnd

public void onPageEnd()
Description copied from interface: IWiktionaryPageParser
Hotspot that is invoked upon finishing the current article page.

Specified by:
onPageEnd in interface IWiktionaryPageParser

setAuthor

public void setAuthor(String author)
Description copied from interface: IWiktionaryPageParser
Hotspot that is invoked after the current page's author is read.

Specified by:
setAuthor in interface IWiktionaryPageParser

setRevision

public void setRevision(long revisionId)
Description copied from interface: IWiktionaryPageParser
Hotspot that is invoked after the current page's revision id is read.

Specified by:
setRevision in interface IWiktionaryPageParser

setTimestamp

public void setTimestamp(Date timestamp)
Description copied from interface: IWiktionaryPageParser
Hotspot that is invoked after the current page's timestamp is read.

Specified by:
setTimestamp in interface IWiktionaryPageParser

setPageId

public void setPageId(long pageId)
Description copied from interface: IWiktionaryPageParser
Hotspot that is invoked after the current page's id is read.

Specified by:
setPageId in interface IWiktionaryPageParser

setTitle

public void setTitle(String title,
                     String namespace)
Description copied from interface: IWiktionaryPageParser
Hotspot that is invoked after the current page's title is read.

Specified by:
setTitle in interface IWiktionaryPageParser
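Because `setTitle` receives the namespace separately and the class keeps `currentTitle` and `currentNamespace` fields, a natural use is to skip pages outside the Wikisaurus namespace once their text arrives. This is a minimal sketch under that assumption. The namespace string `"Wikisaurus"` and the class `NamespaceFilterSketch` are illustrative, not taken from the JWKTL source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: remember title/namespace, then process text only for Wikisaurus pages.
class NamespaceFilterSketch {
    static final String WIKISAURUS_NAMESPACE = "Wikisaurus"; // assumed namespace name
    String currentTitle;
    String currentNamespace;
    final List<String> processedTitles = new ArrayList<>();

    void onPageStart() {
        // Reset per-page state so values never leak from the previous page.
        currentTitle = null;
        currentNamespace = null;
    }

    void setTitle(String title, String namespace) {
        currentTitle = title;
        currentNamespace = namespace;
    }

    void setText(String text) {
        // Only Wikisaurus pages carry thesaurus data; a real parser would parse `text` here.
        if (WIKISAURUS_NAMESPACE.equals(currentNamespace)) {
            processedTitles.add(currentTitle);
        }
    }
}
```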

setText

public void setText(String text)
Description copied from interface: IWiktionaryPageParser
Hotspot that is invoked after the current page's text is read.

Specified by:
setText in interface IWiktionaryPageParser

parseWikisaurusEntries

protected Set<WikisaurusEntry> parseWikisaurusEntries(String title,
                                                      String text)
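`parseWikisaurusEntries` is undocumented. The sketch below shows one way such a method could turn page text into per-sense relation lists, assuming a simplified layout in which `{{ws sense|...}}` opens a sense, headings such as `====Synonyms====` name a relation, and `{{ws|...}}` lines list targets. The class `WikisaurusPageSketch` is hypothetical and returns plain maps rather than `WikisaurusEntry` objects; actual Wikisaurus markup has varied over time and may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified parser; the assumed markup is described in the lead-in.
class WikisaurusPageSketch {
    static final Pattern SENSE = Pattern.compile("\\{\\{ws sense\\|([^}|]*)\\}\\}");
    static final Pattern HEADING = Pattern.compile("=+\\s*([^=]+?)\\s*=+");
    static final Pattern TARGET = Pattern.compile("\\{\\{ws\\|([^}|]+)");

    /** Returns sense definition -> (relation heading -> target words). */
    static Map<String, Map<String, List<String>>> parse(String text) {
        Map<String, Map<String, List<String>>> senses = new LinkedHashMap<>();
        Map<String, List<String>> currentSense = null;
        String currentRelation = null;
        for (String rawLine : text.split("\n")) {
            String line = rawLine.trim();
            Matcher m = SENSE.matcher(line);
            if (m.find()) {                       // a new sense starts; forget the old relation
                currentSense = new LinkedHashMap<>();
                senses.put(m.group(1).trim(), currentSense);
                currentRelation = null;
                continue;
            }
            m = HEADING.matcher(line);
            if (m.matches()) {                    // e.g. "====Synonyms===="
                currentRelation = m.group(1);
                continue;
            }
            m = TARGET.matcher(line);
            if (m.find() && currentSense != null && currentRelation != null) {
                currentSense.computeIfAbsent(currentRelation, k -> new ArrayList<>())
                            .add(m.group(1).trim());
            }
        }
        return senses;
    }
}
```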

onParserEnd

public void onParserEnd(IDumpInfo dumpInfo)
Description copied from interface: IWiktionaryPageParser
Hotspot that is invoked when the parser is about to end. Use this method for writing any pending information to a file or database. For closing and cleaning up resources, you should, however, use the IWiktionaryPageParser.onClose(IDumpInfo) hotspot.

Specified by:
onParserEnd in interface IWiktionaryPageParser

saveWikisaurusEntry

protected void saveWikisaurusEntry(WikisaurusEntry wikisaurusEntry,
                                   boolean allowCaching)
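The `allowCaching` flag, together with the `entryQueue` field and the `onParserEnd` hotspot, suggests deferred saving: an entry that cannot be stored yet is queued and retried once parsing has finished. The sketch below illustrates that reading only; it is an assumption. The set `known` is a hypothetical stand-in for lookups in the Wiktionary database, and entries are reduced to headword strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of deferred saving; not the JWKTL implementation.
class DeferredSaverSketch {
    final Set<String> known = new HashSet<>();          // words already in the database
    final List<String> entryQueue = new ArrayList<>();  // entries waiting for a retry
    final List<String> saved = new ArrayList<>();
    final List<String> dropped = new ArrayList<>();

    void saveEntry(String headword, boolean allowCaching) {
        if (known.contains(headword)) {
            saved.add(headword);
        } else if (allowCaching) {
            entryQueue.add(headword);    // try again once parsing is complete
        } else {
            dropped.add(headword);       // give up: nothing to attach the entry to
        }
    }

    void onParserEnd() {
        // Copy first so saveEntry never modifies the list being iterated.
        List<String> pending = new ArrayList<>(entryQueue);
        entryQueue.clear();
        for (String headword : pending) {
            saveEntry(headword, false);  // caching disabled, so nothing is re-queued
        }
    }
}
```

Retrying with caching disabled guarantees that the final pass terminates, because no entry can be queued a second time.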

extractSenseDefinition

protected String extractSenseDefinition(String wsSense)
Extracts the sense definition from a Wikisaurus line.
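A minimal sketch of extracting a sense definition, assuming two hypothetical line shapes: the template form `{{ws sense|...}}` and a heading form `===Sense: ...===`. Neither the formats nor the return convention (`null` for non-sense lines) is confirmed by this documentation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the line formats below are assumptions, not taken from JWKTL.
class SenseDefinitionSketch {
    // Template form, e.g. "{{ws sense|feeling pleasure}}"
    private static final Pattern WS_SENSE = Pattern.compile("\\{\\{ws sense\\|([^}]*)\\}\\}");
    // Heading form, e.g. "===Sense: feeling pleasure==="
    private static final Pattern SENSE_HEADING = Pattern.compile("=+\\s*Sense:\\s*(.*?)\\s*=+");

    /** Returns the definition, or null if the line is not a sense line. */
    static String extractSenseDefinition(String wsSense) {
        String line = wsSense.trim();
        Matcher m = WS_SENSE.matcher(line);
        if (m.find()) {
            return m.group(1).trim();
        }
        m = SENSE_HEADING.matcher(line);
        if (m.matches()) {
            return m.group(1);
        }
        return null;
    }
}
```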


extractRelTarget

protected String[] extractRelTarget(String wsRel)
Extracts the relation target and the target's sense definition (if one exists).
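Similarly, a sketch of splitting a relation line into its target and optional target sense definition, assuming the template form `{{ws|target}}` or `{{ws|target|definition}}`. The array layout `{target, definition}` follows the description above. Returning `null` for lines without a `{{ws|` template and ignoring nested templates are simplifying assumptions.

```java
// Hypothetical sketch; the "{{ws|...}}" line format is an assumption.
class RelTargetSketch {
    /** Returns {target, targetSenseDefinition}; the second element is null if absent. */
    static String[] extractRelTarget(String wsRel) {
        String line = wsRel.trim();
        int start = line.indexOf("{{ws|");
        if (start < 0) {
            return null;                  // not a relation target line
        }
        int end = line.indexOf("}}", start);
        if (end < 0) {
            return null;                  // unterminated template
        }
        // Split "target|definition"; limit -1 keeps empty trailing fields.
        String[] parts = line.substring(start + "{{ws|".length(), end).split("\\|", -1);
        String target = parts[0].trim();
        String definition = (parts.length > 1 && !parts[1].trim().isEmpty()) ? parts[1].trim() : null;
        return new String[] {target, definition};
    }
}
```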



Copyright © 2011-2013 Ubiquitous Knowledge Processing (UKP) Lab. All Rights Reserved.