de.tudarmstadt.ukp.jwktl.api.entry
Class BerkeleyDBWiktionaryEdition

java.lang.Object
  extended by de.tudarmstadt.ukp.jwktl.api.entry.AbstractWiktionary
      extended by de.tudarmstadt.ukp.jwktl.api.entry.WiktionaryEdition
          extended by de.tudarmstadt.ukp.jwktl.api.entry.BerkeleyDBWiktionaryEdition
All Implemented Interfaces:
IWiktionary, IWiktionaryEdition
Direct Known Subclasses:
WritableBerkeleyDBWiktionaryEdition

public class BerkeleyDBWiktionaryEdition
extends WiktionaryEdition

Implementation of the IWiktionaryEdition interface that uses a Berkeley DB to store and retrieve the parsed Wiktionary information.

Author:
Christian M. Meyer

Nested Class Summary
static class BerkeleyDBWiktionaryEdition.WiktionaryEntryProxy
          Proxy object for referencing an IWiktionaryEntry.
static class BerkeleyDBWiktionaryEdition.WiktionarySenseProxy
          Proxy object for referencing an IWiktionarySense.
 
Field Summary
static String DATABASE_NAME
          The internal name of the parsed Wiktionary database.
protected  File dbPath
           
protected  com.sleepycat.persist.SecondaryIndex<Long,String,BerkeleyDBWiktionaryEdition.WiktionaryEntryProxy> entryById
           
protected  com.sleepycat.persist.PrimaryIndex<String,BerkeleyDBWiktionaryEdition.WiktionaryEntryProxy> entryByKey
           
protected  com.sleepycat.je.Environment env
           
protected  ILanguage language
           
protected  Set<com.sleepycat.persist.EntityCursor<?>> openCursors
           
protected  com.sleepycat.persist.PrimaryIndex<Long,WiktionaryPage> pageById
           
protected  com.sleepycat.persist.SecondaryIndex<String,Long,WiktionaryPage> pageByNormalizedTitle
           
protected  com.sleepycat.persist.SecondaryIndex<String,Long,WiktionaryPage> pageByTitle
           
protected  Properties properties
           
static String PROPERTY_FILE_NAME
          The name of the property file containing info about the parsed DB.
protected  com.sleepycat.persist.PrimaryIndex<String,BerkeleyDBWiktionaryEdition.WiktionarySenseProxy> senseByKey
           
protected  com.sleepycat.persist.EntityStore store
           
 
Fields inherited from class de.tudarmstadt.ukp.jwktl.api.entry.WiktionaryEdition
isClosed
 
Constructor Summary
  BerkeleyDBWiktionaryEdition(File dbPath)
          Connects to the parsed Wiktionary contained in the specified directory.
protected BerkeleyDBWiktionaryEdition(File parsedWiktionaryDump, boolean isReadOnly, boolean allowCreateNew, boolean overwriteExisting, Long cacheSize)
          Configures the database adapter and connects to the DB files at the specified path.
  BerkeleyDBWiktionaryEdition(File dbPath, Long cacheSize)
          Connects to the parsed Wiktionary contained in the specified directory.
 
Method Summary
protected  void connect(boolean isReadOnly, boolean allowCreateNew, boolean overwriteExisting, Long cacheSize)
           
static void deleteParsedWiktionary(File targetDirectory)
          Removes all files belonging to a previously parsed Wiktionary database from the given target directory.
protected  void doClose()
          Hotspot for closing the connection.
 WiktionaryIterator<IWiktionaryPage> getAllPages(IWiktionaryPageFilter filter, boolean sortByTitle, boolean normalize)
          Returns an iterator over all IWiktionaryPages within the Wiktionary edition.
 String getDBName()
          Returns the internal name of the Berkeley DB.
 File getDBPath()
          Returns the file path of the parsed database.
 IWiktionaryEntry getEntryForId(long entryId)
          Returns the IWiktionaryEntry with the given entry id.
 ILanguage getLanguage()
          Returns the language of the Wiktionary edition, which is equivalent to the entry language of the contained entries.
 WiktionaryPage getPageForId(long id)
          Returns the page with the given unique id.
 WiktionaryPage getPageForWord(String word)
          Returns the page with the given title.
 List<IWiktionaryPage> getPagesForWord(String word, IWiktionaryPageFilter filter, boolean normalize)
          Returns the pages with the given title.
 IWiktionarySense getSenseForKey(String key)
          Returns the word sense with the given unique id.
protected  WiktionaryPage loadPage(WiktionaryPage page, IWiktionaryPageFilter filter)
           
protected  void prepareTargetDirectory(File targetDirectory, boolean overwriteExisting)
          Creates the given target directory if necessary.
 
Methods inherited from class de.tudarmstadt.ukp.jwktl.api.entry.WiktionaryEdition
close, ensureOpen, getAllEntries, getAllSenses, getEntriesForWord, getEntryForId, getEntryForWord, getSenseForId, getSenseForId, getSensesForWord, getSensesForWord, getSensesForWord, isClosed
 
Methods inherited from class de.tudarmstadt.ukp.jwktl.api.entry.AbstractWiktionary
getAllEntries, getAllEntries, getAllEntries, getAllEntries, getAllEntries, getAllPages, getAllPages, getAllPages, getAllPages, getAllPages, getAllSenses, getAllSenses, getAllSenses, getAllSenses, getAllSenses, getEntriesForWord, getEntriesForWord, getEntriesForWord, getPagesForWord, getSensesForWord, getSensesForWord, getSensesForWord
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.tudarmstadt.ukp.jwktl.api.IWiktionary
getAllEntries, getAllEntries, getAllEntries, getAllEntries, getAllEntries, getAllPages, getAllPages, getAllPages, getAllPages, getAllPages, getAllSenses, getAllSenses, getAllSenses, getAllSenses, getAllSenses, getEntriesForWord, getEntriesForWord, getEntriesForWord, getPagesForWord, getSensesForWord, getSensesForWord, getSensesForWord
 

Field Detail

DATABASE_NAME

public static final String DATABASE_NAME
The internal name of the parsed Wiktionary database.

See Also:
Constant Field Values

PROPERTY_FILE_NAME

public static final String PROPERTY_FILE_NAME
The name of the property file containing info about the parsed DB.

See Also:
Constant Field Values

env

protected com.sleepycat.je.Environment env

store

protected com.sleepycat.persist.EntityStore store

dbPath

protected File dbPath

properties

protected Properties properties

language

protected ILanguage language

pageById

protected com.sleepycat.persist.PrimaryIndex<Long,WiktionaryPage> pageById

pageByTitle

protected com.sleepycat.persist.SecondaryIndex<String,Long,WiktionaryPage> pageByTitle

pageByNormalizedTitle

protected com.sleepycat.persist.SecondaryIndex<String,Long,WiktionaryPage> pageByNormalizedTitle

entryByKey

protected com.sleepycat.persist.PrimaryIndex<String,BerkeleyDBWiktionaryEdition.WiktionaryEntryProxy> entryByKey

entryById

protected com.sleepycat.persist.SecondaryIndex<Long,String,BerkeleyDBWiktionaryEdition.WiktionaryEntryProxy> entryById

senseByKey

protected com.sleepycat.persist.PrimaryIndex<String,BerkeleyDBWiktionaryEdition.WiktionarySenseProxy> senseByKey

openCursors

protected Set<com.sleepycat.persist.EntityCursor<?>> openCursors

Constructor Detail

BerkeleyDBWiktionaryEdition

public BerkeleyDBWiktionaryEdition(File dbPath)
Connects to the parsed Wiktionary contained in the specified directory.

Parameters:
dbPath - the path of the database files.
Throws:
WiktionaryException - if the connection could not be established.

BerkeleyDBWiktionaryEdition

public BerkeleyDBWiktionaryEdition(File dbPath,
                                   Long cacheSize)
Connects to the parsed Wiktionary contained in the specified directory.

Parameters:
dbPath - the path of the database files.
cacheSize - the amount of memory (in bytes) used as the database cache; a larger cache can speed up DB access. Pass null to use the default value.
Throws:
WiktionaryException - if the connection could not be established.
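
Because cacheSize is given in bytes as a Long, a small conversion helper can prevent int overflow when sizes are written in megabytes. The following is a minimal sketch only; the class CacheSizeSketch and the method bytesFromMegabytes are hypothetical names and not part of JWKTL.

```java
final class CacheSizeSketch {
    private static final long BYTES_PER_MEGABYTE = 1024L * 1024L;

    /**
     * Converts a size in megabytes to bytes (hypothetical helper).
     * A null input is passed through as null, which the constructor
     * documents as the way to request the default cache size.
     */
    static Long bytesFromMegabytes(Integer megabytes) {
        if (megabytes == null) {
            return null;
        }
        if (megabytes < 0) {
            throw new IllegalArgumentException("cache size must not be negative");
        }
        // The long constant forces 64-bit arithmetic, so large values do not overflow.
        return megabytes * BYTES_PER_MEGABYTE;
    }
}
```

The result could then be passed as the second constructor argument, for example new BerkeleyDBWiktionaryEdition(dbPath, CacheSizeSketch.bytesFromMegabytes(500)).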

BerkeleyDBWiktionaryEdition

protected BerkeleyDBWiktionaryEdition(File parsedWiktionaryDump,
                                      boolean isReadOnly,
                                      boolean allowCreateNew,
                                      boolean overwriteExisting,
                                      Long cacheSize)
Configures the database adapter and connects to the DB files at the specified path.

Parameters:
parsedWiktionaryDump - the path of the database files.
isReadOnly - controls write permissions on the DB files.
allowCreateNew - if true, a new DB will be created if none exists at the specified path.
overwriteExisting - if true, a previously parsed Wiktionary database at the specified path is removed before connecting.
cacheSize - the amount of memory (in bytes) used as the database cache; a larger cache can speed up DB access. Pass null to use the default value.
Throws:
WiktionaryException - if the connection could not be established.

Method Detail

connect

protected void connect(boolean isReadOnly,
                       boolean allowCreateNew,
                       boolean overwriteExisting,
                       Long cacheSize)
                throws com.sleepycat.je.DatabaseException
Throws:
com.sleepycat.je.DatabaseException

prepareTargetDirectory

protected void prepareTargetDirectory(File targetDirectory,
                                      boolean overwriteExisting)
                               throws WiktionaryException
Creates the given target directory if necessary. If a previously parsed Wiktionary database exists in the target directory and overwriteExisting is true, that database is removed.

Throws:
WiktionaryException - if the target directory is not empty and overwriteExisting is false.

deleteParsedWiktionary

public static void deleteParsedWiktionary(File targetDirectory)
Removes all files belonging to a previously parsed Wiktionary database from the given target directory. If no parsed Wiktionary database is found there, nothing is changed.
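
The behavior described for prepareTargetDirectory and deleteParsedWiktionary can be sketched with plain java.io.File calls. This is an illustration under stated assumptions, not the JWKTL implementation: the class TargetDirectorySketch is hypothetical, the file suffixes used to recognize database files are assumed, and IllegalStateException stands in for WiktionaryException.

```java
import java.io.File;

final class TargetDirectorySketch {
    /** Assumption: these suffixes identify database files; the real set is not documented here. */
    static boolean looksLikeDatabaseFile(File file) {
        String name = file.getName();
        return file.isFile()
                && (name.endsWith(".jdb") || name.endsWith(".lck") || name.endsWith(".properties"));
    }

    /** Deletes recognized database files and leaves everything else untouched. */
    static int deleteDatabaseFiles(File directory) {
        File[] files = directory.listFiles();
        if (files == null) {
            return 0; // not a directory, or not readable: nothing is changed
        }
        int deleted = 0;
        for (File file : files) {
            if (looksLikeDatabaseFile(file) && file.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    /** Creates the directory if needed; refuses a non-empty directory unless overwriting is allowed. */
    static void prepare(File directory, boolean overwriteExisting) {
        if (!directory.exists() && !directory.mkdirs()) {
            throw new IllegalStateException("Cannot create " + directory);
        }
        String[] content = directory.list();
        if (content == null) {
            throw new IllegalStateException(directory + " is not a readable directory");
        }
        if (content.length == 0) {
            return; // empty directory: ready to use
        }
        if (!overwriteExisting) {
            throw new IllegalStateException(directory + " is not empty");
        }
        deleteDatabaseFiles(directory);
    }
}
```

In this sketch, files that do not match the assumed suffixes survive an overwrite, which mirrors the documented promise that only files belonging to the parsed database are removed.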


getPageForId

public WiktionaryPage getPageForId(long id)
Description copied from interface: IWiktionaryEdition
Returns the page with the given unique id.


getPageForWord

public WiktionaryPage getPageForWord(String word)
Description copied from interface: IWiktionaryEdition
Returns the page with the given title. The method only returns the page if its title matches exactly. Use IWiktionary.getPagesForWord(String, boolean) for case-insensitive and string-normalized matching.


getPagesForWord

public List<IWiktionaryPage> getPagesForWord(String word,
                                             IWiktionaryPageFilter filter,
                                             boolean normalize)
Description copied from interface: IWiktionary
Returns the pages with the given title. The method also returns pages whose titles match in a case-insensitive or string-normalized manner. The latter means that strings are converted to lower case and that umlauts and accents are replaced by their canonical form; for example, the word "prêt-à-porter" is normalized to "pret-a-porter". Using the given IWiktionaryPageFilter, unwanted pages can be ignored.

Specified by:
getPagesForWord in interface IWiktionary
Specified by:
getPagesForWord in class AbstractWiktionary
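
The string normalization described above can be approximated with java.text.Normalizer. The sketch below is an assumption about how such a normalization might look, not the routine JWKTL actually uses; the class TitleNormalizationSketch and its methods are hypothetical. Characters without a Unicode decomposition (such as "ß") are left unchanged by this approach.

```java
import java.text.Normalizer;
import java.util.Locale;

final class TitleNormalizationSketch {
    /** Lower-cases the title and strips accents and umlaut marks (assumed normalization). */
    static String normalize(String title) {
        // NFD splits "ê" into "e" plus a combining circumflex.
        String decomposed = Normalizer.normalize(title, Normalizer.Form.NFD);
        // Remove all combining marks, keeping the base letters.
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        return withoutMarks.toLowerCase(Locale.ROOT);
    }

    /** Exact comparison, or comparison of the normalized forms when requested. */
    static boolean matches(String pageTitle, String query, boolean useNormalization) {
        if (useNormalization) {
            return normalize(pageTitle).equals(normalize(query));
        }
        return pageTitle.equals(query);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the file encoding.
        System.out.println(normalize("Pr\u00eat-\u00e0-porter")); // prints "pret-a-porter"
    }
}
```

With useNormalization set to false, only an identical title matches, which corresponds to the exact lookup performed by getPageForWord(String).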

getAllPages

public WiktionaryIterator<IWiktionaryPage> getAllPages(IWiktionaryPageFilter filter,
                                                       boolean sortByTitle,
                                                       boolean normalize)
Description copied from interface: IWiktionary
Returns an iterator over all IWiktionaryPages within the Wiktionary edition. Using the given IWiktionaryPageFilter, unwanted pages can be ignored.

Specified by:
getAllPages in interface IWiktionary
Specified by:
getAllPages in class AbstractWiktionary
Parameters:
sortByTitle - if true, sort by page title; otherwise by page id.
normalize - if true, sort case-insensitively; otherwise case-sensitively (only affects sorting by title).
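
The effect of the sortByTitle and normalize flags can be illustrated with ordinary comparators. This is a sketch under assumptions: the Page class is a hypothetical stand-in for a Wiktionary page, and the real implementation may rely on database indexes rather than in-memory sorting.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PageOrderSketch {
    /** Hypothetical stand-in for a page with an id and a title. */
    static final class Page {
        final long id;
        final String title;

        Page(long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    /** Picks the ordering described by the two flags; normalize is ignored when sorting by id. */
    static Comparator<Page> ordering(boolean sortByTitle, boolean normalize) {
        if (!sortByTitle) {
            return Comparator.comparingLong((Page p) -> p.id);
        }
        Comparator<String> titleOrder =
                normalize ? String.CASE_INSENSITIVE_ORDER : Comparator.<String>naturalOrder();
        return Comparator.comparing((Page p) -> p.title, titleOrder);
    }

    /** Returns the titles of the given pages in the requested order. */
    static List<String> sortedTitles(List<Page> pages, boolean sortByTitle, boolean normalize) {
        List<Page> copy = new ArrayList<>(pages);
        copy.sort(ordering(sortByTitle, normalize));
        List<String> titles = new ArrayList<>();
        for (Page page : copy) {
            titles.add(page.title);
        }
        return titles;
    }
}
```

For pages (1, "banana"), (2, "Cherry"), (3, "apple"), sorting by id keeps that order, case-sensitive title order puts "Cherry" first because upper-case letters sort before lower-case ones, and case-insensitive title order yields "apple", "banana", "Cherry".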

loadPage

protected WiktionaryPage loadPage(WiktionaryPage page,
                                  IWiktionaryPageFilter filter)

getEntryForId

public IWiktionaryEntry getEntryForId(long entryId)
Description copied from interface: IWiktionaryEdition
Returns the IWiktionaryEntry with the given entry id. Note that this id is only stable over the same XML dump and JWKTL version.


getSenseForKey

public IWiktionarySense getSenseForKey(String key)
Description copied from interface: IWiktionaryEdition
Returns the word sense with the given unique id. Note that this id is only stable over the same XML dump and JWKTL version.


getDBName

public String getDBName()
Returns the internal name of the Berkeley DB.


getDBPath

public File getDBPath()
Description copied from interface: IWiktionaryEdition
Returns the file path of the parsed database.


getLanguage

public ILanguage getLanguage()
Description copied from interface: IWiktionaryEdition
Returns the language of the Wiktionary edition, which is equivalent to the entry language of the contained entries.


doClose

protected void doClose()
Hotspot for closing the connection.

Specified by:
doClose in class WiktionaryEdition
Throws:
WiktionaryException - if the connection could not be closed.
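
A "hotspot" such as doClose() is typically the subclass hook of a template method: the base class owns the public close() logic and delegates the resource-specific work to the hook. The sketch below shows that pattern in isolation. It is an assumption about the general design, not the actual WiktionaryEdition code; all class names are hypothetical, the choice to make a second close() call a no-op is an assumption, and IllegalStateException stands in for WiktionaryException.

```java
final class CloseHookSketch {
    /** Hypothetical base class that owns the closing protocol. */
    abstract static class Edition {
        private boolean closed;

        /** Template method: runs the hook once, then marks the edition as closed. */
        public final void close() {
            if (closed) {
                return; // assumption: closing twice is harmless
            }
            doClose();
            closed = true;
        }

        public boolean isClosed() {
            return closed;
        }

        /** Guards operations that need an open connection. */
        protected void ensureOpen() {
            if (closed) {
                throw new IllegalStateException("The edition has already been closed");
            }
        }

        /** Hotspot: subclasses release their own resources here. */
        protected abstract void doClose();
    }

    /** Hypothetical subclass that counts how often the hook runs. */
    static final class CountingEdition extends Edition {
        int closeCalls;

        String lookup(String word) {
            ensureOpen();
            return word;
        }

        @Override
        protected void doClose() {
            closeCalls++;
        }
    }
}
```

A database-backed subclass would use the hook to release its own resources, for instance the open cursors, the entity store, and the environment held in the protected fields of this class.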


Copyright © 2011-2013 Ubiquitous Knowledge Processing (UKP) Lab. All Rights Reserved.