de.tudarmstadt.ukp.jwktl.parser.util
Class WordListProcessor
java.lang.Object
de.tudarmstadt.ukp.jwktl.parser.util.WordListProcessor
public class WordListProcessor
- extends Object
Helper class for segmenting word lists whose items are separated by commas,
semicolons, line breaks, etc. This is, for example, the case for semantic
relations, which are often encoded as comma-separated lists.
- Author:
- Christof Müller, Lizhen Qu
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
HTML_REMOVER
protected static final Pattern HTML_REMOVER
ESCAPE_DELIMITER1
protected static final Pattern ESCAPE_DELIMITER1
ESCAPE_DELIMITER2
protected static final Pattern ESCAPE_DELIMITER2
ESCAPE_DELIMITER3
protected static final Pattern ESCAPE_DELIMITER3
REFERENCE_PATTERN
protected static final Pattern REFERENCE_PATTERN
SUPERSCRIPT_PATTERN
protected static final Pattern SUPERSCRIPT_PATTERN
WordListProcessor
public WordListProcessor()
escapeDelimiters
protected String escapeDelimiters(String text)
splitWordList
public List<String> splitWordList(String text)
- Splits the given text by comma, semicolon, line break, etc. and
removes multiple types of special characters and affixes. The
resulting segments are returned as a list of strings.
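
The splitting step described above can be illustrated with a minimal, self-contained sketch. It is not the JWKTL implementation: the class name `WordListSketch`, the method `split`, and the delimiter pattern are assumptions made for illustration, and the sketch only splits and trims. It does not reproduce the removal of special characters, affixes, templates, or wiki markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of JWKTL.
public class WordListSketch {

    // Assumed delimiter set: comma, semicolon, or any line break.
    private static final Pattern DELIMITERS = Pattern.compile("[,;\\r\\n]+");

    /** Splits the text at the assumed delimiters and drops empty segments. */
    public static List<String> split(String text) {
        List<String> result = new ArrayList<>();
        if (text == null) {
            return result;
        }
        for (String segment : DELIMITERS.split(text)) {
            String word = segment.trim();
            if (!word.isEmpty()) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [house, home, dwelling, abode]
        System.out.println(split("house, home; dwelling\nabode"));
    }
}
```

With the library on the classpath, the corresponding call would go through the documented signature, that is, `new WordListProcessor().splitWordList(text)`. Its output may differ from this sketch because of the additional cleanup the class performs.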
deWikify
protected String deWikify(String word)
removeBrackets
protected String removeBrackets(String word)
removeComments
protected String removeComments(String word)
removeTemplates
protected String removeTemplates(String word)
Copyright © 2011-2013 Ubiquitous Knowledge Processing (UKP) Lab. All Rights Reserved.