de.tudarmstadt.ukp.jwktl.parser.ru.wikokit.base.wikipedia.util
Class StringUtilRegular

java.lang.Object
  extended by de.tudarmstadt.ukp.jwktl.parser.ru.wikokit.base.wikipedia.util.StringUtilRegular

public class StringUtilRegular
extends Object

Useful string functions implemented via regular expressions.


Constructor Summary
StringUtilRegular()
           
 
Method Summary
static String encodeRussianToLatinitsa(String text, String enc_from, String enc_to)
          Encodes the text to latinitsa, e.g.: женьшень -> zhen'shen' (Russian)
static int getFirstEmptyLinePosition(int start_pos, String text)
          Gets position of first empty line in text from start_pos.
static int getFirstHeaderPosition(int start_pos, String text)
          Gets position of first header in text from start_pos, e.g. 2nd, 3rd, 4th, or 5th level header.
static String getLettersTillHyphen(String text)
          Gets first letters till first hyphen "-".
static String getLettersTillSpace(String text)
          Gets first letters till space.
static String getLettersTillSpaceHyphenOrPipe(String text)
          Gets first letters till space " ", ... or pipe "|" (shortest string).
static String getTextTillFirstHeaderOrEmptyLine(int start_pos, String text)
          Gets text from 'start_pos' position till the nearest of: (1) the first header, (2) the first empty line, or (3) the end of text (if there are no headers or empty lines).
static String getTextTillFirstHeaderPosition(int start_pos, String text)
          Gets text from 'start_pos' position till position of first header in text, or till the end of text (if header is absent).
static String replaceComplexSpacesByTrivialSpaces(String text)
          Replaces special spaces with ordinary whitespace, e.g. in quote author names "Name Surname".
static void stripNonWordLetters(String[] words)
          Strips non-word letters in source array "words".
static String substringAndchopLastNewline(String text, int start_pos, int end_pos)
          Gets the substring of text from 'start_pos' position till 'end_pos' position and chops the last character if it is a newline (\n).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StringUtilRegular

public StringUtilRegular()
Method Detail

stripNonWordLetters

public static void stripNonWordLetters(String[] words)
Strips non-word letters in source array "words". E.g. {"\nword1", "\t word-long2\r\n"} -> {"word1", "word-long2"}.
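
The in-place behaviour described above can be illustrated with a minimal sketch. It is not the library's implementation: the class name StripSketch and the regular expression are assumptions, and "non-word letters" is read here as leading and trailing characters that are neither letters nor digits.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class StripSketch {
    // Assumption: only leading/trailing characters that are neither letters
    // nor digits are removed; characters inside the word (e.g. '-') stay.
    private static final Pattern EDGES =
            Pattern.compile("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$");

    /** Cleans every element of the array in place. */
    static void stripNonWordLetters(String[] words) {
        for (int i = 0; i < words.length; i++) {
            words[i] = EDGES.matcher(words[i]).replaceAll("");
        }
    }

    public static void main(String[] args) {
        String[] words = {"\nword1", "\t word-long2\r\n"};
        stripNonWordLetters(words);
        System.out.println(Arrays.toString(words)); // [word1, word-long2]
    }
}
```

Because the method returns void, callers read the cleaned values from the array they passed in.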


getLettersTillSpace

public static String getLettersTillSpace(String text)
Gets first letters till space. E.g. "word1 " -> "word1", "\t word-long2\r\n" -> "word-long2"
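
A minimal sketch that reproduces both documented examples. The class name and the pattern are assumptions, as is the fallback of returning the input unchanged when no non-whitespace run exists.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LettersTillSpaceSketch {
    // First run of non-whitespace characters; leading whitespace is skipped.
    private static final Pattern FIRST_TOKEN = Pattern.compile("\\S+");

    static String getLettersTillSpace(String text) {
        Matcher m = FIRST_TOKEN.matcher(text);
        // Assumption: the input is returned as-is if it holds no such run.
        return m.find() ? m.group() : text;
    }
}
```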


getLettersTillSpaceHyphenOrPipe

public static String getLettersTillSpaceHyphenOrPipe(String text)
Gets first letters till space " ", ... or pipe "|" (shortest string). E.g. "word1 " -> "word1", "\t word-long2\r\n" -> "word-long2". This function is used by WPOSRu.guessPOS().


replaceComplexSpacesByTrivialSpaces

public static String replaceComplexSpacesByTrivialSpaces(String text)
Replaces special spaces with ordinary whitespace, e.g. in quote author names "Name Surname".
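
The documentation does not list which characters count as special spaces. The sketch below assumes a plausible set (no-break space, the typographic spaces U+2000 to U+200A, narrow no-break space, ideographic space); both the set and the class name are illustrative.

```java
final class SpaceSketch {
    static String replaceComplexSpacesByTrivialSpaces(String text) {
        // Assumed set: U+00A0, U+2000..U+200A, U+202F, U+3000.
        return text.replaceAll("[\\u00A0\\u2000-\\u200A\\u202F\\u3000]", " ");
    }
}
```

With this set, an author name whose parts are joined by a no-break space comes back with a plain U+0020 space between them.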


getLettersTillHyphen

public static String getLettersTillHyphen(String text)
Gets first letters till first hyphen "-". E.g. "word1 " -> "word1", "\t word-long2\r\n" -> "word-long2"


encodeRussianToLatinitsa

public static String encodeRussianToLatinitsa(String text,
                                              String enc_from,
                                              String enc_to)
Encodes the text to latinitsa, e.g.: женьшень -> zhen'shen' (Russian)
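
The example женьшень -> zhen'shen' suggests a per-letter substitution. The sketch below is one way to get that result. The transliteration table is an assumption: only the mappings for ж, е, н, ь and ш are confirmed by the example. The enc_from and enc_to parameters are omitted because their role is not described here, and uppercase letters and ё pass through unchanged.

```java
final class LatinitsaSketch {
    // Latin replacements for the lowercase letters U+0430..U+044F,
    // listed in code point order. Illustrative table, not the library's.
    private static final String[] LAT = {
        "a", "b", "v", "g", "d", "e", "zh", "z", "i", "j", "k", "l", "m", "n",
        "o", "p", "r", "s", "t", "u", "f", "kh", "c", "ch", "sh", "shh",
        "\"", "y", "'", "eh", "ju", "ja"
    };

    static String toLatinitsa(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c >= 0x0430 && c <= 0x044F) {
                sb.append(LAT[c - 0x0430]); // lowercase Russian letter
            } else {
                sb.append(c);               // everything else passes through
            }
        }
        return sb.toString();
    }
}
```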


getFirstHeaderPosition

public static int getFirstHeaderPosition(int start_pos,
                                         String text)
Gets position of first header in text from start_pos, e.g. 2nd, 3rd, 4th, or 5th level header ==?=?=? Header ==?=?=?. Returns -1 if no header is found.
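
A minimal sketch of the header search, assuming a header is a whole line that opens and closes with two to five '=' characters. The pattern and class name are assumptions and are not taken from the library source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeaderPositionSketch {
    // Assumption: 2 to 5 '=' on each side, header occupies its own line.
    private static final Pattern HEADER =
            Pattern.compile("^={2,5}[^=\\n].*?={2,5}[ \\t]*$", Pattern.MULTILINE);

    static int getFirstHeaderPosition(int startPos, String text) {
        Matcher m = HEADER.matcher(text);
        // find(int) searches from startPos; -1 signals "no header".
        return m.find(startPos) ? m.start() : -1;
    }
}
```

For the text "intro\n== Header ==\nbody" this sketch returns 6 when searching from position 0 and -1 when searching from position 7, because position 7 is already inside the header line.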


getFirstEmptyLinePosition

public static int getFirstEmptyLinePosition(int start_pos,
                                            String text)
Gets position of first empty line in text from start_pos. Returns -1 if no empty line is found.


getTextTillFirstHeaderPosition

public static String getTextTillFirstHeaderPosition(int start_pos,
                                                    String text)
Gets text from 'start_pos' position till position of first header in text, or till the end of text (if header is absent).


getTextTillFirstHeaderOrEmptyLine

public static String getTextTillFirstHeaderOrEmptyLine(int start_pos,
                                                       String text)
Gets text from 'start_pos' position till the nearest of: (1) the first header, (2) the first empty line, or (3) the end of text (if there are no headers or empty lines).
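
The three cases can be sketched as taking the smallest of the candidate end positions. The patterns for "header" and "empty line" and the class name are assumptions. An empty line is taken here to be a line holding only spaces or tabs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TextTillSketch {
    // Assumed definitions of a header line and an empty line.
    private static final Pattern HEADER =
            Pattern.compile("^={2,5}[^=\\n].*?={2,5}[ \\t]*$", Pattern.MULTILINE);
    private static final Pattern EMPTY_LINE =
            Pattern.compile("^[ \\t]*$", Pattern.MULTILINE);

    private static int firstMatch(Pattern p, int startPos, String text) {
        Matcher m = p.matcher(text);
        return m.find(startPos) ? m.start() : -1;
    }

    static String getTextTillFirstHeaderOrEmptyLine(int startPos, String text) {
        int end = text.length();                       // case (3): end of text
        int header = firstMatch(HEADER, startPos, text);
        int empty = firstMatch(EMPTY_LINE, startPos, text);
        if (header >= 0) end = Math.min(end, header);  // case (1): header
        if (empty >= 0) end = Math.min(end, empty);    // case (2): empty line
        return text.substring(startPos, end);
    }
}
```

In this sketch the returned text keeps the newline that precedes the header or empty line.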


substringAndchopLastNewline

public static String substringAndchopLastNewline(String text,
                                                 int start_pos,
                                                 int end_pos)
Gets the substring of text from 'start_pos' position till 'end_pos' position and chops the last character if it is a newline (\n).
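
A minimal sketch of this behaviour, assuming end_pos is exclusive in the same way as the end index of String.substring. The class name is hypothetical.

```java
final class ChopSketch {
    static String substringAndchopLastNewline(String text, int startPos, int endPos) {
        // Assumption: endPos is exclusive, as in String.substring.
        String s = text.substring(startPos, endPos);
        // Drop at most one trailing '\n'; other characters are kept.
        return s.endsWith("\n") ? s.substring(0, s.length() - 1) : s;
    }
}
```

For "abc\ndef", the ranges 0..4 and 0..3 both yield "abc" in this sketch: the first has its trailing newline chopped, the second has none.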



Copyright © 2011-2013 Ubiquitous Knowledge Processing (UKP) Lab. All Rights Reserved.