de.tudarmstadt.ukp.jwktl.parser.ru.wikokit.base.wikt.multi.ru
Class WPOSRu

java.lang.Object
  extended by de.tudarmstadt.ukp.jwktl.parser.ru.wikokit.base.wikt.multi.ru.WPOSRu

public class WPOSRu
extends Object

Splits text into fragments, one per part of speech (POS). In Russian Wiktionary a POS section is basically a level 2 header, e.g. for "roast": ==roast I== ... ==roast II== (in English Wiktionary it is a level 3 header: ===Verb===).

See Also:
http://ru.wiktionary.org/wiki/Викисловарь:Части речи, оформления статей

Constructor Summary
WPOSRu()
           
 
Method Summary
static POS checkIfSuchPOSExist(String pos_name)
           
static POSText guessPOS(StringBuffer text)
          Extracts the POS from the text of a section.
static POS guessPOSWith2ndLevelHeader(String page_title, String pos_title, StringBuffer text)
          The POS should be extracted from the text.
static boolean isSecondLevelHeaderWordNotPOS(String str)
          Returns true if str is a known header that is not a part-of-speech name.
static POSText[] splitToPOSSections(String page_title, LangText lt)
          page_title - the word described by this article's text
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WPOSRu

public WPOSRu()
Method Detail

isSecondLevelHeaderWordNotPOS

public static boolean isSecondLevelHeaderWordNotPOS(String str)
Returns true if str is a known header, e.g. "References", that is not a part-of-speech name, e.g. "Verb".
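
A check of this kind can be pictured as a lookup in a fixed set of header names. The sketch below is only an illustration: the class name, the method name, and the header list are hypothetical (only "References" comes from the description above), and the real WPOSRu list is not shown on this page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class NonPosHeaderSketch {
    // Illustrative list only: "references" is taken from the description,
    // "see also" is an assumed extra entry.
    private static final Set<String> NON_POS_HEADERS =
            new HashSet<>(Arrays.asList("references", "see also"));

    /** Returns true if header is a known section name that is not a POS name. */
    static boolean isKnownHeaderButNotPos(String header) {
        if (header == null) {
            return false;
        }
        // Ignore surrounding whitespace and letter case before the lookup.
        return NON_POS_HEADERS.contains(header.trim().toLowerCase(Locale.ROOT));
    }
}
```

With this sketch, " References " is reported as a known non-POS header, while "Verb" is not.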


splitToPOSSections

public static POSText[] splitToPOSSections(String page_title,
                                           LangText lt)
page_title - the word described by this article's text

Parameters:
lt - .text will be parsed and split; .lang is not used now, but may be in the future. Example: 1) splits the following text into "lead I" and "lead II"; 2) extracts the part of speech "гл" (verb) from "lead II":
 == lead I ==
 English text1
 == lead II==
 ===Морфологические и синтаксические свойства===
 {{гл en reg|lead}}
TODO: isPOSHeader() (remove accent marks, acce'nt -> accent) or guessPOS
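
The splitting step in this example can be sketched with a regular expression that matches level 2 headers but not level 3 ones. This is a minimal standalone illustration, not the WPOSRu implementation: the class SecondLevelSplitSketch and its split method are hypothetical, and sections are returned as plain {title, body} string pairs rather than POSText objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SecondLevelSplitSketch {
    // Matches a whole header line of the form "== title ==".
    // A line such as "=== title ===" is rejected because the title
    // may not start with '='.
    private static final Pattern H2 =
            Pattern.compile("(?m)^\\s*==([^=\\n]+)==\\s*$");

    /** Returns {title, body} pairs, one per level 2 header, in document order. */
    static List<String[]> split(String text) {
        List<String[]> sections = new ArrayList<>();
        Matcher m = H2.matcher(text);
        String title = null;
        int bodyStart = 0;
        while (m.find()) {
            if (title != null) {
                // Close the previous section at the start of this header.
                sections.add(new String[] {title, text.substring(bodyStart, m.start()).trim()});
            }
            title = m.group(1).trim();
            bodyStart = m.end();
        }
        if (title != null) {
            // The last section runs to the end of the text.
            sections.add(new String[] {title, text.substring(bodyStart).trim()});
        }
        return sections;
    }
}
```

Applied to the sample above, the sketch yields two sections titled "lead I" and "lead II"; the level 3 header and the template stay inside the body of "lead II", where a later step can inspect them.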

guessPOS

public static POSText guessPOS(StringBuffer text)
The POS is extracted from the text, e.g.
 noun:
 ===Морфологические и синтаксические свойства===
 {{сущ en|слоги=lead|lead|leads}}
 
 verb:
 ===Морфологические и синтаксические свойства===
 {{гл ru 4b-ся
 {{гл ru 8b/b^
 {{гл ru 5c'^-т
 
 adjective:
 ===Морфологические и синтаксические свойства===
 {{прил ru 1*a
 
 adverb:
 ===Морфологические и синтаксические свойства===
 
 {{adv ru|слоги={{по-слогам|ра|но|ва́|то}}|или=предикатив|или-кат=предикативы|}}
 
 {{adv-ru|
 Наречие, неизменяемое.
 
 Old formatting 
 
  ===Морфологические и синтаксические свойства===
  {{СущМужНеодуш1c(1)
  {{СущЖенНеодуш8a
  Существительное, ...
 
 {{прил ia}}
 
 {{парадигма-рус  // old formatting (>500, < 1000 pages)
 |шаблон=Гл11b/c
 {{Гл1a
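
The current-format examples above suggest a lookup keyed on the start of the template name. The sketch below illustrates that idea under stated assumptions: the class TemplatePosSketch, its guess method, and the prefix table are hypothetical, only the current-format prefixes from the examples are covered (not the old formatting), and the POS is returned as a plain string rather than a POSText. The Cyrillic prefixes are written as Unicode escapes so that the source compiles regardless of file encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TemplatePosSketch {
    // Template-name prefixes taken from the current-format examples.
    private static final Map<String, String> PREFIX_TO_POS = new LinkedHashMap<>();
    static {
        PREFIX_TO_POS.put("\u0441\u0443\u0449 ", "noun");            // Cyrillic noun template
        PREFIX_TO_POS.put("\u0433\u043B ", "verb");                  // Cyrillic verb template
        PREFIX_TO_POS.put("\u043F\u0440\u0438\u043B ", "adjective"); // Cyrillic adjective template
        PREFIX_TO_POS.put("adv ", "adverb");
        PREFIX_TO_POS.put("adv-", "adverb");
    }

    /** Returns a POS name for the first recognized template, or "unknown". */
    static String guess(String text) {
        int open = text.indexOf("{{");
        while (open >= 0) {
            // Compare the text right after "{{" with each known prefix.
            for (Map.Entry<String, String> e : PREFIX_TO_POS.entrySet()) {
                if (text.startsWith(e.getKey(), open + 2)) {
                    return e.getValue();
                }
            }
            open = text.indexOf("{{", open + 2);
        }
        return "unknown";
    }
}
```

Templates are scanned in document order, so an unrecognized template before the morphology template does not block the match.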


checkIfSuchPOSExist

public static POS checkIfSuchPOSExist(String pos_name)

guessPOSWith2ndLevelHeader

public static POS guessPOSWith2ndLevelHeader(String page_title,
                                             String pos_title,
                                             StringBuffer text)
The POS should be extracted from the text.

Parameters:
page_title - word, name of the article, e.g. "lead"
pos_title - extracted 2nd level title, e.g. "lead I", "lead II", or "Adverb" (old style)
Returns:
POS, e.g. POS.verb for "== Verb =="
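
One way to read the parameters above is as a two-step decision: an old-style header names the POS directly, while a new-style header repeats the page title and defers to the section text. The following sketch illustrates that reading only. HeaderPosSketch and its guess method are hypothetical, the header names are limited to the two mentioned on this page, and the text-based result is passed in as a string instead of being computed from a StringBuffer.

```java
import java.util.Locale;

final class HeaderPosSketch {
    /**
     * posTitle is the text of a level 2 header; fromText is the result of
     * some text-based guess for the section body.
     */
    static String guess(String pageTitle, String posTitle, String fromText) {
        String title = posTitle.trim();
        // Old style: the header itself names the part of speech.
        // Only the two names mentioned on this page are listed (assumption).
        switch (title.toLowerCase(Locale.ROOT)) {
            case "verb":
                return "verb";
            case "adverb":
                return "adverb";
            default:
                break;
        }
        // New style: the header repeats the page title ("lead I", "lead II"),
        // so the POS has to come from the section text.
        if (title.startsWith(pageTitle)) {
            return fromText;
        }
        return "unknown";
    }
}
```

For the page "lead" with header "lead II", the sketch returns whatever the text-based guess produced; for an old-style header "Adverb" it returns "adverb" without looking at the text.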


Copyright © 2011-2013 Ubiquitous Knowledge Processing (UKP) Lab. All Rights Reserved.