de.tudarmstadt.ukp.jwktl.parser.util
Class SimilarityUtils

java.lang.Object
  extended by de.tudarmstadt.ukp.jwktl.parser.util.SimilarityUtils

public class SimilarityUtils
extends Object

Convenience utilities for computing string similarity from n-gram and word frequency counts.

Author:
Lizhen Qu

Field Summary
protected static Pattern NGRAM_PATTERN
           
 
Constructor Summary
SimilarityUtils()
           
 
Method Summary
protected static Map<String,Integer> computeNGrams(int startOrder, int maxOrder, String text)
          Compute n-gram frequencies.
protected static Map<String,Integer> computeWord2count(String text)
          Calculate word frequency.
protected static double similarity(Map<String,Integer> ngramsA, Map<String,Integer> ngramsB)
          Calculate the similarity between two sets of n-grams.
static double similarity(String textA, String textB)
          Calculate the similarity between two texts based on trigrams.
static double wordSim(String textA, String textB)
          Calculate string similarity based on word unigrams.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NGRAM_PATTERN

protected static final Pattern NGRAM_PATTERN
Constructor Detail

SimilarityUtils

public SimilarityUtils()
Method Detail

computeNGrams

protected static Map<String,Integer> computeNGrams(int startOrder,
                                                   int maxOrder,
                                                   String text)
Compute n-gram frequencies.

Parameters:
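startOrder - the smallest n-gram order to compute
maxOrder - the largest n-gram order to compute
text - the text to process
Returns:
a map from each n-gram to its frequency.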
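
The Javadoc does not say how computeNGrams splits its input or whether maxOrder is inclusive. The following is a minimal sketch under two stated assumptions: n-grams are taken over the characters of the text, and orders run from startOrder through maxOrder inclusive. The class name NGramCountSketch is hypothetical and not part of JWKTL; the library's real implementation may differ (for instance, it may preprocess the text with NGRAM_PATTERN).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not JWKTL's implementation.
// Assumptions: n-grams are taken over characters, and orders run from
// startOrder through maxOrder inclusive.
class NGramCountSketch {

    static Map<String, Integer> computeNGrams(int startOrder, int maxOrder, String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Orders below 1 would produce empty "grams", so clamp the lower bound.
        for (int n = Math.max(1, startOrder); n <= maxOrder; n++) {
            // Slide a window of length n across the text and count each substring.
            for (int i = 0; i + n <= text.length(); i++) {
                counts.merge(text.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Under these assumptions, computeNGrams(1, 2, "abab") yields a=2, b=2, ab=2, ba=1.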

computeWord2count

protected static Map<String,Integer> computeWord2count(String text)

Calculate word frequency.

Parameters:
text - the text to process
Returns:
a map from each word to its frequency.
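
The tokenization rule behind computeWord2count is not documented. A minimal sketch, assuming words are maximal runs of non-whitespace characters and that case is preserved; the class name WordCountSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not JWKTL's implementation.
// Assumption: a "word" is a maximal run of non-whitespace characters; case is kept as-is.
class WordCountSketch {

    static Map<String, Integer> computeWord2count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            // Splitting a blank string yields a single empty token; skip it.
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```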

similarity

protected static double similarity(Map<String,Integer> ngramsA,
                                   Map<String,Integer> ngramsB)

Calculate the similarity between two sets of n-grams.

Parameters:
ngramsA - the first map from n-gram to frequency
ngramsB - the second map from n-gram to frequency
Returns:
the similarity value.
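
The Javadoc does not name the measure used to compare the two maps. As a plainly swapped-in stand-in, the sketch below uses cosine similarity over the frequency vectors, returning 0 when either map is empty; the class name NGramSimilaritySketch is hypothetical, and JWKTL's actual formula may be different.

```java
import java.util.Map;

// Hypothetical sketch: cosine similarity between two n-gram frequency maps.
// The measure JWKTL actually uses is not documented; cosine is an assumption.
class NGramSimilaritySketch {

    static double similarity(Map<String, Integer> ngramsA, Map<String, Integer> ngramsB) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : ngramsA.entrySet()) {
            Integer other = ngramsB.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(ngramsA);
        double normB = norm(ngramsB);
        // An empty profile has no direction; define its similarity as 0.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a frequency vector.
    private static double norm(Map<String, Integer> counts) {
        double sumSquares = 0.0;
        for (int c : counts.values()) {
            sumSquares += (double) c * c;
        }
        return Math.sqrt(sumSquares);
    }
}
```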

similarity

public static double similarity(String textA,
                                String textB)

Calculate the similarity between two texts based on trigrams.

Parameters:
textA - the first text
textB - the second text
Returns:
the similarity value
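
One plausible reading, not confirmed by the Javadoc, is that this method counts trigrams of each text and compares the two counts. The self-contained sketch below assumes overlapping character trigrams (no padding or normalization) compared with cosine similarity; the class name TrigramSimilaritySketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: trigram-based text similarity.
// Assumptions: overlapping character trigrams, compared with cosine similarity.
class TrigramSimilaritySketch {

    static double similarity(String textA, String textB) {
        return cosine(trigrams(textA), trigrams(textB));
    }

    // Count every overlapping 3-character substring.
    private static Map<String, Integer> trigrams(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= text.length(); i++) {
            counts.merge(text.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    private static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        for (int c : b.values()) {
            normB += (double) c * c;
        }
        // Texts shorter than three characters have no trigrams.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }
}
```

In this sketch, "abcd" and "abce" share one of their two trigrams each, so their similarity is 0.5.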

wordSim

public static double wordSim(String textA,
                             String textB)

Calculate string similarity based on word unigrams.

Parameters:
textA - the first text
textB - the second text
Returns:
the similarity value
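
The Javadoc does not specify how the word unigram counts are compared. The sketch below assumes whitespace tokenization and swaps in weighted Jaccard similarity (sum of per-word minimum counts divided by sum of per-word maximum counts), which, unlike a plain set overlap, takes word frequencies into account. The class name WordUnigramSimilaritySketch is hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: word-unigram similarity via weighted Jaccard.
// Assumptions: whitespace tokenization; weighted Jaccard as the comparison measure.
class WordUnigramSimilaritySketch {

    static double wordSim(String textA, String textB) {
        Map<String, Integer> a = countWords(textA);
        Map<String, Integer> b = countWords(textB);
        Set<String> vocabulary = new HashSet<>(a.keySet());
        vocabulary.addAll(b.keySet());
        double minSum = 0.0;
        double maxSum = 0.0;
        for (String word : vocabulary) {
            int countA = a.getOrDefault(word, 0);
            int countB = b.getOrDefault(word, 0);
            minSum += Math.min(countA, countB);
            maxSum += Math.max(countA, countB);
        }
        // Two empty texts share nothing to compare; define similarity as 0.
        return maxSum == 0.0 ? 0.0 : minSum / maxSum;
    }

    private static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For example, "the cat sat" and "the cat ran" share two of four distinct words, each once, giving 2 / 4 = 0.5 under this sketch.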


Copyright © 2011-2013 Ubiquitous Knowledge Processing (UKP) Lab. All Rights Reserved.