similarity
Class VSM

java.lang.Object
  extended by similarity.VSM
Direct Known Subclasses:
LTC_LTC

public abstract class VSM
extends java.lang.Object

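The template class for implementing Vector Space Method similarity measures. Subclasses supply the term values and normalizations for the two documents; similarity(...) combines them.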


Field Summary
protected  int N
          The number of documents in the collection
protected  Thesaurus thesauri
          The collection to be queried
 
Constructor Summary
VSM(Thesaurus thesauri)
           
 
Method Summary
protected abstract  double firstNorm(java.util.List d1)
          The normalization for the first document
protected abstract  double firstValue(Term term)
          The tf * idf value of the term for the first document, according to the scheme; returns one element of Sigma
protected abstract  double secondNorm(java.util.List d2)
          The normalization for the second document
protected abstract  double secondValue(Term term)
          The tf * idf value of the term for the second document, according to the scheme; returns one element of Sigma
 double similarity(java.util.List d1, java.util.List d2)
          Compute the similarity of two documents.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

N

protected int N
The number of documents in the collection


thesauri

protected Thesaurus thesauri
The collection to be queried

Constructor Detail

VSM

public VSM(Thesaurus thesauri)
Method Detail

similarity

public double similarity(java.util.List d1,
                         java.util.List d2)
Compute the similarity of two documents. Because of floating-point precision (a computed 0 is not exactly 0), the value may be slightly more than zero where zero is expected.

Parameters:
d1 - first document
d2 - second document
Returns:
the similarity in [0, 1]
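
The relationship between similarity and the four abstract methods can be sketched as follows. This is only an illustration under stated assumptions, not the library's source: VsmSketch and BinaryVsm are hypothetical names, terms are plain strings rather than Term objects, and the formula (sum of firstValue * secondValue over shared terms, divided by firstNorm * secondNorm) is an assumed reading of "one element of Sigma".

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for VSM; String replaces the library's Term type.
abstract class VsmSketch {
    protected abstract double firstValue(String term);
    protected abstract double secondValue(String term);
    protected abstract double firstNorm(List<String> d1);
    protected abstract double secondNorm(List<String> d2);

    // Assumed formula: Sigma over shared terms, divided by the product of the norms.
    public double similarity(List<String> d1, List<String> d2) {
        Set<String> shared = new HashSet<>(d1);
        shared.retainAll(d2); // only terms present in both documents contribute
        double sigma = 0.0;
        for (String term : shared) {
            sigma += firstValue(term) * secondValue(term);
        }
        double norm = firstNorm(d1) * secondNorm(d2);
        return norm == 0.0 ? 0.0 : sigma / norm;
    }
}

// Hypothetical subclass: binary weights with cosine (Euclidean length) normalization.
class BinaryVsm extends VsmSketch {
    @Override protected double firstValue(String term) { return 1.0; }
    @Override protected double secondValue(String term) { return 1.0; }
    @Override protected double firstNorm(List<String> d1) { return length(d1); }
    @Override protected double secondNorm(List<String> d2) { return length(d2); }

    // Euclidean length of a binary vector is sqrt(number of distinct terms).
    private static double length(List<String> doc) {
        return Math.sqrt(new HashSet<>(doc).size());
    }
}
```

Since a result that should be zero or one can be off by a rounding error, callers may prefer a tolerance check such as Math.abs(s) < 1e-9 over s == 0.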

firstValue

protected abstract double firstValue(Term term)
The tf * idf value of the term for the first document, according to the scheme; returns one element of Sigma

Parameters:
term - the term for which the value will be computed
Returns:
one element of Sigma

secondValue

protected abstract double secondValue(Term term)
The tf * idf value of the term for the second document, according to the scheme; returns one element of Sigma

Parameters:
term - the term for which the value will be computed
Returns:
one element of Sigma

firstNorm

protected abstract double firstNorm(java.util.List d1)
The normalization for the first document

Parameters:
d1 - the document to be normalized
Returns:
the normalization value of the document

secondNorm

protected abstract double secondNorm(java.util.List d2)
The normalization for the second document

Parameters:
d2 - the document to be normalized
Returns:
the normalization value of the document
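
As an illustration of what a value/norm pair might compute, the sketch below uses the SMART "ltc" weighting (logarithmic tf, idf, cosine normalization) for both documents. That the known subclass LTC_LTC uses this scheme is an inference from its name only. LtcSketch, its method names, and the document-frequency map df are hypothetical; n plays the role of the field N.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, self-contained sketch; not the source of LTC_LTC.
final class LtcSketch {
    // Raw term frequencies of a document given as a list of term strings.
    static Map<String, Integer> termFrequencies(List<String> doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : doc) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    // One unnormalized weight: (1 + ln tf) * ln(n / df).
    static double value(int tf, int df, int n) {
        if (tf <= 0 || df <= 0) {
            return 0.0;
        }
        return (1.0 + Math.log(tf)) * Math.log((double) n / df);
    }

    // Cosine normalization: Euclidean length of the document's weight vector.
    static double norm(Map<String, Integer> tf, Map<String, Integer> df, int n) {
        double sumSq = 0.0;
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double w = value(e.getValue(), df.getOrDefault(e.getKey(), 0), n);
            sumSq += w * w;
        }
        return Math.sqrt(sumSq);
    }

    // Sum of weight products over shared terms, divided by both norms.
    static double similarity(List<String> d1, List<String> d2,
                             Map<String, Integer> df, int n) {
        Map<String, Integer> tf1 = termFrequencies(d1);
        Map<String, Integer> tf2 = termFrequencies(d2);
        double sigma = 0.0;
        for (Map.Entry<String, Integer> e : tf1.entrySet()) {
            Integer f2 = tf2.get(e.getKey());
            if (f2 != null) {
                int docFreq = df.getOrDefault(e.getKey(), 0);
                sigma += value(e.getValue(), docFreq, n) * value(f2, docFreq, n);
            }
        }
        double norms = norm(tf1, df, n) * norm(tf2, df, n);
        return norms == 0.0 ? 0.0 : sigma / norms;
    }
}
```

In this sketch a term that occurs in every document has idf ln(1) = 0, so a document made up only of such terms has norm 0 and the similarity is reported as 0.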