index
Class Thesaurus

java.lang.Object
  extended by index.Thesaurus

public class Thesaurus
extends java.lang.Object

Reads the files in the given directory and builds the collection from them, constructing both the inverted index and the index of the documents.


Field Summary
private  java.util.Map allFilesFingerprints
          Index of the documents
private  FileID[] fileIDs
          List of fileIDs.
private  java.io.File[] files
          List of the files for obtaining information about files
private  MultiMap invertedIndex
          Inverted index of the collection
 
Constructor Summary
Thesaurus(java.lang.String path)
          Make the thesaurus from the files in the specified directory
 
Method Summary
private  java.util.ArrayList extractAllGrams(java.io.File[] files)
          Returns a list with one entry per file, each entry holding the fingerprints of that file
private  FileID[] fillFileIDs()
          Make another view of allFilesFingerprints in the form of FileID[]
 java.util.Map getAllFilesFingerprints()
           
 java.util.List getDocuments(java.lang.Integer gram)
          The list of documents which contain the specific gram
 java.util.List getDocumentTerms(java.lang.String document)
          Gets the terms of the document with the specified name
 FileID[] getFileIDs()
           
private  java.util.Map getFingerprints()
          Make the index from the documents
 MultiMap getInvertedIndex()
           
 int getNumberOfDocuments()
          Getting the number of collection documents.
 int getNumberOfDocumentsContainingGram(java.lang.Integer gram)
          Gets the number of documents containing the specified gram (fingerprint)
private  MultiMap makeInvertedIndex()
          Make the inverted index
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

invertedIndex

private MultiMap invertedIndex
Inverted index of the collection


allFilesFingerprints

private java.util.Map allFilesFingerprints
Index of the documents


files

private java.io.File[] files
List of the files for obtaining information about files


fileIDs

private FileID[] fileIDs
List of fileIDs. It is another view of allFilesFingerprints.

Constructor Detail

Thesaurus

public Thesaurus(java.lang.String path)
          throws DirectoryNotFound,
                 FileTooShort
Make the thesaurus from the files in the specified directory

Parameters:
path - the directory containing the files of the collection
Throws:
DirectoryNotFound - if the specified directory does not exist
FileTooShort
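
The constructor's first step, collecting the files of a directory, might look roughly like the sketch below. This is an illustration only: the class name DirectoryListingSketch and the method listCollectionFiles are hypothetical, and the standard java.io.FileNotFoundException stands in for DirectoryNotFound, whose definition is not shown on this page.

```java
import java.io.File;
import java.io.FileNotFoundException;

public class DirectoryListingSketch {

    // Hypothetical helper: returns the regular files directly inside the directory.
    // FileNotFoundException is a stand-in for the DirectoryNotFound exception.
    static File[] listCollectionFiles(String path) throws FileNotFoundException {
        File directory = new File(path);
        if (!directory.isDirectory()) {
            throw new FileNotFoundException("Not a directory: " + path);
        }
        // Keep only regular files; subdirectories are skipped in this sketch.
        File[] files = directory.listFiles(File::isFile);
        // listFiles may return null on an I/O error; treat that as an empty collection.
        return files == null ? new File[0] : files;
    }

    public static void main(String[] args) throws FileNotFoundException {
        String path = args.length > 0 ? args[0] : ".";
        System.out.println(listCollectionFiles(path).length + " file(s) in " + path);
    }
}
```

Whether the actual constructor descends into subdirectories or filters by file type is not stated here, so the sketch takes the simplest reading.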
Method Detail

getDocumentTerms

public java.util.List getDocumentTerms(java.lang.String document)
Gets the terms of the document with the specified name

Parameters:
document - the name of the document
Returns:
the list of its terms

getNumberOfDocuments

public int getNumberOfDocuments()
Getting the number of collection documents.

Returns:
the number of the collection documents

getNumberOfDocumentsContainingGram

public int getNumberOfDocumentsContainingGram(java.lang.Integer gram)
Gets the number of documents containing the specified gram (fingerprint)

Parameters:
gram - the gram (fingerprint) whose containing documents are to be counted
Returns:
the number of documents that contain the gram

getDocuments

public java.util.List getDocuments(java.lang.Integer gram)
The list of documents which contain the specific gram

Parameters:
gram - the specified gram
Returns:
the list of documents
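
The two gram lookups, getDocuments and getNumberOfDocumentsContainingGram, can be pictured as reads against the inverted index. The sketch below is an assumption-laden illustration, not the class's actual code: GramLookupSketch is a hypothetical class, and a plain Map from gram to document names stands in for MultiMap, whose API is not documented on this page.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GramLookupSketch {

    // Stand-in for the MultiMap inverted index: gram -> names of documents containing it.
    private final Map<Integer, List<String>> invertedIndex;

    GramLookupSketch(Map<Integer, List<String>> invertedIndex) {
        this.invertedIndex = invertedIndex;
    }

    // Documents containing the gram, or an empty list if the gram is unknown.
    List<String> getDocuments(Integer gram) {
        List<String> documents = invertedIndex.get(gram);
        return documents == null ? Collections.<String>emptyList() : documents;
    }

    // The document count is the length of the gram's posting list.
    int getNumberOfDocumentsContainingGram(Integer gram) {
        return getDocuments(gram).size();
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> index = new HashMap<>();
        index.put(42, Arrays.asList("a.txt", "b.txt"));
        GramLookupSketch lookup = new GramLookupSketch(index);
        System.out.println(lookup.getDocuments(42));
        System.out.println(lookup.getNumberOfDocumentsContainingGram(7));
    }
}
```

Returning an empty list for an unknown gram is a choice made in this sketch; the page does not say what the real methods do in that case.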

fillFileIDs

private FileID[] fillFileIDs()
Make another view of allFilesFingerprints in the form of FileID[]

Returns:
another view of allFilesFingerprints

makeInvertedIndex

private MultiMap makeInvertedIndex()
Make the inverted index

Returns:
inverted index
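
As a rough illustration of what building an inverted index involves, the sketch below inverts a document-to-fingerprints map into a fingerprint-to-documents map. It is not the implementation of makeInvertedIndex: the class InvertedIndexSketch and the method invert are hypothetical, and standard Map and List types stand in for MultiMap and for the raw Map held in allFilesFingerprints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InvertedIndexSketch {

    // Inverts document -> fingerprints into fingerprint -> documents.
    static Map<Integer, List<String>> invert(Map<String, List<Integer>> fingerprintsByDocument) {
        Map<Integer, List<String>> inverted = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : fingerprintsByDocument.entrySet()) {
            String document = entry.getKey();
            for (Integer gram : entry.getValue()) {
                List<String> documents = inverted.computeIfAbsent(gram, k -> new ArrayList<>());
                // Record each document at most once per gram, even if the gram repeats.
                if (!documents.contains(document)) {
                    documents.add(document);
                }
            }
        }
        return inverted;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = new LinkedHashMap<>();
        index.put("a.txt", Arrays.asList(11, 42, 42));
        index.put("b.txt", Arrays.asList(42, 7));
        // prints {11=[a.txt], 42=[a.txt, b.txt], 7=[b.txt]}
        System.out.println(invert(index));
    }
}
```

With posting lists deduplicated this way, the number of documents containing a gram is simply the length of its list.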

getFingerprints

private java.util.Map getFingerprints()
                               throws FileTooShort
Make the index from the documents

Returns:
the index
Throws:
FileTooShort

extractAllGrams

private java.util.ArrayList extractAllGrams(java.io.File[] files)
                                     throws FileTooShort
Returns a list with one entry per file, each entry holding the fingerprints of that file

Parameters:
files - the files to obtain fingerprints
Returns:
list of fingerprints
Throws:
FileTooShort
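
This page does not say how grams are computed, so the following is only one plausible reading: each gram is a fixed-length character window whose hash code serves as the fingerprint, and a text shorter than one window is rejected. Everything in the sketch is hypothetical, including the class GramExtractionSketch, the gramLength parameter, the use of String.hashCode, and TooShortException, which merely stands in for FileTooShort. It also works on strings rather than files to stay self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GramExtractionSketch {

    // Stand-in for FileTooShort, whose real definition is not shown on this page.
    static class TooShortException extends Exception {
        TooShortException(String message) {
            super(message);
        }
    }

    // Hashes every window of gramLength characters (assumed fingerprint scheme).
    static List<Integer> extractGrams(String text, int gramLength) throws TooShortException {
        if (gramLength <= 0) {
            throw new IllegalArgumentException("gramLength must be positive");
        }
        if (text.length() < gramLength) {
            throw new TooShortException("Text shorter than one gram of length " + gramLength);
        }
        List<Integer> grams = new ArrayList<>();
        for (int i = 0; i + gramLength <= text.length(); i++) {
            grams.add(text.substring(i, i + gramLength).hashCode());
        }
        return grams;
    }

    // One entry per input text, in the same order as the inputs.
    static List<List<Integer>> extractAllGrams(List<String> texts, int gramLength)
            throws TooShortException {
        List<List<Integer>> all = new ArrayList<>();
        for (String text : texts) {
            all.add(extractGrams(text, gramLength));
        }
        return all;
    }

    public static void main(String[] args) throws TooShortException {
        List<List<Integer>> all = extractAllGrams(Arrays.asList("abcd", "bcde"), 3);
        System.out.println(all.size() + " texts, " + all.get(0).size() + " grams in the first");
    }
}
```

Because the result keeps one entry per input in input order, entry i can be paired with file i when the index of the documents is assembled.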

getAllFilesFingerprints

public java.util.Map getAllFilesFingerprints()

getFileIDs

public FileID[] getFileIDs()

getInvertedIndex

public MultiMap getInvertedIndex()