org.carrot2.text.vsm
Class VectorSpaceModelContext

java.lang.Object
  extended by org.carrot2.text.vsm.VectorSpaceModelContext

public class VectorSpaceModelContext
extends Object

Stores data related to the Vector Space Model of the processed documents.


Field Summary
 PreprocessingContext preprocessingContext
          Preprocessing context for the underlying documents.
 com.carrotsearch.hppc.IntIntOpenHashMap stemToRowIndex
          Stem index to row index mapping for termDocumentMatrix.
 org.apache.mahout.math.matrix.DoubleMatrix2D termDocumentMatrix
          Term-document matrix.
 org.apache.mahout.math.matrix.DoubleMatrix2D termPhraseMatrix
          Term-document-like matrix for phrases from PreprocessingContext.AllLabels.
 
Constructor Summary
VectorSpaceModelContext(PreprocessingContext preprocessingContext)
          Creates a vector space model context with the provided preprocessing context.
 
Method Summary
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

preprocessingContext

public final PreprocessingContext preprocessingContext
Preprocessing context for the underlying documents.


termDocumentMatrix

public org.apache.mahout.math.matrix.DoubleMatrix2D termDocumentMatrix
Term-document matrix. Rows of the matrix correspond to word stems, columns correspond to the processed documents. For mapping between rows of this matrix and PreprocessingContext.AllStems, see stemToRowIndex.

This matrix is produced by TermDocumentMatrixBuilder.buildTermDocumentMatrix(VectorSpaceModelContext).
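
The row and column layout described above can be illustrated with a minimal sketch. It is not the code used by TermDocumentMatrixBuilder: it uses a plain double[][] as a stand-in for the Mahout matrix type, stores raw occurrence counts only, and assumes a hypothetical input format in which each document is an array of stem indices. The class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in only: a plain double[][] replaces the Mahout DoubleMatrix2D.
final class TermDocumentLayoutSketch {

    /**
     * Builds a raw-count matrix with one row per stem and one column per document.
     * Each document is given as an array of stem indices (hypothetical input format).
     */
    static double[][] countMatrix(int stemCount, List<int[]> documents) {
        double[][] matrix = new double[stemCount][documents.size()];
        for (int doc = 0; doc < documents.size(); doc++) {
            for (int stem : documents.get(doc)) {
                // Row = stem, column = document.
                matrix[stem][doc] += 1.0;
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        List<int[]> documents = Arrays.asList(new int[] {0, 1, 1}, new int[] {2});
        double[][] matrix = countMatrix(3, documents);
        // Number of occurrences of stem 1 in document 0.
        System.out.println(matrix[1][0]);
    }
}
```

In this sketch the row index equals the stem index. In the actual context object that is not guaranteed, which is why stemToRowIndex exists.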


termPhraseMatrix

public org.apache.mahout.math.matrix.DoubleMatrix2D termPhraseMatrix
Term-document-like matrix for phrases from PreprocessingContext.AllLabels. Rows of the matrix correspond to word stems, columns correspond to phrases. If there are no phrases in PreprocessingContext.AllLabels, this matrix is null. For mapping between rows of this matrix and PreprocessingContext.AllStems, see stemToRowIndex.

This matrix is produced by TermDocumentMatrixBuilder.buildTermPhraseMatrix(VectorSpaceModelContext).
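
Because this field may be null, callers need a guard before reading it. The sketch below shows one possible way to do that, under the assumption that both matrices have the same number of rows (they share the stemToRowIndex mapping). It uses plain double[][] arrays as stand-ins for the Mahout matrix type, and the class name, method name, and the choice of a dot product as the score are all hypothetical.

```java
// Illustrative stand-in only: plain double[][] arrays replace the Mahout DoubleMatrix2D.
final class PhraseDocumentScoreSketch {

    /**
     * Dot product of one phrase column and one document column.
     * Returns 0.0 when there is no phrase matrix at all.
     */
    static double phraseDocumentScore(
            double[][] termPhrase, int phrase, double[][] termDocument, int doc) {
        if (termPhrase == null) {
            // No phrases were available, so there is nothing to score.
            return 0.0;
        }
        double sum = 0.0;
        // Assumption: both matrices have the same row count (one row per mapped stem).
        for (int row = 0; row < termDocument.length; row++) {
            sum += termPhrase[row][phrase] * termDocument[row][doc];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] termDocument = {{1.0, 0.0}, {2.0, 1.0}};
        double[][] termPhrase = {{1.0}, {1.0}};
        System.out.println(phraseDocumentScore(termPhrase, 0, termDocument, 0));
        System.out.println(phraseDocumentScore(null, 0, termDocument, 0));
    }
}
```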


stemToRowIndex

public com.carrotsearch.hppc.IntIntOpenHashMap stemToRowIndex
Stem index to row index mapping for termDocumentMatrix. Keys in this map are indices of entries in the PreprocessingContext.AllStems arrays; values are the indices of the termDocumentMatrix rows corresponding to those stems. Note that, depending on the limit on the size of the matrix, some stems may have no corresponding matrix row.

This object is produced by TermDocumentMatrixBuilder.buildTermDocumentMatrix(VectorSpaceModelContext).
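
Since some stems may have no row, a lookup has to handle the missing case explicitly. The following is a minimal sketch of that pattern. It uses java.util.Map and double[][] as stand-ins for the HPPC map and the Mahout matrix, so the calls shown are not the HPPC or Mahout API, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in only: java.util.Map and double[][] replace the HPPC and Mahout types.
final class StemRowLookupSketch {

    /** Returns the matrix row for the given stem index, or null if the stem has no row. */
    static double[] rowForStem(
            Map<Integer, Integer> stemToRowIndex, double[][] matrix, int stemIndex) {
        Integer row = stemToRowIndex.get(stemIndex);
        // A missing key means the stem was left out of the matrix.
        return row == null ? null : matrix[row];
    }

    public static void main(String[] args) {
        Map<Integer, Integer> stemToRowIndex = new HashMap<>();
        stemToRowIndex.put(5, 0); // stem 5 is stored in row 0
        stemToRowIndex.put(2, 1); // stem 2 is stored in row 1

        double[][] matrix = {{1.0, 0.0}, {0.0, 3.0}};

        System.out.println(rowForStem(stemToRowIndex, matrix, 2) != null);
        System.out.println(rowForStem(stemToRowIndex, matrix, 7) != null);
    }
}
```

Returning null for an unmapped stem is one design choice; a caller could equally skip such stems before the lookup.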

Constructor Detail

VectorSpaceModelContext

public VectorSpaceModelContext(PreprocessingContext preprocessingContext)
Creates a vector space model context with the provided preprocessing context.



Copyright (c) Dawid Weiss, Stanislaw Osinski