Carrot2 v3.5.2 API Documentation
java.lang.Object
  org.carrot2.text.vsm.TermDocumentMatrixBuilder
public class TermDocumentMatrixBuilder
Builds a term document matrix based on the provided PreprocessingContext.
| Field Summary | |
|---|---|
| int | maximumMatrixSize: Maximum matrix size. |
| double | maxWordDf: Maximum word document frequency. |
| ITermWeighting | termWeighting: Term weighting. |
| double | titleWordsBoost: Title word boost. |
| Constructor Summary | |
|---|---|
| TermDocumentMatrixBuilder() | |
| Method Summary | |
|---|---|
| void | buildTermDocumentMatrix(VectorSpaceModelContext vsmContext): Builds a term-document matrix from the data provided in the context and stores the result there. |
| void | buildTermPhraseMatrix(VectorSpaceModelContext context): Builds a term-phrase matrix in the same space as the main term-document matrix. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public double titleWordsBoost
Title word boost. Gives more weight to words that appear in Document.TITLE fields.
public int maximumMatrixSize
Maximum matrix size.
public double maxWordDf
Maximum word document frequency. Words with a document frequency higher than
maxWordDf will be ignored. For example, when maxWordDf is
0.4, words appearing in more than 40% of documents will be ignored.
A value of 1.0 means that all words will be taken into
account, no matter how many documents they appear in.
This attribute may be useful when certain words appear in most of the input
documents (e.g. a company name from a header or footer) and such words dominate the
cluster labels. In such a case, setting maxWordDf to a value lower than
1.0, e.g. 0.9, may improve the clusters.
Another useful application of this attribute is when there is a need to generate
only very specific clusters, i.e. clusters containing small numbers of documents.
This can be achieved by setting maxWordDf to extremely low values,
e.g. 0.1 or 0.05.
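
The effect of the maxWordDf threshold can be illustrated with a small, self-contained sketch. It is not Carrot2 code: the class MaxWordDfSketch, the method retainedWords, and the sample documents are hypothetical, and the sketch assumes that document frequency is measured as the fraction of documents containing a word.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of a document-frequency cut-off; not the Carrot2 implementation.
public class MaxWordDfSketch {

    /** Returns the words whose document frequency (as a fraction of all documents) does not exceed maxWordDf. */
    static Set<String> retainedWords(List<Set<String>> documents, double maxWordDf) {
        // Count in how many documents each word occurs.
        Map<String, Integer> df = new TreeMap<>();
        for (Set<String> doc : documents) {
            for (String word : doc) {
                df.merge(word, 1, Integer::sum);
            }
        }
        // Keep only words at or below the threshold.
        Set<String> retained = new TreeSet<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            double fraction = (double) e.getValue() / documents.size();
            if (fraction <= maxWordDf) {
                retained.add(e.getKey());
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        // "acme" plays the role of a company name repeated in every document.
        List<Set<String>> docs = List.of(
            Set.of("acme", "clustering", "search"),
            Set.of("acme", "matrix"),
            Set.of("acme", "search", "labels"),
            Set.of("acme", "weighting"));
        System.out.println(retainedWords(docs, 1.0)); // every word is kept
        System.out.println(retainedWords(docs, 0.9)); // "acme" (in 100% of documents) is dropped
        System.out.println(retainedWords(docs, 0.4)); // "search" (in 50% of documents) is dropped as well
    }
}
```

In this sketch, lowering the threshold from 1.0 to 0.9 removes only the word that appears everywhere, while 0.4 also removes moderately common words, which matches the two use cases described above.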
public ITermWeighting termWeighting
Term weighting.
| Constructor Detail |
|---|
public TermDocumentMatrixBuilder()
| Method Detail |
|---|
public void buildTermDocumentMatrix(VectorSpaceModelContext vsmContext)
Builds a term-document matrix from the data provided in the context and
stores the result there.
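
For readers unfamiliar with the data structure, the following sketch shows what a term-document matrix looks like in its simplest form. It does not use or reproduce the Carrot2 implementation: the class TermDocumentMatrixSketch and its build method are hypothetical, the layout (rows are terms, columns are documents) is an assumption, and raw term counts stand in for whatever ITermWeighting is configured.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a term-document matrix; not the Carrot2 implementation.
public class TermDocumentMatrixSketch {

    /** Rows follow the order of terms, columns follow the order of documents; each cell holds a raw term count. */
    static double[][] build(List<String> terms, List<List<String>> documents) {
        double[][] matrix = new double[terms.size()][documents.size()];
        for (int d = 0; d < documents.size(); d++) {
            for (String token : documents.get(d)) {
                int t = terms.indexOf(token);
                if (t >= 0) {
                    matrix[t][d] += 1.0; // tokens outside the term list are skipped
                }
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("matrix", "term", "cluster");
        List<List<String>> docs = List.of(
            List.of("term", "matrix", "term"),
            List.of("cluster", "matrix"));
        // One row per term, one column per document.
        System.out.println(Arrays.deepToString(build(terms, docs)));
    }
}
```

A real builder would typically replace the raw counts with weighted values and limit the number of rows, which is the role the termWeighting and maximumMatrixSize fields appear to play here.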
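public void buildTermPhraseMatrix(VectorSpaceModelContext context)
Builds a term-phrase matrix in the same space as the main term-document matrix.
If the processing context contains no phrases,
VectorSpaceModelContext.termPhraseMatrix will remain null.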
Please refer to project documentation at http://project.carrot2.org