|
Carrot2 v3.5.2
API Documentation |
java.lang.Object
  org.carrot2.text.vsm.TermDocumentMatrixBuilderDescriptor.AttributeBuilder
public static class TermDocumentMatrixBuilderDescriptor.AttributeBuilder
Attribute map builder for the TermDocumentMatrixBuilder component. You can use this
builder as a type-safe alternative to populating the attribute map using attribute keys.
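The difference between populating an attribute map by key and through a builder can be illustrated with a minimal, self-contained sketch. `SketchBuilder` below is a hypothetical stand-in written for this example and is not the Carrot2 class; the key strings are also assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class AttributeBuilderSketch {
    // Hypothetical attribute keys, assumed for illustration only.
    static final String MAX_WORD_DF = "TermDocumentMatrixBuilder.maxWordDf";
    static final String TITLE_WORDS_BOOST = "TermDocumentMatrixBuilder.titleWordsBoost";

    /** Hypothetical stand-in for a map-backed builder; not the Carrot2 class. */
    static final class SketchBuilder {
        /** The attribute map populated by this builder. */
        public final Map<String, Object> map;

        SketchBuilder(Map<String, Object> map) {
            this.map = map;
        }

        // Each setter accepts a typed value, stores it under its key,
        // and returns the builder so calls can be chained.
        SketchBuilder maxWordDf(double value) {
            map.put(MAX_WORD_DF, value);
            return this;
        }

        SketchBuilder titleWordsBoost(double value) {
            map.put(TITLE_WORDS_BOOST, value);
            return this;
        }
    }

    /** Populates the map through raw keys: nothing checks key spelling or value type. */
    static Map<String, Object> viaKeys() {
        Map<String, Object> attrs = new HashMap<>();
        attrs.put(MAX_WORD_DF, 0.9);
        attrs.put(TITLE_WORDS_BOOST, 2.0);
        return attrs;
    }

    /** Populates the same map through the builder: the compiler checks names and types. */
    static Map<String, Object> viaBuilder() {
        Map<String, Object> attrs = new HashMap<>();
        new SketchBuilder(attrs).maxWordDf(0.9).titleWordsBoost(2.0);
        return attrs;
    }

    public static void main(String[] args) {
        // Both approaches end up with the same map contents.
        System.out.println(viaKeys().equals(viaBuilder())); // prints true
    }
}
```

Both routes produce the same map. The builder route moves mistakes such as a misspelled key or a value of the wrong type from run time to compile time.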
| Field Summary | |
|---|---|
Map<String,Object> |
map
The attribute map populated by this builder. |
| Constructor Summary | |
|---|---|
protected |
TermDocumentMatrixBuilderDescriptor.AttributeBuilder(Map<String,Object> map)
Creates a builder backed by the provided map. |
| Method Summary | |
|---|---|
TermDocumentMatrixBuilderDescriptor.AttributeBuilder |
maximumMatrixSize(int value)
Maximum matrix size. |
TermDocumentMatrixBuilderDescriptor.AttributeBuilder |
maxWordDf(double value)
Maximum word document frequency. |
TermDocumentMatrixBuilderDescriptor.AttributeBuilder |
termWeighting(Class<? extends ITermWeighting> clazz)
Term weighting. |
TermDocumentMatrixBuilderDescriptor.AttributeBuilder |
termWeighting(ITermWeighting value)
Term weighting. |
TermDocumentMatrixBuilderDescriptor.AttributeBuilder |
titleWordsBoost(double value)
Title word boost. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public final Map<String,Object> map
| Constructor Detail |
|---|
protected TermDocumentMatrixBuilderDescriptor.AttributeBuilder(Map<String,Object> map)
| Method Detail |
|---|
public TermDocumentMatrixBuilderDescriptor.AttributeBuilder titleWordsBoost(double value)
Title word boost, applied to words in Document.TITLE fields.
See Also: TermDocumentMatrixBuilder.titleWordsBoost
public TermDocumentMatrixBuilderDescriptor.AttributeBuilder maximumMatrixSize(int value)
Maximum matrix size.
See Also: TermDocumentMatrixBuilder.maximumMatrixSize
public TermDocumentMatrixBuilderDescriptor.AttributeBuilder maxWordDf(double value)
Maximum word document frequency. Words with a document frequency higher than maxWordDf will be ignored. For example, when maxWordDf is
0.4, words appearing in more than 40% of documents will be ignored.
A value of 1.0 means that all words will be taken into
account, no matter in how many documents they appear.
This attribute may be useful when certain words appear in most of the input
documents (e.g. a company name from a header or footer) and such words dominate the
cluster labels. In such cases, setting maxWordDf to a value lower than
1.0, e.g. 0.9, may improve the clusters.
Another useful application of this attribute is when there is a need to generate
only very specific clusters, i.e. clusters containing small numbers of documents.
This can be achieved by setting maxWordDf to extremely low values,
e.g. 0.1 or 0.05.
See Also: TermDocumentMatrixBuilder.maxWordDf
public TermDocumentMatrixBuilderDescriptor.AttributeBuilder termWeighting(ITermWeighting value)
Term weighting.
See Also: TermDocumentMatrixBuilder.termWeighting
public TermDocumentMatrixBuilderDescriptor.AttributeBuilder termWeighting(Class<? extends ITermWeighting> clazz)
Term weighting.
See Also: TermDocumentMatrixBuilder.termWeighting
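The maxWordDf threshold described above can be sketched in a few lines. This is an illustration of the documented behaviour only, not Carrot2's implementation: the class `MaxWordDfSketch`, the method `keptWords`, and the sample words and counts are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MaxWordDfSketch {
    /**
     * Returns the words whose document frequency, expressed as a fraction of
     * documentCount, does not exceed maxWordDf. Words above the threshold are ignored.
     */
    static List<String> keptWords(Map<String, Integer> docFreq, int documentCount, double maxWordDf) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            double fraction = (double) e.getValue() / documentCount;
            if (fraction <= maxWordDf) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies over 10 input documents.
        Map<String, Integer> docFreq = new TreeMap<>();
        docFreq.put("acme", 9);    // 90% of documents, e.g. a company name in a footer
        docFreq.put("cluster", 3); // 30% of documents
        docFreq.put("matrix", 2);  // 20% of documents

        System.out.println(keptWords(docFreq, 10, 0.4)); // prints [cluster, matrix]
        System.out.println(keptWords(docFreq, 10, 1.0)); // prints [acme, cluster, matrix]
    }
}
```

With a threshold of 0.4 the word that appears in 90% of the documents is dropped, while a threshold of 1.0 keeps every word regardless of how many documents contain it.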
|
Please refer to project documentation at
http://project.carrot2.org |