Carrot2 v3.5.2 API Documentation
java.lang.Object
  org.carrot2.text.preprocessing.PreprocessingContext.AllStems
public class PreprocessingContext.AllStems
Information about all unique stems found in the input
PreprocessingContext.documents. Each entry in each array corresponds to one
base form that different words can be reduced to by the IStemmer used during
processing. For example, the English words mining and mine will be aggregated
into one entry in these arrays, while they will have separate entries in
PreprocessingContext.AllWords.
All arrays in this class have the same length and values across different arrays correspond to each other for the same index.
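
The parallel-array layout described above can be illustrated with a small sketch. `StemTableSketch` is a hypothetical stand-in that copies only three of the documented field names; it is not the Carrot2 class, and the `wordImages` argument (standing in for word images held by PreprocessingContext.AllWords) and all data values are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in mirroring part of the field layout documented for AllStems. */
final class StemTableSketch {
    // Parallel arrays: index i in every array describes the same stem.
    char[][] image;
    int[] tf;
    int[] mostFrequentOriginalWordIndex;

    /** Builds one line per stem by reading the same index from every array. */
    static List<String> describe(StemTableSketch stems, char[][] wordImages) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < stems.image.length; i++) {
            String stem = new String(stems.image[i]);
            // The pointer is an index into the (assumed) word-level arrays.
            String word = new String(wordImages[stems.mostFrequentOriginalWordIndex[i]]);
            lines.add(stem + " tf=" + stems.tf[i] + " mostFrequentForm=" + word);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up data for illustration only.
        char[][] wordImages = {
            "mining".toCharArray(), "mine".toCharArray(), "data".toCharArray()
        };
        StemTableSketch stems = new StemTableSketch();
        stems.image = new char[][] { "mine".toCharArray(), "data".toCharArray() };
        stems.tf = new int[] { 5, 4 };
        stems.mostFrequentOriginalWordIndex = new int[] { 0, 2 };
        describe(stems, wordImages).forEach(System.out::println);
    }
}
```

Keeping one array per attribute, rather than one object per stem, means a single integer index is enough to identify a stem across all attributes.
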
| Field Summary | |
|---|---|
| byte[] | fieldIndices: Bit-packed indices of all fields in which this stem appears at least once. |
| char[][] | image: Stem image as produced by the IStemmer; may not correspond to any correct word. |
| int[] | mostFrequentOriginalWordIndex: Pointer to the PreprocessingContext.AllWords arrays, to the most frequent original form of the stem. |
| int[] | tf: Term frequency of the stem. |
| int[][] | tfByDocument: Term frequency of the stem for each document. |
| Constructor Summary | |
|---|---|
| PreprocessingContext.AllStems() | |
| Method Summary | |
|---|---|
| String | toString(): For debugging purposes. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public char[][] image
Stem image as produced by the IStemmer; may not correspond to any correct word.
This array is produced by LanguageModelStemmer.
public int[] mostFrequentOriginalWordIndex
Pointer to the PreprocessingContext.AllWords arrays, to the most frequent original form of the stem. Pointers to the less frequent variants are not available.
This array is produced by LanguageModelStemmer.
public int[] tf
Term frequency of the stem, i.e. the sum of all PreprocessingContext.AllWords.tf values for which the PreprocessingContext.AllWords.stemIndex points to this stem.
This array is produced by LanguageModelStemmer.
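
The description of tf relates a stem's term frequency to the PreprocessingContext.AllWords.tf values whose stemIndex points to that stem. Assuming this means the per-word values are summed, the relationship might be sketched as follows. The class `StemTfSketch`, the method `aggregateStemTf` and the sample data are hypothetical and only illustrate the idea; this is not how LanguageModelStemmer is necessarily implemented.

```java
import java.util.Arrays;

/** Sketch: derive per-stem term frequencies from per-word data. */
final class StemTfSketch {
    /**
     * wordTf[w] is the term frequency of word w, and wordStemIndex[w] is the
     * index of the stem that word w maps to. Returns one sum per stem.
     */
    static int[] aggregateStemTf(int[] wordTf, int[] wordStemIndex, int stemCount) {
        int[] stemTf = new int[stemCount];
        for (int w = 0; w < wordTf.length; w++) {
            stemTf[wordStemIndex[w]] += wordTf[w];
        }
        return stemTf;
    }

    public static void main(String[] args) {
        // Made-up data: words "mining", "mine", "data".
        int[] wordTf = { 3, 2, 4 };
        // "mining" and "mine" share stem 0; "data" maps to stem 1.
        int[] wordStemIndex = { 0, 0, 1 };
        System.out.println(Arrays.toString(aggregateStemTf(wordTf, wordStemIndex, 2)));
        // prints [5, 4]
    }
}
```
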
public int[][] tfByDocument
Term frequency of the stem for each document. See PreprocessingContext.AllWords.tfByDocument.
This array is produced by LanguageModelStemmer.
public byte[] fieldIndices
Bit-packed indices of all fields in which this stem appears at least once. The indices refer to the PreprocessingContext.AllFields arrays. Fast conversion between the bit-packed representation and a byte[] of index values is done by PreprocessingContext.toFieldIndexes(byte).
This array is produced by LanguageModelStemmer.
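
To make the bit-packed representation concrete, here is a minimal sketch of packing field indices into a single byte and unpacking them again. `FieldBitsSketch`, `pack` and `unpack` are hypothetical names, and the bit ordering (bit 0 stands for field index 0) is an assumption; the actual behaviour of PreprocessingContext.toFieldIndexes(byte) may differ.

```java
/** Sketch of a one-byte bit set holding field indices 0..7. */
final class FieldBitsSketch {
    /** Sets one bit per field index; assumes bit i represents field index i. */
    static byte pack(int... fieldIndices) {
        int bits = 0;
        for (int index : fieldIndices) {
            if (index < 0 || index > 7) {
                throw new IllegalArgumentException("Field index out of range: " + index);
            }
            bits |= 1 << index;
        }
        return (byte) bits;
    }

    /** Returns the positions of the set bits, lowest bit first. */
    static byte[] unpack(byte packed) {
        int bits = packed & 0xFF; // treat the byte as unsigned
        byte[] result = new byte[Integer.bitCount(bits)];
        int n = 0;
        for (byte i = 0; i < 8; i++) {
            if ((bits & (1 << i)) != 0) {
                result[n++] = i;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte packed = pack(0, 2); // the stem occurs in fields 0 and 2
        System.out.println(packed);                                   // prints 5
        System.out.println(java.util.Arrays.toString(unpack(packed))); // prints [0, 2]
    }
}
```

A single byte limits this representation to eight distinct fields, which keeps the per-stem cost to one byte.
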
| Constructor Detail |
|---|
public PreprocessingContext.AllStems()
| Method Detail |
|---|
public String toString()
Overrides: toString in class Object
Please refer to project documentation at http://project.carrot2.org