Carrot2 v3.5.2 API Documentation
java.lang.Object
  org.carrot2.text.preprocessing.PreprocessingContext.AllWords
public class PreprocessingContext.AllWords
Information about all unique words found in the input
PreprocessingContext.documents. An entry in each parallel array corresponds to one
conflated form of a word. For example, data and DATA will most likely become
a single entry in the words table. However, different grammatical forms of a single lemma
(like computer and computers) will have different entries in the
words table. See PreprocessingContext.AllStems for inflection-conflated versions.
All arrays in this class have the same length, and values stored at the same index in different arrays describe the same word.
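The parallel-array layout described above can be illustrated with a minimal, standalone sketch. It does not use the Carrot2 classes: the `IMAGE` and `TF` arrays are hypothetical sample data standing in for `image` and `tf`, and `describe` is a helper invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: local arrays stand in for AllWords.image and AllWords.tf. */
class AllWordsLayoutSketch {
    // Hypothetical sample data; index i in every array describes the same word.
    static final char[][] IMAGE = {
        "data".toCharArray(), "computer".toCharArray(), "computers".toCharArray()
    };
    static final int[] TF = { 12, 5, 3 };

    /** Walks the parallel arrays by a shared word index and formats one row per word. */
    public static List<String> describe(char[][] image, int[] tf) {
        if (image.length != tf.length) {
            throw new IllegalArgumentException("parallel arrays must have equal length");
        }
        List<String> rows = new ArrayList<String>();
        for (int wordIndex = 0; wordIndex < image.length; wordIndex++) {
            rows.add(new String(image[wordIndex]) + " tf=" + tf[wordIndex]);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : describe(IMAGE, TF)) {
            System.out.println(row);
        }
    }
}
```

Note that "computer" and "computers" occupy separate entries here, mirroring the statement that different grammatical forms are not conflated in the words table.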
| Field Summary | |
|---|---|
| byte[] | fieldIndices: Bit-packed indices of all fields in which this word appears at least once. |
| char[][] | image: The most frequently appearing variant of the word with respect to case. |
| int[] | stemIndex: A pointer to the PreprocessingContext.AllStems arrays for this word. |
| int[] | tf: Term Frequency of the word, aggregated across all variants with respect to case. |
| int[][] | tfByDocument: Term Frequency of the word for each document. |
| short[] | type: Token type of this word copied from PreprocessingContext.AllTokens.type. |
| Constructor Summary | |
|---|---|
| PreprocessingContext.AllWords() | |
| Method Summary | |
|---|---|
| String | toString(): For debugging purposes. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public char[][] image
The most frequently appearing variant of the word with respect to case.
This array is produced by CaseNormalizer.
public short[] type
Token type of this word copied from PreprocessingContext.AllTokens.type. Additional flags are set for each word by CaseNormalizer and LanguageModelStemmer.
This array is produced by CaseNormalizer.
This array is modified by LanguageModelStemmer.
See Also: ITokenizer
public int[] tf
Term Frequency of the word, aggregated across all variants with respect to case.
This array is produced by CaseNormalizer.
public int[][] tfByDocument
Term Frequency of the word for each document. Elements at even indices contain document indices pointing to PreprocessingContext.documents, elements at odd indices contain the frequency of the word in the document. For example, an array with 4 values: [2, 15, 138, 7] means that the word appeared 15 times in the document at index 2 and 7 times in the document at index 138.
This array is produced by CaseNormalizer.
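As a rough illustration of this interleaved encoding, the standalone sketch below reads one word's row of (document index, frequency) pairs. The class and method names are hypothetical helpers written for this example and are not part of the Carrot2 API.

```java
/** Illustrative only: decodes one word's row laid out like a tfByDocument entry. */
class TfByDocumentSketch {
    /** Returns the word's frequency in the given document, or 0 if the document is not listed. */
    public static int frequencyIn(int[] pairs, int documentIndex) {
        // Even positions hold document indices, the following odd position holds the frequency.
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            if (pairs[i] == documentIndex) {
                return pairs[i + 1];
            }
        }
        return 0;
    }

    /** Number of documents the word appears in: one pair per document. */
    public static int documentCount(int[] pairs) {
        return pairs.length / 2;
    }

    /** Sum of the per-document frequencies stored at odd positions. */
    public static int totalFrequency(int[] pairs) {
        int total = 0;
        for (int i = 1; i < pairs.length; i += 2) {
            total += pairs[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] row = { 2, 15, 138, 7 }; // the example row from the field description
        System.out.println(frequencyIn(row, 2));
        System.out.println(frequencyIn(row, 138));
        System.out.println(documentCount(row));
    }
}
```

The linear scan is chosen for clarity; the sketch makes no assumption about whether document indices within a row are sorted.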
public int[] stemIndex
A pointer to the PreprocessingContext.AllStems arrays for this word.
This array is produced by LanguageModelStemmer.
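public byte[] fieldIndices
Bit-packed indices of all fields in which this word appears at least once. Positions of the set bits are pointers to the PreprocessingContext.AllFields arrays. Fast conversion between the bit-packed representation and byte[] with index values is done by PreprocessingContext.toFieldIndexes(byte).
This array is produced by CaseNormalizer.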
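To make the bit-packed representation more concrete, here is a minimal sketch of converting between a packed byte and an array of index values. It is not the implementation of PreprocessingContext.toFieldIndexes(byte): the `unpack` and `pack` helpers are hypothetical, and the sketch assumes that bit i (counting from the least significant bit) marks field index i.

```java
/** Illustrative only: assumes bit i (from the least significant bit) marks field index i. */
class FieldIndicesSketch {
    /** Expands a packed byte into the indices of its set bits, in ascending order. */
    public static byte[] unpack(byte packed) {
        int bits = packed & 0xFF; // treat the byte as 8 unsigned flag bits
        byte[] indices = new byte[Integer.bitCount(bits)];
        int next = 0;
        for (int bit = 0; bit < 8; bit++) {
            if ((bits & (1 << bit)) != 0) {
                indices[next++] = (byte) bit;
            }
        }
        return indices;
    }

    /** Packs field indices in the range 0..7 into a single byte. */
    public static byte pack(byte[] indices) {
        int bits = 0;
        for (byte index : indices) {
            if (index < 0 || index > 7) {
                throw new IllegalArgumentException("field index out of range: " + index);
            }
            bits |= 1 << index;
        }
        return (byte) bits;
    }

    public static void main(String[] args) {
        // A word seen in fields 0 and 2 under the assumed bit order.
        byte packed = pack(new byte[] { 0, 2 });
        System.out.println(java.util.Arrays.toString(unpack(packed)));
    }
}
```

A single byte limits this layout to eight distinct fields, which is consistent with the byte[] element type of the field.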
| Constructor Detail |
|---|
public PreprocessingContext.AllWords()
| Method Detail |
|---|
public String toString()
For debugging purposes.
Overrides: toString in class Object
Please refer to project documentation at http://project.carrot2.org