Carrot2 v3.3.0 API Documentation
java.lang.Object
  org.carrot2.text.preprocessing.CaseNormalizer
public final class CaseNormalizer
Performs case normalization and calculates a number of frequency statistics for words. The aim of case normalization is to find the most frequently appearing variants of words in terms of case. For example, if in the input documents MacOS appears 20 times, Macos 5 times and macos 2 times, case normalizer will select MacOS to represent all variants and assign the aggregated term frequency of 27 to it.
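The selection rule described above can be illustrated with a minimal, self-contained sketch. It does not use the Carrot2 API and is not the library's implementation: the class `CaseNormalizationSketch`, the holder `Normalized` and the tie-breaking rule (first variant seen wins) are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CaseNormalizationSketch {
    /** Hypothetical result holder: the chosen case variant and the aggregated term frequency. */
    public static final class Normalized {
        public final String image;
        public final int tf;

        Normalized(String image, int tf) {
            this.image = image;
            this.tf = tf;
        }
    }

    /** Groups tokens by their lower-cased form and picks the most frequent case variant. */
    public static Map<String, Normalized> normalize(List<String> tokens) {
        // lower-cased key -> (exact-case variant -> number of occurrences)
        Map<String, Map<String, Integer>> variants = new LinkedHashMap<>();
        for (String token : tokens) {
            String key = token.toLowerCase(Locale.ROOT);
            variants.computeIfAbsent(key, k -> new LinkedHashMap<>())
                    .merge(token, 1, Integer::sum);
        }

        Map<String, Normalized> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> entry : variants.entrySet()) {
            String best = null;
            int bestCount = 0;
            int total = 0;
            for (Map.Entry<String, Integer> variant : entry.getValue().entrySet()) {
                total += variant.getValue();
                // Strict comparison: on a tie, the variant seen first is kept (an assumption).
                if (variant.getValue() > bestCount) {
                    best = variant.getKey();
                    bestCount = variant.getValue();
                }
            }
            result.put(entry.getKey(), new Normalized(best, total));
        }
        return result;
    }
}
```

Given 20 occurrences of MacOS, 5 of Macos and 2 of macos, the entry for the key `macos` holds the image MacOS and a term frequency of 27, matching the example in the class description.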
This class saves the following results to the PreprocessingContext:
- PreprocessingContext.AllTokens.wordIndex
- PreprocessingContext.AllWords.image
- PreprocessingContext.AllWords.tf
- PreprocessingContext.AllWords.tfByDocument
This class requires that Tokenizer be invoked first.
| Field Summary | |
|---|---|
| int | dfThreshold: Word Document Frequency threshold. |
| Constructor Summary | |
|---|---|
| CaseNormalizer() | |
| Method Summary | |
|---|---|
| void | normalize(PreprocessingContext context): Performs normalization and saves the results to the context. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public int dfThreshold
Word Document Frequency threshold. Words appearing in fewer than dfThreshold documents will be ignored.
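A document frequency threshold of this kind can be sketched as follows. The sketch assumes that a word is dropped when it occurs in fewer than dfThreshold documents and that case variants are counted together. The class `DfThresholdSketch` and the method `retainedWords` are hypothetical names and are not part of Carrot2.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DfThresholdSketch {
    /**
     * Returns the lower-cased words that occur in at least dfThreshold documents.
     * Each document is given as a list of tokens.
     */
    public static Set<String> retainedWords(List<List<String>> documents, int dfThreshold) {
        // word -> number of documents in which the word occurs at least once
        Map<String, Integer> df = new LinkedHashMap<>();
        for (List<String> document : documents) {
            // Collect distinct words first so that a word counts once per document.
            Set<String> seen = new HashSet<>();
            for (String token : document) {
                seen.add(token.toLowerCase(Locale.ROOT));
            }
            for (String word : seen) {
                df.merge(word, 1, Integer::sum);
            }
        }

        // Assumption: words with a document frequency below the threshold are ignored.
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : df.entrySet()) {
            if (entry.getValue() >= dfThreshold) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

With a threshold of 1 every word is retained. Raising the threshold removes words that are concentrated in only a few documents, regardless of how often they occur inside those documents.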
| Constructor Detail |
|---|
public CaseNormalizer()
| Method Detail |
|---|
public void normalize(PreprocessingContext context)
Performs normalization and saves the results to the context.
Please refer to project documentation at http://project.carrot2.org