Carrot2 v3.6.0-SNAPSHOT API Documentation
java.lang.Object
  org.carrot2.text.preprocessing.PhraseExtractor
public class PhraseExtractor
Extracts frequent phrases from the provided documents. A frequent phrase is a sequence of words that appears in the documents more than once. This phrase extractor aggregates different inflection variants of phrase words into one phrase and returns the most frequent variant. For example, if the phrase "computing science" appears 2 times and "computer sciences" appears 4 times, the latter will be returned with an aggregated frequency of 6.
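The aggregation rule described above can be sketched in a few lines of plain Java. This is an illustration only, not the Carrot2 implementation: the class `PhraseVariantAggregation`, its methods, and the lookup-table "stemmer" are all hypothetical names introduced here, and the stems in the example are made up to make the two variants collide.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; none of these names belong to Carrot2.
class PhraseVariantAggregation {

    /** The winning surface form of a phrase and the summed frequency of all its variants. */
    static final class Aggregated {
        final String phrase;
        final int frequency;

        Aggregated(String phrase, int frequency) {
            this.phrase = phrase;
            this.frequency = frequency;
        }
    }

    /** Hypothetical toy stemmer: lowercases each word and maps it through a caller-supplied table. */
    static String stemKey(String phrase, Map<String, String> stems) {
        StringBuilder key = new StringBuilder();
        for (String word : phrase.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (key.length() > 0) {
                key.append(' ');
            }
            String stem = stems.get(word);
            key.append(stem != null ? stem : word);
        }
        return key.toString();
    }

    /**
     * Groups phrase variants by their stem key. For each group, sums the counts
     * and keeps the variant with the highest individual count (first seen wins ties).
     */
    static Map<String, Aggregated> aggregate(Map<String, Integer> variantCounts,
                                             Map<String, String> stems) {
        Map<String, Integer> total = new LinkedHashMap<>();
        Map<String, String> bestVariant = new LinkedHashMap<>();
        Map<String, Integer> bestCount = new LinkedHashMap<>();

        for (Map.Entry<String, Integer> e : variantCounts.entrySet()) {
            String key = stemKey(e.getKey(), stems);
            int count = e.getValue();

            Integer soFar = total.get(key);
            total.put(key, (soFar == null ? 0 : soFar) + count);

            Integer best = bestCount.get(key);
            if (best == null || count > best) {
                bestCount.put(key, count);
                bestVariant.put(key, e.getKey());
            }
        }

        Map<String, Aggregated> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : total.entrySet()) {
            result.put(e.getKey(), new Aggregated(bestVariant.get(e.getKey()), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up stems chosen so that both variants share one key.
        Map<String, String> stems = new LinkedHashMap<>();
        stems.put("computing", "comput");
        stems.put("computer", "comput");
        stems.put("science", "scienc");
        stems.put("sciences", "scienc");

        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("computing science", 2);
        counts.put("computer sciences", 4);

        for (Aggregated a : aggregate(counts, stems).values()) {
            System.out.println(a.phrase + " -> " + a.frequency); // prints "computer sciences -> 6"
        }
    }
}
```

Keying each variant by its stemmed form is one straightforward way to make inflection variants fall into the same bucket; the real extractor works on the data already stored in the PreprocessingContext and may proceed quite differently.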
This class saves the following results to the PreprocessingContext:
- PreprocessingContext.AllPhrases.wordIndices
- PreprocessingContext.AllPhrases.tf
- PreprocessingContext.AllPhrases.tfByDocument
- PreprocessingContext.AllTokens.suffixOrder
- PreprocessingContext.AllTokens.lcp
This class requires that Tokenizer, CaseNormalizer and
LanguageModelStemmer be invoked first.
| Field Summary | |
|---|---|
| int | dfThreshold: Phrase Document Frequency threshold. |
| Constructor Summary | |
|---|---|
| PhraseExtractor() | |
| Method Summary | |
|---|---|
| void | extractPhrases(PreprocessingContext context): Performs phrase extraction and saves the results to the provided context. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public int dfThreshold
Phrase Document Frequency threshold. Phrases appearing in fewer than dfThreshold documents will be ignored.
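To make the effect of a document frequency threshold concrete, here is a minimal sketch in plain Java. It assumes the threshold means that a phrase found in fewer than dfThreshold documents is dropped, and it stores per-document counts as a dense `int[]` per phrase purely for readability. The class `DfThresholdSketch` and its methods are hypothetical, and the layout is not meant to mirror how Carrot2 stores tfByDocument.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these names belong to Carrot2.
class DfThresholdSketch {

    /** Document frequency: the number of documents in which the phrase occurs at least once. */
    static int documentFrequency(int[] tfByDocument) {
        int df = 0;
        for (int tf : tfByDocument) {
            if (tf > 0) {
                df++;
            }
        }
        return df;
    }

    /** Keeps only phrases whose document frequency is at least dfThreshold (assumed semantics). */
    static List<String> filter(Map<String, int[]> tfByDocument, int dfThreshold) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, int[]> e : tfByDocument.entrySet()) {
            if (documentFrequency(e.getValue()) >= dfThreshold) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Each array holds the phrase's occurrence count in documents 0, 1 and 2.
        Map<String, int[]> tfByDocument = new LinkedHashMap<>();
        tfByDocument.put("data mining", new int[] {1, 0, 2}); // occurs in 2 documents
        tfByDocument.put("web search", new int[] {0, 3, 0});  // occurs in 1 document

        System.out.println(filter(tfByDocument, 2)); // prints "[data mining]"
    }
}
```

Note that "web search" is dropped even though its total count (3) equals that of "data mining": the threshold in this sketch looks at how many documents contain the phrase, not at how often it occurs overall.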
| Constructor Detail |
|---|
public PhraseExtractor()
| Method Detail |
|---|
public void extractPhrases(PreprocessingContext context)
Performs phrase extraction and saves the results to the provided context.
Please refer to project documentation at http://project.carrot2.org