Carrot2 v3.5.2 API Documentation
java.lang.Object
  org.carrot2.text.analysis.ExtendedWhitespaceTokenizer
public final class ExtendedWhitespaceTokenizer extends Object implements ITokenizer
A tokenizer separating input characters on whitespace, but capable of extracting more complex tokens, such as URLs, e-mail addresses and sentence delimiters.
| Field Summary |
|---|
| Fields inherited from interface org.carrot2.text.analysis.ITokenizer |
|---|
TF_COMMON_WORD, TF_QUERY_WORD, TF_SEPARATOR_DOCUMENT, TF_SEPARATOR_FIELD, TF_SEPARATOR_SENTENCE, TF_TERMINATOR, TT_ACRONYM, TT_BARE_URL, TT_EMAIL, TT_EOF, TT_FILE, TT_FULL_URL, TT_HYPHTERM, TT_NUMERIC, TT_PUNCTUATION, TT_TERM, TYPE_MASK |
| Constructor Summary | |
|---|---|
| ExtendedWhitespaceTokenizer() | |
| Method Summary | |
|---|---|
| short | nextToken() Returns the next token from the input stream. |
| void | reset(Reader input) Resets this tokenizer to start parsing another stream. |
| void | setTermBuffer(MutableCharArray array) Sets the current token image to the provided buffer. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public ExtendedWhitespaceTokenizer()
| Method Detail |
|---|
public void reset(Reader input)
Resets this tokenizer to start parsing another stream.
Specified by: reset in interface ITokenizer
Parameters: input - the input to tokenize. The reader will not be closed by the tokenizer when the end of stream is reached.
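public short nextToken() throws IOException
Description copied from interface: ITokenizer
Returns the next token from the input stream.
Specified by: nextToken in interface ITokenizer
Returns: ITokenizer.TT_TERM and other constants, or ITokenizer.TT_EOF when the end of the data stream has been reached.
Throws: IOException
See Also: TokenTypeUtils
public void setTermBuffer(MutableCharArray array)
Description copied from interface: ITokenizer
Sets the current token image to the provided buffer.
Specified by: setTermBuffer in interface ITokenizer
Parameters: array - buffer in which the current token image should be stored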
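The three methods above suggest a calling pattern: reset the tokenizer on a reader, supply a term buffer, then call nextToken() until it reports end of stream. The following is a minimal, self-contained sketch of that pattern only. It does not use the Carrot2 classes: WhitespaceOnlyTokenizer, the StringBuilder buffer (standing in for MutableCharArray) and the constant values TT_EOF and TT_TERM are hypothetical stand-ins, and the real tokenizer also recognizes URLs, e-mail addresses and sentence delimiters, which this sketch does not attempt.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TokenLoopSketch {
    // Hypothetical values; the actual ITokenizer constants are not listed on this page.
    static final short TT_EOF = -1;
    static final short TT_TERM = 1;

    /** Hypothetical stand-in that mirrors the three documented method names. */
    static final class WhitespaceOnlyTokenizer {
        private Reader input;
        private StringBuilder termBuffer = new StringBuilder();

        void reset(Reader input) {
            this.input = input;
        }

        // StringBuilder stands in for MutableCharArray here (assumption).
        void setTermBuffer(StringBuilder buffer) {
            this.termBuffer = buffer;
        }

        short nextToken() throws IOException {
            termBuffer.setLength(0);
            int c;
            // Skip leading whitespace.
            while ((c = input.read()) != -1 && Character.isWhitespace(c)) {
                // nothing to do
            }
            if (c == -1) {
                return TT_EOF;
            }
            // Collect characters up to the next whitespace or end of stream.
            do {
                termBuffer.append((char) c);
            } while ((c = input.read()) != -1 && !Character.isWhitespace(c));
            return TT_TERM;
        }
    }

    /** Runs the reset / setTermBuffer / nextToken loop and collects token images. */
    static List<String> tokenize(String text) {
        List<String> images = new ArrayList<>();
        StringBuilder image = new StringBuilder();
        WhitespaceOnlyTokenizer tokenizer = new WhitespaceOnlyTokenizer();
        // The caller owns the reader: the tokenizer is documented not to close it.
        try (Reader reader = new StringReader(text)) {
            tokenizer.reset(reader);
            tokenizer.setTermBuffer(image);
            while (tokenizer.nextToken() != TT_EOF) {
                // Copy the image, since the buffer is overwritten by the next call.
                images.add(image.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return images;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("data mining  rocks")); // prints [data, mining, rocks]
    }
}
```

The sketch copies each token image out of the buffer before the next call, on the assumption that a shared buffer is reused between tokens. The reader is closed by the caller in a try-with-resources block, matching the note that the tokenizer leaves the reader open at end of stream.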
Please refer to project documentation at http://project.carrot2.org