org.carrot2.text.analysis
Class ExtendedWhitespaceTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.carrot2.text.analysis.ExtendedWhitespaceTokenizer
- All Implemented Interfaces:
- Closeable
public final class ExtendedWhitespaceTokenizer
- extends org.apache.lucene.analysis.Tokenizer
A tokenizer that splits input characters on whitespace, but can also extract more
complex tokens, such as URLs, e-mail addresses and sentence delimiters. Each token is
exposed through a TermAttribute and a TokenTypeAttributeImpl (which implements ITokenTypeAttribute).
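The description above can be illustrated with a minimal, standard-library-only sketch. It is not the Carrot2 implementation: the class TypedWhitespaceSketch, its Type enum, and the two regular expressions are hypothetical and deliberately simplified. It only shows the general idea of splitting on whitespace while still recognizing URLs, e-mail addresses and sentence delimiters as distinct token types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; the real tokenizer's rules and types may differ.
public class TypedWhitespaceSketch {
    enum Type { TERM, URL, EMAIL, SENTENCE_DELIMITER }

    static final class Token {
        final String text;
        final Type type;
        Token(String text, Type type) { this.text = text; this.type = type; }
        @Override public String toString() { return text + "/" + type; }
    }

    // Simplified patterns, assumed for this sketch only.
    private static final Pattern URL = Pattern.compile("(?i)^(https?|ftp)://\\S+$");
    private static final Pattern EMAIL = Pattern.compile("^[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+$");

    static List<Token> tokenize(String input) {
        List<Token> out = new ArrayList<>();
        for (String chunk : input.trim().split("\\s+")) {
            if (chunk.isEmpty()) continue;
            // Detach one trailing '.', '!' or '?' so it becomes its own token.
            String delimiter = null;
            char last = chunk.charAt(chunk.length() - 1);
            if (chunk.length() > 1 && (last == '.' || last == '!' || last == '?')) {
                delimiter = String.valueOf(last);
                chunk = chunk.substring(0, chunk.length() - 1);
            }
            out.add(new Token(chunk, classify(chunk)));
            if (delimiter != null) out.add(new Token(delimiter, Type.SENTENCE_DELIMITER));
        }
        return out;
    }

    static Type classify(String s) {
        if (URL.matcher(s).matches()) return Type.URL;
        if (EMAIL.matcher(s).matches()) return Type.EMAIL;
        if (s.equals(".") || s.equals("!") || s.equals("?")) return Type.SENTENCE_DELIMITER;
        return Type.TERM;
    }

    public static void main(String[] args) {
        String text = "Mail admin@example.com or see http://example.com/docs. Thanks!";
        for (Token t : tokenize(text)) {
            System.out.println(t);
        }
    }
}
```

In the actual class, the token text and type would be read from the attributes named above after each incrementToken() call rather than from a returned list.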
| Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource |
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State |
| Fields inherited from class org.apache.lucene.analysis.Tokenizer |
input |
| Methods inherited from class org.apache.lucene.analysis.Tokenizer |
correctOffset |
| Methods inherited from class org.apache.lucene.analysis.TokenStream |
end |
| Methods inherited from class org.apache.lucene.util.AttributeSource |
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, restoreState, toString |
ExtendedWhitespaceTokenizer
public ExtendedWhitespaceTokenizer()
incrementToken
public boolean incrementToken()
throws IOException
- Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
reset
public void reset()
throws IOException
- Not implemented in this tokenizer. Use reset(Reader) or close() instead.
- Overrides:
reset in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
reset
public void reset(Reader input)
- Reset this tokenizer to start parsing another stream.
- Overrides:
reset in class org.apache.lucene.analysis.Tokenizer
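The reuse pattern implied by reset(Reader), incrementToken() and close() can be sketched without Lucene on the classpath. The class ReusableTokenizerSketch below is a hypothetical stand-in that only splits on whitespace; its term() accessor and drain() helper are assumptions made for illustration and are not part of this API. It shows a single instance being pointed at one Reader after another, with tokens pulled until incrementToken() returns false.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reusable, pull-style tokenizer.
public class ReusableTokenizerSketch {
    private Reader input;
    private final StringBuilder term = new StringBuilder();

    // Point the same instance at a new stream and discard leftover state.
    public void reset(Reader newInput) {
        this.input = newInput;
        term.setLength(0);
    }

    // Advance to the next whitespace-delimited token; false at end of input.
    public boolean incrementToken() throws IOException {
        if (input == null) throw new IllegalStateException("call reset(Reader) first");
        term.setLength(0);
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (term.length() > 0) return true; // token finished
            } else {
                term.append((char) c);
            }
        }
        return term.length() > 0; // last token, if any
    }

    public String term() { return term.toString(); }

    public void close() throws IOException {
        if (input != null) { input.close(); input = null; }
    }

    // Helper assumed for this sketch: collect all remaining tokens.
    static List<String> drain(ReusableTokenizerSketch t) {
        List<String> out = new ArrayList<>();
        try {
            while (t.incrementToken()) out.add(t.term());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        ReusableTokenizerSketch t = new ReusableTokenizerSketch();
        t.reset(new StringReader("first document"));
        System.out.println(drain(t));
        t.reset(new StringReader("second  one")); // same instance, new stream
        System.out.println(drain(t));
        t.close();
    }
}
```

Reusing one instance this way avoids allocating a new tokenizer per document, which is presumably why this class directs callers to reset(Reader) rather than the no-argument reset().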
close
public void close()
throws IOException
- Specified by:
close in interface Closeable
- Overrides:
close in class org.apache.lucene.analysis.Tokenizer
- Throws:
IOException
equals
public boolean equals(Object other)
- Overrides:
equals in class org.apache.lucene.util.AttributeSource
hashCode
public int hashCode()
- Overrides:
hashCode in class org.apache.lucene.util.AttributeSource
Copyright (c) Dawid Weiss, Stanislaw Osinski