org.carrot2.text.analysis
Class ExtendedWhitespaceTokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by org.carrot2.text.analysis.ExtendedWhitespaceTokenizer
All Implemented Interfaces:
Closeable

public final class ExtendedWhitespaceTokenizer
extends org.apache.lucene.analysis.Tokenizer

A tokenizer that splits input characters on whitespace but can also extract more complex tokens, such as URLs, e-mail addresses and sentence delimiters. Each token is exposed through a TermAttribute and a TokenTypeAttributeImpl, which implements ITokenTypeAttribute.
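
The behaviour described above can be pictured with a small, library-free sketch. It is only an illustration of whitespace splitting combined with token typing, not the Carrot2 implementation: the class name WhitespaceTokenSketch, the Type enum and the regular expressions are assumptions made for this example, and the real tokenizer's rules for URLs, e-mail addresses and sentence delimiters may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of Carrot2 or Lucene.
final class WhitespaceTokenSketch {
    // Assumed token categories, loosely following the class description.
    enum Type { TERM, URL, EMAIL, SENTENCE_DELIMITER }

    static final class Token {
        final String text;
        final Type type;

        Token(String text, Type type) {
            this.text = text;
            this.type = type;
        }
    }

    // Deliberately simple patterns; assumptions for this sketch only.
    private static final Pattern URL = Pattern.compile("(?:https?|ftp)://\\S+");
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+");

    /** Splits on whitespace, then assigns a type to each chunk. */
    static List<Token> tokenize(String input) {
        List<Token> out = new ArrayList<>();
        for (String chunk : input.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue; // blank input yields a single empty chunk
            }
            // Peel trailing '.', '!' or '?' off as a separate delimiter token.
            int end = chunk.length();
            while (end > 0 && ".!?".indexOf(chunk.charAt(end - 1)) >= 0) {
                end--;
            }
            String body = chunk.substring(0, end);
            String tail = chunk.substring(end);
            if (!body.isEmpty()) {
                out.add(new Token(body, classify(body)));
            }
            if (!tail.isEmpty()) {
                out.add(new Token(tail, Type.SENTENCE_DELIMITER));
            }
        }
        return out;
    }

    private static Type classify(String body) {
        if (URL.matcher(body).matches()) {
            return Type.URL;
        }
        if (EMAIL.matcher(body).matches()) {
            return Type.EMAIL;
        }
        return Type.TERM;
    }
}
```

In this sketch a URL or e-mail address stays a single token even though it contains punctuation, while a sentence-final period is emitted as its own token.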


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
ExtendedWhitespaceTokenizer()
           
 
Method Summary
 void close()
           
 boolean equals(Object other)
           
 int hashCode()
           
 boolean incrementToken()
           
 void reset()
          Not implemented in this tokenizer.
 void reset(Reader input)
          Reset this tokenizer to start parsing another stream.
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
correctOffset
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
end
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ExtendedWhitespaceTokenizer

public ExtendedWhitespaceTokenizer()
Method Detail

incrementToken

public boolean incrementToken()
                       throws IOException
Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
Throws:
IOException

reset

public void reset()
           throws IOException
Not implemented in this tokenizer. Use reset(Reader) or close() instead.

Overrides:
reset in class org.apache.lucene.analysis.TokenStream
Throws:
IOException

reset

public void reset(Reader input)
Reset this tokenizer to start parsing another stream.

Overrides:
reset in class org.apache.lucene.analysis.Tokenizer
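
The reuse contract implied by reset(Reader) and incrementToken() can be sketched without any library dependency. The class below is a hypothetical stand-in that only borrows the method names documented on this page; ReusableTokenizerSketch, term() and tokensOf() are names invented for the example, and the real class exposes token text through Lucene attributes rather than a getter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; does not use Lucene or Carrot2.
final class ReusableTokenizerSketch {
    private Reader input;
    private final StringBuilder term = new StringBuilder();

    /** Points the tokenizer at a new stream and discards any buffered token. */
    void reset(Reader input) {
        this.input = input;
        term.setLength(0);
    }

    /** Advances to the next whitespace-delimited token; returns false at end of input. */
    boolean incrementToken() throws IOException {
        if (input == null) {
            throw new IllegalStateException("call reset(Reader) first");
        }
        term.setLength(0);
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (term.length() > 0) {
                    return true; // whitespace ends the current token
                }
            } else {
                term.append((char) c);
            }
        }
        return term.length() > 0; // last token may end at end of stream
    }

    /** Text of the token produced by the latest successful incrementToken(). */
    String term() {
        return term.toString();
    }

    void close() throws IOException {
        if (input != null) {
            input.close();
        }
    }

    /** Convenience helper: resets the given instance onto text and drains it. */
    static List<String> tokensOf(ReusableTokenizerSketch tokenizer, String text) {
        List<String> out = new ArrayList<>();
        try {
            tokenizer.reset(new StringReader(text));
            while (tokenizer.incrementToken()) {
                out.add(tokenizer.term());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Calling tokensOf twice with the same instance mirrors the intended pattern: one tokenizer object, re-pointed at each new stream with reset(Reader) instead of being reallocated.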

close

public void close()
           throws IOException
Specified by:
close in interface Closeable
Overrides:
close in class org.apache.lucene.analysis.Tokenizer
Throws:
IOException

equals

public boolean equals(Object other)
Overrides:
equals in class org.apache.lucene.util.AttributeSource

hashCode

public int hashCode()
Overrides:
hashCode in class org.apache.lucene.util.AttributeSource


Copyright (c) Dawid Weiss, Stanislaw Osinski