org.carrot2.text.linguistic.lucene
Class ThaiTokenizerAdapter

java.lang.Object
  extended by org.carrot2.text.linguistic.lucene.ThaiTokenizerAdapter
All Implemented Interfaces:
ITokenizer

public final class ThaiTokenizerAdapter
extends Object
implements ITokenizer

Thai tokenizer implemented using Lucene's ThaiWordFilter.


Field Summary
 
Fields inherited from interface org.carrot2.text.analysis.ITokenizer
TF_COMMON_WORD, TF_QUERY_WORD, TF_SEPARATOR_DOCUMENT, TF_SEPARATOR_FIELD, TF_SEPARATOR_SENTENCE, TF_TERMINATOR, TT_ACRONYM, TT_BARE_URL, TT_EMAIL, TT_EOF, TT_FILE, TT_FULL_URL, TT_HYPHTERM, TT_NUMERIC, TT_PUNCTUATION, TT_TERM, TYPE_MASK
 
Constructor Summary
ThaiTokenizerAdapter()
           
 
Method Summary
 short nextToken()
          Returns the next token from the input stream.
static boolean platformSupportsThai()
          Checks whether the current platform supports Thai tokenization.
 void reset(Reader input)
          Resets the tokenizer to process new data.
 void setTermBuffer(MutableCharArray array)
          Sets the current token image to the provided buffer.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ThaiTokenizerAdapter

public ThaiTokenizerAdapter()

Method Detail

nextToken

public short nextToken()
                throws IOException
Description copied from interface: ITokenizer
Returns the next token from the input stream.

Specified by:
nextToken in interface ITokenizer
Returns:
the token type, as defined by ITokenizer.TT_TERM and the other ITokenizer constants, or ITokenizer.TT_EOF when the end of the data stream has been reached.
Throws:
IOException
See Also:
TokenTypeUtils
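
The TT_*, TF_* and TYPE_MASK constants listed in the field summary suggest that the returned short carries a token type plus optional flags. The following is a minimal sketch of separating the two with bit masks. The class TokenTypeSketch, its methods, and all numeric values are hypothetical placeholders chosen for illustration; real code should use the constants from ITokenizer (or the helpers in TokenTypeUtils) instead.

```java
final class TokenTypeSketch {
    // Placeholder values for illustration only; they are assumptions,
    // not the values defined by ITokenizer.
    static final short TYPE_MASK = 0x000F;
    static final short TT_TERM = 0x0001;
    static final short TT_PUNCTUATION = 0x0003;
    static final short TF_SEPARATOR_SENTENCE = 0x0200;

    // Keeps only the bits that encode the token type.
    static int typeOf(short token) {
        return token & TYPE_MASK;
    }

    // Tests whether a given flag bit is set on the token.
    static boolean hasFlag(short token, short flag) {
        return (token & flag) != 0;
    }
}
```

Under these assumed values, a punctuation token that also ends a sentence would report TT_PUNCTUATION from typeOf and true from hasFlag for TF_SEPARATOR_SENTENCE.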

setTermBuffer

public void setTermBuffer(MutableCharArray array)
Description copied from interface: ITokenizer
Sets the current token image to the provided buffer.

Specified by:
setTermBuffer in interface ITokenizer
Parameters:
array - buffer in which the current token image should be stored

reset

public void reset(Reader input)
           throws IOException
Description copied from interface: ITokenizer
Resets the tokenizer to process new data.

Specified by:
reset in interface ITokenizer
Parameters:
input - the input to tokenize. The reader will not be closed by the tokenizer when the end of stream is reached.
Throws:
IOException
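
The three interface methods are typically used together: reset supplies the input, nextToken advances until TT_EOF, and setTermBuffer copies the current token image into a caller-owned buffer. The sketch below shows that loop under stated assumptions. So that it runs without Carrot2 or Lucene on the classpath, it uses hypothetical stand-ins: SketchTokenizer mirrors the three documented methods, a StringBuilder replaces MutableCharArray, WhitespaceSketchTokenizer is a toy implementation, and the constant values are placeholders.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the three ITokenizer methods documented above.
interface SketchTokenizer {
    short TT_TERM = 1;  // placeholder value, not the real constant
    short TT_EOF = -1;  // placeholder value, not the real constant

    short nextToken() throws IOException;
    void setTermBuffer(StringBuilder buffer); // the real method takes a MutableCharArray
    void reset(Reader input) throws IOException;
}

// Toy whitespace tokenizer so the loop can be exercised on its own.
final class WhitespaceSketchTokenizer implements SketchTokenizer {
    private Reader input;
    private final StringBuilder current = new StringBuilder();

    public void reset(Reader input) {
        this.input = input;
        current.setLength(0);
    }

    public short nextToken() throws IOException {
        current.setLength(0);
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    return TT_TERM; // whitespace ends a non-empty token
                }
            } else {
                current.append((char) c);
            }
        }
        return current.length() > 0 ? TT_TERM : TT_EOF;
    }

    public void setTermBuffer(StringBuilder buffer) {
        // Copy the current token image into the caller's buffer.
        buffer.setLength(0);
        buffer.append(current);
    }
}

final class TokenLoopSketch {
    static List<String> tokens(SketchTokenizer tokenizer, String text) {
        List<String> out = new ArrayList<>();
        StringBuilder image = new StringBuilder();
        // The tokenizer does not close the reader, so the caller does.
        try (Reader reader = new StringReader(text)) {
            tokenizer.reset(reader);
            while (tokenizer.nextToken() != SketchTokenizer.TT_EOF) {
                tokenizer.setTermBuffer(image);
                out.add(image.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With the real classes, the same loop shape would presumably apply with a ThaiTokenizerAdapter instance and a MutableCharArray buffer; reusing one buffer across iterations avoids allocating a new object per token.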

platformSupportsThai

public static boolean platformSupportsThai()
Checks whether the current platform supports Thai tokenization.
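
Because this check is static, callers can consult it before constructing the adapter and fall back to another tokenizer when Thai is unavailable. A minimal sketch of that guard follows. The class ThaiSupportGuardSketch and its choose method are hypothetical, and the check and both factories are passed in as functional interfaces so the example compiles without Carrot2.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

final class ThaiSupportGuardSketch {
    // Picks the Thai factory only when the capability check passes;
    // otherwise uses the fallback factory.
    static <T> T choose(BooleanSupplier thaiSupported,
                        Supplier<T> thai,
                        Supplier<T> fallback) {
        return thaiSupported.getAsBoolean() ? thai.get() : fallback.get();
    }
}
```

In real code the first two arguments could be the method references ThaiTokenizerAdapter::platformSupportsThai and ThaiTokenizerAdapter::new; the choice of fallback tokenizer is left to the application.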



Copyright (c) Dawid Weiss, Stanislaw Osinski