- add(int, IWord) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
directly add an IWord item to the dictionary
- add(int, String, int, int, String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
add a new word to the dictionary with its statistical frequency
- add(int, String, int, int) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
add a new word to the dictionary
- add(int, String, int) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
add a new word to the dictionary
- add(int, String, int, String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
add a new word to the dictionary
- add(int, IWord) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
-
- add(int, String, int, int, String) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
-
- add(int, String, int) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
-
- add(int, String, int, int) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
-
- add(int, String, int, String) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
-
- add(T) - Method in class org.lionsoul.jcseg.util.IHashQueue
-
append an item at the tail
- add(int) - Method in class org.lionsoul.jcseg.util.IntArrayList
-
Append a new Integer to the end.
- addPartSpeech(String) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
add a new part of speech to the word.
- addPartSpeech(String) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- addSyn(String) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
add a new synonym to the word.
- addSyn(String) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- ADictionary - Class in org.lionsoul.jcseg.tokenizer.core
-
Dictionary abstract super class
- ADictionary(JcsegTaskConfig, Boolean) - Constructor for class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
initialize the ADictionary
- AL_TODO_FILE - Static variable in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
the default auto load task file name
- append(String) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a string to the buffer
- append(char[], int, int) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append parts of the chars to the buffer
- append(char[], int) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append the rest of the chars to the buffer
- append(char[]) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append some chars to the buffer
- append(char) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a char to the buffer
- append(boolean) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a boolean value
- append(short) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a short value
- append(int) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append an int value
- append(long) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a long value
- append(float) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a float value
- append(double) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
append a double value
- APPEND_CJK_ENTITY - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
whether to perform entity recognition
- APPEND_CJK_PINYIN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
append the Pinyin to the split IWord
- APPEND_CJK_SYN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
append the synonyms to the split IWord.
- APPEND_PART_OF_SPEECH - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
append the part of speech.
- appendCJKPinyin() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- appendCJKSyn() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- appendLatinSyn(IWord) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
Check and append the synonyms of the specified word, including CJK and basic Latin words.
All synonyms share the same position, part of speech, and word type as the original word.
- appendWordFeatures(IWord) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
check and append the pinyin and the synonyms words of the specified word
- ASegment - Class in org.lionsoul.jcseg.tokenizer
-
abstract segmentation super class:
1.
- ASegment(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.ASegment
-
initialize the segment
- ASegment(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.ASegment
-
- autoFilter - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
auto filter the words with low score
- autoLoad() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
initialize the option values by automatically searching for the jcseg.properties file:
- AutoLoadFile - Class in org.lionsoul.jcseg.tokenizer.core
-
AutoLoad file that describes the autoload configuration files
- AutoLoadFile(String) - Constructor for class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
-
- autoMinLength - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
automatically append words whose length exceeds the specified value
as a phrase
- charAt(int) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
get the char at a specified position in the buffer
- CHECK_CE_MASk - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ISegment
-
- CHECK_CF_MASK - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ISegment
-
- CHECK_EC_MASK - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ISegment
-
Whether to check the English Chinese mixed suffix
For the new implementation of the mixed word recognition
Added at 2016/11/22
- Chunk - Class in org.lionsoul.jcseg.tokenizer
-
chunk concept for the MMSeg Chinese word segmentation algorithm; implements the IChunk interface
- Chunk(IWord[]) - Constructor for class org.lionsoul.jcseg.tokenizer.Chunk
-
- CJK_CHAR - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
CJK single word
- CJK_UNIT - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
Chinese single units
- CJK_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
Chinese, Japanese, Korean words
- CJKIndexOf(String, int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
get the index of the first CJK char of the specified string
- CJKIndexOf(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- clear() - Method in class org.lionsoul.jcseg.util.IntArrayList
-
- clear() - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
clear the buffer by resetting the count to 0
- CLEAR_STOPWORD - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
clear away the stop word.
- clearStopwords() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- clone() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
make clone available
- clone() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
rewrite the clone method
- clone() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
Interface to clone the current object
- CN_DNAME_1 - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
first word of Chinese double name
- CN_DNAME_2 - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
second word of Chinese double name
- CN_LNAME - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
Chinese last name
- CN_LNAME_ADORN - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
the adorn(修饰) char before the last name
like word "老陈", "小陈"
- CN_SNAME - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
Chinese single name
- CNFRA_TO_ARABIC - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
Chinese fraction to Arabic fraction.
- cnFractionToArabic() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- CNNUM_TO_ARABIC - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
Chinese numeric to Arabic.
- cnNumericToArabic(String, boolean) - Static method in class org.lionsoul.jcseg.util.NumericUtil
-
a static method to convert Chinese numerals to Arabic numbers
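To illustrate the kind of conversion this entry describes, here is a minimal sketch. It is not the NumericUtil source: the class name CnNumeralSketch and the method toArabic are hypothetical, the boolean parameter of the real method is not modeled, and only the digits 零 to 九 and the units 十, 百, 千, 万 are handled. The characters are written as Unicode escapes so the code stays ASCII-only.

```java
// Hypothetical sketch, not the NumericUtil source: convert a simple Chinese
// numeral (digits plus the units 10, 100, 1000 and 10000) to a long.
class CnNumeralSketch {
    // The ten digit characters for 0..9, written as Unicode escapes.
    private static final String DIGITS =
        "\u96f6\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d";

    static long toArabic(String cn) {
        long total = 0;    // value of the completed 10000-sections
        long section = 0;  // value accumulated below 10000
        long digit = 0;    // digit waiting for its unit
        for (int i = 0; i < cn.length(); i++) {
            char c = cn.charAt(i);
            int d = DIGITS.indexOf(c);
            if (d >= 0) {
                digit = d;
            } else if (c == '\u4e07') {                // unit 10000
                section += digit;
                total += Math.max(section, 1) * 10000;
                section = 0;
                digit = 0;
            } else {
                long unit;
                if (c == '\u5341') unit = 10;          // unit 10
                else if (c == '\u767e') unit = 100;    // unit 100
                else if (c == '\u5343') unit = 1000;   // unit 1000
                else throw new IllegalArgumentException("unexpected char: " + c);
                // A bare unit (for example a leading ten) counts as one unit.
                section += Math.max(digit, 1) * unit;
                digit = 0;
            }
        }
        return total + section + digit;
    }
}
```

Under these assumptions, toArabic applied to 一万二千三百四十五 yields 12345 and 十二 yields 12. Larger units and malformed input are outside the scope of the sketch.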
- cnNumToArabic() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- compareTo(TextRankSummaryExtractor.Document) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
override the compareTo method
compare document with its relevance score
- COMPLEX_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- ComplexSeg - Class in org.lionsoul.jcseg.tokenizer
-
Jcseg complex segmentation implementation, extended from the ASegment class
this requires filtering with the four MMSeg rules:
- ComplexSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.ComplexSeg
-
- ComplexSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.ComplexSeg
-
- config - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
-
- config - Variable in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
- contains(T) - Method in class org.lionsoul.jcseg.util.IHashQueue
-
check whether the specified T already exists in the queue
- createDateTimePool() - Static method in class org.lionsoul.jcseg.util.TimeUtil
-
create and return a date-time pool
- createDefaultDictionary(JcsegTaskConfig, boolean, boolean) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
-
create a default ADictionary instance:
1.
- createDefaultDictionary(JcsegTaskConfig) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
-
create the ADictionary according to the JcsegTaskConfig
check and load the lexicon by default
- createDefaultDictionary(JcsegTaskConfig, boolean) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
-
create the ADictionary according to the JcsegTaskConfig
- createDictionary(Class<? extends ADictionary>, Class<?>[], Object[]) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
-
create a new ADictionary instance
- createJcseg(int, Object...) - Static method in class org.lionsoul.jcseg.tokenizer.core.SegmentFactory
-
create the specified mode Jcseg instance
- createSegment(Class<? extends ISegment>, Class<?>[], Object[]) - Static method in class org.lionsoul.jcseg.tokenizer.core.SegmentFactory
-
load the ISegment class with the given path
- createSingletonDictionary(JcsegTaskConfig) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
-
create a singleton ADictionary object according to the JcsegTaskConfig
check and load the lexicon by default
- createSingletonDictionary(JcsegTaskConfig, boolean) - Static method in class org.lionsoul.jcseg.tokenizer.core.DictionaryFactory
-
create a singleton ADictionary object according to the JcsegTaskConfig
- ctrlMask - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
-
segmentation runtime function control mask
- D - Static variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- D - Static variable in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- D - Static variable in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- data - Variable in class org.lionsoul.jcseg.util.IHashQueue.Entry
-
- data - Variable in class org.lionsoul.jcseg.util.IIntFIFO.Entry
-
- data - Variable in class org.lionsoul.jcseg.util.IIntQueue.Entry
-
- DATETIME_A - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_AV - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_D - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_DV - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_H - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_HV - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_I - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_IV - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_M - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_MV - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_NONE - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
date-time part index constants
we consider a date-time as the following seven parts:
+------+-------+-----+---------------+------+--------+--------+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+------+-------+-----+---------------+------+--------+--------+
| year | month | day | timing method | hour | minute | second |
+------+-------+-----+---------------+------+--------+--------+
and the numeric value before every part.
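The seven-part layout in the table can be sketched as an index-to-name mapping. The class DateTimePartsSketch, its method names, and the -1 sentinel are hypothetical and are not taken from TimeUtil; only the order of the parts comes from the table.

```java
// Hypothetical sketch: the seven date-time parts from the table, indexed 0..6.
// The names and the -1 sentinel are illustrative and not taken from TimeUtil.
class DateTimePartsSketch {
    static final int NONE = -1;
    static final String[] PART_NAMES = {
        "year", "month", "day", "timing method", "hour", "minute", "second"
    };

    // Return the part index for a name, or NONE when the name is unknown.
    static int indexOf(String name) {
        for (int i = 0; i < PART_NAMES.length; i++) {
            if (PART_NAMES[i].equals(name)) {
                return i;
            }
        }
        return NONE;
    }

    // Create an empty seven-slot pool, one slot per date-time part.
    static String[] createPool() {
        return new String[PART_NAMES.length];
    }
}
```

A caller would fill the slot at indexOf("hour") when an hour token is recognized and leave the other slots null.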
- DATETIME_S - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_SV - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_Y - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- DATETIME_YV - Static variable in class org.lionsoul.jcseg.util.TimeUtil
-
- decrease(char) - Method in class org.lionsoul.jcseg.util.ByteCharCounter
-
- decrease(char, int) - Method in class org.lionsoul.jcseg.util.ByteCharCounter
-
- deleteCharAt(int) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
delete the char at the specified position
- DELIMITER_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- DelimiterSeg - Class in org.lionsoul.jcseg.tokenizer
-
delimiter segment algorithm implementation
extended from common segment interface ISegment
- DelimiterSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.DelimiterSeg
-
method to create a new ISegment
- DelimiterSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.DelimiterSeg
-
method to create a new ISegment
- deQueue() - Method in class org.lionsoul.jcseg.util.IIntFIFO
-
remove the first item from the queue
- deQueue() - Method in class org.lionsoul.jcseg.util.IIntQueue
-
remove the node from the head
and you should make sure the size is larger than 0 by calling size()
before you invoke the method or you will just get -1
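The size() check matters because -1 doubles as the "empty" signal. A minimal sketch of that contract follows; IntQueueSketch and its members are hypothetical names, not the IIntQueue source.

```java
// Hypothetical sketch of an int queue whose deQueue() returns -1 when empty.
class IntQueueSketch {
    private static final class Node {
        final int data;
        Node prev, next;
        Node(int data) { this.data = data; }
    }

    private Node head, tail;
    private int size = 0;

    // Append a value at the tail of the doubly linked list.
    void enQueue(int value) {
        Node node = new Node(value);
        if (tail == null) {
            head = tail = node;
        } else {
            node.prev = tail;
            tail.next = node;
            tail = node;
        }
        size++;
    }

    // Remove and return the head value, or -1 when the queue is empty.
    int deQueue() {
        if (size == 0) {
            return -1;
        }
        Node node = head;
        head = node.next;
        if (head == null) {
            tail = null;
        } else {
            head.prev = null;
        }
        size--;
        return node.data;
    }

    int size() { return size; }
}
```

Because a stored -1 is indistinguishable from the empty result, callers loop with while (queue.size() > 0) rather than testing the returned value.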
- DETECT_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- DetectSeg - Class in org.lionsoul.jcseg.tokenizer
-
Detect segmentation mode: returns only the words found in the loaded dictionary;
when a word is matched it is returned,
otherwise the search continues for the next word in the dictionary
- DetectSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.DetectSeg
-
method to create the new ISegment
- DetectSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.DetectSeg
-
method to create a new ISegment
- dic - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
-
the dictionary and task configuration instance
- Dictionary - Class in org.lionsoul.jcseg.tokenizer
-
Dictionary class
- Dictionary(JcsegTaskConfig, Boolean) - Constructor for class org.lionsoul.jcseg.tokenizer.Dictionary
-
- DictionaryFactory - Class in org.lionsoul.jcseg.tokenizer.core
-
Dictionary factory to create Dictionary instances
the path of a class that extends the ADictionary class must be given first
- Document(int, Sentence, List<IWord>, double) - Constructor for class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
construct method
- DOMAIN_SUFFIX - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
domain name suffix dictionary for the URL recognition
- get(int, String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
return the IWord associated with the given key.
- get(String) - Static method in class org.lionsoul.jcseg.tokenizer.core.Entity
-
get the entity string by the specified key
- get(int, String) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
-
- get(char) - Method in class org.lionsoul.jcseg.util.ByteCharCounter
-
- get(int) - Method in class org.lionsoul.jcseg.util.IntArrayList
-
- getAutoMinLength() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- getAverageWordsLength() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
-
- getAverageWordsLength() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
-
return the average word length for all the chunks.
- getBestCJKChunk(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
an abstract method to get a CJK word from the
current position.
- getBestCJKChunk(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.ComplexSeg
-
- getBestCJKChunk(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.SearchSeg
-
here we don't have to do anything
- getBestCJKChunk(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.SimpleSeg
-
- getConfig() - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
get the current task configuration instance.
- getConfig() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
- getConfig() - Method in class org.lionsoul.jcseg.tokenizer.DelimiterSeg
-
get the current JcsegTaskConfig instance
- getConfig() - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
-
get the current task config instance
- getDateTimeIndex(String) - Static method in class org.lionsoul.jcseg.util.TimeUtil
-
get and return the time part index of the specified IWord#entity
- getDelimiter() - Method in class org.lionsoul.jcseg.tokenizer.DelimiterSeg
-
get the current delimiter
- getDic() - Method in class org.lionsoul.jcseg.tokenizer.DelimiterSeg
-
get the current dictionary instance
- getDict() - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
get the current dictionary instance.
- getDict() - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
-
get the current dictionary instance
- getEnCharType(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
get the type of the English char;
the types are defined in this class and start with EN_.
- getEnSecondSeg() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getEntity() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
get the entity name of the word
- getEntity() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getFile() - Method in class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
-
- getFrequency() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the frequency of the word,
use only when the word's length is one.
- getFrequency() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getIndex() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- getIndex(String) - Static method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
get the key's type index located in ILexicon interface
- getJarHome(Object) - Static method in class org.lionsoul.jcseg.util.Util
-
get the absolute parent path for the jar file.
- getKeyphrase(Reader) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- getKeyphrase(Reader) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
-
get the keyphrase list from a reader
- getKeyphraseFromFile(String) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
-
get the keyphrase list from a file
- getKeyphraseFromString(String) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
-
get the keyphrase list from a string
- getKeySentence(Reader) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- getKeySentence(Reader) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
get the key sentence from a reader
- getKeySentenceFromFile(String) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
get key sentence from a file path
- getKeySentenceFromString(String) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
get key sentence from a string
- getKeywords(Reader) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- getKeywords(Reader) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
-
get the keywords list from a reader
- getKeywordsFromFile(String) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
-
get the keywords list from a file
- getKeywordsFromString(String) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
-
get the keywords list from a string
- getKeywordsNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- getKeywordsNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- getLargestAverageWordLengthChunks(IChunk[]) - Static method in class org.lionsoul.jcseg.tokenizer.core.MMSegFilter
-
2.
- getLargestSingleMorphemicFreedomChunks(IChunk[]) - Static method in class org.lionsoul.jcseg.tokenizer.core.MMSegFilter
-
the largest sum of degree of morphemic freedom of one-character words
this rule will return the chunks that own the largest sum of degree of morphemic freedom
of one-character
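As a rough illustration of this rule, the sketch below follows the usual MMSeg formulation, where the degree of morphemic freedom of a one-character word is the logarithm of its frequency. Whether Jcseg computes it exactly this way is an assumption, and MorphemicFreedomSketch with its parallel-array input is hypothetical.

```java
// Hypothetical sketch: sum of log(frequency) over the one-character words
// of a chunk. words[i] and frequencies[i] describe the same word.
class MorphemicFreedomSketch {
    static double freedom(String[] words, int[] frequencies) {
        double sum = 0;
        for (int i = 0; i < words.length; i++) {
            // Only single-character words with a positive frequency count.
            if (words[i].length() == 1 && frequencies[i] > 0) {
                sum += Math.log(frequencies[i]);
            }
        }
        return sum;
    }
}
```

The rule would then keep the chunks whose freedom value is the largest among all candidates.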
- getLastUpdateTime() - Method in class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
-
- getLength() - Method in class org.lionsoul.jcseg.sentence.Sentence
-
- getLength() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
-
- getLength() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
-
return the length of the chunk (the number of words)
- getLength() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the length of the word
- getLength() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getLexiconPath() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
return the lexicon directory path
- getMaxCnLnadron() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getMaximumMatchChunks(IChunk[]) - Static method in class org.lionsoul.jcseg.tokenizer.core.MMSegFilter
-
1.
- getMaxIterateNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- getMaxIterateNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- getMaxIterateNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- getMaxLength() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getMaxWordsNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- getNameSingleThreshold() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getNextCJKWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
get the next CJK word from the current position of the input stream
- getNextCJKWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.NLPSeg
-
- getNextCJKWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.SearchSeg
-
get the next CJK word from the current position of the input stream
this function is the core part of most segmentation implementations
- getNextDatetimeWord(IWord) - Method in class org.lionsoul.jcseg.tokenizer.NLPSeg
-
get and return the next date-time word
- getNextLatinWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
get the next Latin word from the current position of the input stream
- getNextMatch(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
match the next CJK word in the dictionary
- getNextMixedWord(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
get the next mixed word, CJK-English or CJK-English-CJK or whatever
- getNextPunctuationPairWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
get the next punctuation pair word from the current position
of the input stream.
- getNextTimeMergedWord(IWord) - Method in class org.lionsoul.jcseg.tokenizer.NLPSeg
-
get and return the next time merged date-time word
- getPairPunctuationText(int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
find pair punctuation of the given punctuation char
the purpose is to get the text between them
- getPartSpeech() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the part of speech of the word.
- getPartSpeech() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getPinyin() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the pinyin of the word
- getPinyin() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getPollTime() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getPosition() - Method in class org.lionsoul.jcseg.sentence.Sentence
-
- getPosition() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the start position of the word.
- getPosition() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getPPTMaxLength() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getPropertieFile() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getPunctuationPair(char) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
get the matching punctuation of a paired punctuation char
- getQueueSize() - Method in class org.lionsoul.jcseg.util.IPushbackReader
-
get the buffer size - the number of buffered data
- getScore() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- getSeg() - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
-
- getSeg() - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
-
- getSentence() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- getSentenceNum() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- getSentenceSeg() - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
- getSingleWordsMorphemicFreedom() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
-
- getSingleWordsMorphemicFreedom() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
-
return the degree of morphemic freedom for all
the single words.
- getSmallestVarianceWordLengthChunks(IChunk[]) - Static method in class org.lionsoul.jcseg.tokenizer.core.MMSegFilter
-
the smallest variance word length
this rule returns the chunks that have the smallest variance of word lengths
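A minimal sketch of the smallest-variance rule, assuming a chunk can be modeled as a non-empty String[] of words. Jcseg itself works on IChunk[]; VarianceRuleSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "smallest variance of word lengths" rule.
// A chunk is modeled as a non-empty String[] of words.
class VarianceRuleSketch {
    // Population variance of the word lengths in one chunk.
    static double variance(String[] words) {
        double mean = 0;
        for (String w : words) {
            mean += w.length();
        }
        mean /= words.length;
        double sum = 0;
        for (String w : words) {
            double d = w.length() - mean;
            sum += d * d;
        }
        return sum / words.length;
    }

    // Keep every chunk whose variance equals the minimum over all chunks.
    static List<String[]> smallestVariance(List<String[]> chunks) {
        double min = Double.MAX_VALUE;
        for (String[] c : chunks) {
            min = Math.min(min, variance(c));
        }
        List<String[]> kept = new ArrayList<String[]>();
        for (String[] c : chunks) {
            if (variance(c) == min) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

Several chunks can tie on the minimum, which is why the result is a list that a later rule may narrow further.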
- getSTokenMinLen() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- getStreamPosition() - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
- getStreamPosition() - Method in interface org.lionsoul.jcseg.tokenizer.core.ISegment
-
get the current length of the stream
- getStreamPosition() - Method in class org.lionsoul.jcseg.tokenizer.DelimiterSeg
-
- getStreamPosition() - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
-
- getSummary(Reader, int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- getSummary(Reader, int) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
get summary from a reader
- getSummaryFromFile(String, int) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
get document summary from a file
- getSummaryFromString(String, int) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
get document summary from a string
- getSyn() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the syn words of the word.
- getSyn() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getTimeKey(String) - Static method in class org.lionsoul.jcseg.util.TimeUtil
-
get and return the time key part of the specified entity string
- getTimeKey(IWord) - Static method in class org.lionsoul.jcseg.util.TimeUtil
-
- getTimeKey(int) - Static method in class org.lionsoul.jcseg.util.TimeUtil
-
get and return the time key part with the part index value
- getType() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the type of the word
- getType() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getValue() - Method in class org.lionsoul.jcseg.sentence.Sentence
-
- getValue() - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
return the value of the word
- getValue() - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- getWindowSize() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- getWindowSize() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- getWords() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- getWords() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
-
- getWords() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
-
get all the words in the chunk.
- getWordSeg() - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
- getWordsVariance() - Method in class org.lionsoul.jcseg.tokenizer.Chunk
-
- getWordsVariance() - Method in interface org.lionsoul.jcseg.tokenizer.core.IChunk
-
return the variance of all the words in all
the chunks.
- gisb - Variable in class org.lionsoul.jcseg.sentence.SentenceSeg
-
global string buffer
- I_CN_NAME - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
identify the Chinese name?
- ialist - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
-
- IChunk - Interface in org.lionsoul.jcseg.tokenizer.core
-
chunk interface for Jcseg.
- identifyCnName() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- idx - Variable in class org.lionsoul.jcseg.sentence.SentenceSeg
-
- idx - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
-
the index value of the current input stream
mainly for tracking the start position of the token
- IHashQueue<T extends IWord> - Class in org.lionsoul.jcseg.util
-
A normal queue based on a singly linked list,
but with a hash index, so it is fast to search
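The combination of a linked queue and a hash index can be sketched as follows. HashQueueSketch is a hypothetical stand-in, not the IHashQueue source; it counts occurrences so that duplicates stay consistent, which is an assumption about the desired behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a singly linked FIFO queue with a hash index for contains().
class HashQueueSketch<T> {
    private static final class Entry<E> {
        final E data;
        Entry<E> next;
        Entry(E data) { this.data = data; }
    }

    private Entry<T> head, tail;
    private int size = 0;
    // Occurrence count per item, so duplicates are tracked correctly.
    private final Map<T, Integer> index = new HashMap<T, Integer>();

    // Append an item at the tail and record it in the index.
    void add(T item) {
        Entry<T> e = new Entry<T>(item);
        if (tail == null) {
            head = tail = e;
        } else {
            tail.next = e;
            tail = e;
        }
        Integer n = index.get(item);
        index.put(item, n == null ? 1 : n + 1);
        size++;
    }

    // Average constant-time membership test instead of walking the links.
    boolean contains(T item) {
        return index.containsKey(item);
    }

    // Remove and return the head item, or null when the queue is empty.
    T remove() {
        if (head == null) {
            return null;
        }
        Entry<T> e = head;
        head = e.next;
        if (head == null) {
            tail = null;
        }
        int n = index.get(e.data);
        if (n == 1) {
            index.remove(e.data);
        } else {
            index.put(e.data, n - 1);
        }
        size--;
        return e.data;
    }

    int size() { return size; }
}
```

The index costs extra memory per distinct item, in exchange for lookups that do not depend on the queue length.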
- IHashQueue() - Constructor for class org.lionsoul.jcseg.util.IHashQueue
-
- IHashQueue.Entry<T> - Class in org.lionsoul.jcseg.util
-
inner Entry node class
- IIntFIFO - Class in org.lionsoul.jcseg.util
-
int first-in-first-out queue based on a singly linked list
- IIntFIFO() - Constructor for class org.lionsoul.jcseg.util.IIntFIFO
-
- IIntFIFO.Entry - Class in org.lionsoul.jcseg.util
-
Item Entry inner class
- IIntQueue - Class in org.lionsoul.jcseg.util
-
char queue class based on a doubly linked list
Not thread safe
- IIntQueue() - Constructor for class org.lionsoul.jcseg.util.IIntQueue
-
- IIntQueue.Entry - Class in org.lionsoul.jcseg.util
-
inner Entry node class
- ILexicon - Interface in org.lionsoul.jcseg.tokenizer.core
-
lexicon configuration class.
- increase(char) - Method in class org.lionsoul.jcseg.util.ByteCharCounter
-
- increase(char, int) - Method in class org.lionsoul.jcseg.util.ByteCharCounter
-
- insertionSort(T[]) - Static method in class org.lionsoul.jcseg.util.Sort
-
insertion sort method
- insertionSort(T[], int, int) - Static method in class org.lionsoul.jcseg.util.Sort
-
method to sort a subarray from start to end with the insertion sort algorithm
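For reference, a minimal sketch of insertion sort over a subarray. It assumes that both start and end are inclusive and that elements are Comparable; the Sort class may define the bounds and type constraints differently, and InsertionSortSketch is a hypothetical name.

```java
// Hypothetical sketch of insertion sort over the inclusive range [start, end].
class InsertionSortSketch {
    static <T extends Comparable<? super T>> void insertionSort(T[] arr, int start, int end) {
        for (int i = start + 1; i <= end; i++) {
            T key = arr[i];
            int j = i - 1;
            // Shift larger elements one slot right to open a gap for key.
            while (j >= start && arr[j].compareTo(key) > 0) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = key;
        }
    }

    // Sort the whole array by delegating to the subarray version.
    static <T extends Comparable<? super T>> void insertionSort(T[] arr) {
        insertionSort(arr, 0, arr.length - 1);
    }
}
```

Elements outside the range are left untouched, which makes the subarray form usable as the small-partition step of a hybrid sort.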
- IntArrayList - Class in org.lionsoul.jcseg.util
-
array list for the primitive int type, used instead of ArrayList
this saves a lot of boxing and unboxing work
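The idea of a list backed directly by an int[] can be sketched as below. IntListSketch is hypothetical and not the IntArrayList source; the doubling growth policy is an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch, not the Jcseg source: a growable list of primitive ints.
class IntListSketch {
    private int[] items;
    private int size = 0;

    IntListSketch(int capacity) {
        items = new int[Math.max(1, capacity)];
    }

    // Append a value to the end, doubling the backing array when it is full.
    void add(int value) {
        if (size == items.length) {
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[size++] = value;
    }

    // Return the value at the given index; no Integer object is created.
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return items[index];
    }

    int size() { return size; }

    // Drop all values by resetting the count; the backing array is reused.
    void clear() { size = 0; }
}
```

Compared with ArrayList<Integer>, every add and get here works on raw int values, so no wrapper objects are allocated.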
- IntArrayList() - Constructor for class org.lionsoul.jcseg.util.IntArrayList
-
- IntArrayList(int) - Constructor for class org.lionsoul.jcseg.util.IntArrayList
-
- IPushbackReader - Class in org.lionsoul.jcseg.util
-
IPushBackReader based on Reader
Not thread safe; supports unlimited unread operations
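Unlimited unread can be sketched by keeping pushed-back chars in a growable stack in front of the underlying Reader. PushbackSketch is a hypothetical illustration, not the IPushbackReader source, and the last-in-first-out order of unread chars is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a reader wrapper with an unbounded pushback stack.
class PushbackSketch {
    private final Reader reader;
    private final Deque<Integer> buffer = new ArrayDeque<Integer>();

    PushbackSketch(Reader reader) {
        this.reader = reader;
    }

    // Read from the pushed-back chars first, then from the underlying reader.
    int read() throws IOException {
        return buffer.isEmpty() ? reader.read() : buffer.pop();
    }

    // Push a char back; the most recently unread char is read first.
    void unread(int c) {
        buffer.push(c);
    }

    // Number of chars currently waiting in the pushback buffer.
    int getQueueSize() {
        return buffer.size();
    }
}
```

Unlike java.io.PushbackReader, whose pushback buffer has a fixed size chosen at construction, the deque here grows as needed.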
- IPushbackReader(Reader) - Constructor for class org.lionsoul.jcseg.util.IPushbackReader
-
- isAutoFilter() - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- isAutoload() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
about lexicon autoload
- isb - Variable in class org.lionsoul.jcseg.tokenizer.ASegment
-
- isCJK(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check if the specified string is all CJK chars
- isCJK(String, int, int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- isCJKChar(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the specified char is CJK, Thai...
- isCNNumeric(char) - Static method in class org.lionsoul.jcseg.util.NumericUtil
-
check if the given char is a Chinese numeric or not
- isCnPunctuation(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- isDate(String, char) - Static method in class org.lionsoul.jcseg.util.EntityFormat
-
check if the specified string is a valid Latin date string
like "2017/02/22", "2017-02-22" or "2017.02.22"
- isDecimal(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the specified string is a decimal, including full-width chars
- isDecimal(String, int, int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- isDigit(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the specified string is a digit string
returns true if it is, false otherwise; this method also recognizes full-width chars
- isDigit(String, int, int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- ISegment - Interface in org.lionsoul.jcseg.tokenizer.core
-
Jcseg segment interface
- isEnChar(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the specified char is a basic Latin, Russian, or
Greek letter.
- isENKeepPunctuaton(char) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the given char is an English keep punctuation
- isEnLetter(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
include the full-width and half-width char
- isEnNumeric(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the specified char is an English numeric (48-57),
including the full-width chars
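
A sketch of how a digit check can cover both widths. The half-width range 48-57 comes from the entry above; the full-width digits occupy U+FF10 to U+FF19 in Unicode, and the full-width ASCII variants U+FF01 to U+FF5E sit exactly 0xFEE0 above their half-width counterparts. The class and method names below are hypothetical and do not reproduce `StringUtil`.

```java
final class WidthSketch {
    // Half-width ASCII digits '0'..'9' (code points 48-57).
    static boolean isHalfWidthDigit(int c) {
        return c >= 48 && c <= 57;
    }

    // Full-width digits U+FF10..U+FF19.
    static boolean isFullWidthDigit(int c) {
        return c >= 0xFF10 && c <= 0xFF19;
    }

    // True for a digit in either width.
    static boolean isEnNumeric(int c) {
        return isHalfWidthDigit(c) || isFullWidthDigit(c);
    }

    // Map a full-width ASCII variant (U+FF01..U+FF5E) to its half-width form;
    // every other code point is returned unchanged.
    static int toHalfWidth(int c) {
        return (c >= 0xFF01 && c <= 0xFF5E) ? c - 0xFEE0 : c;
    }
}
```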
- isEnPunctuation(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the given char is half-width punctuation
- isFWEnChar(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the given char is a full-width char
Note: full-width punctuation is not included here
- isHWEnChar(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the given char is a half-width char or not
- isIpAddress(String) - Static method in class org.lionsoul.jcseg.util.EntityFormat
-
check if the specified string is an IPv4/v6 address
v6 is not supported for now
- isKeepPunctuation(char) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- isLatin(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check if the specified string is all Latin chars
- isLatin(String, int, int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- isLetter(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check if the specified string is all Latin letters
- isLetter(String, int, int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- isLetterNumber(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the specified char is a letter number like 'ⅠⅡ'
returns true if it is, false otherwise
- isLetterOrNumeric(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check if the specified string is Latin numeric or letter
- isLetterOrNumeric(String, int, int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- isLowerCaseLetter(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- isMailAddress(String) - Static method in class org.lionsoul.jcseg.util.EntityFormat
-
check if the specified string is an email address or not
- isMobileNumber(String) - Static method in class org.lionsoul.jcseg.util.EntityFormat
-
check if the specified string is a mobile number
- isNoTailingPunctuation(char) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check if the given punctuation is one that needs to be cleared
- isNumeric(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check if the specified string is Latin numeric
- isNumeric(String, int, int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- isOtherNumber(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the specified char is an other number like '①⑩⑽㈩'
returns true if it is, false otherwise
- isPairPunctuation(char) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check the given char is pair punctuation or not
- isSync() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
- isTime(String) - Static method in class org.lionsoul.jcseg.util.EntityFormat
-
check if the specified string is a valid time string
like '12:45', '12:45:12'
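
One way to validate strings of the shape shown above ('12:45', '12:45:12') is a regular expression. This is only a sketch: the accepted ranges (hours 0-23, minutes and seconds 00-59, optional seconds) are assumptions, and `EntityFormat.isTime` may apply different rules. The class name `TimeFormatSketch` is hypothetical.

```java
import java.util.regex.Pattern;

final class TimeFormatSketch {
    // HH:mm with an optional :ss part; one-digit hours such as "9:05" are accepted.
    private static final Pattern TIME =
            Pattern.compile("([01]?\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?");

    static boolean isTime(String s) {
        // matches() requires the whole string to fit the pattern.
        return s != null && TIME.matcher(s).matches();
    }
}
```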
- IStringBuffer - Class in org.lionsoul.jcseg.util
-
string buffer class
- IStringBuffer() - Constructor for class org.lionsoul.jcseg.util.IStringBuffer
-
create a buffer with a default length 16
- IStringBuffer(int) - Constructor for class org.lionsoul.jcseg.util.IStringBuffer
-
create a buffer with a specified length
- IStringBuffer(String) - Constructor for class org.lionsoul.jcseg.util.IStringBuffer
-
create a buffer with a specified string
- isUpperCaseLetter(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- isUrlAddress(String, ADictionary) - Static method in class org.lionsoul.jcseg.util.EntityFormat
-
check if the specified string is a URL address or not
- isWhitespace(int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
check whether the given char is whitespace
- IWord - Interface in org.lionsoul.jcseg.tokenizer.core
-
Word interface
- ladCJKPos() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- last() - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
always return the last char
- latinIndexOf(String, int) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
get the index of the first Latin char of the specified string
- latinIndexOf(String) - Static method in class org.lionsoul.jcseg.util.StringUtil
-
- length() - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
return the length of the buffer
- LEX_PROPERTY_FILE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
default lexicon property file name
- LexiconException - Exception in org.lionsoul.jcseg.tokenizer.core
-
JCSeg Dictionary configuration exception class
- LexiconException(String) - Constructor for exception org.lionsoul.jcseg.tokenizer.core.LexiconException
-
- LexiconException(Throwable) - Constructor for exception org.lionsoul.jcseg.tokenizer.core.LexiconException
-
- LexiconException(String, Throwable) - Constructor for exception org.lionsoul.jcseg.tokenizer.core.LexiconException
-
- load(File) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words from a specified lexicon file
- load(String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words from a specified lexicon path
- load(InputStream) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words from a specified lexicon input stream
- load(String) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
initialize the value of its options from a specified
jcseg.properties properties file
- load(InputStream) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
initialize the value of its options from an InputStream
of a jcseg.properties properties file
- LOAD_CJK_ENTITY - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
whether to load the entity definitions
- LOAD_CJK_PINYIN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
whether to load the Pinyin of the CJK_WORDS
- LOAD_CJK_POS - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
whether to load the word's part of speech
- LOAD_CJK_SYN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
whether to load the syn word of the CJK_WORDS.
- loadCJKEntity() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- loadCJKPinyin() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- loadCJKSyn() - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- loadClassPath() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words from all the files under the specified class path.
- loadDirectory(String) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words from all the files under a specified lexicon directory
- loadWords(JcsegTaskConfig, ADictionary, File) - Static method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words in the specified lexicon file into the dictionary
- loadWords(JcsegTaskConfig, ADictionary, String) - Static method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load all the words from a specified lexicon file path
- loadWords(JcsegTaskConfig, ADictionary, InputStream) - Static method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
load words from an InputStream
- NAME_POSPEECH - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
- NAME_SINGLE_THRESHOLD - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
the threshold of the single word that is a single word
when it and the last char of the name make up a word.
- next() - Method in class org.lionsoul.jcseg.sentence.SentenceSeg
-
get the next sentence
- next() - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
- next() - Method in interface org.lionsoul.jcseg.tokenizer.core.ISegment
-
segment a word from a char array
from a specified position.
- next() - Method in class org.lionsoul.jcseg.tokenizer.DelimiterSeg
-
- next() - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
-
- next() - Method in class org.lionsoul.jcseg.tokenizer.NLPSeg
-
Override the next method to add the date-time entity recognition
And we also invoke the parent.next method to get the next token
- next - Variable in class org.lionsoul.jcseg.util.IHashQueue.Entry
-
- next - Variable in class org.lionsoul.jcseg.util.IIntFIFO.Entry
-
- next - Variable in class org.lionsoul.jcseg.util.IIntQueue.Entry
-
- nextCJKSentence(int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
load a CJK char list from the stream, starting from the
current position and continuing until the char is not a CJK char
- nextCNNumeric(char[], int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
find the Chinese number from the current position
count until the char in the specified position is not an other number or whitespace
- nextLatinString(int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
the simple version of the next basic Latin fetch logic
Just return the next Latin string with the keep punctuation after it
- nextLatinWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
find the letter or digit word from the current position
count until the char is whitespace or not letter_digit
- nextLatinWord(int, int) - Method in class org.lionsoul.jcseg.tokenizer.NLPSeg
-
find the letter or digit word from the current position
count until the char is whitespace or not letter_digit
- nextLetterNumber(int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
find the next other letter from the current position
find the letter number from the current position
count until the char in the specified position is not a letter number or whitespace
- nextOtherNumber(int) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
find the other number from the current position
count until the char in the specified position is not an other number or whitespace
- NLP_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- NLPSeg - Class in org.lionsoul.jcseg.tokenizer
-
NLP segmentation implementation
And this extends all the properties of the Complex one
the rest of them are built for NLP only
- NLPSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.NLPSeg
-
- NLPSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.NLPSeg
-
- NUMERIC_POSPEECH - Static variable in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
- NumericUtil - Class in org.lionsoul.jcseg.util
-
a class to deal with Chinese numeric
- NumericUtil() - Constructor for class org.lionsoul.jcseg.util.NumericUtil
-
- SEARCH_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- SearchSeg - Class in org.lionsoul.jcseg.tokenizer
-
search mode implementation: all possible combinations are returned,
since it is built for search.
- SearchSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.SearchSeg
-
- SearchSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.SearchSeg
-
- seg - Variable in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
-
the ISegment object
- seg - Variable in class org.lionsoul.jcseg.extractor.KeywordsExtractor
-
the ISegment object
- SegmentFactory - Class in org.lionsoul.jcseg.tokenizer.core
-
Segment factory to create singleton ISegment object
a path of the class that has implemented the ISegment interface must be given first
- SegmentFactory() - Constructor for class org.lionsoul.jcseg.tokenizer.core.SegmentFactory
-
- Sentence - Class in org.lionsoul.jcseg.sentence
-
sentence description class
- Sentence(String, int) - Constructor for class org.lionsoul.jcseg.sentence.Sentence
-
constructor
- Sentence(String) - Constructor for class org.lionsoul.jcseg.sentence.Sentence
-
- sentence(String) - Method in class org.lionsoul.jcseg.test.JcsegTest
-
key sentence extractor
- sentenceNum - Variable in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- sentenceSeg - Variable in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
sentence splitter object
- SentenceSeg - Class in org.lionsoul.jcseg.sentence
-
document sentence splitter
- SentenceSeg(Reader) - Constructor for class org.lionsoul.jcseg.sentence.SentenceSeg
-
constructor
- SentenceSeg() - Constructor for class org.lionsoul.jcseg.sentence.SentenceSeg
-
- set(int, int) - Method in class org.lionsoul.jcseg.util.IntArrayList
-
- set(int, char) - Method in class org.lionsoul.jcseg.util.IStringBuffer
-
set the char at the specified index
- setAppendCJKPinyin(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setAppendCJKSyn(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setAppendPartOfSpeech(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setAutoFilter(boolean) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- setAutoload(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setAutoMinLength(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- setClearStopwords(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setCnFactionToArabic(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setCnNumToArabic(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setConfig(JcsegTaskConfig) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
set the current task configuration instance.
- setConfig(JcsegTaskConfig) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
- setConfig(JcsegTaskConfig) - Method in class org.lionsoul.jcseg.tokenizer.DelimiterSeg
-
set the current configuration
- setConfig(JcsegTaskConfig) - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
-
set the current task config
- setDelimiter(char) - Method in class org.lionsoul.jcseg.tokenizer.DelimiterSeg
-
set the delimiter; defaults to whitespace
- setDic(ADictionary) - Method in class org.lionsoul.jcseg.tokenizer.DelimiterSeg
-
set the current dictionary
- setDict(ADictionary) - Method in class org.lionsoul.jcseg.tokenizer.ASegment
-
set the current dictionary
- setDict(ADictionary) - Method in class org.lionsoul.jcseg.tokenizer.DetectSeg
-
set the current dictionary instance
- setEnSecondSeg(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setEntity(String) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
set the entity name of the word
- setEntity(String) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- setFile(File) - Method in class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
-
- setICnName(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setIndex(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- setKeepPunctuations(String) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setKeepUnregWords(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setKeywordsNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- setKeywordsNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- setLastUpdateTime(long) - Method in class org.lionsoul.jcseg.tokenizer.core.AutoLoadFile
-
- setLength(int) - Method in class org.lionsoul.jcseg.sentence.Sentence
-
- setLength(int) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
set a self-defined length
- setLength(int) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- setLexiconPath(String[]) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setLoadCJKPinyin(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setLoadCJKPos(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setLoadCJKSyn(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setLoadEntity(boolean) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setMaxCnLnadron(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setMaxIterateNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- setMaxIterateNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- setMaxIterateNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- setMaxLength(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setMaxWordsNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- setNameSingleThreshold(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setPartSpeech(String[]) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
- setPartSpeech(String[]) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- setPinyin(String) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
set the pinyin of the word
- setPinyin(String) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- setPollTime(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setPosition(int) - Method in class org.lionsoul.jcseg.sentence.Sentence
-
- setPosition(int) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
set the position of the word
- setPosition(int) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- setPPT_MAX_LENGTH(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setScore(double) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- setSeg(ISegment) - Method in class org.lionsoul.jcseg.extractor.KeyphraseExtractor
-
- setSeg(ISegment) - Method in class org.lionsoul.jcseg.extractor.KeywordsExtractor
-
- setSentence(Sentence) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- setSentenceNum(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor
-
- setSentenceSeg(SentenceSeg) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
- setSTokenMinLen(int) - Method in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
- setSyn(String[]) - Method in interface org.lionsoul.jcseg.tokenizer.core.IWord
-
- setSyn(String[]) - Method in class org.lionsoul.jcseg.tokenizer.Word
-
- setValue(String) - Method in class org.lionsoul.jcseg.sentence.Sentence
-
- setWindowSize(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeyphraseExtractor
-
- setWindowSize(int) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankKeywordsExtractor
-
- setWords(List<IWord>) - Method in class org.lionsoul.jcseg.extractor.impl.TextRankSummaryExtractor.Document
-
- setWordSeg(ISegment) - Method in class org.lionsoul.jcseg.extractor.SummaryExtractor
-
- shellSort(T[]) - Static method in class org.lionsoul.jcseg.util.Sort
-
shell sort algorithm
- SIMPLE_MODE - Static variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
simple algorithm or complex algorithm
- SimpleSeg - Class in org.lionsoul.jcseg.tokenizer
-
Jcseg simple segmentation implementation, extended from ASegment
- SimpleSeg(JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.SimpleSeg
-
- SimpleSeg(Reader, JcsegTaskConfig, ADictionary) - Constructor for class org.lionsoul.jcseg.tokenizer.SimpleSeg
-
- SIMSTR - Static variable in class org.lionsoul.jcseg.util.STConverter
-
- SimToTraditional(String) - Static method in class org.lionsoul.jcseg.util.STConverter
-
convert the simplified words to traditional words
of the specified string.
- SimToTraditional(String, IStringBuffer) - Static method in class org.lionsoul.jcseg.util.STConverter
-
- size(int) - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
return the size of the dictionary
- size(int) - Method in class org.lionsoul.jcseg.tokenizer.Dictionary
-
- size() - Method in class org.lionsoul.jcseg.util.IHashQueue
-
get the size of the queue
- size() - Method in class org.lionsoul.jcseg.util.IIntFIFO
-
get the size of the queue
- size() - Method in class org.lionsoul.jcseg.util.IIntQueue
-
get the size of the queue
- size() - Method in class org.lionsoul.jcseg.util.IntArrayList
-
- Sort - Class in org.lionsoul.jcseg.util
-
Implementations of various sort algorithms; all use the default compare method
- Sort() - Constructor for class org.lionsoul.jcseg.util.Sort
-
- START_SS_MASK - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ISegment
-
- startAutoload() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
start the lexicon autoload thread
- STConverter - Class in org.lionsoul.jcseg.util
-
Simplified and traditional Chinese conversion class
all the search work is based on
String.indexOf(int)
you may store all the words in a HashMap for the purpose of a faster fetch
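
The `String.indexOf(int)` lookup described above can be sketched with two parallel strings, where the character at index i of one string maps to the character at index i of the other. The three-character tables and the class name `StConvertSketch` are illustrative assumptions; the real `STConverter` tables are far larger and its internals are not shown in this index.

```java
final class StConvertSketch {
    // Tiny illustrative tables (unicode escapes keep the source ASCII-only).
    // Simplified: guo (U+56FD), xue (U+5B66), shu (U+4E66)
    private static final String SIM = "\u56FD\u5B66\u4E66";
    // Traditional counterparts at the same indices: U+570B, U+5B78, U+66F8
    private static final String TRA = "\u570B\u5B78\u66F8";

    static String simToTraditional(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int idx = SIM.indexOf(c);  // linear scan over the table
            // Chars missing from the table pass through unchanged.
            sb.append(idx >= 0 ? TRA.charAt(idx) : c);
        }
        return sb.toString();
    }
}
```

Each lookup scans the table linearly, which is why a HashMap keyed by character is the suggested alternative when faster lookups matter.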
- STConverter() - Constructor for class org.lionsoul.jcseg.util.STConverter
-
- STOKEN_MIN_LEN - Variable in class org.lionsoul.jcseg.tokenizer.core.JcsegTaskConfig
-
Minimum length for the second split to make up a word
- STOP_WORD - Static variable in interface org.lionsoul.jcseg.tokenizer.core.ILexicon
-
stop words
- stopAutoload() - Method in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-
- StringUtil - Class in org.lionsoul.jcseg.util
-
a class to deal with the English stop char like the English punctuation
- StringUtil() - Constructor for class org.lionsoul.jcseg.util.StringUtil
-
- summary(String) - Method in class org.lionsoul.jcseg.test.JcsegTest
-
summary extractor
- SummaryExtractor - Class in org.lionsoul.jcseg.extractor
-
document summary extractor
- SummaryExtractor(ISegment, SentenceSeg) - Constructor for class org.lionsoul.jcseg.extractor.SummaryExtractor
-
constructor
- sync - Variable in class org.lionsoul.jcseg.tokenizer.core.ADictionary
-