| Package | Description |
|---|---|
| org.lionsoul.jcseg.extractor | |
| org.lionsoul.jcseg.extractor.impl | |
| org.lionsoul.jcseg.tokenizer | |
| org.lionsoul.jcseg.tokenizer.core | |
| org.lionsoul.jcseg.util | |

| Modifier and Type | Method and Description |
|---|---|
| `protected boolean` | `KeywordsExtractor.filter(IWord word)`<br>Word item filter. |
| `protected boolean` | `KeyphraseExtractor.filter(IWord word)`<br>Word item filter. |

| Modifier and Type | Method and Description |
|---|---|
| `List<IWord>` | `TextRankSummaryExtractor.Document.getWords()` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `TextRankSummaryExtractor.Document.setWords(List<IWord> words)` |
| `protected TextRankSummaryExtractor.Document[]` | `TextRankSummaryExtractor.textRankSortedDocuments(List<Sentence> sentence, List<List<IWord>> senWords)`<br>Get the documents ordered by relevance score. |

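
The ordering step behind `textRankSortedDocuments` can be illustrated with a minimal, self-contained sketch. It is not the Jcseg implementation: `ScoredSentenceSketch` is a hypothetical stand-in for `TextRankSummaryExtractor.Document`, the scores are supplied by hand rather than computed by TextRank, and highest-score-first ordering is an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for a scored sentence; NOT the Jcseg Document class.
final class ScoredSentenceSketch {
    final int index;       // position of the sentence in the source text
    final String sentence; // sentence text
    final double score;    // relevance score (how it is computed is out of scope)

    ScoredSentenceSketch(int index, String sentence, double score) {
        this.index = index;
        this.sentence = sentence;
        this.score = score;
    }

    // Returns a sorted copy, highest score first (assumed ordering).
    // The input array is left untouched.
    static ScoredSentenceSketch[] sortByScore(ScoredSentenceSketch[] docs) {
        ScoredSentenceSketch[] sorted = docs.clone();
        Arrays.sort(sorted,
                Comparator.comparingDouble((ScoredSentenceSketch d) -> d.score).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        ScoredSentenceSketch[] docs = {
            new ScoredSentenceSketch(0, "first sentence", 0.2),
            new ScoredSentenceSketch(1, "second sentence", 0.9),
            new ScoredSentenceSketch(2, "third sentence", 0.5),
        };
        for (ScoredSentenceSketch d : sortByScore(docs)) {
            System.out.println(d.index + " " + d.score);
        }
    }
}
```

Keeping the original `index` on each item lets a caller restore document order after picking the top-scoring sentences.
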
| Constructor and Description |
|---|
| `Document(int index, Sentence sentence, List<IWord> words, double score)`<br>Constructor. |

| Modifier and Type | Class and Description |
|---|---|
| `class` | `Word`<br>Word class for Jcseg that implements the `org.lionsoul.jcseg.core.IWord` interface.<br>As of 2017/03/29, the synonym methods `Word.getSyn()`, `Word.setSyn(String[])`, and `Word.addSyn(String)`, the part-of-speech methods `Word.getPartSpeech()`, `Word.setPartSpeech(String[])`, and `Word.addPartSpeech(String)`, and the `Word.clone()` method are synchronized to allow for possible concurrent access. |

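
The effect of synchronizing the synonym accessors can be sketched as follows. This is an illustration only: `SynchronizedSynonymsSketch` is a hypothetical class, not the Jcseg `Word`, and returning a defensive copy from `getSyn()` is a choice made for this sketch rather than documented Jcseg behavior.

```java
import java.util.Arrays;

// Hypothetical holder mimicking synchronized synonym accessors; NOT Jcseg's Word.
final class SynchronizedSynonymsSketch {
    private String[] syn; // null until the first synonym is added

    // synchronized: the read-grow-write sequence must not interleave across threads.
    synchronized void addSyn(String s) {
        if (syn == null) {
            syn = new String[] { s };
        } else {
            syn = Arrays.copyOf(syn, syn.length + 1);
            syn[syn.length - 1] = s;
        }
    }

    // Returns a copy so callers cannot mutate the shared array (sketch-only choice).
    synchronized String[] getSyn() {
        return syn == null ? null : syn.clone();
    }

    // Starts several writer threads and returns the final synonym count.
    static int concurrentAddCount(int threads, int perThread) {
        final SynchronizedSynonymsSketch word = new SynchronizedSynonymsSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    word.addSyn("s" + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        String[] all = word.getSyn();
        return all == null ? 0 : all.length;
    }

    public static void main(String[] args) {
        System.out.println(concurrentAddCount(4, 500));
    }
}
```

Without `synchronized` on `addSyn`, two threads could copy the same array and one of the appended entries would be lost.
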
| Modifier and Type | Field and Description |
|---|---|
| `protected LinkedList<IWord>` | `ASegment.wordPool`<br>CJK word cache pool, reusable string buffer, and the array list for basic integers. |
| `protected LinkedList<IWord>` | `DelimiterSeg.wordPool` |

| Modifier and Type | Method and Description |
|---|---|
| `IWord` | `Dictionary.add(int t, IWord word)` |
| `IWord` | `Dictionary.add(int t, String key, int type)` |
| `IWord` | `Dictionary.add(int t, String key, int fre, int type)` |
| `IWord` | `Dictionary.add(int t, String key, int fre, int type, String entity)` |
| `IWord` | `Dictionary.add(int t, String key, int type, String entity)` |
| `IWord` | `Word.clone()`<br>Interface to clone the current object. |
| `protected IWord` | `ASegment.enSecondSeg(IWord w, boolean retfw)`<br>Do the secondary split for the specified complex Latin word. This splits a complex word composed of English letters, Arabic numerals, and punctuation into multiple simple parts; for example, 'qq2013' is split into 'qq' and '2013'. |
| `IWord` | `Dictionary.get(int t, String key)` |
| `protected IWord` | `ASegment.getNextCJKWord(int c, int pos)`<br>Get the next CJK word from the current position of the input stream. |
| `protected IWord` | `NLPSeg.getNextCJKWord(int c, int pos)` |
| `protected IWord` | `SearchSeg.getNextCJKWord(int c, int pos)`<br>Get the next CJK word from the current position of the input stream. This function is the core part of most segmentation implementations. |
| `protected IWord` | `NLPSeg.getNextDatetimeWord(IWord word)`<br>Get and return the next date-time word. |
| `protected IWord` | `ASegment.getNextLatinWord(int c, int pos)`<br>Get the next Latin word from the current position of the input stream. |
| `protected IWord[]` | `ASegment.getNextMatch(char[] chars, int index)`<br>Match the next CJK word in the dictionary. |
| `protected IWord` | `ASegment.getNextMixedWord(char[] chars, int cjkidx)`<br>Get the next mixed word: CJK-English, CJK-English-CJK, or similar. |
| `protected IWord` | `ASegment.getNextPunctuationPairWord(int c, int pos)`<br>Get the next punctuation pair word from the current position of the input stream. |
| `protected IWord` | `NLPSeg.getNextTimeMergedWord(IWord word)`<br>Get and return the next time-merged date-time word. |
| `IWord[]` | `Chunk.getWords()` |
| `IWord` | `DetectSeg.next()` |
| `IWord` | `ASegment.next()` |
| `IWord` | `NLPSeg.next()`<br>Overrides the next method to add date-time entity recognition. It also invokes the parent's next method to get the next token. |
| `IWord` | `DelimiterSeg.next()` |
| `protected IWord` | `ASegment.nextLatinWord(int c, int pos)`<br>Find the letter or digit word from the current position, counting until the char is whitespace or is not a letter or digit. |
| `protected IWord` | `NLPSeg.nextLatinWord(int c, int pos)`<br>Find the letter or digit word from the current position, counting until the char is whitespace or is not a letter or digit. |

| Modifier and Type | Method and Description |
|---|---|
| `IWord` | `Dictionary.add(int t, IWord word)` |
| `protected void` | `ASegment.appendLatinSyn(IWord w)`<br>Check and append the synonyms of the specified word, including CJK and basic Latin words. All synonyms share the same position, part of speech, and word type as the original word. |
| `protected void` | `ASegment.appendWordFeatures(IWord word)`<br>Check and append the pinyin and the synonyms of the specified word. |
| `protected IWord` | `ASegment.enSecondSeg(IWord w, boolean retfw)`<br>Do the secondary split for the specified complex Latin word. This splits a complex word composed of English letters, Arabic numerals, and punctuation into multiple simple parts; for example, 'qq2013' is split into 'qq' and '2013'. |
| `boolean` | `ASegment.findCHName(IWord w, IChunk chunk)`<br>Deprecated. |
| `protected IWord` | `NLPSeg.getNextDatetimeWord(IWord word)`<br>Get and return the next date-time word. |
| `protected IWord` | `NLPSeg.getNextTimeMergedWord(IWord word)`<br>Get and return the next time-merged date-time word. |

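
The secondary split described for `ASegment.enSecondSeg` ('qq2013' becomes 'qq' and '2013') can be approximated with a short standalone sketch. It is not the Jcseg code: `LatinSecondSplitSketch` and its methods are hypothetical, it works on plain strings rather than `IWord` objects, and grouping consecutive characters of the same kind (letter, digit, other) is an assumption about what "simple parts" means.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating a letter/digit/other split; NOT Jcseg's enSecondSeg.
final class LatinSecondSplitSketch {
    private static final int DIGIT = 0;
    private static final int LETTER = 1;
    private static final int OTHER = 2; // punctuation and anything else

    // Splits a word into runs of characters that share the same kind.
    static List<String> split(String word) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentKind = -1;
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            int kind = kindOf(ch);
            // A change of kind closes the current run.
            if (current.length() > 0 && kind != currentKind) {
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(ch);
            currentKind = kind;
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    // Classifies ASCII digits and letters; everything else counts as OTHER.
    private static int kindOf(char ch) {
        if (ch >= '0' && ch <= '9') return DIGIT;
        if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) return LETTER;
        return OTHER;
    }

    public static void main(String[] args) {
        System.out.println(split("qq2013")); // prints [qq, 2013]
    }
}
```
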
| Constructor and Description |
|---|
| `Chunk(IWord[] words)` |

| Modifier and Type | Method and Description |
|---|---|
| `abstract IWord` | `ADictionary.add(int t, IWord word)`<br>Directly add an IWord item to the dictionary. |
| `abstract IWord` | `ADictionary.add(int t, String key, int type)`<br>Add a new word to the dictionary. |
| `abstract IWord` | `ADictionary.add(int t, String key, int fre, int type)`<br>Add a new word to the dictionary. |
| `abstract IWord` | `ADictionary.add(int t, String key, int fre, int type, String entity)`<br>Add a new word to the dictionary with its statistical frequency. |
| `abstract IWord` | `ADictionary.add(int t, String key, int type, String entity)`<br>Add a new word to the dictionary. |
| `IWord` | `IWord.clone()`<br>Make clone available. |
| `abstract IWord` | `ADictionary.get(int t, String key)`<br>Return the IWord associated with the given key. |
| `IWord[]` | `IChunk.getWords()`<br>Get all the words in the chunk. |
| `IWord` | `ISegment.next()`<br>Segment a word from a char array, starting from a specified position. |

| Modifier and Type | Method and Description |
|---|---|
| `abstract IWord` | `ADictionary.add(int t, IWord word)`<br>Directly add an IWord item to the dictionary. |

| Modifier and Type | Class and Description |
|---|---|
| `class` | `IHashQueue<T extends IWord>`<br>A normal queue based on a single link, but with a hash index, so it is fast for searching. |

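
The idea behind `IHashQueue`, a singly linked queue paired with a hash index for fast lookups, can be sketched as below. This is not the Jcseg class: `HashIndexedQueueSketch` and its method names are hypothetical, elements are indexed by their own `equals`/`hashCode` (Jcseg may key on something else), and null elements are not supported.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical singly linked FIFO queue with a hash index; NOT Jcseg's IHashQueue.
final class HashIndexedQueueSketch<T> {
    private static final class Node<E> {
        final E value;
        Node<E> next;
        Node(E value) { this.value = value; }
    }

    private Node<T> head;
    private Node<T> tail;
    private int size;
    // Maps each queued value to the number of times it is currently queued.
    private final Map<T, Integer> index = new HashMap<>();

    // Appends to the tail and records the value in the index.
    void add(T value) {
        Node<T> node = new Node<>(value);
        if (tail == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
        index.merge(value, 1, Integer::sum);
    }

    // Removes and returns the head, or null when the queue is empty.
    T remove() {
        if (head == null) {
            return null;
        }
        T value = head.value;
        head = head.next;
        if (head == null) {
            tail = null;
        }
        size--;
        // Drop the index entry once the last occurrence leaves the queue.
        index.computeIfPresent(value, (k, n) -> n == 1 ? null : n - 1);
        return value;
    }

    // Constant-time membership test on average, instead of walking the links.
    boolean contains(T value) {
        return index.containsKey(value);
    }

    int size() {
        return size;
    }
}
```

The occurrence count in the index keeps `contains` correct when the same value is queued more than once.
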
| Modifier and Type | Method and Description |
|---|---|
| `static IWord[]` | `TimeUtil.createDateTimePool()`<br>Create and return a date-time pool. |

| Modifier and Type | Method and Description |
|---|---|
| `static void` | `TimeUtil.fillDateTimePool(IWord[] wPool, int pIdx, IWord word)`<br>Fill the specified part of the date-time pool, identified by a part index constant. |
| `static int` | `TimeUtil.fillDateTimePool(IWord[] wPool, IWord word)`<br>Fill the specified part of the date-time pool according to the given time entity string. |
| `static void` | `TimeUtil.fillTimeToPool(IWord[] wPool, String timeVal)`<br>Fill the time part of a date-time, given in a standard time format such as '15:45:36', into the specified time pool. |
| `static String` | `TimeUtil.getTimeKey(IWord word)` |

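
How a time string such as '15:45:36' might be spread across pool slots, as `TimeUtil.fillTimeToPool` is described as doing, is sketched below. Everything here is an assumption for illustration: `TimePoolSketch` is hypothetical, the pool holds plain strings instead of `IWord` items, and the slot constants `HOUR`, `MINUTE`, and `SECOND` are invented rather than taken from Jcseg's part index constants.

```java
// Hypothetical illustration of filling time parts into a pool; NOT Jcseg's TimeUtil.
final class TimePoolSketch {
    // Invented slot indexes; Jcseg defines its own part index constants.
    static final int HOUR = 0;
    static final int MINUTE = 1;
    static final int SECOND = 2;

    // Creates an empty pool with one slot per time part.
    static String[] createPool() {
        return new String[3];
    }

    // Parses "HH:mm" or "HH:mm:ss" and stores each part in its slot.
    static void fillTimeToPool(String[] pool, String timeVal) {
        String[] parts = timeVal.trim().split(":");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException(
                    "expected HH:mm or HH:mm:ss, got: " + timeVal);
        }
        pool[HOUR] = parts[0];
        pool[MINUTE] = parts[1];
        // Seconds are optional; the slot stays null when they are absent.
        pool[SECOND] = parts.length == 3 ? parts[2] : null;
    }

    public static void main(String[] args) {
        String[] pool = createPool();
        fillTimeToPool(pool, "15:45:36");
        System.out.println(pool[HOUR] + "h " + pool[MINUTE] + "m " + pool[SECOND] + "s");
    }
}
```
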
Copyright © 2017. All Rights Reserved.