N-grams with gensim

Time: 2018-12-26 09:34:30

Tags: python-3.x nlp gensim

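I am trying to generate bigrams with gensim, but gensim works with the concept of collocations, which is mainly based on how often certain words co-occur as a phrase.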

I am just looking for plain bigrams, i.e. every pair of adjacent words, as in the following example.

"I", "read", "a", "book", "about", "the", "history", "of", "America"
"I read", "read a", "a book", "book about", "about the", "the history", "history of", "of America"

Reference code that can be used:

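from gensim.test.utils import datapath
from gensim.models.word2vec import Text8Corpus
from gensim.models.phrases import Phrases, Phraser

# Load the small test corpus that ships with gensim
sentences = Text8Corpus(datapath('testcorpus.txt'))
# Train the collocation model; low min_count and threshold make merging permissive
phrases = Phrases(sentences, min_count=1, threshold=1)
# Apply the model to a tokenized sentence; only detected collocations are joined
print(phrases[['trees', 'graph', 'minors']])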
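
For comparison, the plain bigrams described in the question are just every pair of adjacent tokens, so they can be produced without any collocation scoring. The following is a minimal sketch in plain Python, not a gensim feature; the helper name `adjacent_bigrams` is hypothetical and chosen only for illustration.

```python
def adjacent_bigrams(tokens):
    """Return every pair of neighbouring tokens, joined by a space.

    Hypothetical helper for illustration; it is not part of gensim.
    """
    # zip pairs each token with its successor and stops at the shorter input,
    # so a list of n tokens yields n - 1 bigrams
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


tokens = ["I", "read", "a", "book", "about", "the", "history", "of", "America"]
bigrams = adjacent_bigrams(tokens)
print(bigrams)
```

Unlike `Phrases`, this keeps every adjacent pair regardless of how often it occurs, which appears to be what the question asks for.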

0 answers:

No answers