I am generating bigrams from tokens stored in the list docToken.
print(docToken[520])
Output: ['sleepy', 'account', 'just', 'man', 'tired', 'twitter', 'case', 'romney', 'candidate', 'looks']
list(nltk.bigrams(docToken[520]))
Output: [('sleepy', 'account'), ('account', 'just'), ('just', 'man'), ('man', 'tired'), ('tired', 'twitter'), ('twitter', 'case'), ('case', 'romney'), ('romney', 'candidate'), ('candidate', 'looks')]
However, when I call nltk.bigrams(docToken[i]) inside a loop, I get an error once the range is 1000 or larger:
import nltk

bigram = []
for i in range(5000):
    ls = list(nltk.bigrams(docToken[i]))
    for j in ls:
        bigram.append(list(j))
With range(500) in the outer loop it works fine, but when the range is 1000 or more it gives me the following error:
StopIteration Traceback (most recent call last)
~\Anaconda3\lib\site-packages\nltk\util.py in ngrams(sequence, n, pad_left, pad_right, left_pad_symbol, right_pad_symbol)
467 while n > 1:
--> 468 history.append(next(sequence))
469 n -= 1
StopIteration:
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-76-8982951528bd> in <module>()
1 bigram=[]
2 for i in range(5000):
----> 3 ls=list(nltk.bigrams(docToken[i]))
4 for j in ls:
5 bigram.append(list(j))
~\Anaconda3\lib\site-packages\nltk\util.py in bigrams(sequence, **kwargs)
489 """
490
--> 491 for item in ngrams(sequence, 2, **kwargs):
492 yield item
493
RuntimeError: generator raised StopIteration
Answer 0 (score: 1)
I solved this problem by upgrading nltk from 3.3 to 3.4.
Simply run: pip install nltk==3.4
Hope it works!
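Answer 1 (score: 1)
I ran into the same error. One possible cause is that one of the elements of docToken is an empty list.
For example, the following code raises the same error at i=1, because the second element is an empty list.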
import nltk
from nltk import bigrams

docToken = [['the', 'wildlings', 'are', 'dead'], [], ['do', 'the', 'dead', 'frighten', 'you', 'ser', 'waymar']]
for i in range(3):
    print(i)
    print(list(nltk.bigrams(docToken[i])))
Output:
0
[('the', 'wildlings'), ('wildlings', 'are'), ('are', 'dead')]
1
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\util.py in ngrams(sequence, n, pad_left, pad_right, left_pad_symbol, right_pad_symbol)
467 while n > 1:
--> 468 history.append(next(sequence))
469 n -= 1
StopIteration:
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-58-91f35cae32ed> in <module>
2 for i in range(3):
3 print (i)
----> 4 list(nltk.bigrams(docToken[i]))
~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\util.py in bigrams(sequence, **kwargs)
489 """
490
--> 491 for item in ngrams(sequence, 2, **kwargs):
492 yield item
493
RuntimeError: generator raised StopIteration
You can filter the empty lists out of docToken and then create the bigrams:
docToken = [['the', 'wildlings', 'are', 'dead'], [], ['do', 'the', 'dead', 'frighten', 'you', 'ser', 'waymar']]
# Drop the empty token lists, which are what trigger the error
docToken = [x for x in docToken if x]
bigram = []
for i in range(len(docToken)):
    bigram.append(["_".join(w) for w in bigrams(docToken[i])])
bigram
Output:
[['the_wildlings', 'wildlings_are', 'are_dead'],
['do_the',
'the_dead',
'dead_frighten',
'frighten_you',
'you_ser',
'ser_waymar']]
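Filtering shifts the index of every document after a removed one, so it may be worth checking first which positions are empty. A minimal sketch in plain Python, using the sample data from this answer; the name empty_indices is illustrative and not part of nltk:

```python
# Sample data from this answer; the second document has no tokens.
docToken = [['the', 'wildlings', 'are', 'dead'], [],
            ['do', 'the', 'dead', 'frighten', 'you', 'ser', 'waymar']]

# Collect the positions of documents with no tokens (illustrative name).
empty_indices = [i for i, tokens in enumerate(docToken) if not tokens]
print(empty_indices)
```

If the list comes back empty, the error probably has a different cause, such as the version mismatch discussed in this thread.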
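Another possible cause is that you are using nltk 3.3 with Python 3.7.
Please use nltk 3.4, which is the first release to support Python 3.7; your problem should be resolved in that version.
See here.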
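The RuntimeError in the traceback is consistent with PEP 479, which is enabled by default from Python 3.7: a StopIteration that escapes a generator body is converted into a RuntimeError. The sketch below reproduces that behaviour without nltk. The generator first_then_rest is a made-up example that imitates the pattern of calling next() before looping; it is not nltk's actual code.

```python
def first_then_rest(sequence):
    """Hypothetical generator: read one item with next(), then yield pairs."""
    sequence = iter(sequence)
    previous = next(sequence)  # raises StopIteration when sequence is empty
    for item in sequence:
        yield (previous, item)
        previous = item

# A non-empty input works as expected.
print(list(first_then_rest(['a', 'b', 'c'])))

# An empty input lets StopIteration escape the generator body;
# on Python 3.7+ it surfaces as RuntimeError instead.
try:
    list(first_then_rest([]))
except RuntimeError as err:
    message = str(err)
    print("RuntimeError:", message)
```

On older Python versions the escaping StopIteration would have silently ended the generator, which would explain why the same library code only started failing after a Python upgrade.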
Answer 2 (score: 0)
I could not resolve this error. I don't know why nltk.bigrams(docToken[i]) raises it, but I was able to create the pairs with the following code.
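bigram = {}
# size is the number of documents to process (not defined in this answer)
for i in range(size):
    ls = []
    for j in range(len(docToken[i]) - 1):
        # Note: k runs over every later position, so each token is paired
        # with all tokens after it, not only with its immediate neighbour.
        for k in range(j, len(docToken[i]) - 1):
            ls.append([docToken[i][j], docToken[i][k + 1]])
    bigram[i] = ls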
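If only adjacent pairs are wanted, they can also be built without nltk by zipping a token list with a copy of itself shifted by one. This is a sketch of an alternative, not the code from this answer; adjacent_pairs is an illustrative name. An empty or single-token list simply produces no pairs, so no filtering is needed.

```python
def adjacent_pairs(tokens):
    """Return consecutive token pairs as lists; empty input gives []."""
    return [list(pair) for pair in zip(tokens, tokens[1:])]

# Sample data reused from the earlier answer, including an empty document.
docToken = [['the', 'wildlings', 'are', 'dead'], [],
            ['do', 'the', 'dead', 'frighten', 'you', 'ser', 'waymar']]

# Keyed by document index, mirroring the dict used in this answer.
bigram = {i: adjacent_pairs(tokens) for i, tokens in enumerate(docToken)}
print(bigram[0])
print(bigram[1])
```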
Answer 3 (score: 0)
First, uninstall the current version of NLTK:
pip uninstall nltk==3.2.5
Then install a newer version of NLTK:
pip install nltk==3.6.2
Then check the NLTK version; it should be 3.6.2:
import nltk
print('The nltk version is {}.'.format(nltk.__version__))
This should resolve the problem.
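To check whether an installed combination matches the one reported in this thread, the versions can be compared as tuples. This is a sketch that relies on the claim made in the answers above that nltk 3.4 is the first release supporting Python 3.7. Both parse_version and is_affected are hypothetical helpers, and the parsing only handles plain dotted numeric versions such as 3.6.2.

```python
import sys

def parse_version(text):
    """Turn '3.6.2' into (3, 6, 2); assumes dotted integers only."""
    return tuple(int(part) for part in text.split("."))

def is_affected(nltk_version, python_version=sys.version_info[:2]):
    """True for nltk older than 3.4 on Python 3.7 or newer (assumption from this thread)."""
    return parse_version(nltk_version) < (3, 4) and tuple(python_version) >= (3, 7)

print(is_affected("3.3", (3, 7)))
print(is_affected("3.6.2", (3, 7)))
```

In practice the first argument would be nltk.__version__, with the second left at its default so the running interpreter is checked.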