Adding and removing words from the NLTK stopwords list

Date: 2018-07-26 08:41:12

Tags: python python-3.x list set nltk

I'm trying to add words to and remove words from the NLTK stopwords list:

from nltk.corpus import stopwords

stop_words = set(stopwords.words('french'))

#add words that aren't in the NLTK stopwords list
new_stopwords = ['cette', 'les', 'cet']
new_stopwords_list = set(stop_words.extend(new_stopwords))

#remove words that are in NLTK stopwords list
not_stopwords = {'n', 'pas', 'ne'} 
final_stop_words = set([word for word in new_stopwords_list if word not in not_stopwords])

print(final_stop_words)
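The code above breaks at the extend call: Python set objects have no extend method (extend belongs to list). The minimal reproduction below uses a small hypothetical list in place of stopwords.words('french'), so it runs without downloading the NLTK corpus.

```python
# Hypothetical stand-in for stopwords.words('french'), so no NLTK data is needed.
stop_words = set(['le', 'la', 'ne', 'pas'])
new_stopwords = ['cette', 'les', 'cet']

error_message = None
try:
    # set has no .extend method, so this raises AttributeError.
    stop_words.extend(new_stopwords)
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The same call works on a list, which is why the answers either switch to a set method (union or update) or convert the set to a list first.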

3 answers:

Answer 0 (score: 3)

Try this, using union() instead of extend(), since sets have no extend method:

from nltk.corpus import stopwords

stop_words = set(stopwords.words('french'))

#add words that aren't in the NLTK stopwords list
new_stopwords = ['cette', 'les', 'cet']
new_stopwords_list = stop_words.union(new_stopwords)

#remove words that are in NLTK stopwords list
not_stopwords = {'n', 'pas', 'ne'} 
final_stop_words = set([word for word in new_stopwords_list if word not in not_stopwords])

print(final_stop_words)
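A sketch of the same idea using only set operations, assuming a small hypothetical subset of the French stopwords so it runs offline: union() returns a new set (and accepts a plain list), and the set difference operator does the removal in one step instead of the list comprehension.

```python
# Hypothetical stand-in for set(stopwords.words('french')).
stop_words = {'le', 'la', 'les', 'ne', 'pas', 'n'}

new_stopwords = ['cette', 'les', 'cet']
not_stopwords = {'n', 'pas', 'ne'}

# union() builds a new set and leaves stop_words itself unchanged.
new_stopwords_list = stop_words.union(new_stopwords)

# Set difference drops every word in not_stopwords; this is equivalent
# to filtering with a comprehension.
final_stop_words = new_stopwords_list - not_stopwords

print(sorted(final_stop_words))  # ['cet', 'cette', 'la', 'le', 'les']
```

Because union() does not mutate its receiver, the original stopword set stays available if you need it elsewhere.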

Answer 1 (score: 1)

Use list(set(...)) instead of set(...), because only lists have a method called extend:

...
stop_words = list(set(stopwords.words('french')))
...
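One caveat when following this approach: list.extend() modifies the list in place and returns None, so the question's set(stop_words.extend(new_stopwords)) would still fail even with a list. A minimal sketch, assuming a hypothetical stand-in list for stopwords.words('french'), that calls extend as a separate statement:

```python
# Hypothetical stand-in for stopwords.words('french').
french_words = ['le', 'la', 'ne', 'pas', 'n']

stop_words = list(set(french_words))
new_stopwords = ['cette', 'les', 'cet']

# extend() mutates stop_words in place; its return value is None.
result = stop_words.extend(new_stopwords)
print(result)  # None

# Build the set from the extended list, not from extend()'s return value.
new_stopwords_list = set(stop_words)
print(sorted(new_stopwords_list))
```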

Answer 2 (score: 1)

You can use update instead of extend, replacing the line new_stopwords_list = set(stop_words.extend(new_stopwords)) with:

stop_words.update(new_stopwords)
new_stopwords_list = set(stop_words)
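Sets also have an in-place counterpart for removal, difference_update(), so both steps can be done on the same set without building intermediate copies. A minimal sketch, assuming a hypothetical stand-in for the NLTK French stopwords:

```python
# Hypothetical stand-in for set(stopwords.words('french')).
stop_words = {'le', 'la', 'ne', 'pas', 'n'}

new_stopwords = ['cette', 'les', 'cet']
not_stopwords = {'n', 'pas', 'ne'}

# update() adds every element of the iterable to the set in place.
stop_words.update(new_stopwords)

# difference_update() removes every element of the iterable in place.
stop_words.difference_update(not_stopwords)

print(sorted(stop_words))  # ['cet', 'cette', 'la', 'le', 'les']
```

Both methods return None, like list.extend(), so their results should not be assigned or wrapped in set().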

By the way, giving a set a name that contains the word list (such as new_stopwords_list) can be confusing.