I'm trying to remove a few words from every value in a column, but nothing happens.
stop_words = ["and","lang","naman","the","sa","ko","na",
"yan","n","yang","mo","ung","ang","ako","ng",
"ndi","pag","ba","on","un","Me","at","to",
"is","sia","kaya","I","s","sla","dun","po","b","pro"
]
newdata['Verbatim'] = newdata['Verbatim'].replace(stop_words,'', inplace = True)
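I'm trying to build a word cloud from the result of the replacement, but I still get the same words (ones that carry no meaning but occur in large numbers).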
Answer 0 (score: 2):
Use str.replace with a regular expression: wrap each word in word boundaries \b and join the values with |, the regex OR (regex=True is required in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default):
pat = '|'.join(r"\b{}\b".format(x) for x in stop_words)
newdata['Verbatim'] = newdata['Verbatim'].str.replace(pat, '', regex=True)
Another solution is to split the values, remove the stop words, and join the remaining words back together with a space in a lambda function:
stop_words = set(stop_words)
f = lambda x: ' '.join(w for w in x.split() if not w in stop_words)
newdata['Verbatim'] = newdata['Verbatim'].apply(f)
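
To compare the two approaches side by side, here is a minimal self-contained sketch. The sample rows and the shortened stop-word list are made up for illustration; only the column name Verbatim comes from the question. The use of re.escape is an extra precaution I added, not part of the answer above. The first check also shows why the call in the question appears to do nothing: without regex=True, Series.replace with a list only replaces cells whose entire value equals one of the listed words.

```python
import re
import pandas as pd

# Hypothetical sample data; only the column name "Verbatim" is from the question.
stop_words = ["and", "lang", "naman", "the", "sa", "ko", "na"]
newdata = pd.DataFrame(
    {"Verbatim": ["the signal and the load", "ok naman sa ko", "sana lang"]}
)

# Series.replace with a list matches whole cell values, not words inside them,
# so none of these rows change.
unchanged = newdata["Verbatim"].replace(stop_words, "")

# Approach 1: regex alternation with word boundaries.
# re.escape is an added safeguard in case a stop word contains regex metacharacters.
pat = "|".join(r"\b{}\b".format(re.escape(w)) for w in stop_words)
by_regex = newdata["Verbatim"].str.replace(pat, "", regex=True)

# Approach 2: split on whitespace, drop stop words, join back with a single space.
stop_set = set(stop_words)
by_split = newdata["Verbatim"].apply(
    lambda s: " ".join(w for w in s.split() if w not in stop_set)
)

# The regex version leaves the surrounding spaces behind; collapse them if needed.
by_regex_clean = by_regex.str.split().str.join(" ")

print(by_split.tolist())
print(by_regex_clean.tolist())
```

Both approaches match case-sensitively, so "The" would survive unless the text is lowercased first. The word boundaries keep "sana" intact even though "sa" and "na" are both in the list.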