Remove specific words from strings in pandas

Time: 2019-04-05 11:02:03

Tags: python-3.x pandas nltk

I am trying to remove a few words from every value in a column, but nothing happens.

stop_words = ["and","lang","naman","the","sa","ko","na",
              "yan","n","yang","mo","ung","ang","ako","ng",
              "ndi","pag","ba","on","un","Me","at","to",
              "is","sia","kaya","I","s","sla","dun","po","b","pro"
             ]

newdata['Verbatim'] = newdata['Verbatim'].replace(stop_words,'', inplace = True)

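I am trying to build a word cloud from the result of the replacement, but I keep getting the same words (they carry no meaning, but there are a lot of them).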

1 answer:

Answer 0 (score: 2)

For a regex replace, you can put the word boundary \b around each word and join the values with | (regex OR):

pat = '|'.join(r"\b{}\b".format(x) for x in stop_words)
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
newdata['Verbatim'] = newdata['Verbatim'].str.replace(pat, '', regex=True)

Another solution is to split each value, remove the stop words, and join the remaining words back together with a space in a lambda function, for example:

stop_words = set(stop_words)
f = lambda x: ' '.join(w for w in x.split() if not w in stop_words)
newdata['Verbatim'] = newdata['Verbatim'].apply(f)
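
To compare the two approaches side by side, here is a minimal sketch on a small made-up DataFrame. The sample sentences and the shortened stop word list are assumptions for illustration only; the column name `Verbatim` comes from the question. The sketch also wraps each word in `re.escape`, which is not in the answer above but guards against stop words that contain regex metacharacters.

```python
import re
import pandas as pd

# Hypothetical sample data (not from the question); only the column name is real.
newdata = pd.DataFrame(
    {"Verbatim": ["ako and the dog", "candy is sweet", "maganda ang araw"]}
)
stop_words = ["and", "the", "ako", "ang", "is"]  # shortened list for the demo

# Approach 1: one regex alternation with word boundaries.
# \b keeps "and" from matching inside "candy" or "maganda".
pat = "|".join(r"\b{}\b".format(re.escape(w)) for w in stop_words)
regex_result = newdata["Verbatim"].str.replace(pat, "", regex=True)

# Approach 2: split on whitespace, drop stop words, join with a single space.
stop_set = set(stop_words)
split_result = newdata["Verbatim"].apply(
    lambda x: " ".join(w for w in x.split() if w not in stop_set)
)

# The regex version leaves the surrounding spaces behind, so normalize
# whitespace if the two results need to match exactly.
regex_clean = regex_result.str.split().str.join(" ")

print(regex_result.tolist())
print(split_result.tolist())
```

Both approaches are case-sensitive, so `"The"` would survive unless the text is lowercased first or the stop word list includes both spellings. The split approach also collapses repeated whitespace, while the regex approach keeps the gaps where words were removed.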