Counting the frequency of multiple words

Time: 2018-02-17 01:16:00

Tags: pandas count frequency


I used this code

unclassified_df['COUNT'] = unclassified_df.tweet.str.count('mulcair')

to count how many times mulcair appears in each row of my pandas DataFrame. I am trying to do the same thing, but for a group of words, for example

Liberal = ['lpc','ptlib','justin','trudeau','realchange','liberal', 'liberals', "liberal2015",'lib2015','justin2015', 'trudeau2015', 'lpc2015']

I read somewhere that I could use collections.Counter(data) for this. Can anyone help me?

1 answer:

Answer 0 (score: 0)

import pandas as pd

# Words to look for in each row
Liberal = ['lpc', 'justin', 'trudeau', 'realchange', 'liberal', 'liberals', 'liberal2015', 'lib2015', 'justin2015', 'trudeau2015', 'lpc2015']

# Sample data
data = {'tweet': ['lpc living dream camerama', 'jsutingnasndsa dnsadnsadnsa dsalpcdnsa', 'but', 'mulcair suggests thereslcp bad lpc blood']}

# DataFrame with a single column, tweet
df = pd.DataFrame(data, columns=['tweet'])

# Number of rows that contain each word (a row counts once, even if the word repeats in it)
print([(df.tweet.str.contains(word).sum(), word) for word in Liberal])

# Total occurrences of each word across all rows, including repeats within a row
print(pd.Series({w: df.tweet.str.count(w).sum() for w in Liberal}))
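
The two lines above give totals per word, while the question asks for a count per row. A minimal sketch of that, assuming the same sample data: the column names COUNT and COUNT_WORDS and the helper count_tokens are made up for illustration. The first column counts substring matches with a single regex alternation, the second counts whole words only, using collections.Counter as mentioned in the question.

```python
import re
from collections import Counter

import pandas as pd

liberal = ['lpc', 'justin', 'trudeau', 'realchange', 'liberal', 'liberals',
           'liberal2015', 'lib2015', 'justin2015', 'trudeau2015', 'lpc2015']

df = pd.DataFrame({'tweet': ['lpc living dream camerama',
                             'jsutingnasndsa dnsadnsadnsa dsalpcdnsa',
                             'but',
                             'mulcair suggests thereslcp bad lpc blood']})

# Longest words first, so 'liberal2015' is tried before 'liberal' in the alternation.
# re.escape keeps any regex metacharacters in the words literal.
pattern = '|'.join(re.escape(w) for w in sorted(liberal, key=len, reverse=True))

# Substring matches per row (non-overlapping), like str.count('mulcair')
# but for the whole list at once.
df['COUNT'] = df['tweet'].str.count(pattern)

# Whole-word matches per row: split on whitespace and tally tokens with Counter.
# Hypothetical helper, not part of pandas.
wanted = set(liberal)

def count_tokens(text):
    tokens = Counter(text.lower().split())
    return sum(tokens[w] for w in wanted)  # Counter returns 0 for missing keys

df['COUNT_WORDS'] = df['tweet'].apply(count_tokens)

print(df)
```

On the sample data COUNT is 1, 1, 0, 1 and COUNT_WORDS is 1, 0, 0, 1: the second tweet contains lpc only inside dsalpcdnsa, so it counts as a substring but not as a word. Summing str.count(w) over the words one by one would instead count liberal2015 twice, once as liberal and once as liberal2015.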


References: Contains & match