Solr word count, word by word within a sentence

Time: 2014-04-25 01:43:51

Tags: solr facet

{
id: "1698627066",
screen_name: "RomanceInfinity",
text: [
"Going NYP to have lunch with bro because I got too much time in between!!!",
"nyp"
],
stance: "",
source: "<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>",
fromid: "411377814521147392",
favourite: "false",
date: "2013-12-13 14:11:28",
replyto: "",
replytoid: "",
retweetfrom: "",
domaintype: "",
keywords: [
"nyp"
],
ratio: "",
latitude: 1000,
longitude: 1000,
retweet: 0,
mood_joy: "0.0",
mood_sadness: "0.0",
mood_surprised: "0.0",
mood_disgusted: "0.0",
mood_anger: "1.0",
_version_: 1454285708574326800
},
{

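Is there a way to use facets to count each word in the sentence "Going NYP to have lunch with bro because I got too much time in between!!!"? For example, a result such as Going = 1, NYP = 1, lunch = 1, with punctuation left out of the count?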

1 answer:

Answer 0 (score: 0)

Use the following to extract the text from the dict:

name_of_dict['text'][0]

This extracts the sentence:

Going NYP to have lunch with bro because I got too much time in between!!!

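To ignore punctuation, you can use a function such as .replace() to strip it out:

# Named "sentence" rather than "str" to avoid shadowing the built-in type.
sentence = "Going NYP to have lunch with bro because I got too much time in between!!!"
print(sentence.replace("!", ""))
# Output: Going NYP to have lunch with bro because I got too much time in between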
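Since .replace("!", "") only removes exclamation marks, a more general variant may be useful. The following is a minimal sketch, assuming that "punctuation" means the ASCII characters in Python's string.punctuation; the helper name strip_punctuation is hypothetical and not part of the original answer.

```python
import string


def strip_punctuation(sentence: str) -> str:
    """Return the sentence with every ASCII punctuation character removed."""
    # With three arguments, str.maketrans maps each character in the
    # third argument to None, so translate() deletes it.
    table = str.maketrans("", "", string.punctuation)
    return sentence.translate(table)


sentence = "Going NYP to have lunch with bro because I got too much time in between!!!"
cleaned = strip_punctuation(sentence)
print(cleaned)
```

Note that this also removes apostrophes and hyphens, so "don't" becomes "dont"; whether that is acceptable depends on the text being counted.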

Finally, to count the occurrences of each word in the string, I refer you to this post:

Efficiently calculate word frequency in a string

A workable solution is:

from collections import Counter

test = 'abc def abc def zzz zzz'
print(Counter(test.split()).most_common())
# Output on Python 3.7+, where ties keep first-seen order:
# [('abc', 2), ('def', 2), ('zzz', 2)]
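Putting the three steps together (extract the text, strip punctuation, count words) might look like the sketch below. It is done client-side in Python rather than with Solr facets. The dict doc is a trimmed, hypothetical stand-in for the document in the question, the function name count_words is made up for illustration, and lowercasing the words is an assumption so that "NYP" and "nyp" count as the same word.

```python
import string
from collections import Counter


def count_words(sentence: str) -> Counter:
    """Count words in a sentence, ignoring ASCII punctuation and case."""
    # Delete punctuation, then lowercase (an assumption) and split on whitespace.
    table = str.maketrans("", "", string.punctuation)
    words = sentence.translate(table).lower().split()
    return Counter(words)


# Hypothetical stand-in for the Solr document shown in the question.
doc = {
    "text": [
        "Going NYP to have lunch with bro because I got too much time in between!!!",
        "nyp",
    ]
}

counts = count_words(doc["text"][0])
for word, n in counts.items():
    print(f"{word} = {n}")
```

Each of the 15 words in this sentence occurs once, so every line printed has a count of 1. Dropping the .lower() call keeps the original capitalization if case-sensitive counts are wanted.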