Top 3 most frequent words in a string (Python)

Date: 2015-04-09 13:09:04

Tags: python

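Please don't suggest importing Counter. I need to write a function that finds the 3 most frequent words in a string and returns them ordered from most frequent to least frequent.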

So h("the the the the cat cat cat in in hat ") should return:

>>> ["the", "cat", "in"]

If the string contains fewer than 3 distinct words, return only the ones that exist:

h("the the cat")
>>> ["the", "cat"]

2 answers:

Answer 0 (score: 1)

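First, a frequency hash is filled with the number of times each word appears in the given string. Then the top 3 words are picked by sorting the entries of that hash by count, highest first.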
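A minimal sketch of just the first stage, run on the question's example string and assuming Python 3.7+ (where dicts keep insertion order). It also shows why splitting with no argument is safer here: `split(" ")` keeps an empty string after the trailing space, which would then be counted as a "word".

```python
# Build the frequency hash for the question's example string.
text = "the the the the cat cat cat in in hat "

frequency = {}
for word in text.split():  # split() with no argument ignores the trailing space
    frequency[word] = frequency.get(word, 0) + 1

print(frequency)  # {'the': 4, 'cat': 3, 'in': 2, 'hat': 1}

# By contrast, split(" ") leaves an empty string after the trailing space.
print(text.split(" ")[-1] == "")  # True
```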

Code

def h(string):
    # Count every word, then keep the three most frequent ones.
    return get_top_3(get_frequency_hash(string))

def get_frequency_hash(text):
    # split() with no argument splits on any run of whitespace and ignores
    # leading/trailing spaces, so no empty-string "word" gets counted.
    frequency = {}
    for word in text.split():
        frequency[word] = frequency.get(word, 0) + 1
    return frequency

def get_top_3(frequency_hash):
    # Sort the (word, count) pairs by count, highest first, and keep the words.
    sorted_array_of_tuples = sorted(frequency_hash.items(), key=lambda x: -x[1])
    return [k for k, v in sorted_array_of_tuples[:3]]

Example

h("the the the the cat cat cat in in hat")
# ['the', 'cat', 'in']
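As an alternative to sorting every entry in the second stage, the standard-library `heapq.nlargest` can pick the top 3 keys directly (it does not use Counter, so it stays within the question's constraint). This is a sketch; `top_3` is a hypothetical name standing in for `get_top_3`.

```python
import heapq

def top_3(frequency_hash):
    # nlargest returns the 3 keys with the highest counts, already ordered
    # from most to least frequent; fewer keys simply give a shorter list.
    return heapq.nlargest(3, frequency_hash, key=frequency_hash.get)

print(top_3({"the": 4, "cat": 3, "in": 2, "hat": 1}))  # ['the', 'cat', 'in']
print(top_3({"the": 2, "cat": 1}))                     # ['the', 'cat']
```

For a handful of words the difference is negligible; `nlargest` mainly pays off when there are many distinct words and only a few are wanted.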

Answer 1 (score: 0)

If we can't import collections.Counter, let's build it ourselves. It only takes 4 lines of code.

Code

import operator

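def counter(l):
    # Map each word to the number of times it occurs.
    result = {}
    for word in l:
        result.setdefault(word, 0)
        result[word] += 1

    return result

def h(s):
    scores = counter(s.split())
    # Sort the (word, count) pairs by count in ascending order...
    scores = sorted(scores.items(), key=operator.itemgetter(1))
    # ...then reverse them so the most frequent word comes first.
    scores = reversed(scores)
    scores = list(x[0] for x in scores)
    return scores[0:3]

print(h("the the the the cat cat cat in in hat "))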

Output

['the', 'cat', 'in']
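One detail worth noting about ties: reversing an ascending sort also reverses the order of words with equal counts, so for "a b c d" the answer above returns ['d', 'c', 'b']. A sketch of a variant, assuming Python 3.7+ and assuming ties should keep their first-appearance order (the question does not say), uses `sorted(..., reverse=True)`, which stays stable:

```python
def h(s):
    # Count words with a plain dict (no collections.Counter).
    counts = {}
    for word in s.split():
        counts[word] = counts.get(word, 0) + 1
    # sorted(..., reverse=True) is still stable, so words with equal counts
    # keep their first-appearance order (dicts preserve insertion order).
    ranked = sorted(counts, key=counts.get, reverse=True)
    return ranked[:3]

print(h("the the the the cat cat cat in in hat "))  # ['the', 'cat', 'in']
print(h("the the cat"))                             # ['the', 'cat']
print(h("a b c d"))                                 # ['a', 'b', 'c']
```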