Finding the most common character in a string

Asked: 2010-11-09 06:43:14

Tags: python algorithm optimization time-complexity

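I came across this programming problem while looking at a job posting on SO. I thought it was pretty interesting, and as a beginner Python programmer I attempted to tackle it. However, I feel my solution is quite... messy... Can anyone suggest ways to optimize it or make it cleaner? I know it's pretty trivial, but I had fun writing it. Note: Python 2.6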

The problem:

Write pseudo-code (or actual code) for a function that takes in a string and returns the letter that appears the most in that string.

My attempt:

import string

def find_max_letter_count(word):

    alphabet = string.ascii_lowercase
    dictionary = {}

    for letters in alphabet:
        dictionary[letters] = 0

    for letters in word:
        dictionary[letters] += 1

    dictionary = sorted(dictionary.items(), 
                        reverse=True, 
                        key=lambda x: x[1])

    for position in range(0, 26):
        print dictionary[position]
        if position != len(dictionary) - 1:
            if dictionary[position + 1][1] < dictionary[position][1]:
                break

find_max_letter_count("helloworld")

Output:

>>> 
('l', 3)

Updated example:

find_max_letter_count("balloon") 
>>>
('l', 2)
('o', 2)

12 answers:

Answer 0 (score: 21)

There are many ways to do this. For instance, you can use the Counter class (in Python 2.7 or later):

import collections
s = "helloworld"
print(collections.Counter(s).most_common(1)[0])

If you don't have that, you can do the tally manually (2.5 or later has defaultdict):

d = collections.defaultdict(int)
for c in s:
    d[c] += 1
print(sorted(d.items(), key=lambda x: x[1], reverse=True)[0])

That said, there is nothing seriously wrong with your implementation.
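Note that the problem asks for the most frequent letter, whereas Counter(s) tallies every character, including spaces and punctuation. A minimal sketch of one way to restrict the count to letters, assuming Python 3 and assuming that upper and lower case should be treated as the same letter (the helper name most_common_letter is made up for illustration):

```python
from collections import Counter

def most_common_letter(text):
    """Return (letter, count) for the most frequent letter, or None if text has no letters."""
    # Assumption: case is ignored and anything that is not a letter is skipped.
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    if not counts:
        return None
    return counts.most_common(1)[0]

print(most_common_letter("Hello, World!"))  # ('l', 3)
```

Whether to fold case or skip digits depends on how "letter" is read in the problem statement, so treat both choices as adjustable.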

Answer 1 (score: 4)

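If you are using Python 2.7, you can do this quickly with the collections module. collections is a module of high-performance container datatypes. Read more at http://docs.python.org/library/collections.html#counter-objects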

>>> from collections import Counter
>>> x = Counter("balloon")
>>> x
Counter({'o': 2, 'a': 1, 'b': 1, 'l': 2, 'n': 1})
>>> x['o']
2

Answer 2 (score: 2)

Here is a way to find the most common character using a dictionary:

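message = "hello world"
d = {}
letters = set(message)
for l in letters:
    d[message.count(l)] = l  # maps count -> one letter with that count (ties overwrite)

best = max(d)  # dict keys are unordered, so pick the largest count explicitly
print d[best], best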
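The dictionary above maps each count to a single letter, so letters that share a count overwrite each other. One possible variation, sketched here assuming Python 3, keeps a list of letters per count instead; letters_by_count is a hypothetical helper name, not part of the answer.

```python
from collections import defaultdict

def letters_by_count(message):
    """Map each occurrence count to the sorted list of characters having that count."""
    by_count = defaultdict(list)
    for letter in sorted(set(message)):
        by_count[message.count(letter)].append(letter)
    return dict(by_count)

groups = letters_by_count("balloon")
best = max(groups)          # the highest count present
print(groups[best], best)   # ['l', 'o'] 2
```

Like the original, this calls count() once per distinct character, so it is best suited to short strings.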

Answer 3 (score: 1)

If you want all the characters that have the maximum count, you can use a variation on one of the two ideas proposed so far:

import heapq  # Helps finding the n largest counts
import collections

def find_max_counts(sequence):
    """
    Returns an iterator that produces the (element, count)s with the
    highest number of occurrences in the given sequence.

    In addition, the elements are sorted.
    """

    if len(sequence) == 0:
        return  # a bare return ends the generator (raising StopIteration breaks under PEP 479)

    counter = collections.defaultdict(int)
    for elmt in sequence:
        counter[elmt] += 1

    counts_heap = [
        (-count, elmt)  # The largest elmt counts are the smallest elmts
        for (elmt, count) in counter.items()]

    heapq.heapify(counts_heap)

    highest_count = counts_heap[0][0]

    while True:

        try:
            (opp_count, elmt) = heapq.heappop(counts_heap)
        except IndexError:
            return

        if opp_count != highest_count:
            return

        yield (elmt, -opp_count)

for (letter, count) in find_max_counts('balloon'):
    print (letter, count)

for (word, count) in find_max_counts(['he', 'lkj', 'he', 'll', 'll']):
    print (word, count)

This produces, for instance:

lebigot@weinberg /tmp % python count.py
('l', 2)
('o', 2)
('he', 2)
('ll', 2)

This works with any sequence: words, but also ['hello', 'hello', 'bonjour'], for instance.

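The heapq structure is very efficient at finding the smallest elements of a sequence without sorting it completely. On the other hand, since there are not that many letters in the alphabet, you could probably also walk through the sorted list of counts until the maximum count is no longer found, without incurring any serious speed loss.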
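The sorted-list alternative mentioned above could look roughly like this. It is a sketch, not the answer's own code: it assumes Python 2.7 or later for Counter, and the name find_max_counts_sorted is invented here.

```python
from collections import Counter
from itertools import takewhile

def find_max_counts_sorted(sequence):
    """Return the (element, count) pairs that share the highest count, sorted by element."""
    counts = Counter(sequence)
    if not counts:
        return []
    # Highest count first; ties are ordered by the element itself.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    highest = ranked[0][1]
    # Walk the sorted list only while the count still equals the maximum.
    return list(takewhile(lambda pair: pair[1] == highest, ranked))

print(find_max_counts_sorted("balloon"))  # [('l', 2), ('o', 2)]
```

For a 26-letter alphabet the full sort is cheap, which is why the heap brings little benefit in this particular case.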

Answer 4 (score: 1)

Problem: the most common character in a string, i.e. the character that occurs the most times in the input string.

Method 1:

a = "GiniGinaProtijayi"

d = {}
for ch in a:
    d[ch] = d.get(ch, 0) + 1  # count each character

# Sort by count, highest first, and take the first (character, count) pair
chh, max_count = sorted(d.items(), reverse=True, key=lambda item: item[1])[0]

print(chh)
print(max_count)

Method 2:

a = "GiniGinaProtijayi"

max = 0 
chh = ''
count = [0] * 256 
for ch in a : count[ord(ch)] += 1
for ch in a :
    if(count[ord(ch)] > max):
        max = count[ord(ch)] 
        chh = ch

print(chh)        

Method 3:

import collections

a = "GiniGinaProtijayi"

aa = collections.Counter(a).most_common(1)[0]
print(aa)

Answer 5 (score: 1)

Here is one way, using a for loop and count():

w = input()
r = 0   # highest count seen so far
s = ''  # character with that count (stays empty for empty input)
for i in w:
    p = w.count(i)
    if p > r:
        r = p
        s = i
print(s)

Answer 6 (score: 0)

Here are a few things I'd do:

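  • Use collections.defaultdict instead of a dict you initialise manually.
  • Use the built-in functions for sorting and finding the maximum, such as max, instead of rolling your own; it's easier.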

Here's my final result:

from collections import defaultdict

def find_max_letter_count(word):
    matches = defaultdict(int)  # makes the default value 0

    for char in word:
        matches[char] += 1

    return max(matches.iteritems(), key=lambda x: x[1])

find_max_letter_count('helloworld') == ('l', 3)

Answer 7 (score: 0)

def most_frequent(text):
    frequencies = [(c, text.count(c)) for c in set(text)]
    return max(frequencies, key=lambda x: x[1])[0]

s = 'ABBCCCDDDD'
print(most_frequent(s))

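frequencies is a list of tuples that pairs each character with its count, as (character, count). We apply max to the tuples, using the count as the key, and return that tuple's character. In the event of a tie, this solution picks only one of the tied characters.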
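Which of the tied characters wins depends on the iteration order of the set, which is arbitrary. If a predictable result matters, one option is to visit the characters in order of first appearance, so that a tie goes to the earliest one. A sketch assuming Python 3.7 or later (where dicts keep insertion order); most_frequent_first is a hypothetical name:

```python
def most_frequent_first(text):
    """Return the most frequent character; ties go to the one that appears first in text."""
    # dict.fromkeys removes duplicates while keeping first-occurrence order.
    candidates = dict.fromkeys(text)
    # max returns the first maximal item it meets, so earlier characters win ties.
    return max(candidates, key=text.count)

print(most_frequent_first("balloon"))  # l
```

As with the answer's version, an empty string makes max raise ValueError, so guard against that if empty input is possible.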

Answer 8 (score: 0)

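I noticed that most of the answers return only one item, even when several characters are tied for most common. Take "iii 444 yyy 999", for example: the space, i, 4, y and 9 all occur the same number of times. The solution should return all of them, not just the letter i: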

from collections import Counter

sentence = "iii 444 yyy 999"

# Take the count from the first tuple returned by Counter().most_common(),
# i.e. the largest count
largest_count: int = Counter(sentence).most_common()[0][1]

# Keep every (character, count) tuple whose count equals the largest count
most_common_list: list = [(x, y)
                         for x, y in Counter(sentence).items() if y == largest_count]

print(most_common_list)

# RETURNS
[('i', 3), (' ', 3), ('4', 3), ('y', 3), ('9', 3)]

Answer 9 (score: 0)

My approach does not use Python's built-in functions to do the counting, only for-loops and if-statements.

def most_common_letter():
    string = str(input())
    letters = set(string)
    if " " in letters:         # If you want to count spaces too, ignore this if-statement
        letters.remove(" ")
    max_count = 0
    freq_letter = []
    for letter in letters:
        count = 0
        for char in string:
            if char == letter:
                count += 1
        if count == max_count:
            max_count = count
            freq_letter.append(letter)
        if count > max_count:
            max_count = count
            freq_letter.clear()
            freq_letter.append(letter)
    return freq_letter, max_count

This ensures you get every letter/character that is used the most, not just one. It also returns how often it occurs. Hope this helps :)

Answer 10 (score: 0)

If you can't use collections for whatever reason, I would suggest the following implementation:

s = input()
d = {}

# Iterate through the string; if the character is already in the
# dict, increment its counter, otherwise start it at 1.
for ch in s:
    if ch in d:
        d[ch] += 1
    else:
        d[ch] = 1

# If an empty string was given, the dict is empty, so max falls back
# to the default and we print a message saying so.
print(max(d, key=d.get, default='Empty string was given.'))

Answer 11 (score: -1)

from collections import Counter

# file: path of the file to read
# quant: number of most frequent characters you want

def frequent_letters(file, quant):
    with open(file) as f:  # the file is closed once it has been read
        text = f.read()
    return Counter(text).most_common(quant)