Counting letter frequency in a string (Python)

Posted: 2016-12-05 23:23:44

Tags: python frequency-analysis

I'm trying to count the occurrences of each letter in a word:

word = input("Enter a word")

Alphabet=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']

for i in range(0,26): 
    print(word.count(Alphabet[i]))

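This currently prints the count of every letter, including letters that don't occur in the word.

How can I list the letters vertically with their frequency next to each one? For example:

word = "Hello"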

H 1

E 1

L 2

O 1
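A minimal sketch of the requested output, assuming Python 3.7+ (where Counter, like dict, keeps first-insertion order), case-insensitive counting, and that characters failing str.isalpha() should be skipped. The function name letter_frequencies is hypothetical.

```python
from collections import Counter


def letter_frequencies(word):
    """Return (LETTER, count) pairs in order of first appearance.

    Counting is case-insensitive; characters that are not letters are skipped.
    """
    # Counter is a dict subclass, so it keeps first-insertion order on 3.7+
    counts = Counter(ch for ch in word.lower() if ch.isalpha())
    return [(letter.upper(), n) for letter, n in counts.items()]


if __name__ == "__main__":
    for letter, n in letter_frequencies("Hello"):
        print(letter, n)  # one "LETTER count" pair per line
```

Lowercasing before counting is what makes "H" and "h" share one entry; drop the .lower() call if case should matter.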

13 answers:

Answer 0 (score: 13)

from collections import Counter
word = "Hello"
counts = Counter(word)  # Counter({'l': 2, 'H': 1, 'e': 1, 'o': 1})
for i in counts:  # each distinct character once, in order of first appearance
    print(i, counts[i])

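Try using Counter, which builds a dictionary (a dict subclass) mapping each item in a collection to its frequency.

Otherwise, you could add a condition to your current code so that it only prints when word.count(Alphabet[i]) is greater than 0, but that would be slower.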
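A sketch of that conditional version, keeping the question's loop but skipping zero counts. Two assumptions not in the question: the word is lowercased first, and the standard-library constant string.ascii_lowercase replaces the hand-typed Alphabet list. The function name nonzero_counts is hypothetical. Letters come out in alphabetical order rather than in order of appearance.

```python
import string


def nonzero_counts(word):
    """Return 'LETTER count' lines for the letters that occur in word."""
    word = word.lower()  # so the 'H' in "Hello" is counted as 'h'
    lines = []
    # string.ascii_lowercase ("abc...z") stands in for the Alphabet list
    for letter in string.ascii_lowercase:
        n = word.count(letter)  # one full scan of word per letter
        if n > 0:  # the condition suggested above
            lines.append(f"{letter.upper()} {n}")
    return lines


for line in nonzero_counts("Hello"):
    print(line)
```

Each of the 26 iterations rescans the whole word, which is the extra work a single Counter pass avoids.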

Answer 1 (score: 3)

def char_frequency(str1):
    freq = {}  # named freq to avoid shadowing the built-in dict
    for n in str1:
        if n in freq:
            freq[n] += 1  # seen before: increment
        else:
            freq[n] = 1  # first occurrence
    return freq

print(char_frequency('google.com'))
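For comparison, collections.defaultdict(int) removes the need for the membership test, because a missing key is created with the value int(), which is 0. A sketch with a hypothetical function name:

```python
from collections import defaultdict


def char_frequency_dd(text):
    # defaultdict(int) creates missing keys with the value int() == 0
    freq = defaultdict(int)
    for ch in text:
        freq[ch] += 1
    return dict(freq)  # convert back to a plain dict for printing


print(char_frequency_dd("google.com"))
# {'g': 2, 'o': 3, 'l': 1, 'e': 1, '.': 1, 'c': 1, 'm': 1}
```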

Answer 2 (score: 1)

As @Pythonista said, this is a job for collections.Counter:

from collections import Counter
print(Counter('cats on wheels'))

This prints (on Python 3.7+; equal counts keep the order in which characters first appear):

Counter({'s': 2, ' ': 2, 'e': 2, 'c': 1, 'a': 1, 't': 1, 'o': 1, 'n': 1, 'w': 1, 'h': 1, 'l': 1})
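To get the vertical "LETTER count" layout from the question, the Counter can be walked with most_common(), which sorts by count from highest to lowest (ties keep first-seen order on Python 3.7+). A sketch that also drops spaces before counting; the function name vertical_report is hypothetical:

```python
from collections import Counter


def vertical_report(text):
    # Remove spaces so they are not reported as a "letter"
    counts = Counter(text.replace(" ", ""))
    return [f"{letter.upper()} {n}" for letter, n in counts.most_common()]


for line in vertical_report("cats on wheels"):
    print(line)
```

most_common(k) with an integer k returns only the k most frequent items, which is handy for long texts.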

Answer 3 (score: 1)

A simple solution without any library:

string=input()
f={}
for i in string:
  f[i]=f.get(i,0)+1
print(f)

Here is a link explaining get(): https://docs.quantifiedcode.com/python-anti-patterns/correctness/not_using_get_to_return_a_default_value_from_a_dictionary.html
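dict.get(key, default) returns default when the key is missing instead of raising KeyError, which is what lets the loop above start every new character at 0. A small illustration:

```python
f = {}
print(f.get("a", 0))  # 0, because "a" is not a key yet
f["a"] = f.get("a", 0) + 1  # creates "a" with the value 1
f["a"] = f.get("a", 0) + 1  # "a" exists now, so it becomes 2
print(f)  # {'a': 2}
print(f.get("z"))  # None: without a second argument the default is None
```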

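Answer 4 (score: 0)

Following up on what LMc said: your code is already very close to working; you just need to post-process the results to drop the "uninteresting" output (letters with a count of 0). Here is one way to make your code work: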

#!/usr/bin/env python3
word = input("Enter a word: ")  # Python 3: input() replaces raw_input()

Alphabet = [
    'a','b','c','d','e','f','g','h','i','j','k','l','m',
    'n','o','p','q','r','s','t','u','v','w','x','y','z'
]

# keep only the (letter, count) pairs whose count is non-zero
hits = [
    (Alphabet[i], word.count(Alphabet[i]))
    for i in range(len(Alphabet))
    if word.count(Alphabet[i])
]

for letter, frequency in hits:
    print(letter.upper(), frequency)  # print is a function in Python 3

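But the solution using collections.Counter is more elegant and Pythonic.

Answer 5 (score: 0)

For future reference: if you have a list of all the words you want, say wordlist, printing the ones that start with a particular letter is as simple as:

wordlist = ["apple", "banana", "avocado"]  # example list
for word in wordlist:
    if word.startswith('a'):  # also safe for empty strings
        print(word)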

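Answer 6 (score: 0)

s = input()
t = s.lower()  # count case-insensitively
seen = set()  # letters already reported
for i in range(len(s)):
    if t[i] in seen:
        continue  # skip repeats so each letter is printed only once
    seen.add(t[i])
    b = t.count(t[i])
    print("{} -- {}".format(s[i], b))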

Answer 7 (score: 0)

If you want to avoid libraries and helpers such as str.count(), the following plain-dictionary code may help:

s = "aaabbc"  # sample string
dict_counter = {}  # characters as keys, counts as values
for char in s:  # traverse the string character by character
    if char not in dict_counter:  # first occurrence: start the count at 1
        dict_counter[char] = 1
    else:  # already present: increment the count
        dict_counter[char] += 1
for key, val in dict_counter.items():  # print each key/value pair
    print(key, val)

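Output:
a 3
b 2
c 1

Answer 8 (score: 0)

Another approach is to remove duplicate characters by iterating only over the unique characters (using set()), and then count the occurrences of each one (using str.count()):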

def char_count(string):
    freq = {}
    for char in set(string):
        freq[char] = string.count(char)
    return freq


if __name__ == "__main__":
    s = "HelloWorldHello"
    print(char_count(s))
    # Output (key order may vary, because set iteration order is arbitrary):
    # {'e': 2, 'o': 3, 'W': 1, 'r': 1, 'd': 1, 'l': 5, 'H': 2}
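Because set iteration order is arbitrary, the keys above can come out in any order. If a stable, readable report is wanted, one option (a sketch, not part of the original answer; sorted_report is a hypothetical name) is to sort the items by descending count and then by character:

```python
def sorted_report(freq):
    # Highest count first; ties broken by character (uppercase sorts before lowercase)
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))


freq = {'e': 2, 'o': 3, 'W': 1, 'r': 1, 'd': 1, 'l': 5, 'H': 2}
for char, n in sorted_report(freq):
    print(char, n)
```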

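Answer 9 (score: 0)

It can make sense to include every letter of the alphabet, even those with a count of 0. For example, if you want to compute the cosine distance between the letter distributions of different words, you usually need all the letters so that the vectors line up.

You can use this method (note that it is case-sensitive, so uppercase letters such as the "T" and "I" in the example below are not counted):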

from collections import Counter 

def character_distribution_of_string(pass_string):
  letters = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
  chars_in_string = Counter(pass_string)
  res = {}
  for letter in letters:
    if(letter in chars_in_string):
      res[letter] = chars_in_string[letter]
    else: 
      res[letter] = 0 
  return(res)

Usage:

character_distribution_of_string("This is a string that I want to know about")

Full character distribution:

{'a': 4,
 'b': 1,
 'c': 0,
 'd': 0,
 'e': 0,
 'f': 0,
 'g': 1,
 'h': 2,
 'i': 3,
 'j': 0,
 'k': 1,
 'l': 0,
 'm': 0,
 'n': 3,
 'o': 3,
 'p': 0,
 'q': 0,
 'r': 1,
 's': 3,
 't': 6,
 'u': 1,
 'v': 0,
 'w': 2,
 'x': 0,
 'y': 0,
 'z': 0}

You can easily extract the character vector:

list(character_distribution_of_string("This is a string that I want to know about").values())

which gives:

[4, 1, 0, 0, 0, 0, 1, 2, 3, 0, 1, 0, 0, 3, 3, 0, 0, 1, 3, 6, 1, 0, 2, 0, 0, 0]
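As a sketch of the cosine-distance use mentioned above, assuming two strings are compared through their 26-letter count vectors (lowercased here, which the answer's function does not do): cosine similarity is the dot product divided by the product of the vector norms, and cosine distance is 1 minus that. The names letter_vector and cosine_distance are hypothetical.

```python
import math
import string
from collections import Counter


def letter_vector(text):
    # 26 counts in a..z order; characters outside a..z are ignored
    counts = Counter(text.lower())
    return [counts[letter] for letter in string.ascii_lowercase]


def cosine_distance(a, b):
    # 1 - (a . b) / (|a| * |b|); undefined when either vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1 - dot / (norm_a * norm_b)


print(cosine_distance(letter_vector("listen"), letter_vector("silent")))
```

Anagrams such as "listen" and "silent" have identical vectors, so their distance is 0 up to floating-point rounding; strings with no letters in common have distance 1.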

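Answer 10 (score: 0)

Initialize an empty dictionary and iterate over each character of the word. If the current character is already in the dictionary, increment its value by 1; otherwise set its value to 1.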

word="Hello"
characters={}
for character in word:
    if character in characters:
        characters[character] += 1
    else:
        characters[character] =  1
print(characters)

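Answer 11 (score: 0)

def string(n):
    a = list()
    n = n.replace(" ", "")  # drop spaces
    for i in n:
        c = n.count(i)
        a.append(i)  # flat list: char, count, char, count, ...
        a.append(c)
    y = dict(zip(*[iter(a)]*2))  # pair consecutive items into a dict
    print(y)

string("Lets hope for better life")
# Output: {'L': 1, 'e': 5, 't': 3, 's': 1, 'h': 1, 'o': 2, 'p': 1, 'f': 2, 'r': 2, 'b': 1, 'l': 1, 'i': 1}

(Notice that the output contains two L's, one uppercase and one lowercase. If you want them counted together, see the code below.)

The resulting dict has one entry per character, with spaces removed. If you want uppercase and lowercase letters counted together: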

def string(n):
    n = n.lower()  # or n.upper(); either way case is ignored
    a = list()
    n = n.replace(" ", "")
    for i in n:
        c = n.count(i)
        a.append(i)
        a.append(c)
    y = dict(zip(*[iter(a)]*2))
    print(y)

string("Lets hope for better life")
# Output: {'l': 2, 'e': 5, 't': 3, 's': 1, 'h': 1, 'o': 2, 'p': 1, 'f': 2, 'r': 2, 'b': 1, 'i': 1}
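The expression dict(zip(*[iter(a)]*2)) turns the flat list [char, count, char, count, ...] into pairs: [iter(a)]*2 is a list holding the same iterator twice, so zip pulls two consecutive elements from it for each pair. A short illustration, followed by a more direct equivalent that skips the flat list (a sketch, not part of the original answer; char_counts is a hypothetical name):

```python
flat = ["a", 3, "b", 1]
# Both zip arguments are the SAME iterator, so items are consumed two at a time
pairs = list(zip(*[iter(flat)] * 2))
print(pairs)  # [('a', 3), ('b', 1)]


def char_counts(n):
    # Same result as the answer's function: remove spaces, map each char to its count
    n = n.replace(" ", "")
    return {ch: n.count(ch) for ch in n}


print(char_counts("aab ba"))  # {'a': 3, 'b': 2}
```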

Answer 12 (score: 0)

word = input("Enter a word:  ")
word = word.lower()

Alphabet=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
res = []

for i in range(0,26): 
    res.append(word.count(Alphabet[i]))

for i in range(0, 26):
    if res[i] != 0:  # skip letters that do not occur (str(i) != 0 was always True)
        print(Alphabet[i].upper() + " " + str(res[i]))