Question

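I'm an undergraduate, new here, and I enjoy programming. I ran into a problem while practicing and would like to ask for help.

Given a string and an integer n, return the nth most common word and its count, ignoring case.

For the word, make sure all letters are lowercase when you return it!

Hint: the split() function and a dictionary may be useful.

Example:

Input: "apple apple apple blue BlUe call", 2

Output: the list ["blue", 2]

My code is as follows: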

from collections import Counter
def nth_most(str_in, n):
    split_it = str_in.split(" ")
    array = []
    for word, count in Counter(split_it).most_common(n):
        list = [word, count]
        array.append(count)
        array.sort()
        if len(array) - n <= len(array) - 1:
            c = array[len(array) - n]
            return [word, c]

The test results are as follows:

Traceback (most recent call last):
  File "/grade/run/test.py", line 10, in test_one
    self.assertEqual(nth_most('apple apple apple blue blue call', 3), ['call', 1])
  File "/grade/run/bin/nth_most.py", line 10, in nth_most
    c = array[len(array) - n]
IndexError: list index out of range

and

Traceback (most recent call last):
  File "/grade/run/test.py", line 20, in test_negative
    self.assertEqual(nth_most('awe Awe AWE BLUE BLUE call', 1), ['awe', 3])
AssertionError: Lists differ: ['BLUE', 2] != ['awe', 3]

First differing element 0:
'BLUE'
'awe'

I don't know what is wrong with my code.

Thank you very much for your help!

Answer 1

Counter returns the most common elements in order, from most to least frequent, so you can do this:

list(Counter(str_in.lower().split()).most_common(n)[-1]) # n is nth most common word
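
As a rough sketch, the one-liner can be wrapped in a function like this. The range check on n is an assumption added here, not part of the one-liner: without it, most_common(n)[-1] quietly returns the least common word whenever n is larger than the number of distinct words.

```python
from collections import Counter

def nth_most(str_in, n):
    # Lowercase first so "blue" and "BlUe" are counted as the same word.
    counts = Counter(str_in.lower().split())
    # Assumption: an out-of-range n should raise instead of returning a wrong word.
    if not 1 <= n <= len(counts):
        raise ValueError("n must be between 1 and the number of distinct words")
    # most_common(n) is ordered from most to least frequent,
    # so its last item is the nth most common word.
    word, count = counts.most_common(n)[-1]
    return [word, count]

print(nth_most("apple apple apple blue BlUe call", 2))  # ['blue', 2]
```

Calling split() with no argument also copes with repeated or leading spaces, which split(" ") does not.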

Answer 2

Since you are already using Counter, make good use of it:

import collections

def nth_most(str_in, n):
    c = sorted(collections.Counter(w.lower() for w in str_in.split()).items(),key = lambda x:x[1])
    return(list(c[-n])) # convert to list as it seems to be the expected output

print(nth_most("apple apple apple blue BlUe call",2))

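Build the word-frequency dictionary, sort its items by value (the second element of each tuple), and pick the nth element from the end.

This prints ['blue', 2].

What if two words share the same frequency (a tie) in first or second place? Then that solution does not work. Instead, sort the occurrence counts, take the nth largest count, and then go through the counter dict again to extract every word that matches it.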

def nth_most(str_in, n):
    c = collections.Counter(w.lower() for w in str_in.split())
    nth_occs = sorted(c.values())[-n]
    return [[k,v] for k,v in c.items() if v==nth_occs]

print(nth_most("apple apple apple call blue BlUe call woot",2))

This time it prints:

[['call', 2], ['blue', 2]]
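
One caveat with sorted(c.values())[-n]: tied words each take up a slot in the sorted list, so for the string above both n=2 and n=3 select the count 2. If tied words are meant to share a single rank, a possible variation (a sketch only; the name nth_most_by_rank is made up for this example) is to deduplicate the counts first:

```python
import collections

def nth_most_by_rank(str_in, n):
    c = collections.Counter(w.lower() for w in str_in.split())
    # Deduplicate the counts so each frequency occupies exactly one rank.
    distinct_counts = sorted(set(c.values()), reverse=True)
    nth_occs = distinct_counts[n - 1]
    # Collect every word whose count equals the nth distinct count.
    return [[k, v] for k, v in c.items() if v == nth_occs]

text = "apple apple apple call blue BlUe call woot"
print(nth_most_by_rank(text, 2))  # [['call', 2], ['blue', 2]]
print(nth_most_by_rank(text, 3))  # [['woot', 1]]
```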

Answer 3

def nth_common(lowered_words, check):
    m = []
    for i in lowered_words:
        m.append((i, lowered_words.count(i)))
    for i in set(m):
        # print(i)
        if i[1] == check:  # index 1 of the tuple is the occurrence count
            print(i, "found")
    del m[:] # deleting list for using it again


words = ['apple', 'apple', 'apple', 'blue', 'BLue', 'call', 'cAlL']
lowered_words = [x.lower() for x in words]   # ignoring the uppercase
check = 2   # the check

nth_common(lowered_words, check)

Output:

('blue', 2) found
('call', 2) found
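
Note that calling lowered_words.count(i) inside the loop rescans the whole list for every word. A single-pass variant, assuming Counter is acceptable here (words_with_count is a hypothetical name for this sketch), might look like:

```python
from collections import Counter

def words_with_count(words, check):
    # Count every word once instead of calling list.count() per word.
    counts = Counter(w.lower() for w in words)
    # Keep only the (word, count) pairs whose count equals check.
    return [(w, c) for w, c in counts.items() if c == check]

words = ['apple', 'apple', 'apple', 'blue', 'BLue', 'call', 'cAlL']
print(words_with_count(words, 2))  # [('blue', 2), ('call', 2)]
```

Like the function above, this selects words by an exact count rather than by rank.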

Answer 4

Traceback (most recent call last):
  File "/grade/run/test.py", line 10, in test_one
    self.assertEqual(nth_most('apple apple apple blue blue call', 3), ['call', 1])
  File "/grade/run/bin/nth_most.py", line 10, in nth_most
    c = array[len(array) - n]
IndexError: list index out of range

To get rid of this list index error, just add

maxN = 1000 #change according to your max length
array = [ 0 for _ in range( maxN ) ]
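
For context, here is a small reproduction of where the IndexError in the traceback comes from. The values are what the question's loop holds on its first pass for the failing test (one count in array, n equal to 3); this only illustrates the failing index and is not a fix.

```python
array = [3]              # after the first iteration only apple's count is stored
n = 3
index = len(array) - n   # 1 - 3 == -2

try:
    array[index]         # -2 is out of range for a one-element list
except IndexError as exc:
    print("index", index, "->", exc)
```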

Answer 5

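Even without the collections module, you can do it:

import re

paragraph = "Nory was a Catholic because her mother was a Catholic, and Nory's mother was a Catholic because her father was a Catholic, and her father was a Catholic because his mother was a Catholic, or had been."

def nth_common(n, p):
    # Split on runs of non-word characters and drop the empty strings that
    # re.split leaves when the text starts or ends with punctuation.
    words = [w for w in re.split(r'\W+', p.lower()) if w]
    word_count = {}
    for w in words:
        if w in word_count:
            word_count[w] += 1
        else:
            word_count[w] = 1

    # Sort by count, highest first; ties keep their first-seen order.
    sorted_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)

    return sorted_count[n - 1]

print(nth_common(3, paragraph))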

The output will be ('catholic', 6)

Word counts sorted by count: [('was', 6), ('a', 6), ('catholic', 6), ('because', 3), ('her', 3), ('mother', 3), ('nory', 2), ('and', 2), ('father', 2), ('s', 1), ('his', 1), ('or', 1), ('had', 1), ('been', 1)]

Find the nth most common word and its count in Python
