Question

I have 2 lists:

1. ['a', 'b', 'c']
2. ['a', 'd', 'a', 'b']

I want dictionary output like this:

{'a': 2, 'b': 1, 'c': 0}

Here is what I have so far:

#b = list #1
#words = list #2

c = {}
for i in b:
    c.update({i:words.count(i)})

But it is slow, and I need to process txt files of around 10 MB.

Edit: here is the whole code. I'm still testing, so some imports are unused.

import string
import os
import operator
import time
from collections import Counter
def getbookwords():

    with open("wu.txt", encoding="utf-8") as a:
        b = a.read().lower()

    # str.translate needs a table and returns a new string
    b = b.translate(str.maketrans("", "", string.punctuation))

    # split() with no argument splits on any whitespace, newlines included
    b = b.split()
    return b

def wordlist(words):

    with open("wordlist.txt") as a:
        b = a.read().lower()
    b = b.split("\n")

    t = time.time()
    #c = dict((i, words.count(i)) for i in b )

    c  = Counter(words)
    result = {k: v for k, v in c.items() if k in set(b)}
    print(time.time() - t)

    sorted_d = sorted(result.items(), key=operator.itemgetter(1))
    return sorted_d

print(wordlist(getbookwords()))
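For the punctuation step: in Python 3, str.translate expects a translation table, typically built with str.maketrans, and returns a new string rather than changing the original. A minimal sketch, assuming only ASCII punctuation (string.punctuation) needs removing; normalize_words is a hypothetical helper name:

```python
import string

# Table that deletes every character in string.punctuation (ASCII only).
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def normalize_words(text):
    # translate() returns a new string; it does not modify text in place
    cleaned = text.lower().translate(PUNCT_TABLE)
    # split() with no argument splits on any run of whitespace, newlines included
    return cleaned.split()

print(normalize_words("Hello, world!\nIt's a test."))
# ['hello', 'world', 'its', 'a', 'test']
```

Apostrophes are part of string.punctuation, so "It's" becomes "its"; adjust the table if that matters for your word list.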

Answer 1

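Since speed is currently a concern, it may be worth avoiding a full pass over the list for every item you want to count. set() lets you work with only the unique keys in the list words.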

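In all cases, the important thing for speed is the line unique_words = set(b). Without it, whatever data structure you use, a set is rebuilt from b with a full pass over the list on every single iteration.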

c = {k:0 for k in set(words)}
for w in words:
    c[w] += 1
unique_words = set(b)  # build the set once, outside the comprehension
c = {k: v for k, v in c.items() if k in unique_words}
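The point about set(b) can be made concrete. In the question's wordlist, `k in set(b)` sits inside the comprehension, so the set is rebuilt from b for every k. A minimal sketch with the set built once (filter_counts is a hypothetical helper name):

```python
from collections import Counter

def filter_counts(counts, keys):
    # Build the set once; writing `k in set(keys)` inside the comprehension
    # would rebuild it from keys for every k.
    unique_keys = set(keys)
    return {k: v for k, v in counts.items() if k in unique_keys}

b = ['a', 'b', 'c']
words = ['a', 'd', 'a', 'b']
print(filter_counts(Counter(words), b))  # {'a': 2, 'b': 1}
```

Because this filters the counted words, a key that never occurs (here 'c') is simply absent from the result rather than mapped to 0.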

Alternatively, you can use a defaultdict to skip some of the initialization.

from collections import defaultdict

c = defaultdict(int)  # missing keys start at 0
for w in words:
    c[w] += 1
unique_words = set(b)
c = {k: v for k, v in c.items() if k in unique_words}
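One caveat with defaultdict: reading a missing key with c[k] inserts it with value 0 as a side effect. A sketch that keeps zero counts for keys that never appear, reading with .get so nothing is inserted (count_with_defaultdict is a hypothetical name):

```python
from collections import defaultdict

def count_with_defaultdict(keys, words):
    counts = defaultdict(int)  # missing keys start at int() == 0
    for w in words:
        counts[w] += 1
    # .get avoids inserting absent keys into the defaultdict as a side effect
    return {k: counts.get(k, 0) for k in keys}

print(count_with_defaultdict(['a', 'b', 'c'], ['a', 'd', 'a', 'b']))
# {'a': 2, 'b': 1, 'c': 0}
```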

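For completeness, I do like the Counter-based solutions in the other answers (e.g. the one from Reut Sharabani). The code is cleaner, and although I haven't benchmarked it, I wouldn't be surprised if the built-in counting class is faster than a home-made solution with a dict.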

from collections import Counter

c = Counter(words)
unique_words = set(b)
c = {k:v for k, v in c.items() if k in unique_words}
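To check that speed guess on your own machine, a rough timeit sketch. The synthetic vocabulary and sizes are assumptions, not the asker's data, and count_manual / count_counter are hypothetical names; timings will vary by machine and Python version:

```python
import random
import timeit
from collections import Counter

def count_manual(words):
    # Plain dict counting, one pass over words
    c = {}
    for w in words:
        c[w] = c.get(w, 0) + 1
    return c

def count_counter(words):
    return Counter(words)

# Synthetic data: 100_000 words drawn from a 5_000-word vocabulary
random.seed(0)
vocab = [f"w{i}" for i in range(5000)]
words = random.choices(vocab, k=100_000)

assert count_manual(words) == count_counter(words)  # same counts either way
for fn in (count_manual, count_counter):
    seconds = timeit.timeit(lambda: fn(words), number=3)
    print(f"{fn.__name__}: {seconds:.3f}s for 3 runs")
```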

Answer 2

Try using collections.Counter and turning b into a set instead of a list:

from collections import Counter

c = Counter(words)
b = set(b)
result = {k: v for k, v in c.items() if k in b}

Also, if you can read the words lazily instead of creating an intermediate list, that should be faster.
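A minimal sketch of lazy reading, assuming words are separated by whitespace and that case should be ignored (iter_words and count_keys are hypothetical names). A file object yields one line at a time, so no list of all words is ever built:

```python
import io
from collections import Counter

def iter_words(lines):
    # Yield lower-cased words one line at a time instead of building one big list
    for line in lines:
        for word in line.lower().split():
            yield word

def count_keys(lines, keys):
    unique_keys = set(keys)
    # Counter consumes the generator; words not in unique_keys are never stored
    counts = Counter(w for w in iter_words(lines) if w in unique_keys)
    # Counter returns 0 for keys it has never seen
    return {k: counts[k] for k in keys}

text = io.StringIO("A d\na B\n")  # stands in for an open file
print(count_keys(text, ['a', 'b', 'c']))  # {'a': 2, 'b': 1, 'c': 0}
```

With a real file you would pass the open file object instead, e.g. `with open("wu.txt", encoding="utf-8") as f: print(count_keys(f, keys))`.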

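Counter provides the functionality you want (counting items), and filtering the results against a set uses hashing, which should be much faster.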

Answer 3

You can use collections.Counter on a generator that skips ignored keys using a set lookup:

from collections import Counter

keys = ['a', 'b', 'c']
lst = ['a', 'd', 'a', 'b']
unique_keys = set(keys)
count = Counter(x for x in lst if x in unique_keys)
print(count)  # Counter({'a': 2, 'b': 1})
# count['c'] == 0

Note that count['c'] is not printed, but it is still 0 by default.
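A short sketch of that default: unlike a defaultdict, reading a missing key from a Counter returns 0 without inserting it, so the zero entries for the desired output can be filled in by iterating over the keys (the variable result is illustrative):

```python
from collections import Counter

keys = ['a', 'b', 'c']
lst = ['a', 'd', 'a', 'b']
unique_keys = set(keys)
count = Counter(x for x in lst if x in unique_keys)

print(count['c'])    # 0
print('c' in count)  # False: reading a missing key does not insert it

result = {k: count[k] for k in keys}
print(result)  # {'a': 2, 'b': 1, 'c': 0}
```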

Answer 4

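Here is an example I just put together in a REPL. It assumes you are not counting duplicates in list two. We use a dictionary as a hash table: for each item in the list we are matching against, we create a key-value pair with the item as the key and the value set to 0.

Next, we iterate over the second list, and for each value we check whether it is already defined in the hash table; if it is, we increment the value for that key. Otherwise, we ignore it.

This is the fewest iterations possible: you touch each item in each list only once.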

x = [1, 2, 3, 4, 5]
z = [1, 2, 2, 2, 1]
y = {}

for n in x:
    y[n] = 0  # set the value to zero for each item in the list

for n in z:
    if n in y:  # if we already defined the value in the hash, increment by one
        y[n] += 1

print(y)  # {1: 2, 2: 3, 3: 0, 4: 0, 5: 0}
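The same technique can be wrapped in a function; dict.fromkeys(keys, 0) builds the zero-initialized table in one call. A sketch, with count_matches as a hypothetical name:

```python
def count_matches(keys, items):
    counts = dict.fromkeys(keys, 0)  # one entry per key, all starting at 0
    for item in items:
        if item in counts:  # average O(1) hash lookup; other items are skipped
            counts[item] += 1
    return counts

print(count_matches(['a', 'b', 'c'], ['a', 'd', 'a', 'b']))  # {'a': 2, 'b': 1, 'c': 0}
```

Since each membership test is a hash lookup, the whole run is roughly linear in the combined length of both lists, instead of the length of one list times the other as with list.count.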

Answer 5

@Makalone, the answers above are clear. You can also try the following code sample, which uses Counter() from Python's collections module.

You can try it at http://rextester.com/OTYG56015.

Python code:

from collections import Counter

list1 = ['a', 'b', 'c']
list2 = ['a', 'd', 'a', 'b']
counter = Counter(list2)

d = {key: counter[key] for key in set(list1)}
print(d)

Output:

{'a': 2, 'c': 0, 'b': 1}
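The key order in that output comes from iterating over set(list1), whose order is arbitrary. If you want the keys in list1's order, a small variation (assuming Python 3.7+, where dicts preserve insertion order) deduplicates with dict.fromkeys instead of set:

```python
from collections import Counter

list1 = ['a', 'b', 'c']
list2 = ['a', 'd', 'a', 'b']
counter = Counter(list2)

# dict.fromkeys drops duplicates from list1 but keeps first-seen order
d = {key: counter[key] for key in dict.fromkeys(list1)}
print(d)  # {'a': 2, 'b': 1, 'c': 0}
```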

Count how many times items from list 1 occur in list 2
