Removing items from a Python dictionary

Time: 2018-11-19 23:08:24

Tags: python

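I am trying to build a spam classification application in Python, but I get the error below. I don't understand it: I am using the .keys method to remove items from the dictionary, so surely this should not be a problem? I tried stripping out everything except the dictionary function to find the cause, but I can't wrap my head around it.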

Python code

    import os
    import numpy as np
    from collections import Counter
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC
    from sklearn.metrics import confusion_matrix

    def make_Dictionary(train_dir):
        emails = [os.path.join(train_dir,f) for f in os.listdir(train_dir)]    
        all_words = []       
        for mail in emails:    
            with open(mail) as m:
                for i,line in enumerate(m):
                    if i == 2:
                        words = line.split()
                        all_words += words

        dictionary = Counter(all_words)

        list_to_remove = dictionary.keys()
        for item in list_to_remove:
            if item.isalpha() == False: 
                del dictionary[item]
            elif len(item) == 1:
                del dictionary[item]
        dictionary = dictionary.most_common(3000)
        return dictionary

    def extract_features(mail_dir): 
        files = [os.path.join(mail_dir,fi) for fi in os.listdir(mail_dir)]
        features_matrix = np.zeros((len(files),3000))
        docID = 0;
        for fil in files:
          with open(fil) as fi:
            for i,line in enumerate(fi):
              if i == 2:
                words = line.split()
                for word in words:
                  wordID = 0
                  for i,d in enumerate(dictionary):
                    if d[0] == word:
                      wordID = i
                      features_matrix[docID,wordID] = words.count(word)
            docID = docID + 1     
        return features_matrix

    # Create a dictionary of words with its frequency

    train_dir = r'.\train-mails'
    dictionary = make_Dictionary(train_dir)

    # Prepare feature vectors per training mail and its labels

    train_labels = np.zeros(702)
    train_labels[351:701] = 1
    train_matrix = extract_features(train_dir)

    # Training SVM and Naive bayes classifier and its variants

    model1 = LinearSVC()


    model1.fit(train_matrix,train_labels)


    # Test the unseen mails for Spam

    test_dir = r'.\test-mails'
    test_matrix = extract_features(test_dir)
    test_labels = np.zeros(260)
    test_labels[130:260] = 1

    result1 = model1.predict(test_matrix)


    print (confusion_matrix(test_labels,result1))

Error

RuntimeError: dictionary changed size during iteration

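2 answers:

Answer 0 (score: 0)

This does not work in Python 3.x because keys returns a live view of the dictionary rather than a list, so deleting entries while looping over it changes the thing being iterated.

The fix is to use list to force a copy of the keys. This also works in Python 3.x: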

for i in list(list_to_remove):
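
As a minimal sketch of that loop in context, assuming a small made-up word list in place of the mails (the name `counts` and the sample words are illustrative, not from the question):

```python
from collections import Counter

# Hypothetical sample data standing in for the words read from the mails.
counts = Counter(["hello", "a", "123", "world", "hello", "x1"])

# list(...) takes a snapshot of the keys, so deleting inside the loop is safe.
for item in list(counts.keys()):
    if not item.isalpha() or len(item) == 1:
        del counts[item]

remaining = sorted(counts)
print(remaining)
```

Only the purely alphabetic words longer than one character survive, which matches the two conditions in the question's loop.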

Answer 1 (score: 0)

dictionary.keys() actually returns a view that references the original dictionary's keys.

You can check this as follows:

 a_dict = {'a': 1}
 keys = a_dict.keys() # keys is dict_keys(['a'])
 a_dict['b'] = 2 # keys is dict_keys(['a', 'b'])

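That is why you get the error: del dictionary[item] changes the size of the dictionary that list_to_remove is a view of, and that is not allowed while a loop is iterating over it.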
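
The failure can be reproduced in isolation. This is a small sketch with a throwaway dictionary (the names `a_dict` and `error_message` are only for illustration):

```python
a_dict = {"a": 1, "b": 2, "c": 3}

error_message = None
try:
    for key in a_dict.keys():  # iterating over the live view
        del a_dict[key]        # shrinks the dict while the loop is running
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

The first deletion succeeds; the error is raised when the loop asks for the next key and finds that the dictionary's size has changed.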

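You can avoid this by making a copy of the keys before looping over them. The simplest way to do that is with the list constructor. So change your line

list_to_remove = dictionary.keys()

to:

list_to_remove = list(dictionary.keys())

and the problem is solved.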
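
If deleting in place is not a requirement, another option is to build a new Counter containing only the wanted keys. Note that this is a different technique from the one in this answer; `filter_words` and the sample sentence are made up for illustration:

```python
from collections import Counter


def filter_words(counts):
    """Return a new Counter with only alphabetic words longer than one character."""
    return Counter({
        word: n
        for word, n in counts.items()
        if word.isalpha() and len(word) > 1
    })


# Hypothetical sample text standing in for the mail contents.
counts = Counter("the cat sat on a mat 42 the".split())
filtered = filter_words(counts)
print(filtered.most_common(2))
```

Because the original Counter is never modified, there is nothing to go wrong during iteration, and most_common is still available on the result.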

Edit after comments

Note that this behaviour only occurs in Python 3. In Python 2, the .keys() method returns a plain list with no link back to the dictionary:

a_dict = {'a': 1}
keys = a_dict.keys() # keys is ['a']
a_dict['b'] = 2 # keys is still ['a']

From the Python 3.0 changelog:

Some well-known APIs no longer return lists:

  • dict methods dict.keys(), dict.items() and dict.values() return "views" instead of lists.
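
The same live behaviour applies to all three methods, which the following Python 3 sketch illustrates (variable names are illustrative only):

```python
a_dict = {"a": 1}

keys_view = a_dict.keys()
values_view = a_dict.values()
items_view = a_dict.items()
keys_snapshot = list(a_dict)  # a plain list copy, detached from the dict

a_dict["b"] = 2  # every view now reflects the new entry; the list does not

print(list(keys_view))
print(list(values_view))
print(list(items_view))
print(keys_snapshot)
```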