A better way to filter rows using Python

Time: 2011-10-10 20:06:32

Tags: python

import pprint

full_key_list = set(["F1", "F2", "F3", "F4", "F5"]) # all expected fields
filt_key_list = set(["F2", "F5"])                   # fields that should not be included

cont_list = []                                 # stores all filtered documents

read_in_cont1 = { "F1" : 1, "F2" : True,  "F3" : 'abc', "F4" : 130, "F5" : 'X1Z'} # document 1
read_in_cont2 = { "F1" : 2, "F2" : False, "F3" : 'efg', "F4" : 100, "F5" : 'X4Z'} # document 2
read_in_cont3 = { "F1" : 3, "F2" : True,  "F3" : 'acd', "F4" : 400, "F5" : 'X2Z'} # document 3

# assume that read_in_conts contains a list of documents
read_in_conts = [read_in_cont1, read_in_cont2, read_in_cont3]

for one_item in read_in_conts: # for each document in the list
    cont_dict = {}
    for key, value in one_item.iteritems():
        if key not in filt_key_list: # if the field should be included
            cont_dict[key] = value   # add this field to the temporary document
    cont_list.append(cont_dict)

pprint.pprint(cont_list)

Output:

[{'F1': 1, 'F3': 'abc', 'F4': 130},
 {'F1': 2, 'F3': 'efg', 'F4': 100},
 {'F1': 3, 'F3': 'acd', 'F4': 400}]

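Here is what I want to achieve:

Given a raw collection of documents (i.e. read_in_conts, used here for simulation), I need to filter out certain fields so that they are not included in further processing. The code above is my implementation in Python. However, I think it is too heavy, and I would like to see a cleaner solution for this task.

Thank you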

3 Answers:

Answer 0 (score: 4)

cont_list = [dict((k,v) for k,v in d.iteritems() if k not in filt_key_list)
             for d in read_in_conts]

Or, if you want a slightly more factored version:

filter_out_keys = lambda d, x: dict((k,v) for k,v in d.iteritems() if k not in x)
cont_list = [filter_out_keys(d, filt_key_list) for d in read_in_conts]

P.S. I suggest making filt_key_list a set(); it will make the "in" checks faster.
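
The set() suggestion can be checked with a small sketch. It assumes Python 3 (so items() stands in for iteritems()), and the 1000-key ignore list and the sample document are made up for illustration. Both containers produce the same filtered dict; only the lookup cost differs, and the timings depend on the machine, so they are printed rather than claimed.

```python
import timeit

# Hypothetical data for illustration: keys F0..F999 to ignore,
# held once as a list and once as a set.
ignored_list = ["F%d" % i for i in range(1000)]
ignored_set = set(ignored_list)

# Hypothetical document: F1 and F2 are ignored, F1000 and G1 are not.
doc = {"F1": 1, "F2": True, "F1000": "x", "G1": "kept"}

def filter_out_keys(d, ignored):
    """Return a copy of d without the keys found in ignored."""
    return {k: v for k, v in d.items() if k not in ignored}

# Same result either way; a list is scanned linearly, a set is hashed.
result_list = filter_out_keys(doc, ignored_list)
result_set = filter_out_keys(doc, ignored_set)

# Timings vary by machine, so they are only printed.
print("list:", timeit.timeit(lambda: filter_out_keys(doc, ignored_list), number=1000))
print("set: ", timeit.timeit(lambda: filter_out_keys(doc, ignored_set), number=1000))
```

With only two ignored keys, as in the question, the difference is unlikely to be noticeable; it matters as the ignore list grows.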

Answer 1 (score: 1)

def filter_dict(d, keys):
    # build a new dict without the entries whose key is in `keys`
    return dict((key, value) for key, value in d.iteritems() if key not in keys)

cont_list = [filter_dict(d, filt_key_list) for d in read_in_conts]

Answer 2 (score: 1)

Your code is fine. You can shorten it a bit:

# sets can be faster if `ignored_keys` is actually much longer
ignored_keys = set(["F2", "F5"]) 

# the inline version of your loop
# a dict comprehension inside a list comprehension 
filtered = [{k : v for k,v in row.iteritems() if k not in ignored_keys}
            for row in read_in_conts]
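
The snippets in this thread target Python 2, where dicts have iteritems(). As a minimal sketch, assuming Python 3 (where iteritems() no longer exists and items() is used instead), the same filtering on the question's sample data might look like this:

```python
import pprint

# Fields to drop, written as a set literal.
ignored_keys = {"F2", "F5"}

# Sample documents copied from the question.
read_in_conts = [
    {"F1": 1, "F2": True,  "F3": "abc", "F4": 130, "F5": "X1Z"},
    {"F1": 2, "F2": False, "F3": "efg", "F4": 100, "F5": "X4Z"},
    {"F1": 3, "F2": True,  "F3": "acd", "F4": 400, "F5": "X2Z"},
]

# A dict comprehension inside a list comprehension;
# items() is the Python 3 replacement for iteritems().
filtered = [{k: v for k, v in row.items() if k not in ignored_keys}
            for row in read_in_conts]

pprint.pprint(filtered)
```

The original documents are left unchanged; each entry of filtered is a new dict.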