Question

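I have a table with the headers A B C D. The values under D are indexed by the values under A, B and C.

I also have a list of objects indexed by the values contained in columns A and B, i.e. (A, B).

For each object, I want to write to a file all entries in the table that have the same A, B index as my object.

This is what I am doing: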

import csv

prescriptions = {}

# Open the ABCD table and create a dictionary mapping (A, B, C) to D
with open(table_file) as table:
    reader = csv.reader(table, delimiter='\t')
    for row in reader:
        code = (row[0], row[1], row[2])
        prescriptions[code] = row[3]

for x in objects:
    x_code = (x.A, x.B)

    for p in prescriptions:
        # Check whether the A, B indices on x match those of the table entry
        if p[0:2] == x_code:
            row = prescriptions[p]
            line = ",".join(p) + "," + row + "\n"
            output.write(line)

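This works, and I get exactly the output I want; but when the table and the list get large, it takes a very long time.

I would like to modify the dictionary while iterating over it (delete a p once I have found a match), but I know not to do that.

Is there anything I can do to speed this up?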

Answer 1

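I'm guessing prescriptions is a dictionary?

Why not use a second dictionary, prescriptions2, with (A, B) as the key and a list of (C, D) pairs as the value? It saves you from iterating over the whole dictionary for every object.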

import csv

prescriptions = {}
prescriptions2 = {}

# Open the ABCD table and create a dictionary mapping (A, B, C) to D,
# plus a second dictionary mapping (A, B) to a list of (C, D) pairs
with open(table_file) as table:
    reader = csv.reader(table, delimiter='\t')
    for row in reader:
        code = (row[0], row[1], row[2])
        prescriptions[code] = row[3]
        key = (row[0], row[1])
        if key not in prescriptions2:
            prescriptions2[key] = []
        value = (row[2], row[3])
        prescriptions2[key].append(value)

for x in objects:
    x_code = (x.A, x.B)
    # One dictionary lookup per object instead of a scan over every entry
    if x_code in prescriptions2:
        for item in prescriptions2[x_code]:
            line = ",".join(x_code + item) + "\n"
            output.write(line)
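
The same grouping idea can also be written with collections.defaultdict. What follows is only a minimal, self-contained sketch: the sample rows, the Obj namedtuple and the function names build_index and write_matches are hypothetical, made up for illustration, and it assumes the table has no header row and exactly four tab-separated columns.

```python
import csv
import io
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the asker's objects, which expose .A and .B
Obj = namedtuple("Obj", ["A", "B"])


def build_index(table):
    """Group the (C, D) pairs of a tab-separated A B C D table by (A, B)."""
    index = defaultdict(list)
    for a, b, c, d in csv.reader(table, delimiter="\t"):
        index[(a, b)].append((c, d))
    return index


def write_matches(objects, index, output):
    """Write one comma-separated line per table row matching each object."""
    for x in objects:
        # .get avoids inserting empty lists into the defaultdict on a miss
        for c, d in index.get((x.A, x.B), []):
            output.write(",".join((x.A, x.B, c, d)) + "\n")


# Made-up sample data, only for illustration
sample_table = io.StringIO("a1\tb1\tc1\td1\na1\tb1\tc2\td2\na2\tb2\tc3\td3\n")
sample_objects = [Obj("a1", "b1"), Obj("a9", "b9")]

index = build_index(sample_table)
result = io.StringIO()
write_matches(sample_objects, index, result)
print(result.getvalue(), end="")
```

Building the index takes one pass over the table. After that, each object costs a single dictionary lookup plus its own matching rows, rather than a scan over every table entry.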
