Merging dictionaries by using the keys of one to scan the values of the other

Asked: 2012-12-02 02:29:51

Tags: python dictionary

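I need help merging two dictionaries, using the keys of one to look up the values of the other. If a key is found among those values, its own values should be appended to the matching entry in the other dictionary (updating it without overwriting the values already there).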

Code (sorry, this is my first custom script):

import re

otuid2clusteridlist = dict()
finallist = otuid2clusteridlist
clusterid2denoiseidlist = dict()

#first block, also = finallist we append all other blocks into.
for line in open('cluster_97.ucm', 'r'):
    lineArray = re.split(r'\s+', line)
    otuid = lineArray[0]
    clusterid = lineArray[3]
    if otuid in otuid2clusteridlist:
        otuid2clusteridlist[otuid].append(clusterid)
    else:
        otuid2clusteridlist[otuid] = list()
        otuid2clusteridlist[otuid].append(clusterid)

#second block, higher tier needs to expand previous blocks hash
for line in open('denoise.ucm_test', 'r'):
    lineArray = re.split(r'\s+', line)
    clusterid = lineArray[4]
    denoiseid = lineArray[3]
    if clusterid in clusterid2denoiseidlist:
        clusterid2denoiseidlist[clusterid].append(denoiseid)
    else:
        clusterid2denoiseidlist[clusterid] = list()
        clusterid2denoiseidlist[clusterid].append(denoiseid)  

#print/return function for testing (will convert to write out later)
for key in finallist:
    print "OTU:", key, "has", len(finallist[key]), "sequence(s) which", "=", finallist[key]

The first block returns:

OTU: 3 has 3 sequence(s) which = ['5PLAS.R2.h_35336', 'GG13_52054', 'GG13_798']
OTU: 5 has 1 sequence(s) which = ['DEX1.h_14175']
OTU: 4 has 1 sequence(s) which = ['PLAS.h_34150']
OTU: 7 has 1 sequence(s) which = ['DEX12.13.h_545']
OTU: 6 has 1 sequence(s) which = ['GG13_45705']

The second block returns:

OTU: GG13_45705 has 4 sequence(s) which = ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']

So the goal is to merge the second block's output into the first. I would like it to be added like this:

...
 OTU: 6 has 4 sequence(s) which = ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']

I tried dict.update, but it just adds the second block's contents to the first as a new entry, because that key does not exist in the first block.
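To illustrate why dict.update does not help here, a minimal sketch using values from the output shown above. The names first, second and merged are made up for this example, and Python 3 print syntax is assumed:

```python
# Values copied from the sample output in the question.
first = {'6': ['GG13_45705']}
second = {'GG13_45705': ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']}

merged = dict(first)   # shallow copy so `first` itself is left unchanged
merged.update(second)  # update() only compares keys, it never looks at values

# 'GG13_45705' is not a key of `first`, so it is added as a new entry
# and the list stored under '6' stays as it was.
print(sorted(merged))
print(merged['6'])
```

The entry for OTU 6 still holds a single sequence, which matches the behaviour described above.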

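I think my problem is more complicated than that: I need the second block to look at the values of the first block and append its own values to the matching list.

I have been trying loops and .append (similar to the code already written), but I lack the overall Python knowledge to solve this.

Any ideas?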

Addendum:

Some subsets of the data:

cluster_97.ucm (the file for block one):

5 376 * DEX1.h_14175 DEX1.h_14175
6 294 * GG13_45705 GG13_45705
0 447 98.7 DEX22.h_37221 DEX29.h_4583
1 367 98.9 DEX14.15.h_35477 DEX27.h_779
1 443 98.4 DEX27.h_3794 DEX27.h_779
0 478 97.9 DEX22.h_7519 DEX29.h_4583

denoise.ucm_test (the file for block two):

11 294 * GG13_45705 GG13_45705
11 278 99.6 GG13_6312 GG13_45705
11 285 99.6 GG13_32148 GG13_45705
11 275 99.6 GG13_35246 GG13_45705

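I chose these subsets because the second line of file one is the entry that file two would update.

In case anyone wants to give it a try.

1 Answer:

Answer 0: (score: 0)

Updated to reflect matching on values...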

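I think the solution to your problem lies in the fact that lists in Python are mutable, and variables that hold mutable values are just references. So we can use a second dictionary to map each value to the list that contains it.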

import re

otuid2clusteridlist = dict()
finallist = otuid2clusteridlist
clusterid2denoiseidlist = dict()
known_clusters = dict()

#first block, also = finallist we append all other blocks into.
for line in open('cluster_97.ucm', 'r'):
    lineArray = re.split(r'\s+', line)
    otuid = lineArray[0]
    clusterid = lineArray[3]
    if otuid in otuid2clusteridlist:
        otuid2clusteridlist[otuid].append(clusterid)
    else:
        otuid2clusteridlist[otuid] = list()
        otuid2clusteridlist[otuid].append(clusterid)

    # remember the clusters
    known_clusters[clusterid] = otuid2clusteridlist[otuid]

#second block, higher tier needs to expand previous blocks hash
for line in open('denoise.ucm_test', 'r'):
    lineArray = re.split(r'\s+', line)
    clusterid = lineArray[4]
    denoiseid = lineArray[3]
    if clusterid in clusterid2denoiseidlist:
        clusterid2denoiseidlist[clusterid].append(denoiseid)
    else:
        clusterid2denoiseidlist[clusterid] = list()
        clusterid2denoiseidlist[clusterid].append(denoiseid)

    # match the cluster and update as needed
    matched_cluster = known_clusters.setdefault(clusterid, [])
    if denoiseid not in matched_cluster:
        matched_cluster.append(denoiseid)



#print/return function for testing (will convert to write out later)
for key in finallist:
    print "OTU:", key, "has", len(finallist[key]), "sequence(s) which", "=", finallist[key]

I wasn't sure whether you need clusterid2denoiseidlist, so I added a new dictionary, known_clusters, to hold the mapping from values to lists.
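The reason known_clusters works is that its values are the very same list objects stored in otuid2clusteridlist, not copies. A minimal sketch of that aliasing in isolation, assuming Python 3 print syntax (otu_to_sequences and by_sequence are illustrative names, not part of the code above):

```python
otu_to_sequences = {'6': ['GG13_45705']}

# Store the *same* list object under a second key, as known_clusters does.
by_sequence = {}
for otu, sequences in otu_to_sequences.items():
    for seq in sequences:
        by_sequence[seq] = sequences  # a reference to the list, not a copy

# Appending through one dictionary is visible through the other.
by_sequence['GG13_45705'].append('GG13_6312')

print(otu_to_sequences['6'])
print(by_sequence['GG13_45705'] is otu_to_sequences['6'])
```

Note that this only holds while the list is modified in place (append, extend). Rebinding a key to a new list, for example with `otu_to_sequences['6'] = otu_to_sequences['6'] + [...]`, would break the link.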

I'm not sure I have covered all the edge cases of your real problem, but with the test input provided this produces the desired output.
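One edge case worth checking: with setdefault, a cluster id from the second file that matches no value from the first file gets a fresh list that no OTU refers to, so its sequences never show up in the printed result. The sketch below is one possible variation, not the code above: it takes lines instead of file names, uses str.split and collections.defaultdict, and returns the unmatched cluster ids instead of storing them. The function name merge_clusters, the short-line check and the last sample line are assumptions made for illustration, and Python 3 is assumed.

```python
from collections import defaultdict


def merge_clusters(cluster_lines, denoise_lines):
    """Return (OTU id -> sequence ids, cluster ids that matched no OTU)."""
    otu_to_sequences = defaultdict(list)
    known_clusters = {}  # sequence id -> the list object that contains it

    for line in cluster_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # assumption: skip blank or short lines
        otuid, clusterid = fields[0], fields[3]
        otu_to_sequences[otuid].append(clusterid)
        known_clusters[clusterid] = otu_to_sequences[otuid]

    unmatched = set()
    for line in denoise_lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        denoiseid, clusterid = fields[3], fields[4]
        target = known_clusters.get(clusterid)
        if target is None:
            unmatched.add(clusterid)  # no OTU lists this cluster id
        elif denoiseid not in target:
            target.append(denoiseid)

    return dict(otu_to_sequences), unmatched


cluster_lines = [
    "5 376 * DEX1.h_14175 DEX1.h_14175",
    "6 294 * GG13_45705 GG13_45705",
]
denoise_lines = [
    "11 294 * GG13_45705 GG13_45705",
    "11 278 99.6 GG13_6312 GG13_45705",
    "11 285 99.6 GG13_32148 GG13_45705",
    "11 275 99.6 GG13_35246 GG13_45705",
    "12 300 * XYZ_1 XYZ_1",  # hypothetical line, not from the question's data
]

result, unmatched = merge_clusters(cluster_lines, denoise_lines)
print(result['6'])
print(sorted(unmatched))
```

With real files, the same function could be called as `merge_clusters(open('cluster_97.ucm'), open('denoise.ucm_test'))`, since file objects iterate line by line.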