Python: building a nested dictionary and iterating over it - what is the best way to do this?

Posted: 2013-03-21 11:35:29

Tags: python csv dictionary loops

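I am using CSV files to extract data and put it into dictionaries for analysis. The CSV files live in directories, and each directory contains several CSV files: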

Example:
Directory X has several CSV files; file names look like X-ax-somefile.csv, X-bx-somefile2.csv
Each CSV file has the header: level,user-id
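The second language code is encoded in the file name itself. A minimal sketch of extracting both codes with a single regular expression, assuming names of the form `<lang>-<lang2>-template-users-data.csv` (the suffix the regex in the question's build code looks for); `FILENAME_RE` and `parse_lang_pair` are hypothetical names:

```python
import re

# Assumed naming scheme: two dash-separated codes, then the fixed suffix
# "-template-users-data.csv" used by the question's code.
FILENAME_RE = re.compile(r"([^-]+)-([^-]+)-template-users-data\.csv")


def parse_lang_pair(filename):
    """Return (lang, lang2) for a matching file name, or None otherwise."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_lang_pair("en-si-template-users-data.csv"))  # ('en', 'si')
print(parse_lang_pair("X-ax-somefile.csv"))               # None
```

Using `fullmatch` with capture groups validates the name and pulls out both codes in one step, instead of a `re.search` followed by `split("-")`.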

I built a dictionary to store the data and do some calculations. In the end I get the following data structure:

{'de': {'en': {'level1': 0, 'level2': 0, 'level3': 10}},
 'en': {'si': {'level2': 1, 'level3': 5, 'level5': 5, 'levelN': 5},
        'en': {'level1': 0},
        'ta': {'level1': 1, 'level2': 1, 'level3': 1, 'level4': 5}}}
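Because each leaf is a plain `{level: count}` dict, aggregate calculations over this structure can be written as a nested comprehension. A sketch, assuming the calculation of interest is the total number of rows per language pair (the question does not say which calculations are actually done):

```python
data = {
    'de': {'en': {'level1': 0, 'level2': 0, 'level3': 10}},
    'en': {
        'si': {'level2': 1, 'level3': 5, 'level5': 5, 'levelN': 5},
        'en': {'level1': 0},
        'ta': {'level1': 1, 'level2': 1, 'level3': 1, 'level4': 5},
    },
}

# Total count per (lang1, lang2) pair, summed over all levels.
totals = {
    (lang1, lang2): sum(levels.values())
    for lang1, inner in data.items()
    for lang2, levels in inner.items()
}

print(totals[('en', 'si')])  # 16
```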

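I wrote the following code to iterate over this data structure. Is this the best way to iterate over it? Further to that, I have also shown how I build the data structure; is that the best way to build it?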

Here is my code:

for lang1, lang2_dict in template_count.items():
    # type(x) is always truthy, so test the type explicitly instead
    if isinstance(lang2_dict, dict):
        for lang2, values in lang2_dict.items():
            print(lang2, values)
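Every second-level value is already a dict, so no type check is needed, and `dict.items()` (the Python 3 spelling of `iteritems()`) can be nested once more to reach the level counts. A sketch of a generator that flattens all three levels into tuples; `iter_counts` is a hypothetical name:

```python
def iter_counts(nested):
    """Yield (lang1, lang2, level, count) for every leaf of the nested dict."""
    for lang1, lang2_dict in nested.items():
        for lang2, levels in lang2_dict.items():
            for level, count in levels.items():
                yield lang1, lang2, level, count


template_count = {
    'de': {'en': {'level1': 0, 'level2': 0, 'level3': 10}},
    'en': {'ta': {'level1': 1, 'level4': 5}},
}

for lang1, lang2, level, count in iter_counts(template_count):
    print(lang1, lang2, level, count)
```

Keeping the traversal in one generator means the loops are written once, and callers can filter or sort the flat tuples, e.g. `sorted(iter_counts(template_count))`.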

This is how I build the dictionary:

import csv
import logging
import os
import re


def templateUserCountStats(template_file, csv_file):
    # getLanguageCodes() is my own helper (not shown); it returns the
    # language directory names to scan.
    template_count_dict = dict()
    for lang in getLanguageCodes(csv_file):
        template_count_dict[lang] = dict()
        lang_dir = os.path.join(template_file, lang)
        try:
            for filename in os.listdir(lang_dir):
                # Skip files that do not end in -<lang2>-template-users-data.csv
                if not re.search(r'-.+-template-users-data\.csv$', filename):
                    continue
                path = os.path.join(lang_dir, filename)
                lang2 = filename.split("-")[1]
                with open(path, newline='') as template_user_data_file:
                    try:
                        reader = csv.reader(template_user_data_file)
                        next(reader, None)  # skip the "level,user-id" header
                        counts = template_count_dict[lang][lang2] = {
                            'level1': 0, 'level2': 0, 'level3': 0,
                            'level4': 0, 'level5': 0, 'levelN': 0,
                        }
                        print(filename)
                        for row in reader:
                            if not row:  # csv.reader yields [] for blank lines
                                continue
                            # Compare strings: int('N') would raise ValueError.
                            if row[0] == '1':
                                counts['level1'] += 1
                            elif row[0] == '2':
                                counts['level2'] += 1
                            elif row[0] == '3':
                                counts['level3'] += 1
                            elif row[0] == '4':
                                counts['level4'] += 1
                            elif row[0] == '5':
                                counts['level5'] += 1
                            elif row[0] == 'N':
                                counts['levelN'] += 1
                    except csv.Error as e:
                        logging.error(e)
        except Exception as e:
            logging.exception(e)
    return template_count_dict
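A more compact way to build the same structure, sketched with `pathlib`, `csv.DictReader` and `dict.fromkeys`. Assumptions: `lang_codes` stands in for the result of the question's `getLanguageCodes(csv_file)`, files follow the `<lang>-<lang2>-template-users-data.csv` naming, and unexpected level values are skipped. `LEVEL_KEYS`, `count_levels` and `build_template_counts` are hypothetical names, not part of the original code.

```python
import csv
import logging
import re
from pathlib import Path

# Values expected in the "level" column and the key each one is stored under.
LEVEL_KEYS = {'1': 'level1', '2': 'level2', '3': 'level3',
              '4': 'level4', '5': 'level5', 'N': 'levelN'}
# Assumed naming scheme; group 1 captures lang2.
FILENAME_RE = re.compile(r"[^-]+-([^-]+)-template-users-data\.csv")


def count_levels(path):
    """Return {levelX: count} for one CSV file, with every level present."""
    counts = dict.fromkeys(LEVEL_KEYS.values(), 0)
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            key = LEVEL_KEYS.get(row['level'].strip())
            if key is not None:  # skip unexpected level values
                counts[key] += 1
    return counts


def build_template_counts(base_dir, lang_codes):
    """Build {lang: {lang2: {levelX: count}}} from base_dir/<lang>/*.csv."""
    result = {}
    for lang in lang_codes:
        lang_dir = Path(base_dir) / lang
        result[lang] = {}
        if not lang_dir.is_dir():
            logging.warning("missing directory: %s", lang_dir)
            continue
        for path in sorted(lang_dir.iterdir()):
            match = FILENAME_RE.fullmatch(path.name)
            if match is None:
                continue  # not a template-users-data file
            try:
                result[lang][match.group(1)] = count_levels(path)
            except (csv.Error, KeyError) as e:  # bad CSV or no "level" column
                logging.error("%s: %s", path, e)
    return result
```

`dict.fromkeys` keeps the original behaviour of pre-seeding every level at 0, and the `LEVEL_KEYS` lookup replaces the chain of `if` tests. Comparing the raw strings also means `'N'` is never passed to `int()`, which would raise `ValueError`.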

Beyond that, I am open to suggestions for a more Pythonic way to build this data structure. In case you need it, here is a sample CSV file:

level,user-id
1,25
1,74
1,105
3,708
3,530
3,2568
3,2730
3,2730
2,376
2,371
2,2317
2,2095
N,560
N,110
N,119
N,1059
N,1625
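For the sample above, the per-level tally of a single file can also be written with `collections.Counter`. A sketch that reads the sample from an in-memory string via `io.StringIO` (standing in for the real file) and then fills in the levels that never occur, so the result has the same six keys as the question's dict:

```python
import csv
import io
from collections import Counter

SAMPLE = """level,user-id
1,25
1,74
1,105
3,708
3,530
3,2568
3,2730
3,2730
2,376
2,371
2,2317
2,2095
N,560
N,110
N,119
N,1059
N,1625
"""

LEVELS = ['1', '2', '3', '4', '5', 'N']

# Count how often each value appears in the "level" column.
tally = Counter(row['level'] for row in csv.DictReader(io.StringIO(SAMPLE)))

# Rename to the question's keys; Counter returns 0 for levels with no rows.
counts = {'level' + lvl: tally[lvl] for lvl in LEVELS}
print(counts)
# {'level1': 3, 'level2': 4, 'level3': 5, 'level4': 0, 'level5': 0, 'levelN': 5}
```

Note that user 2730 appears twice at level 3 in the sample; this counts rows, not distinct users, just like the question's code.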

0 answers:

No answers yet.