Parsing nested JSON and writing it to CSV

Time: 2013-12-06 12:55:33

Tags: python json csv

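I'm struggling with this problem. I have a JSON file and need to write it out to CSV, which is fine as long as the structure is flat with no deeply nested items.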

But in this case the nested RACES is tripping me up.

How would I go about getting the data into a format like this:

VENUE, COUNTRY, ITW, RACES__NO, RACES__TIME

for each object, and for each race within that object?

 
{
    "1": {
        "VENUE": "JOEBURG",
        "COUNTRY": "HAE",
        "ITW": "XAD",
        "RACES": {
            "1": {
                "NO": 1,
                "TIME": "12:35"
            },
            "2": {
                "NO": 2,
                "TIME": "13:10"
            },
            "3": {
                "NO": 3,
                "TIME": "13:40"
            },
            "4": {
                "NO": 4,
                "TIME": "14:10"
            },
            "5": {
                "NO": 5,
                "TIME": "14:55"
            },
            "6": {
                "NO": 6,
                "TIME": "15:30"
            },
            "7": {
                "NO": 7,
                "TIME": "16:05"
            },
            "8": {
                "NO": 8,
                "TIME": "16:40"
            }
        }
    },
    "2": {
        "VENUE": "FOOBURG",
        "COUNTRY": "ABA",
        "ITW": "XAD",
        "RACES": {
            "1": {
                "NO": 1,
                "TIME": "12:35"
            },
            "2": {
                "NO": 2,
                "TIME": "13:10"
            },
            "3": {
                "NO": 3,
                "TIME": "13:40"
            },
            "4": {
                "NO": 4,
                "TIME": "14:10"
            },
            "5": {
                "NO": 5,
                "TIME": "14:55"
            },
            "6": {
                "NO": 6,
                "TIME": "15:30"
            },
            "7": {
                "NO": 7,
                "TIME": "16:05"
            },
            "8": {
                "NO": 8,
                "TIME": "16:40"
            }
        }
    }, ...
}

I want to output this as CSV:

VENUE, COUNTRY, ITW, RACES__NO, RACES__TIME
JOEBURG, HAE, XAD, 1, 12:35
JOEBURG, HAE, XAD, 2, 13:10
JOEBURG, HAE, XAD, 3, 13:40
...
...
FOOBURG, ABA, XAD, 1, 12:35
FOOBURG, ABA, XAD, 2, 13:10

So first I collect the right keys:

self.keys = self.data.keys()
keys = ["DATA_KEY"]
for key in self.keys:
    if type(self.data[key]) == dict:
        for k in self.data[key].keys():
            if k not in keys:
                if type(self.data[key][k]) == unicode:
                    keys.append(k)
                elif type(self.data[key][k]) == dict:
                    self.subkey = k
                    for sk in self.data[key][k].values():
                        for subkey in sk.keys():
                            subkey = "%s__%s" % (self.subkey, subkey)
                            if subkey not in keys:
                                keys.append(subkey)

Then add the data:

But how?

This should be a fun one for you skilled forloopers. ;-)

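1 answer:

Answer 0: (score: 3)

I'd collect the keys from the first object only, and then assume that the rest of the data follows the same format.

The following code also limits the nested object to just one; you didn't specify what should happen when there is more than one. Two or more nested structures of equal length could work (you'd zip them together), but if the structures differ in length you need to choose explicitly how to handle them: either zip them and pad with empty columns, or write out the product of the entries (A x B rows, repeating the information from A each time you find a B entry).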

import csv
from operator import itemgetter


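# assumes `data` is the parsed JSON and `outputfile` is the target path
# Python 3: text mode with newline='' (on Python 2, use open(outputfile, 'wb'))
with open(outputfile, 'w', newline='') as outf: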
    writer = None  # will be set to a csv.DictWriter later

    for key, item in sorted(data.items(), key=itemgetter(0)):
        row = {}
        nested_name, nested_items = '', {}
        for k, v in item.items():
            if not isinstance(v, dict):
                row[k] = v
            else:
                assert not nested_items, 'Only one nested structure is supported'
                nested_name, nested_items = k, v

        if writer is None:
            # build fields for each first key of each nested item first
            fields = sorted(row)

            # sorted keys of first item in key sorted order
            nested_keys = sorted(sorted(nested_items.items(), key=itemgetter(0))[0][1])
            fields.extend('__'.join((nested_name, k)) for k in nested_keys)

            writer = csv.DictWriter(outf, fields)
            writer.writeheader()

        for nkey, nitem in sorted(nested_items.items(), key=itemgetter(0)):
            row.update(('__'.join((nested_name, k)), v) for k, v in nitem.items())
            writer.writerow(row)

For your sample input, this produces:

COUNTRY,ITW,VENUE,RACES__NO,RACES__TIME
HAE,XAD,JOEBURG,1,12:35
HAE,XAD,JOEBURG,2,13:10
HAE,XAD,JOEBURG,3,13:40
HAE,XAD,JOEBURG,4,14:10
HAE,XAD,JOEBURG,5,14:55
HAE,XAD,JOEBURG,6,15:30
HAE,XAD,JOEBURG,7,16:05
HAE,XAD,JOEBURG,8,16:40
ABA,XAD,FOOBURG,1,12:35
ABA,XAD,FOOBURG,2,13:10
ABA,XAD,FOOBURG,3,13:40
ABA,XAD,FOOBURG,4,14:10
ABA,XAD,FOOBURG,5,14:55
ABA,XAD,FOOBURG,6,15:30
ABA,XAD,FOOBURG,7,16:05
ABA,XAD,FOOBURG,8,16:40
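
To make the two options for several nested structures of differing length more concrete, here is a minimal Python 3 sketch. It is not part of the code above: the `JUDGES` structure, the sample `item`, and the helper names `split_item`, `rows_zipped` and `rows_product` are all hypothetical, and it assumes the keys of each nested structure are integer strings like "1", "2", and so on.

```python
from itertools import product, zip_longest

# Hypothetical record: JUDGES is an invented second nested structure,
# deliberately shorter than RACES.
item = {
    "VENUE": "JOEBURG",
    "RACES": {"1": {"NO": 1}, "2": {"NO": 2}, "3": {"NO": 3}},
    "JUDGES": {"1": {"NAME": "A"}, "2": {"NAME": "B"}},
}


def split_item(item):
    """Separate flat values from nested structures, prefixing nested keys."""
    flat = {k: v for k, v in item.items() if not isinstance(v, dict)}
    nested = []
    for name, entries in item.items():
        if isinstance(entries, dict):
            # Assumption: keys are integer strings, so sort them numerically.
            ordered = sorted(entries.items(), key=lambda kv: int(kv[0]))
            nested.append([
                {"%s__%s" % (name, k): v for k, v in entry.items()}
                for _, entry in ordered
            ])
    return flat, nested


def rows_zipped(item):
    """Pair entries by position; shorter structures contribute no columns."""
    flat, nested = split_item(item)
    rows = []
    for combo in zip_longest(*nested, fillvalue={}):
        row = dict(flat)
        for part in combo:
            row.update(part)
        rows.append(row)
    return rows


def rows_product(item):
    """Emit one row per combination of entries (A x B rows)."""
    flat, nested = split_item(item)
    rows = []
    for combo in product(*nested):
        row = dict(flat)
        for part in combo:
            row.update(part)
        rows.append(row)
    return rows
```

With the zipped variant, rows past the end of the shorter structure simply lack those keys; `csv.DictWriter` writes its `restval` (an empty string by default) for keys missing from a row, which gives the empty padding columns. With the product variant, the sample item yields 3 x 2 = 6 rows.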