Flatten a nested dictionary into a list of key:value pairs

Asked: 2019-04-03 13:52:35

Tags: python json api jq

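I am querying a company's internal API for an anti-corruption investigation, and the results come back as nested JSON (an example was linked in the original post). I would like to convert this dictionary into a simple {key:value, key:value} format in which the keys of any nested objects or lists are merged into one flattened key string.

A further problem is that some items returned by the API do not necessarily contain every key:value pair, because some of them are optional. Where a key:value pair is missing, I would like to insert an NA.

This is the most complete JSON; some query results may not contain all of these entries.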

{
   "items" : [
      {
         "address" : {
            "address_line_1" : "string",
            "address_line_2" : "string",
            "care_of" : "string",
            "country" : "string",
            "locality" : "string",
            "po_box" : "string",
            "postal_code" : "string",
            "premises" : "string",
            "region" : "string"
         },
         "address_snippet" : "string",
         "appointment_count" : "integer",
         "date_of_birth" : {
            "month" : "integer",
            "year" : "integer"
         },
         "description" : "string",
         "description_identifiers" : [
            "integer"
         ],
         "kind" : "string",
         "links" : {
            "self" : "string"
         },
         "matches" : [
            {
               "address_snippet" : [
                  "integer"
               ],
               "snippet" : [
                  "integer"
               ],
               "title" : [
                  "integer"
               ]
            }
         ],
         "snippet" : "string",
         "title" : "string"
      }
   ],
   "items_per_page" : "integer",
   "kind" : "string",
   "start_index" : "integer",
   "total_results" : "integer"
}

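Reusing some old jq code, I managed to create two lists, one containing all the keys and one containing all the values (see the jqplay snippet linked in the original post).

Here is an example of just a small part of the desired dictionary, to give you an idea: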

{
   "items_address_address_line_1" : "string",
   "items_address_address_line_2" : "string",
   "items_address_care_of" : "string",
   "items_address_country" : "string",
   "items_address_locality" : "string",
   "items_address_po_box" : "string",
   "items_address_postal_code" : "string",
   "items_address_premises" : "string",
   "items_address_region" : "string"
}

2 answers:

Answer 0 (score: 0)

Assuming the items array always has exactly one element, use the --stream option:

reduce (inputs|select(length == 2)) as $p
({}; .[$p[0]|map(strings)|join("_")] = $p[1])

Because inputs is used, the -n option is also required.

Answer 1 (score: 0)

You can use pandas, specifically json_normalize:
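Since the question is also tagged python, here is a rough Python sketch of the same idea, not a drop-in replacement for the jq filter. It keeps only the string parts of each path (so list indices are dropped, like map(strings) above) and joins them with "_". It also fills in "NA" for missing optional keys, which the jq filter does not do. The names flatten, flatten_item and expected_keys, and the sample values, are made up for illustration.

```python
def flatten(node, path=()):
    """Yield (flattened_key, value) for every scalar leaf under node."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, path + (key,))
    elif isinstance(node, list):
        for child in node:
            # List indices are not added to the key; if a list has several
            # elements, later values overwrite earlier ones in the final dict.
            yield from flatten(child, path)
    else:
        yield "_".join(path), node


def flatten_item(item, expected_keys, prefix="items", missing="NA"):
    """Flatten one item and insert `missing` for every expected key that is absent."""
    result = {key: missing for key in expected_keys}
    result.update(flatten(item, (prefix,)))
    return result


# Hypothetical item with some optional fields left out.
item = {
    "address": {"locality": "London", "country": "UK"},
    "date_of_birth": {"month": 4, "year": 1970},
    "description_identifiers": ["born-on"],
}

# Hypothetical list of the keys you expect in every flattened record.
expected_keys = [
    "items_address_locality",
    "items_address_country",
    "items_address_po_box",
    "items_date_of_birth_month",
    "items_date_of_birth_year",
    "items_title",
]

flat = flatten_item(item, expected_keys)
for key, value in flat.items():
    print(key, "=", value)
```

Unlike the streaming jq filter, this sketch skips empty lists and objects entirely, so adjust it if those need to appear in the output.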

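try:
    from pandas import json_normalize  # pandas 1.0 and later
except ImportError:
    from pandas.io.json import json_normalize  # older pandas versions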

d = {
    "items" : [
        {
            "address" : {
                "address_line_1" : "string",
                "address_line_2" : "string",
                "care_of" : "string",
                "country" : "string",
                "locality" : "string",
                "po_box" : "string",
                "postal_code" : "string",
                "premises" : "string",
                "region" : "string"
            },
            "address_snippet" : "string",
            "appointment_count" : "integer",
            "date_of_birth" : {
                "month" : "integer",
                "year" : "integer"
            },
            "description" : "string",
            "description_identifiers" : [
                "integer"
            ],
            "kind" : "string",
            "links" : {
                "self" : "string"
            },
            "matches" : [
                {
                    "address_snippet" : [
                        "integer"
                    ],
                    "snippet" : [
                        "integer"
                    ],
                    "title" : [
                        "integer"
                    ]
                }
            ],
            "snippet" : "string",
            "title" : "string"
        }
    ],
    "items_per_page" : "integer",
    "kind" : "string",
    "start_index" : "integer",
    "total_results" : "integer"
}


x = json_normalize(d['items'], sep="_")
print(x.to_string())
# print(x.keys()) # handy, as you may get "lost" with many keys
# x.to_dict()  # convert the flattened frame back to a dict if needed

 address_address_line_1 address_address_line_2 address_care_of address_country address_locality address_po_box address_postal_code address_premises address_region address_snippet appointment_count date_of_birth_month date_of_birth_year description description_identifiers    kind links_self                                            matches snippet   title
0                 string                 string          string          string           string         string              string           string         string          string           integer             integer            integer      string               [integer]  string     string  [{'address_snippet': ['integer'], 'snippet': [...  string  string

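Notes:

  1. You can apply json_normalize repeatedly, as needed, to flatten nested elements (lists).
  2. I usually flatten each nested object into its own DataFrame first, then merge everything into a new master_df with all the keys flattened. I hope that makes sense; if not, please leave a comment.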
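As a rough illustration of notes 1 and 2, here is a minimal sketch. It assumes pandas 1.0 or later (where the function is available as pd.json_normalize) and that every item has exactly one entry in matches, so the two frames line up row by row. The sample data is made up and trimmed; top, matches and the "matches_" prefix are just names chosen for this example.

```python
import pandas as pd

# Hypothetical, trimmed sample with the same shape as one API item.
items = [
    {
        "title": "string",
        "address": {"locality": "string", "country": "string"},
        "matches": [{"snippet": [1, 2], "title": [3, 4]}],
    }
]

# First pass: flatten nested objects. The list-valued "matches" column is
# left untouched by json_normalize, so drop it here and handle it separately.
top = pd.json_normalize(items, sep="_").drop(columns=["matches"])

# Second pass: flatten the "matches" list into its own frame and prefix the
# columns so they do not clash with top-level keys such as "title".
matches = pd.json_normalize(items, record_path="matches", sep="_").add_prefix("matches_")

# Combine by row position (valid only under the one-match-per-item assumption).
master_df = top.join(matches)
print(master_df.to_string())
```

If items can have several matches, or none at all, joining by row position no longer works; in that case you would need to carry an identifying key into the second frame (for example via the meta argument of json_normalize) and merge on it.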