Converting a nested dictionary in a DataFrame?

Time: 2019-02-14 17:18:42

Tags: python pandas dataframe

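I have been trying to parse a nested dictionary inside a DataFrame. I built this df from a dict, but I can't figure out how to handle the nested part.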

df

   First  second  third
0      1       2  {nested dict}

The nested dict:

   {'fourth': '4', 'fifth': '5', 'sixth': '6'}, {'fourth': '7', 'fifth': '8', 'sixth': '9'}

My expected output is:

   First  second  fourth  fifth  sixth  fourth  fifth  sixth
0      1       2       4      5      6       7      8      9

Edit: the original dictionary

   'archi': [{'fourth': '115',
      'fifth': '-162',
      'sixth': '112'},
     {'fourth': '52',
      'fifth': '42',
      'sixth': ' 32'}]

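3 answers:

Answer 0: (score: 1)

I can't tell the exact format of the nested dictionary in the "third" column, but here is what I suggest, using "Python: Pandas dataframe from Series of dict" as a starting point. Here is a reproducible dictionary and DataFrame: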

import pandas as pd

nst_dict = {'archi': [{'fourth': '115', 'fifth': '-162', 'sixth': '112'},
      {'fourth': '52', 'fifth': '42','sixth': ' 32'}]}

df = pd.DataFrame.from_dict({'First':[1,2], 'Second':[2,3], 
     'third': [nst_dict,nst_dict]})

Then you need to access the list inside the dictionary first, and then the items inside that list:

thrd_1 = df.third.apply(lambda x: x['archi'])  # extract the list from each dict
thrd_1a = thrd_1.apply(lambda x: x[0])  # first item of each list
thrd_1b = thrd_1.apply(lambda x: x[1])  # second item of each list

# expand each dict into columns, then join everything on the index
out = df.drop('third', axis=1).merge(
    thrd_1a.apply(pd.Series).merge(thrd_1b.apply(pd.Series),
    left_index=True, right_index=True),
    left_index=True, right_index=True)

print(out)

   First  Second fourth_x fifth_x sixth_x fourth_y fifth_y sixth_y
0      1       2      115    -162     112       52      42      32
1      2       3      115    -162     112       52      42      32

I will try to clean this up with collections.abc and turn it into a function, but this should solve your specific case.
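The steps above could be wrapped into a reusable function along these lines. This is only a minimal sketch: `expand_nested` and its parameters are hypothetical names introduced here, it does not use collections.abc, and it assumes every cell in the column is a dict holding a list of flat dicts under `key`. It uses positional suffixes (`_0`, `_1`) instead of the `_x`/`_y` suffixes that merge produces.

```python
import pandas as pd

nst_dict = {'archi': [{'fourth': '115', 'fifth': '-162', 'sixth': '112'},
                      {'fourth': '52', 'fifth': '42', 'sixth': ' 32'}]}
df = pd.DataFrame({'First': [1, 2], 'Second': [2, 3],
                   'third': [nst_dict, nst_dict]})


def expand_nested(frame, column, key):
    """Hypothetical helper: spread frame[column][key] (a list of dicts) into columns."""
    lists = frame[column].apply(lambda d: d[key])
    width = lists.apply(len).max()  # longest list decides how many column groups we need
    pieces = [frame.drop(columns=column)]
    for pos in range(width):
        # Rows whose list is shorter than `width` contribute an empty dict (NaN cells).
        rows = [items[pos] if pos < len(items) else {} for items in lists]
        pieces.append(pd.DataFrame(rows, index=frame.index).add_suffix(f'_{pos}'))
    return pd.concat(pieces, axis=1)


out = expand_nested(df, 'third', 'archi')
print(out)
```

Building each piece with the original index keeps the rows aligned when the pieces are concatenated side by side.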

Answer 1: (score: 0)

A "brute force" approach

import pandas as pd
import numpy as np

my_dict = {'Zero': 0, 'First': 1, 'Second': 2,
       'archi': [{'fourth': '115', 'fifth': '-162', 'sixth': '112'},
                {'fourth': '52', 'fifth': '42', 'sixth': ' 32'}]}

data_row=[]
columns = []
for key in my_dict.keys():
    try:
        if len(my_dict[key]):
            for item in my_dict[key]:
                # iterate over nested dicts
                for k, v in item.items():
                    columns.append(k)
                    data_row.append(v)

    except TypeError:
        data_row.append(my_dict[key])
        columns.append(key)

print(columns)
print(data_row)

# note: np.array casts the mixed int/str values to strings
data = np.array(data_row).reshape(1, -1)
df = pd.DataFrame(data, columns=columns)
print(df)

Output:

     Zero   First   Second   fourth     fifth   sixth   fourth  fifth   sixth
0       0       1        2      115      -162     112      52      42      32
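Because the row passes through `np.array`, the integers end up as strings alongside the nested values. If that matters, one possible variation (a sketch, not part of the original answer) is to hand the row to pandas as a plain list. The `columns` and `data_row` values below are written out by hand to match what the loop above collects.

```python
import pandas as pd

# Same contents the loop above produces for my_dict, written out for a self-contained example.
columns = ['Zero', 'First', 'Second',
           'fourth', 'fifth', 'sixth', 'fourth', 'fifth', 'sixth']
data_row = [0, 1, 2, '115', '-162', '112', '52', '42', ' 32']

# A list containing one row; pandas infers a dtype per column, so the ints stay ints.
df = pd.DataFrame([data_row], columns=columns)
print(df.dtypes)

# Column labels are duplicated, so selecting 'fourth' returns both columns.
print(df['fourth'])
```

Duplicate column labels are allowed by pandas, but selecting by label then returns a DataFrame rather than a Series, which is worth keeping in mind for later processing.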

Answer 2: (score: 0)

I wrote a function that uses recursion to flatten the dict structure, and then created the DataFrame from the result:

original_dict = {'Zero': 0, 'First': 1, 'Second': 2,
       'archi': [{'fourth': '115', 'fifth': '-162', 'sixth': '112'},
                {'fourth': '52', 'fifth': '42', 'sixth': ' 32'}]}

flattened_dict = {}

def flatten(obj, name = ''):
    if isinstance(obj, dict):
        for key, value in obj.items():
            flatten(obj[key], key)
    elif isinstance(obj, list):
        for e in obj:
            flatten(e)
    else:
        flattened_dict[name] = [obj] 

flatten(original_dict)

The resulting output was posted as an image, which is not preserved here.
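One thing to be aware of with the function above: it keys the result by leaf name only, so the second dict in the 'archi' list overwrites the first ('fourth', 'fifth' and 'sixth' appear in both). The sketch below is a hypothetical variant, not the answerer's code: `flatten_indexed` and its `suffix` and `out` parameters are names introduced here. It appends the list position to each key so both dicts survive, and it returns the result instead of writing to a global. The final `pd.DataFrame(...)` call is an assumption about how the DataFrame was built, since that step is missing from the post.

```python
import pandas as pd

original_dict = {'Zero': 0, 'First': 1, 'Second': 2,
                 'archi': [{'fourth': '115', 'fifth': '-162', 'sixth': '112'},
                           {'fourth': '52', 'fifth': '42', 'sixth': ' 32'}]}


def flatten_indexed(obj, name='', suffix='', out=None):
    """Hypothetical variant of flatten(): keys under a list get a positional suffix."""
    if out is None:
        out = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flatten_indexed(value, key, suffix, out)
    elif isinstance(obj, list):
        for pos, item in enumerate(obj):
            # Carry the list position down so leaf keys become e.g. 'fourth_0', 'fourth_1'.
            flatten_indexed(item, name, f'{suffix}_{pos}', out)
    else:
        out[name + suffix] = [obj]  # one-element list -> one-row DataFrame
    return out


flat = flatten_indexed(original_dict)
df = pd.DataFrame(flat)
print(df)
```

Wrapping each leaf in a one-element list, as the original function does, is what lets the DataFrame constructor treat the flattened dict as a single row.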