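Creating a Pandas DataFrame from a dictionary nested inside a nested dictionary

Asked: 2020-10-24 17:08:21

Tags: python pandas dictionary

I scraped an election website for the U.S. state of Pennsylvania, and here is a sample of the nested dictionary found in the site's JSON: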

some_dict = {'Election': {'Statewide': [{'ADAMS': [{'CandidateName': 'BIDEN, JOSEPH '
                                                     'ROBINETTE JR',
                                    'CountyName': 'ADAMS',
                                    'ElectionDayNoVotes': '0',
                                    'ElectionDayVotes': '1',
                                    'ElectionDayYesVotes': '0',
                                    'ElectionYear': '2020'
                                    },
                                   {'CandidateName': 'TRUMP, DONALD J. ',
                                    'CountyName': 'ADAMS',
                                    'ElectionDayNoVotes': '0',
                                    'ElectionDayVotes': '1',
                                    'ElectionDayYesVotes': '0',
                                    'ElectionYear': '2020'
                                    }],
                         'ALLEGHENY': [{'CandidateName': 'BIDEN, JOSEPH '
                                                         'ROBINETTE JR',
                                        'CountyName': 'ALLEGHENY',
                                        'ElectionDayNoVotes': '0',
                                        'ElectionDayVotes': '1',
                                        'ElectionDayYesVotes': '0',
                                        'ElectionYear': '2020'
                                       },
                                       {'CandidateName': 'TRUMP, DONALD '
                                                         'J. ',
                                        'CountyName': 'ALLEGHENY',
                                        'ElectionDayNoVotes': '0',
                                        'ElectionDayVotes': '1',
                                        'ElectionDayYesVotes': '0',
                                        'ElectionYear': '2020'
                                       }]}]}}

I don't know how to convert it into a data frame like the following:

[image: the desired data frame]

2 answers:

Answer 0 (score: 1)

import pandas as pd

some_dict = {'Election': {'Statewide': [{'ADAMS': [{'CandidateName': 'BIDEN, JOSEPH '
                                                     'ROBINETTE JR',
                                    'CountyName': 'ADAMS',
                                    'ElectionDayNoVotes': '0',
                                    'ElectionDayVotes': '1',
                                    'ElectionDayYesVotes': '0',
                                    'ElectionYear': '2020'
                                    },
                                   {'CandidateName': 'TRUMP, DONALD J. ',
                                    'CountyName': 'ADAMS',
                                    'ElectionDayNoVotes': '0',
                                    'ElectionDayVotes': '1',
                                    'ElectionDayYesVotes': '0',
                                    'ElectionYear': '2020'
                                    }],
                         'ALLEGHENY': [{'CandidateName': 'BIDEN, JOSEPH '
                                                         'ROBINETTE JR',
                                        'CountyName': 'ALLEGHENY',
                                        'ElectionDayNoVotes': '0',
                                        'ElectionDayVotes': '1',
                                        'ElectionDayYesVotes': '0',
                                        'ElectionYear': '2020'
                                       },
                                       {'CandidateName': 'TRUMP, DONALD '
                                                         'J. ',
                                        'CountyName': 'ALLEGHENY',
                                        'ElectionDayNoVotes': '0',
                                        'ElectionDayVotes': '1',
                                        'ElectionDayYesVotes': '0',
                                        'ElectionYear': '2020'
                                       }]}]}}


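# Build one frame per county, then concatenate once at the end
# (calling pd.concat inside the loop re-copies the accumulated rows each time).
frames = []
for d in some_dict['Election']['Statewide']:
    for county, rows in d.items():
        t = pd.DataFrame(rows)
        t['CountyName'] = county  # the county key matches the CountyName field
        frames.append(t)

df = pd.concat(frames, ignore_index=True)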
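Every value in the scraped dictionary is a string, including the vote counts and the year, and some candidate names carry a trailing space. The following is a minimal sketch of one way to clean this up after the frame is built; the two sample rows and the `numeric_cols` list are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

# Two sample rows shaped like the records in the question (all values are strings).
df = pd.DataFrame([
    {'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR', 'CountyName': 'ADAMS',
     'ElectionDayNoVotes': '0', 'ElectionDayVotes': '1',
     'ElectionDayYesVotes': '0', 'ElectionYear': '2020'},
    {'CandidateName': 'TRUMP, DONALD J. ', 'CountyName': 'ADAMS',
     'ElectionDayNoVotes': '0', 'ElectionDayVotes': '1',
     'ElectionDayYesVotes': '0', 'ElectionYear': '2020'},
])

# Assumed list of columns that should hold numbers rather than strings.
numeric_cols = ['ElectionDayNoVotes', 'ElectionDayVotes',
                'ElectionDayYesVotes', 'ElectionYear']
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col])

# Remove the stray whitespace around candidate names.
df['CandidateName'] = df['CandidateName'].str.strip()

print(df.dtypes)
```

With numeric columns in place, aggregations such as `df.groupby('CountyName')['ElectionDayVotes'].sum()` add up votes instead of concatenating strings.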

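Answer 1 (score: 0)

Solution

You can do this in either of two ways:

  • Method 1: use pd.read_json()
  • Method 2: use pd.DataFrame(). The DataFrame() constructor accepts, among other inputs:
    • a single dict

      The keys are the column names and the values are the column values.

    • a list of dicts

      Each list item is one row of the data frame, represented as a dict: the keys are the column names and the values are that row's values.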
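To make the two input shapes concrete, here is a small sketch that builds the same frame both ways. The two-column sample data and the variable names `by_column` and `by_row` are made up for illustration.

```python
import pandas as pd

# Shape 1: a single dict mapping each column name to its list of values.
by_column = pd.DataFrame({
    'CountyName': ['ADAMS', 'ALLEGHENY'],
    'ElectionDayVotes': ['1', '1'],
})

# Shape 2: a list of dicts, one dict per row.
by_row = pd.DataFrame([
    {'CountyName': 'ADAMS', 'ElectionDayVotes': '1'},
    {'CountyName': 'ALLEGHENY', 'ElectionDayVotes': '1'},
])

# Both constructions should describe the same table.
print(by_column.equals(by_row))
```

The scraped election data is already organized row by row (one dict per candidate per county), which is why the list-of-dicts shape is the natural fit here.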

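Code

Here we use the list-of-dicts approach to create the data frame. First, we convert the data into a list of dicts with the custom function prepare_records(), and then apply either of the two methods.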

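import json
from io import StringIO

import pandas as pd

# prepare records (prepare_records() and data are defined further down)
records = prepare_records(data)

# Method 1: using read_json()
# Recent pandas versions deprecate passing a literal JSON string,
# so the string is wrapped in StringIO.
df = pd.read_json(StringIO(json.dumps(records)), orient='records')

# Method 2: using DataFrame()
df = pd.DataFrame(data=records)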

Output

# print(df.to_markdown(index=False))

| CandidateName              | CountyName   |   ElectionDayNoVotes |   ElectionDayVotes |   ElectionDayYesVotes |   ElectionYear |
|:---------------------------|:-------------|---------------------:|-------------------:|----------------------:|---------------:|
| BIDEN, JOSEPH ROBINETTE JR | ADAMS        |                    0 |                  1 |                     0 |           2020 |
| TRUMP, DONALD J.           | ADAMS        |                    0 |                  1 |                     0 |           2020 |
| BIDEN, JOSEPH ROBINETTE JR | ALLEGHENY    |                    0 |                  1 |                     0 |           2020 |
| TRUMP, DONALD J.           | ALLEGHENY    |                    0 |                  1 |                     0 |           2020 |

Custom function

# custom function
def prepare_records(data):
    records = []
    for county in data['Election']['Statewide'][0].values(): 
        records.extend(county) # same as: records += county
    return records
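The function above reads only the first element of the 'Statewide' list. If the real JSON spreads counties across several elements of that list (an assumption, since the sample shows only one), a variant along these lines would collect them all. The name `prepare_all_records` and the tiny `sample` dict are hypothetical and used only for illustration.

```python
from itertools import chain

import pandas as pd


def prepare_all_records(data):
    """Flatten every county list in every 'Statewide' entry into one list of rows."""
    statewide = data['Election']['Statewide']
    return list(chain.from_iterable(
        rows for entry in statewide for rows in entry.values()
    ))


# Hypothetical input with counties split across two 'Statewide' entries.
sample = {'Election': {'Statewide': [
    {'ADAMS': [{'CandidateName': 'A', 'CountyName': 'ADAMS'}]},
    {'ALLEGHENY': [{'CandidateName': 'B', 'CountyName': 'ALLEGHENY'}]},
]}}

df = pd.DataFrame(prepare_all_records(sample))
print(df)
```

When the list has a single element, as in the sample data, this returns the same records as prepare_records().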

Dummy data

data = {
    'Election': 
        {'Statewide': [
            {
                'ADAMS': [
                    {
                        'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR',
                        'CountyName': 'ADAMS',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                    {
                        'CandidateName': 'TRUMP, DONALD J.',
                        'CountyName': 'ADAMS',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                ],
                'ALLEGHENY': [
                    {
                        'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR',
                        'CountyName': 'ALLEGHENY',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                    {
                        'CandidateName': 'TRUMP, DONALD J.',
                        'CountyName': 'ALLEGHENY',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                ],
            },
        ],
    }
}