Expand nested dictionaries within a DataFrame

Date: 2019-04-12 11:22:26

Tags: python pandas dataframe

I would like to reformat my nested dictionary before writing it out to a CSV file. My nested dictionary:

review = {'Q1': {'Question': 'question wording', 'Answer': {'Part 1': 'Answer part one', 'Part 2': 'Answer part 2'}, 'Proof': {'Part 1': 'The proof part one', 'Part 2': 'The proof part 2'}},
          'Q2': {'Question': 'question wording', 'Answer': {'Part 1': 'Answer part one', 'Part 2': 'Answer part 2'}, 'Proof': {'Part 1': 'The proof part one', 'Part 2': 'The proof part 2'}}}

So far I have tried:

import pandas as pd

my_df = pd.DataFrame(review)
my_df = my_df.unstack()

This gets me partway there, but the Answer and Proof values are still dictionaries:

Q1  Answer      {'Part 1': 'Answer part one', 'Part 2': 'Answe...
    Proof       {'Part 1': 'The proof part one', 'Part 2': 'Th...
    Question                                     question wording
Q2  Answer      {'Part 1': 'Answer part one', 'Part 2': 'Answe...
    Proof       {'Part 1': 'The proof part one', 'Part 2': 'Th...
    Question                                     question wording

So I need to melt/unstack/pivot/expand/other_manipulation_word the nested dictionaries in the DataFrame.
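A minimal check, reusing the question's `review` data (the name `s` is only for illustration), shows why `unstack()` gets only partway: it returns a Series indexed by (question, field) pairs, and the Answer and Proof entries are still plain dicts.

```python
import pandas as pd

# Same data as the question: both questions have identical contents
review = {
    q: {'Question': 'question wording',
        'Answer': {'Part 1': 'Answer part one', 'Part 2': 'Answer part 2'},
        'Proof': {'Part 1': 'The proof part one', 'Part 2': 'The proof part 2'}}
    for q in ('Q1', 'Q2')
}

# Outer keys (Q1, Q2) become columns; inner keys become the row index
my_df = pd.DataFrame(review)

# With a flat (non-Multi) index, DataFrame.unstack() returns a Series
# whose MultiIndex is (column, row), i.e. (question, field)
s = my_df.unstack()
print(type(s).__name__)                 # Series
print(type(s.loc[('Q1', 'Answer')]))    # <class 'dict'>
```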

I have used this as a guide but could not apply it to my own case: Expand pandas dataframe column of dict into dataframe columns

1 answer:

Answer 0 (score: 2)

Here is one possible solution:

1) Create the initial DataFrame with orient='index':

df = pd.DataFrame.from_dict(review, orient='index')
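Step 1 can be checked in isolation. In this minimal sketch, reusing the question's data, `orient='index'` turns the outer keys (Q1, Q2) into row labels and the inner keys into columns, while each Answer and Proof cell still holds a whole dict.

```python
import pandas as pd

# Same data as the question: both questions have identical contents
review = {
    q: {'Question': 'question wording',
        'Answer': {'Part 1': 'Answer part one', 'Part 2': 'Answer part 2'},
        'Proof': {'Part 1': 'The proof part one', 'Part 2': 'The proof part 2'}}
    for q in ('Q1', 'Q2')
}

# orient='index': outer keys -> rows, inner keys -> columns
df = pd.DataFrame.from_dict(review, orient='index')
print(df.index.tolist())    # ['Q1', 'Q2']
print(df.columns.tolist())  # ['Question', 'Answer', 'Proof']

# The nested dicts are kept as single cell values
print(type(df.loc['Q1', 'Answer']))  # <class 'dict'>
```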

2) Use Index.repeat, Series.str.len and DataFrame.loc to create the shape of the final DataFrame (one row per part):

df_new = df.loc[df.index.repeat(df.Answer.str.len())]
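To make step 2 concrete, here is a minimal sketch (reusing the question's data; `parts_per_question` is an illustrative name, not from the answer) that prints the repeated index. Each question label appears once per part, so `df_new` has one row per (question, part) pair, with Answer and Proof still holding the whole dicts until step 3 overwrites them.

```python
import pandas as pd

# Same data as the question: both questions have identical contents
review = {
    q: {'Question': 'question wording',
        'Answer': {'Part 1': 'Answer part one', 'Part 2': 'Answer part 2'},
        'Proof': {'Part 1': 'The proof part one', 'Part 2': 'The proof part 2'}}
    for q in ('Q1', 'Q2')
}

df = pd.DataFrame.from_dict(review, orient='index')

# Number of parts per question: len() of each Answer dict
parts_per_question = df.Answer.str.len()

# Repeat each index label once per part; .loc then duplicates the rows
df_new = df.loc[df.index.repeat(parts_per_question)]
print(df_new.index.tolist())  # ['Q1', 'Q1', 'Q2', 'Q2']
```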

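3) Fix the Answer and Proof columns by passing each column's list of dicts to the DataFrame constructor, stacking the result, and assigning the stacked values: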

df_new['Answer'] = pd.DataFrame(df.Answer.tolist()).stack().values
df_new['Proof'] = pd.DataFrame(df.Proof.tolist()).stack().values
print(df_new)

            Question           Answer               Proof
Q1  question wording  Answer part one  The proof part one
Q1  question wording    Answer part 2    The proof part 2
Q2  question wording  Answer part one  The proof part one
Q2  question wording    Answer part 2    The proof part 2
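Step 3 relies on each question's Answer and Proof dicts having the same number of parts, and on stack() emitting values in the same row-major order as the repeated index. As an alternative sketch (a swapped-in technique, not the answerer's method), the rows can be built in plain Python and then written to CSV, which was the question's end goal. The helper `expand_parts`, its `part_fields` parameter and the `id`/`Part` column names are hypothetical choices for illustration.

```python
import pandas as pd

# Same data as the question: both questions have identical contents
review = {
    q: {'Question': 'question wording',
        'Answer': {'Part 1': 'Answer part one', 'Part 2': 'Answer part 2'},
        'Proof': {'Part 1': 'The proof part one', 'Part 2': 'The proof part 2'}}
    for q in ('Q1', 'Q2')
}


def expand_parts(nested, part_fields=('Answer', 'Proof')):
    """Hypothetical helper: return one row per (question id, part key)."""
    rows = []
    for qid, entry in nested.items():
        # Part keys come from the first field (Answer); the other fields
        # are looked up by the same key, giving None if a key is missing
        for part in entry[part_fields[0]]:
            row = {'id': qid, 'Part': part, 'Question': entry['Question']}
            for field in part_fields:
                row[field] = entry[field].get(part)
            rows.append(row)
    return pd.DataFrame(rows).set_index('id')


df_new = expand_parts(review)
print(df_new)

# to_csv() with no path returns the CSV text as a string
csv_text = df_new.to_csv()
print(csv_text)
```

Because the Proof values are matched to the Answer parts by key rather than by position, the Part labels also survive into the CSV; pass a file path to `to_csv` to write to disk instead.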