Question

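I am switching most of my data analysis from R to Python and have run into the following problem. It may be the result of a conceptual misunderstanding of groupby() on my part.

I have a Pandas DataFrame and I am trying to aggregate the data by multiple columns. The following code does what I want.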

df = df[(df["Lead Source"] != "chase") & (df["Lead Source"] != "SNE")]
ndf = df[["Date", "Lead Source", "Model Group", "Leads"]].groupby(["Date", "Lead Source"]).sum()

This looks great, but I noticed that there is only one "real" column when I run the following. (FYI, ndf2 is just a copy of ndf.)

ndf2.columns
Out[39]: Index([u'Leads'], dtype='object')

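In addition, the row on which the index names are displayed is not what I had hoped for.

How can I adjust the rows so that all of the column names appear in the first row? The output should look like this.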

Date      Lead Source    Leads
1/1/2014  ...            ... 
          ...            ...
          ...            ...

Answer 1

You can use:

ndf.reset_index()

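Note that the groupby operation creates a DataFrame with a MultiIndex. Because you grouped by Date and Lead Source, those are the names of the MultiIndex levels. Date and Lead Source are displayed one row below the column names because Pandas is trying to show that they are index level names, not columns. (Take a look at ndf.index.names.) Calling reset_index moves the index levels into columns and renumbers the index.

Or, better yet, use the as_index=False option when calling groupby: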
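To make this concrete, here is a minimal sketch on a small made-up DataFrame. Only the column names come from the question; the values and the "web"/"phone" lead sources are invented for illustration. It shows where the group keys end up after groupby and what reset_index returns.

```python
import pandas as pd

# Made-up sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "Date": ["1/1/2014", "1/1/2014", "1/1/2014", "1/2/2014"],
    "Lead Source": ["web", "web", "phone", "web"],
    "Leads": [3, 2, 5, 1],
})

ndf = df.groupby(["Date", "Lead Source"])[["Leads"]].sum()

# The group keys become levels of a MultiIndex, not columns.
index_names = list(ndf.index.names)
columns_before = list(ndf.columns)

# reset_index returns a new DataFrame and leaves ndf unchanged,
# so the result has to be assigned to a name.
flat = ndf.reset_index()
columns_after = list(flat.columns)

print(index_names)     # ['Date', 'Lead Source']
print(columns_before)  # ['Leads']
print(columns_after)   # ['Date', 'Lead Source', 'Leads']
```

Because reset_index is not in-place by default, calling ndf.reset_index() on its own line without assigning the result leaves ndf with its MultiIndex.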

ndf = (df[["Date", "Lead Source", "Model Group", "Leads"]]
       .groupby(["Date", "Lead Source"], as_index=False).sum())

When aggregating, as_index=False prevents the group keys from being used as the index; they are kept as ordinary columns instead.
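As a quick check, the sketch below compares the two approaches on a small made-up DataFrame (the values are invented; only the column names come from the question). Under these assumptions both routes should yield the same flat table with the group keys as regular columns.

```python
import pandas as pd

# Made-up sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "Date": ["1/1/2014", "1/1/2014", "1/1/2014", "1/2/2014"],
    "Lead Source": ["web", "web", "phone", "web"],
    "Leads": [3, 2, 5, 1],
})

keys = ["Date", "Lead Source"]

# Route A: keep the group keys as columns from the start.
flat_a = df.groupby(keys, as_index=False)["Leads"].sum()

# Route B: let groupby build a MultiIndex, then move it back into columns.
flat_b = df.groupby(keys)["Leads"].sum().reset_index()

print(list(flat_a.columns))
print(list(flat_b.columns))
```

Route A avoids building the MultiIndex at all, which is why it is usually the tidier choice when the keys are needed as columns right away.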
