Question

I am trying to tune the performance of pivot_table versus groupby.

On the one hand, I have:

%time pd.pivot_table(df, index='INDEX', columns='COLUMN', values='VALUE', aggfunc=[len, np.sum], fill_value=0)
CPU times: user 1min 51s, sys: 1.57 s, total: 1min 53s
Wall time: 1min 54s

On the other hand, I get:

In [97]: df["GN"] = df.groupby(["A","B"]).grouper.group_info[0]

In [98]: df["G"] = "G" + (df["GN"] + 1).astype(str)

In [99]: df
Out[99]: 
     A      B         C         D  GN   G
0  foo    one -1.245506  0.307395   3  G4
1  bar    one  0.072989 -0.402182   0  G1
2  foo    two  0.399269  0.794413   5  G6
3  bar  three  0.475859 -0.685398   1  G2
4  foo    two -0.463065 -0.222632   5  G6
5  bar    two  0.696606 -0.999691   2  G3
6  foo    one -1.211876 -0.368574   3  G4
7  foo  three -0.936385 -1.067160   4  G5

These are basically the same thing, yet I see a 60x performance difference. Why is that?
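For anyone wanting to reproduce the comparison, here is a minimal sketch of a groupby counterpart to the pivot_table call above. It is not the original data or the original groupby call: the synthetic frame, its size, and the names `pivoted` and `grouped` are assumptions made for illustration. It also uses "count" in place of len, which gives the same numbers only when VALUE has no missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real df from the question is not shown.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "INDEX": rng.integers(0, 20, size=n),
    "COLUMN": rng.integers(0, 5, size=n),
    "VALUE": rng.integers(1, 100, size=n),
})

# pivot_table form, as in the question (with "count"/"sum" instead of len/np.sum).
pivoted = pd.pivot_table(
    df, index="INDEX", columns="COLUMN", values="VALUE",
    aggfunc=["count", "sum"], fill_value=0,
)

# groupby form: aggregate per (INDEX, COLUMN) pair, then move COLUMN
# from the row index into the column index.
grouped = (
    df.groupby(["INDEX", "COLUMN"])["VALUE"]
      .agg(["count", "sum"])
      .unstack("COLUMN", fill_value=0)
)

# Align both results the same way before comparing the numbers.
pivoted = pivoted.sort_index(axis=0).sort_index(axis=1)
grouped = grouped.sort_index(axis=0).sort_index(axis=1)
print(np.array_equal(pivoted.to_numpy(), grouped.to_numpy()))
```

Timing each form separately with %time or %timeit on the real data would show whether the gap comes from the aggregation functions (Python-level len versus the built-in "count") or from pivot_table itself.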

Pandas performance: pivot_table vs groupby

0 answers: