Question

I have a df, for example:

    ID |    Status   | Color
   555    Cancelled     Green
   434    Processed     Red   
   212    Cancelled     Blue
   121    Cancelled     Green
   242    Cancelled     Blue
   352    Processed     Green
   343    Processed     Blue

I am using the following code:

cc = df.groupby(by='Color').ID.count()
df.groupby(by=['Color', 'Status']).apply(lambda x: len(x)/cc.loc[x.Color.iloc[0]])

This gives me the output:

Color     Status   
Blue   Cancelled    0.666667
       Processed    0.333333
Green  Cancelled    0.666667
       Processed    0.333333
Red    Processed    1.000000
dtype: float64

This gives me the percentage of each status within each color.

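There is also a field called dollar_value, where each row contains a dollar amount. I would like to add two fields to my output: 1. Total_Dollars for that color and status, and 2. Dollar_per_order for that color and status (meaning that if Total_Dollars is 1000 and there are 200 rows for that color and status, the value would be 1000 / 200, or 5). Can I easily add both calculations to the output I already have, or do I need to create a function?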

Desired output:

    Color  Status                 Total      Dollar_Per_Order
    Blue   Cancelled    0.666667  1000       20
           Processed    0.333333  200        5
    Green  Cancelled    0.666667  2000       20
           Processed    0.333333  1000       5
    Red    Processed    1.000000  300        10
    dtype: float64

Thanks!

Answer 1

To compute all 3 columns, define a function to be applied to each group, as follows:

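import pandas as pd

def fn(grp):
    total = grp.dollar_value.sum()   # total dollars for this (Color, Status) group
    rowNo = len(grp.index)           # number of rows (orders) in the group
    # grp.name is the (Color, Status) key; cc holds the row count per color
    return pd.Series([ rowNo/cc[grp.name[0]], total, total/rowNo ],
        index=[ 'Percentage', 'Total_Dollars', 'Dollar_per_order'])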

Then apply it:

df.groupby(by=['Color', 'Status']).apply(fn)
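
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The ID, Status and Color values are the ones from the question; the dollar_value amounts are made up for illustration, since the question does not show them. As an assumption on my side, the groupby selects only the dollar_value column before calling apply, so that fn receives just the column it needs.

```python
import pandas as pd

# Sample frame from the question; the dollar_value numbers are hypothetical.
df = pd.DataFrame({
    'ID':     [555, 434, 212, 121, 242, 352, 343],
    'Status': ['Cancelled', 'Processed', 'Cancelled', 'Cancelled',
               'Cancelled', 'Processed', 'Processed'],
    'Color':  ['Green', 'Red', 'Blue', 'Green', 'Blue', 'Green', 'Blue'],
    'dollar_value': [100, 300, 50, 200, 150, 60, 40],
})

# Number of rows per color, as in the question.
cc = df.groupby(by='Color').ID.count()

def fn(grp):
    total = grp.dollar_value.sum()   # total dollars in this (Color, Status) group
    rowNo = len(grp.index)           # number of rows in this group
    # grp.name is the (Color, Status) key, so grp.name[0] is the color.
    return pd.Series([rowNo / cc[grp.name[0]], total, total / rowNo],
                     index=['Percentage', 'Total_Dollars', 'Dollar_per_order'])

# Select only the column fn needs, then apply fn to each group.
result = df.groupby(by=['Color', 'Status'])[['dollar_value']].apply(fn)
print(result)
```

The result is a DataFrame indexed by (Color, Status) with one column per computed value, so a single cell can be read with something like result.loc[('Blue', 'Cancelled'), 'Total_Dollars'].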

Note that I used len(grp.index) rather than len(grp). The reason is that it runs faster.

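I also read the color of the current group in a different way: from the group key, grp.name[0], instead of x.Color.iloc[0].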
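
To illustrate the group-key lookup, here is a small sketch of what grp.name holds when grouping by two keys. The tiny frame is a subset of the question's data, and selecting the ID column before apply is my own choice to keep the example minimal.

```python
import pandas as pd

# A subset of the question's rows, enough to show the group keys.
df = pd.DataFrame({
    'ID':     [555, 434, 212, 352],
    'Status': ['Cancelled', 'Processed', 'Cancelled', 'Processed'],
    'Color':  ['Green', 'Red', 'Blue', 'Green'],
})

# When grouping by two keys, grp.name is the (Color, Status) tuple of the
# current group, so grp.name[0] is the color and grp.name[1] is the status.
colors = df.groupby(by=['Color', 'Status'])[['ID']].apply(lambda grp: grp.name[0])
print(colors)
```

Each value in colors matches the Color level of its own index entry, which is why fn can look up cc[grp.name[0]] without touching the Color column.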
