Question

Could you please help me group a pandas DataFrame by multiple conditions?

Here is how I do it in SQL:

with a as (
  select high 
,sum( case when qr = 1 and now = 1 then 1 else 0 end ) q1_bad
,sum( case when qr = 2 and now = 1 then 1 else 0 end ) q2_bad
  from #tmp2
  group by high
)
select a.high from a
where q1_bad >= 2 and q2_bad >= 2 and a.high is not null

Here is part of the dataset:

import pandas as pd
a = pd.DataFrame()

a['client'] = range(35)
a['high'] = ['02','47','47','47','79','01','43','56','46','47','17','58','42','90','47','86','41','56',
'55','49','47','49','95','23','46','47','80','80','41','49','46','49','56','46','31']
a['qr'] = ['1','1','1','1','2','1','1','2','2','1','1','2','2',
'2','1','1','1','2','1','2','1','2','2','1','1','1','2','2','1','1',
'1','1','1','1','2']
a['now'] = ['0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0','0',
'0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0']

Thank you very much!

Answer 1

It is very similar: you define your columns before the groupby, then apply your operation.

This assumes your columns hold actual integers rather than strings.
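The sample frame in the question stores qr and now as strings, so a comparison such as .eq(1) would never match. Below is a minimal sketch of the conversion, assuming every value parses cleanly as an integer. The three-row frame df is made up for illustration; for the question's frame the equivalent call would be a = a.astype({'qr': int, 'now': int}).

```python
import pandas as pd

# Hypothetical three-row frame with string-typed columns, like the question's data.
df = pd.DataFrame({
    "high": ["42", "49", "47"],
    "qr": ["2", "1", "1"],
    "now": ["1", "1", "0"],
})

# Cast only the columns that are compared against integers;
# 'high' stays a string so labels such as '02' keep their leading zero.
df = df.astype({"qr": int, "now": int})

print(df.dtypes)
```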

import numpy as np
import pandas as pd
a.assign(q1_bad = np.where((a['qr'].eq(1) & a['now'].eq(1)),1,0),
         q2_bad = np.where((a['qr'].eq(2) & a['now'].eq(1)),1,0)

).groupby('high')[['q1_bad','q2_bad']].sum()

     q1_bad  q2_bad
high                
01         0       0
02         0       0
17         0       0
23         0       0
31         0       0
41         0       0
42         0       1
43         0       0
46         0       0
47         0       0
49         1       0
55         0       0
56         0       0
58         0       0
79         0       0
80         0       0
86         0       0
90         0       0
95         0       0

For the additional where clause, you can filter in any of several ways, but for convenience we can add a query at the end.

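# The lambdas make assign work on the frame returned by dropna, so the
# computed columns stay the same length even when rows are dropped.
a.dropna(subset=['high']).assign(
         q1_bad = lambda d: np.where((d['qr'].eq(1) & d['now'].eq(1)),1,0),
         q2_bad = lambda d: np.where((d['qr'].eq(2) & d['now'].eq(1)),1,0)

).groupby('high')[['q1_bad','q2_bad']].sum().query('q2_bad >= 2 and q1_bad >= 2')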
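For comparison, here is a self-contained sketch of the whole SQL query using plain boolean masks instead of np.where and query. The function name highs_with_enough_bad, the min_bad parameter, and the demo frame are all made up for illustration, and the columns are assumed to be integers already.

```python
import pandas as pd

def highs_with_enough_bad(df, min_bad=2):
    """Return 'high' values with at least min_bad bad rows for both qr == 1 and qr == 2."""
    # Drop rows without a 'high' value (the SQL "is not null" filter).
    df = df.dropna(subset=["high"])
    is_bad = df["now"].eq(1)
    # One 0/1 indicator column per condition, summed within each 'high' group.
    counts = pd.DataFrame({
        "high": df["high"],
        "q1_bad": (df["qr"].eq(1) & is_bad).astype(int),
        "q2_bad": (df["qr"].eq(2) & is_bad).astype(int),
    }).groupby("high")[["q1_bad", "q2_bad"]].sum()
    # Equivalent of "where q1_bad >= 2 and q2_bad >= 2".
    keep = (counts["q1_bad"] >= min_bad) & (counts["q2_bad"] >= min_bad)
    return counts.index[keep].tolist()

# Made-up data: 'A' has two bad rows for each qr value, 'B' does not.
demo = pd.DataFrame({
    "high": ["A", "A", "A", "A", "B", "B", None],
    "qr":   [1, 1, 2, 2, 1, 2, 1],
    "now":  [1, 1, 1, 1, 1, 0, 1],
})
print(highs_with_enough_bad(demo))  # ['A']
```

On the question's sample data no group reaches two bad rows for both qr values, so the filtered result there would be empty.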

