Could you help me group a pandas DataFrame by multiple conditions?
This is how I do it in SQL:
with a as (
select high
,sum( case when qr = 1 and now = 1 then 1 else 0 end ) q1_bad
,sum( case when qr = 2 and now = 1 then 1 else 0 end ) q2_bad
from #tmp2
group by high
)
select a.high from a
where q1_bad >= 2 and q2_bad >= 2 and a.high is not null
Here is part of the dataset:
import pandas as pd
a = pd.DataFrame()
a['client'] = range(35)
a['high'] = ['02','47','47','47','79','01','43','56','46','47','17','58','42','90','47','86','41','56',
'55','49','47','49','95','23','46','47','80','80','41','49','46','49','56','46','31']
a['qr'] = ['1','1','1','1','2','1','1','2','2','1','1','2','2',
'2','1','1','1','2','1','2','1','2','2','1','1','1','2','2','1','1',
'1','1','1','1','2']
a['now'] = ['0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0','0',
'0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0']
Thank you very much!
Answer 0 (score: 2)
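It is very similar: you define your flag columns before the groupby, then apply the aggregation.
This assumes the columns hold actual integers rather than strings.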
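import numpy as np
import pandas as pd
# The sample data stores qr and now as strings, so convert them to integers first
a[['qr', 'now']] = a[['qr', 'now']].astype(int)
# Flag matching rows with 1, then sum the flags per value of high
a.assign(q1_bad = np.where((a['qr'].eq(1) & a['now'].eq(1)),1,0),
         q2_bad = np.where((a['qr'].eq(2) & a['now'].eq(1)),1,0)
        ).groupby('high')[['q1_bad','q2_bad']].sum()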
      q1_bad  q2_bad
high
01         0       0
02         0       0
17         0       0
23         0       0
31         0       0
41         0       0
42         0       1
43         0       0
46         0       0
47         0       0
49         1       0
55         0       0
56         0       0
58         0       0
79         0       0
80         0       0
86         0       0
90         0       0
95         0       0
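Because the sample data in the question stores qr and now as strings, those columns need converting before integer comparisons will match anything. Here is a self-contained sketch on a few hypothetical rows (the name df and the toy values are assumptions for illustration, not the real dataset). It also casts boolean masks to int in place of np.where, which should be equivalent here:

```python
import pandas as pd

# Hypothetical toy rows, stored as strings like the question's data
df = pd.DataFrame({
    "high": ["42", "49", "49", "47"],
    "qr":   ["2", "1", "1", "1"],
    "now":  ["1", "1", "0", "0"],
})

# Convert the digit strings to integers so .eq(1) / .eq(2) can match
df[["qr", "now"]] = df[["qr", "now"]].astype(int)

# A boolean mask cast to int gives the same 1/0 flag as the SQL CASE expression
flags = df.assign(
    q1_bad=(df["qr"].eq(1) & df["now"].eq(1)).astype(int),
    q2_bad=(df["qr"].eq(2) & df["now"].eq(1)).astype(int),
)

# Sum the flags per value of high
counts = flags.groupby("high")[["q1_bad", "q2_bad"]].sum()
print(counts)
```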
For the extra where clause, you can filter in one of several ways, but for convenience we can chain query at the end.
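# Drop rows with a missing high first; the lambdas make assign work on the filtered frame
a.dropna(subset=['high']).assign(
    q1_bad = lambda d: np.where((d['qr'].eq(1) & d['now'].eq(1)),1,0),
    q2_bad = lambda d: np.where((d['qr'].eq(2) & d['now'].eq(1)),1,0)
).groupby('high')[['q1_bad','q2_bad']].sum().query('q2_bad >= 2 and q1_bad >= 2')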
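On the sample from the question no group reaches two bad rows in either column, so the filtered result would be empty. To see the filter select something, here is a sketch on hypothetical data (df and its values are made up for illustration) that returns the matching high values as a list, mirroring the SQL's select a.high:

```python
import pandas as pd

# Hypothetical data: '47' has two q1 and two q2 bad rows, '49' has one of each,
# and one row has a missing high value
df = pd.DataFrame({
    "high": ["47", "47", "47", "47", "49", "49", None],
    "qr":   [1, 1, 2, 2, 1, 2, 1],
    "now":  [1, 1, 1, 1, 1, 1, 1],
})

counts = (
    df.dropna(subset=["high"])  # a.high is not null
      .assign(
          # lambdas receive the already-filtered frame
          q1_bad=lambda d: (d["qr"].eq(1) & d["now"].eq(1)).astype(int),
          q2_bad=lambda d: (d["qr"].eq(2) & d["now"].eq(1)).astype(int),
      )
      .groupby("high")[["q1_bad", "q2_bad"]]
      .sum()
)

# Keep groups meeting both thresholds and return just the group keys
selected = counts.query("q1_bad >= 2 and q2_bad >= 2").index.tolist()
print(selected)
```

groupby drops missing keys by default, so the explicit dropna mainly documents the intent of the SQL's is not null condition.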