pandas: create a boolean column based on multiple conditions on another column for each group

Time: 2018-08-30 15:19:08

Tags: python-3.x pandas dataframe pandas-groupby

I have the following df:

cluster_id    inv_id
1             A1
1             A1
2             A1111A
2             A1111A

I want to groupby cluster_id and create a column called invalid_inv_id based on two conditions on inv_id:

1. in each cluster, if the length of inv_id (stripped of non-numeric characters) is < 100, set invalid_inv_id to True;
2. in each cluster, if the length of inv_id is < 3, set invalid_inv_id to True.

The code is like:

df['inv_id_stp'] = df.inv_id.str.replace(r'\D+', '', regex=True)
grouped = df.groupby('cluster_id')
df['invalid_inv_id'] = grouped['inv_id_stp'].transform(lambda x: x.str.len()) < 100
df['invalid_inv_id'] = grouped['inv_id'].transform(lambda x: x.str.len()) < 3

I'm wondering how to combine the two conditions into one line of code, so that invalid_inv_id is True when either condition holds.
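A runnable reproduction of the question's setup may help show what goes wrong. This sketch rebuilds the sample frame from the table in the question and runs the two assignments as written (with `invoices` taken to be `df`). Because the second assignment replaces the column written by the first, only the length < 3 check survives. The `regex=True` argument is an addition: since pandas 2.0, `str.replace` treats the pattern as a literal string by default.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "cluster_id": [1, 1, 2, 2],
    "inv_id": ["A1", "A1", "A1111A", "A1111A"],
})

# regex=True is needed in pandas >= 2.0, where str.replace defaults to literal matching
df["inv_id_stp"] = df.inv_id.str.replace(r"\D+", "", regex=True)
grouped = df.groupby("cluster_id")

# The question's two assignments: the second one overwrites the first
df["invalid_inv_id"] = grouped["inv_id_stp"].transform(lambda x: x.str.len()) < 100
df["invalid_inv_id"] = grouped["inv_id"].transform(lambda x: x.str.len()) < 3

print(df["invalid_inv_id"].tolist())  # [True, True, False, False]
```

Condition 1 alone would flag all four rows (the stripped ids "1" and "1111" are far shorter than 100), so the printed result reflects condition 2 only.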

1 answer:

Answer 0 (score: 1)

IIUC, you don't need groupby here, because both conditions are evaluated row by row:

(df.inv_id.str.len()<3)|(df.inv_id.str.replace(r'\D+', '', regex=True).str.len()<100)
Out[472]: 
0    True
1    True
2    True
3    True
Name: inv_id, dtype: bool
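As a self-contained sketch (same sample data as the question, with `regex=True` added for pandas 2.0 and later), the row-wise mask can be written straight into the new column. The last lines check that the question's groupby/transform lengths equal a plain `str.len()`, which is why the groupby can be dropped for a per-row flag.

```python
import pandas as pd

df = pd.DataFrame({
    "cluster_id": [1, 1, 2, 2],
    "inv_id": ["A1", "A1", "A1111A", "A1111A"],
})

# Both conditions are per-row checks, so no groupby is needed
too_short = df.inv_id.str.len() < 3
few_digits = df.inv_id.str.replace(r"\D+", "", regex=True).str.len() < 100

# `|` is element-wise OR on boolean Series
df["invalid_inv_id"] = too_short | few_digits
print(df["invalid_inv_id"].tolist())  # [True, True, True, True]

# transform(lambda x: x.str.len()) returns the same per-row lengths as str.len()
grouped_len = df.groupby("cluster_id")["inv_id"].transform(lambda x: x.str.len())
same_lengths = grouped_len.tolist() == df.inv_id.str.len().tolist()
print(same_lengths)  # True
```

When the comparisons are written inline, as in the answer, each one needs its own parentheses because `|` binds more tightly than `<`.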

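If the flag should apply to the whole cluster whenever any row in it meets a condition, group the mask and use transform('any'):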

((df.inv_id.str.len()<3)|(df.inv_id.str.replace(r'\D+', '', regex=True).str.len()<100)).groupby(df['cluster_id']).transform('any')
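With the question's sample data every row is already invalid, so the group-level version gives the same result as the row-wise one. To see where transform('any') makes a difference, this sketch adds two hypothetical clusters that are not in the question: cluster 3 holds only 100-digit ids (valid under both conditions), and cluster 4 mixes one such id with the too-short id "AB".

```python
import pandas as pd

# Clusters 1-2 come from the question; clusters 3-4 are hypothetical additions
df = pd.DataFrame({
    "cluster_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "inv_id": ["A1", "A1", "A1111A", "A1111A",
               "1" * 100, "2" * 100, "3" * 100, "AB"],
})

# Per-row flag: too short, or fewer than 100 digits after stripping non-digits
row_invalid = (df.inv_id.str.len() < 3) | (
    df.inv_id.str.replace(r"\D+", "", regex=True).str.len() < 100
)

# transform("any") broadcasts each cluster's OR back onto every row of that cluster
df["invalid_inv_id"] = row_invalid.groupby(df["cluster_id"]).transform("any")

print(row_invalid.tolist())
# [True, True, True, True, False, False, False, True]
print(df["invalid_inv_id"].tolist())
# [True, True, True, True, False, False, True, True]
```

Row-wise, only "AB" is flagged in cluster 4; after transform('any') both rows of cluster 4 are flagged, while cluster 3 stays False.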