Pandas groupby: check whether one column is strictly increasing with respect to another column

Time: 2020-06-22 09:24:12

Tags: python pandas dataframe pandas-groupby

I have the following DataFrame df:

Subject Marks1 Marks2
English  1      10
English  1.5    20
English  1.7    30
English  3      40
Science  1      10
Science  1.5    20
Science  1.7    15
Science  3      35

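I want to group by Subject and check whether Marks2 is strictly increasing as Marks1 increases. If it is not, I want to remove that group from df and put it in a separate issues DataFrame. So in the end I will have df: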

Subject Marks1 Marks2
English  1      10
English  1.5    20
English  1.7    30
English  3      40

issues:

Subject Marks1 Marks2
Science  1      10
Science  1.5    20
Science  1.7    15
Science  3      35

2 answers:

Answer 0 (score: 2)

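Use DataFrameGroupBy.diff on all columns, compare the result with DataFrame.le to find values less than or equal to 0, test each row with DataFrame.any, then collect the matching Subject values into vals and filter with Series.isin: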

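# True on every row where Marks1 or Marks2 fails to increase within its group
m = df.groupby('Subject').diff().le(0).any(axis=1)

# subjects that contain at least one offending row
vals = df.loc[m, 'Subject']
mask = df['Subject'].isin(vals)
# groups with issues
df1 = df[mask]
print(df1)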
   Subject  Marks1  Marks2
4  Science     1.0      10
5  Science     1.5      20
6  Science     1.7      15
7  Science     3.0      35

# groups without issues
df2 = df[~mask]
print(df2)
   Subject  Marks1  Marks2
0  English     1.0      10
1  English     1.5      20
2  English     1.7      30
3  English     3.0      40
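
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the question's sample data, and the names d, bad_subjects, issues and clean are my own, not from the answer. It relies on the fact that the first row of each group gets NaN from diff, and NaN <= 0 is False, so those rows are never flagged.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Subject": ["English"] * 4 + ["Science"] * 4,
    "Marks1": [1, 1.5, 1.7, 3, 1, 1.5, 1.7, 3],
    "Marks2": [10, 20, 30, 40, 10, 20, 15, 35],
})

# Row-to-row differences inside each group; the first row of a group is NaN
d = df.groupby("Subject")[["Marks1", "Marks2"]].diff()

# A step <= 0 in either column breaks "strictly increasing";
# NaN <= 0 is False, so group-leading rows are not flagged
m = d.le(0).any(axis=1)

# Illustrative names (not from the answer): split the frame by offending subjects
bad_subjects = df.loc[m, "Subject"].unique()
mask = df["Subject"].isin(bad_subjects)
issues = df[mask]
clean = df[~mask]

print(issues)
print(clean)
```

Note that this assumes the rows of each group are already ordered by Marks1, as they are in the sample data.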

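EDIT: The bottleneck is computing diff per group. If the data can be sorted by the grouping columns, performance can be improved like this: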

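# columns passed to groupby() (for the sample data, only the grouping key)
cols = ['Subject']
# stable sort by the grouping columns (if possible and if necessary), so row order inside each group is kept
df = df.sort_values(cols, kind='mergesort')
# one diff over the whole frame; duplicated(cols) is False on the first row of each group,
# which discards the comparison across group boundaries
m = df[['Marks1','Marks2']].diff().le(0).any(axis=1) & df.duplicated(cols)

vals = df.loc[m, 'Subject']
mask = df['Subject'].isin(vals)
df1 = df[mask]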
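
A quick way to convince yourself that the sorted variant flags the same rows as the per-group diff is to compute both masks and compare them. This is only a sketch on the question's sample data; group_cols, s, not_first, m_fast, m_ref and same are names I made up for the check.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Subject": ["English"] * 4 + ["Science"] * 4,
    "Marks1": [1, 1.5, 1.7, 3, 1, 1.5, 1.7, 3],
    "Marks2": [10, 20, 30, 40, 10, 20, 15, 35],
})

group_cols = ["Subject"]

# mergesort is stable, so rows keep their original order inside each group
s = df.sort_values(group_cols, kind="mergesort")

# False on the first row of every group, True on all later rows
not_first = s.duplicated(group_cols)

# Single diff over the whole frame; the first row of a group is compared with
# the last row of the previous group, so that comparison is masked out
m_fast = s[["Marks1", "Marks2"]].diff().le(0).any(axis=1) & not_first

# Reference mask from the per-group diff
m_ref = df.groupby("Subject")[["Marks1", "Marks2"]].diff().le(0).any(axis=1)

same = m_fast.sort_index().equals(m_ref.sort_index())
print(same)
```

Without the not_first mask, row 4 (the first Science row) would be flagged, because Marks1 drops from 3 to 1 across the English/Science boundary.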

Answer 1 (score: 0)

Use .filter() with a lambda function that calls .diff() to identify the groups with issues:

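# keep a group if Marks1 rises somewhere while Marks2 does not (<= 0 also catches ties, which "strictly" rules out)
issues = df.groupby('Subject').filter(lambda x: ((x.Marks1.diff() > 0) & (x.Marks2.diff() <= 0)).any())
print(issues)

   Subject  Marks1  Marks2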
4  Science     1.0      10
5  Science     1.5      20
6  Science     1.7      15
7  Science     3.0      35


Noissues = df[~df.index.isin(issues.index)]
print(Noissues)

   Subject  Marks1  Marks2
0  English     1.0      10
1  English     1.5      20
2  English     1.7      30
3  English     3.0      40
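
If the rows inside a group are not guaranteed to be ordered by Marks1, one possible variation is to sort each group first and then require every step in Marks2 to be positive. This is a sketch of that idea, not part of either answer; marks2_strictly_increasing is a hypothetical helper name, and ties in Marks1 are not treated specially.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Subject": ["English"] * 4 + ["Science"] * 4,
    "Marks1": [1, 1.5, 1.7, 3, 1, 1.5, 1.7, 3],
    "Marks2": [10, 20, 30, 40, 10, 20, 15, 35],
})


def marks2_strictly_increasing(g: pd.DataFrame) -> bool:
    """Hypothetical helper: True if Marks2 rises at every step once g is ordered by Marks1."""
    steps = g.sort_values("Marks1")["Marks2"].diff().dropna()
    return bool((steps > 0).all())


# filter() keeps the rows of every group for which the function returns True
no_issues = df.groupby("Subject").filter(marks2_strictly_increasing)

# Everything that was not kept belongs to a group with issues
issues = df.drop(no_issues.index)

print(no_issues)
print(issues)
```

Here the filter selects the good groups directly, so the issues frame is obtained by dropping those rows from df.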