Passing arguments in a groupby aggregate function

Time: 2018-08-15 05:15:10

Tags: python-3.x pandas pandas-groupby

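I have a DataFrame, called df in the code, and I am applying aggregate functions to multiple columns of each group. I also apply the user-defined lambda functions f4, f5, f6, f7. Some of these functions are very similar, such as f4, f6 and f7, where only the parameter value differs. Can I pass these parameters from the dictionary d, so that I write only one function instead of several?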

f4 = lambda x: len(x[x>10]) # count the frequency of bearing greater than threshold value
f4.__name__ = 'Frequency'

f5 = lambda x: len(x[x<3.4]) # count the stop points with velocity less than threshold value 3.4
f5.__name__ = 'stop_frequency'

f6 = lambda x: len(x[x>0.2]) # count the points with velocity greater than threshold value 0.2
f6.__name__ = 'frequency'

f7 = lambda x: len(x[x>0.25]) # count the points with acceleration greater than threshold value 0.25
f7.__name__ = 'frequency'

d = {'acceleration':['mean', 'median', 'min'], 
 'velocity':[f5, 'sum' ,'count', 'median', 'min'], 
 'velocity_rate':f6,
 'acc_rate':f7,
 'bearing':['sum', f4], 
 'bearing_rate':'sum',     
 'Vincenty_distance':'sum'}

df1 = df.groupby(['userid','trip_id','Transportation_Mode','segmentid'], sort=False).agg(d)

#flattening MultiIndex in columns
df1.columns = df1.columns.map('_'.join)
#MultiIndex in index to columns
df1 = df1.reset_index(level=2, drop=False).reset_index()

I would like to write a function like this:

f4(p) = lambda x: len(x[x>p]) 
f4.__name__ = 'Frequency'

d = {'acceleration':['mean', 'median', 'min'], 
 'velocity':[f5, 'sum' ,'count', 'median', 'min'], 
 'velocity_rate':f4(0.2),
 'acc_rate':f4(0.25),
 'bearing':['sum', f4(10)], 
 'bearing_rate':'sum',     
 'Vincenty_distance':'sum'}

A csv file for the DataFrame df is available at the given link, to make the data clearer. https://drive.google.com/open?id=1R_BBL00G_Dlo-6yrovYJp5zEYLwlMPi9

1 answer:

Answer 0: (score: 1)

This is possible, though not easy, with the solution by neilaronson.

The solution can also be simplified by taking the `sum` of the `True` values of a boolean mask.
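As a rough illustration of why summing a boolean mask gives the same count as `len(x[x > p])`, here is a minimal sketch. The sample values and the threshold are made up for the example and are not taken from the question's csv file.

```python
import pandas as pd

# Hypothetical sample values, not taken from the question's data
x = pd.Series([0.1, 0.3, 0.25, 0.5, 0.05])
p = 0.2  # hypothetical threshold

mask = x > p                 # boolean Series: True where the value exceeds p
count_by_sum = mask.sum()    # True counts as 1, False as 0
count_by_len = len(x[mask])  # the filtering approach used in the question

print(count_by_sum, count_by_len)
```

Both expressions count the same rows; the `sum` form just avoids building the filtered Series first.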

Edit: You can also pass a parameter for greater than or less than:

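def f4(p):
    # factory: returns an aggregation function bound to the threshold p
    def ipf(x):
        # count the values below p by summing the boolean mask
        # (your solution would be: return len(x[x < p]))
        return (x < p).sum()
    ipf.__name__ = 'Frequency'
    return ipf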

d = {'acceleration':['mean', 'median', 'min'], 
 'velocity':[f4(3.4), 'sum' ,'count', 'median', 'min'], 
 'velocity_rate':f4(0.2),
 'acc_rate':f4(.25),
 'bearing':['sum', f4(10)], 
 'bearing_rate':'sum',     
 'Vincenty_distance':'sum'}

df1 = df.groupby(['userid','trip_id','Transportation_Mode','segmentid'], sort=False).agg(d)

#flattening MultiIndex in columns
df1.columns = df1.columns.map('_'.join)
#MultiIndex in index to columns
df1 = df1.reset_index(level=2, drop=False).reset_index()
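The code above always compares with `<`, while the question counts values above the threshold for `bearing`, `velocity_rate` and `acc_rate`. One possible way to pass the direction of the comparison as well is sketched below. The names `count_where`, `op` and `name`, and the toy DataFrame, are assumptions made for this illustration; they do not come from the original answer.

```python
import operator
import pandas as pd

def count_where(p, op=operator.gt, name='Frequency'):
    """Return an aggregation function that counts values where op(value, p) is True."""
    def ipf(x):
        # op(x, p) builds a boolean mask; summing it counts the True entries
        return op(x, p).sum()
    ipf.__name__ = name  # pandas uses __name__ to label the result column
    return ipf

# Hypothetical toy data standing in for the question's df
df = pd.DataFrame({
    'trip_id': [1, 1, 1, 2, 2],
    'velocity': [1.0, 5.0, 2.0, 4.0, 3.0],
    'bearing': [5.0, 20.0, 30.0, 8.0, 12.0],
})

d = {
    'velocity': [count_where(3.4, operator.lt, name='stop_frequency'), 'sum'],
    'bearing': ['sum', count_where(10)],  # default comparison: greater than
}

out = df.groupby('trip_id', sort=False).agg(d)
# flatten the MultiIndex in columns, as in the answer
out.columns = out.columns.map('_'.join)
out = out.reset_index()
print(out)
```

Passing a function from the `operator` module keeps the factory to a single line of logic. A string flag such as 'greater' or 'less' with an if/else inside `ipf` would work as well if that reads more clearly.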