I want to fill NaN values in a pandas DataFrame with fillna. The DataFrame contains several groups, so I also use groupby. The command I use is:
df.groupby(['var1', df.index.month, df.index.day])['var2'].transform(lambda y: y.astype(float).fillna(y.astype(float).median()))
However, I don't want to fill past the last valid index currently available, which can be found with the pandas last_valid_index method. How can I do that?
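To make the problem concrete, here is a minimal sketch on one hypothetical group (a plain integer index and values chosen for illustration). A plain fillna with the median also fills the trailing NaNs, while last_valid_index reports the label of the last non-NaN value, which is where filling should stop.

```python
import numpy as np
import pandas as pd

# One hypothetical group: a gap in the middle and two trailing NaNs
s = pd.Series([166.066959, np.nan, 126.066959, np.nan, np.nan])

# Plain fillna with the median fills every NaN, including the trailing ones
filled_all = s.fillna(s.median())

# last_valid_index returns the label of the last non-NaN value (label 2 here)
last = s.last_valid_index()

print(filled_all.tolist())
print(last)
```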
Sample data:
            var1        var2
datetime
2000-01-01   baa  165.792185
2000-01-02   baa  166.066959
2001-01-02   baa  146.066959
2002-01-02   baa  126.066959
2000-01-03   baa         NaN
2000-01-04   baa         NaN
2000-01-01  ahia  169.777814
2000-01-02  ahia  171.754605
2000-01-07  ahia  173.194531
2000-01-08  ahia         NaN
Answer:
I think you need a custom function:
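def f(y):
    # label of the last non-NaN value in this group (None if the group is all NaN)
    idx = y.last_valid_index()
    # fill NaNs with the group median, but only up to and including idx
    y.loc[:idx] = y.loc[:idx].astype(float).fillna(y.astype(float).median())
    return y

df['var2'] = df.groupby(['var1', df.index.month, df.index.day])['var2'].transform(f)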
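The function above slices by label with y.loc[:idx]. When a group contains repeated dates that are not adjacent (the (baa, 1, 2) group in the sample below has three rows dated 2000-01-02) and the last valid value sits on one of those repeated labels, label slicing may raise a KeyError. A position-based variant avoids relying on labels at all. This is a sketch under that assumption, not the answer's own method; the name fill_until_last_valid is hypothetical, and the sample frame rebuilds the answer's data.

```python
import numpy as np
import pandas as pd

def fill_until_last_valid(y):
    """Hypothetical helper: fill NaNs with the group median, but only up to
    the last non-NaN value. Works by position, so duplicate dates are harmless."""
    vals = y.to_numpy(dtype=float, copy=True)
    valid = ~np.isnan(vals)
    if not valid.any():
        # all-NaN group: there is no median to fill with, leave it unchanged
        return pd.Series(vals, index=y.index)
    last_pos = np.flatnonzero(valid)[-1]      # position of the last valid value
    head = vals[: last_pos + 1]               # a view into vals
    head[np.isnan(head)] = np.nanmedian(vals) # fill only within the head
    return pd.Series(vals, index=y.index)

# The answer's sample data, rebuilt for illustration
index = pd.DatetimeIndex(
    ['2000-01-01', '2000-01-02', '2001-01-02', '2002-01-02', '2000-01-02',
     '2000-01-02', '2000-01-01', '2000-01-02', '2000-01-07', '2000-01-08'],
    name='datetime')
df = pd.DataFrame(
    {'var1': ['baa'] * 6 + ['ahia'] * 4,
     'var2': [165.792185, 166.066959, np.nan, 126.066959, np.nan, np.nan,
              169.777814, 171.754605, 173.194531, np.nan]},
    index=index)

df['new'] = (df.groupby(['var1', df.index.month, df.index.day])['var2']
               .transform(fill_until_last_valid))
print(df)
```

The median is still computed over the whole group; only the positions that get filled are restricted.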
Sample:
print (df)
            var1        var2
datetime
2000-01-01   baa  165.792185
2000-01-02   baa  166.066959
2001-01-02   baa         NaN
2002-01-02   baa  126.066959
2000-01-02   baa         NaN
2000-01-02   baa         NaN
2000-01-01  ahia  169.777814
2000-01-02  ahia  171.754605
2000-01-07  ahia  173.194531
2000-01-08  ahia         NaN
def f(y):
    idx = y.last_valid_index()
    y.loc[:idx] = y.loc[:idx].astype(float).fillna(y.astype(float).median())
    return y

df['new'] = df.groupby(['var1', df.index.month, df.index.day])['var2'].transform(f)
print (df)
            var1        var2         new
datetime
2000-01-01   baa  165.792185  165.792185
2000-01-02   baa  166.066959  166.066959
2001-01-02   baa         NaN  146.066959
2002-01-02   baa  126.066959  126.066959
2000-01-02   baa         NaN         NaN
2000-01-02   baa         NaN         NaN
2000-01-01  ahia  169.777814  169.777814
2000-01-02  ahia  171.754605  171.754605
2000-01-07  ahia  173.194531  173.194531
2000-01-08  ahia         NaN         NaN
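A vectorized alternative (not part of the original answer) avoids calling a Python function per group. The idea, sketched here under the assumption that "last valid" means last in row order within each group: after a group-wise bfill, a row is non-NaN exactly when it lies at or before the group's last valid value, so only NaN rows that satisfy this are replaced with the group median. The sample data is the answer's, rebuilt for illustration.

```python
import numpy as np
import pandas as pd

index = pd.DatetimeIndex(
    ['2000-01-01', '2000-01-02', '2001-01-02', '2002-01-02', '2000-01-02',
     '2000-01-02', '2000-01-01', '2000-01-02', '2000-01-07', '2000-01-08'],
    name='datetime')
df = pd.DataFrame(
    {'var1': ['baa'] * 6 + ['ahia'] * 4,
     'var2': [165.792185, 166.066959, np.nan, 126.066959, np.nan, np.nan,
              169.777814, 171.754605, 173.194531, np.nan]},
    index=index)

g = df.groupby(['var1', df.index.month, df.index.day])['var2']
med = g.transform('median')              # group median broadcast to every row
at_or_before_last = g.bfill().notna()    # False only for trailing NaNs of a group

# Work on NumPy arrays so the duplicate dates in the index never need aligning
to_fill = df['var2'].isna().to_numpy() & at_or_before_last.to_numpy()
df['new'] = np.where(to_fill, med.to_numpy(), df['var2'].to_numpy())
print(df)
```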