Expanding mean over multiple series in pandas

Time: 2016-01-02 15:08:12

Tags: python pandas dataframe

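I have a groupby object that I apply an expanding mean to. However, I want the calculation to cover another series/group at the same time. Here is my code: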

import numpy as np
import pandas as pd

d = {'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
     'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
     'aw': [1, 0, 0, 0, 1, 0, np.nan],
     'hw': [0, 1, 0, 1, 0, 1, np.nan]}
df2 = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])
df2['tie'] = np.where(df2.hw == df2.aw, 1, 0)
df2.index = range(1, len(df2) + 1)

avgcol = ['hw', 'tie', 'aw']
homenames = ['home_win_at_home', 'home_tie_at_home', 'home_loss_at_home']
awaynames = ['away_win_at_away', 'away_tie_at_away', 'away_loss_at_away']

def win_at_venue(df, venuecol, avgcol, name):
    # Per-venue expanding mean of earlier games; shift() excludes the current game.
    df[name] = df.groupby(venuecol)[avgcol].apply(lambda x: pd.expanding_mean(x).shift())

win_at_venue(df2, 'home', avgcol, homenames)
win_at_venue(df2, 'away', avgcol[::-1], awaynames)

How can I use pd.expanding_mean on the groupby object so that the average covers both the 'home' and the 'away' column, letting me see each team's average wins/ties/losses across all venues? Right now it only gives the previous average for a team playing at home or playing away, not home and away combined.

I have been trying different levels, df.stack() and reindexing, but with no luck.

Any help is appreciated.

Here is the correct result for home and away wins across all venues:

'away'

1 answer:

Answer 0: (score: 1)

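You probably need to introduce a 'team' column so that you can follow each team's record regardless of venue. The following may get you closer. Starting from:
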
import numpy as np
import pandas as pd

d = {'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
     'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
     'aw': [1, 0, 0, 0, 1, 0, np.nan],
     'hw': [0, 1, 0, 1, 0, 1, np.nan]}

df = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])

to get:

  home away  hw  aw
0    A    B   0   1
1    B    A   1   0
2    B    A   0   0
3    A    B   1   0
4    B    A   0   1
5    A    B   1   0
6    A    B NaN NaN

df.index = range(1, len(df) + 1)
df.index.name = 'game'

     home away  hw  aw
game                  
1       A    B   0   1
2       B    A   1   0
3       B    A   0   0
4       A    B   1   0
5       B    A   0   1
6       A    B   1   0
7       A    B NaN NaN

Next, stack so that you can follow each team:

df = (df.set_index(['hw', 'aw'], append=True)
        .stack()
        .reset_index()
        .rename(columns={'level_3': 'role', 0: 'team'})
        .loc[:, ['game', 'team', 'role', 'hw', 'aw']])

    game team  role  hw  aw
0      1    A  home   0   1
1      1    B  away   0   1
2      2    B  home   1   0
3      2    A  away   1   0
4      3    B  home   0   0
5      3    A  away   0   0
6      4    A  home   1   0
7      4    B  away   1   0
8      5    B  home   0   1
9      5    A  away   0   1
10     6    A  home   1   0
11     6    B  away   1   0
12     7    A  home NaN NaN
13     7    B  away NaN NaN
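
As an aside, the same long layout can probably also be reached with DataFrame.melt instead of set_index/stack. The sketch below is not part of the original answer; the names games and long_df are made up for illustration, and it assumes a pandas version that provides DataFrame.melt.

```python
import numpy as np
import pandas as pd

d = {'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
     'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
     'aw': [1, 0, 0, 0, 1, 0, np.nan],
     'hw': [0, 1, 0, 1, 0, 1, np.nan]}
games = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])  # hypothetical name
games.index = range(1, len(games) + 1)
games.index.name = 'game'

# melt turns the 'home' and 'away' columns into (role, team) pairs,
# giving one row per team per game; hw and aw are carried along unchanged.
long_df = (games.reset_index()
                .melt(id_vars=['game', 'hw', 'aw'],
                      value_vars=['home', 'away'],
                      var_name='role', value_name='team')
                # order by game, with the 'home' row before the 'away' row
                .sort_values(['game', 'role'], ascending=[True, False])
                .reset_index(drop=True)
                .loc[:, ['game', 'team', 'role', 'hw', 'aw']])
print(long_df)
```

The explicit sort is only there to reproduce the row order of the stacked frame; the later groupby on 'team' just needs the rows of each team to be in game order.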

Then define 'wins', compute the overall record and apply expanding_mean:

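def wins(row):
    # A team's win is hw when it played at home and aw when it played away.
    if row['role'] == 'home':
        return row['hw']
    else:
        return row['aw']

df['wins'] = df.apply(wins, axis=1)

# Mean of each team's earlier games; shift() keeps the current game out of its own average.
df['expanding_mean'] = df.groupby('team')['wins'].apply(lambda x: pd.expanding_mean(x).shift())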

    game team  role  hw  aw  wins  expanding_mean
0      1    A  home   0   1     0             NaN
1      1    B  away   0   1     1             NaN
2      2    B  home   1   0     1        1.000000
3      2    A  away   1   0     0        0.000000
4      3    B  home   0   0     0        1.000000
5      3    A  away   0   0     0        0.000000
6      4    A  home   1   0     1        0.000000
7      4    B  away   1   0     0        0.666667
8      5    B  home   0   1     0        0.500000
9      5    A  away   0   1     1        0.250000
10     6    A  home   1   0     1        0.400000
11     6    B  away   1   0     0        0.400000
12     7    A  home NaN NaN   NaN        0.500000
13     7    B  away NaN NaN   NaN        0.333333
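
Note that pd.expanding_mean was deprecated in pandas 0.18 and is no longer available in later releases. On a recent pandas, the same step could be sketched roughly as follows, using the .expanding().mean() method form. This is not the answer's own code: the frame is rebuilt by hand from the table above, and the names long_df and prior_win_rate are made up for illustration.

```python
import numpy as np
import pandas as pd

# Long-format frame rebuilt by hand from the stacked table above.
long_df = pd.DataFrame({
    'game': np.repeat(np.arange(1, 8), 2),
    'team': list('ABBABAABBAABAB'),
    'role': ['home', 'away'] * 7,
    'hw': np.repeat([0, 1, 0, 1, 0, 1, np.nan], 2),
    'aw': np.repeat([1, 0, 0, 0, 1, 0, np.nan], 2),
})

# Vectorised version of wins(): hw for the home side, aw for the away side.
long_df['wins'] = np.where(long_df['role'] == 'home', long_df['hw'], long_df['aw'])

# Per-team expanding mean over earlier games only (shift() drops the current game).
long_df['prior_win_rate'] = (
    long_df.groupby('team')['wins']
           .transform(lambda s: s.expanding().mean().shift())
)
print(long_df)
```

transform is used here instead of apply so that the result is aligned with the original row index regardless of how a given pandas version shapes the output of groupby.apply.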

Since you have game and team as references, you can merge and filter to get your preferred layout.
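
One possible way to do that merge-and-filter step is sketched below. It is only an illustration: the one-row-per-game layout and the names home, away and wide are assumptions, and the frame is rebuilt by hand from the 'wins' column shown above.

```python
import numpy as np
import pandas as pd

# Long-format frame with the 'wins' column from the table above.
long_df = pd.DataFrame({
    'game': np.repeat(np.arange(1, 8), 2),
    'team': list('ABBABAABBAABAB'),
    'role': ['home', 'away'] * 7,
    'wins': [0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, np.nan, np.nan],
})
long_df['expanding_mean'] = (
    long_df.groupby('team')['wins']
           .transform(lambda s: s.expanding().mean().shift())
)

# Filter: split the rows by role, keeping only the columns needed downstream.
cols = ['game', 'team', 'expanding_mean']
home = long_df.loc[long_df['role'] == 'home', cols]
away = long_df.loc[long_df['role'] == 'away', cols]

# Merge: one row per game, with each side's all-venue record before that game.
wide = home.merge(away, on='game', suffixes=('_home', '_away'))
print(wide)
```

Each row of wide then carries the home team, the away team and both teams' prior win rates across all venues, which is close to the per-game layout of the original frame.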
