How to check how many times value A is greater than value B

Time: 2019-09-18 15:34:08

Tags: python pandas list comparison

I have a CSV that I read with pandas. The data looks like this:

home_team    away_team    home_score    away_score
Scotland     England      0             0
England      Scotland     4             2
Scotland     England      2             1
...          ...          ...           ...
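The CSV's filename isn't given, so to reproduce the problem the three sample rows above can be built directly as a DataFrame. This is a minimal stand-in, assuming the column names shown; with the real file you would load it with pd.read_csv instead.

```python
import pandas as pd

# Stand-in for the CSV: only the three sample rows shown in the question.
# With the real file, use pd.read_csv(...) with your own path instead.
data = pd.DataFrame({
    'home_team': ['Scotland', 'England', 'Scotland'],
    'away_team': ['England', 'Scotland', 'England'],
    'home_score': [0, 4, 2],
    'away_score': [0, 2, 1],
})
print(data)
```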

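I want to write a function that takes two parameters, the two teams, and outputs how many times team 1 won, how many times team 2 won, and how many games were draws.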

I've tried comparing the scores, but I'm not sure how to code it when the same team can appear in both the home and away columns.

def who_won(team1, team2):

    home = data['home_team']
    away = data['away_team']
    home_score = data['home_score']
    away_score = data['away_score']
    counter_won = 0
    counter_lost = 0
    counter_draw = 0
    for item in range(len(data['home_team'])):
        # compare the two scores of one row, not the whole columns
        if home_score.iloc[item] > away_score.iloc[item]:
            counter_won = counter_won + 1
        elif home_score.iloc[item] < away_score.iloc[item]:
            counter_lost = counter_lost + 1
        else:
            counter_draw = counter_draw + 1

But I'm not sure how to compare the games and count the number of wins, losses and draws.
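Staying with the loop approach from the question, one way to handle a team appearing in either column is to put each match's scores into (team1, team2) order first and skip matches between other teams. This is a minimal sketch; the extra data parameter of this who_won is a hypothetical change to the question's signature so the function doesn't depend on a global.

```python
import pandas as pd

def who_won(data, team1, team2):
    """Hypothetical loop version: return (team1 wins, team2 wins, draws)."""
    team1_wins = team2_wins = draws = 0
    for row in data.itertuples(index=False):
        # Reorder the scores so score1 always belongs to team1.
        if (row.home_team, row.away_team) == (team1, team2):
            score1, score2 = row.home_score, row.away_score
        elif (row.home_team, row.away_team) == (team2, team1):
            score1, score2 = row.away_score, row.home_score
        else:
            continue  # a match involving some other team
        if score1 > score2:
            team1_wins += 1
        elif score1 < score2:
            team2_wins += 1
        else:
            draws += 1
    return team1_wins, team2_wins, draws

data = pd.DataFrame({
    'home_team': ['Scotland', 'England', 'Scotland'],
    'away_team': ['England', 'Scotland', 'England'],
    'home_score': [0, 4, 2],
    'away_score': [0, 2, 1],
})
print(who_won(data, 'England', 'Scotland'))  # (1, 1, 1)
```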

The desired output is:

England won 1 time versus Scotland
Scotland won 1 time versus England
Scotland and England had one draw

3 answers:

Answer 0 (score: 4)

You can do some preprocessing on the data and then use the groupby method of the pandas DataFrame to get the output you want.

1) Preprocessing

Add two columns: one, which I call match, holding the (home, away) tuple of teams, and another holding the match result.

df['match'] = list(zip(df.home_team, df.away_team))

To get the match result, you will need a function:

def match_result(row):
    if row.home_score > row.away_score:
        return row.home_team + ' won'
    elif row.home_score < row.away_score:
        return row.away_team + ' won'
    else:
        return 'draw'
df['result'] = df.apply(match_result, axis=1)
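As an aside, the same result column can be computed without a row-wise apply by using np.select, which picks, for each row, the choice of the first condition that is True and otherwise the default. This is a swapped-in vectorised alternative to match_result, not the answer's own code, sketched on the question's three sample rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'home_team': ['Scotland', 'England', 'Scotland'],
    'away_team': ['England', 'Scotland', 'England'],
    'home_score': [0, 4, 2],
    'away_score': [0, 2, 1],
})

# Conditions are checked in order; rows matching neither get 'draw'.
conditions = [df.home_score > df.away_score, df.home_score < df.away_score]
choices = [df.home_team + ' won', df.away_team + ' won']
df['result'] = np.select(conditions, choices, default='draw')
print(df['result'].tolist())  # ['draw', 'England won', 'Scotland won']
```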

2) Group by

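Then filter the dataset so it only includes matches between the two input teams, in either home/away order. Finally, group the data by result and count how many times each possible result occurs: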

df.loc[df.match.isin([(team1, team2), (team2, team1)]), 'result'].groupby(df.result).count()

Test:

  home_team away_team  home_score  away_score        result  \
0  Scotland   England           0           0          draw   
1   England  Scotland           4           2   England won   
2  Scotland   England           2           1  Scotland won   

                 match  
0  (Scotland, England)  
1  (England, Scotland)  
2  (Scotland, England)
result
England won     1
Scotland won    1
draw            1
Name: result, dtype: int64
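Steps 1 and 2 can be wrapped into a single function. In this sketch the df parameter is a hypothetical addition, the function works on a copy so the caller's DataFrame doesn't gain the helper columns, and value_counts() is used in place of .groupby(df.result).count(), which gives the same counts. The Wales row is taken from the larger test data in a later answer, to show that other matches are filtered out.

```python
import pandas as pd

def match_result(row):
    # Same per-row rule as in step 1.
    if row.home_score > row.away_score:
        return row.home_team + ' won'
    elif row.home_score < row.away_score:
        return row.away_team + ' won'
    else:
        return 'draw'

def who_won(df, team1, team2):
    df = df.copy()  # leave the caller's DataFrame untouched
    df['match'] = list(zip(df.home_team, df.away_team))
    df['result'] = df.apply(match_result, axis=1)
    # Keep only games between the two teams, whoever played at home.
    mask = df.match.isin([(team1, team2), (team2, team1)])
    return df.loc[mask, 'result'].value_counts()

df = pd.DataFrame({
    'home_team': ['Scotland', 'England', 'Scotland', 'Scotland'],
    'away_team': ['England', 'Scotland', 'England', 'Wales'],
    'home_score': [0, 4, 2, 3],
    'away_score': [0, 2, 1, 1],
})
print(who_won(df, 'England', 'Scotland'))
```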

Answer 1 (score: 0)

Actually, if you keep home and away separate, it's easier:

import numpy as np

# 1 = home team won, 0 = draw, -1 = away team won
df['won'] = np.sign(df['home_score']-df['away_score'])
df.groupby(['home_team','away_team'])['won'].value_counts()

Output:

home_team  away_team  won
England    Scotland   1      1
Scotland   England    0      1
                      1      1
Name: won, dtype: int64

In your case, it's a bit trickier:

# home team won/lost/tied
df['won'] = np.sign(df['home_score']-df['away_score'])

# we don't care about home/away, so we sort the pair by name
# but first we flip the sign of the result for rows whose pair will be swapped:
df['won'] = np.where(df['home_team'].lt(df['away_team']),
                     df['won'], -df['won'])

# sort the pair home/away
df[['home_team','away_team']] = np.sort(df[['home_team','away_team']], axis=1)

# value counts:
df.groupby(['home_team','away_team'])['won'].value_counts()

Output:

home_team  away_team  won
England    Scotland   -1     1
                       0     1
                       1     1
Name: won, dtype: int64
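Note that the snippet above overwrites df's home_team and away_team columns in place. The same steps can be run on a copy inside a function; the name sorted_pair_counts is hypothetical. In the result, won is from the point of view of the team that sorts first alphabetically (England here): 1 means it won, -1 means it lost.

```python
import numpy as np
import pandas as pd

def sorted_pair_counts(df):
    out = df.copy()  # keep the caller's home/away columns intact
    # home team won (1) / drew (0) / lost (-1)
    out['won'] = np.sign(out['home_score'] - out['away_score'])
    # flip the sign for rows whose pair gets swapped by the sort below
    out['won'] = np.where(out['home_team'].lt(out['away_team']),
                          out['won'], -out['won'])
    # order each (home, away) pair alphabetically
    out[['home_team', 'away_team']] = np.sort(out[['home_team', 'away_team']], axis=1)
    return out.groupby(['home_team', 'away_team'])['won'].value_counts()

df = pd.DataFrame({
    'home_team': ['Scotland', 'England', 'Scotland'],
    'away_team': ['England', 'Scotland', 'England'],
    'home_score': [0, 4, 2],
    'away_score': [0, 2, 1],
})
print(sorted_pair_counts(df))
```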

Answer 2 (score: 0)

My solution takes the following details into account:

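  • Each of the two teams (team1 and team2) can play either at home or away, but you want to know how many times team1 won / lost / drew against team2.
  • The source DataFrame can also contain matches against other teams, or matches where both the home and away teams are "other" teams (different from the 2 teams we are interested in).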

To get the result, define the function as follows:

import pandas as pd

def who_won(team1, team2):
    # team1 at home: columns become (tm1, tm2, s1, s2)
    df1 = df.query('home_team == @team1 and away_team == @team2')\
        .set_axis(['tm1', 'tm2', 's1', 's2'], axis=1)
    # team2 at home: swap the names so team1's data still lands in tm1 / s1
    df2 = df.query('home_team == @team2 and away_team == @team1')\
        .set_axis(['tm2', 'tm1', 's2', 's1'], axis=1)
    df3 = pd.concat([df1, df2], sort=False).reset_index(drop=True)
    dif = df3.s1 - df3.s2  # team1's goals minus team2's goals
    bins = pd.cut(dif, bins=[-100, -1, 0, 100], labels=['lost', 'draw', 'won'])
    # observed=False keeps categories with a count of 0
    return dif.groupby(bins, observed=False).count()

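Note the neat trick: when team2 is the home team (df2), I "swap" the home and away columns. Then I concatenate df1 and df2, so that team1 is always in the tm1 column. So now df3.s1 - df3.s2 is the difference between team1's goals and team2's goals (note that the other solutions don't take this into account).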
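set_axis renames columns by position, so it assumes df has exactly these four columns in this order. A variant of the same swap trick, using rename() with a column mapping instead (a swapped-in technique, not the answer's own code), is sketched below; the df parameter is a hypothetical addition, and the test data are the six rows used in this answer.

```python
import pandas as pd

def who_won(df, team1, team2):
    # Rename by column name, so column order and extra columns don't matter.
    as_home = df.query('home_team == @team1 and away_team == @team2').rename(
        columns={'home_team': 'tm1', 'away_team': 'tm2',
                 'home_score': 's1', 'away_score': 's2'})
    # team2 at home: the mapping is swapped so team1 still ends up in tm1 / s1
    as_away = df.query('home_team == @team2 and away_team == @team1').rename(
        columns={'home_team': 'tm2', 'away_team': 'tm1',
                 'home_score': 's2', 'away_score': 's1'})
    both = pd.concat([as_home, as_away], sort=False, ignore_index=True)
    dif = both['s1'] - both['s2']  # team1's goals minus team2's goals
    bins = pd.cut(dif, bins=[-100, -1, 0, 100], labels=['lost', 'draw', 'won'])
    return dif.groupby(bins, observed=False).count()

df = pd.DataFrame({
    'home_team': ['Scotland', 'England', 'England', 'Scotland', 'Scotland', 'Wales'],
    'away_team': ['England', 'Scotland', 'Scotland', 'England', 'Wales', 'Scotland'],
    'home_score': [0, 4, 3, 2, 3, 2],
    'away_score': [0, 2, 1, 1, 1, 1],
})
print(who_won(df, 'England', 'Scotland'))
```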

Then the call to cut assigns meaningful category names (lost / draw / won), which gives intuitive access to each component of the final result.
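To see how those bins map goal differences to labels: pd.cut uses right-closed intervals by default, so (-100, -1] is lost, (-1, 0] is draw and (0, 100] is won. With integer goal differences, that makes 0 the only draw. The differences below are illustrative values, not taken from the data.

```python
import pandas as pd

# Illustrative goal differences from team1's point of view.
dif = pd.Series([2, 2, 0, -1])
# Right-closed intervals: (-100, -1] -> lost, (-1, 0] -> draw, (0, 100] -> won
bins = pd.cut(dif, bins=[-100, -1, 0, 100], labels=['lost', 'draw', 'won'])
print(bins.tolist())  # ['won', 'won', 'draw', 'lost']
```

Any difference outside (-100, 100] would become NaN and not be counted, which is harmless for football scores.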

To test this function, I used a larger DataFrame that includes other teams:

  home_team away_team  home_score  away_score
0  Scotland   England           0           0
1   England  Scotland           4           2
2   England  Scotland           3           1
3  Scotland   England           2           1
4  Scotland     Wales           3           1
5     Wales  Scotland           2           1

Then I called who_won('England', 'Scotland') and got:

lost    1
draw    1
won     2
dtype: int64

The result is a Series with a CategoricalIndex (lost / draw / won).

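If you want to reformat this result into the desired output and get each "component", that's easy. For example, to get the number of matches England won against Scotland, run res['won'], where res is the result of who_won('England', 'Scotland').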
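Turning that Series into the sentences from the question could look like the sketch below. The helper describe is hypothetical, res is rebuilt from the counts shown above rather than by calling who_won, and the draw count is printed as a digit ("1 draw") rather than "one draw".

```python
import pandas as pd

def describe(res, team1, team2):
    """Hypothetical helper: format team1's lost/draw/won counts as sentences."""
    def times(n):
        return f"{n} time" if n == 1 else f"{n} times"
    draws = res['draw']
    return "\n".join([
        f"{team1} won {times(res['won'])} versus {team2}",
        f"{team2} won {times(res['lost'])} versus {team1}",
        f"{team1} and {team2} had {draws} draw" + ("" if draws == 1 else "s"),
    ])

# The counts shown above for who_won('England', 'Scotland')
res = pd.Series([1, 1, 2], index=pd.CategoricalIndex(['lost', 'draw', 'won']))
print(describe(res, 'England', 'Scotland'))
```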
