Create a new column that combines two columns in alphabetical order

Time: 2018-08-09 14:02:56

Tags: r dplyr

I have a dataset of football matches that looks like this:

Home_team Away_team Home_score Away_score
Arsenal    Chelsea      1        3
Manchester U  Blackburn 2        9
Liverpool      Leeds    0        8
Chelsea     Arsenal     4        1

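I want to group the teams involved in a match regardless of which team played at home and which played away. For example, if Chelsea plays Arsenal, I want the new column "teams_involved" to be Arsenal - Chelsea whether the match was played at Chelsea or at Arsenal. My guess is that the way to do this is to put the two team names into a new column in alphabetical order, but I am not sure how to do that.

Desired output: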

Home_team Away_team Home_score Away_score teams_involved
Arsenal    Chelsea      1        3     Arsenal - Chelsea
Manchester U  Blackburn 2        9   Blackburn - Manchester U
Liverpool      Leeds    0        8      Leeds - Liverpool 
Chelsea     Arsenal     4        1     Arsenal - Chelsea

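The reason I want this is so that I can see how many times each team has won against a specific opponent, regardless of where the match was played.

3 answers:

Answer 0: (score: 2)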

df <- read.table(text = "
Home_team Away_team Home_score Away_score
Arsenal    Chelsea      1        3
ManchesterU  Blackburn 2        9
Liverpool      Leeds    0        8
Chelsea     Arsenal     4        1
", header = TRUE, stringsAsFactors = FALSE)

library(dplyr)

df %>%
  rowwise() %>%      # for each row
  mutate(Teams = paste(sort(c(Home_team, Away_team)), collapse = " - ")) %>%  # sort the teams alphabetically and then combine them separating with -
  ungroup()          # forget the row grouping

# # A tibble: 4 x 5
#   Home_team   Away_team Home_score Away_score Teams                  
#   <chr>       <chr>          <int>      <int> <chr>                  
# 1 Arsenal     Chelsea            1          3 Arsenal - Chelsea      
# 2 ManchesterU Blackburn          2          9 Blackburn - ManchesterU
# 3 Liverpool   Leeds              0          8 Leeds - Liverpool      
# 4 Chelsea     Arsenal            4          1 Arsenal - Chelsea 

An alternative solution without rowwise:

# create function and vectorize it
f = function(x,y) {paste(sort(c(x, y)), collapse = " - ")}
f = Vectorize(f)

# apply function to your dataset
df %>% mutate(Teams = f(Home_team, Away_team))
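
Vectorize() is a wrapper around mapply(), so the same idea can be sketched with mapply() directly. This is only an illustration: the names pair_label, home, away and labels are hypothetical and not part of the answer.

```r
# Hypothetical helper: label one home/away pair in alphabetical order
pair_label <- function(x, y) paste(sort(c(x, y)), collapse = " - ")

# Small illustrative vectors (not the full dataset)
home <- c("Arsenal", "Chelsea")
away <- c("Chelsea", "Arsenal")

# mapply() calls pair_label() once per pair of elements;
# USE.NAMES = FALSE returns a plain, unnamed character vector
labels <- mapply(pair_label, home, away, USE.NAMES = FALSE)
labels
```

Both fixtures get the same label, which is what makes the column usable as a venue-independent grouping key.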

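Answer 1: (score: 1)

We can use map2_chr to loop over the corresponding elements of the "Home_team" and "Away_team" columns, sort each pair alphabetically, and paste the two names together. map2_chr returns a character vector, whereas plain map2 would produce a list column.

library(tidyverse)
df %>% 
  mutate(Teams = map2_chr(Home_team, Away_team, ~
                 paste(sort(c(.x, .y)), collapse= ' - ')))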
#  Home_team Away_team Home_score Away_score                   Teams
#1     Arsenal   Chelsea          1          3       Arsenal - Chelsea
#2 ManchesterU Blackburn          2          9 Blackburn - ManchesterU
#3   Liverpool     Leeds          0          8       Leeds - Liverpool
#4     Chelsea   Arsenal          4          1       Arsenal - Chelsea

Another option is pmin/pmax, which compare the two columns element by element:

df %>%
   mutate(Teams = paste(pmin(Home_team, Away_team), 
                        pmax(Home_team, Away_team), sep= " - "))

Or using base R:

df$Teams <- paste(do.call(pmin, df[1:2]), do.call(pmax, df[1:2]), sep= ' - ')
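
To see why pmin/pmax work here, it may help to run them on a few names in isolation. The following is a minimal sketch with made-up vectors (home, away, first, second and teams are illustrative names); it relies on pmin() and pmax() accepting character vectors and comparing them element by element.

```r
# Illustrative vectors, one element per match
home <- c("Arsenal", "ManchesterU", "Chelsea")
away <- c("Chelsea", "Blackburn", "Arsenal")

first  <- pmin(home, away)  # alphabetically earlier name in each pair
second <- pmax(home, away)  # alphabetically later name in each pair

# paste() is vectorized, so no row-wise loop is needed
teams <- paste(first, second, sep = " - ")
teams
```

Because every step is vectorized, this approach avoids the per-row function calls of the rowwise and map2 variants.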

Data

df <- structure(list(Home_team = c("Arsenal", "ManchesterU", "Liverpool", 
"Chelsea"), Away_team = c("Chelsea", "Blackburn", "Leeds", "Arsenal"
), Home_score = c(1L, 2L, 0L, 4L), Away_score = c(3L, 9L, 8L, 
 1L)), .Names = c("Home_team", "Away_team", "Home_score", "Away_score"
 ), class = "data.frame", row.names = c(NA, -4L))

Answer 2: (score: 0)

A simple ifelse statement also works:

df$teams_involved <- ifelse(df$Home_team > df$Away_team, 
                            paste(df$Away_team, df$Home_team, sep = " - "), 
                            paste(df$Home_team, df$Away_team, sep = " - "))
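
The question's stated goal is to count each team's wins against a specific opponent regardless of venue. One possible way to get there from the new column is sketched below in base R. The winner column, the "Draw" label and the wins table are assumptions made for illustration; none of them appear in the original thread.

```r
# Same data as in the thread
df <- data.frame(
  Home_team  = c("Arsenal", "ManchesterU", "Liverpool", "Chelsea"),
  Away_team  = c("Chelsea", "Blackburn", "Leeds", "Arsenal"),
  Home_score = c(1L, 2L, 0L, 4L),
  Away_score = c(3L, 9L, 8L, 1L),
  stringsAsFactors = FALSE
)

# Venue-independent label: alphabetically earlier team first
df$teams_involved <- paste(pmin(df$Home_team, df$Away_team),
                           pmax(df$Home_team, df$Away_team), sep = " - ")

# Assumed rule: the higher score wins; equal scores are recorded as "Draw"
df$winner <- ifelse(df$Home_score > df$Away_score, df$Home_team,
             ifelse(df$Home_score < df$Away_score, df$Away_team, "Draw"))

# Cross-tabulate pairing against winner and keep the non-empty cells
wins <- as.data.frame(table(teams_involved = df$teams_involved,
                            winner = df$winner),
                      stringsAsFactors = FALSE)
wins <- wins[wins$Freq > 0, ]
wins
```

With the sample data, Chelsea is credited with two wins in the Arsenal - Chelsea pairing, one at home and one away.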