Question

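I have a data.frame in R that is a catalog of baseball game results for every team across several seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that, because the data.frame is a combination of each team's game logs, almost every row has a counterpart elsewhere in the data.frame that is its mirror image.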

For example:

team  opponent_team  date           result team_runs opponent_runs
BAL   BOS            2010-04-05      W      5         4

and somewhere else there is another row:

team  opponent_team  date           result team_runs opponent_runs
BOS   BAL            2010-04-05      L      4         5

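I want to write some code in dplyr or something similar that selects the rows with a unique combination of the team, opponent_team and date columns. I stress the word combination here because order doesn't matter; I am just trying to get rid of the mirrored rows.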

Thanks

Answer 1

Have you tried the distinct function in dplyr? For your case, it might look something like this:

library(dplyr)
df %>% distinct(team, opponent_team, date)

Another way is to use the duplicated function from base R inside dplyr's filter function, as shown below.

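# duplicated() checks a single object, so paste() first combines the columns into one key per row
df %>% filter(!duplicated(paste(team, opponent_team, date)))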
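Note that both approaches compare team and opponent_team in order, so BAL/BOS and BOS/BAL on the same date count as different combinations and the mirrored row survives. One possible way around this, as a rough sketch rather than a definitive solution, is to build an order-independent pair of keys with pmin() and pmax() before deduplicating. The sample df and the helper columns team_a and team_b are made up for illustration, and this assumes the two teams meet at most once per date (a doubleheader would be collapsed to a single row).

```r
library(dplyr)

# Hypothetical sample data, copied from the two example rows in the question
df <- data.frame(
  team          = c("BAL", "BOS"),
  opponent_team = c("BOS", "BAL"),
  date          = as.Date(c("2010-04-05", "2010-04-05")),
  result        = c("W", "L"),
  team_runs     = c(5, 4),
  opponent_runs = c(4, 5),
  stringsAsFactors = FALSE
)

deduped <- df %>%
  # team_a / team_b are illustrative helper columns: the two team codes
  # sorted within each row, so the pair is the same for both mirrored rows
  mutate(
    team_a = pmin(team, opponent_team),
    team_b = pmax(team, opponent_team)
  ) %>%
  # keep the first row for each unordered pair and date, with all columns
  distinct(team_a, team_b, date, .keep_all = TRUE) %>%
  select(-team_a, -team_b)

print(deduped)
```

Here .keep_all = TRUE keeps result, team_runs and opponent_runs, which a plain distinct(team, opponent_team, date) would drop. Which of the two mirrored rows is kept depends only on the row order in df.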

Select rows from a data frame that have a unique combination of values across multiple columns
