Finding the difference between specific rows by group

Time: 2018-02-09 03:48:28

Tags: r dataframe rows

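Within each group, I want to find the difference between each row and the user's first occurrence in the data. For example, I need to create the diff variable shown below. Users have different numbers of rows, as in the following data: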

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L), 
    money = c(9L, 12L, 13L, 15L, 5L, 7L, 8L, 5L, 2L, 10L), occurence = c(1L, 
    2L, 3L, 4L, 1L, 2L, 3L, 1L, 1L, 2L), diff = c(NA, 3L, 4L, 
    6L, NA, 2L, 3L, NA, NA, 8L)), .Names = c("ID", "money", "occurence", 
"diff"), class = "data.frame", row.names = c(NA, -10L))

   ID money occurence diff
1   1     9         1   NA
2   1    12         2    3
3   1    13         3    4
4   1    15         4    6
5   2     5         1   NA
6   2     7         2    2
7   2     8         3    3
8   3     5         1   NA
9   4     2         1   NA
10  4    10         2    8

2 answers:

Answer 0 (score: 3)

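You can use ave(). Within each group, we drop the first value and put NA in its place, then subtract the first value from each of the remaining values.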

with(df, ave(money, ID, FUN = function(x) c(NA, x[-1] - x[1])))
# [1] NA  3  4  6 NA  2  3 NA NA  8
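To make the per-group behaviour easier to follow, here is a minimal sketch on a small toy vector. The names money, id and diff_from_first are made up for illustration and are not part of the original answer. It also shows that a group with a single row simply yields NA.

```r
# Hypothetical toy data (not the df from the question)
money <- c(9L, 12L, 13L, 5L, 7L, 4L)
id    <- c(1L, 1L, 1L, 2L, 2L, 3L)

# Applied to each group's values: NA for the first element,
# then every remaining element minus the first one
diff_from_first <- function(x) c(NA, x[-1] - x[1])

# ave() splits money by id, applies the function, and puts the
# results back in the original row order
res <- ave(money, id, FUN = diff_from_first)
res
# [1] NA  3  4 NA  2 NA
```

To store the result next to the data, something like df$diff <- with(df, ave(...)) should work in the same way.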

Answer 1 (score: 1)

A solution that uses the first function from dplyr to get each group's first value and compute the difference.

library(dplyr)

df2 <- df %>%
  group_by(ID) %>%
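  # difference from the first money value within each ID
  mutate(diff = money - first(money)) %>%
  # turn zero differences, such as the first row of each group, into NA
  mutate(diff = replace(diff, diff == 0, NA)) %>%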
  ungroup()
df2
# # A tibble: 10 x 4
#       ID money occurence  diff
#    <int> <int>     <int> <int>
#  1     1     9         1    NA
#  2     1    12         2     3
#  3     1    13         3     4
#  4     1    15         4     6
#  5     2     5         1    NA
#  6     2     7         2     2
#  7     2     8         3     3
#  8     3     5         1    NA
#  9     4     2         1    NA
# 10     4    10         2     8
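One thing to watch for: replace(diff, diff == 0, NA) blanks every zero, so a later row whose money happens to equal the group's first value would also become NA. A possible variant, sketched here on hypothetical toy data (toy and res are illustrative names), marks only the first row of each group by position instead:

```r
library(dplyr)

# Hypothetical toy data: the second row of ID 1 repeats the first value
toy <- data.frame(ID    = c(1L, 1L, 1L, 2L, 2L),
                  money = c(9L, 9L, 12L, 5L, 7L))

res <- toy %>%
  group_by(ID) %>%
  # NA only for the first row of each group; real zero differences are kept
  mutate(diff = if_else(row_number() == 1L,
                        NA_integer_,
                        money - first(money))) %>%
  ungroup()

res$diff
# [1] NA  0  3 NA  2
```

Whether a genuine zero should be kept or treated as missing depends on the analysis; this sketch assumes it should be kept.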

Update

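Here is the solution suggested by Sotos. Note that there is no step replacing 0 with NA: as written, the code overwrites money by reference, so the first row of each group shows 0.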

library(data.table)

setDT(df)[, money := money - first(money), by = ID][]
#     ID money occurence diff
#  1:  1     0         1   NA
#  2:  1     3         2    3
#  3:  1     4         3    4
#  4:  1     6         4    6
#  5:  2     0         1   NA
#  6:  2     2         2    2
#  7:  2     3         3    3
#  8:  3     0         1   NA
#  9:  4     0         1   NA
# 10:  4     8         2    8
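If the original money column should be preserved and the first row of each group should be NA, as in the desired output from the question, a variant along these lines may be closer. It is only a sketch on hypothetical toy data (toy is an illustrative name), not part of the suggested solution:

```r
library(data.table)

# Hypothetical toy data (not the df from the question)
toy <- data.table(ID    = c(1L, 1L, 1L, 2L, 2L, 3L),
                  money = c(9L, 12L, 13L, 5L, 7L, 4L))

# Add diff by reference, per ID: NA for the first row, then the
# difference from the group's first money value; money is left untouched
toy[, diff := c(NA, money[-1] - money[1]), by = ID]

toy$diff
# [1] NA  3  4 NA  2 NA
```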

Data

dput(df)
structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L), 
    money = c(9L, 12L, 13L, 15L, 5L, 7L, 8L, 5L, 2L, 10L), occurence = c(1L, 
    2L, 3L, 4L, 1L, 2L, 3L, 1L, 1L, 2L)), .Names = c("ID", "money", 
"occurence"), row.names = c(NA, -10L), class = "data.frame")