Cumulative count of unique values per id in R

Asked: 2013-12-16 14:25:10

Tags: r unique

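I have a df with names and some eligibility dates. I want to create an indicator of how many unique elig_end_dates each person has had so far, in row order. Here is my df: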

   names date_of_claim elig_end_date
1    tom    2010-01-01    2010-07-01
2    tom    2010-05-04    2010-07-01
3    tom    2010-06-01    2014-01-01
4    tom    2010-10-10    2014-01-01
5   mary    2010-03-01    2014-06-14
6   mary    2010-05-01    2014-06-14
7   mary    2010-08-01    2014-06-14
8   mary    2010-11-01    2014-06-14
9   mary    2011-01-01    2014-06-14
10  john    2010-03-27    2011-03-01
11  john    2010-07-01    2011-03-01
12  john    2010-11-01    2011-03-01
13  john    2011-02-01    2011-03-01

This is the output I want:

   names date_of_claim elig_end_date obs
1    tom    2010-01-01    2010-07-01   1
2    tom    2010-05-04    2010-07-01   1
3    tom    2010-06-01    2014-01-01   2
4    tom    2010-10-10    2014-01-01   2
5   mary    2010-03-01    2014-06-14   1
6   mary    2010-05-01    2014-06-14   1
7   mary    2010-08-01    2014-06-14   1
8   mary    2010-11-01    2014-06-14   1
9   mary    2011-01-01    2014-06-14   1
10  john    2010-03-27    2011-03-01   1
11  john    2010-07-01    2011-03-01   1
12  john    2010-11-01    2011-03-01   1
13  john    2011-02-01    2011-03-01   1

I found this post useful: R: Count unique values by category, but the answers there return a separate table rather than a column in the df.

I also tried this:

df$ob = ave(df$elig_end_date, df$elig_end_date, FUN=seq_along)

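But this creates a running count of rows within each elig_end_date, whereas I really only want an indicator of how many distinct dates have appeared.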
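For illustration, the difference between the two can be seen on a toy vector. This is only a sketch: `end_dates`, `running`, and `distinct_so_far` are made-up names standing in for one person's elig_end_date values.

```r
# Toy stand-in for one person's elig_end_date values (hypothetical data)
end_dates <- c("2010-07-01", "2010-07-01", "2014-01-01", "2014-01-01")

# Grouping by the value itself numbers the rows inside each distinct value
running <- ave(seq_along(end_dates), end_dates, FUN = seq_along)

# Counting first appearances gives the number of distinct values seen so far
distinct_so_far <- cumsum(!duplicated(end_dates))
```

Here `running` restarts at 1 for every new date, while `distinct_so_far` only increases when a date shows up for the first time.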

Thanks in advance.

Output of STEPHEN's code (this is not the correct code - posted only as a learning point):

   names date_of_claim elig_end_date ob
1    tom    2010-01-01    2010-07-01  2
2    tom    2010-05-04    2010-07-01  2
3    tom    2010-06-01    2014-01-01  2
4    tom    2010-10-10    2014-01-01  2
5   mary    2010-03-01    2014-06-14  5
6   mary    2010-05-01    2014-06-14  5
7   mary    2010-08-01    2014-06-14  5
8   mary    2010-11-01    2014-06-14  5
9   mary    2011-01-01    2014-06-14  5
10  john    2010-03-27    2011-03-01  4
11  john    2010-07-01    2011-03-01  4
12  john    2010-11-01    2011-03-01  4
13  john    2011-02-01    2011-03-01  4

1 answer:

Answer 0 (score: 5)

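Another possibility using ave: within each name, duplicated() flags the repeated elig_end_date values, so the cumulative sum of its negation counts the distinct dates seen so far.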

df$obs <- with(df, ave(elig_end_date, names,
                       FUN = function(x) cumsum(!duplicated(x))))

#    names date_of_claim elig_end_date obs
# 1    tom    2010-01-01    2010-07-01   1
# 2    tom    2010-05-04    2010-07-01   1
# 3    tom    2010-06-01    2014-01-01   2
# 4    tom    2010-10-10    2014-01-01   2
# 5   mary    2010-03-01    2014-06-14   1
# 6   mary    2010-05-01    2014-06-14   1
# 7   mary    2010-08-01    2014-06-14   1
# 8   mary    2010-11-01    2014-06-14   1
# 9   mary    2011-01-01    2014-06-14   1
# 10  john    2010-03-27    2011-03-01   1
# 11  john    2010-07-01    2011-03-01   1
# 12  john    2010-11-01    2011-03-01   1
# 13  john    2011-02-01    2011-03-01   1
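
One caveat worth keeping in mind: `ave()` writes its result back into a copy of its first argument, so when elig_end_date is stored as character, factor, or Date, obs may not come back as a plain integer column. A minimal sketch of a variant that passes row indices instead, assuming the rows are already ordered by date_of_claim within each name and that the date columns are plain character (the data frame below is rebuilt from the question):

```r
# Rebuild the example data from the question (dates kept as character here)
df <- data.frame(
  names = rep(c("tom", "mary", "john"), times = c(4, 5, 4)),
  date_of_claim = c("2010-01-01", "2010-05-04", "2010-06-01", "2010-10-10",
                    "2010-03-01", "2010-05-01", "2010-08-01", "2010-11-01",
                    "2011-01-01",
                    "2010-03-27", "2010-07-01", "2010-11-01", "2011-02-01"),
  elig_end_date = c("2010-07-01", "2010-07-01", "2014-01-01", "2014-01-01",
                    rep("2014-06-14", 5), rep("2011-03-01", 4)),
  stringsAsFactors = FALSE
)

# Hand ave() the row indices so the result stays integer, then look up
# each group's elig_end_date values through those indices
df$obs <- with(df, ave(seq_along(elig_end_date), names,
                       FUN = function(i) cumsum(!duplicated(elig_end_date[i]))))
```

Because the first argument is an integer vector, obs is an integer column regardless of how elig_end_date is stored.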