Calculate frequency based on values in 2 or more columns

Asked: 2013-06-04 05:04:26

Tags: r count frequency multiple-columns

I have a very simple question, but I can't work out a way to do it using if statements.

The data I have looks like this:

df <- structure(list(years = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), id = c(1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), x = structure(c(2L, 
1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 
1L), .Label = c("E", "I"), class = "factor")), .Names = c("years", 
"id", "x"), class = "data.frame", row.names = c(NA, -18L))

So the table looks like this:

   years id x
1      1  1 I
2      2  1 E
3      3  1 E
4      1  1 E
5      2  1 I
6      3  1 I
7      1  2 I
8      2  2 E
9      3  2 I
10     1  2 E
11     2  2 E
12     3  2 I
13     1  3 I
14     2  3 E
15     3  3 I
16     1  3 I
17     2  3 I
18     3  3 E

I would like the output to report the fraction of "I" for each id and each year:

   years id xnew
1      1  1 0.5
2      2  1 0.5
3      3  1 0.5
4      1  2 0.5
5      2  2 0.0
6      3  2 1.0
7      1  3 1.0
8      2  3 0.5
9      3  3 0.5

Any help would be greatly appreciated! Thanks!

1 Answer:

Answer 0: (score: 0)

aggregate(x ~ years + id, data=df, function(y) sum(y=="I")/length(y) )

  years id   x
1     1  1 0.5
2     2  1 0.5
3     3  1 0.5
4     1  2 0.5
5     2  2 0.0
6     3  2 1.0
7     1  3 1.0
8     2  3 0.5
9     3  3 0.5
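
As a possible variation on the call above (a sketch, not part of the original answer): the mean of a logical vector is the proportion of TRUE values, so `mean(y == "I")` gives the same fraction as `sum(y == "I") / length(y)`. The data frame below is rebuilt compactly with `rep()` for convenience, and `res` is just an illustrative name; the rename to `xnew` is only there to match the column name requested in the question.

```r
# Rebuild the question's 18 rows in compact form (same values as the dput above).
df <- data.frame(
  years = rep(1:3, times = 6),
  id    = rep(1:3, each = 6),
  x     = factor(c("I", "E", "E", "E", "I", "I",
                   "I", "E", "I", "E", "E", "I",
                   "I", "E", "I", "I", "I", "E"))
)

# mean() of a logical vector is the share of TRUE values,
# i.e. the fraction of "I" within each years/id group.
res <- aggregate(x ~ years + id, data = df, FUN = function(y) mean(y == "I"))

# Rename the result column to match the desired output in the question.
names(res)[names(res) == "x"] <- "xnew"
res
```

Both forms avoid explicit if statements because the comparison `y == "I"` is vectorised over each group.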