Question

I have a data frame:

station     date        classification
 1    June - 01/16          A
 2    June - 03/16          B
 1    June - 01/16          A
 7    June - 01/16          C
 1    June - 03/16          A
 2    June - 03/16          B
 2    June - 03/16          B

I want the total number of occurrences of A, B and C, summarized by station number and date.

For example, station 1 on June - 01/16 has 2 As, while station 2 on June - 03/16 has 3 Bs.

I tried:

aggregate(x = list(data_frame$classification), by = list(station = data_frame$station, Date = data_frame$date), FUN = function(x) length(unique(x)))
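The attempt above returns length(unique(x)), i.e. the number of distinct classifications in each station/date group (which is 1 for every group in this data), not how many times each one occurs. A minimal base R sketch, assuming the sample data shown in the question, adds classification to the grouping variables and counts rows with length. The result column name n is a choice made for this illustration.

```r
# Sample data reconstructed from the table in the question
data_frame <- data.frame(
  station = c(1L, 2L, 1L, 7L, 1L, 2L, 2L),
  date = c("June - 01/16", "June - 03/16", "June - 01/16", "June - 01/16",
           "June - 03/16", "June - 03/16", "June - 03/16"),
  classification = c("A", "B", "A", "C", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Group by station, date AND classification, then count the rows in each group
counts <- aggregate(
  x = list(n = data_frame$classification),
  by = list(station = data_frame$station,
            date = data_frame$date,
            classification = data_frame$classification),
  FUN = length
)
print(counts)
```

Only combinations that actually occur appear in the result, so station 1 on June - 01/16 gets one row with n = 2 for A, and station 2 on June - 03/16 gets one row with n = 3 for B.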

Answer 1

If we need to count the 'A', 'B' and 'C' values, reshaping may be better. We convert the 'data.frame' to a 'data.table' (setDT(data_frame)) and use dcast from data.table to reshape from 'long' to 'wide' format, specifying fun.aggregate as length.

library(data.table)
dcast(setDT(data_frame), station+date~classification, length)
#   station         date A B C
#1:       1 June - 01/16 2 0 0
#2:       1 June - 03/16 1 0 0
#3:       2 June - 03/16 0 3 0
#4:       7 June - 01/16 0 0 1
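For comparison, a similar wide table can be sketched in base R without data.table, assuming the same sample data. Unlike dcast, this version merges station and date into a single row key (the "_" separator is an arbitrary choice for this illustration) instead of keeping them as separate columns, and table fills absent classifications with 0.

```r
# Sample data reconstructed from the table in the question
data_frame <- data.frame(
  station = c(1L, 2L, 1L, 7L, 1L, 2L, 2L),
  date = c("June - 01/16", "June - 03/16", "June - 01/16", "June - 01/16",
           "June - 03/16", "June - 03/16", "June - 03/16"),
  classification = c("A", "B", "A", "C", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# One key per station/date pair, e.g. "1_June - 01/16"
key <- paste(data_frame$station, data_frame$date, sep = "_")

# Cross-tabulate key x classification, then turn the table into a data frame
# with one row per key and one column per classification (A, B, C)
wide <- as.data.frame.matrix(table(key, data_frame$classification))
print(wide)
```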

dplyr option

library(dplyr)
data_frame %>%
        group_by(station, date, classification) %>%
        tally()
# station         date classification     n
#    (int)        (chr)          (chr) (int)
#1       1 June - 01/16              A     2
#2       1 June - 03/16              A     1
#3       2 June - 03/16              B     3
#4       7 June - 01/16              C     1

Data

data_frame <- structure(list(station = c(1L, 2L, 1L, 7L, 1L, 2L, 2L), 
date = c("June - 01/16", 
"June - 03/16", "June - 01/16", "June - 01/16", "June - 03/16", 
"June - 03/16", "June - 03/16"), classification = c("A", "B", 
"A", "C", "A", "B", "B")), .Names = c("station", "date", "classification"
), class = "data.frame", row.names = c(NA, -7L))

Answer 2

The plyr package is well suited to this.

library(plyr)
count(data_frame, c("classification", "station", "date"))

Answer 3

The SQL way:

library(sqldf)
sqldf("select station, date, classification, count(classification) from data_frame group by station, date, classification")
