Question

For the example data frame:

df <- structure(list(area = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"), 
                      count = c(1L, 1L, 1L, 3L, 4L, 2L, 2L, 4L, 2L, 5L, 6L)), 
                 .Names = c("area", "count"), class = c("tbl_df", "tbl", "data.frame"), 
                 row.names = c(NA, -11L), spec = structure(list(cols = structure(list(area = structure(list(), 
                 class = c("collector_character", "collector")), count = structure(list(), class = c("collector_integer",
                 "collector"))), .Names = c("area", "count")), default = structure(list(), class = c("collector_guess", 
                "collector"))), .Names = c("cols", "default"), class = "col_spec"))

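...which lists the number of occurrences of an event in each area, I want to produce another summary table showing how many areas have one occurrence, two occurrences, three occurrences, and so on. For example, there are three areas with 'one occurrence per area', three areas with 'two occurrences per area', one area with 'three occurrences per area', etc.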

What is the best package/code to produce the result I want? I have tried using aggregate and plyr, but so far without success.

Answer 1

I like the data.table syntax:

library(data.table)
setDT(df) # convert the data.frame to a data.table by reference (no copy)

# .N calculates the number of observations, by instance of the count variable
df[, .(n_areas = .N), by = count]

   count n_areas
1:     1       3
2:     3       1
3:     4       2
4:     2       3
5:     5       1
6:     6       1

See this question for a comparison of the two major packages most commonly used for this kind of operation, dplyr and data.table: "data.table vs dplyr: can one do something well the other can't or does poorly?"
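The output above lists the groups in order of first appearance (1, 3, 4, 2, ...). If a sorted summary is preferred, one option is `keyby` instead of `by`. The following is a minimal sketch; the object names `dt` and `res` are illustrative, and the data is rebuilt by hand from the question rather than taken from the original `df`.

```r
library(data.table)

# Same values as the question's data, rebuilt as a plain data.table
dt <- data.table(
  area  = letters[1:11],
  count = c(1L, 1L, 1L, 3L, 4L, 2L, 2L, 4L, 2L, 5L, 6L)
)

# keyby groups like `by`, but also sorts the result by the grouping column
res <- dt[, .(n_areas = .N), keyby = count]
res
```

Using `keyby` also sets `count` as the key of the result, which can be convenient if the summary is joined to other tables later.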

Answer 2

You can do this with base R functionality, using @Jimbou's solution:

table(df$count)
1 2 3 4 5 6 
3 3 1 2 1 1
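Since the question asks for a summary table, the `table()` result can also be turned into a data frame. This is only a sketch under assumptions: the names `counts`, `tab`, `summary_df` and `n_areas` are made up for illustration, and the vector is typed in by hand from the question's data.

```r
# Occurrence counts per area, copied from the question's data
counts <- c(1L, 1L, 1L, 3L, 4L, 2L, 2L, 4L, 2L, 5L, 6L)

# Tabulate how many areas share each occurrence count
tab <- table(counts)

# Convert to a data frame; responseName sets the name of the frequency column
summary_df <- as.data.frame(tab, responseName = "n_areas",
                            stringsAsFactors = FALSE)

# The category column comes back as character, so convert it to integer
summary_df$counts <- as.integer(summary_df$counts)
summary_df
```

The conversion step matters because `table()` stores the distinct values as labels, not as numbers.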

Answer 3

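This is very intuitive with the wonderful dplyr library.

First, we group the data by the unique values of count, then use n() to count the number of rows in each group.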

library(dplyr)
df %>%
    group_by(count) %>%
    summarise(number = n())

# A tibble: 6 x 2
  count number
  <int>  <int>
1     1      3
2     2      3
3     3      1
4     4      2
5     5      1
6     6      1
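The same grouping and counting can probably be written more compactly with dplyr's `count()` helper, which combines `group_by()` and `summarise(n())`. This is a sketch that assumes a dplyr version where `count()` accepts the `name` argument; the tibble is rebuilt by hand from the question's data and `res` is an illustrative name.

```r
library(dplyr)

# Same values as the question's data, rebuilt as a tibble
df <- tibble(
  area  = letters[1:11],
  count = c(1L, 1L, 1L, 3L, 4L, 2L, 2L, 4L, 2L, 5L, 6L)
)

# count() groups by the given column and tallies the rows in each group;
# `name` sets the name of the resulting tally column
res <- df %>%
    count(count, name = "number")
res
```

Here the first `count` is the dplyr function and the second is the column from the data frame, so the call reads a little oddly but is valid.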

Counting the number of occurrences in R
