Count the number of observations for each column in a data frame

Time: 2019-07-26 14:34:28

Tags: r dplyr

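My data frame has an unknown number of columns (it can change often). For each column, I need to count the number of observations per ID across the years, and create a custom "n" column for each of my columns that tells me how many observations were made for that particular column.

I tried: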

library(dplyr)
count <- tally(group_by(final_database,ID,Year))

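But this counts the unique combinations of ID + Year, whereas I need to know how many times each ID was observed across the years for each characteristic. Example: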

ID  Year    CHAR1   n_CHAR1
A   2016    0       3   
A   2017    5       3
A   2018    2       3
A   2019            3
B   2016    1       2
B   2017            2
B   2018            2
B   2019    1       2

And so on for all characteristics. I would then insert the "n_CHAR" columns into the original data frame.

It doesn't need to be tidy. Thanks!

2 Answers:

Answer 0 (score: 3)

Try:

transform(final_database, n_CHAR1 = ave(CHAR1, ID, FUN = function(x) sum(x != "")))

If the blank rows are actually NA, just replace sum(x != "") with sum(!is.na(x)).
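As a minimal sketch of the NA variant, assuming a hypothetical numeric CHAR1 column in which missing observations are stored as NA (the data below is made up to mirror the example in the question):

```r
# Hypothetical data: same shape as the question's example, but with NA for blanks.
final_database <- data.frame(
  ID    = rep(c("A", "B"), each = 4),
  Year  = rep(2016:2019, times = 2),
  CHAR1 = c(0, 5, 2, NA, 1, NA, NA, 1)
)

# ave() evaluates FUN within each ID group and repeats the result
# on every row of that group.
final_database <- transform(
  final_database,
  n_CHAR1 = ave(CHAR1, ID, FUN = function(x) sum(!is.na(x)))
)
```

With this data, every row of A gets 3 and every row of B gets 2, which matches the table in the question.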

Edit

If you need multiple n columns for multiple CHAR columns, you can do the following:

library(dplyr)

final_database %>%
  group_by(ID) %>%
  mutate_at(vars(starts_with("CHAR")),
            list(n = ~ sum(. != "")))

This example assumes that all relevant columns start with the string CHAR (e.g. CHAR1, CHAR2, CHAR3, etc.).
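In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A hedged sketch of the same idea, assuming blanks are empty strings; CHAR2 is a hypothetical second column added only for illustration, and the .names argument yields the n_CHAR1-style names used in the question:

```r
library(dplyr)

# Sample data from the question plus a hypothetical CHAR2 column.
final_database <- data.frame(
  ID    = rep(c("A", "B"), each = 4),
  Year  = rep(2016:2019, times = 2),
  CHAR1 = c("0", "5", "2", "", "1", "", "", "1"),
  CHAR2 = c("", "1", "", "", "4", "2", "7", "")
)

result <- final_database %>%
  group_by(ID) %>%
  # For each CHAR* column, count the non-blank values within the ID group.
  mutate(across(starts_with("CHAR"), ~ sum(.x != ""), .names = "n_{.col}")) %>%
  ungroup()
```

Because the selection is made with starts_with(), any CHAR columns added later are picked up without changing the code.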

If the columns you want to count run from the third column to the last, you can do the following:

library(dplyr)

finalDatabase <- final_database %>%
  group_by(ID) %>%
  mutate_at(vars(3:ncol(.)), # If you don't have many other vars except the CHAR columns, you can also do vars(-ID, -Year) as suggested by @camille
            list(n = ~ sum(. != ""))) %>%
  select(ID, Year, ends_with("_n"))
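If selecting by position feels fragile, the same "everything except ID and Year" choice can be made by name. A base R sketch, assuming blanks are empty strings (CHAR2 is again a hypothetical extra column):

```r
# Sample data from the question plus a hypothetical CHAR2 column.
final_database <- data.frame(
  ID    = rep(c("A", "B"), each = 4),
  Year  = rep(2016:2019, times = 2),
  CHAR1 = c("0", "5", "2", "", "1", "", "", "1"),
  CHAR2 = c("", "1", "", "", "4", "2", "7", ""),
  stringsAsFactors = FALSE
)

# Every column that is not an identifier is treated as a characteristic.
value_cols <- setdiff(names(final_database), c("ID", "Year"))

for (col in value_cols) {
  # 1 where the value is present, 0 where it is blank.
  non_blank <- as.integer(final_database[[col]] != "")
  # Sum the flags within each ID and repeat the total on every row.
  final_database[[paste0("n_", col)]] <- ave(non_blank, final_database$ID, FUN = sum)
}
```

The list of columns is fixed before the loop starts, so the newly created n_ columns are not counted themselves.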

Answer 1 (score: 0)

We can also do this with data.table:

library(data.table)

setDT(df)[, n_CHAR1 := sum(CHAR1 != ""), by = "ID"]

Output:

   ID Year CHAR1 n_CHAR1
1:  A 2016     0       3
2:  A 2017     5       3
3:  A 2018     2       3
4:  A 2019             3
5:  B 2016     1       2
6:  B 2017             2
7:  B 2018             2
8:  B 2019     1       2

Data:

df <- structure(list(ID = c("A", "A", "A", "A", "B", "B", "B", "B"), 
    Year = c(2016L, 2017L, 2018L, 2019L, 2016L, 2017L, 2018L, 
    2019L), CHAR1 = c("0", "5", "2", "", "1", "", "", "1")), row.names = c(NA, 
-8L), class = "data.frame")
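
The data.table call above handles a single column. To cover an unknown number of columns, one possible sketch uses .SD with .SDcols; CHAR2 is a hypothetical column added for illustration, and blanks are assumed to be empty strings:

```r
library(data.table)

# Sample data from above plus a hypothetical CHAR2 column.
dt <- data.table(
  ID    = rep(c("A", "B"), each = 4),
  Year  = rep(2016:2019, times = 2),
  CHAR1 = c("0", "5", "2", "", "1", "", "", "1"),
  CHAR2 = c("", "1", "", "", "4", "2", "7", "")
)

# Treat every column other than the identifiers as a characteristic.
cols   <- setdiff(names(dt), c("ID", "Year"))
n_cols <- paste0("n_", cols)

# := adds the count columns by reference; within each ID group,
# .SD holds only the columns listed in .SDcols.
dt[, (n_cols) := lapply(.SD, function(x) sum(x != "")), by = ID, .SDcols = cols]
```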