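Frequency count of variables as a second row

Time: 2020-10-28 18:40:08

Tags: r dplyr

I have a data frame like the one below, and I am looking for a simple way to count the values in the variables whose column names start with a number, and then add those counts to the data frame as a second row.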

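The output should look like the following: a count of the values in every column whose name starts with a digit, plus the total for the last column, added to the data frame as a second row.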

(image of the expected output)

2 answers:

Answer 0: (score: 0)

You can create the row with

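library(dplyr)

# Count the non-missing, non-empty entries in each digit-named column and in Concatenate
summary_row = 
  df %>% 
  summarize(across(c(matches("^[0-9]"), Concatenate), ~sum(!is.na(.) & . != "" & . != "NA")))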

summary_row
#   1A 2B 3C 4D Concatenate
# 1  3  5  0  7           8

result = bind_rows(mutate(summary_row, across(everything(), as.character)), df)
# reorder columns
result[names(df)]
#    AA   BB     CC DD   EE      FF GG 1A 2B 3C 4D Concatenate
# 1  NA <NA>   <NA> NA <NA>    <NA> NA  3  5  0  7           8
# 2  72  AMK  TAMAN 62   CA ENGLISH 33                        
# 3  62 KAMl  GHUSI 41   NY  FRENCH 44  D  A     G         DAG
# 4  43  HAJ KELVIN 37   GA ENGLISH 51           G           G
# 5  66  NHS  DEREK 41   DE  FRENCH 51 NA        G         NAG
# 6  54  KUL   LOKU 32   MN ENGLISH 37     A     G          AG
# 7  64  GAF MNDHUL 74   LA ENGLISH 58  D  A     G         DAG
# 8  47  BGA JASMIN 52   GA SPANISH 24     A     G          AG
# 9  47  NHU  BINNY 75   VA ENGLISH 67     A     G          AG
# 10 27  VGY BURTAM 59   TM SPANISH 41  D                    D
# 11 68  NHU  DAVID 36   BA RUSSIAN 75                        

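You can bind it to the top of your data frame with bind_rows, but this is really only useful for presentation purposes. A data frame column can hold only one type, so to combine the numeric counts in the summary row with the character columns you already have, the counts must be converted to character.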
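The type issue can be seen on a much smaller example. The following is only a minimal sketch, assuming dplyr 1.0 or later; the `toy` data frame and its values are hypothetical and are not the data from the question.

```r
library(dplyr)

# Hypothetical toy data, not the data frame from the question
toy <- data.frame(`1A` = c("", "D", "D"),
                  `2B` = c("A", "", "A"),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Count the non-missing, non-empty entries per digit-named column
counts <- toy %>%
  summarize(across(matches("^[0-9]"), ~ sum(!is.na(.x) & .x != "")))

# The counts are integers, so convert them to character
# before stacking them on top of the character columns
stacked <- bind_rows(mutate(counts, across(everything(), as.character)), toy)

class(counts$`1A`)   # type of the summary row before conversion
class(stacked$`1A`)  # type of the combined column
```

If the counts are needed for further computation, keeping `counts` as a separate object avoids the conversion altogether.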


I used this data (adding check.names = FALSE to your data.frame() code so that the column names match your example):

df <- data.frame(AA=c(72,62,43,66,54,64,47,47,27,68),
                 BB=c("AMK","KAMl","HAJ","NHS","KUL","GAF","BGA","NHU","VGY","NHU"),
                 CC=c("TAMAN","GHUSI","KELVIN","DEREK","LOKU","MNDHUL","JASMIN","BINNY","BURTAM","DAVID"),
                 DD=c(62,41,37,41,32,74,52,75,59,36),
                 EE=c("CA","NY","GA","DE","MN","LA","GA","VA","TM","BA"),
                 FF=c("ENGLISH","FRENCH","ENGLISH","FRENCH","ENGLISH","ENGLISH","SPANISH","ENGLISH","SPANISH","RUSSIAN"),
                 GG=c(33,44,51,51,37,58,24,67,41,75),
                 `1A`=c("","D","","NA","","D","","","D",""),
                 `2B`=c("","A","","","A","A","A","A","",""),
                 `3C`=c("","","","","","","","","",""),
                 `4D`=c("","G","G","G","G","G","G","G","",""),
                 Concatenate=c("","DAG","G","NAG","AG","DAG","AG","AG","D",""),
                 check.names = FALSE)

Answer 1: (score: 0)

We can use colSums from base R

nm1 <- grep('^[0-9]', names(df), value = TRUE)
colSums(!is.na(df[nm1]) & df[nm1] != "" & df[nm1] != "NA")
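To show what the colSums call operates on, here is a small sketch on hypothetical toy data (the `toy` data frame and the `keep` name are assumptions for illustration). The comparisons yield a logical matrix, and colSums counts the TRUE entries in each column.

```r
# Hypothetical toy data, not the data frame from the question
toy <- data.frame(AA = c(1, 2, 3),
                  `1A` = c("", "D", "NA"),
                  `2B` = c("A", NA, "A"),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Names of the columns that start with a digit
nm1 <- grep("^[0-9]", names(toy), value = TRUE)

# Logical matrix: TRUE where a cell is not missing, not empty, and not the string "NA"
keep <- !is.na(toy[nm1]) & toy[nm1] != "" & toy[nm1] != "NA"

# Named numeric vector with one count per selected column
counts <- colSums(keep)
counts
```

The result is a named vector rather than a data frame row, so it stays numeric and does not force the original columns to change type.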