R - dplyr: summarising activities by variables

Asked: 2015-06-26 12:48:18

Tags: r sequence dplyr summary

I am having some trouble with the summarise solution proposed here.

I am simply trying to summarise sequence data by activity and by two variables (gender and children).

Here is my sequence dataset:

dta = structure(c("d nuclear", "d nuclear", "e nuclear and acquaintance", 
        "e nuclear and acquaintance", "d nuclear", "e nuclear and acquaintance", 
        "d nuclear", "d nuclear", "j work study sleep", "c child", "d nuclear", 
        "d nuclear", "e nuclear and acquaintance", "e nuclear and acquaintance", 
        "d nuclear", "e nuclear and acquaintance", "d nuclear", "d nuclear", 
        "j work study sleep", "c child", "d nuclear", "d nuclear", "e nuclear and acquaintance", 
        "e nuclear and acquaintance", "d nuclear", "e nuclear and acquaintance", 
        "d nuclear", "c child", "j work study sleep", "c child", "d nuclear", 
        "d nuclear", "e nuclear and acquaintance", "e nuclear and acquaintance", 
        "d nuclear", "e nuclear and acquaintance", "d nuclear", "c child", 
        "j work study sleep", "c child", "d nuclear", "d nuclear", "e nuclear and acquaintance", 
        "e nuclear and acquaintance", "d nuclear", "e nuclear and acquaintance", 
        "d nuclear", "d nuclear", "j work study sleep", "c child", "d nuclear", 
        "a alone", "e nuclear and acquaintance", "e nuclear and acquaintance", 
        "b partner", "b partner", "d nuclear", "d nuclear", "j work study sleep", 
        "c child", "d nuclear", "a alone", "e nuclear and acquaintance", 
        "e nuclear and acquaintance", "b partner", "b partner", "d nuclear", 
        "d nuclear", "j work study sleep", "c child", "d nuclear", "a alone", 
        "e nuclear and acquaintance", "d nuclear", "b partner", "b partner", 
        "d nuclear", "d nuclear", "i True Missing", "c child", "d nuclear", 
        "d nuclear", "d nuclear", "d nuclear", "b partner", "b partner", 
        "d nuclear", "d nuclear", "i True Missing", "c child", "d nuclear", 
        "d nuclear", "d nuclear", "d nuclear", "b partner", "b partner", 
        "d nuclear", "d nuclear", "j work study sleep", "c child", "d nuclear", 
        "d nuclear", "d nuclear", "d nuclear", "b partner", "b partner", 
        "d nuclear", "d nuclear", "j work study sleep", "c child"), .Dim = 10:11, .Dimnames = list(
          NULL, c("12:10", "12:20", "12:30", "12:40", "12:50", "13:00", 
                  "13:10", "13:20", "13:30", "13:40", "13:50")))

When I summarise by activity only, the solution proposed here works perfectly well:

require(dplyr)

data_frame(var = c(dta)) %>% 
  group_by_("var") %>% 
  summarise( smn = n() * 10, min = smn / 20) 

However, what I now need to do is group by children and gender as well.

dta = as.data.frame(dta)
dta$children <- rep(x = c(1,0), times = 5)
dta$gender <- c( rep('H', 5), rep('F', 5) )
dta$idno <- c( 1:10 ) # personal identifier 

I was thinking of a solution like this, but it does not work:

data_frame(var = c(dta)) %>% 
  group_by_("var", "children", "gender") %>% 
  summarise( smn = n() * 10, min = smn / 20) 

Do you know why this is incorrect?
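One likely cause, shown as a minimal sketch on a toy stand-in for dta (the values below are illustrative, not the real data): c() on a matrix flattens it into a character vector, but after as.data.frame() the same call returns a list of columns. A data frame built from that list contains only var, so there is no children or gender column left to group by.

```r
# Toy stand-in for the question's data (illustrative values only)
m <- matrix(c("a alone", "c child", "a alone", "d nuclear"),
            nrow = 2,
            dimnames = list(NULL, c("12:10", "12:20")))

# On a matrix, c() drops the dimensions and returns one element per cell
flat_matrix <- c(m)
is.character(flat_matrix)   # TRUE
length(flat_matrix)         # 4

# After conversion, the object is a data frame with extra columns
df <- as.data.frame(m)
df$children <- c(1, 0)
df$gender   <- c("H", "F")

# On a data frame, c() returns a plain list with one element per column,
# including children and gender alongside the time columns
flat_df <- c(df)
is.list(flat_df)            # TRUE
length(flat_df)             # 4 (two time columns + children + gender)
```

So the activities are no longer a single vector, and the grouping variables end up inside var rather than next to it.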

My original dataset is organised like this dta, with the variables children, gender and idno.

So the output I want is similar to the regular summarise, but broken down by gender and children. The basic summarise gives:

                       var smn  min
                   a alone  30  1.5
                 b partner 120  6.0
                   c child 130  6.5
                 d nuclear 510 25.5
e nuclear and acquaintance 200 10.0
            i True Missing  20  1.0
        j work study sleep  90  4.5

Dividing by 20 here (/ 20) is wrong; each group should instead be divided by its own count from:

table(dta$children, dta$gender)

    F H
  0 3 2
  1 2 3

Is there a way to divide by the correct number for each category directly, rather than by hand?
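One possible approach, sketched on a toy data frame rather than the real dta: reshape to long format so each row is one person in one time slot, compute the number of distinct idno values in each children x gender cell, and divide by that count. This assumes tidyr (for pivot_longer) and dplyr 1.0 or later (for the .groups argument); the toy data frame and the names time and n_people are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for dta after adding children, gender and idno
# (illustrative values only)
toy <- data.frame(
  idno     = 1:4,
  children = c(1, 1, 0, 0),
  gender   = c("H", "H", "F", "F"),
  `12:10`  = c("a alone", "c child", "a alone", "d nuclear"),
  `12:20`  = c("a alone", "c child", "d nuclear", "d nuclear"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

result <- toy %>%
  # One row per person and time slot; activities go into `var`
  pivot_longer(cols = -c(idno, children, gender),
               names_to = "time", values_to = "var") %>%
  group_by(children, gender) %>%
  # Denominator: number of people in each children x gender cell
  mutate(n_people = n_distinct(idno)) %>%
  group_by(var, children, gender, n_people) %>%
  # Each slot is 10 minutes, as in the original summarise
  summarise(smn = n() * 10, .groups = "drop") %>%
  # Minutes per person within the cell, replacing the fixed / 20
  mutate(min = smn / n_people)

print(result)
```

Because n_people is computed from the data, the denominators follow table(dta$children, dta$gender) without being typed in by hand.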

Thanks

0 answers:

No answers yet