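I am having some trouble with the summarise solution proposed here.
I am simply trying to summarise sequence data by activity and by two variables (gender and children).
This is my sequence data set: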
dta = structure(c("d nuclear", "d nuclear", "e nuclear and acquaintance",
"e nuclear and acquaintance", "d nuclear", "e nuclear and acquaintance",
"d nuclear", "d nuclear", "j work study sleep", "c child", "d nuclear",
"d nuclear", "e nuclear and acquaintance", "e nuclear and acquaintance",
"d nuclear", "e nuclear and acquaintance", "d nuclear", "d nuclear",
"j work study sleep", "c child", "d nuclear", "d nuclear", "e nuclear and acquaintance",
"e nuclear and acquaintance", "d nuclear", "e nuclear and acquaintance",
"d nuclear", "c child", "j work study sleep", "c child", "d nuclear",
"d nuclear", "e nuclear and acquaintance", "e nuclear and acquaintance",
"d nuclear", "e nuclear and acquaintance", "d nuclear", "c child",
"j work study sleep", "c child", "d nuclear", "d nuclear", "e nuclear and acquaintance",
"e nuclear and acquaintance", "d nuclear", "e nuclear and acquaintance",
"d nuclear", "d nuclear", "j work study sleep", "c child", "d nuclear",
"a alone", "e nuclear and acquaintance", "e nuclear and acquaintance",
"b partner", "b partner", "d nuclear", "d nuclear", "j work study sleep",
"c child", "d nuclear", "a alone", "e nuclear and acquaintance",
"e nuclear and acquaintance", "b partner", "b partner", "d nuclear",
"d nuclear", "j work study sleep", "c child", "d nuclear", "a alone",
"e nuclear and acquaintance", "d nuclear", "b partner", "b partner",
"d nuclear", "d nuclear", "i True Missing", "c child", "d nuclear",
"d nuclear", "d nuclear", "d nuclear", "b partner", "b partner",
"d nuclear", "d nuclear", "i True Missing", "c child", "d nuclear",
"d nuclear", "d nuclear", "d nuclear", "b partner", "b partner",
"d nuclear", "d nuclear", "j work study sleep", "c child", "d nuclear",
"d nuclear", "d nuclear", "d nuclear", "b partner", "b partner",
"d nuclear", "d nuclear", "j work study sleep", "c child"), .Dim = 10:11, .Dimnames = list(
NULL, c("12:10", "12:20", "12:30", "12:40", "12:50", "13:00",
"13:10", "13:20", "13:30", "13:40", "13:50")))
When I summarise by activity only, it works:
require(dplyr)
data_frame(var = c(dta)) %>%
group_by_("var") %>%
summarise( smn = n() * 10, min = smn / 20)
However, what I need to do now is to group by children and gender as well.
dta = as.data.frame(dta)
dta$children <- rep(x = c(1,0), times = 5)
dta$gender <- c( rep('H', 5), rep('F', 5) )
dta$idno <- c( 1:10 ) # personal identifier
I was thinking of a solution like this, but it does not work:
data_frame(var = c(dta)) %>%
group_by_("var", "children", "gender") %>%
summarise( smn = n() * 10, min = smn / 20)
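Do you know why this is incorrect?
My original dataset is organised like this dta, with the variables children, gender and idno.
So the output I want is similar to the regular summarise, but broken down by gender and children. The basic summarise gives: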
var smn min
a alone 30 1.5
b partner 120 6.0
c child 130 6.5
d nuclear 510 25.5
e nuclear and acquaintance 200 10.0
i True Missing 20 1.0
j work study sleep 90 4.5
Here, dividing by 20 (/ 20) is wrong; each category should instead be divided by the corresponding count from
table(dta$children, dta$gender)
F H
0 3 2
1 2 3
Is there a way to divide by the correct group size directly, rather than entering it by hand?
Thanks
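To make the target concrete, here is a minimal sketch of the calculation I have in mind. It is only an illustration, not a confirmed solution. The data frame toy is made up (it is not my real dta), the helper column n_persons is a name I invented, and I am assuming that tidyr is available and that min should be the total minutes divided by the number of persons in each children x gender group.

```r
library(dplyr)
library(tidyr)

# Toy data in the same wide layout as dta: one row per person,
# one column per 10-minute slot (values are made up for illustration)
toy <- data.frame(
  `12:10`  = c("d nuclear", "c child", "d nuclear", "a alone"),
  `12:20`  = c("d nuclear", "c child", "b partner", "a alone"),
  children = c(1, 1, 0, 0),
  gender   = c("H", "H", "F", "F"),
  idno     = 1:4,
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Long format: one row per person and time slot, keeping the covariates
long <- toy %>%
  pivot_longer(cols = -c(children, gender, idno),
               names_to = "time", values_to = "var")

res <- long %>%
  group_by(children, gender) %>%
  mutate(n_persons = n_distinct(idno)) %>%  # group size, computed instead of typed in
  group_by(var, children, gender) %>%
  summarise(smn = n() * 10,                 # total minutes spent in this activity
            min = smn / first(n_persons),   # minutes per person in the group
            .groups = "drop")

print(res)
```

With this layout the group sizes come from n_distinct(idno), so they would follow table(dta$children, dta$gender) without being entered by hand.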