Descriptive statistics for multi-level (clustered) data

Date: 2015-03-12 13:46:43

Tags: r function aggregate plyr multi-level

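I can't produce a complex cross-section of descriptive statistics for multi-level data. I have tried approaching the problem from several different angles, without success. Below is some of the code from my failed plyr attempt. The issue is that schools are nested within districts, and I need district-level summary statistics matched to every school in that district. When I group by school, the plyr solution only produces descriptive statistics for each school's own subsample; it does not attach the pooled district-level figures to each school.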

I have been trying to find a way around this whenever I have a spare moment.

Would by, aggregate, or data.table provide a better solution?

#Generate Data
set.seed(500)
School <- rep(1:20, 2)
District <- rep(c(rep("East", 10), rep("West", 10)), 2)
Score <- rnorm(40, 100, 15)
Student.ID <- sample(1:1000, 40, replace = TRUE)  # one ID per row; 8 IDs would be silently recycled by data.frame()
items <- data.frame(replicate(10, sample(1:4, 40, replace=TRUE)))
gender <- rep( c("Male","Female"), 100*c(0.4,0.6) )  
gender <- sample(gender, 40)
low.inc <- rep( c("Status.A", "Status.B", "Status.c"), 100*c(0.3,0.2,0.5) )  
low.inc <- sample(low.inc, 40)
# Convert the Likert items to ordered factors with descriptive labels
items <- data.frame(lapply(items, factor, ordered = TRUE,
                           levels = 1:4,
                           labels = c("Strongly disagree", "Disagree",
                                      "Agree", "Strongly Agree")))
school.data <- data.frame(Student.ID, School, District, Score, items, gender, low.inc)
# Classify each score: High (> 1 SD above the mean), Low (<= 1 SD below), else Moderate
sd1 <- sd(school.data$Score)
m1 <- mean(school.data$Score)
sd.above <- m1 + sd1
sd.below <- m1 - sd1
school.data$scorecat <- ifelse(school.data$Score > sd.above, "High",
                        ifelse(school.data$Score <= sd.below, "Low", "Moderate"))

#Attempt to generate table
library(plyr)
b1 <- ddply(school.data, .variables = c("gender", "District", "School"), .fun = summarise,
  n = length(scorecat),
  high = sum(scorecat %in% c("High")),
  high.prop = high / n, # Referring to vars I just created
  mod = sum(scorecat %in% c("Moderate")),
  mod.prop = mod / n, # Referring to vars I just created
  low = sum(scorecat %in% c("Low")),
  low.prop = low / n # Referring to vars I just created
)
drops <- c("high","mod", "low") #set up a list to drop columns
b1 <- b1[,!(names(b1) %in% drops)]
colnames(b1)[1] <- "Demographic Variable"

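Note: the table below produces the correct district values, which should be assigned to each school. I would like a table like the first example, where each school has its corresponding district values.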

b1 <- ddply(school.data, .variables = c("gender", "District"), .fun = summarise,
  n = length(scorecat),
  high = sum(scorecat %in% c("High")),
  high.prop = high / n, # Referring to vars I just created
  mod = sum(scorecat %in% c("Moderate")),
  mod.prop = mod / n, # Referring to vars I just created
  low = sum(scorecat %in% c("Low")),
  low.prop = low / n # Referring to vars I just created
)
drops <- c("high","mod", "low") #set up a list to drop columns
b1 <- b1[,!(names(b1) %in% drops)]
colnames(b1)[1] <- "Demographic Variable"

1 answer:

Answer (score: 2)

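If I understand correctly, what you want is to compute the variables at the district level and then attach them to each school. I had a hard time following your other post.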
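One base-R way to express "compute at the district level, then attach to each school" is ave(), which is a swapped-in alternative to the aggregate/merge route rather than what this answer uses. It computes a statistic within each group and recycles it back onto every row of that group. The sketch below is minimal and assumes a small hypothetical toy data frame standing in for school.data (only District, School, gender and scorecat are used); the district.*.prop column names are illustrative.

```r
# Hypothetical toy data standing in for school.data:
# two districts, two schools each, one male and one female student per school
school.data <- data.frame(
  District = c("East", "East", "East", "East", "West", "West", "West", "West"),
  School   = c(1, 1, 2, 2, 3, 3, 4, 4),
  gender   = c("Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"),
  scorecat = c("High", "Low", "High", "Moderate", "Low", "Low", "Moderate", "High"),
  stringsAsFactors = FALSE
)

# For each score category, ave() takes the mean of a 0/1 indicator within
# every District x gender group and returns it on every row of that group,
# so each school's rows carry the pooled district-level proportion
for (lvl in c("High", "Moderate", "Low")) {
  col <- paste0("district.", tolower(lvl), ".prop")
  school.data[[col]] <- ave(as.numeric(school.data$scorecat == lvl),
                            school.data$District, school.data$gender,
                            FUN = mean)
}

# One row per school x gender with the district-level figures attached
unique(school.data[, c("District", "School", "gender",
                       "district.high.prop", "district.moderate.prop",
                       "district.low.prop")])
```

Because ave() keeps the original row order, no join step is needed; the trade-off is that the district figures are repeated on every student row until you reduce them with unique().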

You can use aggregate followed by merge in base R.

Since you have already computed the summary table b1 with plyr, you can merge it into the original school.data dataset.

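    # b1's "gender" column was renamed to "Demographic Variable", so match it explicitly
    school.data2 <- merge(school.data, b1,
                          by.x = c("District", "gender"),
                          by.y = c("District", "Demographic Variable"))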
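The same two-step idea can be sketched without plyr: aggregate() computes the district-level proportions, and merge() attaches them to one row per school. This is a minimal sketch assuming a small hypothetical toy data frame in place of the simulated school.data; district.stats, school.level and school.table are illustrative names, not part of the original code.

```r
# Hypothetical toy data standing in for school.data:
# two districts, two schools each, one male and one female student per school
school.data <- data.frame(
  District = c("East", "East", "East", "East", "West", "West", "West", "West"),
  School   = c(1, 1, 2, 2, 3, 3, 4, 4),
  gender   = c("Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"),
  scorecat = c("High", "Low", "High", "Moderate", "Low", "Low", "Moderate", "High"),
  stringsAsFactors = FALSE
)

# Step 1: aggregate() collapses students to one row per District x gender,
# taking the mean of each logical indicator (i.e. the proportion)
district.stats <- aggregate(
  data.frame(high.prop = school.data$scorecat == "High",
             mod.prop  = school.data$scorecat == "Moderate",
             low.prop  = school.data$scorecat == "Low"),
  by  = list(District = school.data$District, gender = school.data$gender),
  FUN = mean
)

# Step 2: build one row per school x gender and attach the district figures,
# so every school in a district receives the same district-level proportions
school.level <- unique(school.data[, c("District", "School", "gender")])
school.table <- merge(school.level, district.stats, by = c("District", "gender"))
print(school.table)
```

Merging onto school.level rather than the student-level data keeps the result at one row per school and gender, which is closer to the table described in the question.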

Let me know if that does the trick.