Loops in rmarkdown, knitr, and xtable to create unique tables in multiple reports

Time: 2015-02-06 01:53:34

Tags: r knitr xtable

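I am completely revising my question. I realize it was long and my point was getting lost.

Here is what I need to do:

Create automated reports for schools, with tables that compare each school's data to its district and to the state as a whole. The state is the entire dataset.

Here is what I understand:

How to create an automated loop that goes through the data and creates a unique PDF report for each school. This post was very helpful for setting up the framework that generates the reports.

Here is where I need help:

I need a table with the following columns: School, District, State. I also need the first column of the table to hold the row labels: Sample Size, Mean, Standard Deviation.

I am trying to create it inside a for loop because I need a unique table in each unique PDF that is created. If there is a better way, I would be happy to hear it.

In any case, here is the reproducible example I have tested. I have not gotten very far with building the table.

Any help is much appreciated.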

driver.r

# Create dataset
set.seed(500)
School <- rep(1:20, 2)
District <- rep(c(rep("East", 10), rep("West", 10)), 2)
Score <- rnorm(40, 100, 15)
# 8 IDs are drawn and recycled across the 40 rows by data.frame()
Student.ID <- sample(1:1000, 8, replace = TRUE)
school.data <- data.frame(School, District, Score, Student.ID)

# prepare for multicore processing
library(parallel)
library(knitr)
# generate the Rmd files, one for each school in school.data;
# knit() evaluates chunks in the calling frame by default, so `x` is visible in auto.Rmd
mclapply(unique(school.data$School), function(x)
  knit("F:/sample-auto/auto.Rmd",
       output = paste0("report_", x, ".Rmd")))

# generate PDFs from the Rmd files, one for each school in school.data
mclapply(unique(school.data$School), function(x)
  rmarkdown::render(paste0("F:/sample-auto/report_", x, ".Rmd")))

auto.Rmd

---
title: "Automated Report Generation for Data"
author: "ME"
date: "February 5, 2015"
output: 
  pdf_document:
    toc: true
    number_sections: true
---

```{r, echo=FALSE}
library(xtable)
library(plyr)
df <- data.frame(school.data)
subgroup <- df[school.data$School == x,]
```

# Start of attempt 

```{r results='asis', echo=FALSE}
for (school in unique(subgroup$School)) {
  subgroup2 <- subgroup[subgroup$School == school, ]
  savename <- paste(x, school)
  df2 <- mean(subgroup2$Score, na.rm = TRUE)
  df2 <- data.frame(df2)
  print(xtable(df2))
}
```

I also tried replacing the loop with the following code:

```{r results='asis', echo=FALSE}
df2 <- ddply(school.data, .(School), summarise,
             n = length(School),
             mean = mean(Score),
             sd = sd(Score))
print(xtable(df2))
```

This gives me something I do not want, because every school's report gets the data for all schools rather than just its own.

1 Answer:

Answer 0: (score: 2)

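If you use a loop to subset the data before passing it to the .Rmd file, you do not really need plyr or ddply to do the split/apply/combine for you. Since you have a lot of observations, they may add noticeable overhead.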
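As a rough illustration of subsetting in the driver before rendering, here is a minimal sketch. It assumes the template reads `x`, `subgroup` and `school.data` from the environment it is rendered in rather than creating them itself. The helpers `report_file_name()` and `render_school_report()` are hypothetical names introduced only for this example.

```r
# Hypothetical helper: file name for one school's report.
report_file_name <- function(school_id) {
  paste0("report_", school_id, ".pdf")
}

# Hypothetical helper: subset once in the driver, then render the template
# in an environment that already holds the subset.
render_school_report <- function(school_id, data, template = "auto.Rmd") {
  env <- new.env(parent = globalenv())
  env$x <- school_id
  env$subgroup <- data[data$School == school_id, ]
  env$school.data <- data  # full data, kept for district/state comparisons

  rmarkdown::render(
    input = template,
    output_file = report_file_name(school_id),
    envir = env
  )
}

# Usage (needs rmarkdown, pandoc and a LaTeX installation, so it is left commented):
# for (s in unique(school.data$School)) render_school_report(s, school.data)
```

Rendering straight from the template with a per-school `output_file` also avoids writing an intermediate `report_*.Rmd` for every school, although whether that suits your workflow is a judgment call.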

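In addition, if you create the subgroup before running the .Rmd, you do not need a loop inside the file either. You only need to build a data frame with the statistics you want and then pass it to xtable.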

---
title: "Automated Report Generation for Data"
author: "ME"
date: "February 5, 2015"
output: 
  pdf_document:
    toc: true
    number_sections: true
---

```{r, echo=FALSE}
library(xtable)
library(plyr)
# Create dataset
set.seed(500)
School <- rep(1:20, 2)
District <- rep(c(rep("East", 10), rep("West", 10)), 2)
Score <- rnorm(40, 100, 15)
Student.ID <- sample(1:1000, 8, replace = TRUE)
school.data <- data.frame(School, District, Score, Student.ID)


x <- unique(school.data$School)[1]
subgroup <- school.data[school.data$School == x, ]
```

# Start of attempt 

```{r results='asis', echo=FALSE}
options(xtable.comment = FALSE)
## for one school, it is redundant to split based on school but works
## likewise, it is redundant to have a loop here to split based on school
## if you have already used a loop to create the subgroup data 
res <- ddply(subgroup, .(School), summarise,
             n = length(School),
             mean = mean(Score),
             SD = sd(Score),
             VAR = var(Score))
xtable(res)

## if you pass in the entire data frame you will get all schools
## then you can subset the one you want
res <- ddply(school.data, .(School), summarise,
             n = length(School),
             mean = mean(Score),
             SD = sd(Score),
             VAR = var(Score))

xtable(res[res$School %in% x, ])
```
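
The question asks for School, District and State as columns, with Sample Size, Mean and Standard Deviation as rows, which the `ddply` output above does not produce directly. One possible way to build that layout in base R is sketched below. `score_summary()` and `comparison_table()` are hypothetical helper names, and the sketch assumes the data has `School`, `District` and `Score` columns as in the example dataset.

```r
# Hypothetical helper: sample size, mean and SD of a score vector.
score_summary <- function(scores) {
  c(n = length(scores), mean = mean(scores), sd = sd(scores))
}

# Hypothetical helper: one column each for the school, its district and the
# whole dataset (the "state"), one row per statistic.
comparison_table <- function(data, school_id) {
  school_rows <- data[data$School == school_id, ]
  district <- as.character(school_rows$District[1])
  district_rows <- data[data$District == district, ]

  tab <- data.frame(
    School = score_summary(school_rows$Score),
    District = score_summary(district_rows$Score),
    State = score_summary(data$Score)
  )
  rownames(tab) <- c("Sample Size", "Mean", "Standard Deviation")
  tab
}

# Usage inside a results='asis' chunk (needs the xtable package):
# print(xtable::xtable(comparison_table(school.data, x)), comment = FALSE)
```

Because the row labels are stored as row names, `xtable` prints them as the first column of the table by default.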
