Extract chunks from an Rmarkdown file as raw text

Time: 2018-03-29 17:26:48

Tags: r regex r-markdown

Suppose I want to extract all the lines of an rmarkdown chunk as raw text. Suppose the foo object below is the output of readLines("my_rmarkdown_file.Rmd"):

> foo <- c("", "```{r package, include=FALSE}", "library(tidyverse)", "library(lubridate)", 
   "library(kableExtra)", "library(knitr)", "library(sf)", "library(bcmaps)", 
   "```", "", "", "```{r}", "", "## A dataframe ", "pct_flow <- HIST_FLOW %>%", 
   "  group_by(year_day, STATION_NUMBER) %>%", "  mutate(prctile = ecdf(Value)(Value)) %>%", 
   "  mutate(Date_no_year = dmy(paste0(day(Date),\"-\",month(Date),\"-\",year(Sys.Date())))) %>%", 
   "  ungroup()", "")
> 
> foo
 [1] ""                                                                                            
 [2] "```{r package, include=FALSE}"                                                               
 [3] "library(tidyverse)"                                                                          
 [4] "library(lubridate)"                                                                          
 [5] "library(kableExtra)"                                                                         
 [6] "library(knitr)"                                                                              
 [7] "library(sf)"                                                                                 
 [8] "library(bcmaps)"                                                                             
 [9] "```"                                                                                         
[10] ""                                                                                            
[11] ""                                                                                            
[12] "```{r}"                                                                                      
[13] ""                                                                                            
[14] "## A dataframe "                                                                             
[15] "pct_flow <- HIST_FLOW %>%"                                                                   
[16] "  group_by(year_day, STATION_NUMBER) %>%"                                                    
[17] "  mutate(prctile = ecdf(Value)(Value)) %>%"                                                  
[18] "  mutate(Date_no_year = dmy(paste0(day(Date),\"-\",month(Date),\"-\",year(Sys.Date())))) %>%"
[19] "  ungroup()"                                                                                 
[20] "" 

So my question is: how can I extract all the lines between "```{r package, include=FALSE}" and the next instance of ```? The desired output is:

foo[3:8]
[1] "library(tidyverse)"  "library(lubridate)"  "library(kableExtra)" "library(knitr)"      "library(sf)"         "library(bcmaps)" 

To generalize, however: the opening line of the rmarkdown chunk is consistent; the line numbers are not.

In summary, any ideas on how to extract the lines delimited by a regular pattern from readLines() output?

1 answer:

Answer 0: (score: 2)

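# First line after the opening fence of the "package" chunk
start = which(foo == "```{r package, include=FALSE}")[1] + 1
# Last line before the first closing fence
end   = which(foo == "```")[1] - 1

foo[start:end]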

If a closing fence can occur before the start, you can be more specific (this is not the case in the OP's example):

ends = which(foo == "```")
end = ends[ends > start][1] - 1
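
The same idea can be wrapped in a function that matches the opening fence by chunk label rather than by the exact header string. This is only a sketch: extract_chunk() and the sample vector rmd are hypothetical names introduced here for illustration, and the label is assumed to contain no regex metacharacters.

```r
# Hypothetical helper (not part of knitr or any other package).
extract_chunk <- function(lines, label) {
  # Opening fence such as ```{r package, include=FALSE}
  # Assumption: `label` contains no regex metacharacters.
  open_pat <- paste0("^```\\{r\\s+", label, "\\s*[,}]")
  start <- grep(open_pat, lines, perl = TRUE)[1]
  if (is.na(start)) stop("no chunk labelled '", label, "' found")

  # First closing fence located after the opening fence
  closes <- grep("^```\\s*$", lines, perl = TRUE)
  end <- closes[closes > start][1]
  if (is.na(end)) stop("chunk '", label, "' is never closed")

  # seq_len() avoids a reversed index range when the chunk is empty
  lines[start + seq_len(end - start - 1)]
}

# Small made-up input standing in for readLines() output
rmd <- c("", "```{r package, include=FALSE}", "library(tidyverse)",
         "library(knitr)", "```", "", "```{r}", "x <- 1", "```")

extract_chunk(rmd, "package")
```

Using a pattern instead of == means the chunk options can change without breaking the lookup, and indexing with seq_len() returns character(0) for an empty chunk instead of walking backwards as start:end would.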