Conditionally concatenating strings that are split across multiple rows

Asked: 2019-01-06 15:53:20

Tags: r string string-concatenation

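I have extracted several tables from a PDF that contain multi-line strings. I used the extract_tables() function from the tabulizer package; the only problem is that each line of a multi-line string is imported as a separate row.

For example: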

action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)

description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")

data.frame(action, description)

   action description
1       1           a
2      NA           b
3      NA           c
4       2           a
5      NA           b
6       3           a
7      NA           b
8      NA           c
9      NA           d
10      4           a
11     NA           b

I would like to concatenate the strings so that the lines belonging to one action end up in a single element, e.g.:

  action description
1      1       a b c
2      2         a b
3      3     a b c d
4      4         a b

Hope that makes sense; thanks for your help!

4 Answers:

Answer 0 (score: 3)

A tidyverse approach is to fill the action column with the previous non-NA value, then group_by action and paste the description values of each group together.
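A minimal sketch of those steps, assuming the dplyr and tidyr packages are installed. The object names df and result are chosen here for illustration and are not taken from the original answer.

```r
library(dplyr)
library(tidyr)

# Example data from the question
action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)
description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")
df <- data.frame(action, description)

result <- df %>%
  fill(action) %>%        # carry the last non-NA action downwards
  group_by(action) %>%    # one group per action
  summarise(description = paste(description, collapse = " "))

result
```

fill() works downwards by default, so a leading NA in action would stay NA and form its own group.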

Answer 1 (score: 1)

A base R option:

dat <- data.frame(action, description)
aggregate(
  description ~ action,
  transform(dat, action = cumsum(!is.na(dat$action))),
  FUN = paste,
  collapse = " "  # extra argument, passed on to paste()
)
#  action description
#1      1       a b c
#2      2         a b
#3      3     a b c d
#4      4         a b

For aggregate to work, we need to replace action with the values returned by cumsum(!is.na(dat$action)), i.e.

cumsum(!is.na(dat$action))
#[1] 1 1 1 2 2 3 3 3 3 4 4
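The group ids produced by cumsum happen to equal the action values here only because the actions are numbered 1, 2, 3, 4. A hedged sketch of how the same idea could keep arbitrary action labels, using only base R; the names starts, grp, filled, combined and result are illustrative, and the code assumes the first row has a non-NA action:

```r
action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)
description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")

starts <- !is.na(action)       # TRUE on rows where a new action begins
grp <- cumsum(starts)          # running count of starts = group id per row
filled <- action[starts][grp]  # map each group id back to its action label

# Paste the descriptions within each filled action value
combined <- tapply(description, filled, paste, collapse = " ")

result <- data.frame(
  action = as.numeric(names(combined)),
  description = as.vector(combined)
)
result
```

If the first element of action were NA, grp would start at 0 and the indexing step would silently drop that row, so that case needs separate handling.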

Answer 2 (score: 1)

Here is an option with data.table:

library(data.table)
setDT(df1)[, .(description = paste(description, collapse = ' ')), 
                  .(action = cumsum(!is.na(action)))]
#   action description
#1:      1       a b c
#2:      2         a b
#3:      3     a b c d
#4:      4         a b

Or using na.locf from zoo:

library(zoo)
setDT(df1)[, .(description = paste(description, collapse = ' ')),
              .(action = na.locf(action))]

Data

df1 <- data.frame(action, description)

Answer 3 (score: 0)

You can use the zoo and dplyr packages like this:

library(zoo)
library(dplyr)
action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)
description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")
df = data.frame(action, description)
df$action = na.locf(df$action)
df = 
    df %>% 
    group_by(action) %>% 
    summarise(description = paste(description, collapse = ' '))