Question

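I have a data frame that I need to split into several data frames based on regular-expression searches. The searches follow no fixed pattern: sometimes there is a single regex, sometimes a combination of several. Here is a minimal example that extracts just one set of rows: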

library(dplyr)

Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")

main_df <- data.frame(Name, Age, City)

sub_df <- main_df %>% 
  filter(grepl("J", Name))

main_df <- main_df %>% 
  filter(!grepl("J", Name))

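Note that I am extracting some rows into a new data frame and then removing those extracted rows from the main data frame.

I am looking for a one-line command to do this. Any help is much appreciated, especially if it uses dplyr.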

Answer 1

We can write a function such as:

split_df <- function(df, char) {
  split(df, grepl(char, df$Name))
}

new_df <- split_df(main_df, "J")

new_df[[1]]
#    Name Age     City
#3 Arthur  31 New York
#4 Maggie  33    Delhi

new_df[[2]]
#  Name Age   City
#1 John  20 London
#2 Jane  30  Paris

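Make sure to pass the appropriate pattern as char when splitting. You can also use a regular expression for char, such as ^J (starts with J) or J$ (ends with J), and so on.

For example,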

new_df <- split_df(main_df, "^J")

gives the same output as above.
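Since the question mentions that a search is sometimes a combination of several regexes, here is a minimal sketch of how the same split() idea might be extended. The helper name split_df_any, its column argument, and the labels "matched" and "rest" are hypothetical and not part of the answer above; the sketch assumes that "combination" means a row should be extracted when it matches any one of the patterns.

```r
# Hypothetical helper (not from the original answer): split on ANY of several patterns.
split_df_any <- function(df, patterns, column = "Name") {
  # Join the patterns with "|" so a row matches if at least one pattern matches.
  combined <- paste(patterns, collapse = "|")
  matched <- grepl(combined, df[[column]])
  # Fix the factor levels so both groups exist even when one of them is empty.
  groups <- factor(matched, levels = c(FALSE, TRUE), labels = c("rest", "matched"))
  split(df, groups)
}

main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age  = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi")
)

# Rows whose Name starts with "J" or ends with "e" go to "matched".
parts <- split_df_any(main_df, c("^J", "e$"))
parts$matched  # John, Jane, Maggie
parts$rest     # Arthur
```

Naming the two groups avoids having to remember that new_df[[1]] holds the non-matching rows and new_df[[2]] the matching ones.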

Answer 2

I think the following will let you extract rows from the original df based on multiple conditions and, as requested, delete those rows from the original using dplyr.

library(dplyr)

Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")

main_df <- data.frame(Name, Age, City, stringsAsFactors = FALSE)

conditions <- c(grepl("J", main_df$Name)) # works with several conditions as well

extractanddelete <- function(x, conditions) {
  condf <- data.frame(conditions)
  #fullcondition <- sapply(conditions, all)
  # one extracted data frame per condition column
  newdfs.list <- lapply(1:ncol(condf), function(i) x %>% filter(condf[,i]))
  # the remaining rows are written to the global variable newmain
  newmain <<- x
  notcondf <- !condf
  sapply(1:ncol(condf), function(i) newmain <<- newmain %>% filter(notcondf[,i]))
  return(newdfs.list)
}

ndflist <- extractanddelete(main_df, conditions)

newmain
ndflist

> newmain
    Name Age     City
1 Arthur  31 New York
2 Maggie  33    Delhi

> ndflist
[[1]]
  Name Age   City
1 John  20 London
2 Jane  30  Paris

The list you receive contains as many elements as there were conditions used for filtering and deleting.

For completeness, you can then run main_df <- newmain.

This solution can also be used with conditions other than grepl.
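To make the several-conditions case concrete, here is a rough sketch of one way it could be handled. It is not the code from this answer: the helper extract_by_patterns and its column argument are hypothetical, it uses base R only so that it runs without extra packages, and it returns the leftover rows instead of writing them to a global variable with <<-. It assumes each row should go to the first pattern it matches.

```r
# Hypothetical helper (not from the original answer): one sub-data-frame per pattern.
extract_by_patterns <- function(df, patterns, column = "Name") {
  extracted <- vector("list", length(patterns))
  names(extracted) <- patterns
  for (i in seq_along(patterns)) {
    hit <- grepl(patterns[i], df[[column]])
    extracted[[i]] <- df[hit, , drop = FALSE]
    # Drop the extracted rows, so later patterns only see what is left.
    df <- df[!hit, , drop = FALSE]
  }
  list(extracted = extracted, remaining = df)
}

main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age  = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi")
)

res <- extract_by_patterns(main_df, c("^J", "^A"))
res$extracted[["^J"]]  # John, Jane
res$extracted[["^A"]]  # Arthur
main_df <- res$remaining  # Maggie is all that is left
```

Because the row filter is recomputed against the shrinking data frame on every pass, the logical vector always has the same length as the rows being filtered.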

Answer 3

I did this with the mapply() function, which applies assign() across multiple list (vector) arguments.

Note: pos = 1 is required.

mapply(FUN = assign, x = c("main_df", "sub_df"),
                     value = split(main_df, grepl("J", main_df$Name)),
                     pos = 1)

main_df

#     Name Age     City
# 3 Arthur  31 New York
# 4 Maggie  33    Delhi

sub_df

#   Name Age   City
# 1 John  20 London
# 2 Jane  30  Paris
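As I understand it, pos = 1 is needed because position 1 on the search path is the global environment; without it, assign() would create the variables in a temporary frame inside the mapply() call and they would vanish afterwards. A possible alternative, sketched here rather than taken from the answer, is to name the pieces and hand them to list2env(), which takes the target environment explicitly. The sketch assumes both groups are non-empty, since otherwise split() returns a single element and the two names would not fit.

```r
main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age  = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi")
)

# split() on a logical vector orders the groups FALSE first, then TRUE,
# so the non-matching rows come first and the matching rows second.
parts <- split(main_df, grepl("J", main_df$Name))
names(parts) <- c("main_df", "sub_df")

# Create (or overwrite) one variable per list element in the global environment.
invisible(list2env(parts, envir = globalenv()))

main_df  # Arthur, Maggie
sub_df   # John, Jane
```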

Split a data frame into several data frames