Question

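I'm trying to subset a data frame based on column names that start with a specific string. I have some columns named like ABC_1, ABC_2, ABC_3 and others named like ABC_XYZ_1, ABC_XYZ_2, ABC_XYZ_3.

How can I subset my data frame so that it contains only the columns ABC_1, ABC_2, ABC_3 ... ABC_n and not ABC_XYZ_1, ABC_XYZ_2, ...?

I tried this: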

set.seed(1)
df <- data.frame( ABC_1 = sample(0:1, 3, replace = TRUE),
            ABC_2 = sample(0:1, 3, replace = TRUE),
            ABC_XYZ_1 = sample(0:1, 3, replace = TRUE),
            ABC_XYZ_2 = sample(0:1, 3, replace = TRUE) )


df1 <- df[ , grepl( "ABC" , names( df ) ) ]

ind <- apply( df1 , 1 , function(x) any( x > 0 ) )

df1[ ind , ]

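But this gives me both sets of columns, ABC_1 ... ABC_n as well as ABC_XYZ_1 ... ABC_XYZ_n. I'm not interested in the ABC_XYZ_1 columns, only in the ABC_1, ... columns. Any suggestions are much appreciated.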

Answer 1

To specify "ABC_" followed by one or more digits (i.e. \\d+ or [0-9]+), you can use

df1 <- df[ , grepl("ABC_\\d+", names( df ), perl = TRUE ) ]
# df1 <- df[ , grepl("ABC_[0-9]+", names( df ), perl = TRUE ) ] # another option

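To force the column names to start with "ABC_", you can add ^ to the regular expression, so that it matches only when "ABC_\\d+" occurs at the beginning of the string rather than anywhere in it.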

df1 <- df[ , grepl("^ABC_\\d+", names( df ), perl = TRUE ) ]
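To make the effect of the anchor concrete, here is a minimal sketch comparing three variants of the pattern. The names X_ABC_3 and ABC_4_old are hypothetical and do not appear in the question; they are included only to show where the patterns differ. The fully anchored variant with a trailing $ is an extra option, not part of the answer above, and may be useful if names with suffixes should also be excluded.

```r
# Column names from the question plus two hypothetical ones for illustration
cols <- c("ABC_1", "ABC_2", "ABC_XYZ_1", "X_ABC_3", "ABC_4_old")

# Matches "ABC_<digits>" anywhere in the name
unanchored <- grepl("ABC_\\d+", cols, perl = TRUE)

# Name must start with "ABC_<digits>"
anchored <- grepl("^ABC_\\d+", cols, perl = TRUE)

# Whole name must be exactly "ABC_<digits>" (assumed extra strictness)
exact <- grepl("^ABC_\\d+$", cols, perl = TRUE)

cols[unanchored]  # "ABC_1" "ABC_2" "X_ABC_3" "ABC_4_old"
cols[anchored]    # "ABC_1" "ABC_2" "ABC_4_old"
cols[exact]       # "ABC_1" "ABC_2"
```

When indexing a data frame with such a logical vector, df[ , exact, drop = FALSE] keeps the result a data frame even if only one column matches.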

If dplyr is more to your liking, you could try

library(dplyr)
select(df, matches("^ABC_\\d+"))

Answer 2

Another straightforward solution is to use substr:

df1 <- df[,substr(names(df),5,7) != 'XYZ']
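Note that the substr approach relies on "XYZ" sitting at exactly characters 5 to 7, and it excludes XYZ columns rather than selecting ABC_ columns. A small sketch, assuming a hypothetical extra column named ID that is not in the question, shows the consequence:

```r
# Data frame shaped like the question's, plus a hypothetical ID column
df <- data.frame(ID = 1:3,
                 ABC_1 = c(0, 1, 1),
                 ABC_2 = c(1, 0, 1),
                 ABC_XYZ_1 = c(0, 0, 1),
                 ABC_XYZ_2 = c(1, 1, 0))

# TRUE for every name whose characters 5-7 are not "XYZ"
keep <- substr(names(df), 5, 7) != "XYZ"

# drop = FALSE keeps a data frame even if a single column is selected
df1 <- df[, keep, drop = FALSE]

names(df1)  # "ID" "ABC_1" "ABC_2"
```

The ID column survives because substr returns an empty string for it, which is not equal to "XYZ". If only ABC_ columns are wanted, a regular expression anchored with ^ is likely the safer choice.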

Subset data frame columns whose names contain a specific string
