Question

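I have a data frame (d3) in which some column names have the form "Date_Month.Year". I want to rename those columns to just "Month.Year", so that if several columns end up with the same "Month.Year" they are summed into a single column.

Below is the code I tried and its output: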

library(stringr)

print(colnames(d3))
 #below is output of the print statement
 #[1] "ProductCategoryDesc" "RegionDesc"          "SourceDesc"          "variable"           
 #[5] "2019-02-28_Feb.2019" "2019-03-01_Mar.2019" "2019-03-04_Mar.2019" "2019-03-05_Mar.2019"
 #[9] "2019-03-06_Mar.2019" "2019-03-07_Mar.2019" "2019-03-08_Mar.2019" 

d3 <- d3 %>% mutate(col = str_remove(col, '*._'))

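This is the error I get: Evaluation error: argument str should be a character vector (or an object coercible to).

Update: I got an answer to the first part of the question, and I used it to get all the column names into Month.Year format. Now I am having trouble summing the columns that share a name, which I tried to do by adapting an existing solution.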

colnames(d3) <- gsub('.*_', '', colnames(d3))

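Below is the code I used to sum the columns with duplicate names. With this code, however, the summed values do not always end up in the correct column.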

indx <- sapply(d3, is.numeric)#check which columns are numeric
nm1 <- which(indx)#get the numeric index of the column
indx2 <- duplicated(names(nm1))|duplicated(names(nm1),fromLast=TRUE)
nm2 <- nm1[indx2]
indx3 <- duplicated(names(nm2))
d3[nm2[!indx3]] <- Map(function(x,y) rowSums(x[y],na.rm = FALSE), 
                        list(d3),split(nm2, names(nm2)))
d3 <- d3[ -nm2[indx3]]

Answer 1

If you want to change the column names, you should modify colnames directly:

colnames(d3) <- gsub('.*_', '', colnames(d3))

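Note that in a regular expression the quantifier (here, *) comes after the thing it quantifies. So the pattern should be .*_ rather than *._

An example in which we remove the text before the . in the column names of iris: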

colnames(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     

# In regex, . means any character, so to match an actual '.',
#   we need to 'escape' it with \\.
colnames(iris) <- gsub('.*\\.', '', colnames(iris))

colnames(iris)
[1] "Length"  "Width"   "Length"  "Width"   "Species"
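
For comparison, here is the same pattern run on a subset of the column names printed in the question. This is only a small sketch: the vector cols is typed in by hand from that output, and new_cols is a name introduced here for illustration. Names without an underscore are left untouched because the pattern finds nothing to match in them.

```r
# Column names copied by hand from the print(colnames(d3)) output in the question
cols <- c("ProductCategoryDesc", "RegionDesc", "SourceDesc", "variable",
          "2019-02-28_Feb.2019", "2019-03-01_Mar.2019", "2019-03-04_Mar.2019")

# '.*_' matches everything up to and including the last underscore,
# so only the Month.Year part survives; names with no '_' are unchanged
new_cols <- gsub('.*_', '', cols)

print(new_cols)
# [1] "ProductCategoryDesc" "RegionDesc"          "SourceDesc"
# [4] "variable"            "Feb.2019"            "Mar.2019"
# [7] "Mar.2019"
```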

Answer 2

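library(stringr)  # provides str_remove()

# Strip everything up to and including the last "_" from each column name
colnames(d3) <- sapply(colnames(d3), function(colname){
    return( str_remove(colname, '.*_') )
})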

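The regular expression should be ".*_" to match the part you want to remove.

This replaces each column name with the part of the string to the right of the "_".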
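
Once the names are shortened, the follow-up in the question (summing the columns that now share a name) could be approached along the following lines. This is a base R sketch on a made-up two-row data frame, not the asker's data; is_num, num_names, groups, summed and result are names chosen here for illustration. One possible reason the question's code misplaces values, offered as a guess, is that split() orders its groups by sorted name while nm2[!indx3] follows column order. The sketch avoids that by fixing the group order with an explicit factor.

```r
# Toy stand-in for d3 (hypothetical values), using names from the question
d3 <- data.frame(c("East", "West"), c(1, 2), c(10, 20), c(100, 200), c(1000, 2000))
colnames(d3) <- c("RegionDesc", "2019-02-28_Feb.2019", "2019-03-01_Mar.2019",
                  "2019-03-04_Mar.2019", "2019-03-05_Mar.2019")

# Shorten the names as in the answers above; this creates duplicate names
colnames(d3) <- gsub('.*_', '', colnames(d3))

# Group the positions of the numeric columns by their (now shared) name.
# levels = unique(...) keeps groups in column order instead of sorted order.
is_num    <- sapply(d3, is.numeric)
num_names <- names(d3)[is_num]
groups    <- split(which(is_num), factor(num_names, levels = unique(num_names)))

# Row-wise sum within each group; na.rm = FALSE mirrors the question's code
summed <- lapply(groups, function(idx) rowSums(d3[idx], na.rm = FALSE))

# Non-numeric columns first, then one summed column per Month.Year
result <- cbind(d3[!is_num], data.frame(summed, check.names = FALSE))
print(result)
```

Building a new data frame rather than overwriting columns in place sidesteps the index bookkeeping in the question's version, at the cost of moving all non-numeric columns to the front.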
