Using gsub across columns

Date: 2018-09-18 19:12:28

Tags: r gsub

I have some data:

library(tibble)
testData <- tibble(fname = c("Alice", "Bob", "Charlie", "Dan", "Eric"),
                   lname = c("Smith", "West", "CharlieBlack", "DanMcDowell", "Bush"))

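Some of the last names have the person's first name concatenated onto the front.

What is an efficient way to find and fix those entries in the lname column?

I would like it to look like this: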

lname = c("Smith", "West", "Black", "McDowell", "Bush")

I could use a for loop, but I have 500,000 rows of data, so I would like to find a more efficient approach.

3 answers:

Answer 0 (score: 2)

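We can use str_remove from stringr. It is vectorised over both the string and the pattern, so each lname is matched against the fname in the same row: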

library(tidyverse)
testData %>%
   mutate(lname = str_remove(lname, fname))
# A tibble: 5 x 2
#  fname   lname   
#  <chr>   <chr>   
#1 Alice   Smith   
#2 Bob     West    
#3 Charlie Black   
#4 Dan     McDowell
#5 Eric    Bush    
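
One thing to keep in mind is that str_remove treats fname as a regular expression and removes the first match wherever it occurs in lname, not only at the start. The following is a minimal base R sketch of a stricter variant that strips fname only when it is a literal prefix of lname. The helper name strip_prefix is hypothetical and not part of the original answer, and the guard that leaves lname alone when it equals fname is an assumption about the desired behaviour.

```r
# Hypothetical helper (not from the original answer): remove fname from
# lname only when lname literally starts with fname and is longer than it.
strip_prefix <- function(lname, fname) {
  # startsWith() is vectorised and does literal (non-regex) matching.
  has_prefix <- startsWith(lname, fname) & nchar(lname) > nchar(fname)
  # Keep everything after the prefix, otherwise keep lname unchanged.
  ifelse(has_prefix, substring(lname, nchar(fname) + 1L), lname)
}

fname <- c("Alice", "Bob", "Charlie", "Dan", "Eric")
lname <- c("Smith", "West", "CharlieBlack", "DanMcDowell", "Bush")

fixed <- strip_prefix(lname, fname)
print(fixed)
# [1] "Smith"    "West"     "Black"    "McDowell" "Bush"
```

Inside a dplyr pipeline this could be called as mutate(lname = strip_prefix(lname, fname)). Every step is vectorised, so there is no explicit loop over the rows.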

Answer 1 (score: 0)

We can use gsub within apply:

apply(testData,1,function(x) gsub(x['fname'],"",x['lname']))

Output:

[1] "Smith"    "West"     "Black"    "McDowell" "Bush"    
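
Note that apply first coerces the data frame to a character matrix and then calls the function once per row. As a rough sketch of the same row-by-row idea without that coercion, mapply can walk the two columns in parallel. The use of sub with fixed = TRUE (literal matching, first occurrence only) is a choice made here for illustration, not something the original answer specifies.

```r
fname <- c("Alice", "Bob", "Charlie", "Dan", "Eric")
lname <- c("Smith", "West", "CharlieBlack", "DanMcDowell", "Bush")

# Pair each first name with the last name in the same position and
# delete the first literal occurrence of the first name, if any.
cleaned <- mapply(
  function(f, l) sub(f, "", l, fixed = TRUE),
  fname, lname,
  USE.NAMES = FALSE
)
print(cleaned)
# [1] "Smith"    "West"     "Black"    "McDowell" "Bush"
```

This still calls the function once per row, so on 500,000 rows a fully vectorised approach is likely to be faster.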

Answer 2 (score: 0)

Try mutate with an ifelse clause to catch the lname entries that are concatenated, e.g.:

library(dplyr)
testData <- testData %>%
  mutate(lname = ifelse(grepl('[[:upper:]][[:lower:]]+[[:upper:]]', lname),
                        gsub('^[[:upper:]][[:lower:]]+', "", lname),
                        lname))

In this example, you are saying "mutate lname IF the string has an uppercase letter + at least one lowercase letter + an uppercase letter. If that condition is met, replace the first uppercase letter and following lowercase letters with nothing. If that condition is not met, just keep the original lname text".
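
Because this rule looks only at the capitalisation pattern in lname and never consults fname, it cannot tell a concatenated name apart from a surname that has an internal capital of its own. The short base R sketch below wraps the same two regular expressions in a hypothetical function, pattern_only, to show the difference. The standalone "McDowell" input is an illustrative case that does not appear in the question's data.

```r
# Hypothetical wrapper around the regular expressions from the answer above.
pattern_only <- function(lname) {
  ifelse(grepl("[[:upper:]][[:lower:]]+[[:upper:]]", lname),
         gsub("^[[:upper:]][[:lower:]]+", "", lname),
         lname)
}

result <- pattern_only(c("DanMcDowell", "McDowell", "Bush"))
print(result)
# [1] "McDowell" "Dowell"   "Bush"
```

The first element is fixed as intended, but the already-correct "McDowell" loses its "Mc". If such surnames can occur in the data, comparing against fname is the safer approach.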