Conditionally splitting a single cell

Time: 2016-03-11 14:18:13

Tags: r strsplit stringr

I have a data.frame, and I want to find which cells of sample1$domain are equal to "www", replace them with "", and strsplit the corresponding sample1$suffix. The data looks like this:

              domain         suffix
1              wbx2            com
2            redhat            com
3          something           com
4           gstatic            com
5               www googleapis.com
6       smartfilter            com

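I managed to solve this as shown below, but it changes the position of the row (I want it to stay in position 5), and given that this will run on millions of cases, I don't think it is the most efficient approach: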

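library("stringr")
# Blank out the "www" entries so they can be located below
sample1$domain <- ifelse(sample1$domain == "www", "", sample1$domain)
# Swap domain and suffix on those rows: the full name moves to domain, suffix becomes ""
sample1[sample1$domain == "", c("domain", "suffix")] <- sample1[sample1$domain == "", c("suffix", "domain")]
# Collect the full names that still need splitting
y <- sample1$domain[sample1$suffix == ""]
# Split each name at the first dot into two columns
z <- as.data.frame(unlist(str_split_fixed(y, "[.]", 2)))
colnames(z) <- c("domain", "suffix")
# Append the split rows at the end (this is what moves the row), then drop the unsplit originals
sample1 <- rbind(sample1, z)
sample1 <- subset(sample1, sample1$suffix != "")
rownames(sample1) <- NULL
sample1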
#             domain suffix
#1              wbx2    com
#2            redhat    com
#3         something    com
#4           gstatic    com
#5       smartfilter    com
#6        googleapis    com

Data

sample1 <- structure(list(domain = c("wbx2", "redhat", "something", 
"gstatic", "www", "smartfilter"), suffix = c("com", "com", "com", 
"com", "googleapis.com", "com")), .Names = c("domain", "suffix"
), row.names = c(NA, 6L), class = "data.frame")

1 answer:

Answer 0 (score: 1)

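We can create a logical index for the rows where domain is "www". We then use that index to replace the site name first and the site suffix last, so the rows are modified in place and keep their positions: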

ind <- sample1$domain == "www"
sample1$domain[ind] <- sub("^(.*)\\..*", "\\1", sample1$suffix[ind])
sample1$suffix[ind] <- sub(".*\\.(.*)", "\\1", sample1$suffix[ind])
sample1
#        domain suffix
# 1        wbx2    com
# 2      redhat    com
# 3   something    com
# 4     gstatic    com
# 5  googleapis    com
# 6 smartfilter    com
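
One detail worth noting: the regular expressions above split at the last dot, whereas the str_split_fixed(y, "[.]", 2) call in the question splits at the first dot. For "googleapis.com" both give the same result, but for a suffix such as "googleapis.co.uk" they would differ. The following is only a sketch of an in-place variant that splits at the first dot. The function name split_www_rows is hypothetical, and the extra check that the suffix contains a dot is an assumption about how such rows should be treated, not something stated in the question.

```r
# Hypothetical helper (not part of the original answer): for rows whose
# domain is exactly "www", move the first label of `suffix` into `domain`.
split_www_rows <- function(df) {
  # %in% yields FALSE rather than NA for missing domains.
  # Assumption: rows whose suffix has no dot are left untouched.
  ind <- df$domain %in% "www" & grepl(".", df$suffix, fixed = TRUE)
  # Everything before the first dot becomes the new domain.
  df$domain[ind] <- sub("\\..*$", "", df$suffix[ind])
  # Everything after the first dot remains as the suffix.
  df$suffix[ind] <- sub("^[^.]*\\.", "", df$suffix[ind])
  df
}

sample1 <- data.frame(
  domain = c("wbx2", "redhat", "something", "gstatic", "www", "smartfilter"),
  suffix = c("com", "com", "com", "com", "googleapis.com", "com"),
  stringsAsFactors = FALSE
)

result <- split_www_rows(sample1)
result[5, ]  # the former "www" row, still in position 5
```

Because the assignments are vectorized and only touch the indexed rows, no rows are appended or dropped, which should also suit the large data set mentioned in the question better than the rbind/subset approach.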