Get the matching string?

Time: 2017-01-10 06:44:10

Tags: r

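I have two data frames: one holds product names and the other holds categories. I need to match the categories against the product names: if a category string occurs in a product name, that product should be assigned that category.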

The first data frame, containing the product names (Product_Name.csv), is:

           **Product.Name**
       Black Printed Blouse
Silver Embellished Crop Top
   Maroon Solid Strappy Top

The other data frame, containing the categories (Category.csv), is:

**Category**
     Strappy
      Blouse
        Crop 

The final output should be:

       Black Printed Blouse       Blouse
Silver Embellished Crop Top         Crop
   Maroon Solid Strappy Top      Strappy

Currently I am using grepl, which only returns TRUE or FALSE:

product <- read.csv("Product_Name.csv", header = TRUE, sep = ",")
category <- read.csv("Category.csv", header = TRUE, sep = ",")

# One logical column per category: TRUE if the category occurs in the product name
for (i in 1:nrow(product)) {
  product[i, 2] <- grepl(category$Category[1], product$Product.Name[i], ignore.case = TRUE)
  product[i, 3] <- grepl(category$Category[2], product$Product.Name[i], ignore.case = TRUE)
  product[i, 4] <- grepl(category$Category[3], product$Product.Name[i], ignore.case = TRUE)
}

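2 answers:

Answer 0 (score: 1)

We can use str_extract from the stringr package. Collapsing the categories with "|" builds a single regular expression that matches any of them: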

library(stringr)
product$Category <- str_extract(product$Product.Name, paste(category$Category, collapse="|"))
product
#                 Product.Name Category
#1        Black Printed Blouse   Blouse
#2 Silver Embellished Crop Top     Crop
#3    Maroon Solid Strappy Top  Strappy
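
The asker's grepl calls use ignore.case = TRUE, and the Category listing shows a trailing space after Crop, whereas str_extract with a plain pattern is case-sensitive and matches the pattern literally. A minimal sketch of how both could be handled, assuming the same column names as above. The inline data frames are stand-ins for the two CSV files, and the lower-case product name and the "Blue Denim Jacket" row are made up to exercise the two cases:

```r
library(stringr)

# Stand-in data (assumption: mirrors Product_Name.csv and Category.csv)
product <- data.frame(
  Product.Name = c("black printed blouse",
                   "Silver Embellished Crop Top",
                   "Maroon Solid Strappy Top",
                   "Blue Denim Jacket"),
  stringsAsFactors = FALSE
)
category <- data.frame(
  Category = c("Strappy", "Blouse", "Crop "),
  stringsAsFactors = FALSE
)

# Trim stray whitespace so the result is "Crop" rather than "Crop "
cats <- trimws(category$Category)

# Case-insensitive alternation over all categories
pattern <- regex(paste(cats, collapse = "|"), ignore_case = TRUE)
hit <- str_extract(product$Product.Name, pattern)

# Map the extracted text back to the category's own spelling;
# products without any match stay NA
product$Category <- cats[match(tolower(hit), tolower(cats))]
product$Category
# "Blouse" "Crop" "Strappy" NA
```

This sketch does not escape regex metacharacters, so a category containing characters such as "." or "(" would need extra care.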

Answer 1 (score: 0)

Using base R:

indices = sapply(category$Category, function(x) which(grepl(x, product$Product.Name)))

product$new_col = 1:nrow(product)
product$new_col[indices] = names(indices)
#> product
#                 Product.Name new_col
#1        Black Printed Blouse  Blouse
#2 Silver Embellished Crop Top    Crop
#3    Maroon Solid Strappy Top Strappy
# Some categories may have no match, which also needs to be handled.
# The code below handles both situations (a generalised version).

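# Simulate a category with no match
category$Category[2] = "Bloiuse"

indices = sapply(category$Category, function(x) which(grepl(x, product$Product.Name)))
indices.loc <- as.numeric(indices)   # categories without a match become NA
indices.name <- names(indices)

# Start from NA so rows without a match do not keep stale values
product$new_col = NA_character_
product$new_col[indices.loc[!is.na(indices.loc)]] = indices.name[!is.na(indices.loc)]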

#> product
#                 Product.Name new_col
#1        Black Printed Blouse    <NA>
#2 Silver Embellished Crop Top    Crop
#3    Maroon Solid Strappy Top Strappy
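
The sapply/which approach above assumes that each category matches at most one product. If two products contain the same category, which() returns two positions for it and, as far as I can tell, as.numeric(indices) then fails. A sketch of one base-R alternative that works per product instead, using regexpr and regmatches. The inline data frames stand in for the CSV files, the two extra rows are made up, and when a name contains several categories the one that appears first in the name is kept:

```r
# Stand-in data (assumption: mirrors Product_Name.csv and Category.csv,
# plus a repeated category and a product with no category)
product <- data.frame(
  Product.Name = c("Black Printed Blouse",
                   "Silver Embellished Crop Top",
                   "Maroon Solid Strappy Top",
                   "White Lace Blouse",
                   "Blue Denim Jacket"),
  stringsAsFactors = FALSE
)
category <- data.frame(
  Category = c("Strappy", "Blouse", "Crop"),
  stringsAsFactors = FALSE
)

# One pattern that matches any category
pattern <- paste(category$Category, collapse = "|")

# regexpr gives the position of the first match per product, or -1 if none
m <- regexpr(pattern, product$Product.Name, ignore.case = TRUE)
matched <- m != -1

# regmatches returns the matched text for the matching products only
product$new_col <- NA_character_
product$new_col[matched] <- regmatches(product$Product.Name, m)
product$new_col
# "Blouse" "Crop" "Strappy" "Blouse" NA
```

Note that regmatches returns the text as it appears in the product name, so with ignore.case = TRUE the capitalisation follows the product name rather than the category table.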