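What is the R equivalent of SQL's "LIKE '%searching_word%'"?

Asked: 2014-10-11 11:50:24

Tags: sql r

How can I use R to check whether a text field in a dataset contains a particular word?

In SQL, we can use the LIKE comparison operator. For example,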

SELECT * FROM schools WHERE name LIKE '%Public School%'

If I had to do the same thing in R, how would I do it?

4 Answers:

Answer 0 (score: 7)

Given

schools <- data.frame(rank = 1:20, 
                 name = rep(c("X Public School", "Y Private School"), 10))

try this:

subset(schools, grepl("Public School", name))

or this:

schools[ grep("Public School", schools$name), ]

or this:

library(sqldf)
sqldf("SELECT * FROM schools WHERE name LIKE '%Public School%'")

or this:

library(data.table)
data.table(schools)[ grep("Public School", name) ]

or this:

library(dplyr)
schools %>% filter(grepl("Public School", name))
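
One difference from SQL worth keeping in mind: grepl treats its pattern as a regular expression by default, while LIKE '%...%' matches the text literally. A minimal sketch, assuming a small made-up data frame (rows 3 and 4 are illustrative and not from the question), that uses fixed = TRUE for a literal match and ignore.case = TRUE for a case-insensitive one:

```r
# Illustrative data; rows 3 and 4 are assumptions added for this sketch
schools <- data.frame(
  rank = 1:4,
  name = c("X Public School", "Y Private School",
           "Z public school", "A.B Public School"),
  stringsAsFactors = FALSE
)

# Literal substring match, closest to LIKE '%Public School%'
literal <- subset(schools, grepl("Public School", name, fixed = TRUE))

# Case-insensitive regular-expression match
any_case <- subset(schools, grepl("public school", name, ignore.case = TRUE))

# With fixed = TRUE the dot is a plain character, not a regex wildcard
dot_literal <- subset(schools, grepl("A.B", name, fixed = TRUE))
```

Whether LIKE itself is case-sensitive depends on the database, so pick the variant that matches the behavior you expect.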

Answer 1 (score: 0)

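In base R, you can use %in% to subset data, e.g. dataframe[dataframe$variable %in% dataframe2$variable2, ]. Note that %in% compares whole values exactly, so it does not match substrings the way LIKE '%...%' does.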
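
To see how %in% differs from a substring search, here is a small sketch on made-up data: %in% keeps only rows whose value equals one of the candidates in full, whereas grepl also finds the phrase inside longer strings.

```r
schools <- data.frame(
  rank = 1:4,
  name = rep(c("X Public School", "Y Private School"), 2),
  stringsAsFactors = FALSE
)

# Exact match on the whole value: rows 1 and 3
exact <- schools[schools$name %in% c("X Public School"), ]

# A partial string never equals a full name, so no rows come back
partial <- schools[schools$name %in% "Public School", ]

# Substring search for comparison: rows 1 and 3
substring_hits <- schools[grepl("Public School", schools$name, fixed = TRUE), ]
```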

Answer 2 (score: 0)

The qdap package has a convenient wrapper around agrep that lets you search all fields of a data frame or one specific field:

schools <- data.frame(
    rank = 1:20, 
    schools = rep(c("X Public School", "Y Private School"), 10)
)


library(qdap)
Search(schools, "Public School", "schools")

##    rank         schools
## 1     1 X Public School
## 3     3 X Public School
## 5     5 X Public School
## 7     7 X Public School
## 9     9 X Public School
## 11   11 X Public School
## 13   13 X Public School
## 15   15 X Public School
## 17   17 X Public School
## 19   19 X Public School
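
If installing qdap is not an option, base R's agrep family can be called directly. A rough sketch, assuming made-up names with a deliberate typo; agrepl matches approximately, so it tolerates small spelling differences that grepl or LIKE would miss:

```r
# Made-up values for illustration; the second one contains a typo
names_vec <- c("X Public School", "X Publik School", "Gym")

# agrepl() returns a logical vector of approximate matches;
# the default max.distance allows a small number of edits
hits <- agrepl("Public School", names_vec)

# Exact substring matching for comparison
exact <- grepl("Public School", names_vec, fixed = TRUE)
```

Approximate matching can also return rows you did not intend, so it is worth inspecting the result before relying on it.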

Answer 3 (score: 0)

I think the following may answer the question in a simple way.

It combines the %in% and %like% functions:

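# TRUE where an element of namevec1 equals, or is the start of,
# an element of namevec2
`%inlike%` <- function(namevec1, namevec2) {
  # Splitting on spaces drops trailing blanks, so "ff " becomes "ff"
  temp1 <- strsplit(namevec1, " ")
  temp2 <- strsplit(namevec2, " ")
  # charmatch() returns NA when there is no exact or partial match
  !is.na(charmatch(temp1, temp2))
}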

namevec1<-c("ffd","ff","hello_world")
namevec2<-c("ffde","ff ","hello_wor")
  

namevec1%inlike%namevec2

[1] TRUE TRUE FALSE

  

namevec2%inlike%namevec1

[1] FALSE TRUE TRUE

(Note the difference in whitespace.)
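
For a plain "contains" test, a simpler operator can be built on grepl. The sketch below defines a hypothetical %contains% operator (the name is made up for this example and is not part of base R) that checks each element of a vector for a literal substring:

```r
# Hypothetical infix operator: TRUE where x contains `pattern` literally
`%contains%` <- function(x, pattern) {
  grepl(pattern, x, fixed = TRUE)
}

namevec <- c("ffd", "ff ", "hello_world")
result <- namevec %contains% "ff"
```

Unlike %inlike%, this compares every element against a single pattern, which is closer to what LIKE '%...%' does for one search term.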