Regex to remove everything except letters and to collapse multiple spaces

Asked: 2015-04-18 16:56:12

Tags: regex r

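I'm trying to write a regex that removes everything except:

  1. letters
  2. apostrophes (')
  3. single spaces

I tried [^\\p{L} ']+ to match the unwanted characters, and a lookbehind, (?<=\\s)\\s+, to catch the extra spaces. Each works in isolation: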

    gsub("(?<=\\s)\\s+", "", "I like 56 dogs that's him55.", perl = TRUE)
    ## [1] "I like 56 dogs that's him55."
    
    gsub("[^\\p{L} ']+", "", "I like 56 dogs that's him55.", perl = TRUE)
    ## [1] "I like  dogs that's him"
    

But when I join them with an alternation (|):

    gsub("((?<=\\s)\\s+)|([^\\p{L} ']+)", "", "I like 56 dogs that's him55.", perl = TRUE)
    

it returns:

    [1] "I like  dogs that's him"
    

I'd like the extra space removed as well (the one between "like" and "dogs"), giving:

    [1] "I like dogs that's him"
    

How can I use a single regex to remove everything except letters, apostrophes, and single spaces?

2 answers:

Answer 0: (score: 2)

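The problem seems to come from the space inside your character class. If you drop it from the class and use a single space as the replacement, every run of non-letter characters (digits, punctuation, and spaces) is converted to one space. This works fine for me: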

    gsub("[^\\p{L}']+", " ", "I like 56 dogs that's him55.", perl = TRUE)
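
For reference, running that call shows one side effect: the trailing "55." is also replaced by a space, so the result ends with a space. A minimal sketch, assuming base R's trimws() is acceptable for stripping it; the variable names are illustrative only:

```r
x <- "I like 56 dogs that's him55."

# Replace every run of characters that are neither letters nor apostrophes
# (digits, punctuation, and spaces alike) with a single space.
spaced <- gsub("[^\\p{L}']+", " ", x, perl = TRUE)

# The final "55." also becomes a space, so trim both ends.
cleaned <- trimws(spaced)
print(cleaned)
# [1] "I like dogs that's him"
```

Note that this variant turns digits inside a word into a space (for example, "him55s" would become "him s"), whereas deleting them would give "hims".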

Answer 1: (score: 2)

If you want to do this in a single call, you can try the following:

    x <- "I like 56 dogs that's him55."
    gsub("[^\\pL' ]+\\h+(?=\\h)|\\h+(?=[^\\pL' ]+)|[^\\pL' ]+", "", x, perl = TRUE)
    # [1] "I like dogs that's him"
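
To see what each alternative contributes, it can help to run the second and third branches on their own. gsub matches against the original string in a single pass, so the two spaces around "56" are never adjacent at match time, which is why a lookbehind for consecutive whitespace finds nothing to remove. The sketch below is illustrative only; the intermediate variable names are not part of the answer:

```r
x <- "I like 56 dogs that's him55."

# Branch 2 alone: horizontal whitespace directly followed by an unwanted
# character. Removes the space before "56" but leaves the digits in place.
only_ws <- gsub("\\h+(?=[^\\pL' ]+)", "", x, perl = TRUE)
print(only_ws)
# [1] "I like56 dogs that's him55."

# Branch 3 alone: the unwanted characters themselves.
only_chars <- gsub("[^\\pL' ]+", "", x, perl = TRUE)
print(only_chars)
# [1] "I like  dogs that's him"

# All three branches together.
combined <- gsub("[^\\pL' ]+\\h+(?=\\h)|\\h+(?=[^\\pL' ]+)|[^\\pL' ]+",
                 "", x, perl = TRUE)
print(combined)
# [1] "I like dogs that's him"
```

One caveat, as far as I can tell: the pattern only removes whitespace that sits next to an unwanted character, so a run of spaces already present in the input, with no digits or punctuation beside it, is left untouched.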

Here is another approach you could take, which IMO is more efficient:

    x <- "I like 56 dogs that's him55."
    r <- gsub("[^\\pL' ]+", "", x, perl = TRUE)
    paste(strsplit(r, "\\s+")[[1]], collapse = " ")
    # [1] "I like dogs that's him"
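
The two-step idea can also be wrapped in a small helper. This is a sketch and not part of the original answer: clean_text is a hypothetical name, and it swaps the strsplit/paste step for a second gsub plus trimws() so that it works on whole character vectors and also drops leading and trailing spaces.

```r
# Hypothetical helper (name and structure are assumptions for illustration).
clean_text <- function(x) {
  # Step 1: delete everything that is not a letter, an apostrophe, or a space.
  letters_only <- gsub("[^\\pL' ]+", "", x, perl = TRUE)
  # Step 2: collapse each run of whitespace into a single space.
  collapsed <- gsub("\\s+", " ", letters_only, perl = TRUE)
  # Step 3: strip any whitespace left at either end.
  trimws(collapsed)
}

clean_text("I like 56 dogs that's him55.")
# [1] "I like dogs that's him"

clean_text("56 dogs  bark")
# [1] "dogs bark"
```

Splitting the work into two passes keeps each pattern simple: the second pass sees the string after the digits are gone, so the spaces that were on either side of "56" are adjacent by then and collapse normally.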