Question

I have a character variable that contains a lot of words. For example...

    words   
1   funnel  
2   funnels
3   sprout
4   sprouts
5   sprouts.
6   chicken
7   chicken)
8   chicken(2)

Many of the words are identical except for a trailing s or a symbol (such as ) or .) that is a typo.

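I want to find words that are plural/singular forms of each other, so that I can remove the trailing s and keep only the singular form.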

I also want to remove any symbols at the end of a word that are typos. For example,
* remove chicken), because its parenthesis is unbalanced
* but keep chicken(2)

My current attempt is

# Find words that end in `s`
grep("s$", df$words, ignore.case = TRUE, value = T)
# Remove the `s` from the end of words
df$words <- gsub("s$", "", df$words, ignore.case = T)
# Remove any typos with symbols at the end of a word
gsub("[^A-z|0-9]|$", "", df$words)

My final code also affects words such as chicken(2), which I do not want to edit.

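This shows many plural words (words ending in s), but I cannot tell whether a singular version (the same word without the s) exists.
How can I find words that end in a grammatical symbol or punctuation mark that indicates a typo (i.e. (, ., !) and remove it? That is, remove unbalanced parentheses such as chicken), but do not remove chicken(2).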

For example, the result should look like this...

    words   
1   funnel  
2   funnel
3   sprout
4   sprout
5   sprout
6   chicken
7   chicken
8   chicken(2)

Answer 1

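The str_replace_all function from the stringr package applies a series of patterns and replacements to a character vector, one after another. You could try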

library(stringr)
str_replace_all(words, c("[^[:alnum:]]$" = "",  "s$" = "", "(\\(\\d*)" = "\\1\\)" ))
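
Note that the "s$" pattern above strips every trailing s, whether or not the singular form appears in the data. If you only want to drop the s when the singular is actually present, a base R sketch along the following lines might help. The names clean_words and has_balanced_parens are hypothetical helpers made up for this illustration, and the code assumes a typo is at most one stray closing parenthesis plus any other trailing punctuation.

```r
# Sample data taken from the question
df <- data.frame(
  words = c("funnel", "funnels", "sprout", "sprouts", "sprouts.",
            "chicken", "chicken)", "chicken(2)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where "(" and ")" occur equally often
has_balanced_parens <- function(x) {
  n_open  <- nchar(gsub("[^(]", "", x))
  n_close <- nchar(gsub("[^)]", "", x))
  n_open == n_close
}

# Hypothetical helper: strip trailing typos, then singularise conservatively
clean_words <- function(x) {
  # 1. Drop trailing characters that are neither alphanumeric nor ")"
  x <- sub("[^[:alnum:])]+$", "", x)

  # 2. Drop one trailing ")" only where the parentheses are unbalanced
  #    (assumption: a typo adds at most one stray closing parenthesis)
  stray <- grepl("\\)$", x) & !has_balanced_parens(x)
  x[stray] <- sub("\\)$", "", x[stray])

  # 3. Drop a trailing "s" only if the singular form also occurs in x
  singular  <- sub("s$", "", x, ignore.case = TRUE)
  is_plural <- grepl("s$", x, ignore.case = TRUE) &
    tolower(singular) %in% tolower(x)
  x[is_plural] <- singular[is_plural]

  x
}

cleaned <- clean_words(df$words)
print(cleaned)
```

With this approach a word such as "grass" stays untouched unless "gras" is also in the vector, at the cost of leaving a plural alone when its singular never appears.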
