How to remove a citation section that is missing its parentheses

Date: 2019-02-19 12:51:27

Tags: r regex

Data

mystring1 <- "Other work has shown that, in addition to language-general features such as a decreased speaking rate and an expanded pitch range, clear speech production involves the enhancement of the acoustic-phonetic distance between phonologically contrastive categories e.g., Ferguson and Kewley-Port, 2002; Krause and Braida, 2004, Picheny et al, 1986; Smiljanic and Bradlow, 2005, 2007."

mystring2 <- "Other work has shown that, in addition to language-general features such as a decreased speaking rate and an expanded pitch range, clear speech production involves the enhancement of the acoustic-phonetic distance between phonologically contrastive categories e.g., Ferguson and Kewley-Port, 2002; Krause and Braida, 2004, Picheny et al, 1986; Smiljanic and Bradlow, 2005, 2007. Therefore, reduced sensitivity to any or all of the language-specific acoustic-phonetic dimensions of contrast and clear speech enhancement would yield a diminished clear speech benefit for non-native listeners. This may appear somewhat surprising given that clear speech production was elicited in our studies by instructing the talkers to speak clearly for the sake of listeners with either a hearing impairment or from a different native language background. However, as discussed further in Bradlow and Bent 2002, the limits of clear speech as a means of enhancing non-native speech perception likely reflect the “mistuning” that characterizes spoken language communication between native and non-native speakers."

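I would like some help with a regular expression. I have some text data, and I want to remove the citation part that appears between the last word of a sentence and its period. However, the parentheses around the citations have somehow been lost. mystring1 is an example: here I want to remove e.g., Ferguson and Kewley-Port, 2002; Krause and Braida, 2004, Picheny et al, 1986; Smiljanic and Bradlow, 2005, 2007. But this sentence is only one of the sentences in a paragraph: mystring2 contains three more sentences after the text of mystring1. My goal is to remove the citation part from mystring2, but I have not succeeded, because the pattern removes more text than I want. How can I modify the regex pattern? Thanks in advance for your help.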

# This works for mystring1.
gsub(x = mystring1, pattern = "e\\.g\\.,.*[0-9]{4}(?=.)", replacement = "", perl = T)

[1] "Other work has shown that, in addition to language-general features such as a 
     decreased speaking rate and an expanded pitch range, clear speech production involves
     the enhancement of the acoustic-phonetic distance between phonologically contrastive
     categories ."

# But this pattern does not work for mystring2; gsub() removes more text than I want.
gsub(x = mystring2, pattern = "e\\.g\\.,.*[0-9]{4}(?=.)", replacement = "", perl = T)

[1] "Other work has shown that, in addition to language-general features such as a decreased
     speaking rate and an expanded pitch range, clear speech production involves the
     enhancement of the acoustic-phonetic distance between phonologically contrastive
     categories , the limits of clear speech ... (I trimmed the text here) speakers."

1 answer:

Answer 0 (score: 2)

I suggest using

\be\.g\.,.*?[0-9]{4}[^\w.]*(?=\.)

See the regex demo.

Details

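  • \be\.g\. - the whole word e.g. (\b is a word boundary)
  • , - a comma
  • .*? - any 0+ characters other than line break characters, as few as possible (add (?s) at the start of the pattern to make it match line breaks, too)
  • [0-9]{4} - four digits
  • [^\w.]* - 0+ characters other than word characters and dots
  • (?=\.) - a positive lookahead (it matches a position, not text) that requires a . immediately to the right of the current location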
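A minimal sketch of why the lazy .*? and the escaped lookahead matter: it runs the question's greedy pattern and the suggested pattern on a short made-up string and uses base R's regexpr() and regmatches() to show exactly what each pattern would remove. The names txt, greedy and lazy and the sample sentence are hypothetical, not taken from the question.

```r
# Hypothetical text: the first sentence ends in a bare citation,
# the second mentions a year in the middle of the sentence.
txt <- "Speech gets clearer e.g., Smith, 2001; Lee, 2005. As Bent 2002, argued, it helps."

greedy <- "e\\.g\\.,.*[0-9]{4}(?=.)"               # the question's pattern
lazy   <- "\\be\\.g\\.,.*?[0-9]{4}[^\\w.]*(?=\\.)" # the suggested pattern

# regexpr() finds the first match; regmatches() extracts the matched text,
# i.e. exactly what sub()/gsub() would delete.
greedy_hit <- regmatches(txt, regexpr(greedy, txt, perl = TRUE))
lazy_hit   <- regmatches(txt, regexpr(lazy, txt, perl = TRUE))

print(greedy_hit)
# [1] "e.g., Smith, 2001; Lee, 2005. As Bent 2002"
print(lazy_hit)
# [1] "e.g., Smith, 2001; Lee, 2005"

# The lookahead (?=\.) checks for the period without consuming it,
# so the sentence-final "." survives the substitution.
print(sub(lazy, "", txt, perl = TRUE))
# [1] "Speech gets clearer . As Bent 2002, argued, it helps."
```

The greedy .* runs to the last four-digit number that is followed by any character (the unescaped . in (?=.) matches anything, not just a period), which is why the question's pattern swallowed everything up to "Bradlow and Bent 2002" in mystring2. The lazy version stops at the first year that is followed, after optional punctuation and spaces, by a literal period.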

R demo

rx <- "\\be\\.g\\.,.*?[0-9]{4}[^\\w.]*(?=\\.)"
gsub(x = mystring1, pattern = rx, replacement = "", perl = TRUE)
## => [1] "Other work has shown that, in addition to language-general features such as a decreased speaking rate and an expanded pitch range, clear speech production involves the enhancement of the acoustic-phonetic distance between phonologically contrastive categories ."
gsub(x = mystring2, pattern = rx, replacement = "", perl = TRUE)
## => [1] "Other work has shown that, in addition to language-general features such as a decreased speaking rate and an expanded pitch range, clear speech production involves the enhancement of the acoustic-phonetic distance between phonologically contrastive categories . Therefore, reduced sensitivity to any or all of the language-specific acoustic-phonetic dimensions of contrast and clear speech enhancement would yield a diminished clear speech benefit for non-native listeners. This may appear somewhat surprising given that clear speech production was elicited in our studies by instructing the talkers to speak clearly for the sake of listeners with either a hearing impairment or from a different native language background. However, as discussed further in Bradlow and Bent 2002, the limits of clear speech as a means of enhancing non-native speech perception likely reflect the “mistuning” that characterizes spoken language communication between native and non-native speakers."
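Two small variations may be useful, sketched here under stated assumptions rather than as part of the answer above: a leading \s* also deletes the space before e.g., so the sentence ends with "categories." instead of "categories .", and the (?s) flag mentioned in the details lets .*? cross line breaks if a citation is wrapped over several lines. The pattern names rx2 and rx_no_s and the sample text are hypothetical.

```r
# Hypothetical variant of rx: "\\s*" also consumes the whitespace before "e.g.",
# and "(?s)" lets ".*?" match across line breaks.
rx2 <- "(?s)\\s*\\be\\.g\\.,.*?[0-9]{4}[^\\w.]*(?=\\.)"

# Hypothetical sample text with the citation wrapped over two lines
txt <- "Clear speech helps e.g., Ferguson and Kewley-Port, 2002;\nKrause and Braida, 2004. Later work agreed."

cleaned <- gsub(rx2, "", txt, perl = TRUE)
print(cleaned)
# [1] "Clear speech helps. Later work agreed."

# Without (?s), ".*?" cannot pass the "\n", no match is found,
# and the text comes back unchanged.
rx_no_s <- "\\s*\\be\\.g\\.,.*?[0-9]{4}[^\\w.]*(?=\\.)"
print(identical(gsub(rx_no_s, "", txt, perl = TRUE), txt))
# [1] TRUE
```

Applied to mystring1, gsub(rx2, "", mystring1, perl = TRUE) should end with "...phonologically contrastive categories." with no stray space before the period.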