How can I use R to extract the sentences between a period and a parenthesis?

Asked: 2019-03-01 11:02:12

Tags: r regex

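I have:

Stringa <- "This is different from primary data created specifically by researchers to reflect concepts that are higher-order and more abstract(Lee,1991;Walsham,1995).Given the major differences between big data and research-collected data, it is surprising how little discussion has arisen about how using big data should change the practice of theory-informed IS research. Some scholars have noted that the very nature of inquiry is likely to change, given that large data sets, advanced algorithms, and powerful computing capabilities can initiate and refine questions without human intervention (Agarwal & Dhar, 2014). Other commentators argue that the scientific method is likely to become obsolete, as with the “availability of huge amounts of data, along with the statistical tools to crunch these numbers … science can advance even without coherent models, unified theories, or really any mechanistic explanation at all” (Anderson, 2008). Perhaps “scientists no longer have to make educated guesses, construct hypotheses and models, test them in data-based experiments andexamples. Instead, they canmine thecomplete setof data forpatterns that reveal effects, producing scientificconclusions without further experimentation” (Prensky, 2009). "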

Desired output:

[1]This is different from primary data created specifically by researchers to reflect concepts that are higher-order and more abstract(Lee,1991;Walsham,1995).
[2]Some scholars have noted that the very nature of inquiry is likely to change, given that large data sets, advanced algorithms, and powerful computing capabilities can initiate and refine questions without human intervention (Agarwal & Dhar, 2014)
[3] Other commentators argue that the scientific method is likely to become obsolete, as with the “availability of huge amounts of data, along with the statistical tools to crunch these numbers … science can advance even without coherent models, unified theories, or really any mechanistic explanation at all” (Anderson, 2008)
[4]Instead, they canmine thecomplete setof data forpatterns that reveal effects, producing scientific conclusions without further experimentation” (Prensky, 2009)

I tried unlist(str_extract_all(string = Stringa, pattern = "\\. [A-Za-z][^()]+ \\(")), but it does not work.

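I do not want to extract 'Given the major differences between big data and research-collected data, it is surprising how little discussion has arisen about how using big data should change the practice of theory-informed IS research.' or 'Perhaps “scientists no longer have to make educated guesses, construct hypotheses and models, test them in data-based experiments andexamples.'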

2 answers:

Answer 0: (score: 1)

If the text contains no abbreviations, you can use

regmatches(Stringa, gregexpr("[^.?!\\s][^.!?]*?\\([^()]*\\)", Stringa, perl=TRUE))
[[1]]
[1] "This is different from primary data created specifically by researchers to reflect concepts that are higher-order and more abstract(Lee,1991;Walsham,1995)"                                                                                                                                                                         
[2] "Some scholars have noted that the very nature of inquiry is likely to change, given that large data sets, advanced algorithms, and powerful computing capabilities can initiate and refine questions without human intervention (Agarwal & Dhar, 2014)"                                                                           
[3] "Other commentators argue that the scientific method is likely to become obsolete, as with the “availability of huge amounts of data, along with the statistical tools to crunch these numbers … science can advance even without coherent models, unified theories, or really any mechanistic explanation at all” (Anderson, 2008)"
[4] "Instead, they canmine thecomplete setof data forpatterns that reveal effects, producing scientificconclusions without further experimentation” (Prensky, 2009)"                                                                                                                                                                    

See the regex demo and the R demo.

Details

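  • [^.?!\\s] - any character other than ., ?, ! and whitespace
  • [^.!?]*? - zero or more characters other than ., ! and ?, as few as possible
  • \\([^()]*\\) - a (, then zero or more characters other than ( and ), then a )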
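The breakdown above can be checked on a short string. This is a minimal sketch: `txt` and `abbr` are made-up samples, not the text from the question. It also illustrates the caveat about abbreviations: a period inside the sentence resets the start of the match.

```r
# Hypothetical short samples, not the text from the question.
txt  <- "First claim (Lee, 1991). A sentence without a citation. Second claim (Dhar, 2014)."
abbr <- "See e.g. Smith (Lee, 1991)."

# Same pattern as in the answer: start at a character that is neither
# whitespace nor a sentence terminator, lazily consume non-terminator
# characters, then require a parenthesized group.
pat <- "[^.?!\\s][^.!?]*?\\([^()]*\\)"

res      <- regmatches(txt,  gregexpr(pat, txt,  perl = TRUE))[[1]]
res_abbr <- regmatches(abbr, gregexpr(pat, abbr, perl = TRUE))[[1]]

print(res)       # the two cited sentences; the uncited one is skipped
print(res_abbr)  # the match starts after "e.g.", so "See e.g." is lost
```

The uncited sentence is dropped because the lazy part `[^.!?]*?` cannot cross its closing period to reach a `(`. For the same reason, any period before the citation (an abbreviation or a decimal number) truncates the match.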

Answer 1: (score: 0)

We can handle this with gregexpr and regmatches, using the following regex pattern:

.*?\([^)]+\).*?(?=\w|$)

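This captures everything up to the first opening parenthesis, then the (...) term itself, then any trailing characters up to the next word character or the end of the string. The script below captures all such matches in the source text.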

x <- Stringa  # x holds the source text from the question
m <- gregexpr(".*?\\([^)]+\\).*?(?=\\w|$)", x, perl=TRUE)
regmatches(x, m)

[[1]]
[1] "This is different from primary data created specifically by researchers to reflect concepts that are higher-order and more abstract(Lee,1991;Walsham,1995)."                                                                                                                                                                                                                                                                                                              
[2] "Given the major differences between big data and research-collected data, it is surprising how little discussion has arisen about how using big data should change the practice of theory-informed IS research. Some scholars have noted that the very nature of inquiry is likely to change, given that large data sets, advanced algorithms, and powerful computing capabilities can initiate and refine questions without human intervention (Agarwal & Dhar, 2014). "
[3] "Other commentators argue that the scientific method is likely to become obsolete, as with the “availability of huge amounts of data, along with the statistical tools to crunch these numbers … science can advance even without coherent models, unified theories, or really any mechanistic explanation at all” (Anderson, 2008). "
[4] "Perhaps “scientists no longer have to make educated guesses, construct hypotheses and models, test them in data-based experiments andexamples. Instead, they canmine thecomplete setof data forpatterns that reveal effects, producing scientificconclusions without further experimentation”(Prensky, 2009). "
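Items [2] and [4] above still contain the uncited sentences that the question wanted to exclude, because `.*?` is free to cross sentence boundaries. A minimal sketch on a made-up sample (`txt` is hypothetical, not the question's text) shows the same behavior:

```r
# Hypothetical short sample, not the text from the question.
txt <- "First claim (Lee, 1991). A sentence without a citation. Second claim (Dhar, 2014)."

# Pattern from this answer: lazily take anything up to a (...) group,
# then anything up to the next word character or the end of the string.
m   <- gregexpr(".*?\\([^)]+\\).*?(?=\\w|$)", txt, perl = TRUE)
res <- regmatches(txt, m)[[1]]

print(res)
# The second element begins with the uncited sentence.
print(startsWith(res[2], "A sentence without a citation."))
```

If only the cited sentences are wanted, the leading part of the pattern has to be prevented from matching sentence terminators, as the other answer does with `[^.!?]*?`.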