How do I create a list containing three parts extracted with a regular expression?

Time: 2019-04-05 01:25:40

Tags: r regex

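I have a text that I want to split into three parts using regular expressions in R: harper_presenter, harper_time, and harper_text.

The text is: HARPER'S [Day 1, 9:00 A.M.]: When the computer was young, the word hacking was used to describe the work of brilliant students who explored and expanded the uses to which this new technology might be employed. There was even talk of a "hacker ethic." Somehow, in the succeeding years, the word has taken on dark connotations, suggestion the actions of a criminal. What is the hacker ethic, and does it survive?

HARPER'S would be harper_presenter, [Day 1, 9:00 A.M.] would be harper_time, and the rest would be harper_text.

Ideally, the solution would not rely on matching the exact words.

The final result should be a list.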

1 answer:

Answer 0 (score: 1)

If you want to do this with regular expressions, you can use stringr::str_extract_all:

text <- "HARPER'S [Day 1, 9:00 A.M.]: When the computer was young, the word hacking was used to describe the work of brilliant students who explored and expanded the uses to which this new technology might be employed. There was even talk of a \"hacker ethic.\" Somehow, in the succeeding years, the word has taken on dark connotations, suggestion the actions of a criminal. What is the hacker ethic, and does it survive?"

stringr::str_extract_all(text, "^([A-Z]+'*[A-Z]*)|(\\[.*\\])|(:.*)")
[[1]]
[1] "HARPER'S"
[2] "[Day 1, 9:00 A.M.]"
[3] ": When the computer was young, the word hacking was used to describe the work of brilliant students who explored and expanded the uses to which this new technology might be employed. There was even talk of a \"hacker ethic.\" Somehow, in the succeeding years, the word has taken on dark connotations, suggestion the actions of a criminal. What is the hacker ethic, and does it survive?"
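
The question asks for a list with three named elements, while the call above returns one unnamed character vector whose third element still carries the leading colon. A minimal sketch of one possible post-processing step follows. It assumes the pattern yields exactly three matches, and it uses a shortened sample string (an assumption for brevity) in place of the full text:

```r
library(stringr)

# Shortened sample (assumption) in the same "PRESENTER [time]: text" shape.
text <- "HARPER'S [Day 1, 9:00 A.M.]: When the computer was young, the word hacking was used."

# Same pattern as in the answer. str_extract_all() returns a list with one
# character vector per input string, so [[1]] selects the matches for `text`.
parts <- str_extract_all(text, "^([A-Z]+'*[A-Z]*)|(\\[.*\\])|(:.*)")[[1]]

# The third alternative keeps the leading ":"; strip it and any spaces after it.
parts[3] <- sub("^:\\s*", "", parts[3])

# Name the pieces using the element names from the question.
harper <- setNames(as.list(parts),
                   c("harper_presenter", "harper_time", "harper_text"))
str(harper)
```

If a string produces fewer or more than three matches, setNames() will mislabel the pieces, so checking length(parts) first is a sensible guard.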

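The pattern ^([A-Z]+'*[A-Z]*)|(\\[.*\\])|(:.*) can be broken into three parts, separated by the "or" operator |.

The first, ([A-Z]+'*[A-Z]*), looks for one or more uppercase letters, followed by zero or more ', followed by zero or more uppercase letters. The ^ specifies that this must occur at the start of the string. Because | has the lowest precedence, the ^ applies only to this first alternative.

The second, (\\[.*\\]), looks for zero or more of any character (.) enclosed in square brackets.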

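The third, (:.*), looks for a : followed by zero or more of any character (.), so the match keeps the leading colon, as in element [3] of the output.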
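
As an alternative to three alternations, the same split can be sketched with a single anchored pattern and three capture groups, here in base R with regexec() and regmatches(). This is not the answer's method, only an illustration. It assumes every input has the shape "presenter [time]: text" with no whitespace inside the presenter, and it again uses a shortened sample string:

```r
# Shortened sample (assumption) in the same shape as the original text.
text <- "HARPER'S [Day 1, 9:00 A.M.]: When the computer was young, the word hacking was used."

# One anchored pattern with three capture groups:
#   (\\S+)         presenter: everything up to the first whitespace
#   (\\[[^]]*\\])  time: a bracketed block that contains no "]"
#   (.*)           text: the rest, after the colon and any spaces
pattern <- "^(\\S+)\\s+(\\[[^]]*\\]):\\s*(.*)$"

# regexec() records the full match plus each group; regmatches() extracts them.
m <- regmatches(text, regexec(pattern, text))[[1]]

# m[1] is the full match, so the three groups are m[2], m[3] and m[4].
harper <- list(harper_presenter = m[2],
               harper_time      = m[3],
               harper_text      = m[4])
str(harper)
```

Because the pattern describes the layout and not the word HARPER, it does not depend on the exact words. When the input does not match, m is empty and the three elements come out as NA.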