At specific words

Time: 2016-12-11 17:44:59

Tags: regex python-3.5

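I have several articles about terrorist attacks that include information on the number of people killed and wounded. I am trying to extract the numbers that refer to the wounded.

Here is a sample of the sentences to target: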

at least 22 others were wounded
additional 20 soldiers were wounded
more than 40 people had been wounded
wounding at least six people
injuring at least 60 others
wounding more than 25
27 others were wounded 
wounding 14
wounding 33
185 people were wounded
28 people wounded

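As you can see, the words wounded, wounding, or injuring appear either before or after the number I want to extract, usually within 3 or 4 words of that number.

At this link you can find sample articles and the regular expression I tried to apply, without success: [regex](https://regex101.com/r/0DRayP/10)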

1 Answer:

Answer 0: (score: 1)

You need to use capturing groups to pull out the parts of the match you want, for example:

(\d+)?.*?(wound(?:ed|ing)|injured).*?(\d+)

The groups you are interested in are $1, $2, and $3.

Here is an example:

Online Demo
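
The capture-group idea can also be sketched in Python with the standard `re` module. This is only an illustration under some assumptions, not the answerer's code: the names `KEYWORD`, `GAP`, `PATTERN`, and `extract_wounded` are hypothetical, the keyword list is taken from the sample sentences in the question, and the window of at most four intervening words comes from the question's "3 or 4 words" remark. Unlike the single pattern above, which expects a number after the keyword and lists `injured` rather than `injuring`, this sketch uses two alternatives, one for each order.

```python
import re

# Assumption: keyword forms taken from the sample sentences in the question.
KEYWORD = r"(?:wound(?:ed|ing)|injur(?:ed|ing))"

# Assumption: at most four purely alphabetic words between number and keyword.
# Restricting the gap to letters keeps the match from skipping over another number.
GAP = r"(?:\s+[a-z]+){0,4}?\s+"

NUMBER_FIRST = r"\b(\d+)" + GAP + KEYWORD + r"\b"    # "22 others were wounded"
KEYWORD_FIRST = r"\b" + KEYWORD + GAP + r"(\d+)\b"   # "injuring at least 60"
PATTERN = re.compile(NUMBER_FIRST + "|" + KEYWORD_FIRST, re.IGNORECASE)


def extract_wounded(sentence):
    """Return the number next to a wounded/injured keyword, or None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    # Only one of the two groups takes part in any given match.
    return int(match.group(1) or match.group(2))


samples = [
    "at least 22 others were wounded",
    "more than 40 people had been wounded",
    "injuring at least 60 others",
    "wounding 14",
    "28 people wounded",
    "wounding at least six people",  # spelled-out number: not handled here
]
for sentence in samples:
    print(sentence, "->", extract_wounded(sentence))
```

Numbers written as words (such as "six") and numbers with thousands separators are outside what this sketch handles; they would need a separate word-to-number step or a wider number pattern.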
