Extracting numbers associated with a specific phrase

Time: 2018-01-18 05:23:52

Tags: python regex nlp

I want to extract each group name together with its associated number using Python.

Sample input:

34 patients have admitted in hospital and distributed in Pune group with 20 patients, Mumbai group with 10 patients and Nagpur group with 4 patients.

Sample output:

'Pune group, 20'
'Mumbai group, 10'
'Nagpur group, 4'

1 answer:

Answer 0 (score: 2)

You can try:

\b(\S+)\s+group\s+with\s+(\d+)\s+patients

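In the regex above, capture group 1 holds the group name (for example, Pune) and capture group 2 holds the patient count. The leading \b makes the name start at a word boundary, so the pattern also matches when the name is the first word of the string.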
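To make the pieces of the pattern easier to follow, here is a minimal sketch that writes the same regex with re.VERBOSE and a comment on each part. The variable names and the shortened test string are chosen here for illustration only and are not part of the original answer.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# In verbose mode, literal whitespace in the pattern is ignored and "#" starts a comment.
pattern = re.compile(r"""
    \b(\S+)      # group 1: the word directly before "group", e.g. Pune
    \s+group     # the literal word "group"
    \s+with\s+   # the literal word "with"
    (\d+)        # group 2: the patient count
    \s+patients  # the literal word "patients"
""", re.VERBOSE)

# Shortened sample text (an assumption for this sketch).
text = "Pune group with 20 patients, Mumbai group with 10 patients"

# findall returns one (group 1, group 2) tuple per match.
print(pattern.findall(text))  # [('Pune', '20'), ('Mumbai', '10')]
```

Note that the count is returned as a string; wrap it in int() if you need to do arithmetic with it.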


Sample source:

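import re

# Group 1 is the group name, group 2 is the patient count.
# \b anchors the name at a word boundary, matching the pattern shown above.
regex = r"\b(\S+)\s+group\s+with\s+(\d+)\s+patients"

test_str = "34 patients have admitted in hospital and distributed in Pune group with 20 patients, Mumbai group with 10 patients and Nagpur group with 4 patients."
# re.IGNORECASE also accepts variants such as "Group" or "Patients".
matches = re.finditer(regex, test_str, re.IGNORECASE)

for match in matches:
    print(match.group(1) + " group, " + match.group(2))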
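If you need the results as a list of strings in exactly the format from the question, rather than printed one by one, the loop above could be wrapped in a small function. This is only a sketch: the helper name extract_group_counts and the named groups "name" and "count" are hypothetical choices made here, not something from the original answer.

```python
import re

# Named groups ("name", "count") are illustrative; the pattern itself is the one from the answer.
GROUP_COUNT = re.compile(r"\b(?P<name>\S+)\s+group\s+with\s+(?P<count>\d+)\s+patients")

def extract_group_counts(text):
    """Return strings such as 'Pune group, 20' for every match found in text."""
    return [
        f"{m.group('name')} group, {m.group('count')}"
        for m in GROUP_COUNT.finditer(text)
    ]

sample = (
    "34 patients have admitted in hospital and distributed in Pune group "
    "with 20 patients, Mumbai group with 10 patients and Nagpur group with 4 patients."
)

for line in extract_group_counts(sample):
    print(line)
# Pune group, 20
# Mumbai group, 10
# Nagpur group, 4
```

Compiling the pattern once at module level avoids rebuilding it on every call, which helps if the function runs over many sentences.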