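Extracting numbers before and after certain strings from text using Python 3

Asked: 2019-03-25 13:41:44

Tags: python regex python-3.x

How can I extract the values that appear before and after certain specific strings? And how can I extract only the 12-digit number that represents the roll number?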

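import re

# A multi-line string literal needs triple quotes.
input_file = """my bday is on 04/01/1997 and
            frnd bday on 28/12/2018,
            account no is A000142116 and
            valid for 30 days for me and
            for my frnd only 4 DAYS.my roll no is 130302101786
            and register number is 1600523941. Admission number is
            181212001103"""

# Iterate over lines; iterating over the string itself yields single characters.
for line in input_file.splitlines():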
    m1 = re.findall(r"[\d]{1,2}/[\d]{1,2}/[\d]{4}", line)
    m2 = re.findall(r"A(\d+)", line)
    m3 = re.findall(r"(\d+)days", line)
    m4 = re.findall(r"(\d+)DAYS", line)
    m5 = re.findall(r"(\d+)", line)
    m6 = re.findall(r"(\d+)", line)
    m7 = re.findall(r"(\d+)", line)
    for date_n in m1:
       print(date_n)
    for account_no in m2:
       print(account_no)
    for valid_days in m3:
       print(valid_days)
    for frnd_DAYS in m4:
       print(frnd_DAYS)
    for roll_no in m5:
       print(roll_no)
    for register_no in m6:
       print(register_no)
    for admission_no in m7:
       print(admission_no)

Expected output:

04/01/1997
28/12/2018
A000142116
30 days
4 DAYS
130302101786
1600523941
181212001103

2 Answers:

Answer 0: (score: 1)

Use a single expression for all of them:

\b[A-Z]?\d[/\d]*\b(?:\s+days)?

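See a demo on regex101.com.
You will need to pin down the "account number" format here.

Answer 1: (score: 0)

I would use a single regex pattern with alternation that covers all the possible matches: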
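
As a rough sketch of how this expression could be applied from Python: the answer only gives the pattern, so the `re.IGNORECASE` flag below is an assumption, added so that `4 DAYS` is picked up along with `30 days`. The variable names `text`, `pattern` and `matches` are illustrative.

```python
import re

# Sample text from the question, written as a valid triple-quoted string.
text = """my bday is on 04/01/1997 and
frnd bday on 28/12/2018,
account no is A000142116 and
valid for 30 days for me and
for my frnd only 4 DAYS.my roll no is 130302101786
and register number is 1600523941. Admission number is
181212001103"""

# Assumption: IGNORECASE is not stated in the answer; without it the
# optional "days" suffix would only match the lowercase spelling.
pattern = re.compile(r"\b[A-Z]?\d[/\d]*\b(?:\s+days)?", flags=re.IGNORECASE)

# The pattern has no capturing groups, so findall returns whole matches.
matches = pattern.findall(text)
for m in matches:
    print(m)
```

Because `[A-Z]?` allows any single leading letter, this pattern is fairly permissive; tightening the account-number part is left open, as the answer notes.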

\d{2}/\d{2}/\d{4}|\d+ days|[A-Z0-9]{10,}

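This matches a date, a number followed by "days", or an account number. For account numbers, I assume a length of 10 or more characters, consisting only of letters and digits.

import re

input_file = """my bday is on 04/01/1997 and 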
                frnd bday on 28/12/2018, 
                account no is A000142116 and 
                valid for 30 days for me and 
                for my frnd only 4 DAYS.my roll no is 130302101786
                and register number is 1600523941. Admission number is 
                181212001103"""

results = re.findall(r'\d{2}/\d{2}/\d{4}|\d+ days|[A-Z0-9]{10,}', input_file, flags=re.IGNORECASE)
print(results)

['04/01/1997', '28/12/2018', 'A000142116', '30 days', '4 DAYS', '130302101786',
 '1600523941', '181212001103']
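
The question also asks for only the 12-digit number used as the roll number. One possible extension of the alternation idea, sketched below, is to give each alternative a named group so that every match carries a label. The group names and the exact account format (one letter followed by nine digits) are assumptions based on the sample text, not something either answer specifies.

```python
import re

text = """my bday is on 04/01/1997 and
frnd bday on 28/12/2018,
account no is A000142116 and
valid for 30 days for me and
for my frnd only 4 DAYS.my roll no is 130302101786
and register number is 1600523941. Admission number is
181212001103"""

# Each alternative is a named group; the names are hypothetical labels.
pattern = re.compile(
    r"(?P<date>\b\d{2}/\d{2}/\d{4}\b)"
    r"|(?P<days>\b\d+ days\b)"
    r"|(?P<account>\b[A-Z]\d{9}\b)"      # assumed format: letter + 9 digits
    r"|(?P<twelve_digit>\b\d{12}\b)"     # exactly 12 digits
    r"|(?P<ten_digit>\b\d{10}\b)",       # exactly 10 digits
    flags=re.IGNORECASE,
)

# m.lastgroup is the name of the alternative that matched.
labelled = [(m.lastgroup, m.group()) for m in pattern.finditer(text)]
for kind, value in labelled:
    print(kind, value)

# Keep only the 12-digit numbers.
twelve_digit_numbers = [v for k, v in labelled if k == "twelve_digit"]
```

In the sample text both the roll number and the admission number are 12 digits long, so length alone cannot tell them apart; distinguishing them would require also matching the preceding words such as "roll no is".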