Extracting dates from a string in Python

Time: 2017-07-11 06:12:27

Tags: python regex python-2.7 date nlp

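I have a string:

fmt_string2 = 'I want to apply for leaves from 12/12/2017 to 12/18/2017'

I want to extract the dates from it. But my code needs to be robust, because the date can be in any format, such as 12 January 2017 or 12 Jan, and its position in the string can also change. For the string above, I tried: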

''.join(fmt_string2.split()[-1].split('.')[::-10])

But here I am hard-coding the position of the date, which I don't want. Can anyone help me write robust code to extract the dates?

2 Answers:

Answer 0: (score: 5)

If 12/12/2017, 12 January 2017, and 12 Jan 17 are the only possible patterns, the following regex-based code should be enough.

import re

string = 'I want to apply for leaves from 12/12/2017 to 12/18/2017 I want to apply for leaves from 12 January 2017 to ' \
       '12/18/2017 I want to apply for leaves from 12/12/2017 to 12 Jan 17 '

# Raw string avoids invalid-escape warnings; the duplicated "May" alternative is dropped
matches = re.findall(r'(\d{2}[/ ](\d{2}|January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|July|Jul|August|Aug|September|Sep|October|Oct|November|Nov|December|Dec)[/ ]\d{2,4})', string)

for match in matches:
    print(match[0])

To understand the regex, try it out on regex101.
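As a possible variation on the pattern above, the sketch below uses a non-capturing group so that `re.findall` returns plain strings instead of tuples. The name `find_dates`, the one-or-two-digit day, and the `\b` word boundaries are assumptions added for illustration; they are not part of the original answer.

```python
import re

# Month names and abbreviations; the full name comes before its abbreviation
# so that "January" is tried before "Jan".
MONTHS = (
    "January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|"
    "July|Jul|August|Aug|September|Sep|October|Oct|November|Nov|December|Dec"
)

# (?:...) groups without capturing, so findall yields whole matches.
DATE_RE = re.compile(r"\b\d{1,2}[/ ](?:\d{1,2}|" + MONTHS + r")[/ ]\d{2,4}\b")


def find_dates(text):
    """Return every date-like substring of text, in order of appearance."""
    return DATE_RE.findall(text)


print(find_dates("I want to apply for leaves from 12 January 2017 to 12/18/2017"))
# ['12 January 2017', '12/18/2017']
```

Like the original pattern, this only recognises the formats it lists; any other date format needs another alternative.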

Answer 1: (score: 3)

Using regex

Rather than doing everything with regex, I suggest the following approach:

import re
from dateutil.parser import parse

Sample text:

text = """
I want to apply for leaves from 12/12/2017 to 12/18/2017
then later from 12 January 2018 to 18 January 2018
then lastly from 12 Feb 2018 to 18 Feb 2018
"""

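A regex that finds anything of the form "from A to B". The advantage here is that I don't have to handle every case and keep extending my regex. Instead, it is dynamic.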

pattern = re.compile(r'from (.*) to (.*)')    
matches = re.findall(pattern, text)

The matches returned by the above regex are:

[('12/12/2017', '12/18/2017'), ('12 January 2018', '18 January 2018'), ('12 Feb 2018', '18 Feb 2018')]

For each match I parse the dates. Values that are not dates raise an exception, which the except block catches so that the pair is skipped.

for val in matches:
    try:
        dt_from = parse(val[0])
        dt_to = parse(val[1])

        print("Leave applied from", dt_from.strftime('%d/%b/%Y'), "to", dt_to.strftime('%d/%b/%Y'))
    except ValueError:
        print("skipping", val)

Output:

Leave applied from 12/Dec/2017 to 18/Dec/2017
Leave applied from 12/Jan/2018 to 18/Jan/2018
Leave applied from 12/Feb/2018 to 18/Feb/2018
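`dateutil` is a third-party package. If the set of formats is known in advance, a standard-library alternative is to try `datetime.strptime` with a short list of candidate formats. This is a minimal sketch of that swapped-in technique, not the method used above; `parse_date` and `CANDIDATE_FORMATS` are hypothetical names, and reading slashed dates month-first is an assumption based on the sample text. Month names are matched according to the current locale, assumed here to be English.

```python
from datetime import datetime

# Candidate formats, tried in order. Month-first for slashed dates is an
# assumption: "12/18/2017" in the sample cannot be read day-first.
CANDIDATE_FORMATS = ("%m/%d/%Y", "%d %B %Y", "%d %b %Y", "%d %b %y")


def parse_date(value):
    """Return a datetime for value, or None if no candidate format fits."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue  # this format did not fit, try the next one
    return None


for raw in ("12/18/2017", "12 January 2018", "12 Feb 2018", "some random text"):
    print(raw, "->", parse_date(raw))
```

Returning `None` instead of raising keeps the calling loop simple, at the cost of having to check the result before using it.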

Using pyparsing

Regex has a limitation: making it dynamic enough to handle less straightforward input can end up being very complicated. For example:

text = """
I want to apply for leaves from start 12/12/2017 to end date 12/18/2017 some random text
then later from 12 January 2018 to 18 January 2018 some random text
then lastly from 12 Feb 2018 to 18 Feb 2018 some random text
"""
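To make the limitation concrete, the "from A to B" regex from the previous section can be run against this text. The sketch below only uses the standard `re` module; the variable name `noisy_text` is chosen for illustration. The captured groups now include the surrounding words, so they can no longer be passed to a date parser as they are.

```python
import re

noisy_text = """
I want to apply for leaves from start 12/12/2017 to end date 12/18/2017 some random text
then later from 12 January 2018 to 18 January 2018 some random text
then lastly from 12 Feb 2018 to 18 Feb 2018 some random text
"""

pattern = re.compile(r'from (.*) to (.*)')
matches = pattern.findall(noisy_text)

# Each group now carries extra words around the date.
for date_from, date_to in matches:
    print(repr(date_from), repr(date_to))
# 'start 12/12/2017' 'end date 12/18/2017 some random text'
# '12 January 2018' '18 January 2018 some random text'
# '12 Feb 2018' '18 Feb 2018 some random text'
```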

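So Python's pyparsing module is the best fit here.

import calendar
import pyparsing as pp

The approach here is to build a dictionary, that is, a set of parsing rules, that can parse the whole text.

Create the keywords for the month names, to be used as pyparsing keywords: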

months_list= []
for month_idx in range(1, 13):
    months_list.append(calendar.month_name[month_idx])
    months_list.append(calendar.month_abbr[month_idx])

# join the list to use it as pyparsing keyword
month_keywords = " ".join(months_list)
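For reference, the same keyword string can be built more compactly. This sketch is an optional variation, not part of the original answer: it removes the duplicate "May" (the full name and the abbreviation are identical) and assumes an English locale, since `calendar.month_name` and `calendar.month_abbr` follow the current locale.

```python
import calendar

# Full name followed by abbreviation for each month, January through December.
month_names = [
    name
    for month_idx in range(1, 13)
    for name in (calendar.month_name[month_idx], calendar.month_abbr[month_idx])
]

# dict.fromkeys drops the repeated "May" while keeping the original order.
month_keywords = " ".join(dict.fromkeys(month_names))

print(month_keywords)
```

The result is a single whitespace-separated string, which is the form the original code passes to `pp.oneOf`.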

The parsing dictionary:

# date separator - can be one of '/', '.', or ' '
separator = pp.Word("/. ")

# Dictionary for numeric date e.g. 12/12/2018
numeric_date = pp.Combine(pp.Word(pp.nums, max=2) + separator + pp.Word(pp.nums, max=2) + separator + pp.Word(pp.nums, max=4))

# Dictionary for text date e.g. 12/Jan/2018
text_date = pp.Combine(pp.Word(pp.nums, max=2) + separator + pp.oneOf(month_keywords) + separator + pp.Word(pp.nums, max=4))

# Either numeric or text date
date_pattern = numeric_date | text_date

# Final dictionary - from x to y
# pp.Keyword matches the whole word only; pp.Word("from") would accept any run of the letters f, r, o, m
pattern = pp.Suppress(pp.SkipTo("from") + pp.Keyword("from") + pp.Optional("start") + pp.Optional("date")) + date_pattern
pattern += pp.Suppress(pp.Keyword("to") + pp.Optional("end") + pp.Optional("date")) + date_pattern

# Group the pattern, also it can be multiple
pattern = pp.OneOrMore(pp.Group(pattern))

Parse the input text:

result = pattern.parseString(text)

# Print result
for match in result:
    print("from", match[0], "to", match[1])

Output:

from 12/12/2017 to 12/18/2017
from 12 January 2018 to 18 January 2018
from 12 Feb 2018 to 18 Feb 2018