Match all lines between two words that contain a specific string

Time: 2018-07-19 12:01:18

Tags: python regex python-3.x icalendar

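I need help with a regex. I want to match all lines between BEGIN:VEVENT and END:VEVENT, but only if the string PARTSTAT=DECLINED appears somewhere between them. Below is sample text with three events (two contain PARTSTAT=DECLINED and one contains PARTSTAT=ACCEPTED). I want to remove the events that I declined.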

BEGIN:VEVENT
UID:040000008200E00074C5B7101A82E0080000000090E9AB1DA717D4010000000000000000
 10000000FF519C52170B604C82055C2922E0EA43
RRULE:FREQ=WEEKLY;BYDAY=MO
X-ALT-DESC;FMTTYPE=text/html:<html xmlns:v="urn:schemas-microsoft-com:vml" x
 mlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-micros
 oft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/om
 ml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-T
 ype content="text/html\; charset=iso-8859-2"><meta name=Generator content="M
 icrosoft Word 15 (filtered medium)"><style><!--\n/* Font Definitions */\n@fo
 nt-face\n{font-family:"Cambria Math"\;\npanose-1:2 4 5 3 5 4 6 3 2 4\;}\n@fo
 nt-face\n{font-family:Calibri\;\npanose-1:2 15 5 2 2 2 4 3 2 4\;}\n/* Style 
 Definitions */\np.MsoNormal\, li.MsoNormal\, div.MsoNormal\n{margin:0cm\;\nm
 argin-bottom:.0001pt\;\nfont-size:11.0pt\;\nfont-family:"Calibri"\,sans-seri
 f\;\nmso-fareast-language:EN-US\;}\na:link\, span.MsoHyperlink\n{mso-style-p
 riority:99\;\ncolor:#0563C1\;\ntext-decoration:underline\;}\na:visited\, spa
 n.MsoHyperlinkFollowed\n{mso-style-priority:99\;\ncolor:#954F72\;\ntext-deco
 ration:underline\;}\np.msonormal0\, li.msonormal0\, div.msonormal0\n{mso-sty
 le-name:msonormal\;\nmso-margin-top-alt:auto\;\nmargin-right:0cm\;\nmso-marg
 in-bottom-alt:auto\;\nmargin-left:0cm\;\nfont-size:12.0pt\;\nfont-family:"Ti
 mes New Roman"\,serif\;}\nspan.Stylwiadomocie-mail18\n{mso-style-type:person
 al-compose\;\nfont-family:"Calibri"\,sans-serif\;\ncolor:windowtext\;}\n.Mso
 ChpDefault\n{mso-style-type:export-only\;\nfont-size:10.0pt\;}\n@page WordSe
 <o:p></o:p></p></div></body></html>
LOCATION:sala_3.11@test.com
ATTENDEE;CN=sala_3.11@test.com;PARTSTAT=DECLINED:mailto:sala_3.11@test.com
ATTENDEE;CN=Name Surname
PRIORITY:5
X-MICROSOFT-CDO-BUSYSTATUS:TENTATIVE
X-MICROSOFT-CDO-IMPORTANCE:1
X-MS-OLK-AUTOSTARTCHECK:FALSE
X-MS-OLK-CONFTYPE:0
SUMMARY:None
DTSTART;TZID="Europe/UK":19980615T110000
DTEND;TZID="Europe/UK":19980615T113000
STATUS:CONFIRMED
CLASS:PUBLIC
X-MICROSOFT-CDO-INTENDEDSTATUS:BUSY
TRANSP:OPAQUE
LAST-MODIFIED:20180709T150603Z
DTSTAMP:20180709T150602Z
SEQUENCE:0
BEGIN:VALARM
ACTION:DISPLAY
TRIGGER;RELATED=START:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
UID:040000008200E00074C5B7101A82E0080000000090D3C2088E0DD4010000000000000000
 1000000079086417F9C0F9478C1916D1A1E58267
X-ALT-DESC;FMTTYPE=text/html:<html xmlns:v="urn:schemas-microsoft-com:vml" x
 mlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-micros
 oft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/om
 ml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-T
 ype content="text/html\; charset=us-ascii"><meta name=Generator content="Mic
 rosoft Word 15 (filtered medium)"><style><!--\n/* Font Definitions */\n@font
 -face\n{font-family:"Cambria Math"\;\npanose-1:2 4 5 3 5 4 6 3 2 4\;}\n@font
 -face\n{font-family:Calibri\;\npanose-1:2 15 5 2 2 2 4 3 2 4\;}\n/* Style De
 finitions */\np.MsoNormal\, li.MsoNormal\, div.MsoNormal\n{margin:0cm\;\nmar
 gin-bottom:.0001pt\;\nfont-size:11.0pt\;\nfont-family:"Calibri"\,sans-serif\
 ;\nmso-fareast-language:EN-US\;}\na:link\, span.MsoHyperlink\n{mso-style-pri
 ority:99\;\ncolor:#0563C1\;\ntext-decoration:underline\;}\na:visited\, span.
 MsoHyperlinkFollowed\n{mso-style-priority:99\;\ncolor:#954F72\;\ntext-decora
 tion:underline\;}\np.msonormal0\, li.msonormal0\, div.msonormal0\n{mso-style
 nk="#954F72"><div class=WordSection1><p class=MsoNormal><o:p>&nbsp\;</o:p></
 p></div></body></html>
LOCATION:sala_3.11@test.com
ATTENDEE;CN=sala_3.11@test.com;PARTSTAT=ACCEPTED:mailto:sala_3.11@test.com
PRIORITY:5
X-MICROSOFT-CDO-BUSYSTATUS:TENTATIVE
X-MICROSOFT-CDO-IMPORTANCE:1
X-MS-OLK-AUTOSTARTCHECK:FALSE
X-MS-OLK-CONFTYPE:0
SUMMARY:None
DTSTART;TZID="Europe/UK":20180628T103000
DTEND;TZID="Europe/UK":20180628T140000
STATUS:CONFIRMED
CLASS:PUBLIC
X-MICROSOFT-CDO-INTENDEDSTATUS:BUSY
TRANSP:OPAQUE
LAST-MODIFIED:20180626T184118Z
DTSTAMP:20180626T184118Z
SEQUENCE:0
BEGIN:VALARM
ACTION:DISPLAY
TRIGGER;RELATED=START:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
UID:040000008200E00074C5B7101A82E008000000008030BEAE1C0ED4010000000000000000
 100000008AEEB06CBD136945961F46812BD0D171
X-ALT-DESC;FMTTYPE=text/html:<html xmlns:v="urn:schemas-microsoft-com:vml" x
 mlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-micros
 oft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/om
 ml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-T
 ype content="text/html\; charset=windows-1250"><meta name=Generator content=
 "Microsoft Word 15 (filtered medium)"><style><!--\n/* Font Definitions */\n@
 font-face\n{font-family:"Cambria Math"\;\npanose-1:2 4 5 3 5 4 6 3 2 4\;}\n@
 font-face\n{font-family:Calibri\;\npanose-1:2 15 5 2 2 2 4 3 2 4\;}\n/* Styl
 e Definitions */\np.MsoNormal\, li.MsoNormal\, div.MsoNormal\n{margin:0cm\;\
 nmargin-bottom:.0001pt\;\nfont-size:11.0pt\;\nfont-family:"Calibri"\,sans-se
 rif\;\nmso-fareast-language:EN-US\;}\na:link\, span.MsoHyperlink\n{mso-style
 -priority:99\;\ncolor:#0563C1\;\ntext-decoration:underline\;}\na:visited\, s
 pan.MsoHyperlinkFollowed\n{mso-style-priority:99\;\ncolor:#954F72\;\ntext-de
 <o:p></o:p></p></div></body></html>
LOCATION:Sala 3.11
ATTENDEE;CN=Sala kon 3.11;PARTSTAT=DECLINED:mailto:sala_3.1
 1@test.com
PRIORITY:5
X-MICROSOFT-CDO-BUSYSTATUS:TENTATIVE
X-MICROSOFT-CDO-IMPORTANCE:1
X-MS-OLK-AUTOSTARTCHECK:FALSE
X-MS-OLK-CONFTYPE:0
SUMMARY:None
DTSTART;TZID="Europe/UK":19980615T110000
DTEND;TZID="Europe/UK":19980615T113000
STATUS:CONFIRMED
CLASS:PUBLIC
X-MICROSOFT-CDO-INTENDEDSTATUS:BUSY
TRANSP:OPAQUE
LAST-MODIFIED:20180627T114346Z
DTSTAMP:20180627T114346Z
SEQUENCE:0
BEGIN:VALARM
ACTION:DISPLAY
TRIGGER;RELATED=START:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
UID:040000008200E00074C5B7101A82E008000000008077B51A0819D4010000000000000000
 1000000027DB863B9FBE90468D3B3F888327EF15

2 answers:

Answer 0 (score: 0)

Since the goal is to remove the entries that contain PARTSTAT=DECLINED, the following achieves it by keeping only the entries with PARTSTAT=ACCEPTED:

import re
print([m for m, s in re.findall(r'\b(BEGIN:VEVENT\b.*?\bPARTSTAT=(ACCEPTED|DECLINED)\b.*?\bEND:VEVENT)\b', data, re.DOTALL) if s == 'ACCEPTED'])

For example, given:

data = '''BEGIN:VEVENT SOME TEXT PARTSTAT=DECLINED END:VEVENT BEGIN:VEVENT SOME TEXT PARTSTAT=ACCEPTED END:VEVENT BEGIN:VEVENT SOME TEXT PARTSTAT=DECLINED END:VEVENT'''

the code above outputs:

['BEGIN:VEVENT SOME TEXT PARTSTAT=ACCEPTED END:VEVENT']
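The list comprehension above returns the accepted events as a list, and it relies on every event carrying a PARTSTAT parameter (otherwise the lazy match can run on into the next event). If the aim is to get the calendar text back with the declined events removed, one possible sketch is to match each VEVENT block on its own and decide per block with re.sub. The names VEVENT_RE and drop_declined are made up for this illustration, and the sample text is invented; it also assumes PARTSTAT=DECLINED is not split across a folded line.

```python
import re

# One whole VEVENT block, from a line that is exactly BEGIN:VEVENT up to the
# first following END:VEVENT line, including its trailing line break.
# The lazy .*? stops at the first END:VEVENT, so a match never spans two events.
VEVENT_RE = re.compile(
    r'^BEGIN:VEVENT\r?\n.*?^END:VEVENT[ \t]*(?:\r?\n|\Z)',
    re.DOTALL | re.MULTILINE,
)


def drop_declined(ics_text):
    """Return ics_text without the VEVENT blocks containing PARTSTAT=DECLINED."""
    def keep_or_drop(match):
        block = match.group(0)
        # Replace a declined event with nothing; leave any other event untouched.
        return '' if 'PARTSTAT=DECLINED' in block else block

    return VEVENT_RE.sub(keep_or_drop, ics_text)


# Invented sample, much shorter than the calendar in the question.
sample = (
    "BEGIN:VCALENDAR\n"
    "BEGIN:VEVENT\nUID:1\nATTENDEE;PARTSTAT=DECLINED:mailto:a@test.com\nEND:VEVENT\n"
    "BEGIN:VEVENT\nUID:2\nATTENDEE;PARTSTAT=ACCEPTED:mailto:a@test.com\nEND:VEVENT\n"
    "END:VCALENDAR\n"
)
print(drop_declined(sample))
```

Because the decision is made per block, events with no PARTSTAT at all are simply kept, and everything outside the VEVENT blocks (such as the VCALENDAR wrapper) passes through unchanged.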

Answer 1 (score: 0)

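If the only strings you are looking for are BEGIN:VEVENT, END:VEVENT, and PARTSTAT=DECLINED (that is, as long as they are constant), you may not even need a regular expression.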

The code to parse it may be more verbose, but for someone unfamiliar with regular expressions it is more explicit than a regex that uses re.DOTALL.

For example, in Python you could do something like this:

def _next_event(lines, start=0):
    """
    Find the body of the next event as a string.
    Returns (body, end_index) if found, or (None, -1) if not.

    body is defined to be every line after BEGIN:VEVENT and before END:VEVENT.
    end_index is the index of END:VEVENT.
    """
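    # Slice from start so the search resumes after the previous event;
    # enumerate's second argument only offsets the reported index.
    for i, line in enumerate(lines[start:], start):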
        if line.strip() == 'BEGIN:VEVENT':
            start = i
            break
    else:
        # Return None if there is not BEGIN:VEVENT, -1 for "not found"
        return None, -1
    for i, line in enumerate(lines[start+1:], start+1):
        if line.strip() == 'END:VEVENT':
            end = i
            break
    else:
        # Return None if there is not END:VEVENT, -1 for "not found"
        return None, -1
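    # Lines from readlines() keep their line endings, so strip them before joining.
    return '\n'.join(line.rstrip('\r\n') for line in lines[start+1:end]), end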


def get_events(lines):
    """
    Get the bodies of all events.
    """
    events = []
    body, i = _next_event(lines)
    while i != -1:
        events.append(body)
        body, i = _next_event(lines, i)
    return events

if __name__ == '__main__':
    calendar_file = 'calendar.ics'  # assumed path; point this at your own .ics file
    with open(calendar_file, 'r') as f:
        lines = f.readlines()

    events = get_events(lines)

    for event in events:
        if event.find('PARTSTAT=DECLINED') != -1:
            # You'll need to define "delete event"
            delete_event(event)
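delete_event is left undefined above. One way to read "delete" is to rebuild the list of lines and skip every event that contains PARTSTAT=DECLINED, then write the result back out. The following is only a sketch of that idea; filter_declined is a hypothetical helper, not part of the code above, and it assumes PARTSTAT=DECLINED sits on a single physical line.

```python
def filter_declined(lines):
    """Return the lines with every declined VEVENT block removed."""
    kept = []
    event = None  # lines of the event being read, or None when outside an event
    for line in lines:
        stripped = line.strip()
        if event is None:
            if stripped == 'BEGIN:VEVENT':
                event = [line]      # start buffering a new event
            else:
                kept.append(line)   # anything outside an event is kept as-is
        else:
            event.append(line)
            if stripped == 'END:VEVENT':
                # Decide about the whole event once its last line has been read.
                if not any('PARTSTAT=DECLINED' in l for l in event):
                    kept.extend(event)
                event = None
    if event is not None:
        kept.extend(event)  # unterminated event at end of input: keep it untouched
    return kept


# Invented sample lines, in the shape f.readlines() would return them.
sample_lines = [
    "BEGIN:VEVENT\n", "ATTENDEE;PARTSTAT=DECLINED:mailto:a@test.com\n", "END:VEVENT\n",
    "BEGIN:VEVENT\n", "ATTENDEE;PARTSTAT=ACCEPTED:mailto:a@test.com\n", "END:VEVENT\n",
]
print(''.join(filter_declined(sample_lines)))
```

Since the original line endings are preserved, the returned list can be passed straight to f.writelines() on a new file.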

If you want to extend the logic from one particular person declining to anyone declining, you could expand it to something like this:

import re

def anyone_declined(event):
    # Collect every RSVP status in the event and report whether any is a decline.
    rsvps = re.findall(r'PARTSTAT=(ACCEPTED|DECLINED)', event)
    return any(rsvp == 'DECLINED' for rsvp in rsvps)
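One caveat with any plain string search on this data: iCalendar folds long content lines, and the continuation lines start with a single space or tab (as the sample in the question shows). A parameter such as PARTSTAT=DECLINED could therefore be split across two physical lines. A minimal sketch of unfolding the text before searching follows; unfold and anyone_declined_unfolded are hypothetical names used only here, and the folded sample is invented.

```python
import re


def unfold(ics_text):
    """Join folded iCalendar lines: drop each line break followed by one space or tab."""
    return re.sub(r'\r?\n[ \t]', '', ics_text)


def anyone_declined_unfolded(event):
    """Check for a decline after unfolding, so a fold inside the parameter is harmless."""
    return 'PARTSTAT=DECLINED' in unfold(event)


# Invented event body whose PARTSTAT value is split by a fold.
folded_event = "ATTENDEE;CN=Room;PARTSTAT=DECL\n INED:mailto:room@test.com\nPRIORITY:5"
print(anyone_declined_unfolded(folded_event))
```

Unfolding the whole file once, before splitting it into events, would make the other checks in this answer robust to folds as well.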