regex findall() ignores a single newline but recognizes multiple newlines

Time: 2018-06-14 03:20:37

Tags: python regex

I have a text file that basically looks like this:

Game #16406772158 starts.\n#Game No : 16406772158\n

....

wins $0.75 USD\n\n\n

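That is, many repetitions of \n(new text)\n(new text), followed by \n\n\n. I want to find every instance of this in my text file. When my code looks like this, it works (but only for the first instance):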

gameRegex = re.compile(r"""Game #(.+\n)*""") 
game = gameRegex.search(totalContent)

When I switch to the findall method, the resulting game variable looks like this:

['Yl9Ui1OhAPyGV0JlCPLRrg wins $0.75 USD\n',
  'G72AzGPQLTOWfYoNST1K/g wins $10 USD\n',
 '4bSQFjpEWTIcsil7GJkkVA wins $39.99 USD from the main pot with three of a kind, Kings.\n',
 'U3xFxCVFfFBt50sL9VgLgQ wins $1.45 USD\n', ..., ]

I'm very new to programming and don't know what to do. I want the result to be a list in which each item contains the text of one game, up to the \n\n\n:

game = ['Game #16406772158 starts.\n#Game No : 16406772158\n***** Hand 
History for Game 16406772158 *****\n$50 USD NL Texas Hold'em - Wednesday, 
July 01, 00:00:01 EDT 2009 ... Yl9Ui1OhAPyGV0JlCPLRrg wins $0.75 USD\n', 
'Game #16406772158 starts.\n#Game No : 16406772158\n***** Hand History for 
Game 16406772158 *****\n$50 USD NL Texas Hold'em - Wednesday, July 01, 
00:00:01 EDT 2009 ... Yl9Ui1OhAPyGV0JlCPLRrg wins $0.75 USD\n']

1 Answer:

Answer 0: (score: 1)

I think the pattern you are looking for is this:

(?:(?!\\n\\n\\n).)+\\n\\n\\n
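
As written with \\n, the pattern above matches the two-character sequence backslash + n. If the file contains real newline characters instead, the same idea can be written with \n. The following is a minimal sketch under that assumption; the sample text and the names sample, pattern and games are made up for illustration and are not from the original post.

```python
import re

# Assumption: the text contains real newline characters,
# not a literal backslash followed by "n".
sample = (
    "Game #1 starts.\n#Game No : 1\nplayerA wins $0.75 USD\n\n\n"
    "Game #2 starts.\n#Game No : 2\nplayerB wins $10 USD\n\n\n"
)

# Consume any character (including newlines, thanks to DOTALL) as long as
# three consecutive newlines do not start at the current position, then
# consume the three-newline terminator itself.
pattern = re.compile(r"(?:(?!\n\n\n).)+\n\n\n", re.DOTALL)

# The only group is non-capturing, so findall returns the whole matches.
games = pattern.findall(sample)
print(games)
```

Each list item here is one complete game, including its trailing three newlines.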


To remove the two extra \n at the end of each list item, use this regex:

(?:(?!\\n\\n\\n).)+\\n(?=\\n\\n)
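
One thing to be aware of: a lookahead does not consume what it checks, so the two remaining newlines are left in the input and the next match can start with them. A possible variant, sketched here assuming real newline characters and made-up sample text, consumes the whole terminator but captures only the part to keep.

```python
import re

# Assumption: real newline characters separate the lines and "\n\n\n"
# ends each game. The sample text is hypothetical.
sample = (
    "Game #1 starts.\n#Game No : 1\nplayerA wins $0.75 USD\n\n\n"
    "Game #2 starts.\n#Game No : 2\nplayerB wins $10 USD\n\n\n"
)

# The single capturing group holds the game text plus one trailing newline.
# The other two newlines are consumed outside the group, so they cannot
# leak into the start of the next match. With exactly one capturing group,
# findall returns the group's contents rather than the whole match.
pattern = re.compile(r"((?:(?!\n\n\n).)+\n)\n\n", re.DOTALL)

games = pattern.findall(sample)
print(games)
```

With this variant every item starts at "Game #" and ends with a single newline.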

Sample Code

import re
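# Note: in a raw string, \\n matches a literal backslash followed by "n",
# which is how the sample text below is written; use \n to match a real newline.
regex = r"(?:(?!\\n\\n\\n).)+\\n(?=\\n\\n)"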
test_str = ("Game #16406772158 starts.\\n#Game No : 16406772158\\n\n"
    "Yl9Ui1OhAPyGV0JlCPLRrg wins $0.75 USD\\nG72AzGPQLTOWfYoNST1K/g wins $10 USD\\n'4bSQFjpEWTIcsil7GJkkVA wins $39.99 USD from the main pot with three of a kind, Kings.\\n'U3xFxCVFfFBt50sL9VgLgQ wins $1.45 USD\\nwins $0.75 USD\\n\\n\\nGame #16406772158 starts.\\n#Game No : 16406772158\\n....\n"
    "wins $0.75 USD\\n\\n\\n\n"
    "Game #16406772158 starts.\\n#Game No : 16406772158\\n\n"
    "....\n"
    "wins $0.75 USD\\n\\n\\n")
# Collect the full text of every match.
result = []
matches = re.finditer(regex, test_str, re.DOTALL)
for match in matches:
    # Uncomment to see where each match was found:
    # print("Match was found at {start}-{end}: {match}".format(start=match.start(), end=match.end(), match=match.group()))
    result.append(match.group())
print(result)

Output:

["Game #16406772158 starts.\\n#Game No : 16406772158\\n\nYl9Ui1OhAPyGV0JlCPLRrg wins $0.75 USD\\nG72AzGPQLTOWfYoNST1K/g wins $10 USD\\n'4bSQFjpEWTIcsil7GJkkVA wins $39.99 USD from the main pot with three of a kind, Kings.\\n'U3xFxCVFfFBt50sL9VgLgQ wins $1.45 USD\\nwins $0.75 USD\\n", '\\n\\nGame #16406772158 starts.\\n#Game No : 16406772158\\n....\nwins $0.75 USD\\n', '\\n\\n\nGame #16406772158 starts.\\n#Game No : 16406772158\\n\n....\nwins $0.75 USD\\n']
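
As for why the original findall call returned only the last line of each game: when a pattern contains exactly one capturing group, re.findall returns the contents of that group instead of the whole match, and a repeated group keeps only its final iteration. The sketch below illustrates this on a made-up miniature input, assuming real newline characters; the names text, capturing and non_capturing are hypothetical.

```python
import re

# Hypothetical miniature input, assuming real newline characters.
text = "Game #1 starts.\nline two\nplayerA wins $0.75 USD\n\n\n"

# One capturing group, repeated: findall returns only the group's contents,
# and a repeated group keeps only its last iteration.
capturing = re.findall(r"Game #(.+\n)*", text)

# Non-capturing group (?:...): findall returns the whole match instead.
non_capturing = re.findall(r"Game #(?:.+\n)*", text)

print(capturing)      # only the final line of the game
print(non_capturing)  # the full text from "Game #" through the final line
```

Switching the group to the non-capturing form is usually the smallest change that makes findall behave like search(...).group() for every match.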
