How to split on a single newline and on two newlines with a regular expression?

Date: 2019-02-06 12:09:34

Tags: python regex newline

I want to split the output into groups based on:

  1. A single newline '\n'
  2. Two newlines '\n\n'

How can I separate the text into these two groups, using a regex split or some other method?

I can already find either the single newlines or the double newlines on their own. For example:

Facebook and Google exploited a feature__(\n)__  
intended for “enterprise developers” to__(\n)__  
distribute apps that collect large amounts__(\n)__  
of data on private users, TechCrunch first reported.__(\n\n)__   

Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.__(\n)__  
Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?__(\n\n)__  

Some text so on... 

I tried the following code:

import re

def find_newlines(file):
    with open(file, "r") as f:
        text = f.read()
    # "\n+" matches any run of newlines, so single and double breaks are split alike
    content = re.split("\n+", text)
    return content

The result is:

['Apple', 'Something', 'Anything']

I want the following output:

['Facebook and Google exploited a feature intended for “enterprise developers” to distribute apps that collect large amounts of data on private users, TechCrunch first reported.', 'Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power. Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?']

I want group 1 to be the single newlines and group 2 to be the double newlines.

1 answer:

Answer 0 (score: 0)

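It looks like you are trying to group the text into two (or more) blocks separated by double newlines. So one approach is to first split the text on \n\n. This gives blocks that still contain single newlines. In each block, every remaining newline can then be replaced with a space. All of this can be done with a Python list comprehension, as follows: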

text = """Facebook and Google exploited a feature
intended for “enterprise developers” to
distribute apps that collect large amounts
of data on private users, TechCrunch first reported.

Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.
Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?"""

content = [block.replace('\n', ' ') for block in text.split('\n\n')]

print(content)

This gives you a list with two entries and no newlines:

['Facebook and Google exploited a feature intended for “enterprise developers” to distribute apps that collect large amounts of data on private users, TechCrunch first reported.', 'Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power. Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?']
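The sample text in the question also has trailing spaces before some newlines, and a "blank" line between paragraphs might contain stray spaces, in which case splitting on '\n\n' and replacing '\n' would leave double spaces or miss the break. A minimal variation, assuming any whitespace between paragraphs should be ignored, splits on a newline, optional whitespace, and another newline, then collapses every whitespace run inside a block to one space. The helper name split_paragraphs is hypothetical.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs and join each paragraph's lines with single spaces.

    Assumption: a paragraph break is a newline, optional whitespace
    (spaces, tabs, further newlines), then another newline.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    # str.split() with no argument splits on any whitespace run and drops
    # leading/trailing whitespace, so trailing spaces and single newlines
    # both collapse to one space. Empty blocks are skipped.
    return [" ".join(block.split()) for block in blocks if block.strip()]


sample = "first line  \nsecond line.\n  \nthird line\nfourth line."
print(split_paragraphs(sample))
```

Here the blank line containing two spaces still counts as a paragraph break, and the trailing spaces after "first line" do not produce a double space.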

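If the blocks may be separated by more than one blank line, a regular expression that splits on two or more consecutive newlines can be used instead: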

import re

text = """Facebook and Google exploited a feature
intended for “enterprise developers” to
distribute apps that collect large amounts
of data on private users, TechCrunch first reported.



Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.
Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?"""

content = [block.replace('\n', ' ') for block in re.split('\n{2,}', text)]

print(content)
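The question also asks to tell single newlines and double newlines apart as separate groups. One way to sketch that, assuming a run of two or more newlines counts as a "double" break, is an alternation with named groups that tries the longer alternative first, so '\n\n' is not consumed as two single newlines. re.finditer then reports which group matched each break through Match.lastgroup. The names BREAK_RE and classify_breaks are hypothetical.

```python
import re

# Longer alternative first: a run of 2+ newlines is a paragraph break,
# a lone newline is a line break inside a paragraph.
BREAK_RE = re.compile(r"(?P<double>\n{2,})|(?P<single>\n)")


def classify_breaks(text):
    """Return (kind, start_index) for every newline run in text."""
    # Only one named group participates in each match, so lastgroup is its name.
    return [(m.lastgroup, m.start()) for m in BREAK_RE.finditer(text)]


sample = "line one\nline two\n\nline three"
print(classify_breaks(sample))
```

Named groups are used here because, with the longer alternative listed first, the numbered groups would come out in the reverse of the order asked for in the question (group 1 would be the double newlines).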