Question

I have the following string:

messages = """Jan 09, 07:03 AM - +91 12345 12345:‬ added ‪+91 45678 47263‬
Jan 10, 07:03 AM - +91 12345 12345: Hello
Jan 11, 07:03 AM - +91 12345 12345: How are you?.
Jan 12, 07:03 AM - +91 12345 12345: What's up?
"""

I want to use a regular expression to parse the messages above and print only the message text.

The output should be:

added ‪+91 45678 47263‬
Hello
How are you?.
What's up?

Answer 1

If you want a regular expression:

import re
# Greedy ".+:" matches up to the last colon on each line; the group captures the rest.
for i in re.findall(r".+:\s*(.*)", messages):
    print(i)

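However, this does not handle the special characters you have in there: the first line contains invisible Unicode directional formatting characters, and they end up in the captured text.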
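One way to deal with those characters is to strip them before matching. The following is a minimal sketch, not the answerer's code. It assumes the special characters are the Unicode bidirectional formatting controls U+202A to U+202E (which is what the first sample line appears to contain) and that every line has the form "<timestamp> - <sender>: <message>". The names BIDI_CONTROLS, LINE, and extract_messages are made up for this illustration.

```python
import re

# The sample data, with the invisible characters written as explicit escapes.
messages = (
    "Jan 09, 07:03 AM - +91 12345 12345:\u202c added \u202a+91 45678 47263\u202c\n"
    "Jan 10, 07:03 AM - +91 12345 12345: Hello\n"
    "Jan 11, 07:03 AM - +91 12345 12345: How are you?.\n"
    "Jan 12, 07:03 AM - +91 12345 12345: What's up?\n"
)

# Assumption: the "special characters" are the bidi controls U+202A..U+202E.
BIDI_CONTROLS = re.compile("[\u202a-\u202e]")

# Anchor on the " - <sender>:" prefix so that colons in the timestamp
# or in the message itself are never used as the split point.
LINE = re.compile(r"^.*? - [^:\n]+:[ \t]*(.*)$", re.MULTILINE)


def extract_messages(text):
    """Return the message part of every line that matches the assumed format."""
    cleaned = BIDI_CONTROLS.sub("", text)
    return LINE.findall(cleaned)


for body in extract_messages(messages):
    print(body)
```

Note that this removes the formatting characters from the output as well, so the first result is visually the same as the requested output but not byte-for-byte identical to it.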

Answer 2

This should do it:

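import re

# Split each non-empty line on "<digits>:" and keep whatever follows the last match.
result = [re.split(r'\d+:', line)[-1].strip() for line in messages.split('\n') if line]
for item in result:
    print(item)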

Answer 3

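Python strings have find and index methods, which search a string from left to right for a substring and return its position as an integer. There are also rfind and rindex, which do the same but search from right to left. So you can split the text on newlines and slice each line. It looks like this: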

messages = """Jan 09, 07:03 AM - +91 12345 12345:‬ added ‪+91 45678 47263‬
Jan 10, 07:03 AM - +91 12345 12345: Hello
Jan 11, 07:03 AM - +91 12345 12345: How are you?.
Jan 12, 07:03 AM - +91 12345 12345: What's up?
"""

for line in messages.split('\n'):
    if line:
        print(line[line.rindex(':') + 2:])

This produces the output:

added ‪+91 45678 47263‬
Hello
How are you?.
What's up?

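The if line: check is there because the closing """ sits on a new line, so the last element of the split is an empty string, and index / rindex raise a ValueError if they cannot find the substring in the string. If that is a problem, you can use the find or rfind methods instead, which return -1 rather than raising an error.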
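A small sketch of the difference between the two families of methods, using an empty string like the one left over after the final newline. The helper name body_after_last_colon is hypothetical and not part of the code above.

```python
def body_after_last_colon(line):
    """Return the text after the last ':' in line, or None if there is no ':'."""
    pos = line.rfind(":")  # rfind returns -1 instead of raising
    if pos == -1:
        return None
    return line[pos + 1:].lstrip()


print(body_after_last_colon("Jan 10, 07:03 AM - +91 12345 12345: Hello"))  # Hello
print(body_after_last_colon(""))  # None

# rindex on the same empty string raises instead of returning -1.
try:
    "".rindex(":")
except ValueError as exc:
    print("rindex raised ValueError:", exc)
```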

Note that this gives unexpected results if the message itself contains a colon.
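If messages may contain colons, one option is to split from the left rather than the right. The sketch below assumes each line looks like "<timestamp> - <sender>: <message>" and that the sender never contains a colon. The helper message_body and the "Jan 13" sample line are made up for this illustration.

```python
def message_body(line):
    """Return the message part of one chat line, or None if the line does not match."""
    # Drop the timestamp: keep what follows the first " - ".
    _, dash, rest = line.partition(" - ")
    if not dash:
        return None
    # The first ":" after the timestamp ends the sender; the rest is the message.
    _, colon, body = rest.partition(":")
    if not colon:
        return None
    return body.strip()


line = "Jan 13, 07:03 AM - +91 12345 12345: Meet at 10:30"
print(message_body(line))            # Meet at 10:30
print(line[line.rindex(":") + 2:])   # 0  (slicing at the last colon loses most of the message)
```

str.partition splits at the first occurrence only, so any later colons stay inside the message.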

Answer 4

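If you have a string and want all the text after the first ":" that follows the sender, note that the timestamp (07:03) contains a colon of its own, so start the search after the " - " separator:

myString = "Jan 10, 07:03 AM - +91 12345 12345: Hello"
start = myString.find(" - ")       # End of the timestamp, which has its own ":"
index = myString.find(":", start)  # Gets index of first ":" after the timestamp
message = myString[index:]         # Starts at index and gets everything afterwards
# message is now ": Hello"

If you want to remove the colon from the message, just add 1 to the index:

message = myString[index+1:] # message is now " Hello"

You can then do this for every line using messages.split('\n'), like so:

for line in messages.split('\n'):
    if not line:
        continue  # Skip the empty string left after the final newline
    start = line.find(' - ')       # End of the timestamp
    index = line.find(':', start)  # Gets index of first ":" after the timestamp
    message = line[index+1:]
    print(message)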

Parsing messages from a multi-line string in Python
