Extracting strings from a text file using a regular expression

Time: 2015-08-12 00:10:18

Tags: python python-3.x

Basically, I have a txt document that contains this:

The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering, a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.

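Using a regular expression, I need to print everything inside the double quotes. I don't want the complete code; I just need to know how I should go about it and which regular expression would be most useful. Tips and pointers, please!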

2 answers:

Answer 0 (score: 3)

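r'(".*?")' will match each string inside double quotes. The parentheses indicate a captured group, . matches every character (except a newline), * means repetition, and ? makes the repetition non-greedy (it stops matching right before the next double quote). If needed, add the re.DOTALL option so that . also matches newlines.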
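As a quick illustration of the pattern above, here is a minimal sketch on a made-up one-line sample (the `sample` string is an assumption, not the asker's file). Because the parentheses wrap the quotes, each result keeps its surrounding double quotes.

```python
import re

# Made-up sample line for illustration only.
sample = 'He said "Yo there!" and then "Stand!"'

# The capturing group includes the quote characters,
# so they are part of every returned string.
with_quotes = re.findall(r'(".*?")', sample)

print(with_quotes)  # ['"Yo there!"', '"Stand!"']
```

Moving the parentheses inside the quotes would return the text without them.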

Answer 1 (score: 0)

This should do it (explanation below):

from __future__ import print_function

import re

txt = """The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering,
a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.
"""

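# Non-greedy match between double quotes; the group excludes the quotes.
strings = re.findall(r'"(.*?)"', txt)

# Print each extracted string on its own line.
for s in strings:
    print(s)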

Result:

So-ho!
Yo there! Stand! I shall fire!
Is that the Dover mail?
Never you mind what it is!
What are you?
_Is_ that the Dover mail?
Why do you want to know?
I want a passenger, if it is.
What passenger?
Mr. Jarvis Lorry.
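The code above embeds the text in a string, while the question mentions a txt file. A minimal sketch of the same extraction reading from a file might look like this; the helper name `quoted_strings` and the file name `sample.txt` are hypothetical, and the temporary file only exists so the sketch runs as-is.

```python
import re
import tempfile
from pathlib import Path


def quoted_strings(path):
    """Return every double-quoted string in the file, without the quotes."""
    text = Path(path).read_text(encoding="utf-8")
    return re.findall(r'"(.*?)"', text)


# Demo with a throwaway file; in practice, pass the path of your own file.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.txt"  # hypothetical file name
    sample_path.write_text(
        '"What passenger?"\n"Mr. Jarvis Lorry."\n', encoding="utf-8"
    )
    found = quoted_strings(sample_path)

print(found)  # ['What passenger?', 'Mr. Jarvis Lorry.']
```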

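r'"(.*?)"' will match each string inside double quotes. The parentheses indicate a capturing group, so you only get the text without the double quotes. . matches every character (except a newline), and * means "zero or more of the last thing", the last thing being the .. The ? after the * makes the * "non-greedy", which means it matches as little as possible. If you did not use the ?, each match would run from the first double quote to the last double quote it can reach; since . does not cross newlines, a line with several quoted strings would come back as a single result containing everything between its first and last double quotes.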
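To see the difference between the greedy and non-greedy forms, here is a small sketch using one line from the sample text (the variable names are just for illustration):

```python
import re

line = '"Never you mind what it is!" the guard retorted. "What are you?"'

# Greedy: .* runs to the last double quote on the line.
greedy = re.findall(r'"(.*)"', line)

# Non-greedy: .*? stops at the next double quote.
lazy = re.findall(r'"(.*?)"', line)

print(greedy)  # ['Never you mind what it is!" the guard retorted. "What are you?']
print(lazy)    # ['Never you mind what it is!', 'What are you?']
```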

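If you want to extract strings that span multiple lines, you can include the re.DOTALL flag so that . also matches newlines. To do that, use re.findall(r'"(.*?)"', txt, re.DOTALL). The newlines are then included in the extracted strings, so you will have to check for and handle them.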
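A short sketch of the re.DOTALL behaviour, using a made-up string with a line break inside the quotes (the `text` value is an assumption for illustration). Collapsing whitespace afterwards is just one possible way to deal with the embedded newline.

```python
import re

# Made-up sample where the quoted string spans two lines.
text = 'He called out, "Is that\nthe Dover mail?" and waited.'

# Without the flag, . cannot cross the newline, so nothing matches.
without_flag = re.findall(r'"(.*?)"', text)

# With re.DOTALL, the match spans the line break and keeps the newline.
with_flag = re.findall(r'"(.*?)"', text, re.DOTALL)

# One option: collapse all runs of whitespace into single spaces.
cleaned = [" ".join(s.split()) for s in with_flag]

print(without_flag)  # []
print(with_flag)     # ['Is that\nthe Dover mail?']
print(cleaned)       # ['Is that the Dover mail?']
```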

The explanation is inevitably similar to / based on @TigerhawkT3's answer, so upvote that answer too!