Question

Basically, I have a txt document containing this:

The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering, a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.

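Using regular expressions, I need to print everything inside double quotes. I don't want the complete code; I just need to know how I should go about it and which regular expression would be most useful. Hints and pointers, please!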

Answer 1

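r'(".*?")' will match each string enclosed in double quotes. The parentheses denote a capturing group, . matches any character (except a newline), * means repetition, and ? makes the repetition non-greedy (matching stops at the next double quote). If needed, add the re.DOTALL option so that . also matches newlines.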
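As a minimal sketch of how this pattern behaves, assuming the text is already held in a string (the shortened sample sentence and the variable names are illustrative, not from the question):

```python
import re

# Illustrative one-line sample; the real input would come from the txt file.
sample = 'He said, "So-ho!" and then "Yo there!"'

# The group wraps the quotes too, so each result keeps its double quotes.
quoted = re.findall(r'(".*?")', sample)
print(quoted)  # ['"So-ho!"', '"Yo there!"']
```

Because the parentheses enclose the quote characters, the returned strings include them; moving the quotes outside the group would return only the text between them.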

Answer 2

This should do it (explanation below):

from __future__ import print_function

import re

txt = """The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering,
a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.
"""

strings = re.findall(r'"(.*?)"', txt)

for s in strings:
    print(s)

Result:

So-ho!
Yo there! Stand! I shall fire!
Is that the Dover mail?
Never you mind what it is!
What are you?
_Is_ that the Dover mail?
Why do you want to know?
I want a passenger, if it is.
What passenger?
Mr. Jarvis Lorry.

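r'"(.*?)"' will match each string enclosed in double quotes. The parentheses denote a capturing group, and since the quotes sit outside the group, you get only the text without the double quotes. . matches any character (except a newline), and * means "zero or more of the preceding item", which here is the .. The ? after the * makes the * "non-greedy", meaning it matches as little as possible. If you left out the ?, each match would run from the first double quote on a line to the last double quote on that same line, so a line containing two quoted strings would yield a single result spanning both (and with re.DOTALL, a single result spanning the whole text).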
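To make the greedy versus non-greedy difference concrete, here is a small sketch run on one line of the sample text that contains two quoted strings (the variable names are illustrative):

```python
import re

line = '"Never you mind what it is!" the guard retorted. "What are you?"'

# Non-greedy: stops at the nearest closing quote, giving one result per quote pair.
lazy = re.findall(r'"(.*?)"', line)

# Greedy: runs to the last double quote on the line, giving a single long result.
greedy = re.findall(r'"(.*)"', line)

print(lazy)    # ['Never you mind what it is!', 'What are you?']
print(greedy)  # ['Never you mind what it is!" the guard retorted. "What are you?']
```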

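If you want to extract strings that span multiple lines, you can include the re.DOTALL flag so that . also matches newlines. To do that, use re.findall(r'"(.*?)"', txt, re.DOTALL). The newlines will be included in the extracted strings, so you will have to handle them yourself.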
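A short sketch of the effect of re.DOTALL, assuming a quoted passage that wraps onto a second line (the sample text is made up for illustration, and collapsing whitespace is just one possible way to deal with the embedded newline):

```python
import re

# Hypothetical sample in which the quoted text contains a line break.
text = 'He said, "this quote\nspans two lines" and left.'

# Without the flag, . cannot cross the newline, so nothing matches.
without_flag = re.findall(r'"(.*?)"', text)

# With re.DOTALL, . matches the newline and the whole quote is captured.
with_flag = re.findall(r'"(.*?)"', text, re.DOTALL)

# One way to handle the embedded newline: collapse all whitespace runs to a space.
cleaned = [" ".join(s.split()) for s in with_flag]

print(without_flag)  # []
print(with_flag)     # ['this quote\nspans two lines']
print(cleaned)       # ['this quote spans two lines']
```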

The explanation is inevitably similar to, and based on, @TigerhawkT3's answer, so please upvote that answer too!
