Question

I'm trying to extract only the string that contains a $ character. The input is based on output I extracted using BeautifulSoup.

Code

price = [m.split() for m in re.findall(r"\w+/$(?:\s+\w+/$)*", soup_content.find('blockquote', { "class": "postcontent restore" }).text)]

Input

For Sale is my Tag Heuer Carrera Calibre 6 with box and papers and extras.
39mm
47 ish lug to lug
19mm in between lugs
Pretty thin but not sure exact height. Likely around 12mm (maybe less)
I've owned it for about 2 years. I absolutely love the case on this watch. It fits my wrist and sits better than any other watch I've ever owned. I'm selling because I need cash and other pieces have more sentimental value
I am the second owner, but the first barely wore it.
It comes with barely worn blue leather strap, extra suede strap that matches just about perfectly and I'll include a blue Barton Band Elite Silicone.
I also purchased an OEM bracelet that I personally think takes the watch to a new level. This model never came with a bracelet and it was several hundred $ to purchase after the fact.
The watch was worn in rotation and never dropped or knocked around.
The watch does have hairlines, but they nearly all superficial. A bit of time with a cape cod cloth would take care of a lot it them. The pics show the imperfections in at "worst" possible angle to show the nature of scratches.
The bracelet has a few desk diving marks, but all in all, the watch and bracelet are in very good shape.
Asking $2000 obo. PayPal shipped. CONUS.
It's a big hard to compare with others for sale as this one includes the bracelet.

The output should look like this.

Answer 1

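You don't need a regular expression. Instead, you can iterate over the lines and over each word, check whether the word starts with '$', and extract the rest of the word: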

[word[1:] for line in s.split('\n') for word in line.split() if word.startswith('$') and len(word) > 1]

where s is your paragraph.

Output:

['2000']
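For reference, here is the same comprehension wrapped in a function so it can be run on its own. This is only a sketch: the function name dollar_words and the two-line sample string (taken from the listing in the question) are assumptions for illustration.

```python
def dollar_words(text):
    """Return every word that starts with '$', with the '$' removed."""
    return [
        word[1:]                      # drop the leading '$'
        for line in text.split('\n')  # walk the paragraph line by line
        for word in line.split()      # split each line on whitespace
        if word.startswith('$') and len(word) > 1  # skip a bare '$'
    ]


# Two lines from the listing: one bare '$' and one real price.
sample = (
    "it was several hundred $ to purchase after the fact.\n"
    "Asking $2000 obo. PayPal shipped. CONUS."
)

print(dollar_words(sample))  # ['2000']
```

The len(word) > 1 check is what keeps the standalone '$' in "several hundred $" out of the result.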

Answer 2

I would do something like this (assuming input is the string you wrote above):

price_start = input.find('$')
price = input[price_start:].split(' ')[0]

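This works if there is only one occurrence of '$', as you said; find returns the index of the first '$', so an earlier standalone '$' would be picked up instead.

Alternatively, you can use a regular expression like this: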

price = re.findall(r'\S*\$\S*\d', input)[0]
price = price.replace('$', '')
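As a variation on the regex approach, a capture group can return the amount without the '$', so no replace step is needed. This is a sketch under assumptions: the function name extract_prices, the pattern (digits with optional thousands separators and an optional decimal part), and the sample string are illustrative and not part of the original answer.

```python
import re


def extract_prices(text):
    """Return the amounts that directly follow a '$', without the '$'."""
    # \$             literal dollar sign
    # (\d[\d,]*      a digit, then more digits or thousands separators
    #  (?:\.\d+)?)   optional decimal part such as .50
    return re.findall(r'\$(\d[\d,]*(?:\.\d+)?)', text)


# Hypothetical sample with a bare '$' and two prices.
sample = "several hundred $ to purchase. Asking $2000 obo. Or $1,950.50 shipped."

print(extract_prices(sample))  # ['2000', '1,950.50']
```

Because the pattern requires a digit right after the '$', a bare '$' followed by a space is never matched, and a sentence-ending period after the price is left out of the capture.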

Answer 3

Since this is simple, you don't need a regex solution; the following is enough:

words = text.split()
words_with_dollar = [word for word in words if '$' in word]
print(words_with_dollar)

>>> ['$', '$2000']

If you don't want the standalone dollar sign, you can add a filter like this:

words_with_dollar = [word for word in words if '$' in word and '$' != word]
print(words_with_dollar)

>>> ['$2000']
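One caveat with splitting on whitespace is that punctuation stays attached, so "$2000." would come back with the trailing period. A possible workaround, sketched here with a hypothetical helper name and sample text, is to strip surrounding punctuation (other than '$') from each word before filtering:

```python
import string

# Every ASCII punctuation character except '$', so the sign itself survives.
STRIP_CHARS = string.punctuation.replace('$', '')


def words_with_dollar(text):
    """Return words containing '$', with surrounding punctuation removed."""
    cleaned = (word.strip(STRIP_CHARS) for word in text.split())
    return [word for word in cleaned if '$' in word and word != '$']


# Hypothetical sample: a price followed by a period, a bare '$',
# and a price wrapped in parentheses.
sample = "Asking $2000. Several hundred $ more, or ($1500)."

print(words_with_dollar(sample))  # ['$2000', '$1500']
```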

How to extract a string containing a specific character in Python
