Regex to find all words after a specific word?

Time: 2016-09-26 10:28:07

Tags: python regex python-3.x pattern-matching

I have a string like the following:

Features:  -Includes hanging accessories.  -Artist: William-Adolphe Bouguereau.  -Made with 100pct cotton canvas.  -100pct Anti-shrink pine wood bars and Epson anti-fade ultra chrome inks.  -100pct Hand-made and inspected in the U.S.A.  -Orientation: Horizontal.  **Subject: -Figures/Nautical and beach.**  Gender: -Unisex/Both.  Size: -Mini 17'' and under/Small 18''-24''/Medium 25''-32''/Large 33''-40''/Oversized 41'' and above.  Style: -Fine art.  Color: -Blue.  Country of Manufacture: -United States.  Product Type: -Print of painting.  Region: -Europe.  Primary Art Material: -Canvas. Dimensions:  -8'' H x 12'' W x 0.75'' D: 0.72 lb.  -12'' H x 18'' W x 0.75'' D: 1.14 lbs.  -12'' H x 18'' W x 1.5'' D: 2.45 lbs.  -18'' H x 26'' W x 0.75'' D: 1.44 lbs.  Paintings Prints Tori White Wildon Photography Photos Posters Abstract Black D cor Designs Framed Hazelwood Hokku Home Landscape Oil Accent 075 12 15 18 26 40 60 8 D H W x 1 1017 1824 2532 holidays, christmas gift gifts for girls boys

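I need to find the words that follow a specific word.

In the example above, I want to extract the words after the word "Subject".

The output should look like this: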

Subject: -Figures/Nautical and beach.

I tried the following regex:

re.compile('(?<=subject)(.{30}(?:\s|.))',re.I)

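However, there is no fixed number of words after the subject keyword, so I cannot specify an exact count.

How can I stop at a period or a space? There is no specific stopping criterion.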

3 answers:

Answer 0: (score: 2)

The (?<=subject)(.{30}(?:\s|.)) regex asserts the position right after subject, then grabs any 30 characters other than a newline, and then matches either a whitespace or any character other than a newline. This does not do what you need, because the substring can be of any length.
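To make the fixed-width problem concrete, here is a minimal sketch that runs the pattern from the question on two shortened inputs. Both strings are made up for illustration and are not the full product description.

```python
import re

# The pattern from the question: a lookbehind, then 30 characters, then one more
fixed_rx = re.compile(r'(?<=subject)(.{30}(?:\s|.))', re.I)

# Hypothetical shortened inputs (assumptions for this sketch)
exact_fit = "**Subject: -Figures/Nautical and beach.**  Gender: -Unisex/Both."
too_short = "Subject: -Animals.  Gender: -Unisex/Both.  Style: -Fine art."

fit_match = fixed_rx.search(exact_fit).group(1)
short_match = fixed_rx.search(too_short).group(1)

# Both captures are always 31 characters long, whatever the field contains
print(repr(fit_match))
print(repr(short_match))
print(len(fit_match), len(short_match))
```

With the shorter "Animals" value, the capture runs on into the Gender field, because the pattern counts characters instead of looking for a delimiter.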

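You can use an alternation-based regex with a capturing group:

subject:\s*([^.]+|\S+)

See the regex demo.

Details:

  • subject: - the literal string subject:
  • \s* - 0+ whitespaces
  • ([^.]+|\S+) - Group 1, capturing 1 or more non-period symbols, or 1 or more non-whitespace symbols

Note: the order of the alternatives matters, since [^.]+ matches whitespace while \S+ does not. If the substring after subject: starts with a dot, \S+ will match that substring up to the next whitespace.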

Python demo
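A minimal sketch of how the alternation pattern might be used from Python. The helper name extract_subject and the shortened sample strings are assumptions made for illustration; only the pattern itself comes from this answer.

```python
import re

# Case-insensitive, so it matches both "subject:" and "Subject:"
SUBJECT_RX = re.compile(r"subject:\s*([^.]+|\S+)", re.I)

def extract_subject(text):
    """Return the text captured after 'subject:', or None if there is no match."""
    m = SUBJECT_RX.search(text)
    return m.group(1) if m else None

# Hypothetical shortened input, not the full product description
sample = "-Orientation: Horizontal.  **Subject: -Figures/Nautical and beach.**  Gender: -Unisex/Both."

print(extract_subject(sample))                   # value up to, but excluding, the period
print(extract_subject("Subject: .hidden rest"))  # leading dot: the \S+ branch is used
print(extract_subject("no keyword here"))        # no match
```

Group 1 holds only the value, without the "Subject:" prefix or the trailing period. If the prefix is wanted as well, m.group(0) returns the whole match.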


Answer 1: (score: 1)

Try:

re.compile('Subject: [^*]+')

Demo
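A short sketch of how this pattern behaves, using two made-up inputs (both are assumptions, shortened from the question's string). It relies on the ** markers being present: [^*]+ simply consumes everything up to the next asterisk.

```python
import re

star_rx = re.compile('Subject: [^*]+')

# Hypothetical shortened inputs
with_stars = "**Subject: -Figures/Nautical and beach.**  Gender: -Unisex/Both."
without_stars = "Subject: -Figures.  Gender: -Unisex."

star_hit = star_rx.search(with_stars).group(0)
no_star_hit = star_rx.search(without_stars).group(0)

print(star_hit)     # stops right before the closing **
print(no_star_hit)  # no asterisk in the input, so the match runs to the end
```

If the real data does not always wrap the field in **, the match will run on into the following fields.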

Answer 2: (score: 0)

Regex:

(Subject:.+)\*\*

Matches Subject and the content after it, up to '**'.

Code:

import re

# Double-quoted literal, so the '' inch marks in the text are kept; the name
# text avoids shadowing the built-in str
text = "Features:  -Includes hanging accessories.  -Artist: William-Adolphe Bouguereau.  -Made with 100pct cotton canvas.  -100pct Anti-shrink pine wood bars and Epson anti-fade ultra chrome inks.  -100pct Hand-made and inspected in the U.S.A.  -Orientation: Horizontal.  **Subject: -Figures/Nautical and beach.**  Gender: -Unisex/Both.  Size: -Mini 17'' and under/Small 18''-24''/Medium 25''-32''/Large 33''-40''/Oversized 41'' and above.  Style: -Fine art.  Color: -Blue.  Country of Manufacture: -United States.  Product Type: -Print of painting.  Region: -Europe.  Primary Art Material: -Canvas. Dimensions:  -8'' H x 12'' W x 0.75'' D: 0.72 lb.  -12'' H x 18'' W x 0.75'' D: 1.14 lbs.  -12'' H x 18'' W x 1.5'' D: 2.45 lbs.  -18'' H x 26'' W x 0.75'' D: 1.44 lbs.  Paintings Prints Tori White Wildon Photography Photos Posters Abstract Black D cor Designs Framed Hazelwood Hokku Home Landscape Oil Accent 075 12 15 18 26 40 60 8 D H W x 1 1017 1824 2532 holidays, christmas gift gifts for girls boys"

# Capture everything from "Subject:" up to the closing "**"
a = re.search(r'(Subject:.+)\*\*', text)
print(a.group(1))
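One caveat with this pattern: .+ is greedy, so if the input contains more than one ** pair after Subject, the capture extends to the last one. The sketch below uses a made-up two-field string (an assumption, not the question's data) to compare it with a lazy .+? variant, which is a swapped-in alternative rather than part of this answer.

```python
import re

# Hypothetical input with two starred fields
two_fields = "**Subject: -Figures.**  **Style: -Fine art.**"

# Greedy: runs to the last "**" in the string
greedy = re.search(r'(Subject:.+)\*\*', two_fields).group(1)

# Lazy: stops at the first "**" after Subject
lazy = re.search(r'(Subject:.+?)\*\*', two_fields).group(1)

print(greedy)
print(lazy)
```

On the string from the question both variants give the same result, since only one ** follows Subject there.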