Question

I'm trying to extract a specific recurring item from emails using regular expressions and Python. The pattern is always:

OS - TYPE - VER - en - he_IL - 1.1.2 - U: username - hash

I tried to do it with the following condition:

if re.search('U: \s*( - )', message_body)

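hoping to get "username". Unfortunately, it doesn't return anything.

Also, trying if re.search('U: \w*())', message_body) gave me too broad a result, which included the literal "U:" along with the username.

I'd appreciate some pointers that aren't just links to the manual.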
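A quick check against the sample line shows why neither attempt works. This illustration is an addition, not part of the original question. The second pattern is run with the stray ")" removed, which is an assumption about what was intended: 'U: \w*())' as written has an unbalanced parenthesis, and re raises an error when compiling it.

```python
import re

message_body = "OS - TYPE - VER - en - he_IL - 1.1.2 - U: username - hash"

# Attempt 1: after "U: " and optional spaces, the pattern demands a literal " - ".
# The text continues with "username", so there is no match at all.
attempt1 = re.search(r'U: \s*( - )', message_body)
print(attempt1)                    # None

# Attempt 2 (assumed intent, stray ")" removed): the group "()" is empty,
# so the only useful text is group(0), the whole match, which includes "U: ".
attempt2 = re.search(r'U: \w*()', message_body)
print(repr(attempt2.group(0)))     # 'U: username'
print(repr(attempt2.group(1)))     # ''
```

In both cases the parentheses are not around the part you actually want, which is what the answers below fix.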

Answer 1

Use capturing groups that wrap an actual expression:

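import re

# Capture just the username (the first run of non-space characters after "U:")
match = re.search(r'U:\s*(\S+)', message_body)
if match:
    username = match.group(1)

# Capture "username - hash" as a single group
match = re.search(r'U:\s*(\S+ - \S+)', message_body)
if match:
    username_and_hash = match.group(1)

# Capture username and hash as two separate groups
match = re.search(r'U:\s*(\S+) - (\S+)', message_body)
if match:
    username = match.group(1)
    userhash = match.group(2)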
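Running the three patterns above on the sample line from the question shows what each group returns. The last variant, using named groups (?P<user>...) and groupdict(), is an extra not in the original answer; the group names user and hash are arbitrary choices.

```python
import re

message_body = "OS - TYPE - VER - en - he_IL - 1.1.2 - U: username - hash"

m1 = re.search(r'U:\s*(\S+)', message_body)
print(m1.group(1))       # username

m2 = re.search(r'U:\s*(\S+ - \S+)', message_body)
print(m2.group(1))       # username - hash

m3 = re.search(r'U:\s*(\S+) - (\S+)', message_body)
print(m3.groups())       # ('username', 'hash')

# Extra variant: named groups let you read fields by name instead of by index
m4 = re.search(r'U:\s*(?P<user>\S+) - (?P<hash>\S+)', message_body)
print(m4.groupdict())    # {'user': 'username', 'hash': 'hash'}
```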

Answer 2

U:\s*(\S+)

Try this. Use print(re.search(r"U:\s*(\S+)", x).group(1)) to get the username.

Here x is your string.

See the demo:

http://regex101.com/r/lS5tT3/73
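The .group(1) part is what strips the "U:" prefix: group(0) is the whole match, group(1) is only the parenthesized part. The sketch below shows both and also guards against lines without a U: field, where calling .group(1) on None would raise AttributeError. USER_RE and extract_username are hypothetical names chosen for illustration.

```python
import re

# Precompiled pattern (hypothetical name); same regex as in the answer
USER_RE = re.compile(r"U:\s*(\S+)")

def extract_username(text):
    """Return the text captured after 'U:', or None if the line has no 'U:' field."""
    match = USER_RE.search(text)
    return match.group(1) if match else None

x = "OS - TYPE - VER - en - he_IL - 1.1.2 - U: username - hash"
m = USER_RE.search(x)
print(m.group(0))                              # U: username  (whole match)
print(m.group(1))                              # username     (capture group only)
print(extract_username("no user field here"))  # None
```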

Answer 3

You can use split:

s = "OS - TYPE - VER - en - he_IL - 1.1.2 - U: username - hash"
print (s.split("U: ")[1].split()[0])
username
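One caveat with split: if a line has no "U: ", s.split("U: ")[1] raises IndexError. A sketch of a non-raising version uses str.partition instead (a swapped-in technique, not from the answer); username_via_partition is a hypothetical name.

```python
def username_via_partition(s):
    # partition never raises: if "U: " is missing, sep comes back as ""
    _, sep, rest = s.partition("U: ")
    if not sep:
        return None
    # The username is everything up to the next whitespace
    fields = rest.split()
    return fields[0] if fields else None

s = "OS - TYPE - VER - en - he_IL - 1.1.2 - U: username - hash"
print(username_via_partition(s))                # username
print(username_via_partition("no user field"))  # None
```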

Or use re:

import re
re.findall(r" U:\s+(\w+)", s)[0]
username
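Note that \w+ only matches letters, digits and underscores, so it stops early on usernames containing characters such as "." or "-", whereas \S+ (used in the other answers) takes everything up to the next space. The line with "john.doe" below is a hypothetical example, not from the question.

```python
import re

s = "OS - TYPE - VER - en - he_IL - 1.1.2 - U: username - hash"
found_s = re.findall(r" U:\s+(\w+)", s)
print(found_s)   # ['username']

# Hypothetical line whose username contains a dot
t = "OS - TYPE - VER - en - he_IL - 1.1.2 - U: john.doe - hash"
found_w = re.findall(r" U:\s+(\w+)", t)
found_ns = re.findall(r" U:\s+(\S+)", t)
print(found_w)   # ['john']      (\w+ stops at the ".")
print(found_ns)  # ['john.doe']  (\S+ runs to the next space)
```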

The re version is much slower, about three times slower in this test:

In [20]: timeit (re.findall(" U:\s+(\w+)",s)[0])
100000 loops, best of 3: 2.5 µs per loop

In [21]: timeit (s.split("U: ")[1].split()[0])
1000000 loops, best of 3: 764 ns per loop
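To repeat the comparison outside IPython, a plain timeit script works. The precompiled-pattern variant is an extra not measured in the answer, and no timings are claimed here since they depend on the machine and Python version; the function names are hypothetical.

```python
import re
import timeit

s = "OS - TYPE - VER - en - he_IL - 1.1.2 - U: username - hash"
USER_RE = re.compile(r" U:\s+(\w+)")

def with_findall():
    return re.findall(r" U:\s+(\w+)", s)[0]

def with_compiled():
    return USER_RE.search(s).group(1)

def with_split():
    return s.split("U: ")[1].split()[0]

# All three must agree before their timings are worth comparing
assert with_findall() == with_compiled() == with_split() == "username"

N = 100_000
for fn in (with_findall, with_compiled, with_split):
    # timeit.timeit returns the total seconds for N calls
    total = timeit.timeit(fn, number=N)
    print(f"{fn.__name__}: {total / N * 1e6:.3f} µs per call")
```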

Regex / Python noob needs help
