Extracting values from a string in Python

Time: 2019-05-09 17:53:25

Tags: python

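I am working on a bot application, so I need to extract values from a message string and assign them to variables. The message string can take different forms, for example: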

message = 'name="Raj",lastname="Paul",gender="male", age=23'
message = 'name="Raj",lastname="Paul",age=23'
message = 'name="Raj",lastname="Paul",gender="male"'

The data supplied by the user may contain all of the values, or the age or gender field may sometimes be missing.

Where I am stuck: I am not sure how to check whether age is present in the message text. If it is, I want to extract the value corresponding to age. If age is not in the message, it should be ignored.

I could check each word in a loop and extract the strings, but that gets lengthy. Please let me know if there is a simpler way.

if age is present in message then get the value of age,
if lastname is present in message then get the value of lastname
if gender is present in message then get the value of gender
if name is present in message then get the value of name

5 answers:

Answer 0 (score: 1)

Use a regular expression:

(?:[, ])age=(\d+)

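This extracts the digits that follow 'age=' in the string. Because the pattern requires a comma or a space before age, it does not match when age is the first field in the message.

Code: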

import re

message = 'name="Raj",lastname="Paul",gender="male", age=23'
m = re.search(r'(?:[, ])age=(\d+)', message)
if m:
    print(m.group(1))

# 23
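The same idea can be extended to any field, not just age. The following is a minimal sketch, not part of the original answer: extract_field is a hypothetical helper name, and it assumes values are either wrapped in double quotes or are bare tokens without commas or spaces. It returns None when the field is missing, so the caller can simply skip it.

```python
import re

def extract_field(message, field):
    """Return the value of `field` in `message`, or None if it is absent."""
    # The field name must start the string or follow a comma/whitespace,
    # so that 'name' does not match inside 'lastname'.
    pattern = r'(?:^|[,\s])' + re.escape(field) + r'=(?:"([^"]*)"|([^,\s]+))'
    m = re.search(pattern, message)
    if m is None:
        return None
    # Group 1 holds a quoted value, group 2 an unquoted one.
    return m.group(1) if m.group(1) is not None else m.group(2)

message = 'name="Raj",lastname="Paul",age=23'
print(extract_field(message, 'age'))     # 23
print(extract_field(message, 'gender'))  # None
print(extract_field(message, 'name'))    # Raj
```

Unlike the pattern above, this variant also matches a field that appears at the very start of the message.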

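Answer 1 (score: 1)

If you only want to test for age, you can simply search the string. If you also want to use the other values, you can split the message into a dictionary.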

message = 'name="Raj",lastname="Paul",gender="male", age=23'
pairs = [pair.replace('"', '').strip() for pair in message.split(',')]
d = dict([p.split('=') for p in pairs])

'age' in d # True
d['name'] # 'Raj'
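Once the message is a dictionary, optional fields can be read with dict.get, which returns None (or a default you choose) when the key is missing. This is a small sketch along those lines; parse_message is a hypothetical helper name, and it assumes that neither keys nor values contain a comma.

```python
def parse_message(message):
    """Split 'key="value",key=value' text into a dict of strings."""
    fields = {}
    for pair in message.split(','):
        # partition splits on the first '=' only, so '=' inside a value survives
        key, _, value = pair.partition('=')
        fields[key.strip()] = value.strip().strip('"')
    return fields

fields = parse_message('name="Raj",lastname="Paul",gender="male"')
age = fields.get('age')                   # missing field, so None
gender = fields.get('gender', 'unknown')  # present, so the default is not used
print(age)     # None
print(gender)  # male
```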

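Answer 2 (score: 1)

One thing you can do is use a regular expression and extract the individual parts.

For example, if your message is message = 'name="Raj",lastname="Paul",gender="male", age=23', you can use the regex (?P<var>.*?)=(?P<out>.*?),

This is what I would do: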

import re
message = 'name="Raj",lastname="Paul",gender="male", age=23'
message += ',' # Add a comma for the regex
findall = re.findall(r'(?P<var>.*?)=(?P<out>.*?),', message) # Note the additional comma
extracted = {k.strip(): v.strip() for k,v in findall}
if 'age' in extracted:
    print(extracted['age']) # prints 23

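extracted will then be a dict that looks like this: {'name': '"Raj"', 'lastname': '"Paul"', 'gender': '"male"', 'age': '23'}. If you need to, you can strip the double quotes and convert age to an int from here.

To display all the fields, you can do the following: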
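Stripping the quotes and converting age could look like the sketch below. It starts from the extracted dict shown above, written out literally so the snippet runs on its own; the name cleaned is only illustrative.

```python
# The dict produced by the regex step above, written out literally
extracted = {'name': '"Raj"', 'lastname': '"Paul"', 'gender': '"male"', 'age': '23'}

# Remove the surrounding double quotes from every value
cleaned = {key: value.strip('"') for key, value in extracted.items()}

# Convert age only if the user supplied it
if 'age' in cleaned:
    cleaned['age'] = int(cleaned['age'])

print(cleaned)
# {'name': 'Raj', 'lastname': 'Paul', 'gender': 'male', 'age': 23}
```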

for field in extracted:
    print(field, extracted[field])

# Prints:
# name "Raj"
# lastname "Paul"
# gender "male"
# age 23

Answer 3 (score: 1)

message = 'name="Raj",lastname="Paul",gender="male", age=23'

new_msg = message.replace('"', '').replace(' ', '').split(',')  # 2nd replace to delete the extra space before age

msg_dict = dict([x.split('=') for x in new_msg])

print(msg_dict)

This code produces the following dictionary. You can loop over each message, and it will put the right value under the right key.

{'name': 'Raj', 'lastname': 'Paul', 'gender': 'male', 'age': '23'}
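Looping over several messages with this approach might look like the sketch below. to_dict and EXPECTED_FIELDS are hypothetical names introduced for illustration. Note that replace(' ', '') removes every space, so this assumes the values themselves never contain spaces.

```python
# Hypothetical set of fields the bot knows about
EXPECTED_FIELDS = {'name', 'lastname', 'gender', 'age'}

messages = [
    'name="Raj",lastname="Paul",gender="male", age=23',
    'name="Raj",lastname="Paul",age=23',
    'name="Raj",lastname="Paul",gender="male"',
]

def to_dict(message):
    """Apply the replace/split steps from this answer to one message."""
    parts = message.replace('"', '').replace(' ', '').split(',')
    return dict(part.split('=') for part in parts)

for message in messages:
    fields = to_dict(message)
    # Report which of the expected fields the user left out
    missing = sorted(EXPECTED_FIELDS - set(fields))
    print(fields, 'missing:', missing)
```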

Answer 4 (score: 0)

Here is another possibility:

message1 = 'name="Raj",lastname="Paul",gender="male", age=23'

message2 = 'name="Raj",lastname="Paul",age=23'

message3 = 'name="Raj",lastname="Paul",gender="male"'

messages = [message1, message2, message3]

splits = [m.split(",") for m in messages]

def flatten(lst):
    temp = []
    for l in lst:
        val1, val2 = l.split("=")
        val1 = val1.strip()
        val2 = val2.strip()
        temp.append(val1)
        temp.append(val2)
    return temp

clean = list(map(lambda x: flatten(x), splits))

final = [x for x in clean if 'age' in x]

final

This keeps only the messages (as flattened key/value lists) that contain 'age'.
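To read the age value back out of those flattened lists, one option is to pair up the alternating keys and values. This is a sketch rather than part of the original answer; final is written out literally here with the two lists the code above would keep, and the names rows and ages are illustrative.

```python
# The two flattened lists that contain 'age', written out literally
final = [
    ['name', '"Raj"', 'lastname', '"Paul"', 'gender', '"male"', 'age', '23'],
    ['name', '"Raj"', 'lastname', '"Paul"', 'age', '23'],
]

# Even positions hold keys, odd positions hold the matching values
rows = [dict(zip(row[0::2], row[1::2])) for row in final]
ages = [int(row['age']) for row in rows]
print(ages)  # [23, 23]
```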
