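I have a Pandas DataFrame with two columns: ticket number and history.
History is a string with the following structure. I need to create a third column containing the name of the author who changed the status from "New" to "Open". Is that possible?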
[
  {
    "id": "1",
    "author": {
      "name": "user1",
      "emailAddress": "user1@test.com",
      "displayName": "user1"
    },
    "created": "2021-06-09T12:54:22.915+0000",
    "items": [
      {
        "field": "name",
        "from": "1",
        "fromString": null,
        "to": "2",
        "toString": "test"
      }
    ]
  },
  {
    "id": "2",
    "author": {
      "name": "user2",
      "emailAddress": "user2@test.com",
      "displayName": "user2"
    },
    "created": "2021-06-11T09:33:18.692+0000",
    "items": [
      {
        "field": "status",
        "from": "3",
        "fromString": "New",
        "to": "7",
        "toString": "Open"
      }
    ]
  }
]
Answer 0 (score: 1)
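If your DataFrame is named df, the history column (the second column) is named history, and the entries in that column really are JSON strings with the structure you provided, you can do the following: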
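import json

def extract_author(json_string):
    """Return the name of the author who changed the status from New to Open."""
    records = json.loads(json_string)
    for record in records:
        # A record may list several changed fields, so check every item
        # rather than only the first one.
        for item in record.get('items', []):
            if (item['field'] == 'status'
                    and item['fromString'] == 'New'
                    and item['toString'] == 'Open'):
                return record['author']['name']
    return None

df['author'] = df['history'].map(extract_author)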
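
To try this out end to end, here is a minimal, self-contained sketch. It assumes the history cells are either JSON strings holding a list of change records or missing values. The function name extract_status_author, its from_status/to_status parameters, the ticket labels, and the sample data are all hypothetical and only serve the illustration. Unlike the version above, it returns None for missing cells and malformed JSON instead of raising.

```python
import json

import pandas as pd


def extract_status_author(json_string, from_status="New", to_status="Open"):
    """Return the author name of the first matching status change, else None."""
    # Missing cells (None/NaN) are not strings, so skip them.
    if not isinstance(json_string, str):
        return None
    try:
        records = json.loads(json_string)
    except json.JSONDecodeError:
        return None
    # Assumption: a valid history is a JSON list of change records.
    if not isinstance(records, list):
        return None
    for record in records:
        for item in record.get("items", []):
            if (item.get("field") == "status"
                    and item.get("fromString") == from_status
                    and item.get("toString") == to_status):
                return record.get("author", {}).get("name")
    return None


# Hypothetical sample data in the shape shown in the question.
sample_history = json.dumps([
    {"id": "1", "author": {"name": "user1"},
     "items": [{"field": "name", "fromString": None, "toString": "test"}]},
    {"id": "2", "author": {"name": "user2"},
     "items": [{"field": "status", "fromString": "New", "toString": "Open"}]},
])

df = pd.DataFrame({
    "ticket": ["T-1", "T-2", "T-3"],
    "history": [sample_history, "[]", None],
})
df["author"] = df["history"].map(extract_status_author)
print(df[["ticket", "author"]])
```

Only the first ticket has a New to Open transition, so only that row gets an author; the other two rows are left empty. If a ticket can be reopened several times, this sketch reports the first matching change in list order, which may or may not be the one you want.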