I have this list representing FedEx tracking:
history = ['Tuesday, March 16, 2021', '3:03 PM Hollywood, FL\nDelivered\nLeft at front door. Signature Service not requested.', '5:52 AM MIAMI, FL\nOn FedEx vehicle for delivery', '5:40 AM MIAMI, FL\nAt local FedEx facility', 'Monday, March 15, 2021', '11:42 PM OCALA, FL\nDeparted FedEx location', '10:01 PM OCALA, FL\nArrived at FedEx location', '8:28 PM OCALA, FL\nIn transit', '12:42 AM OCALA, FL\nIn transit']
Answer 0 (score: 2)
history = [
"Tuesday, March 16, 2021",
"3:03 PM Hollywood, FL\nDelivered\nLeft at front door. Signature Service not requested.",
"5:52 AM MIAMI, FL\nOn FedEx vehicle for delivery",
"5:40 AM MIAMI, FL\nAt local FedEx facility",
"Monday, March 15, 2021",
"11:42 PM OCALA, FL\nDeparted FedEx location",
"10:01 PM OCALA, FL\nArrived at FedEx location",
"8:28 PM OCALA, FL\nIn transit",
"12:42 AM OCALA, FL\nIn transit",
]
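import re

import pandas as pd

# A day-header line starts with a weekday name.
r = re.compile(r"^(?:Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)")

data, cur_group = [], ""
for line in history:
    if r.match(line):
        # Remember the current day; it applies to the event lines that follow.
        cur_group = line
    else:
        # Split "time location" from the event text at the first newline.
        data.append([cur_group, *line.split("\n", maxsplit=1)])

df = pd.DataFrame(data)
print(df)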
Prints:
   0                        1                      2
0  Tuesday, March 16, 2021  3:03 PM Hollywood, FL  Delivered\nLeft at front door. Signature Servi...
1  Tuesday, March 16, 2021  5:52 AM MIAMI, FL      On FedEx vehicle for delivery
2  Tuesday, March 16, 2021  5:40 AM MIAMI, FL      At local FedEx facility
3  Monday, March 15, 2021   11:42 PM OCALA, FL     Departed FedEx location
4  Monday, March 15, 2021   10:01 PM OCALA, FL     Arrived at FedEx location
5  Monday, March 15, 2021   8:28 PM OCALA, FL      In transit
6  Monday, March 15, 2021   12:42 AM OCALA, FL     In transit
Answer 1 (score: 2)
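You can use dateutil.parser.parse to check whether an element is a valid datetime.
This should be safer than just checking whether the element contains a day name (Monday, Tuesday, etc.), in case an event also contains a day name somewhere (e.g. Delivery failed\nWill reattempt on Monday).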
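dateutil is a third-party package (python-dateutil). As a stdlib-only illustration of the same idea, the sketch below swaps in datetime.strptime with the exact header format "%A, %B %d, %Y", which is stricter than dateutil's general-purpose parser. The helper names is_date_header and group_history are hypothetical, not from the answer.

```python
from datetime import datetime


def is_date_header(text):
    """Return True if text is a day header such as 'Tuesday, March 16, 2021'."""
    try:
        # weekday name, month name, day, year (names follow the current locale,
        # which is English by default)
        datetime.strptime(text, "%A, %B %d, %Y")
    except ValueError:
        return False
    return True


def group_history(history):
    """Attach the most recent day header to each event line (hypothetical helper)."""
    rows, current = [], None
    for item in history:
        if is_date_header(item):
            current = item
        else:
            # Split 'time location' from the event text at the first newline.
            rows.append([current, *item.split("\n", maxsplit=1)])
    return rows


print(is_date_header("Tuesday, March 16, 2021"))                    # True
print(is_date_header("Delivery failed\nWill reattempt on Monday"))  # False
```

Because the whole string must match the header format, an event that merely mentions a weekday is not mistaken for a new day.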
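Answer 2 (score: 0)
OK, this is a bit hacky, but if the format is consistent it may get the job done. In the long run, a regular expression is probably the better approach.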
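import pandas as pd

col1 = []  # date, taken from the most recent day-header line
col2 = []  # time and city
col3 = []  # event description
date = None
for h in history:
    # Event lines contain a newline; day headers such as
    # 'Tuesday, March 16, 2021' do not (checking for 'FL' only works in Florida).
    if '\n' in h:
        col1.append(date)
        # '3:03 PM Hollywood, FL\nDelivered...' -> ['3:03 PM Hollywood', ' FL\nDelivered...']
        new_list = h.split(',', maxsplit=1)
        item2 = new_list[0]
        # [4:] drops the ' FL\n' prefix (two-letter state code plus newline)
        item3 = new_list[1][4:]
        col2.append(item2)
        col3.append(item3.replace('\n', '. '))
    else:
        date = h

df = pd.DataFrame({'col1': col1,
                   'col2': col2,
                   'col3': col3})
print(df)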
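The answer above suggests a regular expression as the longer-term approach. A minimal sketch of that idea, assuming every event line has the shape "H:MM AM|PM City, ST\nevent text" as in the sample data; the pattern EVENT_RE and the function parse_history are illustrative names, not taken from the answer.

```python
import re

# Event lines look like '3:03 PM Hollywood, FL\nDelivered\nLeft at front door...'
EVENT_RE = re.compile(
    r"(?P<time>\d{1,2}:\d{2} [AP]M) "  # e.g. '3:03 PM'
    r"(?P<city>[^,\n]+), "             # e.g. 'Hollywood'
    r"(?P<state>[A-Z]{2})\n"           # two-letter state code, then newline
    r"(?P<event>.*)",                  # rest of the text, may span lines
    re.DOTALL,
)


def parse_history(history):
    """Turn the flat tracking list into one dict per event (hypothetical helper)."""
    rows, date = [], None
    for item in history:
        m = EVENT_RE.fullmatch(item)
        if m is None:
            # Anything that is not an event line is treated as a day header.
            date = item
        else:
            rows.append({
                "date": date,
                "time": m["time"],
                "city": m["city"],
                "state": m["state"],
                "event": m["event"].replace("\n", ". "),
            })
    return rows
```

Passing the result to pd.DataFrame(parse_history(history)) yields one column per dict key. Since any non-matching line is assumed to be a day header, a stricter version could also validate headers with a date check.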