How to convert this list into a DataFrame

Time: 2021-03-24 19:38:56

Tags: python pandas dataframe

I have this list representing a FedEx tracking history:

history = ['Tuesday, March 16, 2021', '3:03 PM Hollywood, FL\nDelivered\nLeft at front door. Signature Service not requested.', '5:52 AM MIAMI, FL\nOn FedEx vehicle for delivery', '5:40 AM MIAMI, FL\nAt local FedEx facility', 'Monday, March 15, 2021', '11:42 PM OCALA, FL\nDeparted FedEx location', '10:01 PM OCALA, FL\nArrived at FedEx location', '8:28 PM OCALA, FL\nIn transit', '12:42 AM OCALA, FL\nIn transit']

How can I convert this list into a 3-column DataFrame?

3 answers:

Answer 0 (score: 2)

history = [
    "Tuesday, March 16, 2021",
    "3:03 PM Hollywood, FL\nDelivered\nLeft at front door. Signature Service not requested.",
    "5:52 AM MIAMI, FL\nOn FedEx vehicle for delivery",
    "5:40 AM MIAMI, FL\nAt local FedEx facility",
    "Monday, March 15, 2021",
    "11:42 PM OCALA, FL\nDeparted FedEx location",
    "10:01 PM OCALA, FL\nArrived at FedEx location",
    "8:28 PM OCALA, FL\nIn transit",
    "12:42 AM OCALA, FL\nIn transit",
]


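import re

import pandas as pd

# A line that starts with a weekday name marks the start of a new date group.
r = re.compile("^(?:Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)")

data, cur_group = [], ""
for line in history:
    if r.match(line):
        # Date header: remember it for the event lines that follow.
        cur_group = line
    else:
        # Event line: split time and place from the status at the first newline.
        data.append([cur_group, *line.split("\n", maxsplit=1)])

df = pd.DataFrame(data)
print(df)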

Prints:

                         0                      1                                                  2
0  Tuesday, March 16, 2021  3:03 PM Hollywood, FL  Delivered\nLeft at front door. Signature Servi...
1  Tuesday, March 16, 2021      5:52 AM MIAMI, FL                      On FedEx vehicle for delivery
2  Tuesday, March 16, 2021      5:40 AM MIAMI, FL                            At local FedEx facility
3   Monday, March 15, 2021     11:42 PM OCALA, FL                            Departed FedEx location
4   Monday, March 15, 2021     10:01 PM OCALA, FL                          Arrived at FedEx location
5   Monday, March 15, 2021      8:28 PM OCALA, FL                                         In transit
6   Monday, March 15, 2021     12:42 AM OCALA, FL                                         In transit
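
The output above has the default column labels 0, 1 and 2. As a minimal sketch, the same rows can be given readable labels by passing `columns=` to `pd.DataFrame`. The names used here (`date`, `time_location`, `status`) are illustrative assumptions and do not come from the question.

```python
import pandas as pd

# Two sample rows in the same shape as `data` above:
# [date, time and place, status].
data = [
    ["Tuesday, March 16, 2021", "5:40 AM MIAMI, FL", "At local FedEx facility"],
    ["Monday, March 15, 2021", "8:28 PM OCALA, FL", "In transit"],
]

# The column names are assumptions chosen for illustration.
df = pd.DataFrame(data, columns=["date", "time_location", "status"])
print(df)
```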

Answer 1 (score: 2)

You can use dateutil.parser.parse to check whether an element is a valid datetime.

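This should be safer than just checking whether the element contains a day name (Monday, Tuesday, etc.), in case the text of an event also contains a day name somewhere, for example: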

Delivery failed\nWill reattempt on Monday
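
A minimal sketch of this idea follows. It is not the answerer's original code, and the helper `is_date` and the column names are hypothetical. It assumes that `dateutil.parser.parse` raises an error for every event line, which holds for the sample data but may not hold for arbitrary text, since the parser is fairly lenient.

```python
import pandas as pd
from dateutil.parser import parse

history = [
    "Tuesday, March 16, 2021",
    "3:03 PM Hollywood, FL\nDelivered\nLeft at front door. Signature Service not requested.",
    "5:52 AM MIAMI, FL\nOn FedEx vehicle for delivery",
    "5:40 AM MIAMI, FL\nAt local FedEx facility",
    "Monday, March 15, 2021",
    "11:42 PM OCALA, FL\nDeparted FedEx location",
    "10:01 PM OCALA, FL\nArrived at FedEx location",
    "8:28 PM OCALA, FL\nIn transit",
    "12:42 AM OCALA, FL\nIn transit",
]


def is_date(text):
    """Hypothetical helper: True if dateutil parses the whole string."""
    try:
        parse(text)
        return True
    except (ValueError, OverflowError):
        return False


rows, current_date = [], None
for item in history:
    if is_date(item):
        # Date header: applies to the event lines that follow it.
        current_date = item
    else:
        # Event line: the first newline separates time and place from status.
        time_location, _, status = item.partition("\n")
        rows.append([current_date, time_location, status])

# Column names are assumptions chosen for illustration.
df = pd.DataFrame(rows, columns=["date", "time_location", "status"])
print(df)
```

With this check, an event such as the one above is not mistaken for a date header, because the string as a whole does not parse as a date.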

Answer 2 (score: 0)

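OK, this is a bit hacky, but it may get the job done if the format is consistent. In the long run, a regular expression is probably a better approach.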

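import pandas as pd

col1 = []
col2 = []
col3 = []
date = None  # most recent date header seen so far
for h in history:
    if 'FL' in h:
        # Event line: relies on every event containing "FL" and a comma.
        col1.append(date)
        new_list = h.split(',')
        # [4:] drops a fixed four characters, so the cut point shifts with
        # the width of the time ("3:03" vs "11:42").
        item2 = new_list[0][4:]
        # Drops the leading " FL\n", leaving the status text.
        item3 = new_list[1][4:]
        col2.append(item2.replace('\n', '. '))
        col3.append(item3.replace('\n', '. '))
    else:
        # Anything else is treated as a date header.
        date = h

pd.DataFrame({'col1': col1,
              'col2': col2,
              'col3': col3})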
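
The regular-expression approach mentioned in this answer could look like the sketch below. The pattern `EVENT` and the column names are assumptions: the pattern expects each event to start with a time such as "3:03 PM", followed by the location on the same line and the status after the first newline. Any item that does not match is treated as a date header. Note that this yields four columns, because the time and the location are captured separately.

```python
import re

import pandas as pd

history = [
    "Tuesday, March 16, 2021",
    "3:03 PM Hollywood, FL\nDelivered\nLeft at front door. Signature Service not requested.",
    "5:52 AM MIAMI, FL\nOn FedEx vehicle for delivery",
    "5:40 AM MIAMI, FL\nAt local FedEx facility",
    "Monday, March 15, 2021",
    "11:42 PM OCALA, FL\nDeparted FedEx location",
    "10:01 PM OCALA, FL\nArrived at FedEx location",
    "8:28 PM OCALA, FL\nIn transit",
    "12:42 AM OCALA, FL\nIn transit",
]

# Assumed event layout: "<h:mm AM/PM> <location>\n<status, possibly multi-line>"
EVENT = re.compile(
    r"^(?P<time>\d{1,2}:\d{2} [AP]M) (?P<location>[^\n]+)\n(?P<status>.*)$",
    re.DOTALL,  # let the status span several lines
)

rows, current_date = [], None
for item in history:
    m = EVENT.match(item)
    if m:
        # Event line: attach the most recent date header.
        rows.append({"date": current_date, **m.groupdict()})
    else:
        # No leading time, so treat the item as a date header.
        current_date = item

df = pd.DataFrame(rows)
print(df)
```

Unlike the fixed-width slicing, the pattern does not depend on the state being "FL" or on the width of the time.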