Why am I getting an empty DataFrame?

Asked: 2019-05-10 12:03:55

Tags: python pandas file-io

I am trying to load an exported WhatsApp chat into a DataFrame so that I can analyze it, but I get an empty DataFrame.

Here is a small sample from the chat log file chat.txt:

07/02/19, 3:08 pm - Messages to this group are now secured with end-to-end encryption. Tap for more info.
22/01/19, 3:27 pm - kai Sir created group "Weekday batch 201901"
07/02/19, 3:08 pm - kai Sir added you
07/02/19, 3:08 pm - kai Sir removed +91 85949 03087
07/02/19, 3:08 pm - kai Sir changed the subject from "Weekday batch 201901" to "Weekday batch 201902"
07/02/19, 3:09 pm - kai Sir: Hi All this is weekday batch staring from 11th Feb from morning 7.30 am to 10 am

My code:

import pandas as pd
import re
import itertools

def parse_file(text_file):
    # Convert WhatsApp chat log text file to a Pandas dataframe.

    # some regex to account for messages taking up multiple lines
    pat = re.compile(r'^(\d\d\/\d\d\/\d\d\d\d.*?)(?=^^\d\d\/\d\d\/\d\d\d\d|\Z)', re.S | re.M)
    with open(text_file, encoding='latin1') as f:
        data = [m.group(1).strip().replace('\n', ' ') for m in pat.finditer(f.read())]

    sender = []; message = []; datetime = []
    for row in data:

        # timestamp is before the first dash
        datetime.append(row.split(' - ')[0])

        # sender is between am/pm, dash and colon
        try:
            s = re.search('m - (.*?):', row).group(1)
            sender.append(s)
        except:
            sender.append('')

        # message content is after the first colon
        try:
            message.append(row.split(': ', 1)[1])
        except:
            message.append('')

    df = pd.DataFrame(zip(datetime, sender, message), columns=['timestamp', 'sender', 'message'])
    df['timestamp'] = pd.to_datetime(df.timestamp, format='%d/%m/%Y, %I:%M %p')

    # remove events not associated with a sender
    df = df[df.sender != ''].reset_index(drop=True)

    return df


df = parse_file(r"C:\Users\RASHMI\Desktop\python_full\chat.txt")

My output is shown below.

In [17]: df
Out[17]: Empty DataFrame Columns: [timestamp, sender, message] Index: []
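One likely cause, assuming the sample above is representative of the whole file: the pattern in parse_file requires a four-digit year (\d\d\d\d), while the export writes two-digit years such as 07/02/19. In that case finditer matches nothing, data stays empty, and so does the DataFrame. The '%d/%m/%Y' in pd.to_datetime would need to become '%d/%m/%y' for the same reason. Below is a minimal sketch of that adjustment using only the standard library. SAMPLE, two_digit and parse_rows are names made up for illustration, and the "no ': ' after the dash means a system event" rule is an assumption that will misfire on system lines that happen to contain a colon.

```python
import re
from datetime import datetime

# A few lines in the same shape as the chat.txt sample (hypothetical excerpt),
# with the last message wrapped onto a second line.
SAMPLE = (
    '22/01/19, 3:27 pm - kai Sir created group "Weekday batch 201901"\n'
    "07/02/19, 3:08 pm - kai Sir added you\n"
    "07/02/19, 3:09 pm - kai Sir: Hi All this is weekday batch staring from 11th Feb\n"
    "from morning 7.30 am to 10 am\n"
)

# Pattern from the question: it needs a four-digit year, so it never matches here.
four_digit = re.compile(
    r'^(\d\d\/\d\d\/\d\d\d\d.*?)(?=^^\d\d\/\d\d\/\d\d\d\d|\Z)', re.S | re.M)

# Adjusted pattern: two-digit year followed by the comma that precedes the time.
two_digit = re.compile(
    r'^(\d\d/\d\d/\d\d,.*?)(?=^\d\d/\d\d/\d\d,|\Z)', re.S | re.M)


def parse_rows(text, pat=two_digit):
    """Return (timestamp, sender, message) tuples for lines that have a sender."""
    rows = []
    for m in pat.finditer(text):
        # Join continuation lines of a multi-line message into one row.
        row = m.group(1).strip().replace('\n', ' ')
        stamp, _, rest = row.partition(' - ')
        sender, sep, message = rest.partition(': ')
        if not sep:
            continue  # assumed to be a system event (no "sender: " prefix)
        # %y (lowercase) parses the two-digit year used in the export.
        when = datetime.strptime(stamp, '%d/%m/%y, %I:%M %p')
        rows.append((when, sender, message))
    return rows


print(len(list(four_digit.finditer(SAMPLE))))  # matches found by the original pattern
for when, sender, message in parse_rows(SAMPLE):
    print(when, sender, message, sep=' | ')
```

The resulting list of tuples can be handed to pd.DataFrame(rows, columns=['timestamp', 'sender', 'message']) in place of the zip in the original function.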

0 answers:

No answers