Pandas takes much longer to run, or never finishes, when the input is large

Date: 2018-07-23 17:08:51

Tags: python-3.x pandas dataframe

device.csv contains the following values (head(5)):

    DEVICE_ADDRESS      START_TIME  UPDATE_TIME
0   00:0A:20:46:86:D2   1528711800  1528764903
1   00:0A:20:6A:17:38   1528659901  1528764905
2   00:0A:20:37:4D:C4   1528578901  1528764901
3   00:0A:20:42:96:E8   1528669200  1528764903
4   00:0A:20:3D:DF:5C   1528728729  1528764906

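Each DEVICE_MAC has multiple entries with different START_TIME and UPDATE_TIME values. The CSV file is read into a DataFrame and sorted in ascending order by device address (DEVICE_MAC), then by START_TIME and UPDATE_TIME. After sorting, we compute the LATENCY_MIS, LATENCY_RB, RCOUNT and PAD values: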
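A detail that affects the loop below: sort_values() reorders the rows but keeps their original index labels, and both df.loc and the index yielded by iterrows() use those labels. So after sorting, df.loc[0, ...], index == 0 and index + 1 still follow the order of the CSV file, not the sorted order. A small sketch with three rows from the head(5) sample above (variable names are illustrative):

```python
import pandas as pd

# Three rows taken from the head(5) sample in the question.
df = pd.DataFrame({
    "DEVICE_MAC": ["00:0A:20:46:86:D2", "00:0A:20:6A:17:38", "00:0A:20:37:4D:C4"],
    "START_TIME": [1528711800, 1528659901, 1528578901],
    "UPDATE_TIME": [1528764903, 1528764905, 1528764901],
})

sorted_keep = df.sort_values(["DEVICE_MAC", "START_TIME", "UPDATE_TIME"])
# sort_values reorders the rows but keeps their original labels.
print(sorted_keep.index.tolist())          # [2, 0, 1]
# .loc is label-based: label 0 is the original first row, now second in sorted order.
print(sorted_keep.loc[0, "DEVICE_MAC"])    # 00:0A:20:46:86:D2

# reset_index(drop=True) relabels the rows 0..n-1 in sorted order.
sorted_reset = sorted_keep.reset_index(drop=True)
print(sorted_reset.loc[0, "DEVICE_MAC"])   # 00:0A:20:37:4D:C4
```

Calling reset_index(drop=True) right after sorting makes the labels positional again, which is what the index == 0 and index + 1 checks assume.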

import pandas as pd

# Column names are supplied explicitly (this assumes the file has no header row).
df = pd.read_csv(r"C:\Tool\Device.csv", names=["DEVICE_MAC", "START_TIME", "UPDATE_TIME"])
# Sort by device, then by session start and update time. reset_index(drop=True)
# relabels the rows 0..n-1 in sorted order; without it, index == 0 and index + 1
# below refer to the original CSV order, not to the first and next sorted rows.
df = df.sort_values(['DEVICE_MAC', 'START_TIME', 'UPDATE_TIME']).reset_index(drop=True)

df['LATENCY_MIS'], df['LATENCY_RB'], df['RCOUNT'], df['PAD'] = 0, 0, 0, 0

mac_ref = df.loc[0, 'DEVICE_MAC']
start_reference_time = df['UPDATE_TIME'].min()   # earliest update in the whole file
end_reference_time = df['UPDATE_TIME'].max()     # latest update in the whole file
for index, row in df.iterrows():
    if mac_ref == row['DEVICE_MAC']:
        if index == 0:                                        # First row of the first MAC
            start_time_ref = row['START_TIME']
            event_time_ref = row['UPDATE_TIME']
            df.loc[index, 'RCOUNT'] = 0
            df.loc[index, 'PAD'] = row['UPDATE_TIME'] - start_reference_time
        elif row['START_TIME'] == start_time_ref:             # The same session continues
            difference_event_ts = row['UPDATE_TIME'] - event_time_ref
            event_time_ref = row['UPDATE_TIME']
            df.loc[index, 'LATENCY_MIS'] = difference_event_ts - 300
            df.loc[index, 'RCOUNT'] = 0
            # Last row of this MAC: pad up to the latest update in the file
            if index == df.index[-1] or df.loc[index + 1, 'DEVICE_MAC'] != row['DEVICE_MAC']:
                df.loc[index, 'PAD'] = end_reference_time - row['UPDATE_TIME']
        else:                                                 # A new session starts
            df.loc[index, 'LATENCY_RB'] = row['START_TIME'] - event_time_ref
            df.loc[index, 'LATENCY_MIS'] = row['UPDATE_TIME'] - row['START_TIME']
            df.loc[index, 'RCOUNT'] = 1
            start_time_ref = row['START_TIME']
            event_time_ref = row['UPDATE_TIME']
    else:                                                     # First row of a new MAC
        mac_ref = row['DEVICE_MAC']
        start_time_ref = row['START_TIME']
        event_time_ref = row['UPDATE_TIME']
        df.loc[index, 'RCOUNT'] = 0
        df.loc[index, 'PAD'] = row['UPDATE_TIME'] - start_reference_time

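The LATENCY_MIS, LATENCY_RB and RCOUNT values of each row depend on the START_TIME and UPDATE_TIME of the previous row and of the next row (except for the first and last rows of each DEVICE_MAC group). The output looks like this: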

        DEVICE_MAC_ADDRESS  START_TIME  UPDATE_TIME LATENCY_MIS LATENCY_RB  RCOUNT  PAD
18228   00:A0:BC:33:04:F0   1527703135  1528787401  1199        0           0       7219
18995   00:A0:BC:33:04:F0   1527703135  1528788601  600         0           0       6019
21007   00:A0:BC:33:04:F0   1527703135  1528791001  1200        0           0       3619
17981   00:A0:BC:37:60:76   1527697084  1528787100  899         0           0       7520
1384    00:A0:BC:3A:91:5C   1528596621  1528766734  599         0           0       27886
2945    00:A0:BC:3A:91:5C   1528596621  1528768533  899         0           0       26087
5832    00:A0:BC:3A:91:5C   1528596621  1528772133  600         0           0       22487
9091    00:A0:BC:3A:91:5C   1528596621  1528776334  600         0           0       18286
11989   00:A0:BC:3A:91:5C   1528596621  1528779934  600         0           0       14686
12880   00:A0:BC:3A:91:5C   1528596621  1528780834  600         0           0       13786
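The "previous row / next row within the same DEVICE_MAC" lookups do not require a Python loop. In the sketch below, groupby(...).shift(1) returns each row's predecessor within its group (NaN on the group's first row), and cumcount() / cumcount(ascending=False) flag the first and last rows of each group. It uses the first four rows of the sample output above and assumes the frame is already sorted; the new column names are illustrative only:

```python
import pandas as pd

# First four rows of the sample output (index labels dropped, already sorted).
df = pd.DataFrame({
    "DEVICE_MAC": ["00:A0:BC:33:04:F0"] * 3 + ["00:A0:BC:37:60:76"],
    "START_TIME": [1527703135] * 3 + [1527697084],
    "UPDATE_TIME": [1528787401, 1528788601, 1528791001, 1528787100],
})

g = df.groupby("DEVICE_MAC", sort=False)
prev_start = g["START_TIME"].shift(1)          # previous row's START_TIME, NaN on a group's first row
prev_update = g["UPDATE_TIME"].shift(1)        # previous row's UPDATE_TIME, NaN on a group's first row
is_first = g.cumcount() == 0                   # True on the first row of each MAC
is_last = g.cumcount(ascending=False) == 0     # True on the last row of each MAC

df["PREV_START"] = prev_start
df["PREV_UPDATE"] = prev_update
df["IS_FIRST"] = is_first
df["IS_LAST"] = is_last
print(df)
```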

When the input CSV is large, the middle block of code (the loop that computes LATENCY_MIS, LATENCY_RB, RCOUNT and PAD) takes much longer to run, or never finishes.
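Much of the slowness likely comes from iterrows(), which builds a Series for every row, combined with a separate scalar df.loc write for each cell. A possible alternative is to compute all four columns column-wise. The sketch below is one reading of the loop's rules, not a verified drop-in replacement: add_latency_columns is a hypothetical name, the 300-second constant is taken from the question, and it assumes START_TIME and UPDATE_TIME are integer Unix timestamps. It follows the loop as written, including the case where the last row of a MAC starts a new session and therefore keeps PAD = 0.

```python
import numpy as np
import pandas as pd


def add_latency_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Column-wise version of the iterrows() loop (hypothetical helper name)."""
    df = df.sort_values(["DEVICE_MAC", "START_TIME", "UPDATE_TIME"]).reset_index(drop=True)
    g = df.groupby("DEVICE_MAC", sort=False)

    start = df["START_TIME"]
    update = df["UPDATE_TIME"]
    prev_start = g["START_TIME"].shift(1)     # NaN on the first row of each MAC
    prev_update = g["UPDATE_TIME"].shift(1)

    first = (g.cumcount() == 0).to_numpy()                 # first row of each MAC
    last = (g.cumcount(ascending=False) == 0).to_numpy()   # last row of each MAC
    same_session = ~first & (start == prev_start).to_numpy()
    new_session = ~first & ~same_session

    start_reference_time = update.min()
    end_reference_time = update.max()

    # Same session: gap since the previous update minus 300 s.
    # New session: time elapsed within the new session.
    df["LATENCY_MIS"] = np.select(
        [same_session, new_session],
        [update - prev_update - 300, update - start],
        default=0,
    ).astype("int64")
    # New session: gap between the previous update and this session's start.
    df["LATENCY_RB"] = np.where(new_session, start - prev_update, 0).astype("int64")
    df["RCOUNT"] = new_session.astype("int64")
    # First row of a MAC: time since the earliest update in the file.
    # Last row of a MAC (same-session rows only, as in the loop): time until the latest update.
    df["PAD"] = np.select(
        [first, same_session & last],
        [update - start_reference_time, end_reference_time - update],
        default=0,
    ).astype("int64")
    return df
```

For the question's file this would be called as add_latency_columns(pd.read_csv(r"C:\Tool\Device.csv", names=["DEVICE_MAC", "START_TIME", "UPDATE_TIME"])). Each step is a whole-column operation, so the cost should grow roughly linearly with the number of rows; it is worth comparing the result against the loop on a small sample before relying on it.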

0 answers