device.csv has the following values (head(5)):
DEVICE_ADDRESS START_TIME UPDATE_TIME
0 00:0A:20:46:86:D2 1528711800 1528764903
1 00:0A:20:6A:17:38 1528659901 1528764905
2 00:0A:20:37:4D:C4 1528578901 1528764901
3 00:0A:20:42:96:E8 1528669200 1528764903
4 00:0A:20:3D:DF:5C 1528728729 1528764906
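Each DEVICE_MAC has multiple entries with different START_TIME, UPDATE_TIME values. The CSV file is read into a DataFrame and then sorted in ascending order by Device_address. After sorting, we calculate the LATENCY_MIS, LATENCY_RB, RCOUNT values.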
import pandas as pd
from pandas import DataFrame

df = pd.read_csv(r"C:\Tool\Device.csv", names=["DEVICE_MAC", "START_TIME", "UPDATE_TIME"])
df = df.sort_values(['DEVICE_MAC', 'START_TIME', 'UPDATE_TIME'], ascending=[True, True, True])
df['LATENCY_MIS'], df['LATENCY_RB'], df['RCOUNT'], df['PAD'] = 0, 0, 0, 0
mac_ref = df.loc[0, 'DEVICE_MAC']
start_refernce_time = df['UPDATE_TIME'].min()
end_reference_time = df['UPDATE_TIME'].max()
for index, row in df.iterrows():
    if(mac_ref == row['DEVICE_MAC']):
        if(index == 0):  # Starting of MAC processing
            start_time_ref = row['START_TIME']
            event_time_ref = row['UPDATE_TIME']
            df.loc[index, 'RCOUNT'] = 0
            df.loc[index, 'PAD'] = row['UPDATE_TIME'] - start_refernce_time
        elif(row['START_TIME'] == start_time_ref):  # The same session prevails
            difference_event_ts = row['UPDATE_TIME'] - event_time_ref
            event_time_ref = row['UPDATE_TIME']
            df.loc[index, 'LATENCY_MIS'] = difference_event_ts - 300
            df.loc[index, 'RCOUNT'] = 0
            if(index + 1 in df.index):
                if(row['DEVICE_MAC'] != df.loc[index + 1, 'DEVICE_MAC']):
                    df.loc[index, 'PAD'] = end_reference_time - row['UPDATE_TIME']
            if(index == df.index[-1]):
                df.loc[index, 'PAD'] = end_reference_time - row['UPDATE_TIME']
        elif(row['START_TIME'] != start_time_ref):  # New Session Starts
            #difference_event_ts = row['START_TIME']-event_time_ref+(row['UPDATE_TIME']-row['START_TIME']-300)
            df.loc[index, 'LATENCY_RB'] = row['START_TIME'] - event_time_ref
            df.loc[index, 'LATENCY_MIS'] = row['UPDATE_TIME'] - row['START_TIME']  #-300*****
            event_time_ref = row['UPDATE_TIME']
            df.loc[index, 'RCOUNT'] = 1
            start_time_ref = row['START_TIME']
            event_time_ref = row['UPDATE_TIME']
    else:  # Starting of new MAC Processing
        mac_ref = row['DEVICE_MAC']
        start_time_ref = row['START_TIME']
        event_time_ref = row['UPDATE_TIME']
        df.loc[index, 'RCOUNT'] = 0
        df.loc[index, 'PAD'] = row['UPDATE_TIME'] - start_refernce_time
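The LATENCY_MIS, LATENCY_RB, RCOUNT values of each row depend on the START_TIME, UPDATE_TIME values of the previous row and of the next consecutive row (except for the first and last rows of each DEVICE_MAC group).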
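To illustrate that row-to-row dependency, here is a minimal sketch of how the neighbouring values and the first/last rows of each DEVICE_MAC group could be exposed as columns. The sample data and the names `neighbors`, `PREV_UPDATE_TIME`, `NEXT_UPDATE_TIME`, `IS_FIRST` and `IS_LAST` are made up for illustration; it also assumes the index is reset after sorting, which the code above does not do.

```python
import pandas as pd

# Made-up sample data, not taken from device.csv.
df = pd.DataFrame({
    "DEVICE_MAC": ["A", "A", "A", "B"],
    "START_TIME": [100, 100, 200, 50],
    "UPDATE_TIME": [1000, 1300, 1900, 1100],
})

# Sort as in the question, then reset the index so labels match positions.
df = df.sort_values(["DEVICE_MAC", "START_TIME", "UPDATE_TIME"]).reset_index(drop=True)
by_mac = df.groupby("DEVICE_MAC")

neighbors = df.assign(
    # Value from the previous / next row within the same DEVICE_MAC (NaN at group edges).
    PREV_UPDATE_TIME=by_mac["UPDATE_TIME"].shift(1),
    NEXT_UPDATE_TIME=by_mac["UPDATE_TIME"].shift(-1),
    # Flags for the first and last row of each DEVICE_MAC group.
    IS_FIRST=by_mac.cumcount() == 0,
    IS_LAST=by_mac.cumcount(ascending=False) == 0,
)
print(neighbors)
```

The rows flagged by `IS_FIRST` and `IS_LAST` are the ones that have no previous or next row inside their group.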
The output looks like this:
DEVICE_MAC_ADDRESS START_TIME UPDATE_TIME LATENCY_MIS LATENCY_RB RCOUNT PAD
18228 00:A0:BC:33:04:F0 1527703135 1528787401 1199 0 0 7219
18995 00:A0:BC:33:04:F0 1527703135 1528788601 600 0 0 6019
21007 00:A0:BC:33:04:F0 1527703135 1528791001 1200 0 0 3619
17981 00:A0:BC:37:60:76 1527697084 1528787100 899 0 0 7520
1384 00:A0:BC:3A:91:5C 1528596621 1528766734 599 0 0 27886
2945 00:A0:BC:3A:91:5C 1528596621 1528768533 899 0 0 26087
5832 00:A0:BC:3A:91:5C 1528596621 1528772133 600 0 0 22487
9091 00:A0:BC:3A:91:5C 1528596621 1528776334 600 0 0 18286
11989 00:A0:BC:3A:91:5C 1528596621 1528779934 600 0 0 14686
12880 00:A0:BC:3A:91:5C 1528596621 1528780834 600 0 0 13786
When the input CSV is large, the intermediate code block that calculates LATENCY_MIS, LATENCY_RB, RCOUNT, PAD takes much longer to execute, or does not finish at all.
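One possible direction for avoiding the row-by-row loop is to compute the columns with grouped shifts and `numpy.where`. The following is only a sketch under assumptions, not a verified drop-in replacement: the function name `add_latency_columns` and its `interval` parameter are hypothetical, the index is reset after sorting (the loop above keeps the original labels), and PAD for the last row of a group is filled only when that row continues the same session, which is one reading of the loop.

```python
import numpy as np
import pandas as pd

def add_latency_columns(df, interval=300):
    """Vectorised sketch of the per-row loop (assumptions listed in the lead-in)."""
    out = df.sort_values(["DEVICE_MAC", "START_TIME", "UPDATE_TIME"]).reset_index(drop=True)
    by_mac = out.groupby("DEVICE_MAC")

    # Previous row's values within each DEVICE_MAC group.
    prev_start = by_mac["START_TIME"].shift(1)
    prev_update = by_mac["UPDATE_TIME"].shift(1).fillna(0).astype("int64")

    is_first = by_mac.cumcount() == 0
    is_last = by_mac.cumcount(ascending=False) == 0
    same_session = ~is_first & (out["START_TIME"] == prev_start)
    new_session = ~is_first & ~same_session

    update, start = out["UPDATE_TIME"], out["START_TIME"]
    latency_mis = np.where(same_session, update - prev_update - interval,
                           np.where(new_session, update - start, 0))
    latency_rb = np.where(new_session, start - prev_update, 0)
    pad = np.where(is_first, update - update.min(),
                   np.where(is_last & same_session, update.max() - update, 0))

    return out.assign(LATENCY_MIS=latency_mis, LATENCY_RB=latency_rb,
                      RCOUNT=new_session.astype("int64"), PAD=pad)

# Made-up sample rows in shuffled order, not taken from device.csv.
sample = pd.DataFrame(
    [("B", 50, 1400), ("A", 100, 1300), ("A", 2000, 2300),
     ("A", 100, 1000), ("B", 50, 1100), ("A", 100, 1900)],
    columns=["DEVICE_MAC", "START_TIME", "UPDATE_TIME"],
)
result = add_latency_columns(sample)
print(result)
```

Because every column is computed for the whole frame at once, no `df.loc` assignment happens per row. The results should still be compared against the loop on a small slice of the real data before relying on them.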