Calculating the hourly request rate per IP in Python

Time: 2019-04-09 08:18:20

Tags: python algorithm

I have a dictionary of the form {str: [datetime_object]}.

Example:

import datetime

test_data={'127.0.0.1':[datetime.datetime(2016, 5, 31, 2, 3, 48), datetime.datetime(2016, 5, 31, 3, 0, 53)],
    '127.0.0.2':  [datetime.datetime(2016, 5, 30, 0, 15, 10), datetime.datetime(2016, 5, 31, 2, 18, 29), datetime.datetime(2016, 5, 31, 2, 18, 41), datetime.datetime(2016, 5, 31, 2, 18, 49), datetime.datetime(2016, 5, 31, 2, 21, 32), datetime.datetime(2016, 5, 31, 2, 21, 40), datetime.datetime(2016, 5, 31, 2, 21, 46), datetime.datetime(2016, 5, 31, 2, 22), datetime.datetime(2016, 5, 31, 23, 0, 0)],
    '127.0.0.3':  [datetime.datetime(2016, 5, 31, 2, 19, 34), datetime.datetime(2016, 5, 31, 2, 19, 39)],
    '127.0.0.4':  [datetime.datetime(2016, 5, 31, 2, 20, 36), datetime.datetime(2016, 5, 31, 2, 20, 41)],
    '127.0.0.5':  [datetime.datetime(2016, 5, 31, 2, 21, 5)],
    '127.0.0.6':  [datetime.datetime(2016, 5, 31, 2, 21, 6)],
    '127.0.0.7':  [datetime.datetime(2016, 5, 31, 2, 21, 5)],
    '127.0.0.8':  [datetime.datetime(2016, 5, 31, 2, 21, 34), datetime.datetime(2016, 5, 31, 2, 21, 38)],
    '127.0.0.9': [datetime.datetime(2016, 5, 31, 2, 22, 3), datetime.datetime(2016, 5, 31, 2, 23, 5)],
    '127.0.0.10':  [datetime.datetime(2016, 5, 31, 2, 10, 22), datetime.datetime(2016, 5, 31, 2, 12, 27)],
    '127.0.0.11':  [datetime.datetime(2016, 5, 31, 3, 11, 46), datetime.datetime(2016, 5, 31, 3, 13, 54)],
    '127.0.0.12':  [datetime.datetime(2016, 5, 31, 3, 13, 9), datetime.datetime(2016, 5, 31, 3, 13, 17)]}

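Each entry is the datetime of a request received from that IP.

I need to compute the average number of requests per hour for each IP.

My current attempt ends with this code: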

def count_accesses():
    # Yield the hour of every request, across all IPs
    for key, value in test_data.items():
        for received in value:
            yield received.hour

for x in count_accesses():
    print(x)

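The code above is based on this solution: How to count accesses per hour from log file entries?

The output of a correct solution might be a dictionary of rates. For example:

  • 127.0.0.1 has an average request rate of 2 requests per hour, because its two requests (02:03:48 -> 03:00:53) still fall within one hour.

  • 127.0.0.2 has an average request rate of 3 requests per hour.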

    ip_hit_rate = {'127.0.0.1':2, '127.0.0.2':3, '127.0.0.3':2, '127.0.0.4':2, '127.0.0.5':1, '127.0.0.6':1}

Thank you very much for your help.

4 answers:

Answer 0 (score: 1)

Use itertools.groupby:

import itertools

res = {}
for k,v in test_data.items():  
    counts = [len(list(g)) for _, g in itertools.groupby(sorted(v), lambda x:(x.year, x.month, x.day, x.hour))]
    res[k] = round(sum(counts)/len(counts))

Output:

{'127.0.0.1': 1,
 '127.0.0.10': 2,
 '127.0.0.11': 2,
 '127.0.0.12': 2,
 '127.0.0.2': 3,
 '127.0.0.3': 2,
 '127.0.0.4': 2,
 '127.0.0.5': 1,
 '127.0.0.6': 1,
 '127.0.0.7': 1,
 '127.0.0.8': 2,
 '127.0.0.9': 2}
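
Note that itertools.groupby only merges adjacent items, which is why the timestamps are sorted before grouping. The same idea can be sketched as a reusable function; the name `average_per_active_hour`, the empty-list fallback, and the sample data below are illustrative assumptions, not part of the original answer.

```python
import datetime
import itertools


def average_per_active_hour(timestamps):
    """Average requests per calendar hour that saw at least one request."""
    if not timestamps:
        return 0.0  # assumption: an IP with no requests gets a rate of 0

    def hour_key(ts):
        # Truncate to the start of the calendar hour
        return ts.replace(minute=0, second=0, microsecond=0)

    # groupby only groups consecutive equal keys, so sort first
    counts = [
        len(list(group))
        for _, group in itertools.groupby(sorted(timestamps), key=hour_key)
    ]
    return sum(counts) / len(counts)


# Deliberately unsorted sample: one request in one hour, two in another
sample = [
    datetime.datetime(2016, 5, 31, 2, 18, 29),
    datetime.datetime(2016, 5, 30, 0, 15, 10),
    datetime.datetime(2016, 5, 31, 2, 18, 41),
]
print(average_per_active_hour(sample))  # 1.5
```

Hours with no requests at all are not counted in the denominator, so this is an average over active hours only.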

Answer 1 (score: 0)

This is how I compute requests per hour:

import datetime

test_data={'127.0.0.1':[datetime.datetime(2016, 5, 28, 2, 3, 48), datetime.datetime(2016, 5, 31, 2, 3, 53)],
    '127.0.0.2':  [datetime.datetime(2016, 5, 30, 0, 15, 10), datetime.datetime(2016, 5, 31, 2, 18, 29), datetime.datetime(2016, 5, 31, 2, 18, 41), datetime.datetime(2016, 5, 31, 2, 18, 49), datetime.datetime(2016, 5, 31, 2, 21, 32), datetime.datetime(2016, 5, 31, 2, 21, 40), datetime.datetime(2016, 5, 31, 2, 21, 46), datetime.datetime(2016, 5, 31, 2, 22), datetime.datetime(2016, 5, 31, 23, 0, 0)],
    '127.0.0.3':  [datetime.datetime(2016, 5, 31, 2, 19, 34), datetime.datetime(2016, 5, 31, 2, 19, 39)],
    '127.0.0.4':  [datetime.datetime(2016, 5, 31, 2, 20, 36), datetime.datetime(2016, 5, 31, 2, 20, 41)],
    '127.0.0.5':  [datetime.datetime(2016, 5, 31, 2, 21, 5)],
    '127.0.0.6':  [datetime.datetime(2016, 5, 31, 2, 21, 6)],
    '127.0.0.7':  [datetime.datetime(2016, 5, 31, 2, 21, 5)],
    '127.0.0.8':  [datetime.datetime(2016, 5, 31, 2, 21, 34), datetime.datetime(2016, 5, 31, 2, 21, 38)],
    '127.0.0.9': [datetime.datetime(2016, 5, 31, 2, 22, 3), datetime.datetime(2016, 5, 31, 2, 23, 5)],
    '127.0.0.10':  [datetime.datetime(2016, 5, 31, 2, 10, 22), datetime.datetime(2016, 5, 31, 2, 12, 27)],
    '127.0.0.11':  [datetime.datetime(2016, 5, 31, 3, 11, 46), datetime.datetime(2016, 5, 31, 3, 13, 54)],
    '127.0.0.12':  [datetime.datetime(2016, 5, 31, 3, 13, 9), datetime.datetime(2016, 5, 31, 3, 13, 17)]}

def delta_to_hours(delta):
    return delta.days * 24 + delta.seconds / 3600

def calc_rate(values):
    num = len(values)
    if num <= 1:
        return 1
    diff = delta_to_hours(values[-1] - values[0])
    return num / diff

rates = {key:calc_rate(value) for key,value in test_data.items()}
print(rates)

The output is:

{
'127.0.0.1': 0.02777724195135125, 
'127.0.0.2': 0.1925248083665102, 
'127.0.0.3': 1440.0, 
'127.0.0.4': 1440.0, 
'127.0.0.5': 1, 
'127.0.0.6': 1, 
'127.0.0.7': 1, 
'127.0.0.8': 1800.0, 
'127.0.0.9': 116.12903225806451, 
'127.0.0.10': 57.599999999999994, 
'127.0.0.11': 56.25, 
'127.0.0.12': 900.0
}
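
These rates are the request count divided by the time elapsed between the first and last request, so two requests 5 seconds apart come out as 1440 per hour. A hedged variant of the same calculation is sketched here, using timedelta.total_seconds() and guarding against a zero-length span. The name `span_rate` and both fallback values are assumptions made for illustration.

```python
import datetime


def span_rate(timestamps):
    """Requests divided by the hours between the earliest and latest request."""
    if len(timestamps) <= 1:
        # assumption: with 0 or 1 requests, report the count itself
        return float(len(timestamps))
    elapsed = max(timestamps) - min(timestamps)
    hours = elapsed.total_seconds() / 3600
    if hours == 0:
        # assumption: all timestamps identical, avoid dividing by zero
        return float(len(timestamps))
    return len(timestamps) / hours


two_in_five_seconds = [
    datetime.datetime(2016, 5, 31, 2, 19, 34),
    datetime.datetime(2016, 5, 31, 2, 19, 39),
]
print(span_rate(two_in_five_seconds))
```

Using min and max instead of the first and last elements means the input list does not have to be sorted.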

Answer 2 (score: 0)

itertools.groupby is neat, but it can be hard to follow. collections.Counter may also help here, and it is easier to use:

import datetime
from collections import Counter

test_data={'127.0.0.1':[datetime.datetime(2016, 5, 28, 2, 3, 48), datetime.datetime(2016, 5, 31, 2, 3, 53)],
    '127.0.0.2':  [datetime.datetime(2016, 5, 30, 0, 15, 10), datetime.datetime(2016, 5, 31, 2, 18, 29), datetime.datetime(2016, 5, 31, 2, 18, 41), datetime.datetime(2016, 5, 31, 2, 18, 49), datetime.datetime(2016, 5, 31, 2, 21, 32), datetime.datetime(2016, 5, 31, 2, 21, 40), datetime.datetime(2016, 5, 31, 2, 21, 46), datetime.datetime(2016, 5, 31, 2, 22), datetime.datetime(2016, 5, 31, 23, 0, 0)],
    '127.0.0.3':  [datetime.datetime(2016, 5, 31, 2, 19, 34), datetime.datetime(2016, 5, 31, 2, 19, 39)],
    '127.0.0.4':  [datetime.datetime(2016, 5, 31, 2, 20, 36), datetime.datetime(2016, 5, 31, 2, 20, 41)],
    '127.0.0.5':  [datetime.datetime(2016, 5, 31, 2, 21, 5)],
    '127.0.0.6':  [datetime.datetime(2016, 5, 31, 2, 21, 6)],
    '127.0.0.7':  [datetime.datetime(2016, 5, 31, 2, 21, 5)],
    '127.0.0.8':  [datetime.datetime(2016, 5, 31, 2, 21, 34), datetime.datetime(2016, 5, 31, 2, 21, 38)],
    '127.0.0.9': [datetime.datetime(2016, 5, 31, 2, 22, 3), datetime.datetime(2016, 5, 31, 2, 23, 5)],
    '127.0.0.10':  [datetime.datetime(2016, 5, 31, 2, 10, 22), datetime.datetime(2016, 5, 31, 2, 12, 27)],
    '127.0.0.11':  [datetime.datetime(2016, 5, 31, 3, 11, 46), datetime.datetime(2016, 5, 31, 3, 13, 54)],
    '127.0.0.12':  [datetime.datetime(2016, 5, 31, 3, 13, 9), datetime.datetime(2016, 5, 31, 3, 13, 17)]}

def extract_hour(d: datetime.datetime):
    return d.date(), d.hour

result = {}
for k, v in test_data.items():
    cnt = Counter(map(extract_hour, v))
    result[k] = sum(cnt.values()) / len(cnt)

print(result)

This outputs:

{
    '127.0.0.1': 1.0, 
    '127.0.0.2': 3.0, 
    '127.0.0.3': 2.0, 
    '127.0.0.4': 2.0, 
    '127.0.0.5': 1.0, 
    '127.0.0.6': 1.0, 
    '127.0.0.7': 1.0, 
    '127.0.0.8': 2.0, 
    '127.0.0.9': 2.0, 
    '127.0.0.10': 2.0, 
    '127.0.0.11': 2.0, 
    '127.0.0.12': 2.0
}
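
To see where the 3.0 for 127.0.0.2 comes from, it can help to look at the intermediate Counter. The sketch below uses that IP's timestamps from the data above; the variable names `requests`, `per_hour`, and `average` are only illustrative.

```python
import datetime
from collections import Counter

# Timestamps of 127.0.0.2 from the test data
requests = [
    datetime.datetime(2016, 5, 30, 0, 15, 10),
    datetime.datetime(2016, 5, 31, 2, 18, 29),
    datetime.datetime(2016, 5, 31, 2, 18, 41),
    datetime.datetime(2016, 5, 31, 2, 18, 49),
    datetime.datetime(2016, 5, 31, 2, 21, 32),
    datetime.datetime(2016, 5, 31, 2, 21, 40),
    datetime.datetime(2016, 5, 31, 2, 21, 46),
    datetime.datetime(2016, 5, 31, 2, 22),
    datetime.datetime(2016, 5, 31, 23, 0, 0),
]

# Count requests per (date, hour) bucket
per_hour = Counter((ts.date(), ts.hour) for ts in requests)
for (day, hour), n in sorted(per_hour.items()):
    print(day, hour, n)

# Total requests divided by the number of hours that had any request
average = sum(per_hour.values()) / len(per_hour)
print(average)  # 3.0
```

The three buckets hold 1, 7, and 1 requests, so the average over active hours is 9 / 3.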

Answer 3 (score: 0)

Here is my solution, which counts requests in one-hour windows:

result = {}
for k in test_data.keys():
    # Assumes each list is non-empty and sorted by time
    ref = test_data[k][0]
    counter = []
    c = 1
    for h in range(1, len(test_data[k])):
        if (test_data[k][h] - ref).total_seconds() / 3600 < 1.0:
            c = c + 1
        else:
            # The current window is over: record its count and start a new one
            counter.append(c)
            c = 1
            ref = test_data[k][h]
    # Always record the last window, whichever branch the loop ended in
    counter.append(c)

    result[k] = float(sum(counter)) / len(counter)

print(result)

Output:

{'127.0.0.1': 2.0, 
'127.0.0.2': 3.0, 
'127.0.0.3': 2.0, 
'127.0.0.4': 2.0, 
'127.0.0.5': 1.0, 
'127.0.0.6': 1.0, 
'127.0.0.7': 1.0, 
'127.0.0.8': 2.0, 
'127.0.0.9': 2.0, 
'127.0.0.10': 2.0, 
'127.0.0.11': 2.0, 
'127.0.0.12': 2.0}
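
The 2.0 for 127.0.0.1 differs from the 1 that grouping by calendar hour gives, because here a one-hour window starts at the first request rather than at the top of the hour. A minimal sketch of that difference follows; `window_counts` is a hypothetical helper written for illustration, not code from this answer.

```python
import datetime


def window_counts(timestamps, window=datetime.timedelta(hours=1)):
    """Count requests in consecutive windows, each opened by the first uncovered request."""
    counts = []
    start = None
    for ts in sorted(timestamps):
        if start is not None and ts - start < window:
            counts[-1] += 1   # still inside the current window
        else:
            start = ts        # open a new window at this request
            counts.append(1)
    return counts


# Timestamps of 127.0.0.1 from the question: 57 minutes apart, across an hour boundary
requests = [
    datetime.datetime(2016, 5, 31, 2, 3, 48),
    datetime.datetime(2016, 5, 31, 3, 0, 53),
]
windows = window_counts(requests)
calendar_hours = {(ts.date(), ts.hour) for ts in requests}

print(windows)              # [2]
print(len(calendar_hours))  # 2
```

Both requests land in a single window, giving 2 per hour, while calendar-hour buckets split them into two hours with 1 request each.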