Question

I have a file that runs the CMD trace route command whenever I get a ping timeout and prints the result to a file. The output file has the following format:

Sun 02/17/2019 13:20:44.27 PING ERROR 1

Tracing route to _____________ [IP_REDACTED]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.1.1
  2    <1 ms    <1 ms    <1 ms  [IP_REDACTED]
  3     1 ms    <1 ms     1 ms  [IP_REDACTED]
  4     *        *        *     Request timed out.
  5     7 ms    10 ms     6 ms  [IP_REDACTED]
  6     8 ms     4 ms     6 ms  [IP_REDACTED]
  7     5 ms     7 ms     6 ms  [IP_REDACTED]

Trace complete.

Sun 02/17/2019 13:45:59.27 PING ERROR 2

Tracing route to _____________ [IP_REDACTED]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.1.1
  2    <1 ms    <1 ms    <1 ms  [IP_REDACTED]
  3     1 ms    <1 ms     1 ms  [IP_REDACTED]
  4    23 ms     *        *     [IP_REDACTED]
  5     7 ms    10 ms     6 ms  [IP_REDACTED]
  6     8 ms     4 ms     6 ms  [IP_REDACTED]
  7     5 ms     7 ms     6 ms  [IP_REDACTED]

Trace complete.

Sun 02/17/2019 15:45:59.27 PING ERROR 3

Tracing route to _____________ [IP_REDACTED]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.1.1
  2    <1 ms    <1 ms    <1 ms  [IP_REDACTED]
  3     1 ms    <1 ms     1 ms  [IP_REDACTED]
  4    23 ms    12 ms    11 ms  [IP_REDACTED]
  5     7 ms    10 ms     6 ms  [IP_REDACTED]
  6     8 ms     *        6 ms  [IP_REDACTED]
  7     5 ms     7 ms     6 ms  [IP_REDACTED]

Trace complete.

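The first line of each block holds the timestamp of the trace route command. I want to use Python to plot how often hop 4 drops packets (marked by the "*" character) over time. Before I go down that road, I need to organize the data. I figured nested dictionaries are the way to do that in Python.

I'm new to Python, and the syntax is confusing.

The code below shows my attempt. This is the basic flow I want:

1. Look at a line in the file.
2. If the line contains the word "ERROR", save that line.
3. Look at the following lines. If a line starts with "4", parse the data saved in step 2.
4. Get the month, day, hour, and minute and put them into separate variables.
5. Create a nested dictionary from this data.
6. Repeat these steps for every error in the file, adding to the dictionary from step 5.
7. Finally, be able to print the data for any range (for example, the number of errors in a day, or in a particular hour of a day).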

For example, the dictionary might look like this:

day{ 6 : hour{ 2 : min{ 15 : 2, 30 : 1, 59 : 1 }, 9 : min{ 10: 1 }}}

Hour 2 of day 6 had 4 errors. These errors occurred at minutes 15, 30, and 59.

day_d = {}

with open("2019-02-17_11-54-AM.log", "r") as fo:

    for line in fo:
        list = line.strip() # Expected: each index in list is a word
        if list.count('ERROR'):
            # Save the line to parse if trace route reports
            # bad data on hop 4
            lineToParse = line

        if "4" in list[0]:
            # We found the line that starts with "4"
            if "*" in list[1] or "*" in list[2] or "*" in list[3]:
                # We should parse the data in lineToParse

                # Expected: lineToParse[1] = "02/17/2019"
                word  = lineToParse[1].split("/")
                month = word[0] # I don't care about month
                day   = word[1]
                year  = word[2] # I don't care about year

                # Expected: lineToParse[2] == "13:20:44.27"
                word = lineToParse[2].split(":")
                hour = word[0]
                min  = word[1]
                sec  = word[2] # I don't care about seconds


                # Keep track of number occurances in min
                if day in day_d:
                    if hour in day_d[day]:
                        if min in day_d[day[hour]]
                            day_d[day[hour[min]]] += 1
                        else:
                            day_d[day[hour[min]]] = 1
                    else:
                        min_d = { min : 1 }
                        day_d[day[hour]] = min_d
                else:
                    min_d = { min : 1 }
                    hour_d = { hour : min_d }
                    day_d[day] = hour_d


#Print number of occurances in hour "12" of day "01"
hourCounter = 0;
if "01" in day_d:
    if "12" in day:
        day["12"] = hour_d
        for min in hour_d:
            hourCounter += int(hour_d[min], 10) # Convert string to base 10 int
print(hourCounter)

Edit: After reading Gnudiff's answer, I was able to do what I wanted. My code is below:

from matplotlib import pyplot as plt
from matplotlib import style

style.use('ggplot')

from datetime import datetime as DT

ping_errors = dict()
data = dict()

with open("2019-02-17_02-41-PM.log", "r") as fo:
    for line in fo:
        if 'ERROR' in line: # A tracert printout will follow
            pingtime = DT.strptime(line[:23],'%a %m/%d/%Y %H:%M:%S') # fixed format datetime format allows us just to cut the string precisely
        words = line.strip().split()
        if len(words) > 0:
            if words[0] == '4':
                if '*' in line:
                    # Found packet timeout in hop # 4
                    ping_errors[pingtime] = 1


# Create key-value pairs. Keys are the hours of the day (0 to 23)
# and values are the number of drops in each hour.
for i in range(24):
    data[i] = 0
    for x in ping_errors.keys():
        if x.time().hour == i:
            data[i] += 1


# Prepare the chart         
x_axis = list(data.keys())
y_axis = list(data.values())

fig, ax = plt.subplots()

ax.bar(x_axis, y_axis, align='center')

ax.set_title('10-second drops from ___ to ____')
ax.set_ylabel('Number of drops')
ax.set_xlabel('Hour')

ax.set_xticks(x_axis)
ax.set_yticks(y_axis)

plt.show()

Answer 1

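Nested dictionaries do not look like the right tool for this job, because the syntax and the storage get quite complicated, as you found out yourself.

As you can see from the ping output, the data is already fairly tabular, and it will be much easier to process if you keep it that way.

So, you want to store the ping errors and be able to find out how many errors happened at what time.

If this were a big project, I would probably go for an external database to store and query the data. But let's see how this could be done in Python.

A few things here will not work and need to be changed:

1) As Laurent mentioned in the comments, you should not use built-in names as variable names. "list" is not strictly a reserved word, but assigning to it hides the built-in list type, so the variable should be renamed.

2) line is a string, and line.strip() is still a string, not a list. If you want to split the line on whitespace, use something like:

linewords=line.split() #and use this variable instead of your list variable

3) For date and time manipulation, it usually helps a lot to use the appropriate module. In this case, datetime.datetime.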
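To make points 2 and 3 concrete, here is a small sketch using two lines copied from the log in the question. The variable names (header, hop_line, stamp) are only illustrative, and parsing the weekday with %a assumes an English locale.

```python
from datetime import datetime

header = "Sun 02/17/2019 13:20:44.27 PING ERROR 1"
hop_line = "  4    23 ms     *        *     [IP_REDACTED]"

# strip() only trims surrounding whitespace; the result is still one string
stripped = hop_line.strip()

# split() with no arguments splits on runs of whitespace and returns a list
words = hop_line.split()

# The first 23 characters hold the weekday, date and time without the
# fractional seconds, so a fixed format string can parse them
# (assumption: English weekday abbreviations, as in the sample log)
stamp = datetime.strptime(header[:23], "%a %m/%d/%Y %H:%M:%S")

print(type(stripped).__name__)             # str, not list
print(words)                               # hop number first, then the probe results
print(stamp.day, stamp.hour, stamp.minute)
```

Once the timestamp is a datetime object, the day, hour and minute are available as attributes, so there is no need to split the date and time strings by hand.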

So the start of the loop could look like this:

from datetime import datetime as DT

ping_errors=dict()

with open("2019-02-17_11-54-AM.log", "r") as fo:
    firstline=fo.readline()
    if 'ERROR' in firstline: # this file has an error, so we will process it
       pingtime=DT.strptime(firstline[:23],'%a %m/%d/%Y %H:%M:%S') # fixed format datetime format allows us just to cut the string precisely
       ping_errors[pingtime]=list()
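       for line in fo:
           words=line.strip().split()
           # skip blank and non-hop lines; on hop lines words[0] is the hop number
           if words and words[0].isdigit() and '*' in words:
              # this is a hop with at least one timed-out probe, add its info
              ping_errors[pingtime].append(int(words[0])) # add the number of hop which had the error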
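The loop above only reads the header on the first line of the file. The log in the question holds several printouts in one file, so a variant that starts a new entry at every ERROR line might look like the sketch below. The function name parse_trace_log and the in-memory sample list are hypothetical, chosen so the example runs without a log file; it treats any "*" on a hop line as an error.

```python
from datetime import datetime


def parse_trace_log(lines):
    """Map each PING ERROR timestamp to the hops that had a timed-out probe."""
    ping_errors = {}
    pingtime = None
    for line in lines:
        if "ERROR" in line:
            # Header line: start a new entry for this trace route printout
            pingtime = datetime.strptime(line[:23], "%a %m/%d/%Y %H:%M:%S")
            ping_errors[pingtime] = []
            continue
        words = line.split()
        # Hop lines start with the hop number; "*" marks a lost probe
        if pingtime is not None and words and words[0].isdigit() and "*" in words:
            ping_errors[pingtime].append(int(words[0]))
    return ping_errors


# Shortened sample in the format shown in the question
sample = [
    "Sun 02/17/2019 13:20:44.27 PING ERROR 1",
    "",
    "Tracing route to _____________ [IP_REDACTED]",
    "over a maximum of 30 hops:",
    "",
    "  3     1 ms    <1 ms     1 ms  [IP_REDACTED]",
    "  4     *        *        *     Request timed out.",
    "",
    "Trace complete.",
    "",
    "Sun 02/17/2019 15:45:59.27 PING ERROR 3",
    "  4    23 ms    12 ms    11 ms  [IP_REDACTED]",
    "  6     8 ms     *        6 ms  [IP_REDACTED]",
]

errors = parse_trace_log(sample)
for stamp, hops in errors.items():
    print(stamp, hops)
```

With a real log, the same function could be called as parse_trace_log(fo) inside the with block, since a file object yields its lines when iterated.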

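After this, you get a nice un-nested dictionary ping_errors, indexed by datetime with a precision of seconds (yes, you don't have to index dictionaries on strings, even though that is usually more useful). You can then filter the dictionary to query the times you are interested in.

The dictionary will look like this: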

{datetime.datetime(2019, 2, 17, 13, 20, 44): [4], datetime.datetime(2019, 2, 17, 13, 33, 11): [7, 8]}

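This means that on 17 Feb 2019 at 13:20:44 we had 1 error, in hop 4, and on 17 Feb 2019 at 13:33:11 we had 2 errors, in hops 7 and 8.

Then, for example, a query to count the ping errors that occurred during hour 13 (at any minute and second), for the fictional dictionary above: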

sum([len(ping_errors[x]) for x in ping_errors.keys() if x.time().hour==13])

Which hops were affected during this time?

[ping_errors[x] for x in ping_errors.keys() if x.time().hour==13]

What if we are only interested in the seconds between 30 and 45?

sum([len(ping_errors[x]) for x in ping_errors.keys() if x.time().second >= 30 and x.time().second <= 45 ])
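If the goal is a per-hour tally for plotting rather than a single query, the same filtering idea can be done in one pass. This is only a sketch, assuming the fictional dictionary above and using collections.Counter from the standard library; the names errors_per_hour and hop4_per_hour are illustrative.

```python
from collections import Counter
from datetime import datetime

# The fictional dictionary from the answer
ping_errors = {
    datetime(2019, 2, 17, 13, 20, 44): [4],
    datetime(2019, 2, 17, 13, 33, 11): [7, 8],
}

# Count failed hops per hour of day in a single pass
errors_per_hour = Counter()
for stamp, hops in ping_errors.items():
    errors_per_hour[stamp.hour] += len(hops)

# Count only the printouts where hop 4 failed, which is what the question plots
hop4_per_hour = Counter(
    stamp.hour for stamp, hops in ping_errors.items() if 4 in hops
)

print(errors_per_hour[13])
print(hop4_per_hour[13])
print(errors_per_hour[9])  # a Counter returns 0 for hours with no entries
```

Because a Counter returns 0 for missing keys, [hop4_per_hour[h] for h in range(24)] gives a full list of bar heights without pre-filling a dictionary.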

Parsing a text file line by line and organizing the data into nested dictionaries?
