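I'm fairly new to data structures, and I'm having trouble extracting specific information from several .txt files. I want to group particular fields from messy input files.
The file format is as follows: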
---------------------------------------------------
Block 1
---------------------------------------------------
Block 2
---------------------------------------------------
Block 3
---------------------------------------------------
.
.
.
A sample input .txt file (parsed.txt) looks like this:
---------------------------------------------------
Timestamp: 1453939200(2016-01-28 01:00:00)
Peer AS Number: 37989
Local AS Number: 12654
Peer IP Address: 203.123.48.6
Local IP Address: 193.0.4.28
---------------------------------------------------
Timestamp: 1453939200(2016-01-28 01:00:00)
Peer AS Number: 1836
Local AS Number: 12654
Peer IP Address: 146.228.1.3
Local IP Address: 193.0.4.28
---------------------------------------------------
Timestamp: 1453939200(2016-01-28 01:00:00)
Peer AS Number: 1836
Local AS Number: 12654
Peer IP Address: 146.228.1.3
Local IP Address: 193.0.4.28
---------------------------------------------------
Timestamp: 1453939200(2016-01-28 01:00:00)
Peer AS Number: 1836
Local AS Number: 12654
Peer IP Address: 2a01:2a8::3
Local IP Address: 2001:67c:2e8:2:ffff:0:4:28
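Requirement:
The key field in each block is "Local AS Number". I want to read each block, check its "Local AS Number", and update some kind of data structure accordingly.
The result should look like this: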
AS 12654
Timestamp Peer AS Number Peer IP Address Local IP Address
1453939200 1836 146.228.1.3 193.0.4.28
1453939200 1836 146.228.1.3 193.0.4.28
1453939200 1836 2a01:2a8::3 2001:67c:2e8:2:ffff:0:4:28
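I tried some string manipulation, but it turned into a complete mess, so I think there must be a more suitable data structure. Note that the table has to stay live and keep being updated until the last .txt file has been parsed. This is the part I'm stuck on; I have no idea where to start.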
Answer (score: 3)
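As @AxxE suggested, a dictionary of lists of tuples does what you need. Each list holds every block for a given Local AS Number, with each block stored as a tuple.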
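A minimal sketch of that shape, using `collections.defaultdict(list)` so that the list for a new Local AS Number is created automatically on first access. The two tuples are copied from the sample blocks in the question:

```python
from collections import defaultdict

# Local AS Number -> list of (timestamp, peer AS, peer IP, local IP) tuples
records = defaultdict(list)

records['12654'].append(('1453939200', '37989', '203.123.48.6', '193.0.4.28'))
records['12654'].append(('1453939200', '1836', '146.228.1.3', '193.0.4.28'))

# Every block with the same Local AS Number lands in the same list
print(len(records['12654']))   # 2
print(records['12654'][1][2])  # 146.228.1.3
```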
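I use the re module to pull the value out of each line, collect each block's values into a tuple, and append that tuple to the list stored under its Local AS Number in the dictionary. You could of course add error checking.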
import re

records = {}  # Local AS Number -> list of (timestamp, peer AS, peer IP, local IP)

with open('parsed.txt', 'r') as f:
    in_line = f.readline()                  # dashed separator before a block
    while in_line:
        ts_line = f.readline()
        if not ts_line:                     # file ended with a separator line
            break
        time_stamp = re.search(r': (\d+)\(', ts_line).group(1)
        peer_AS = re.search(r': (\d+)', f.readline()).group(1)
        local_AS = re.search(r': (\d+)', f.readline()).group(1)
        peer_IP = re.search(r': (.+)$', f.readline()).group(1)
        local_IP = re.search(r': (.+)$', f.readline()).group(1)
        # Start a new list the first time this Local AS Number is seen
        records.setdefault(local_AS, []).append(
            (time_stamp, peer_AS, peer_IP, local_IP))
        in_line = f.readline()              # next separator, or '' at end of file
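The loop above assumes every block has exactly five lines in a fixed order. A sketch of a more tolerant alternative, assuming each field line has the form `Name: value` and blocks are separated by lines of dashes: split each line on its first `': '`, build one dict per block, and keep adding to the same `records` for every file, so the table keeps growing until the last file is parsed. The function names `parse_blocks` and `add_blocks` and the file names in the usage comment are hypothetical.

```python
import re
from collections import defaultdict

def parse_blocks(lines):
    """Yield one {field name: value} dict per dash-separated block."""
    block = {}
    for line in lines:
        line = line.strip()
        if line.startswith('---'):           # separator closes the current block
            if block:
                yield block
            block = {}
        elif ': ' in line:
            name, value = line.split(': ', 1)  # split on the first ': ' only
            block[name] = value
    if block:                                 # last block may lack a trailing separator
        yield block

def add_blocks(records, lines):
    """Append each block's fields to records, keyed by Local AS Number."""
    for b in parse_blocks(lines):
        # Keep only the leading digits of '1453939200(2016-01-28 01:00:00)'
        time_stamp = re.match(r'\d+', b['Timestamp']).group(0)
        records[b['Local AS Number']].append(
            (time_stamp, b['Peer AS Number'],
             b['Peer IP Address'], b['Local IP Address']))

records = defaultdict(list)
# Hypothetical usage: the same dict is updated by every file in turn.
# for name in ['parsed1.txt', 'parsed2.txt']:
#     with open(name) as f:
#         add_blocks(records, f)
```

Splitting on the first `': '` is what keeps IPv6 addresses such as `2a01:2a8::3` and the `01:00:00` time intact, since those colons are not followed by a space.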
You can now print records the way you described:
for local_AS, entry in records.items():
    print('\t\t\tLocal AS Number: {}'.format(local_AS))
    print('Timestamp\tPeer AS Number\tPeer IP Address\t\tLocal IP Address')
    for item in entry:
        print('{}\t{}\t\t{}\t\t{}'.format(*item))
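Tabs only line up while every value is shorter than the tab stop, which IPv6 addresses often are not. A sketch that pads each column with format-spec widths instead; the widths (12, 16, 18) and the function name `format_records` are arbitrary choices for illustration:

```python
def format_records(records):
    """Return the grouped table as a string with fixed-width columns."""
    header = ('Timestamp', 'Peer AS Number', 'Peer IP Address', 'Local IP Address')
    row = '{:<12}{:<16}{:<18}{}'           # left-aligned, space-padded columns
    lines = []
    for local_as, entries in records.items():
        lines.append('Local AS Number: {}'.format(local_as))
        lines.append(row.format(*header))
        for entry in entries:               # (timestamp, peer AS, peer IP, local IP)
            lines.append(row.format(*entry))
        lines.append('')                    # blank line between AS groups
    return '\n'.join(lines)

# Example: print(format_records(records))
```

A value longer than its width is not truncated; it simply pushes the rest of that row to the right, so pick widths to suit your data.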
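This produces the output below. To illustrate the idea, I extended the sample file with a copy of the blocks whose Local AS Number was changed to a different value.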
Local AS Number: 12654
Timestamp Peer AS Number Peer IP Address Local IP Address
1453939200 37989 203.123.48.6 193.0.4.28
1453939200 1836 146.228.1.3 193.0.4.28
1453939200 1836 146.228.1.3 193.0.4.28
1453939200 1836 2a01:2a8::3 2001:67c:2e8:2:ffff:0:4:28
Local AS Number: 12655
Timestamp Peer AS Number Peer IP Address Local IP Address
1453939200 37989 203.123.48.6 193.0.4.28
1453939200 1836 146.228.1.3 193.0.4.28
1453939200 1836 146.228.1.3 193.0.4.28
1453939200 1836 2a01:2a8::3 2001:67c:2e8:2:ffff:0:4:28