Sort unique IDs in multiple lists

Date: 2013-01-17 14:11:56

Tags: python sorting readline

I have the following problem:

>>> lines = tuple(open('/var/log/fail2ban.log', 'r'))
>>> for item in lines:
...     item = item.strip('\n')
...     if "fail2ban.actions:" in item and "[postfix]" in item and "Ban" in item:
...             item = item.split(' ')
...             print item
...
['2013-01-17', '11:03:51,752', 'fail2ban.actions:', 'WARNING', '[postfix]', 'Ban', '87.111.253.157']
['2013-01-17', '11:10:42,612', 'fail2ban.actions:', 'WARNING', '[postfix]', 'Ban', '37.206.77.26']
['2013-01-17', '11:23:08,674', 'fail2ban.actions:', 'WARNING', '[postfix]', 'Ban', '37.2.185.188']
['2013-01-17', '12:40:44,997', 'fail2ban.actions:', 'WARNING', '[postfix]', 'Ban', '37.2.185.188']
['2013-01-17', '13:28:38,006', 'fail2ban.actions:', 'WARNING', '[postfix]', 'Ban', '194.106.26.177']
['2013-01-17', '13:43:56,959', 'fail2ban.actions:', 'WARNING', '[postfix]', 'Ban', '70.27.53.95']
['2013-01-17', '14:42:36,601', 'fail2ban.actions:', 'WARNING', '[postfix]', 'Ban', '95.120.42.12']
['2013-01-17', '14:45:35,147', 'fail2ban.actions:', 'WARNING', '[postfix]', 'Ban', '95.120.42.12']

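I would really like to know how to filter out the duplicates (item [6], the IP in this example) so that only unique values are printed.

3 Answers:

Answer 0 (score: 0)

You can keep a list or set of the IPs you have already seen, and check it before printing each line.

Something like this: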

lines = tuple(open('/var/log/fail2ban.log', 'r'))
seen = set()    
for item in lines:
  item = item.strip('\n')
  if "fail2ban.actions:" in item and "[postfix]" in item and "Ban" in item:
    item = item.split(' ')
    if item[6] not in seen:
      seen.add(item[6])
      print item
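
For readers on Python 3, the same seen-set idea can be wrapped in a small generator that is easier to test. This is only a sketch: the name `unique_bans` and the sample lines are made up for illustration, and it assumes the IP is always the seventh space-separated field, as in the output shown in the question.

```python
def unique_bans(lines):
    """Yield the split fields of each postfix Ban line, first occurrence per IP only."""
    seen = set()
    for line in lines:
        line = line.strip('\n')
        if "fail2ban.actions:" in line and "[postfix]" in line and "Ban" in line:
            fields = line.split(' ')
            ip = fields[6]  # assumes the same field layout as the question's log
            if ip not in seen:
                seen.add(ip)
                yield fields


# Hypothetical sample data in the format shown in the question.
sample = [
    "2013-01-17 11:23:08,674 fail2ban.actions: WARNING [postfix] Ban 37.2.185.188\n",
    "2013-01-17 12:40:44,997 fail2ban.actions: WARNING [postfix] Ban 37.2.185.188\n",
    "2013-01-17 13:28:38,006 fail2ban.actions: WARNING [postfix] Ban 194.106.26.177\n",
]

unique_ips = [fields[6] for fields in unique_bans(sample)]
print(unique_ips)
```

Because the function accepts any iterable of lines, it should also work when passed an open file object directly, which avoids reading the whole log into a tuple first.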

Answer 1 (score: 0)

>>> lines = tuple(open('/var/log/fail2ban.log', 'r'))
>>> seen = set()
>>> for item in lines:
...     item = item.strip('\n')
...     if "fail2ban.actions:" in item and "[postfix]" in item and "Ban" in item:
...             item = item.split(' ')
...             if item[6] not in seen:
...                 # record the IP before printing, so later duplicates are skipped
...                 seen.add(item[6])
...                 print item

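Answer 2 (score: 0)

If you only want one entry per IP and it does not matter which entry, try the following (with plain assignment, the last matching line for each IP is the one that is kept):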

item_dict = dict()
lines = tuple(open('/var/log/fail2ban.log', 'r'))
for item in lines:
    item = item.strip('\n')
    if "fail2ban.actions:" in item and "[postfix]" in item and "Ban" in item:
            item = item.split(' ')
            item_dict[item[6]]=item[:-1]

print(item_dict)

[Edit]: If order matters, you can use an OrderedDict. To do that, just replace

item_dict = dict()

with

from collections import OrderedDict
item_dict = OrderedDict()
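
One detail worth illustrating: an OrderedDict remembers when a key was first inserted, but assigning to an existing key still overwrites its value. The sketch below (Python 3, with hypothetical `(ip, timestamp)` pairs standing in for the parsed log lines) contrasts plain assignment with `setdefault`, which keeps the first value instead.

```python
from collections import OrderedDict

# Hypothetical (ip, timestamp) pairs standing in for parsed log lines.
bans = [
    ("37.2.185.188", "11:23:08,674"),
    ("194.106.26.177", "13:28:38,006"),
    ("37.2.185.188", "12:40:44,997"),
]

last_seen = OrderedDict()
first_seen = OrderedDict()
for ip, when in bans:
    last_seen[ip] = when             # later entries overwrite earlier ones
    first_seen.setdefault(ip, when)  # keeps the first value stored for each key

print(list(last_seen.items()))
print(list(first_seen.items()))
```

In both dictionaries the keys come out in order of first appearance; only the stored timestamp differs.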

[Edit 2]: If all you need is the set of IPs that match the criteria, then a set is the right tool.

item_set = set()
lines = tuple(open('/var/log/fail2ban.log', 'r'))
for item in lines:
    item = item.strip('\n')
    if "fail2ban.actions:" in item and "[postfix]" in item and "Ban" in item:
            item = item.split(' ')
            item_set.add(item[6])

print('\n'.join(item_set))

By definition, every element of a set is unique.
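
A set has no defined order, so if the unique IPs should also be sorted, something along these lines may help. It is a sketch for Python 3 that uses the standard-library `ipaddress` module as the sort key; the addresses in `item_set` are copied from the question's output purely as sample data.

```python
import ipaddress

# Sample data: a few of the IPs from the question's output.
item_set = {"87.111.253.157", "37.206.77.26", "37.2.185.188", "194.106.26.177"}

# A plain sorted() would compare the strings character by character;
# using ip_address as the key compares the addresses numerically instead.
ordered = sorted(item_set, key=ipaddress.ip_address)

print('\n'.join(ordered))
```

Without the key, "194.106.26.177" would sort before "37.2.185.188" because the comparison is done on text rather than on the address value.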