解析文本(日志文件)

时间:2012-03-04 17:18:42

标签: perl parsing text logfile

我是一个perl菜鸟:)我需要你的帮助 我想解析我的log.txt但我很困惑,我不确定 这个编码。

我已经尝试过log.pcap,但它不是我需要的。所以我想解析文本不解析pcap与递归。有没有人编辑我的编码? http://pastebin.com/mwZ1y2kM

这是我的log.txt :(输入) http://pastebin.com/P0g0D6Pi

Frame 1 (640 bytes on wire, 640 bytes captured)
   Arrival Time: Jan 31, 2012 19:41:17.121115000
   [Time delta from previous captured frame: 0.000000000 seconds]
   [Time delta from previous displayed frame: 0.000000000 seconds]
   [Time since reference or first frame: 0.000000000 seconds]
   Frame Number: 1
   Frame Length: 640 bytes
   Capture Length: 640 bytes
   [Frame is marked: False]
   [Protocols in frame: eth:ip:tcp:http]
Ethernet II, Src: SunMicro_45:39:78 (01:24:4c:50:79:95), Dst:
Cisco_03:3c:dc (03:49:12:65:3f:dc)
   Destination: Cisco_03:3c:dc (03:49:12:65:3f:dc)
       Address: Cisco_03:3c:dc (03:49:12:65:3f:dc)
       .... ...0 .... .... .... .... = IG bit: Individual address
(unicast)
       .... ..0. .... .... .... .... = LG bit: Globally unique
address (factory default)
   Source: SunMicro_45:39:78 (01:24:4c:50:79:95)
       Address: SunMicro_45:39:78 (01:24:4c:50:79:95)
       .... ...0 .... .... .... .... = IG bit: Individual address
(unicast)
       .... ..0. .... .... .... .... = LG bit: Globally unique
address (factory default)
   Type: IP (0x0800)
Internet Protocol, Src: 221.255.225.143 (221.255.225.143), Dst:
10.12.264.43 (10.12.264.43)
   Version: 4
   Header length: 20 bytes
   Differentiated Services Field: 0x01 (DSCP 0x00: Default; ECN:
0x01)
       0000 00.. = Differentiated Services Codepoint: Default (0x01)
       .... ..0. = ECN-Capable Transport (ECT): 0
       .... ...0 = ECN-CE: 0
   Total Length: 626
   Identification: 0x3b68 (15208)
   Flags: 0x02 (Don't Fragment)
       0.. = Reserved bit: Not Set
       .1. = Don't fragment: Set
       ..0 = More fragments: Not Set
   Fragment offset: 0
   Time to live: 118
   Protocol: TCP (0x06)
   Header checksum: 0xfc4b [correct]
       [Good: True]
       [Bad : False]
   Source:221.255.225.143 (221.255.225.143)
   Destination: 10.12.264.43 (10.12.264.43)
Transmission Control Protocol, Src Port: 45267 (45267), Dst Port: http
(80), Seq: 1, Ack: 1, Len: 566
   Source port: 45267 (45267)
   Destination port: http (80)
   [Stream index: 0]
   Sequence number: 1    (relative sequence number)
   [Next sequence number: 587    (relative sequence number)]
   Acknowledgement number: 1    (relative ack number)
   Header length: 20 bytes
   Flags: 0x18 (PSH, ACK)
       0... .... = Congestion Window Reduced (CWR): Not set
       .0.. .... = ECN-Echo: Not set
       ..0. .... = Urgent: Not set
       ...1 .... = Acknowledgement: Set
       .... 1... = Push: Set
       .... .0.. = Reset: Not set
       .... ..0. = Syn: Not set
       .... ...0 = Fin: Not set
   Window size: 17520
   Checksum: 0xc19e [validation disabled]
       [Good Checksum: False]
       [Bad Checksum: False]
   [SEQ/ACK analysis]
       [Number of bytes in flight: 586]
Hypertext Transfer Protocol
   [truncated] GET /index.php?page=rilis&artikel=999999.9%27+union+all
+select+0x31303235343830303536%2C
       [[truncated] Expert Info (Chat/Sequence): GET /index.php?
page=rilis&artikel=999999.9%27+union+all+select
+0x31303235343830303536%2C
           [Message [truncated]: GET /index.php?
page=rilis&artikel=999999.9%27+union+all+select
+0x31303235343830303536%2C
           [Severity level: Chat]
           [Group: Sequence]
       Request Method: GET
       Request URI [truncated]: /index.php?
page=rilis&artikel=999999.9%27+union+all+select
+0x31303235343830303536%2C
       Request Version: HTTP/1.1
   Host: example.com\r\n
   Accept: */*\r\n
   User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;
SV1; .NET CLR 2.0.50727) Havij\r\n
   Connection: Close\r\n
   \r\n

是的,我想解析这样的:(输出)

Time: Jan 31, 2012 19:41:17
   IP Address Source: 221.255.225.143
           Mac Address Source: 010912063cfc
           Port Numbers Source: 44535
   IP Address Destination: 10.12.264.43
           Mac Address Destination: 00f04c080788
           Port Numbers Destination: 3306
   HTTP Host: example.com
           Request Method: GET
           Request URI: /index.php?page=rilis artikel=999999.9%27+union
                         +all+select+0x31303235343830303536%2C
           Tool: Havij
谢谢...... 你能帮助我吗??? 我现在在做什么......

2 个答案:

答案 0 :(得分:1)

你的日志行有点奇怪。某些字段(请求URI和工具)在冒号前面包含一个空格,这是意外的。我会检查你的日志行真的包含那些意想不到的空格。此外,如果没有Source或Destination限定符,端口号会出现两次,这有点令人惊讶。再次,检查您的真实日志文件。以下代码段适用于问题中发布的日志行。由于字段名称不是唯一的,因此它假定它们始终以相同的顺序打印。

my @fields = ('Time', 'IP Address Source', 'Mac Address Source', 'Port Numbers',
    'IP Address Destination', 'Mac Address Destination', 'Port Numbers', 'HTTP Host',
    'Request Method', 'Request URI ', 'Tool ');

my $re = join(':\s+(.*?)\s+', @fields);
$re .= ':\s+(.*)';

warn $re;

$line = 'Time: Jan 31, 2012 19:41:17 IP Address Source: 221.255.225.143 Mac Address Source: 010912063cfc Port Numbers: 44535 IP Address Destination: 10.12.264.43 Mac Address Destination: 00f04c080788 Port Numbers: 3306 HTTP Host: example.com Request Method: GET Request URI : /index.php?page=rilis artikel=999999.9%27+union +all+select+0x31303235343830303536%2C Tool : Havij';

my @values = $line =~ /$re/;

print "values = @values\n"

答案 1 :(得分:1)

您是否尝试过Net::Pcap或描述的方法here