How to extract part of a line in Perl

Time: 2014-02-17 08:04:16

Tags: regex perl perl-module

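I would like to know how to extract part of a line from a file in Perl. I have a log file from which I want to pull some meaningful information with a Perl script. I can already get the whole line I am looking for, but I only need part of that line.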

Perl script (what I have so far):

#!/usr/bin/perl
use strict;
use warnings;

my $file='F:\3Np_RoboSitter\perl pgm\input.txt';

open my $fh, "<", $file or die $!;

print "************************************************************\n";
print "DC status:\n\n";

while (<$fh>) {
        print if /DC messages Picked/ .. /DC messages Picked from the Queue/;
}

print "\n************************************************************\n\n";
close ($fh); 

Input file:

adfaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaadfafafafqdrareeaf
2014-02-14 00:18:04,840 1754897056   INFO    ApplicationService    aadfafa123    ApplicationService    ApplicationServiceCustomerID     ApplicationServiceSessionToken    Parse of XML started. |HostName=AAAAAA|TimeStamp=2014-02-14 00:16:39.044|Message=OUT;submitApplications.SubmitApplicationBatchProcess;Total 1311 DC messages Picked from the Queue.|Detail=<XMLNSC><LogMessage><messageText>Total 1311 DC messages Picked from the Queue.</messageText></LogMessage></XMLNSC>    
dafafafzcvzvsfdfafafffffffffffffffffffffffff

Output:

************************************************************
DC status:

2014-02-14 00:18:04,840 1754897056   INFO    ApplicationService    aadfafa123
 ApplicationService    ApplicationServiceCustomerID     ApplicationServiceSessio
nToken    Parse of XML started. |HostName=AAAAAA|TimeStamp=2014-02-14 00:16:39.0
44|Message=OUT;submitApplications.SubmitApplicationBatchProcess;Total 1311 DC me
ssages Picked from the Queue.|Detail=<XMLNSC><LogMessage><messageText>Total 1311
 DC messages Picked from the Queue.</messageText></LogMessage></XMLNSC>
************************************************************

Desired output:

2014-02-14 00:18:04
Total 1311 DC messages Picked from the Queue. *(which is the text between the <messageText> tags)*

Team, please share your suggestions when you have a spare moment!

1 answer:

Answer 0: (score: 2)

It always depends on the input. Your input is not in a well-defined format (not fixed-width, not CSV), so the simplest approach is a regexp.

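while (my $line = <$fh>){
  # Everything before the first comma is the timestamp without milliseconds
  my ($date) = split(/,/,$line,2);
  # Capture the text between the <messageText> tags (non-greedy, case-insensitive);
  # a plain match is enough here, the line itself does not need to be modified
  if ($line =~ m!<messageText>(.+?)</messageText>!is){
     print "$date\n$1\n";
  }
}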
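
For anyone who wants to try this without the original log file, here is a minimal self-contained sketch of the same idea. The helper name extract_status is hypothetical, and the sample data is a shortened copy of the log line from the question, read through an in-memory filehandle instead of the real file. Unlike the loop above, it also checks that the line starts with a timestamp and matches the tag case-sensitively; both are assumptions about the log format, so adjust them if your logs differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (timestamp, message) for a matching log line,
# or an empty list when the line lacks a timestamp or a <messageText> element.
sub extract_status {
    my ($line) = @_;

    # Timestamp "YYYY-MM-DD HH:MM:SS" at the start of the line, before ",mmm"
    my ($date) = $line =~ /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),/
        or return;

    # Shortest run of text between the <messageText> tags
    my ($text) = $line =~ m{<messageText>(.+?)</messageText>}
        or return;

    return ($date, $text);
}

# Shortened sample of the question's input; a real script would open the log file.
my $sample = <<'END';
adfaaaaaaaaaaaaaaaaaaaadfafafafqdrareeaf
2014-02-14 00:18:04,840 1754897056   INFO    ApplicationService    Parse of XML started. |Message=OUT;Total 1311 DC messages Picked from the Queue.|Detail=<XMLNSC><LogMessage><messageText>Total 1311 DC messages Picked from the Queue.</messageText></LogMessage></XMLNSC>
dafafafzcvzvsfdfafafffffffffffffffffffffffff
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
while (my $line = <$fh>) {
    # Skip lines that do not carry both pieces of information
    my ($date, $text) = extract_status($line) or next;
    print "$date\n$text\n";
}
close $fh;
```

Run against the sample above, this prints the timestamp on one line and the message text on the next, which is the desired output from the question. Non-matching lines are skipped silently.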