Perl - Parse a file - write out two different files

Time: 2016-05-20 20:19:52

Tags: perl file parsing

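I wrote a Perl script that parses a file, scrubs it, and writes it to a new file. It worked with the test data I originally used, but now that I have all the real data, it turns out the scrubbed file contains a large number of records I don't want (mainly because too many of the fields in those records are empty).

So I now need to check whether a specific field in a record is empty and, if it is, write the record to an "error" file instead of the cleaned data file. My script is below (before anyone brings it up: I don't have the Text::CSV module, and it isn't available to me).

Note - before I tried adding the IF/ELSE statement, the code worked on the data I had before I received the real data containing these problem records.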

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;
use Time::Piece;

my $filename = 'uncleanData.csv';

open my $FH, $filename
  or die "Could not read from $filename <$!>, program halting.";

# Read the header line.
chomp(my $line = <$FH>);
my @fields = split(/,/, $line);
print Dumper(@fields), $/;

my @data;
# Read the lines one by one.
while($line = <$FH>) {

    chomp($line);

Below is the new IF statement I added, with the ELSE wrapping the code beneath it; the previously working script is otherwise unchanged -

# Check if the storeNbr field is empty. If so, write record to error file.
    if (!length $fields[28]) {
        open ( my $ERR_FH, '>', "errorFiles.csv" ) or die $!;
        print $ERR_FH join(',', @$_), $/ for @data;
        close $ERR_FH;
        }

    else

        {

# Scrub data of characters that cause scripting problems down the line.
    $line =~ s/[\'\\]/ /g;

# split the fields, concatenate fields 28-30, and add the
# concatenated field to the beginning of each line in the file

    my @fields = split(/,/, $line);
    unshift @fields, join '_', @fields[28..30];

# Format the DATE fields for MySQL
    $_ = join '-', (split /\//)[2,0,1] for @fields[10,14,24,26];

# Scrub colons from the data
    $line =~ s/:/ /g;

# If Spectro_Model is "UNKNOWN", change
    if($fields[22] eq "UNKNOWN"){
        $_ = 'UNKNOW' for $fields[22];
        }

# If tran_date is blank, insert 0000-00-00
    if(!length $fields[10]){
        $_ = '0000-00-00' for $fields[10];
        }

# If init_tran_date is blank, insert 0000-00-00
    if(!length $fields[14]){
        $_ = '0000-00-00' for $fields[14];
        }

# If update_tran_date is blank, insert 0000-00-00
    if(!length $fields[24]){
        $_ = '0000-00-00' for $fields[24];
        }

# If cancel_date is blank, insert 0000-00-00
    if(!length $fields[26]){
        $_ = '0000-00-00' for $fields[26];
        }

# Format the PROD_NBR field by deleting any leading zeros before decimals.
    $fields[12] =~ s/^\s*0\././;

# put the records back
    push @data, \@fields;
}
}

close $FH;

print "Unsorted:\n", Dumper(@data); #, $/;

#Sort the clean files on Primary Key, initTranDate, updateTranDate, and updateTranTime
@data = sort {
    $a->[0] cmp $b->[0] ||
    $a->[14] cmp $b->[14] ||
    $a->[26] cmp $b->[26] ||
    $a->[27] cmp $b-> [27]
} @data;

#open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/parsedMistints.csv';
open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/cleaned1502.csv';
print $OFH join(',', @$_), $/ for @data;
close $OFH;

exit;

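I'm guessing my problem is where I placed the closing brace } for the ELSE part of the statement. Here are some sample records from the file; the last one is a "problem" record -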

650096571,1,1,used as store paint,14,IFC 8012NP,Standalone-9,3596,56,1/31/2015,80813,A97W01251,,1/16/2015,0.25,0.25,,SW,CUSTOM MATCH,TRUE,O,xts,,,,,,,1568,61006,1,FALSE
650368376,1,3,Tinted Wrong Color,16,IFC 8012NP,01DX8015206,,6,1/31/2015,160720,A87W01151,MATCH,1/31/2015,1,1,ENG,CUST,CUSTOM MATCH,TRUE,O,Ci52,,,,,,,1584,137252,1,FALSE
650175433,3,1,not tinted - e.w.,16,COROB MODULA HF,Standalone-7,,2,1/31/2015,95555,B20W02651,,1/29/2015,3,3,,COMP,CUSTOM MATCH,TRUE,P,xts,,,,,,,1627,68092,5,FALSE
650187016,2,1,checked out under cash ,,,,,,,,,,,,,,,,,,,,,,,,,,,,

When I run this script, it still processes the "error records" and throws all sorts of "uninitialized value" warnings.

1 Answer:

Answer 0: (score: 0)

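Text::CSV is really useful if you need to handle quotes or embedded newlines. If you need that functionality but don't have the module, Text::ParseWords can stand in for it.

But as long as you don't have quoting to worry about, split will do fine.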
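To make the difference concrete, here is a minimal sketch of both approaches. The helper names `split_plain` and `split_quoted` and the sample strings are made up for illustration. It assumes the data uses double quotes for quoting, and it shows one detail that matters for records with many empty trailing fields: without a negative limit, split silently drops trailing empty fields.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);    # core module, no CPAN install needed

# Hypothetical helper: split an unquoted CSV line.
# The -1 limit keeps trailing empty fields instead of discarding them.
sub split_plain {
    my ($line) = @_;
    chomp $line;
    return split /,/, $line, -1;
}

# Hypothetical helper: split a line that may contain double-quoted fields.
# The second argument (0) tells parse_line to strip the quotes.
sub split_quoted {
    my ($line) = @_;
    chomp $line;
    return parse_line( ',', 0, $line );
}

my @default = split /,/, 'a,b,,,';      # trailing empty fields are dropped
my @plain   = split_plain("a,b,,,\n");
my @quoted  = split_quoted(qq{1,"Smith, John",x\n});

print scalar(@default), "\n";           # 2
print scalar(@plain),   "\n";           # 5
print join( '|', @quoted ), "\n";       # 1|Smith, John|x
```

A plain split would break the quoted example into four pieces at the embedded comma, which is the case where Text::CSV or Text::ParseWords earns its keep.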

You can do something like this:

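#!/usr/bin/env perl
use strict;
use warnings;

# Open both output files once, before the loop.
open ( my $normal_fh, '>', "output.txt" ) or die $!;
open ( my $err_fh, '>', "errors.txt" ) or die $!;

while ( <> ) {
    # Index 28 holds the store number in the sample records. The -1 limit
    # keeps trailing empty fields, and // '' avoids an "uninitialized value"
    # warning when a record is too short.
    my $store_nbr = ( split /,/, $_, -1 ) [28] // '';
    if ( $store_nbr =~ /\w/ ) {
        select $normal_fh;
    }
    else {
        select $err_fh;
    }
    # print with no arguments writes $_ to the currently selected handle.
    print;
}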
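
For the script in the question, two details look like the likely cause of the trouble. The `if` test reads `$fields[28]` from the header array, because the per-record `my @fields` is only declared later, inside the `else` block. The error file is also reopened with '>' inside the loop, which truncates it each time. The sketch below shows one way to restructure the routing, under some assumptions: the helpers `is_blank_field` and `route_records` are hypothetical names, header handling and the scrubbing steps are omitted, and in-memory filehandles stand in for the real files so the example is self-contained.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: true when the field at $idx is missing or blank.
sub is_blank_field {
    my ( $line, $idx ) = @_;
    chomp $line;
    my @rec   = split /,/, $line, -1;    # -1 keeps trailing empty fields
    my $value = $rec[$idx] // '';        # short records yield undef
    return $value !~ /\S/;
}

# Hypothetical helper: copy each input line to the clean or the error handle.
sub route_records {
    my ( $in_fh, $clean_fh, $err_fh, $idx ) = @_;
    while ( my $line = <$in_fh> ) {
        my $out = is_blank_field( $line, $idx ) ? $err_fh : $clean_fh;
        print {$out} $line;
    }
    return;
}

# Two of the sample records from the question, the second being the "problem" one.
my $input = join '', map { "$_\n" } (
    '650096571,1,1,used as store paint,14,IFC 8012NP,Standalone-9,3596,56,1/31/2015,80813,A97W01251,,1/16/2015,0.25,0.25,,SW,CUSTOM MATCH,TRUE,O,xts,,,,,,,1568,61006,1,FALSE',
    '650187016,2,1,checked out under cash ,,,,,,,,,,,,,,,,,,,,,,,,,,,,',
);

# In-memory handles stand in for uncleanData.csv and the two output files.
open my $in_fh,    '<', \$input     or die $!;
open my $clean_fh, '>', \my $clean  or die $!;
open my $err_fh,   '>', \my $errors or die $!;

route_records( $in_fh, $clean_fh, $err_fh, 28 );
close $_ for $in_fh, $clean_fh, $err_fh;

print "clean:\n$clean", "errors:\n$errors";
```

In the real script the scrubbing and sorting would apply only to lines that pass the check, and both output files would be opened once before the loop, as in the answer's code.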