Question

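I wrote a Perl script to parse a file, scrub it, and write it to a new file. It worked with the test data I started with, but now that I have all of the real data, it turns out there are a lot of records I don't want in the newly scrubbed file (mostly because too many of the fields in those records are empty).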

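So I now need to check whether a specific field in a record is empty and, if it is, write that record to an "error" file instead of writing it out to the cleaned data file. My script is below (before anyone brings it up: I don't have the Text::CSV module, and it isn't available to me).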

Note - before I tried adding the IF/ELSE statement, the code worked on the data I had before I received the real data containing these problem records.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;
use Time::Piece;

my $filename = 'uncleanData.csv';

open my $FH, $filename
  or die "Could not read from $filename <$!>, program halting.";

# Read the header line.
chomp(my $line = <$FH>);
my @fields = split(/,/, $line);
print Dumper(@fields), $/;

my @data;
# Read the lines one by one.
while($line = <$FH>) {

    chomp($line);

Below is the new IF statement I added. The code under the ELSE is my previously working script, unchanged -

# Check if the storeNbr field is empty. If so, write record to error file.
    if (!length $fields[28]) {
        open ( my $ERR_FH, '>', "errorFiles.csv" ) or die $!;
        print $ERR_FH join(',', @$_), $/ for @data;
        close $ERR_FH;
        }

    else

        {

# Scrub data of characters that cause scripting problems down the line.
    $line =~ s/[\'\\]/ /g;

# split the fields, concatenate fields 28-30, and add the
# concatenated field to the beginning of each line in the file

    my @fields = split(/,/, $line);
    unshift @fields, join '_', @fields[28..30];

# Format the DATE fields for MySQL
    $_ = join '-', (split /\//)[2,0,1] for @fields[10,14,24,26];

# Scrub colons from the data
    $line =~ s/:/ /g;

# If Spectro_Model is "UNKNOWN", change
    if($fields[22] eq "UNKNOWN"){
        $_ = 'UNKNOW' for $fields[22];
        }

# If tran_date is blank, insert 0000-00-00
    if(!length $fields[10]){
        $_ = '0000-00-00' for $fields[10];
        }

# If init_tran_date is blank, insert 0000-00-00
    if(!length $fields[14]){
        $_ = '0000-00-00' for $fields[14];
        }

# If update_tran_date is blank, insert 0000-00-00
    if(!length $fields[24]){
        $_ = '0000-00-00' for $fields[24];
        }

# If cancel_date is blank, insert 0000-00-00
    if(!length $fields[26]){
        $_ = '0000-00-00' for $fields[26];
        }

# Format the PROD_NBR field by deleting any leading zeros before decimals.
    $fields[12] =~ s/^\s*0\././;

# put the records back
    push @data, \@fields;
}
}

close $FH;

print "Unsorted:\n", Dumper(@data); #, $/;

#Sort the clean files on Primary Key, initTranDate, updateTranDate, and updateTranTime
@data = sort {
    $a->[0] cmp $b->[0] ||
    $a->[14] cmp $b->[14] ||
    $a->[26] cmp $b->[26] ||
    $a->[27] cmp $b-> [27]
} @data;

#open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/parsedMistints.csv';
open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/cleaned1502.csv';
print $OFH join(',', @$_), $/ for @data;
close $OFH;

exit;

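I'm guessing my problem is where I placed the closing bracket } for the ELSE part of the statement. Here are some sample records from the file; the last one is a "problem" record -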

650096571,1,1,used as store paint,14,IFC 8012NP,Standalone-9,3596,56,1/31/2015,80813,A97W01251,,1/16/2015,0.25,0.25,,SW,CUSTOM MATCH,TRUE,O,xts,,,,,,,1568,61006,1,FALSE
650368376,1,3,Tinted Wrong Color,16,IFC 8012NP,01DX8015206,,6,1/31/2015,160720,A87W01151,MATCH,1/31/2015,1,1,ENG,CUST,CUSTOM MATCH,TRUE,O,Ci52,,,,,,,1584,137252,1,FALSE
650175433,3,1,not tinted - e.w.,16,COROB MODULA HF,Standalone-7,,2,1/31/2015,95555,B20W02651,,1/29/2015,3,3,,COMP,CUSTOM MATCH,TRUE,P,xts,,,,,,,1627,68092,5,FALSE
650187016,2,1,checked out under cash ,,,,,,,,,,,,,,,,,,,,,,,,,,,,

When I run this script, it still processes the "error records" and throws all kinds of "uninitialized value" warnings.

Answer 1

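Text::CSV is really useful if you need to handle quotes or embedded linefeeds. If you need that capability and can't install it, Text::ParseWords can stand in.

But as long as you don't have quoting to worry about, split will do fine.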
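To make the difference concrete, here is a minimal sketch comparing plain split with parse_line from Text::ParseWords (a core module). The sample line is made up for illustration and is not taken from the question's data. It also shows that split silently drops trailing empty fields unless it is given a negative limit, which matters for records that end in a run of commas.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Made-up sample line: the second field contains a quoted comma.
my $line = '650096571,"used as store paint, twice",1568';

# Plain split knows nothing about quotes, so the quoted field is cut in two.
my @naive = split /,/, $line;

# parse_line(delimiter, keep_quotes, line) treats the quoted text as one field.
my @parsed = parse_line(',', 0, $line);

# split drops trailing empty fields unless the limit is negative.
my @short = split /,/, 'a,b,,,';
my @full  = split /,/, 'a,b,,,', -1;

printf "naive=%d parsed=%d short=%d full=%d\n",
    scalar @naive, scalar @parsed, scalar @short, scalar @full;
# prints: naive=4 parsed=3 short=2 full=5
```

Note that parse_line treats single quotes as quoting characters too, so data containing stray apostrophes may need scrubbing before it is parsed this way.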

You can do something like this:

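#!/usr/bin/env perl
use strict;
use warnings;

# Open both output files once, before the loop.
open ( my $normal_fh, '>', "output.txt" ) or die $!;
open ( my $err_fh, '>', "errors.txt" ) or die $!;

while ( <> ) {
    # storeNbr is field 28 (zero-based) in the question's sample data;
    # "// ''" avoids a warning when the field is missing altogether.
    if ( ( ( split /,/ )[28] // '' ) =~ /\w/ ) {
        select $normal_fh;
    }
    else {
        select $err_fh;
    }
    # print writes $_ to whichever handle is currently selected.
    print;
}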
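Applied to the script in the question, the same idea might look like the sketch below. Two things in the question's version appear to cause the trouble: the test reads `$fields[28]` from the header line's `@fields` instead of the current record, and the error file is reopened with '>' inside the loop, which truncates it every time. The helper name `route_records`, the abbreviated 29-column records, and the in-memory filehandles are all assumptions made for illustration; index 28 for storeNbr is taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: read records from $in and print each one to $clean_fh
# or $err_fh, depending on whether field $idx of THAT record has any data.
sub route_records {
    my ($in, $clean_fh, $err_fh, $idx) = @_;
    my %count = (clean => 0, error => 0);
    while (my $line = <$in>) {
        chomp $line;
        # A limit of -1 keeps trailing empty fields.
        my @fields = split /,/, $line, -1;
        if (defined $fields[$idx] && length $fields[$idx]) {
            print {$clean_fh} "$line\n";
            $count{clean}++;
        }
        else {
            print {$err_fh} "$line\n";
            $count{error}++;
        }
    }
    return %count;
}

# Made-up 29-column records; the last column plays the role of storeNbr.
my $good  = join ',', 'A1', ('') x 27, '1568';
my $bad   = join ',', 'B2', ('') x 28;
my $input = "$good\n$bad\n";

# In-memory handles stand in for the real files; each is opened exactly once.
open my $in,       '<', \$input         or die $!;
open my $clean_fh, '>', \my $clean_text or die $!;
open my $err_fh,   '>', \my $error_text or die $!;

my %count = route_records($in, $clean_fh, $err_fh, 28);
close $_ for $in, $clean_fh, $err_fh;

print "clean=$count{clean} error=$count{error}\n";
# prints: clean=1 error=1
```

In the real script, the scrubbing steps and the `push @data, \@fields` would go in the clean branch, so that only records with a storeNbr are ever sorted and written to the cleaned file.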

Perl - Parsing a file - writing out to two different files
