Perl: Compare two CSV files and print the matches (modifying this code)

Time: 2014-12-08 13:54:30

Tags: perl csv

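I'm new to Perl and found this solution: Perl: Compare Two CSV Files and Print out differences

I have gone through many other solutions and this one is the closest, except that instead of finding the differences between two CSV files, I want to find where the second CSV file matches the first, by column and row. How can I modify the following script to find matches in columns/rows rather than differences? I hope to dissect this code and learn about arrays from it, but first I'd like to find a solution for this application. Many thanks.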

use strict;
my @arr1;
my @arr2;
my $a;

open(FIL,"a.txt") or die("$!");
while (<FIL>)
    {chomp; $a=$_; $a =~ s/[\t;, ]*//g; push @arr1, $a if ($a ne  '');};
close(FIL);

open(FIL,"b.txt") or die("$!");
while (<FIL>)
    {chomp; $a=$_; $a =~ s/[\t;, ]*//g; push @arr2, $a if ($a ne  '');};
close(FIL);

my %arr1hash;
my %arr2hash;
my @diffarr;
foreach(@arr1) {$arr1hash{$_} = 1; }
foreach(@arr2) {$arr2hash{$_} = 1; }

foreach $a(@arr1)
{
    if (not defined($arr2hash{$a})) 
     {
        push @diffarr, $a;
     }
}

foreach $a(@arr2)
{
   if (not defined($arr1hash{$a})) 
   { 
       push @diffarr, $a;
   }
}

print "Diff:\n";
foreach $a(@diffarr)
{
    print "$a\n";
}
# You can print to a file instead, by: print FIL "$a\n";

OK, I realized this is closer to what I'm looking for:

use strict;
use warnings;
use feature qw(say);
use autodie;

use constant {
    FILE_1  => "file1.txt",
    FILE_2  => "file2.txt",
};

#
# Load Hash #1 with value from File #1
#
my %hash1;
open my $file1_fh, "<", FILE_1;
while ( my $value = <$file1_fh> ) {
    chomp $value;
    $hash1{$value} = 1;
}
close $file1_fh;

#
# Load Hash #2 with value from File #2
#
my %hash2;
open my $file2_fh, "<", FILE_2;
while ( my $value = <$file2_fh> ) {
    chomp $value;
    $hash2{$value} = 1;
}
close $file2_fh;

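Now I want to search file2's hash to check whether it has any matches in file1's hash. This is where I'm stuck.

Using the suggested new code, the script now looks like this: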

#!/usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie;

use constant {
    FILE_1  => "masterlist.csv",
    FILE_2  => "pastebin.csv",
};

#
# Load Hash #1 with value from File #1
#
my %hash1;
open my $file1_fh, "<", FILE_1;
while ( my $value = <$file1_fh> ) {
    chomp $value;
    $hash1{$value} = 1;
}
close $file1_fh;

#
# Count the lines of File #2 that also appear in Hash #1
#
my %hash2;
open my $file2_fh, "<", FILE_2;
while ( my $value = <$file2_fh> ) {
    chomp $value;
    if ( $hash1{$value} ) { 
       print "Match found $value\n";
       $hash2{$value}++;
    }
}
close $file2_fh;

print "Matches found:\n";
foreach my $key ( keys %hash2 ) {
    print "$key found $hash2{$key} times\n";
}

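I updated one section to use split(). It seems to work, but I need to test more to confirm whether it gives the solution I'm looking for or whether I still have more work to do: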

#
# Load Hash #1 with value from File #1
#
my %hash1;  
open my $file1_fh, "<", FILE_1;    
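while ( my $value = <$file1_fh> ) {
    chomp $value;
    # split must be given $value: $_ is not set inside this loop.
    # Store the second and third fields (indices 1 and 2) as an array ref.
    $hash1{$value} = [ ( split /,/, $value )[ 1, 2 ] ];
}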
close $file1_fh;
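
If the goal is to match on individual columns rather than whole lines, one option is to key the hash on the field values themselves. The following is only a minimal sketch under stated assumptions: `index_columns` is a hypothetical helper name, the sample rows are made up, and a plain `split /,/` is assumed to be good enough (it is not if fields can contain quoted commas).

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash ref whose keys are the values found
# in the given zero-based column indices of each comma-separated line.
sub index_columns {
    my ( $lines, @columns ) = @_;
    my %seen;
    for my $line (@$lines) {
        # A limit of -1 keeps trailing empty fields.
        my @fields = split /,/, $line, -1;
        # A short row yields undef for a missing column; skip those
        # and skip empty strings.
        for my $field ( grep { defined($_) && length($_) } @fields[@columns] ) {
            $seen{$field} = 1;
        }
    }
    return \%seen;
}

# Made-up sample rows standing in for the lines of FILE_1.
my @lines = ( "1,alpha,red", "2,beta,blue", "3,gamma" );

my $seen = index_columns( \@lines, 1, 2 );
print join( ",", sort keys %$seen ), "\n";
```

The same helper could then be applied to the second file, with a lookup such as `exists $seen->{$field}` for each of its fields.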

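2 Answers:

Answer 0 (score: 1)

So, using your code: you have already read 'file1' into a hash.

Rather than reading file 2 into a hash as well, why not do this instead: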

my %hash2;
open my $file2_fh, "<", FILE_2;
while ( my $value = <$file2_fh> ) {
    chomp $value;
    if ( $hash1{$value} ) { 
       print "Match found $value\n";
       $hash2{$value}++;
    }
}
close $file2_fh;

print "Matches found:\n";
foreach my $key ( keys %hash2 ) {
    print "$key found $hash2{$key} times\n";
}
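
To try the same idea without any input files, here is a small self-contained sketch that works on in-memory lists. The helper name `count_matches` and the sample data are assumptions made for illustration; they are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each entry of @$candidates
# also appears in @$master (whole-line comparison).
sub count_matches {
    my ( $master, $candidates ) = @_;
    my %in_master = map { $_ => 1 } @$master;
    my %match_count;
    for my $value (@$candidates) {
        $match_count{$value}++ if exists $in_master{$value};
    }
    return \%match_count;
}

# Made-up sample lines standing in for the two files.
my @master     = ( "a,1", "b,2", "c,3" );
my @candidates = ( "b,2", "x,9", "b,2", "c,3" );

my $counts = count_matches( \@master, \@candidates );
print "$_ found $counts->{$_} times\n" for sort keys %$counts;
```

Using `exists` rather than testing the value means a key is still recognised even if its stored value happens to be false.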

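Answer 1 (score: 0)

I think this code identifies every position where a data field in file A matches a data field in file B (at least it does on my limited test data):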

use strict;
use warnings;
my @arr1;
my @arr2;

# poster_a.txt -> @arr1

my $file_a_name = "poster_a.txt";
open(FIL,$file_a_name) or die("$!");
my $a_line_counter = 0;
while (my $a_line = <FIL>)
{
    $a_line_counter = $a_line_counter + 1;
    chomp($a_line); 
    my @fields = (split /,/,$a_line);
    my $num_fields = scalar(@fields);
    s{^\s+|\s+$}{}g foreach @fields;
    push @arr1, \@fields if ( $num_fields != 0 );
}

close(FIL);
my $file_b_name = "poster_b.txt";
open(FIL,$file_b_name) or die("$!");

while (my $b_line = <FIL>)
{
    chomp($b_line); 
    my @fields = (split /,/,$b_line);    
    my $num_fields = scalar(@fields);
    s{^\s+|\s+$}{}g foreach @fields;
    push @arr2, \@fields if ( $num_fields != 0 );
}
close(FIL);

# poster_b.txt -> @arr2 (loaded above)

#print "\n",@arr2, "\n";


my @match_array;
my $file_a_line_ctr = 1;
foreach my $file_a_line_fields (@arr1) 
{
    my $file_a_column_ctr = 1;
    foreach my $file_a_line_field (@{$file_a_line_fields})
    {
        my $file_b_line_ctr = 1;
        foreach my $file_b_line_fields(@arr2)
        {
            my $file_b_column_ctr = 1;
            foreach my $file_b_field (@{$file_b_line_fields})
            {
                if ( $file_b_field eq $file_a_line_field ) 
                {
                    my $match_info = 
                      "$file_a_name line $file_a_line_ctr column $file_a_column_ctr"  .
                      "  (${file_a_line_field}) matches: "  .
                      "$file_b_name line $file_b_line_ctr column $file_b_column_ctr ";
                    push(@match_array, $match_info);
                    print "$match_info \n";
                }
                $file_b_column_ctr = $file_b_column_ctr + 1;
            }
            $file_b_line_ctr = $file_b_line_ctr + 1;               
        }
        $file_a_column_ctr = $file_a_column_ctr + 1;
    }
    $file_a_line_ctr = $file_a_line_ctr + 1;
}
print "there were ", scalar(@match_array)," matches\n";
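
The four nested loops above compare every field of file A with every field of file B. As a possible alternative, file B can be indexed once in a hash so that each field of file A becomes a single lookup. This is only a sketch: `parse_line` and `find_field_matches` are hypothetical helper names, the sample rows are made up, and fields are assumed not to contain quoted commas.

```perl
use strict;
use warnings;

# Split a comma-separated line and trim whitespace around each field.
sub parse_line {
    my ($line) = @_;
    my @fields = split /,/, $line;
    s/^\s+|\s+$//g for @fields;
    return \@fields;
}

# Hypothetical helper: return one [a_line, a_col, value, b_line, b_col]
# record (1-based positions) for every field of @$rows_a that also
# occurs somewhere in @$rows_b.
sub find_field_matches {
    my ( $rows_a, $rows_b ) = @_;

    # Index file B once: field value => list of [line, column] positions.
    my %positions_in_b;
    for my $i ( 0 .. $#$rows_b ) {
        my $row = $rows_b->[$i];
        for my $j ( 0 .. $#$row ) {
            push @{ $positions_in_b{ $row->[$j] } }, [ $i + 1, $j + 1 ];
        }
    }

    # Walk file A once and look each field up in the index.
    my @matches;
    for my $i ( 0 .. $#$rows_a ) {
        my $row = $rows_a->[$i];
        for my $j ( 0 .. $#$row ) {
            my $value = $row->[$j];
            for my $pos ( @{ $positions_in_b{$value} || [] } ) {
                push @matches, [ $i + 1, $j + 1, $value, @$pos ];
            }
        }
    }
    return @matches;
}

# Made-up sample rows standing in for the two files.
my @rows_a = map { parse_line($_) } ( "apple, 10", "pear, 20" );
my @rows_b = map { parse_line($_) } ( "20, kiwi",  "apple, 30" );

for my $m ( find_field_matches( \@rows_a, \@rows_b ) ) {
    printf "A line %d column %d (%s) matches B line %d column %d\n", @$m;
}
```

The trade-off is extra memory for the index in exchange for avoiding a full rescan of file B for every field of file A.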