How to extract multiple columns from a CSV file using Perl

Posted: 2012-02-15 21:55:29

Tags: perl csv

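I'm very new to Perl and hope someone can help me with this. I need to extract a few columns from a CSV file that contains embedded commas. The format is as follows: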

"ID","URL","DATE","XXID","DATE-LONGFORMAT"

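I need to extract the DATE column, the XXID column, and the column immediately after XXID. Note that not every row necessarily has the same number of columns.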

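The XXID column has a 2-letter prefix, and it does not always start with the same letter; it can be almost any letter of the alphabet. The length of the value is always the same.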

Finally, after extracting these three columns, I need to sort the XXID column and count the duplicates.

3 Answers:

Answer 0 (score: 3)

Here is a sample script that uses the Text::CSV module to parse the CSV data. See the module's documentation for the settings that suit your data.

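#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 allows non-ASCII characters and embedded newlines in quoted fields;
# auto_diag => 1 reports parse errors instead of failing silently
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

# Read the rows after __DATA__ (for a real file, pass an open filehandle instead)
while (my $row = $csv->getline(\*DATA)) {
    print "Date: $row->[2]\n";
    print "XXID: $row->[3]\n";
    print "Long date: $row->[4]\n";
}

__DATA__
"ID","URL","DATE","XXID","DATE-LONGFORMAT"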
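A minimal sketch of how the same Text::CSV approach could cover the rest of the question (sorting XXID and counting duplicates). It assumes the file's first row is the header line shown in the question, so columns can be looked up by name with `column_names` and `getline_hr` instead of by position. The helper `count_xxids` and the sample rows are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: reads CSV rows from $fh, prints DATE, XXID and
# DATE-LONGFORMAT for each row, and returns a hashref of XXID => count.
sub count_xxids {
    my ($fh) = @_;
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 })
        or die "Cannot use CSV: " . Text::CSV->error_diag();

    # Assumes the first row is the header; its names become the hash keys
    $csv->column_names($csv->getline($fh));

    my %count;
    while (my $row = $csv->getline_hr($fh)) {
        # A short row leaves its missing trailing columns undefined
        next unless defined $row->{XXID};
        print join("\t", map { $_ // '' } @{$row}{qw(DATE XXID DATE-LONGFORMAT)}), "\n";
        $count{ $row->{XXID} }++;
    }
    return \%count;
}

# Hypothetical sample data, read from an in-memory filehandle
my $data = <<'CSV';
"ID","URL","DATE","XXID","DATE-LONGFORMAT"
"1","http://example.com/a,b","2012-02-15","AB12345","15 February 2012"
"2","http://example.com/c","2012-02-16","QX67890","16 February 2012"
"3","http://example.com/d","2012-02-16","AB12345"
CSV
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my $count = count_xxids($fh);

# Sorted XXIDs with how many times each one occurs
for my $id (sort keys %$count) {
    print "$id\t$count->{$id}\n";
}
```

For a real file you would open it instead, e.g. `open my $fh, '<', 'data.csv'` (the filename is a placeholder). Looking columns up by header name keeps working if the columns are reordered, but it cannot fix rows whose fields are shifted.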

Answer 1 (score: 3)

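I have published a module called Tie::Array::CSV that lets Perl work with your CSV file as a native nested Perl array. With it, you can write your search logic as though your data were already an array of array references. Check it out!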

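#!/usr/bin/env perl

use strict;
use warnings;

use File::Temp;
use Tie::Array::CSV;
use List::MoreUtils qw/first_index/;
use Data::Dumper;

# this builds a temporary file from DATA
# normally you would just make $file the filename
# (append your CSV rows after a __DATA__ line at the end of this script)
my $file = File::Temp->new;
print $file <DATA>;
$file->flush;   # make sure buffered rows are written before tying the file
#########

tie my @csv, 'Tie::Array::CSV', $file;

# find the column from the data in the first row:
# a word character followed by exactly 6 more characters (7 in total)
my $colnum = first_index { /^\w.{6}$/ } @{$csv[0]};
die "No matching column found in the first row\n" if $colnum < 0;
print "Using column: $colnum\n";

# extract that column
my @column = map { $csv[$_][$colnum] } (0..$#csv);

# build a hash of repetitions, skipping rows too short to have that column
my %reps;
$reps{$_}++ for grep { defined } @column;

# sort the keys so the XXIDs are listed in order
$Data::Dumper::Sortkeys = 1;
print Dumper \%reps;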

Answer 2 (score: 0)

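You definitely want to use a CPAN library to parse CSV, because you are unlikely to handle every quirk of the format yourself (quoted fields, embedded commas, doubled quotes, embedded newlines).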

See: How can I parse quoted CSV in Perl with a regex?

See: How do I efficiently parse a CSV file in Perl?
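To illustrate two of those quirks, this sketch compares a naive `split /,/` with Text::CSV's `parse`/`fields` on a hypothetical line whose URL field contains a comma and whose last field contains doubled (escaped) quotes. The sample line is made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical line: a comma inside the quoted URL, and "" inside the last field
my $line = q{"1","http://example.com/a,b","2012-02-15","AB12345","say ""hi"""};

# Naive approach: also splits on the comma inside the quoted URL
my @naive = split /,/, $line;

# Text::CSV honours the quoting and turns "" into a literal quote
my $csv = Text::CSV->new({ binary => 1 });
$csv->parse($line) or die "parse failed: " . $csv->error_input;
my @fields = $csv->fields;

printf "naive split: %d pieces\n", scalar @naive;
printf "Text::CSV:   %d fields, last = %s\n", scalar @fields, $fields[-1];
```

The naive split produces 6 pieces that still carry their quotes, while Text::CSV returns the 5 intended fields.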

However, for the specific string you provided, here is a very naive, non-idiomatic solution:

use strict;
use warnings;

my $string = '"ID","URL","DATE","XXID","DATE-LONGFORMAT"';

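# Note: fields must be quoted; unquoted fields are skipped and
# doubled quotes ("") inside a field are not handled.
my @words = ();    # completed fields
my $word = "";     # field currently being built
my $quotec = '"';  # quote character
my $quoted = 0;    # true while inside a quoted field

# Walk the string one character at a time
foreach my $c (split //, $string)
{
  if ($quoted)
  {
    if ($c eq $quotec)
    {
      # closing quote: the field is complete
      $quoted = 0;
      push @words, $word;
      $word = "";
    }
    else
    {
      # any other character inside quotes, including a comma, belongs to the field
      $word .= $c;
    }
  }
  elsif ($c eq $quotec)
  {
    # opening quote: start a new field
    $quoted = 1;
  }
  # characters outside quotes (the separating commas) are ignored
}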

for (my $i = 0; $i < scalar @words; ++$i)
{
  print "column " . ($i + 1) . " = $words[$i]\n";
}
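If installing a CPAN module is not an option, the core module Text::ParseWords provides `parse_line`, which splits on a delimiter pattern while respecting quotes. A minimal sketch on the same string; note that, unlike Text::CSV, it treats backslash as an escape character and does not interpret a doubled quote ("") as a literal quote, so it is only a stopgap for simple data.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $string = '"ID","URL","DATE","XXID","DATE-LONGFORMAT"';

# parse_line($delimiter_regex, $keep, $line): with $keep = 0 the
# surrounding quotes are removed from each field
my @words = parse_line(',', 0, $string);

for my $i (0 .. $#words) {
    print "column ", $i + 1, " = $words[$i]\n";
}
```

Commas inside quoted fields stay in the field, e.g. `parse_line(',', 0, '"a,b",c')` yields `a,b` and `c`.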