Removing commas and quotes inside quoted fields with Perl's Text::CSV_XS

Time: 2014-08-01 22:33:19

Tags: regex perl

I would like to know how to remove commas and quotes that appear inside quoted fields.

Example: "11","a"bc,""","s"am,"","t"om,"","15"

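In the example above, the embedded quotes and comma should be removed from the value a"bc,"", and likewise the " and , should be removed from s"am," and from t"om,". I am using the following code with the Text::CSV_XS module, but it does not work properly.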

#!/usr/bin/perl -w
#
use strict;
use Text::CSV_XS;

# Read the input filename from the command line
my $file = shift or die "Usage: $0 <csv_filename>\n";

# instantiate the CSV parser
my $csv = Text::CSV_XS->new ({ binary => 1 }, escape_char         => '"',allow_loose_quotes  => 1); 

# open the input file for read
open my $inputFH,  "<", $file or die "$file: $!";

# open the output file for write
open my $outputFH, ">", "$file.out" or die "$file.out: $!";

my @out;             # declare variables outside the loops for better performance
my $outputRow;
my $inputRow;
while ($inputRow = $csv->getline($inputFH)) { # iterate over each row in the input file
  @out=();           # empty out the array which will hold our corrected fields
  foreach (@$inputRow) {   # iterate over each field in the input row
    s/[\0\|\n\r]//g;         # get rid of NUL, pipe, CR, and LF characters
    s/\s+/ /g;               # change multi-whitespace to single
    push(@out,$_);         # push the corrected field on to an array
    print "Loop"
  }
  $outputRow = join('|',@out);  # create a pipe-delimited line from the corrected array

  $outputRow =~ s/^\s+//;  # trim leading whitespace
  $outputRow =~ s/\s+$//;  # trim trailing whitespace

  print $outputFH "$outputRow\n";
}

1 answer:

Answer 0 (score: 0)

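According to the documentation, Text::CSV_XS should be able to handle badly broken CSV like this if you set allow_loose_quotes to 1 and set escape_char to anything other than ".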
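
A minimal sketch of that suggestion follows. Note that in the question's constructor call, escape_char and allow_loose_quotes are passed outside the hash reference, so they are probably not being applied at all. The backslash used for escape_char, the helper name clean_fields, and the sample line (a reading of the question's example) are assumptions made for illustration, not something the documentation prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

# All options go inside the single hash reference passed to new().
# escape_char is a backslash here only so that it differs from the
# quote character (an assumption; other characters should work too).
my $csv = Text::CSV_XS->new({
    binary             => 1,
    allow_loose_quotes => 1,
    escape_char        => '\\',
}) or die "Cannot create parser: " . Text::CSV_XS->error_diag();

# Hypothetical helper: parse one line, then delete every double quote
# and comma that survived inside the parsed fields.
sub clean_fields {
    my ($line) = @_;
    $csv->parse($line) or die "Parse error: " . $csv->error_diag();
    my @fields = $csv->fields();
    tr/",//d for @fields;    # strip " and , from each field in place
    return @fields;
}

# Assumed sample line, based on the example in the question.
my $sample = '"11","a"bc,""","s"am,"","t"om,"","15"';
print join('|', clean_fields($sample)), "\n";
```

The same tr/",//d step could be dropped into the question's inner foreach loop next to the existing substitutions, once the parser is constructed with the options inside the hash reference.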