Parsing malformed CSV lines

Asked: 2011-08-22 22:45:09

Tags: ruby-on-rails ruby ruby-on-rails-3 csv fastercsv

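I am parsing the CSV lines below. I need to rescue malformed lines such as the one containing "Malformed". What regular expression can I use to do this? What do I need to take into account?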

body = %(
"Sensitive",2416,159,"Test "Malformed" Failure",2789,111,7-24-11,1800,0600,"R2","12323","",""
"Sensitive",2742,107,"Test",2791,112,7-24-11,1800,0600,"R1","","",""
"Sensitive",2700,135,"Test",2792,113,7-24-11,1800,0600,"R1","12110","","")

rows = []
body.each_line do |line|
  begin
    rows << FasterCSV.parse_line(line)
  rescue FasterCSV::MalformedCSVError => e
    rows << line if rescue_from_malformed_line(line)
  rescue => e
    Rails.logger.error(e.to_s)
    Rails.logger.info(line)
  end
end

2 Answers:

Answer 0 (score: 2)

I'm not sure exactly how your data is malformed, but here is one approach.

> puts line
"Sensitive",2416,159,"Test "Malformed" Failure",2789,111,7-24-11,1800,0600,"R2","12323","",""
>
> puts line.scan /[\d.-]+|(?:"[^"]*"[^",]*)+/
"Sensitive"
2416
159
"Test "Malformed" Failure"
2789
111
7-24-11
1800
0600
"R2"
"12323"
""
""

Note: tested on ruby 1.9.2p290.
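
To turn the scan result into plain field values, the outer quotes still have to be removed. The following is a minimal sketch of that step, not part of the original answer: the names FIELD_PATTERN and split_malformed_line are made up for illustration. It inherits the limits of the pattern, which only recognizes quoted fields and fields made of digits, dots, and hyphens. An unquoted text field or an empty unquoted field (two adjacent commas) would be skipped, which shifts the column positions.

```ruby
# Pattern from the answer above: a run of digits/dots/hyphens, or one or more
# quoted chunks, each optionally followed by text containing no quote or comma.
FIELD_PATTERN = /[\d.-]+|(?:"[^"]*"[^",]*)+/

# Hypothetical helper (name chosen for this sketch): splits a line with the
# pattern and strips one leading and one trailing double quote per field.
def split_malformed_line(line)
  line.scan(FIELD_PATTERN).map do |field|
    field.sub(/\A"/, '').sub(/"\z/, '')
  end
end

line = '"Sensitive",2416,159,"Test "Malformed" Failure",2789,111,7-24-11,1800,0600,"R2","12323","",""'
fields = split_malformed_line(line)

puts fields.length
puts fields[3]
```

The inner quotes around Malformed are left in place, so the fourth field keeps its original text.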

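Answer 1 (score: 0)

You can use a regular expression to replace the nested double quotes with single quotes before passing the line to the parser.

Something like this: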
.gsub(/(?<!^|,)"(?!,|$)/,"'")
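
One way this substitution might be wired into the parsing loop from the question is sketched below. It rests on some assumptions: it uses the standard csv library (FasterCSV became Ruby's standard CSV in 1.9) instead of the FasterCSV gem, and the names INNER_QUOTE and parse_leniently are made up for illustration. The substitution is applied only after a normal parse fails, because the pattern would also rewrite correctly escaped doubled quotes ("") inside a well-formed field. It cannot repair a line where a stray quote sits directly next to a comma.

```ruby
require 'csv'

# Pattern from the answer above: a double quote that does not open a field
# (not after line start or a comma) and does not close one (not before a
# comma or line end).
INNER_QUOTE = /(?<!^|,)"(?!,|$)/

# Hypothetical helper: try a normal parse first and fall back to the
# quote-replaced line only when the parser rejects the original.
# If the repaired line is still malformed, the error propagates to the caller.
def parse_leniently(line)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError
  CSV.parse_line(line.gsub(INNER_QUOTE, "'"))
end

body = <<~ROWS
  "Sensitive",2416,159,"Test "Malformed" Failure",2789,111,7-24-11,1800,0600,"R2","12323","",""
  "Sensitive",2742,107,"Test",2791,112,7-24-11,1800,0600,"R1","","",""
ROWS

rows = body.each_line.map { |line| parse_leniently(line) }

puts rows[0][3]
puts rows[1][3]
```

The repaired field reads Test 'Malformed' Failure, so the original double quotes are lost. If they matter, doubling the inner quotes ("") instead of replacing them with single quotes would be an alternative worth trying.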