Parsing CSV with OpenCSV when a quoted field contains double quotes

Time: 2017-01-31 01:42:42

Tags: java csv opencsv

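I'm trying to parse a CSV file with OpenCSV. One of the columns holds data serialized as YAML, and it is quoted because it can contain commas. The YAML also contains quotes, which are escaped by doubling them. I can parse this file easily in Ruby, but with OpenCSV I can't parse it completely. The file is UTF-8 encoded.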

Here is the snippet of my Java code that tries to read the file:

CSVReader reader = new CSVReader(new InputStreamReader(new FileInputStream(csvFilePath), "UTF-8"), ',', '\"', '\\');

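There are two rows in this file. The first row is not parsed correctly: it gets split at ""[Fair Trade Certified]"", presumably because of the escaped double quotes.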

1061658767,update,1196916,Product,28613099,Product::Source,"---
product_attributes:
-
- :name: Ornaments
  :brand_id: 49120
  :size: each
  :alcoholic: false
  :details: ""[Fair Trade Certified]""
  :gluten_free: false
  :kosher: false
  :low_fat: false
  :organic: false
  :sugar_free: false
  :fat_free: false
  :vegan: false
  :vegetarian: false
",,2015-11-01 00:06:19.796944,,,,,,
1061658768,create,,,28613100,Product::Source,"---
product_id:
retailer_id:
store_id:
source_id: 333790
locale: en_us
source_type: Product::PrehistoricProductDatum
priority: 1
is_definition:
product_attributes:
",,2015-11-01 00:06:19.927948,,,,,,
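For reference, the quoting rules involved can be sketched by hand. Under RFC 4180, a field that contains commas, line breaks, or double quotes is enclosed in double quotes, and each embedded double quote is written twice, which is what the YAML column above does. The following is a minimal, lenient sketch (the class `Rfc4180Sketch` and its `parse` method are hypothetical, not part of OpenCSV or any other library) that splits such text into records and fields:

```java
import java.util.ArrayList;
import java.util.List;

class Rfc4180Sketch {

    // Splits CSV text into records of fields using RFC 4180 quoting: a quoted field may
    // contain commas and line breaks, and "" inside it stands for a single ".
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int recordStart = 0; // index where the current record began
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"'); // doubled quote -> one literal quote
                    i++;               // skip the second quote of the pair
                } else if (c == '"') {
                    inQuotes = false;  // closing quote
                } else {
                    field.append(c);   // commas and line breaks are kept verbatim
                }
            } else if (c == '"') {
                inQuotes = true;       // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                fields.add(field.toString());
                field.setLength(0);
                records.add(fields);
                fields = new ArrayList<>();
                recordStart = i + 1;
            } else if (c != '\r') {
                field.append(c);       // ordinary character; a CR outside quotes is dropped
            }
        }
        if (recordStart < text.length()) { // last record had no trailing line break
            fields.add(field.toString());
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "1061658767,update,\"---\n  :details: \"\"[Fair Trade Certified]\"\"\n\",,2015-11-01\n";
        List<List<String>> records = parse(csv);
        System.out.println(records.size());        // 1
        System.out.println(records.get(0).size()); // 5
        System.out.println(records.get(0).get(2)); // YAML text with "" collapsed to "
    }
}
```

On a shortened version of the first row, this yields one record with five fields, and the third field contains `"[Fair Trade Certified]"` with the doubled quotes collapsed. The sketch does not validate its input: a quote in the middle of an unquoted field simply opens a quoted section.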

2 answers:

Answer 0 (score: 2)

The solution is to use an RFC4180-compliant CSV parser. I had been using OpenCSV's CSVReader and it didn't work; maybe I just couldn't get it configured correctly.

I used FastCSV, an RFC4180 CSV parser, and it worked seamlessly.

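import de.siegmar.fastcsv.reader.CsvContainer;
import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRow;
import java.io.File;
import java.nio.charset.StandardCharsets;

File file = new File(csvFilePath);
CsvReader csvReader = new CsvReader();
CsvContainer csv = csvReader.read(file, StandardCharsets.UTF_8); // throws IOException
for (CsvRow row : csv.getRows()) {
    System.out.println(row.getFieldCount()); // number of fields parsed for this row
}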
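To see what "RFC4180-compliant" implies, it helps to look at the writing side: a compliant reader has to undo exactly this transformation by stripping the outer quotes and turning each `""` back into `"`. A minimal sketch, using a hypothetical helper `quoteField` (not part of FastCSV's or OpenCSV's API):

```java
class Rfc4180Quote {

    // Quotes a single field the way RFC 4180 expects a writer to: if the value contains a
    // comma, a double quote, CR or LF, wrap it in double quotes and double each embedded quote.
    static String quoteField(String value) {
        boolean needsQuotes = value.indexOf(',') >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // prints Product::Source (nothing to quote)
        System.out.println(quoteField("Product::Source"));
        // prints ":details: ""[Fair Trade Certified]"""
        System.out.println(quoteField(":details: \"[Fair Trade Certified]\""));
    }
}
```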

Answer 1 (score: 0)

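First, I'm glad FastCSV worked for you, but I took the suspect substring, ran it through openCSV 3.9, and it parsed correctly with both CSVParser and RFC4180Parser. Could you give more detail on how it is being parsed, and/or try openCSV 3.9 to see whether you hit the same problem? If you do, try the configurations below.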

Here are the tests I used:

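CSVParser (JUnit test):

import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import com.opencsv.enums.CSVReaderNullFieldIndicator;
import org.junit.Test;

import java.io.IOException;
import java.io.StringReader;

import static org.junit.Assert.assertEquals;

@Test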
public void parseBigStringFromStackOverflowWithMultipleQuotesInLine() throws IOException {

    String bigline = "28613099,Product::Source,\"---\n" +
            "product_attributes:\n" +
            "-\n" +
            "- :name: Ornaments\n" +
            "  :brand_id: 49120\n" +
            "  :size: each\n" +
            "  :alcoholic: false\n" +
            "  :details: \"\"[Fair Trade Certified]\"\"\n" +
            "  :gluten_free: false\n" +
            "  :kosher: false\n" +
            "  :low_fat: false\n" +
            "  :organic: false\n" +
            "  :sugar_free: false\n" +
            "  :fat_free: false\n" +
            "  :vegan: false\n" +
            "  :vegetarian: false\n" +
            "\",,2015-11-01 00:06:19.796944";

    String suspectString = "---\n" +
            "product_attributes:\n" +
            "-\n" +
            "- :name: Ornaments\n" +
            "  :brand_id: 49120\n" +
            "  :size: each\n" +
            "  :alcoholic: false\n" +
            "  :details: \"[Fair Trade Certified]\"\n" +
            "  :gluten_free: false\n" +
            "  :kosher: false\n" +
            "  :low_fat: false\n" +
            "  :organic: false\n" +
            "  :sugar_free: false\n" +
            "  :fat_free: false\n" +
            "  :vegan: false\n" +
            "  :vegetarian: false\n" ;

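    StringReader stringReader = new StringReader(bigline);

    // Default CSVParser: ',' separator and '"' quote; a doubled "" inside a quoted field is read as one "
    CSVReaderBuilder builder = new CSVReaderBuilder(stringReader);
    // BOTH: empty fields, whether unquoted (,,) or quoted (""), are returned as null
    CSVReader csvReader = builder.withFieldAsNull(CSVReaderNullFieldIndicator.BOTH).build();

    String[] item = csvReader.readNext();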

    assertEquals(5, item.length);
    assertEquals("28613099", item[0]);
    assertEquals("Product::Source", item[1]);
    assertEquals(suspectString, item[2]);
}

RFC4180Parser (Spock test, Groovy):

import com.opencsv.RFC4180Parser
import com.opencsv.RFC4180ParserBuilder

def 'parse big line from stackoverflow with complex string'() {
    given:
    RFC4180ParserBuilder builder = new RFC4180ParserBuilder()
    RFC4180Parser parser = builder.build()
    String bigline = "28613099,Product::Source,\"---\n" +
            "product_attributes:\n" +
            "-\n" +
            "- :name: Ornaments\n" +
            "  :brand_id: 49120\n" +
            "  :size: each\n" +
            "  :alcoholic: false\n" +
            "  :details: \"\"[Fair Trade Certified]\"\"\n" +
            "  :gluten_free: false\n" +
            "  :kosher: false\n" +
            "  :low_fat: false\n" +
            "  :organic: false\n" +
            "  :sugar_free: false\n" +
            "  :fat_free: false\n" +
            "  :vegan: false\n" +
            "  :vegetarian: false\n" +
            "\",,2015-11-01 00:06:19.796944"

    String suspectString = "---\n" +
            "product_attributes:\n" +
            "-\n" +
            "- :name: Ornaments\n" +
            "  :brand_id: 49120\n" +
            "  :size: each\n" +
            "  :alcoholic: false\n" +
            "  :details: \"[Fair Trade Certified]\"\n" +
            "  :gluten_free: false\n" +
            "  :kosher: false\n" +
            "  :low_fat: false\n" +
            "  :organic: false\n" +
            "  :sugar_free: false\n" +
            "  :fat_free: false\n" +
            "  :vegan: false\n" +
            "  :vegetarian: false\n"

    when:
    String[] values = parser.parseLine(bigline)

    then:
    values.length == 5
    values[0] == "28613099"
    values[1] == "Product::Source"
    values[2] == suspectString
}
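The RFC4180Parser test passes the whole multi-line record to `parseLine()` as one string. When reading the file itself, something first has to decide where a record ends, because the YAML field spans many physical lines. One simple approach for well-formed RFC 4180 input is to count double quotes: every quote belongs to an opening/closing pair or to a doubled `""`, so a line break ends the record only when the running count is even. A minimal sketch under that assumption (the class `LogicalRecords` is hypothetical and not taken from OpenCSV's source):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LogicalRecords {

    // Groups physical lines into logical CSV records. In well-formed RFC 4180 input a record
    // is complete exactly when the number of '"' characters read so far is even.
    static List<String> readRecords(BufferedReader in) throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        long quotes = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (pending.length() > 0) {
                pending.append('\n'); // restore the line break that readLine() removed
            }
            pending.append(line);
            quotes += line.chars().filter(ch -> ch == '"').count();
            if (quotes % 2 == 0) {    // not inside a quoted field: record is complete
                records.add(pending.toString());
                pending.setLength(0);
                quotes = 0;
            }
        }
        if (pending.length() > 0) {
            records.add(pending.toString()); // unterminated quote: return what is left
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String csv = "1,\"---\n:details: \"\"[Fair Trade Certified]\"\"\n\",x\n2,plain\n";
        List<String> records = readRecords(new BufferedReader(new StringReader(csv)));
        System.out.println(records.size()); // 2
    }
}
```

Each returned string could then be handed to a single-record parser such as `RFC4180Parser.parseLine()`. Because `readLine()` drops line terminators, a CRLF inside a quoted field comes back as a plain `\n` in this sketch.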