Splitting a CSV file in Java that contains extra commas and extra quotes

Date: 2016-05-11 17:31:37

Tags: java csv split

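I need to parse a single-column CSV file that not only has extra commas, but in which some names also contain extra quotes. I have looked through and read the earlier questions, and one of the best answers is Achintya Jha's answer. However, that solution does not seem to work in my case. One example is the name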

ADAMS COUNTY SHERIFF "ADAMS COUNTY SHERIFF'S OFFICE, CO"

which is being printed as:

ADAMS COUNTY SHERIFF 
"ADAMS COUNTY SHERIFF'S OFFICE, CO"

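It splits in the right places and handles the extra commas, but it does not cope with the extra quotes and now splits there as well, so String csvSplitBy = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"; will not work. Does anyone know another way to handle this in Java? Others have answered this question for other languages, but I could not find anything for Java. Thanks!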
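For reference, the behaviour of that pattern can be checked in isolation. The following is only a minimal sketch: the class name QuoteAwareSplitDemo, the method splitOutsideQuotes, and the multi-column sample line are made up for illustration and are not part of the application above.

```java
import java.util.Arrays;

public class QuoteAwareSplitDemo {
    // The pattern from the question: a comma followed by an even number of
    // double quotes up to the end of the line, i.e. a comma outside a quoted field.
    static final String CSV_SPLIT_BY = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: split one line on commas that are not inside quotes.
    static String[] splitOutsideQuotes(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        return line.split(CSV_SPLIT_BY, -1);
    }

    public static void main(String[] args) {
        // Hypothetical three-column line; the comma inside the quotes is not a separator.
        String multi = "ADS INC.,\"ADS, INC\",IL";
        System.out.println(Arrays.toString(splitOutsideQuotes(multi)));

        // A single quoted field, like the first line of the file in the question.
        String single = "\"ADAMS COUNTY SHERIFF'S OFFICE, CO\"";
        System.out.println(Arrays.toString(splitOutsideQuotes(single)));
    }
}
```

The split only decides where fields end. It leaves the surrounding quotes in each field, so they still have to be stripped in a separate step.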

Here is my Java code:

package csvdatacompareapplication;
import java.io.*;

public class CSVDataCompareApplication {
    public static void main(String[] args) {

        BufferedReader br = null;
        BufferedReader br2 = null;
        String customerListAllCustomers = "C:\\Users\\Desktop\\customerListAllCustomers.csv";
        String customerListToRemove = "C:\\Users\\Desktop\\customerListToRemove.csv";
        String line = "";
        String csvSplitBy = ",";

        try {
            br = new BufferedReader(new FileReader(customerListAllCustomers));
            while ((line = br.readLine()) != null) {
                // use comma as separator
                //String [] customersAll = line.split(csvSplitBy);
                System.out.println(line);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

}

The first few lines of my .CSV file:

ADAMS COUNTY SHERIFF'S OFFICE, CO
ADAMSON POLICE PRODUCTS
ADAN DAVILA
ADAPT SECURE
ADDISON PD - MIKE VINCENT
ADDISON POLICE - IL
ADDISON PORTER
ADIN MCGARVIE
ADMIRAL FIRE & SAFETY
ADMON IRAMIYA
ADRIAN DANG
ADRIAN HUMPHRIES
ADRIAN KEPKA
ADRIAN SALDANA
ADRIAN SOLER
ADRIAN YORK
ADRIENNE BAKER
ADRIENNE MOOS
ADS INC.
ADS, INC

I updated my Java code, and this is what it prints now:

"ADAMS COUNTY SHERIFF'S OFFICE, CO"
ADAMSON POLICE PRODUCTS
ADAN DAVILA
ADAPT SECURE
ADDISON PD - MIKE VINCENT
ADDISON POLICE - IL
ADDISON PORTER
ADIN MCGARVIE
ADMIRAL FIRE & SAFETY
ADMON IRAMIYA
ADRIAN DANG
ADRIAN HUMPHRIES
ADRIAN KEPKA
ADRIAN SALDANA
ADRIAN SOLER
ADRIAN YORK
ADRIENNE BAKER
ADRIENNE MOOS
ADS INC.
"ADS, INC"

Why are the quotes being added?

1 Answer:

Answer 0 (score: 1)

Thanks to Andreas and Tamas Hegedus for helping to clarify the question! Try:

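        br = new BufferedReader(new FileReader(customerListAllCustomers));
        while ((line = br.readLine()) != null) {
            // one column, so there is no need to split on commas
            String line2 = line.replaceAll("^\"", "")   // strip a leading quote
                               .replaceAll("\"$", "")   // strip a trailing quote
                               .replace("\"\"", "\"");  // assumption: embedded quotes are doubled ("")
            System.out.println(line2);
        }

The two replaceAll calls strip the leading quote (^\") and the trailing quote (\"$), and the final replace call then unescapes any quotes that remain. Note that the Java literal "\\\"" is the regex \", which matches a plain quote, so it cannot be used for the unescaping step. The final call instead assumes that embedded quotes are doubled, which is how CSV writers normally escape them.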
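The same clean-up can also be written as a small helper without regular expressions. This is a sketch under the assumption that the file follows the usual CSV convention (a field is wrapped in double quotes and embedded quotes are doubled); the names CsvFieldUnquote and unquote are hypothetical.

```java
public class CsvFieldUnquote {
    // Hypothetical helper: remove one pair of enclosing quotes, if present,
    // and collapse doubled quotes inside the field into single quotes.
    static String unquote(String field) {
        boolean quoted = field.length() >= 2
                && field.startsWith("\"")
                && field.endsWith("\"");
        if (!quoted) {
            return field; // unquoted fields are returned unchanged
        }
        String inner = field.substring(1, field.length() - 1);
        // Assumption: an embedded quote is written as two quotes ("").
        return inner.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        // Sample values taken from the output shown in the question.
        System.out.println(unquote("\"ADAMS COUNTY SHERIFF'S OFFICE, CO\""));
        System.out.println(unquote("ADAMSON POLICE PRODUCTS"));
        System.out.println(unquote("\"ADS, INC\""));
    }
}
```

Unlike the chained replaceAll calls, this only strips quotes when the field both starts and ends with one, so a name that merely begins with a quote is left alone. For files with more than one column, a dedicated CSV parsing library is usually a safer choice than hand-written string handling.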
