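I need to parse a single-column CSV file that not only contains extra commas but also has some names with extra quotation marks. I have read through the earlier questions on this, and one of the best answers is Achintya Jha's answer. However, that solution does not seem to work in my case. One example is the name
ADAMS COUNTY SHERIFF "ADAMS COUNTY SHERIFF'S OFFICE, CO"
which is printed as:
ADAMS COUNTY SHERIFF
"ADAMS COUNTY SHERIFF'S OFFICE, CO"
It handles the extra comma correctly, but it does not account for the extra quotation marks and now splits there as well, so String csvSplitBy = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";
will not work. Does anyone know another way to handle this in Java? Others have posted answers to this problem for other languages, but I could not find one for Java. Thanks!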
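For reference, here is a minimal sketch of what that pattern does. It splits on a comma only when an even number of double quotes follows it, which means the comma sits outside any quoted section. The class name, the helper method, and the sample line are made up for illustration and are not taken from the file in question.

```java
import java.util.Arrays;

class QuoteAwareSplitDemo {
    // The pattern from the question: a comma followed by an even number of
    // double quotes up to the end of the line, i.e. a comma outside quotes.
    static final String CSV_SPLIT_BY = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] split(String line) {
        return line.split(CSV_SPLIT_BY);
    }

    public static void main(String[] args) {
        // Hypothetical sample line: a,"b,c",d
        String line = "a,\"b,c\",d";
        // prints [a, "b,c", d]
        System.out.println(Arrays.toString(split(line)));
    }
}
```

Note that the quoted token keeps its surrounding quotation marks. The pattern only decides where to split and does not unquote anything, and it relies on the quotes in a line being balanced.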
package csvdatacompareapplication;

import java.io.*;

public class CSVDataCompareApplication {

    public static void main(String[] args) {
        BufferedReader br = null;
        BufferedReader br2 = null;
        String customerListAllCustomers = "C:\\Users\\Desktop\\customerListAllCustomers.csv";
        String customerListToRemove = "C:\\Users\\Desktop\\customerListToRemove.csv";
        String line = "";
        String csvSplitBy = ",";

        try {
            br = new BufferedReader(new FileReader(customerListAllCustomers));
            while ((line = br.readLine()) != null) {
                // use comma as separator
                //String [] customersAll = line.split(csvSplitBy);
                System.out.println(line);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
ADAMS COUNTY SHERIFF'S OFFICE, CO
ADAMSON POLICE PRODUCTS
ADAN DAVILA
ADAPT SECURE
ADDISON PD - MIKE VINCENT
ADDISON POLICE - IL
ADDISON PORTER
ADIN MCGARVIE
ADMIRAL FIRE & SAFETY
ADMON IRAMIYA
ADRIAN DANG
ADRIAN HUMPHRIES
ADRIAN KEPKA
ADRIAN SALDANA
ADRIAN SOLER
ADRIAN YORK
ADRIENNE BAKER
ADRIENNE MOOS
ADS INC.
ADS, INC
I updated my Java code, and this is what it prints now:
"ADAMS COUNTY SHERIFF'S OFFICE, CO"
ADAMSON POLICE PRODUCTS
ADAN DAVILA
ADAPT SECURE
ADDISON PD - MIKE VINCENT
ADDISON POLICE - IL
ADDISON PORTER
ADIN MCGARVIE
ADMIRAL FIRE & SAFETY
ADMON IRAMIYA
ADRIAN DANG
ADRIAN HUMPHRIES
ADRIAN KEPKA
ADRIAN SALDANA
ADRIAN SOLER
ADRIAN YORK
ADRIENNE BAKER
ADRIENNE MOOS
ADS INC.
"ADS, INC"
Why are the quotation marks being added?
Answer 0 (score: 1)
Thanks to Andreas and Tamas Hegedus for helping clarify the question! Try:
br = new BufferedReader(new FileReader(customerListAllCustomers));
while ((line = br.readLine()) != null) {
    // one column, so don't need to use comma as separator
    String line2 = line.replaceAll("^\"","").replaceAll("\"$","").replaceAll("\\\"","\"");
    System.out.println(line2);
}
The replaceAll calls strip the leading quote (^\") and the trailing quote (\"$), then unescape the remaining quotes (\\\").
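One caveat about the snippet above: the Java literal "\\\"" is the regex \", which matches a plain double quote, so the third replaceAll replaces every quote with itself and leaves the text unchanged. If the file follows the common CSV convention of escaping a quote inside a quoted field by doubling it (an assumption, since the raw file is not shown), a sketch along these lines may be closer to what is wanted. The class and method names are hypothetical, and the third sample value is made up.

```java
class CsvUnquote {
    // Strips one pair of enclosing double quotes, if present, and turns each
    // doubled quote ("") inside the field back into a single quote.
    // Assumes quotes are escaped by doubling, as in RFC 4180 style CSV.
    static String unquote(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            String inner = field.substring(1, field.length() - 1);
            return inner.replace("\"\"", "\"");
        }
        return field; // fields without enclosing quotes are returned unchanged
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"ADS, INC\""));             // ADS, INC
        System.out.println(unquote("ADS INC."));                 // ADS INC.
        // Hypothetical value with doubled quotes: "SAY ""HI"", CO"
        System.out.println(unquote("\"SAY \"\"HI\"\", CO\""));   // SAY "HI", CO
    }
}
```

Inside the read loop this would be called as unquote(line) in place of the chained replaceAll calls. It uses String.replace rather than replaceAll because no regular expression is needed here.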