Univocity: problem with file header validation when reading the same file multiple times

Time: 2018-07-03 16:52:44

Tags: univocity

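I am using the bean iterator of Univocity-Parsers to read each line of a file and turn it into a bean. When I read the same file multiple times, I see strange behavior in the library.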

Code when passing a File object to the CsvParser instance:

private static void testBeanIterator() throws Exception {

        try {

            File sampleFile = generateFile(0);
            /*
            System.out.println("Sample file content = " + FileUtils.readFileToString(sampleFile,
                    Charset.defaultCharset()));
                    */
            for (int i = 0; i < 1000; i++) {

                BufferedReader reader =
                        new BufferedReader(new InputStreamReader(new FileInputStream(sampleFile),
                                StandardCharsets.UTF_8));

                AtomicInteger atomicInteger = new AtomicInteger();

                final BeanProcessor<CustomerSegmentMapping> rowProcessor =
                        new BeanProcessor<CustomerSegmentMapping>(CustomerSegmentMapping.class) {

                    @Override
                    public void beanProcessed(@Nonnull final CustomerSegmentMapping customerSegmentMapping,
                            @Nonnull final ParsingContext context) {

                        try {

                            System.out.println(OBJECT_MAPPER.writeValueAsString(customerSegmentMapping));
                            atomicInteger.getAndAdd(1);

                        } catch (Exception ex) {
                            // Keep the original exception as the cause so the failure can be traced.
                            throw new RuntimeException("error", ex);
                        }
                    }
                };
                rowProcessor.setStrictHeaderValidationEnabled(true);

                final CsvParserSettings parserSettings = new CsvParserSettings();
                parserSettings.setRowProcessor(rowProcessor);
                parserSettings.setHeaderExtractionEnabled(true);

                final CsvParser parser = new CsvParser(parserSettings);
                //parser.parse(reader);
                parser.parse(sampleFile);

                System.out.println("Finished parser");

                if (atomicInteger.get() != 10) {
                    throw new Exception("mismatch");
                }

                reader.close();
            }
        } catch (Exception ex) {

            throw new RuntimeException("exception = " + ex.getMessage(), ex);
        } finally {

        }
}

When the code runs, the console output is as follows:

{"customerId":"6bc12a7a-2c28-4aea-a7be-6be45e16ffb2","segmentId":"S1"}
{"customerId":"da736310-e508-47ff-92b8-59d490e37a72","segmentId":"S1"}
{"customerId":"9a5d4454-e6d4-49a5-bb04-8354154d0493","segmentId":"S1"}
{"customerId":"ec2ed5cc-cd18-443b-bd69-e56fc09ba0f5","segmentId":"S1"}
{"customerId":"94ea24b0-0c83-4039-a391-1d2439c88be8","segmentId":"S1"}
{"customerId":"2baef5f9-d8cd-451d-b579-a626cb58b284","segmentId":"S1"}
{"customerId":"022a184b-1b06-49aa-b1c4-b94a6f343b04","segmentId":"S1"}
{"customerId":"bcb3984c-0495-4da8-b146-9af3983cc158","segmentId":"S1"}
{"customerId":"feef62de-1aaf-43d4-a83b-afe053db97cf","segmentId":"S1"}
{"customerId":"5825c924-55d5-4fd6-8468-ca36d47a7cae","segmentId":"S1"}

Finished parser

{"customerId":"6bc12a7a-2c28-4aea-a7be-6be45e16ffb2","segmentId":"S1"}
{"customerId":"da736310-e508-47ff-92b8-59d490e37a72","segmentId":"S1"}
{"customerId":"9a5d4454-e6d4-49a5-bb04-8354154d0493","segmentId":"S1"}
{"customerId":"ec2ed5cc-cd18-443b-bd69-e56fc09ba0f5","segmentId":"S1"}
{"customerId":"94ea24b0-0c83-4039-a391-1d2439c88be8","segmentId":"S1"}
{"customerId":"2baef5f9-d8cd-451d-b579-a626cb58b284","segmentId":"S1"}
{"customerId":"022a184b-1b06-49aa-b1c4-b94a6f343b04","segmentId":"S1"}
{"customerId":"bcb3984c-0495-4da8-b146-9af3983cc158","segmentId":"S1"}
{"customerId":"feef62de-1aaf-43d4-a83b-afe053db97cf","segmentId":"S1"}
{"customerId":"5825c924-55d5-4fd6-8468-ca36d47a7cae","segmentId":"S1"}

Finished parser

{"customerId":"6bc12a7a-2c28-4aea-a7be-6be45e16ffb2","segmentId":"S1"}
{"customerId":"da736310-e508-47ff-92b8-59d490e37a72","segmentId":"S1"}
{"customerId":"9a5d4454-e6d4-49a5-bb04-8354154d0493","segmentId":"S1"}
{"customerId":"ec2ed5cc-cd18-443b-bd69-e56fc09ba0f5","segmentId":"S1"}
{"customerId":"94ea24b0-0c83-4039-a391-1d2439c88be8","segmentId":"S1"}
{"customerId":"2baef5f9-d8cd-451d-b579-a626cb58b284","segmentId":"S1"}
{"customerId":"022a184b-1b06-49aa-b1c4-b94a6f343b04","segmentId":"S1"}
{"customerId":"bcb3984c-0495-4da8-b146-9af3983cc158","segmentId":"S1"}
{"customerId":"feef62de-1aaf-43d4-a83b-afe053db97cf","segmentId":"S1"}
{"customerId":"5825c924-55d5-4fd6-8468-ca36d47a7cae","segmentId":"S1"}

Finished parser

Exception in thread "main" java.lang.RuntimeException: exception = Could not find fields [CustomerId]' in input. Names found: [ustomerId, SegmentId]
Internal state when error was thrown: line=2, column=0, record=1, charIndex=60, headers=[ustomerId, SegmentId]
    at com.poppins.cube.common.UnivocityNahiHatanaHai.testBeanIterator(UnivocityNahiHatanaHai.java:95)
    at com.poppins.cube.common.UnivocityNahiHatanaHai.main(UnivocityNahiHatanaHai.java:37)
Caused by: com.univocity.parsers.common.DataProcessingException: Could not find fields [CustomerId]' in input. Names found: [ustomerId, SegmentId]
Internal state when error was thrown: line=2, column=0, record=1, charIndex=60, headers=[ustomerId, SegmentId]
    at com.univocity.parsers.common.processor.core.BeanConversionProcessor.mapFieldIndexes(BeanConversionProcessor.java:414)
    at com.univocity.parsers.common.processor.core.BeanConversionProcessor.mapValuesToFields(BeanConversionProcessor.java:340)
    at com.univocity.parsers.common.processor.core.BeanConversionProcessor.createBean(BeanConversionProcessor.java:508)
    at com.univocity.parsers.common.processor.core.AbstractBeanProcessor.rowProcessed(AbstractBeanProcessor.java:54)
    at com.univocity.parsers.common.Internal.process(Internal.java:21)
    at com.univocity.parsers.common.AbstractParser.rowProcessed(AbstractParser.java:596)
    at com.univocity.parsers.common.AbstractParser.parse(AbstractParser.java:133)
    at com.univocity.parsers.common.AbstractParser.parse(AbstractParser.java:605)
    at com.poppins.cube.common.UnivocityNahiHatanaHai.testBeanIterator(UnivocityNahiHatanaHai.java:83)
    ... 1 more

Process finished with exit code 1

The contents of the file are as follows:

CustomerId,SegmentId
6bc12a7a-2c28-4aea-a7be-6be45e16ffb2,S1
da736310-e508-47ff-92b8-59d490e37a72,S1
9a5d4454-e6d4-49a5-bb04-8354154d0493,S1
ec2ed5cc-cd18-443b-bd69-e56fc09ba0f5,S1
94ea24b0-0c83-4039-a391-1d2439c88be8,S1
2baef5f9-d8cd-451d-b579-a626cb58b284,S1
022a184b-1b06-49aa-b1c4-b94a6f343b04,S1
bcb3984c-0495-4da8-b146-9af3983cc158,S1
feef62de-1aaf-43d4-a83b-afe053db97cf,S1
5825c924-55d5-4fd6-8468-ca36d47a7cae,S1

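As far as I understand, the problem occurs because I pass a File object to CsvParser. CsvParser internally creates an InputStream that is never closed. If I pass a BufferedReader instead of the File object, the problem does not occur.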

I can't tell whether this is a known issue in Univocity-Parsers or whether there is something I am missing.

1 answer:

Answer 0 (score: 1)

Author of the library here. I can see that your exception reports the header as ustomerId instead of CustomerId.

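This looks like a bug introduced in version 2.5.0 which, if I remember correctly, was fixed in version 2.5.6. It bothered me for a while because it was an internal concurrency issue that was hard to track down. Basically, when you pass a File without an explicit encoding, the parser tries to find a UTF BOM marker in the input (effectively consuming the first character) to determine the encoding automatically. This only happens with InputStreams and Files.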
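To illustrate why a consumed probe character produces a header such as ustomerId, here is a minimal sketch that uses only the standard library. It is not the library's code, and it does not reproduce the concurrency aspect of the real bug. The class `BomSniffSketch`, the method `readSkippingBom` and the `restoreProbe` flag are hypothetical names made up for this example. The sketch checks for a UTF-8 BOM (bytes EF BB BF) and shows what happens when the probed byte is not handed back to the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BomSniffSketch {

    // Hypothetical helper: skips a UTF-8 BOM if one is present and returns the rest as text.
    // With restoreProbe == false, a non-BOM first byte is read and never pushed back,
    // which simulates "consuming the first character".
    static String readSkippingBom(byte[] input, boolean restoreProbe) {
        try (PushbackInputStream in =
                     new PushbackInputStream(new ByteArrayInputStream(input), 3)) {
            int first = in.read();
            if (first == 0xEF) {
                int second = in.read();
                int third = in.read();
                boolean isBom = second == 0xBB && third == 0xBF;
                if (!isBom) {
                    // Not a BOM after all: push the bytes back in reverse order.
                    if (third != -1) in.unread(third);
                    if (second != -1) in.unread(second);
                    in.unread(first);
                }
            } else if (first != -1 && restoreProbe) {
                // Correct behavior: hand the probed byte back to the stream.
                in.unread(first);
            }
            // Read whatever is left in the stream.
            ByteArrayOutputStream rest = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                rest.write(b);
            }
            return new String(rest.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] header = "CustomerId,SegmentId".getBytes(StandardCharsets.UTF_8);
        System.out.println(readSkippingBom(header, true));   // prints CustomerId,SegmentId
        System.out.println(readSkippingBom(header, false));  // prints ustomerId,SegmentId
    }
}
```

With an explicit encoding there is nothing to probe, so this class of problem cannot arise.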

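Anyway, this has been fixed, so simply updating to the latest version should solve the problem for you (let me know if you are not using version 2.5).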

If you want to stay on your current version, call

parser.parse(sampleFile, Charset.defaultCharset());

This prevents the parser from trying to discover whether the file contains a BOM marker, and so avoids that nasty bug.
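The same idea, fixing the encoding up front so that nothing has to be detected, can be sketched with the standard library alone. This is only an illustration under assumptions: the class `ExplicitCharsetSketch` and the helpers `writeSample` and `readHeader` are hypothetical names, and the sample file holds just two lines from the question. A reader opened this way could also be handed to `parser.parse(reader)`, the variant the question reports as not showing the problem.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ExplicitCharsetSketch {

    // Hypothetical helper: writes a small sample file similar to the one in the question.
    static Path writeSample() {
        try {
            Path file = Files.createTempFile("sample", ".csv");
            Files.write(file,
                    Arrays.asList("CustomerId,SegmentId",
                            "6bc12a7a-2c28-4aea-a7be-6be45e16ffb2,S1"),
                    StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: reads the header line with a caller-supplied charset.
    // No encoding detection takes place, and try-with-resources closes the reader.
    static String readHeader(Path file, Charset charset) {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line = reader.readLine();
            return line == null ? "" : line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path sample = writeSample();
        // Read the same file repeatedly, as the loop in the question does.
        for (int i = 0; i < 1000; i++) {
            String header = readHeader(sample, StandardCharsets.UTF_8);
            if (!"CustomerId,SegmentId".equals(header)) {
                throw new IllegalStateException("unexpected header: " + header);
            }
        }
        System.out.println("Header was intact on every read");
        Files.delete(sample);
    }
}
```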

Hope this helps.
