Getting hadoop ChecksumException: Checksum error

Date: 2012-09-18 00:28:24

Tags: java hadoop hdfs

We are trying to copy files from the local filesystem to Hadoop, but occasionally we get:

org.apache.hadoop.fs.ChecksumException: Checksum error: /crawler/twitcher/tmp/twitcher715632000093292278919867391792973804/Televisions_UK.20120912 at 0
    at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
    at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
    at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:45)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:98)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:224)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1119)
    at mcompany.HadoopTransfer.copyToHadoop(HadoopTransfer.java:81)
    at mcompany.apps.Start.pushResultFileToSubfolder(Start.java:498)
    at mcompany.apps.Start.run(Start.java:299)
    at mcompany.apps.Start.main(Start.java:89)
    at mcompany.apps.scheduler.CrawlerJobRoutine.execute(CrawlerJobRoutine.java:15)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)

ERROR 2012-09-17 16:45:49,991 [amzn_mkpl_Worker-1] mcompany.apps.Start - Unable to push the file to the outbound location

The exception is thrown during the copyFromLocalFile call. If we delete the local .crc file, the copy works fine. Can anyone suggest why this crc problem occurs? Thank you very much.
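
For reference, this is roughly the kind of call involved (a minimal sketch; the class name and paths are assumptions modeled on the stack trace above, not the actual mcompany code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HadoopTransferSketch {
        // Hypothetical equivalent of HadoopTransfer.copyToHadoop from the stack trace.
        public static void copyToHadoop(String localFile, String hdfsDir) throws Exception {
            Configuration conf = new Configuration();
            FileSystem hdfs = FileSystem.get(conf);

            Path src = new Path(localFile);              // local source file
            Path dst = new Path(hdfsDir, src.getName()); // target path on HDFS

            // The local side of this copy is read through ChecksumFileSystem/FSInputChecker,
            // which verifies the data against the sidecar ".<name>.crc" file if one exists.
            // A .crc written before the data file was last modified fails this verification.
            hdfs.copyFromLocalFile(src, dst);
        }
    }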

1 answer:

Answer 0 (score: 1)

You should check whether the algorithm used to compute the crc matches the one used by your version of HDFS, i.e. whether the checksum stored in the local .crc file is still valid for the current contents of the data file.
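
Building on that, and on the observation in the question that removing the .crc file makes the copy succeed, one workaround is to copy from the raw local filesystem so a stale sidecar .crc is never consulted. A sketch under those assumptions (the paths are placeholders; this bypasses local checksum verification rather than fixing the stale .crc itself):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyIgnoringLocalCrc {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem hdfs = FileSystem.get(conf);
            LocalFileSystem local = FileSystem.getLocal(conf);

            Path src = new Path("/local/tmp/somefile");   // placeholder local source
            Path dst = new Path("/hdfs/outbound/");        // placeholder HDFS target dir

            // getRawFileSystem() skips ChecksumFileSystem, so any stale ".<name>.crc"
            // next to the source file is ignored during the read.
            FileUtil.copy(local.getRawFileSystem(), src, hdfs, dst, false /* deleteSource */, conf);
        }
    }

If the data file itself may actually be corrupt, the safer fix is to delete or regenerate the stale .crc and keep checksum verification enabled.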