ERROR org.apache.sqoop.tool.ExportTool - Error during export: Export job failed

Date: 2013-08-23 11:42:20

Tags: mysql hadoop sqoop

We are trying to export data from HDFS to MySQL using Sqoop and are running into the following problem.

Sample data:

4564,38,153,2013-05-30 10:40:42.767,false,No credentials attempted,,,00 00 00 00 01 64 e6 a6

4565,38,160,2013-05-30 10:40:42.767,false,No credentials attempted,,,00 00 00 00 01 64 e6 a7

4566,38,80,2013-03-07 12:16:26.03,false,No SSH or Telnet credentials available. If an HTTP(S) exists for this asset, it was not able to authenticate.,,,00 00 00 00 01 0f c7 e6

The following Sqoop program is what we use to export the data from HDFS to MySQL; the schema is already defined on the target table:

public static void main(String[] args) {

        String[] str = { "export", "--connect", "jdbc:mysql://-------/test", 
            "--table", "status", "--username", "root", "--password", "******", 
            "--export-dir", "hdfs://-----/user/hdfs/InventoryCategoryStatus/", 
            "--input-fields-terminated-by", ",", "--input-lines-terminated-by", "\n"
            };

        Sqoop.runTool(str);
    }

Error after running the program:

[exec:exec]
0    [main] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
123  [main] WARN  org.apache.sqoop.tool.BaseSqoopTool  - Setting your password on the command-line is insecure. Consider using -P instead.
130  [main] WARN  org.apache.sqoop.ConnFactory  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
Note: /tmp/sqoop-manish/compile/fd0060344195ec9b06030b84cdf6e243/status.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
9516 [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11166 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.jar is deprecated. Instead, use mapreduce.job.jar
16598 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
16612 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
16614 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16618 [main] WARN  org.apache.sqoop.mapreduce.JobBase  - SQOOP_HOME is unset. May not be able to find all job dependencies.
17074 [main] WARN  org.apache.hadoop.conf.Configuration  - session.id is deprecated. Instead, use dfs.metrics.session-id
17953 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files
17956 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
17957 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
17958 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
17959 [main] WARN  org.apache.hadoop.conf.Configuration  - mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
17959 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.job.name is deprecated. Instead, use mapreduce.job.name
17959 [main] WARN  org.apache.hadoop.conf.Configuration  - mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
17960 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
17960 [main] WARN  org.apache.hadoop.conf.Configuration  - mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
17960 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
17961 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
17961 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
19283 [main] WARN  org.apache.hadoop.mapred.LocalDistributedCacheManager  - LocalJobRunner does not support symlinking into current working dir.
19312 [main] WARN  org.apache.hadoop.conf.Configuration  - mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
20963 [Thread-29] WARN  org.apache.hadoop.mapred.LocalJobRunner  - job_local_0001
java.lang.Exception: java.lang.NumberFormatException: For input string: " it was not able to authenticate."
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
Caused by: java.lang.NumberFormatException: For input string: " it was not able to authenticate."
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:481)
    at java.lang.Integer.valueOf(Integer.java:582)
    at status.__loadFromFields(status.java:412)
    at status.parse(status.java:334)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:77)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:36)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:183)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
21692 [main] WARN  mapreduce.Counters  - Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
21698 [main] WARN  mapreduce.Counters  - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
21699 [main] ERROR org.apache.sqoop.tool.ExportTool  - Error during export: Export job failed!
------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time: 30.419s
Finished at: Fri Aug 23 15:28:03 IST 2013
Final Memory: 14M/113M

Afterwards we checked the MySQL table and found that only 100 of the 1600 records had been exported. Running the same program against other tables, only 6800 of 8000 records and only 235202 of 376927 records made it into MySQL. Can anyone offer some advice on what is going wrong with this export?

Looking forward to your replies; your help is much appreciated.

2 Answers:

Answer 0 (score: 2)

Looking at your sample, it seems that you are using a comma as the column (field) delimiter while also allowing commas inside the data itself. Notice the third row of your sample data:

4566,38,80,2013-03-07 12:16:26.03,false,No SSH or Telnet credentials available. If an HTTP(S) exists for this asset, it was not able to authenticate.,,,00 00 00 00 01 0f c7 e6

The sixth column ("No SSH ...") contains commas, so Sqoop splits it into two separate columns and you get the exception. I would suggest cleaning up your data. If you imported it into HDFS with Sqoop, you can use the arguments --enclosed-by or --escaped-by to avoid this problem.
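A minimal Java sketch (plain string splitting, not Sqoop internals) of why the third sample row fails: splitting the clean row on commas yields the expected 9 fields, while the row whose message column itself contains a comma yields 10, shifting a text fragment into a position Sqoop tries to parse as an integer.

```java
// Demonstrates the field-count mismatch caused by a comma inside column 6.
public class DelimiterDemo {
    public static void main(String[] args) {
        String good = "4564,38,153,2013-05-30 10:40:42.767,false,"
            + "No credentials attempted,,,00 00 00 00 01 64 e6 a6";
        String bad = "4566,38,80,2013-03-07 12:16:26.03,false,"
            + "No SSH or Telnet credentials available. If an HTTP(S) exists "
            + "for this asset, it was not able to authenticate.,,,"
            + "00 00 00 00 01 0f c7 e6";

        // limit -1 keeps trailing empty fields, matching how Sqoop sees the row
        System.out.println(good.split(",", -1).length); // 9 fields, as expected
        System.out.println(bad.split(",", -1).length);  // 10 fields: column 6 was split in two
    }
}
```

The stray field " it was not able to authenticate." is exactly the string in the NumberFormatException above.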

Answer 1 (score: 0)

It looks like you have a string (" it was not able to authenticate.") where a number was expected (as I can see from the stack trace you shared). Please check the source data that is being pushed to the database.

Edit

Use a different character as the delimiter. When the data is written to HDFS (I assume an MR program is producing it), use a rare character such as ^A, #, or @ as the field separator.

The 'export' command has various options such as '--enclosed-by' and '--escaped-by', but your data has to be prepared accordingly. The simplest option is to pick a delimiter that is very unlikely to occur in the data.
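A sketch of the rare-delimiter approach: if the records were written with Ctrl-A ('\u0001', Hive's default separator) between fields, a comma inside a free-text column no longer affects the split, and numeric columns parse cleanly.

```java
// Demonstrates that a Ctrl-A field separator is immune to commas in the data.
public class CtrlADemo {
    public static void main(String[] args) {
        String sep = "\u0001"; // Ctrl-A, very unlikely to appear in real data
        String record = String.join(sep,
            "4566", "38", "80", "2013-03-07 12:16:26.03", "false",
            "No SSH or Telnet credentials available. If an HTTP(S) exists "
                + "for this asset, it was not able to authenticate.",
            "", "", "00 00 00 00 01 0f c7 e6");

        String[] fields = record.split(sep, -1);
        System.out.println(fields.length);               // 9: the comma in column 6 is harmless
        System.out.println(Integer.parseInt(fields[2])); // 80: numeric columns parse cleanly
    }
}
```

On the Sqoop side, the matching export flag would be --input-fields-terminated-by '\001' instead of ','.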

Edit 2: In this case no tool can do anything, because the delimiter character appears inside the data fields without any escape character or enclosing quotes (as in "Hello, How are you"). You need to control the data when it is stored. So while ingesting via Flume you should either use a delimiter other than ',', or escape the ',' (as in "Hello \, How are you"), or enclose every field ("Hello, How are you").

So you should handle this while extracting and storing the data via Flume. Explore whether Flume has any options for achieving this.

Alternatively, you can write an MR program to clean up or filter out the problem records (and handle them separately), or load the data into a staging table in MySQL and write a stored procedure that resolves the problem-record scenarios and inserts into the target table.
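The cleanup/filter idea above could be sketched as follows. This is a hypothetical helper, not Sqoop or Flume code: it keeps only lines that split into exactly the expected number of columns and routes everything else to a reject list for separate handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-export cleanup: separate well-formed rows from rows
// whose free-text column contains the delimiter.
public class RecordFilter {
    static final int EXPECTED_COLUMNS = 9; // per the sample schema above

    static List<String> keepWellFormed(List<String> lines, List<String> rejects) {
        List<String> good = new ArrayList<>();
        for (String line : lines) {
            // limit -1 preserves trailing empty columns
            if (line.split(",", -1).length == EXPECTED_COLUMNS) {
                good.add(line);
            } else {
                rejects.add(line); // fix and re-export these separately
            }
        }
        return good;
    }
}
```

The same check could live in a Mapper's map() method, emitting good rows to the main output and bad rows to a side output.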