How do I dump my data?

Asked: 2014-07-27 15:56:48

Tags: hadoop apache-pig

I have installed Pig, and I am loading a tab-delimited file in the grunt shell as follows:

grunt> boys = LOAD '/user/pig_input/student-boys.txt' USING PigStorage ('\t') AS (name:chararray,state:chararray,attendance:float);

2014-07-27 11:54:04,414 [main] WARN  org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-27 11:54:04,414 [main] WARN  org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-07-27 11:54:04,525 [main] WARN  org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-27 11:54:04,525 [main] WARN  org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

However, when I try to dump the dataset, I get the following error:

grunt> DUMP boys;
2014-07-27 11:54:12,710 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias boys
Details at logfile: /home/hduser/tmp/pig_1406476412229.log

I tried to investigate why this happens, but I could not figure out the exact cause. Here is the Pig log file:

hduser@hadoop:~/tmp$ cat /home/hduser/tmp/pig_1406476412229.log
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias boys

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias boys
    at org.apache.pig.PigServer.openIterator(PigServer.java:880)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:541)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1464)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2175)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2127)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:988)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:349)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1482)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1478)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1476)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1258)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:504)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1505)
    at org.apache.pig.backend.hadoop.datastorage.HDirectory.create(HDirectory.java:63)
    at org.apache.pig.backend.hadoop.datastorage.HPath.create(HPath.java:159)
    at org.apache.pig.impl.io.FileLocalizer.getTemporaryPath(FileLocalizer.java:481)
    at org.apache.pig.impl.io.FileLocalizer.getTemporaryPath(FileLocalizer.java:474)
    at org.apache.pig.PigServer.openIterator(PigServer.java:855)
    ... 12 more
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1464)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2175)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2127)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:988)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:349)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1482)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1478)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1476)

    at org.apache.hadoop.ipc.Client.call(Client.java:1028)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
    at com.sun.proxy.$Proxy0.mkdirs(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:84)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at com.sun.proxy.$Proxy0.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1256)
    ... 19 more

I need help understanding why I am getting these errors. What do I need to change so that I can correctly dump the dataset I loaded?
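[Editor's note, not from the original thread: the telling line in the stack trace is the `FileAlreadyExistsException: Parent path is not a directory: /tmp`. `DUMP` makes Pig create a temporary directory on HDFS (under `/tmp` by default), and the exception suggests that `/tmp` already exists on HDFS as a regular file, so no directory can be created beneath it. A sketch of how one might check and repair this, using the standard `hadoop fs` shell of that Hadoop era:]

```
# List the HDFS root; if /tmp shows up without a leading 'd' in its
# permission string, it is a plain file and Pig cannot use it
hadoop fs -ls /

# Remove the stray /tmp file and recreate it as a world-writable directory
hadoop fs -rm /tmp
hadoop fs -mkdir /tmp
hadoop fs -chmod 1777 /tmp
```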

2 Answers:

Answer 0 (score: 0)

First, check that the file /user/pig_input/student-boys.txt actually exists.

You can try putting the following lines into a script file:

boys = LOAD '/user/pig_input/student-boys.txt' USING PigStorage('\t') AS (name:chararray,state:chararray,attendance:float);

DUMP boys;

and then run it with: pig filename.pig

Usually, the grunt shell runs in local mode.
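[Editor's note: a sketch of the script-file approach described above; the script name dump_boys.pig is just an example.]

```
# Run the script in local mode, reading the path from the local filesystem...
pig -x local dump_boys.pig

# ...or in the default MapReduce mode, reading the path from HDFS
pig dump_boys.pig
```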

Thanks,

Regards, Dheeraj Rampally.

Answer 1 (score: 0)

How did you start the grunt shell? You appear to be in MapReduce mode, while the file you want to load is on the local filesystem. Either start the grunt shell in local mode:

pig -x local

and load the file from the local filesystem, or copy the file into HDFS and start the shell in the default MapReduce mode:

pig
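[Editor's note: the second option, copying the file into HDFS first, might look like this, reusing the path from the question; the local file location is an assumption.]

```
# Create the target directory on HDFS and upload the local file
hadoop fs -mkdir /user/pig_input
hadoop fs -put student-boys.txt /user/pig_input/

# Start the grunt shell in the default MapReduce mode
pig
```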