Sample code needed for the Luigi HdfsTarget module

Date: 2019-02-04 11:36:54

Tags: luigi

I am new to Luigi.

I have the following sample code, but executing it gives me an error. We are on Python 3, we use an HdfsTarget object to connect to HDFS on namenode 1, and I am running this code from my local machine.

import os

import luigi
import luigi.contrib.hdfs
from luigi.contrib import webhdfs

class TestWebHdfs(luigi.Task):

    '''
    This test requires a running Hadoop cluster with WebHdfs enabled
    This test requires the luigi.cfg file to have a `hdfs` section
    with the namenode_host, namenode_port and user settings.
    '''

    def output(self):
        return luigi.contrib.hdfs.HdfsTarget("tmp/words.txt")

    def run(self):

        words = [
            'apple',
            'banana',
            'grapefruit'
            ]

        with self.output().open('w') as f:
            for word in words:
                f.write('{word}\n'.format(word=word))



if __name__ == '__main__':
    luigi.run()

I have this luigi.cfg configuration:

[core]
default-scheduler-host=name-node-1
default-scheduler-port=8082
default-scheduler-url=http://name-node-1:8082/luigi/
hdfs-tmp-dir=/tmp
log_level=DEBUG


[hdfs]
snakebite_autoconfig=False
namenode_host=name-node-1
namenode_port=50070
effective_user=admin
client=hadoopcli


[hadoop]
command=/usr/hdp/2.6.5.0-292/hadoop/bin/hadoop
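Note that `client=hadoopcli` makes Luigi shell out to the configured `hadoop` binary, which typically exists only on the cluster nodes, not on a local machine. One possible alternative (a sketch, not from the post; the exact section and option names are assumptions based on Luigi's webhdfs client configuration) is to switch to the HTTP-based webhdfs client so no local Hadoop install is required:

```ini
[hdfs]
client=webhdfs
namenode_host=name-node-1
namenode_port=50070

[webhdfs]
port=50070
user=admin
```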

The error:

DEBUG: Checking if TestWebHdfs() is complete
DEBUG: Running file existence check: /usr/hdp/2.6.5.0-292/hadoop/bin/hadoop fs -stat tmp/words.txt
WARNING: Will not run TestWebHdfs() or any dependencies due to error in complete() method:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/luigi/worker.py", line 401, in check_complete
    is_complete = task.complete()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/luigi/task.py", line 573, in complete
    return all(map(lambda output: output.exists(), outputs))
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/luigi/task.py", line 573, in <lambda>
    return all(map(lambda output: output.exists(), outputs))
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/luigi/target.py", line 243, in exists
    return self.fs.exists(path)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/luigi/contrib/hdfs/hadoopcli_clients.py", line 78, in exists
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True, universal_newlines=True)
  File "/home/ubuntu/anaconda3/lib/python3.7/subprocess.py", line 769, in __init__
    restore_signals, start_new_session)
  File "/home/ubuntu/anaconda3/lib/python3.7/subprocess.py", line 1516, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/hdp/2.6.5.0-292/hadoop/bin/hadoop': '/usr/hdp/2.6.5.0-292/hadoop/bin/hadoop'
INFO: Informed scheduler that task   TestWebHdfs__99914b932b   has status   UNKNOWN
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Done
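For context on what the traceback shows: the `hadoopcli` client builds a shell command from the `[hadoop] command` setting and runs it with `subprocess.Popen`. When that binary does not exist on the machine running the task, `Popen` itself raises `FileNotFoundError` before any HDFS communication happens. A minimal reproduction of just that failure mode (the path is taken from the config above; no Hadoop or Luigi is needed to see the error):

```python
import subprocess

# The hadoopcli client runs a command equivalent to:
#   /usr/hdp/2.6.5.0-292/hadoop/bin/hadoop fs -stat tmp/words.txt
# On a machine without that binary, Popen raises FileNotFoundError,
# which matches the exception at the bottom of the traceback.
cmd = ['/usr/hdp/2.6.5.0-292/hadoop/bin/hadoop', 'fs', '-stat', 'tmp/words.txt']
try:
    subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
except FileNotFoundError as e:
    print('hadoop binary not found:', e.filename)
```

This suggests the error is about where the code runs, not about the HdfsTarget usage itself: the configured client needs a local Hadoop CLI, which the local machine does not have.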
