distributed.utils - ERROR - module' pyarrow'没有属性' hdfs'

时间:2018-03-21 23:41:40

标签: dask dask-distributed pyarrow

我正在尝试使用to_parquet api中的pyarrow引擎将dask数据帧写入hdfs镶木地板。

但写错失败,但有以下异常:

dask_df.to_parquet(parquet_path,engine=engine)
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/dask/dataframe/core.py", line 985, in to_parquet
    return to_parquet(self, path, *args, **kwargs)
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/dask/dataframe/io/parquet.py", line 618, in to_parquet
    out.compute()
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/dask/base.py", line 135, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/dask/base.py", line 333, in compute
    results = get(dsk, keys, **kwargs)
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/distributed/client.py", line 1999, in get
    results = self.gather(packed, asynchronous=asynchronous)
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/distributed/client.py", line 1437, in gather
    asynchronous=asynchronous)
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/distributed/client.py", line 592, in sync
    return sync(self.loop, func, *args, **kwargs)
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/distributed/utils.py", line 254, in sync
    six.reraise(*error[0])
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/distributed/utils.py", line 238, in f
    result[0] = yield make_coro()
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "", line 4, in raise_exc_info
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/distributed/client.py", line 1315, in _gather
    traceback)
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/dask/dataframe/io/parquet.py", line 410, in _write_partition_pyarrow
    import pyarrow as pa
  File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/pyarrow/__init__.py", line 113, in 
    import pyarrow.hdfs as hdfs
AttributeError: module 'pyarrow' has no attribute 'hdfs'

pyarrow版本:0.8.0和分发版本:1.20.2

但是当我尝试在python控制台中导入包时,它没有任何错误:

import pyarrow.hdfs as hdfs

1 个答案:

答案 0 :(得分:0)

这个错误来自你的一个工人。也许您的客户端计算机和工作人员的环境不同?