Why do I get dask warnings when running a pandas operation?

Asked: 2018-03-10 07:08:39

Tags: dask dask-distributed

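I have a notebook with both pandas and dask operations.

When I have not started a client, everything works as expected. But once I start a dask.distributed client, I get warnings in cells that run pandas operations, e.g. pd.read_parquet('my_file')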

I get as many Nanny warning lines as I have workers started.

Example warnings:

distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.26s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.38s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.38s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.38s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.37s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Scheduler for 1.37s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.36s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.

I would like to know the cause, and how to make them stop.

1 Answer:

Answer 0: (score: 1)

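This warning means that the Dask worker processes have been unresponsive for a long time. This is bad because the workers cannot serve data to other workers, talk to the scheduler, and so on. It is not normal even while computations are running, because those computations run in a separate thread.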
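To make the idea of an "unresponsive event loop" concrete, here is a minimal sketch using only the standard library. It is not how distributed implements its check; `max_tick_delay` and its parameters are hypothetical names chosen for illustration. A small ticker coroutine measures how late each tick arrives while some blocking work runs either on the event loop thread itself or in a separate thread. The blocking work is `time.sleep`, which releases the GIL, so the threaded case stands in for a well-behaved GIL-releasing function; a function that holds the GIL in a worker thread can starve the loop in a similar way to the first case, which this sketch does not reproduce.

```python
import asyncio
import time


async def max_tick_delay(block_in_loop: bool,
                         interval: float = 0.02,
                         work: float = 0.3) -> float:
    """Return the worst tick lateness (seconds) seen while blocking work runs.

    Illustrative only: names and defaults are assumptions, not distributed's API.
    """
    loop = asyncio.get_running_loop()
    worst = 0.0
    done = False

    async def ticker() -> None:
        # Wake up every `interval` seconds and record how late each wake-up was.
        nonlocal worst
        last = time.monotonic()
        while not done:
            await asyncio.sleep(interval)
            now = time.monotonic()
            worst = max(worst, now - last - interval)
            last = now

    task = asyncio.ensure_future(ticker())
    await asyncio.sleep(0)  # give the ticker a chance to start

    if block_in_loop:
        time.sleep(work)  # blocks the event loop thread itself
    else:
        # Runs in a separate thread; time.sleep releases the GIL while waiting.
        await loop.run_in_executor(None, time.sleep, work)

    done = True
    await task
    return worst


if __name__ == "__main__":
    print("blocking in loop :", asyncio.run(max_tick_delay(True)))
    print("separate thread  :", asyncio.run(max_tick_delay(False)))
```

When the work blocks the loop thread, the measured delay is close to the full duration of the work; when it runs in a separate thread and releases the GIL, the ticks stay close to on time.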

There are two main causes of this problem:

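  1. Your tasks run functions that do not release the GIL. This is rare these days (most pandas operations release the GIL) but it can happen. I believe that all variants of read_parquet release the GIL.
  2. If this happens only once, and only at startup, then it is a bug that was fixed around distributed.__version__ == '1.21.3'. You may want to upgrade.

You can also silence the warnings by increasing the maximum allowed tick time in your ~/.dask/config.yaml file: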

    tick-maximum-delay: 10 s
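If editing the config file is not convenient, another option is to filter the messages with Python's standard logging module. The sketch below is an assumption-laden alternative, not something from the answer above: it relies only on the logger name `distributed.core` and the message text visible in the warnings quoted in the question, and `EventLoopWarningFilter` and `silence_event_loop_warnings` are hypothetical helper names. It affects only the process in which it is run, and it hides the symptom rather than fixing the cause.

```python
import logging


class EventLoopWarningFilter(logging.Filter):
    """Drop only the 'Event loop was unresponsive' records; keep all others."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging module to discard the record.
        return "Event loop was unresponsive" not in record.getMessage()


def silence_event_loop_warnings(logger_name: str = "distributed.core") -> logging.Filter:
    """Attach the filter to the named logger and return it so it can be removed later."""
    flt = EventLoopWarningFilter()
    logging.getLogger(logger_name).addFilter(flt)
    return flt


if __name__ == "__main__":
    flt = silence_event_loop_warnings()
    # To undo: logging.getLogger("distributed.core").removeFilter(flt)
```

A filter is used here instead of raising the logger level so that other warnings from the same logger still get through.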