Python multiprocessing Pool fails on large DataFrames

Time: 2020-07-29 19:55:16

Tags: python pandas python-multiprocessing

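I'm trying to use Python's multiprocessing package to speed up pivoting a series of large DataFrames. Each one is roughly (10k * 12k)-by-3 in long format and pivots into a 10k-by-12k matrix.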
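A back-of-the-envelope estimate suggests why only the larger frames fail. It assumes, since the question doesn't say, that all three long-format columns hold 8-byte values (int64/float64), and it ignores pickle and index overhead:

```python
# Hypothetical size estimate: assumes three 8-byte columns, no overhead counted.
ROWS = 10_000 * 12_000      # 120,000,000 long-format rows
COLS = 3                    # e.g. row key, column key, value
BYTES_PER_VALUE = 8         # int64 / float64

payload_bytes = ROWS * COLS * BYTES_PER_VALUE
INT32_MAX = 2**31 - 1       # largest length a signed 32-bit header can hold

print(f"{payload_bytes:,} bytes vs limit {INT32_MAX:,}")
print("exceeds limit:", payload_bytes > INT32_MAX)
```

Under these assumptions each input frame is about 2.88 GB, which is already larger than the 2,147,483,647 bytes that appear in the error message.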

Each dataset is essentially a matrix stored in long format.
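The question doesn't show long_to_matrix. A minimal sketch of what such a function might look like follows, assuming the long frame has columns named row, col and value (hypothetical names) and that each (row, col) pair appears at most once:

```python
import pandas as pd


def long_to_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the question's long_to_matrix.

    Assumes columns 'row', 'col', 'value' with unique (row, col) pairs.
    """
    # pivot() raises ValueError on duplicate (row, col) pairs;
    # pivot_table() would be needed if values had to be aggregated.
    return df.pivot(index="row", columns="col", values="value")


long_df = pd.DataFrame({
    "row":   ["a", "a", "b", "b"],
    "col":   ["x", "y", "x", "y"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
print(long_to_matrix(long_df))  # 2 x 2 matrix: rows a, b; columns x, y
```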

Here is the code I'm trying to run:

import multiprocessing as mp

parts = [df1, df2, df3]
pool = mp.Pool(3)
matrices = pool.map(long_to_matrix, parts)
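pool.map pickles each element of parts and writes it to a pipe as a single message. With 3 items and 3 workers the default chunksize works out to 1, so each message carries one whole DataFrame. multiprocessing uses its own pickle subclass (ForkingPickler), so plain pickle.dumps should give a close estimate of that message size. The sketch below measures it on a small synthetic frame; the column names and sizes are made up for illustration:

```python
import pickle

import numpy as np
import pandas as pd

INT32_MAX = 2**31 - 1


def pickled_size(obj) -> int:
    """Approximate bytes Pool.map sends through the pipe for one argument."""
    return len(pickle.dumps(obj))


# Small synthetic long-format frame (hypothetical columns and size).
n = 100_000
df = pd.DataFrame({
    "row": np.arange(n, dtype=np.int64) // 1000,
    "col": np.arange(n, dtype=np.int64) % 1000,
    "value": np.random.default_rng(0).random(n),
})

size = pickled_size(df)
print(f"{size:,} bytes per task; fits in one message: {size <= INT32_MAX}")
```

Running pickled_size on one of the real frames (if memory allows) would show directly whether it crosses the limit.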

For the larger matrices, this triggers the following error:

error                                     Traceback (most recent call last)
<ipython-input-111-7ebabe0045b8> in <module>
----> 1 matrices = pool.map(long_to_matrix, parts)

/XXXX/tools/miniconda2/envs/aj_work/lib/python3.7/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    266         in a list that is returned.
    267         '''
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()
    269
    270     def starmap(self, func, iterable, chunksize=None):

/XXXX/tools/miniconda2/envs/aj_work/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658
    659     def _set(self, i, obj):

/XXXX/tools/miniconda2/envs/aj_work/lib/python3.7/multiprocessing/pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
    429                         break
    430                     try:
--> 431                         put(task)
    432                     except Exception as e:
    433                         job, idx = task[:2]

/XXXX/tools/miniconda2/envs/aj_work/lib/python3.7/multiprocessing/connection.py in send(self, obj)
    204         self._check_closed()
    205         self._check_writable()
--> 206         self._send_bytes(_ForkingPickler.dumps(obj))
    207
    208     def recv_bytes(self, maxlength=None):

/XXXX/tools/miniconda2/envs/aj_work/lib/python3.7/multiprocessing/connection.py in _send_bytes(self, buf)
    391         n = len(buf)
    392         # For wire compatibility with 3.2 and lower
--> 393         header = struct.pack("!i", n)
    394         if n > 16384:
    395             # The payload is large so Nagle's algorithm won't be triggered

error: 'i' format requires -2147483648 <= number <= 2147483647
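The last frame of the traceback shows the cause: in this Python 3.7 connection code, the message length is packed with struct.pack("!i", n), a signed 32-bit big-endian integer, so any pickled payload longer than 2**31 - 1 bytes cannot be framed. The check below reproduces just that step. As far as I know, CPython 3.8 and later avoid this by sending a -1 marker followed by a 64-bit length for oversized payloads, so a newer interpreter may be enough. Verify this against multiprocessing/connection.py in your own installation.

```python
import struct

INT32_MAX = 2**31 - 1


def header_fits(n: int) -> bool:
    """True if length n can be packed the way the 3.7 traceback shows ("!i")."""
    try:
        struct.pack("!i", n)
        return True
    except struct.error:
        return False


print(header_fits(INT32_MAX))      # True
print(header_fits(INT32_MAX + 1))  # False: the same struct.error as above
```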

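Does anyone know how I might solve this? I did try splitting the matrices into smaller chunks, but slicing the DataFrames takes longer than the pivot itself, so that isn't a good option. Is this a bug?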
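If chunking is still an option, one way to avoid repeated boolean slicing is to bucket the rows in a single pass. The sketch below uses pd.factorize plus one groupby, so no matrix row is split across pieces, and it estimates how many pieces keep each message under the limit. It reuses the hypothetical row/col/value column names, and I haven't benchmarked it at the question's scale. The results also travel back through the same pipe. Assuming float64, a full 10k-by-12k matrix is 960,000,000 bytes, which is under the limit.

```python
import math

import pandas as pd

INT32_MAX = 2**31 - 1


def chunks_needed(payload_bytes: int, limit: int = INT32_MAX, headroom: float = 0.8) -> int:
    """Smallest piece count keeping each piece under headroom * limit (assumes an even split)."""
    return max(1, math.ceil(payload_bytes / (limit * headroom)))


def split_by_row_key(df: pd.DataFrame, key: str, k: int) -> list:
    """Split a long frame into up to k pieces; each distinct key lands in exactly one piece."""
    codes, _ = pd.factorize(df[key])  # integer code per distinct key, one pass
    return [piece for _, piece in df.groupby(codes % k, sort=True)]


long_df = pd.DataFrame({
    "row":   ["a", "a", "b", "b", "c", "c"],
    "col":   ["x", "y", "x", "y", "x", "y"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
pieces = split_by_row_key(long_df, "row", 2)
# Each piece could be sent to pool.map; here they are pivoted in-process.
matrix = pd.concat(
    [p.pivot(index="row", columns="col", values="value") for p in pieces]
).sort_index()
print(chunks_needed(2_880_000_000))  # 2
print(matrix)
```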
