Python 2.7 ProcessPoolExecutor throws IOError: [Errno 32] Broken pipe

Asked: 2017-07-06 01:10:30

Tags: python python-2.7 multiprocessing python-multiprocessing

I stream data in chunks into a class. For each chunk of data, two different types of np.convolve() are executed on the same ProcessPoolExecutor. Which type of convolution was invoked is identified by a return variable. The order of the data must be preserved, so each future carries an associated sequence number. The output function enforces that data is returned only from consecutive futures (not shown below). As far as I understand it, I am calling the ProcessPoolExecutor.shutdown() function correctly, but I still get an IOError:
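Since the real output() ordering logic is omitted, here is a minimal hypothetical sketch of the consecutive-sequence gating described above (the helper name and its shape are my assumptions, not the actual code):

```python
# Hypothetical sketch of consecutive-sequence gating (not the real output()
# code, which is omitted): results arrive tagged with a sequence number and
# are released only once every earlier sequence number has arrived.
def release_in_order(tagged_results):
    buffered = {}
    next_seq = 0
    released = []
    for seq, value in tagged_results:
        buffered[seq] = value
        # drain every consecutive result that is now available
        while next_seq in buffered:
            released.append(buffered.pop(next_seq))
            next_seq += 1
    return released

print(release_in_order([(1, 'b'), (0, 'a'), (2, 'c')]))  # ['a', 'b', 'c']
```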

The error is:

$ python processpoolerror.py 
ran 5000000 samples in 3.70395112038 sec: 1.34990982265 Msps
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe

Sorry it's a bit long, but I have trimmed the class down as much as possible while still preserving the error. On my machine, Ubuntu 16.04.2 with an Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz, this code produces the error every time. In the non-trimmed version of this code, the broken pipe occurs about 25% of the time.

If you change the `if False:` in cultivate_pool() to True so that it prints during execution, the error is not thrown. If you reduce the amount of data (the `din = chunk * 10000` line in testRate()), the error is not raised either. What am I doing wrong here? Thanks.

import numpy as np
from concurrent.futures import ProcessPoolExecutor
import time


def _do_xcorr3(rev_header, packet_chunk, seq):
    r1 = np.convolve(rev_header, packet_chunk, 'full')
    return 0, seq, r1

def _do_power3(power_kernel, packet_chunk, seq):
    cp = np.convolve(power_kernel, np.abs(packet_chunk) ** 2, 'full')
    return 1, seq, cp

class ProcessPoolIssues():

    ## Constructor
    # @param chunk_size how many samples to feed in during input() stage
    def __init__(self,header,chunk_size=500,poolsize=5):
        self.chunk_size = chunk_size  ##! How many samples to feed

        # ProcessPool stuff
        self.poolsize = poolsize
        self.pool = ProcessPoolExecutor(poolsize)
        self.futures = []

        # xcr stage stuff
        self.results0 = []
        self.results0.append((0, -1, np.zeros(chunk_size)))

        # power stage stuff
        self.results1 = []
        self.results1.append((1, -1, np.zeros(chunk_size)))

        self.countin = 0
        self.countout = -1

    def shutdown(self):
        self.pool.shutdown(wait=True)


    ## Returns True if all data has been extracted for given inputs
    def all_done(self):
        return self.countin == self.countout+1

    ## main function
    # @param packet_chunk an array of chunk_size samples to be computed
    def input(self, packet_chunk):
        assert len(packet_chunk) == self.chunk_size

        fut0 = self.pool.submit(_do_xcorr3, packet_chunk, packet_chunk, self.countin)
        self.futures.append(fut0)

        fut1 = self.pool.submit(_do_power3, packet_chunk, packet_chunk, self.countin)
        self.futures.append(fut1)

        self.countin += 1


    # loops through the pool's futures, copying any results from completed
    # futures into results0/results1 (and then removing those futures)
    def cultivate_pool(self):
        todel = []

        for i, f in enumerate(self.futures):
            # print "checking", f
            if f.done():
                a, b, c = f.result()
                if a == 0:
                    self.results0.append((a,b,c))  # results from one type of future
                elif a == 1:
                    self.results1.append((a,b,c))  # results from another type of future
                todel.append(i)

        # now we need to remove items from futures that are done
        # we need do it in reverse order so we remove items from the end first (thereby not affecting indices as we go)
        for i in sorted(todel, reverse=True):
            del self.futures[i]

            if False:  # change this to true and error goes away
                print "deleting future #", i

    # may return None
    def output(self):

        self.cultivate_pool()  # modifies self.results list

        # wait for both results to be done before clearing
        if len(self.results0) and len(self.results1):
            del self.results0[0]
            del self.results1[0]
            self.countout += 1

        return None


def testRate():
    chunk = 500

    # a value of 10000 will throw:  IOError: [Errno 32] Broken pipe
    # smaller values like 1000 do not
    din = chunk * 10000

    np.random.seed(666)
    search = np.random.random(233) + np.random.random(233) * 1j
    input = np.random.random(din) + np.random.random(din) * 1j

    pct = ProcessPoolIssues(search, chunk, poolsize=8)

    st = time.time()
    for x in range(0, len(input), chunk):
        slice = input[x:x + chunk]
        if len(slice) != chunk:
            break
        pct.input(slice)
        pct.output()

    while not pct.all_done():
        pct.output()

    ed = time.time()
    dt = ed - st

    print "ran", din, "samples in", dt, "sec:", din / dt / 1E6, "Msps"

    pct.shutdown()

if __name__ == '__main__':
    testRate()

1 Answer:

Answer 0 (score: 1)

This is probably happening because you exceed the buffer size of the pipe when you try to send larger chunks through it at once.

def _do_xcorr3(rev_header, packet_chunk, seq):
    r1 = np.convolve(rev_header, packet_chunk, 'full')
    return 0, seq, r1

def _do_power3(power_kernel, packet_chunk, seq):
    cp = np.convolve(power_kernel, np.abs(packet_chunk) ** 2, 'full')
    return 1, seq, cp

The values r1 and cp are quite large, because you are convolving with the square of the chunk.

Hence, when you try to run this with larger chunk sizes, the buffer of the IO pipe cannot handle it. Refer to this for a clearer understanding.
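As a rough illustration of how large each pickled result is (the comparison against 64 KiB assumes the typical default pipe capacity on Linux, which is not stated in the question), the payload for one chunk can be estimated like this:

```python
import pickle

import numpy as np

# sizes taken from the question: 233-sample header, 500-sample chunks
header = np.zeros(233, dtype=complex)
chunk = np.zeros(500, dtype=complex)

# 'full' mode returns len(a) + len(b) - 1 output samples
r1 = np.convolve(header, chunk, 'full')
print(len(r1))  # 732

# each result tuple is pickled and written back to the parent through a pipe;
# 732 complex128 values alone are 732 * 16 bytes before pickle overhead
payload = pickle.dumps((0, 0, r1))
print(len(payload) > 732 * 16)  # True
```

With thousands of chunks in flight, many such payloads can queue up in the result pipe at once, which is consistent with the error disappearing when the amount of data is reduced.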

As for the second part of the question,

            if False:  # change this to true and error goes away
                print "deleting future #", i

I found this in the py3 docs:

  16.2.4.4. Reentrancy

  Binary buffered objects (instances of BufferedReader, BufferedWriter, BufferedRandom and BufferedRWPair) are not reentrant. While reentrant calls will not happen in normal situations, they can arise from doing I/O in a signal handler. If a thread tries to re-enter a buffered object which it is already accessing, a RuntimeError is raised. Note this doesn't prohibit a different thread from entering the buffered object.

  The above implicitly extends to text files, since the open() function will wrap a buffered object inside a TextIOWrapper. This includes standard streams and therefore affects the built-in function print() as well.
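One possible mitigation, sketched below (this is my assumption, not something the answer above prescribes): cap the number of outstanding futures so the executor's result pipe is drained regularly instead of backing up behind thousands of large results.

```python
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait


def _square(x):
    return x * x


# Hypothetical throttled-submission pattern: never allow more than
# max_inflight futures to sit in the pool at once, so completed results
# are pulled out of the result pipe as work is submitted.
def run_throttled(data, workers=4, max_inflight=16):
    results = []
    pending = set()
    with ProcessPoolExecutor(workers) as pool:
        for item in data:
            if len(pending) >= max_inflight:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                results.extend(f.result() for f in done)
            pending.add(pool.submit(_square, item))
        done, _ = wait(pending)  # drain whatever is still outstanding
        results.extend(f.result() for f in done)
    return sorted(results)


if __name__ == "__main__":
    print(run_throttled(range(100)) == [x * x for x in range(100)])  # True
```

The futures list in the question grows without bound between cultivate_pool() calls; bounding it like this keeps the parent reading from the result pipe often enough that the children's writes do not block.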