Question

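I'm having some trouble using threads together with the scipy.stats.randint module. Specifically, when several threads are launched, a local array (bootIndexs in the code below) seems to be shared by all of them.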

Here is the error that is raised:

> Exception in thread Thread-559:
Traceback (most recent call last):
...
  File "..\calculDomaine3.py", line 223, in bootThread
    result = bootstrap(nbB, distMod)
  File "...\calculDomaine3.py", line 207, in bootstrap
    bootIndexs = spstats.randint.rvs(0, nbTirages-1, size = nbTirages)
  File "C:\Python27\lib\site-packages\scipy\stats\distributions.py", line 5014, in rvs
    return super(rv_discrete, self).rvs(*args, **kwargs)
  File "C:\Python27\lib\site-packages\scipy\stats\distributions.py", line 582, in rvs
    vals = reshape(vals, size)
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 171, in reshape
    return reshape(newshape, order=order)
ValueError: total size of new array must be unchanged

Here is my code:

import threading
import Queue
from scipy import stats as spstats

nbThreads = 4

def test(nbBoots, nbTirages, modules):

    def bootstrap(nbBootsThread, distribModules):

        distribMax = []

        for j in range(nbBootsThread):
            bootIndexs = spstats.randint.rvs(0, nbTirages-1, size = nbTirages)
            boot = [distribModules[i] for i in bootIndexs]

            distribMax.append(max(boot))

        return distribMax

    q = Queue.Queue()

    def bootThread(nbB, distMod):
        result = bootstrap(nbB, distMod)
        q.put(result, False)
        q.task_done()

    works = []

    for i in range(nbThreads):
        works.append(threading.Thread(target = bootThread, args = (nbBoots//nbThreads, modules[:],)))

    for w in works:
        w.daemon = True
        w.start()

    q.join()

    distMaxResult = []

    for j in range(q.qsize()):
        distMaxResult += q.get()

    return distMaxResult

class classTest:
    def __init__(self):
        self.launch()

    def launch(self):
        print test(100, 1000, range(1000) )

Thanks for your answers.

Answer 1

> Specifically, when several threads are launched, a local array (bootIndexs in the code below) seems to be shared by all of them.

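That's the whole point of threads: they are lightweight tasks that share everything with the process that spawned them! :) If you are looking for a share-nothing solution, you should look at the multiprocessing module (bear in mind that spawning a process is much heavier on the system than spawning a thread).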
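A minimal sketch of the share-nothing approach, assuming the goal is simply to split the bootstrap loop across processes. The names `bootstrap_max` and `parallel_bootstrap` are hypothetical, and the standard-library `random` module stands in for scipy.stats so the example has no third-party dependencies. Each worker receives its own pickled copy of `data`, so no process can see another process's arrays.

```python
import random
from multiprocessing import Pool


def bootstrap_max(args):
    """Worker run in a separate process: resample `data` with replacement
    `n_boots` times and return the maximum of each resample."""
    n_boots, data, seed = args
    rng = random.Random(seed)  # each process owns its own generator
    return [max(rng.choice(data) for _ in range(len(data)))
            for _ in range(n_boots)]


def parallel_bootstrap(n_boots, data, n_procs=4):
    """Split the bootstrap across `n_procs` worker processes."""
    chunks = [(n_boots // n_procs, data, seed) for seed in range(n_procs)]
    pool = Pool(n_procs)
    try:
        # `data` is pickled and sent to each worker; nothing is shared.
        results = pool.map(bootstrap_max, chunks)
    finally:
        pool.close()
        pool.join()
    return [m for chunk in results for m in chunk]
```

Call `parallel_bootstrap(100, list(range(1000)))` from inside an `if __name__ == "__main__":` block: on Windows (the traceback shows `C:\Python27`), child processes re-import the main module, so unguarded pool creation would be attempted again in every child.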

However, back to your problem... this is nothing more than a shot in the dark, but you could try changing this line:

boot = [distribModules[i] for i in bootIndexs]

to:

boot = [distribModules[i] for i in bootIndexs.copy()]

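(using a copy of the array instead of the array itself). It seems an unlikely culprit (you only iterate over the array; you never actually modify it), but it's the only place I can see where you use it in your threads...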

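Of course, this only works if the array contents are not supposed to be changed by the threads operating on it. If changing the values of the "global" array is the intended behavior, then you should instead use a Lock() to prevent simultaneous access to that resource. Your threads should then do something like: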

lock.acquire()
# Manipulate the array content here
lock.release()
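The same acquire/release pair is commonly written as a `with lock:` block, which also releases the lock if the body raises an exception. A minimal sketch, with a hypothetical shared counter (`counts`) standing in for the "global" array:

```python
import threading

counts = {"total": 0}  # hypothetical shared state seen by every thread
lock = threading.Lock()


def add_many(n):
    for _ in range(n):
        # Equivalent to lock.acquire() / lock.release(), but the lock is
        # released even if the body raises.
        with lock:
            counts["total"] += 1  # read-modify-write: not atomic without the lock


def run(n_threads=4, n=10000):
    counts["total"] = 0
    threads = [threading.Thread(target=add_many, args=(n,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished
    return counts["total"]
```

With the lock in place, `run()` always returns `n_threads * n`.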

Answer 2

I have no experience with threads, so this may be completely off the mark.

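scipy.stats.randint, like the other distributions in scipy.stats, is an instance of the corresponding distribution class. This means that every thread is accessing the same instance. During the rvs call, an attribute _size is set. If different threads with different sizes access the instance at the same time, you will get a ValueError because the sizes don't match in the reshape. That sounds like a race condition to me.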

I would recommend using numpy.random directly in this case (this is the call made inside scipy.stats.randint):

numpy.random.randint(min, max, self._size)

Maybe you'll have better luck with that.
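A sketch of that suggestion applied to the question's bootstrap, with one variation: instead of the module-level `numpy.random.randint`, each thread creates its own `numpy.random.RandomState`, so no generator object is shared at all. The function names and the per-thread seeds are illustrative assumptions, not part of either library.

```python
import threading

import numpy as np


def bootstrap_max(n_boots, data, seed):
    """Bootstrap the maximum of `data` using a generator owned by this thread."""
    rs = np.random.RandomState(seed)  # never shared with other threads
    data = np.asarray(data)
    n = len(data)
    maxima = []
    for _ in range(n_boots):
        idx = rs.randint(0, n, size=n)  # upper bound is exclusive: 0..n-1
        maxima.append(data[idx].max())
    return maxima


def threaded_bootstrap(n_boots, data, n_threads=4):
    results = [None] * n_threads  # one slot per thread, no lock needed

    def worker(k):
        results[k] = bootstrap_max(n_boots // n_threads, data, seed=k)

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread explicitly
    return [m for chunk in results for m in chunk]
```

Two side notes on the question's code. The upper bound of `randint` is exclusive in both numpy and scipy.stats, so `randint.rvs(0, nbTirages-1, ...)` never draws the last index; the sketch uses `0, n`. Joining the threads directly also sidesteps the fact that `q.join()` returns immediately if no thread has called `q.put()` yet, which can leave the result list incomplete.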

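If you need distributions that are not available in numpy.random, then, if my guess is correct, you would need to create a new instance of the distribution in each thread.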
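The "one instance per thread" idea can be sketched with `threading.local()`. `StatefulSampler` is a hypothetical stand-in, not scipy code: like the distribution instances described above, it stores a per-call attribute (`_size`) on `self`, which is what makes sharing a single instance across threads unsafe. Giving each thread its own instance removes that shared state.

```python
import random
import threading


class StatefulSampler(object):
    """Hypothetical sampler that, like the scipy.stats instances described
    above, keeps per-call state on the instance."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._size = None

    def rvs(self, low, high, size):
        self._size = size  # a thread sharing this instance could overwrite it here
        vals = [self._rng.randrange(low, high) for _ in range(self._size)]
        if len(vals) != size:
            raise ValueError("total size of new array must be unchanged")
        return vals


_local = threading.local()


def get_sampler():
    """Return this thread's sampler, creating it on first use."""
    sampler = getattr(_local, "sampler", None)
    if sampler is None:
        sampler = _local.sampler = StatefulSampler()
    return sampler


def run_threads(sizes):
    results = [None] * len(sizes)  # each thread writes only its own slot

    def worker(k, size):
        sampler = get_sampler()
        results[k] = (sampler, sampler.rvs(0, 1000, size))

    threads = [threading.Thread(target=worker, args=(k, s))
               for k, s in enumerate(sizes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```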

Threading in Python: problem with local variables
