Question

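I have run into a strange problem while trying to parallelize a Monte Carlo simulation. I have a simulator that models the dynamics of circumplanetary dust particles. I have set up a simple Monte Carlo simulation that executes thousands of dust particle simulations, all of which are independent and differ only in their initial states.

The for-loop that runs the Monte Carlo simulation seems to work fine when I execute it on a single thread, with any of the steppers available for the odeint integrate functions called within the body of the for-loop.

As soon as I parallelize the for-loop over multiple threads, something very strange happens: the whole Monte Carlo simulation suddenly takes much longer to execute. For example, when I run 10 dust particle simulations of 1000 years each with the Bulirsch-Stoer dense-output stepper, it takes 15 minutes on 1 thread and 1 hour 15 minutes on 2 threads, i.e. five times longer!

This makes absolutely no sense to me, and I have gone through the code line by line to work out what is causing it. I am completely stumped, and I am starting to wonder whether OpenMP is misbehaving for some reason. As far as I can see there are no race conditions, and any shared data is const.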
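To narrow down where the extra time goes, each iteration could be timed individually and tagged with the thread that ran it. The following is only a minimal sketch of that idea: `simulateParticle`, `Timing` and `runTimed` are hypothetical names, and the toy decay loop stands in for the real odeint integration, which is not reproduced here. It compiles with or without OpenMP enabled.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for one particle simulation (NOT the real integrator):
// a deterministic CPU-bound loop that touches no shared state.
double simulateParticle( const int simulationId )
{
    double x = 1.0 + simulationId;
    for ( int k = 0; k < 2000000; ++k )
    {
        x -= 1.0e-7 * x;
    }
    return x;
}

// Per-simulation record: which thread ran it and how long it took.
struct Timing
{
    int simulationId;
    int thread;
    double seconds;
    double result;
};

std::vector< Timing > runTimed( const int numberOfParticles, const int numberOfThreads )
{
    assert( numberOfParticles >= 0 && numberOfThreads > 0 );
    std::vector< Timing > timings( static_cast< std::size_t >( numberOfParticles ) );

#pragma omp parallel for num_threads( numberOfThreads )
    for ( int j = 0; j < numberOfParticles; ++j )
    {
        const auto start = std::chrono::steady_clock::now( );
        const double result = simulateParticle( j );
        const auto end = std::chrono::steady_clock::now( );

        int thread = 0; // stays 0 when compiled without OpenMP
#ifdef _OPENMP
        thread = omp_get_thread_num( );
#endif
        // Each iteration writes only its own element, so no critical section is needed.
        timings[ j ] = { j, thread, std::chrono::duration< double >( end - start ).count( ), result };
    }
    return timings;
}
```

Comparing the `seconds` values from `runTimed( 10, 1 )` and `runTimed( 10, 2 )` would show whether individual simulations get slower with more threads, or whether the time is lost elsewhere, for example while waiting on a critical section.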

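My simulator code base can be found here: https://github.com/kartikkumar/dustsim

The problem relates to this line in my bulk particle simulator: https://github.com/kartikkumar/dustsim/blob/master/src/bulkParticleSimulator.cpp#L342

I am running the simulations on a server with the following configuration:

64x AMD Opteron(TM) Processor 6276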
256 GB RAM
openSUSE 12.1
gcc 6.1.0
cmake 3.9.4
boost 1.65.1

Here is a simplified version of the code:

const int numberOfThreads = 2;
const int numberOfParticles = 10;
const Dynamics dynamics( parameters, ... );

const double startEpoch = 0.0;
const double stepSize = 3600.0 * 24.0 * 365.25;
const int outputSteps = 1000;

typedef std::map< int, State > InitialStates;
InitialStates initialStates;

for ( int i = 0; i < numberOfParticles; ++i )
{
    // Generate random initial state (size: 6) using Boost generators and push them into map.
    State state = f( randomGenerator, parameters, ... );

    // Key is set to a unique simulation ID.
    initialStates[ simulationId ] = state;
}

#pragma omp parallel for num_threads( numberOfThreads )
    for ( unsigned int j = 0; j < initialStates.size( ); ++j )
    {
        InitialStates::iterator initialState = initialStates.begin( );
        std::advance( initialState, j );

        const int simulationId = initialState->first;
        const State currentState = initialState->second;

#pragma omp critical( outputToConsole )
        {
            std::cout << "Executing simulation ID: " << simulationId << std::endl;
        }

        // Convert state using a conversion function.
        State convertedState = f( currentState, parameters, ... );

        std::ostringstream integrationOutput;
        const StateHistoryWriter writer( integrationOutput, parameters, ... );

        using namespace boost::numeric::odeint;
        bulirsch_stoer_dense_out< State > stepper( absoluteTolerance, relativeTolerance );
        integrate_n_steps( stepper,
                           dynamics,
                           convertedState,
                           startEpoch,
                           stepSize,
                           outputSteps,
                           writer );

        // Write output generated by numerical integrator to SQLite database.
        // To avoid locking of the database, this section is thread-critical, so will be
        // executed one-by-one by multiple threads.
#pragma omp critical( writeOutputToDatabase )
        {
        }
    }
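As an aside on the loop structure above: calling `std::advance` on a map iterator in every iteration walks the map from the beginning each time. One way to avoid that is to copy the entries into a vector once and index it directly inside the parallel loop. This is a sketch under assumptions, not a fix for the slowdown: `State` is assumed to be a `std::vector< double >`, and `simulate` and `runAll` are hypothetical placeholders for the real integration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

typedef std::vector< double > State;          // assumption: the real State type may differ
typedef std::map< int, State > InitialStates; // key: simulation ID

// Hypothetical placeholder for one particle simulation: sums the state components.
double simulate( const State& state )
{
    double sum = 0.0;
    for ( std::size_t i = 0; i < state.size( ); ++i )
    {
        sum += state[ i ];
    }
    return sum;
}

// Runs all simulations and returns ( simulation ID, result ) pairs in key order.
std::vector< std::pair< int, double > > runAll( const InitialStates& initialStates,
                                                const int numberOfThreads )
{
    assert( numberOfThreads > 0 );

    // Copy the map entries into a contiguous vector once, so that the parallel
    // loop can use random access instead of advancing a map iterator.
    const std::vector< std::pair< int, State > > jobs( initialStates.begin( ),
                                                       initialStates.end( ) );
    std::vector< std::pair< int, double > > results( jobs.size( ) );
    const int numberOfJobs = static_cast< int >( jobs.size( ) );

#pragma omp parallel for num_threads( numberOfThreads )
    for ( int j = 0; j < numberOfJobs; ++j )
    {
        // Each iteration reads only const data and writes only its own slot.
        results[ j ] = std::make_pair( jobs[ j ].first, simulate( jobs[ j ].second ) );
    }
    return results;
}
```

For only 10 particles the cost of `std::advance` is negligible, so this mainly simplifies the loop body; it would matter more for the thousands of particles in the full Monte Carlo run.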

Any help would be greatly appreciated!
