Bug in the C++ standard library's std::poisson_distribution?

Date: 2017-12-01 04:06:19

Tags: c++ c++11 libstdc++ c++-standard-library

I think I have run into incorrect behavior of std::poisson_distribution in the C++ standard library.

Questions:

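  1. Can you confirm that this really is a bug and not a mistake on my part?
  2. Assuming it is a bug, what exactly is wrong with the standard library implementation of poisson_distribution?

Details: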

    The following C++ code (file poisson_test.cc) generates Poisson-distributed numbers:

    #include <array>
    #include <cmath>
    #include <iostream>
    #include <random>
    
    int main() {
      // The problem turned out to be independent of the engine
      std::mt19937_64 engine;
    
      // Set a fixed seed for easy reproducibility
      // The problem turned out to be independent of the seed
      engine.seed(1);
      std::poisson_distribution<int> distribution(157.17);
    
      // Print 1E8 samples, one per line ('\n' instead of std::endl avoids a flush per line)
      for (int i = 0; i < 100000000; ++i) {
        const int number = distribution(engine);
        std::cout << number << '\n';
      }
    }
    

    I compile and run this code as follows:

    clang++ -o poisson_test -std=c++11 poisson_test.cc
    ./poisson_test > mypoisson.txt
    

    The following Python script analyzes the sequence of random numbers in the file mypoisson.txt:

    import numpy as np
    import matplotlib.pyplot as plt
    
    # Mean passed to std::poisson_distribution in poisson_test.cc
    expected_mean = 157.17
    
    def expectation(x, m):
        " Poisson pmf (probability mass function) "
        # Use Ramanujan's approximation to get ln n!
        lnx = x * np.log(x) - x + 1./6. * np.log(x * (1 + 4*x*(1+2*x))) + 1./2. * np.log(np.pi)
        return np.exp(x*np.log(m) - m - lnx)
    
    data = np.loadtxt('mypoisson.txt', dtype = 'int')
    
    unique, counts = np.unique(data, return_counts = True)
    hist = counts.astype(float) / counts.sum()
    stat_err = np.sqrt(counts) / counts.sum()
    plt.errorbar(unique, hist, yerr = stat_err, fmt = '.',
                 label = 'Poisson generated \n by std::poisson_distribution')
    plt.plot(unique, expectation(unique, expected_mean),
             label = 'expected probability \n density function')
    plt.legend()
    plt.show()
    
    # Determine bins with statistical significance of deviation larger than 3 sigma
    deviation_in_sigma = (hist - expectation(unique, expected_mean)) / stat_err
    d = dict((k, v) for k, v in zip(unique, deviation_in_sigma) if np.abs(v) > 3.0)
    print(d)
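The script's `expectation` relies on Ramanujan's approximation for ln n!. To rule out the expected curve itself as the source of the deviation, here is a minimal sketch (standard library only; the names `ramanujan_ln_factorial` and `poisson_pmf` are illustrative, not from the original) that compares the approximation with the exact `math.lgamma(n + 1)`:

```python
import math

def ramanujan_ln_factorial(n):
    # ln n! via the same Ramanujan-type approximation used in expectation()
    return (n * math.log(n) - n
            + math.log(n * (1 + 4 * n * (1 + 2 * n))) / 6
            + 0.5 * math.log(math.pi))

def poisson_pmf(k, mean):
    # Exact Poisson probability P(X = k), using lgamma(k + 1) == ln k!
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

mean = 157.17
# Largest relative error of the approximate pmf over the bulk of the distribution:
# approx / exact = exp(ln k!_exact - ln k!_approx)
worst = max(
    abs(math.exp(math.lgamma(k + 1) - ramanujan_ln_factorial(k)) - 1.0)
    for k in range(100, 221)
)
print(f"P(X = 158) = {poisson_pmf(158, mean):.5f}")
print(f"max relative error for 100 <= k <= 220: {worst:.1e}")
```

In this range the approximation error is many orders of magnitude below the per-bin statistical uncertainty at 1E8 samples (a relative error below 1e-3 near the mean), so the formula by itself should not be able to produce a 22σ effect.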
    

    The script produces the following plot:

    You can see the problem with the naked eye: the deviation at n = 158 is statistically significant; in fact, it is a 22σ deviation!
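To put the 22σ figure in perspective: with the script's error estimate sqrt(count)/N and count ≈ p·N, one sigma for a bin of probability p is about sqrt(p/N). A minimal sketch of that arithmetic, assuming N = 1E8 samples and the exact Poisson probability at n = 158 (the helper names are illustrative):

```python
import math

def poisson_pmf(k, mean):
    # Exact Poisson probability P(X = k); lgamma(k + 1) == ln k!
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def one_sigma_frequency(p, n_samples):
    # Statistical error of a relative frequency, as in the script:
    # sqrt(count) / N with count ~ p * N, i.e. sqrt(p / N)
    return math.sqrt(p / n_samples)

n_samples = 10**8
mean = 157.17
p158 = poisson_pmf(158, mean)
sigma = one_sigma_frequency(p158, n_samples)
excess = 22 * sigma  # absolute excess frequency implied by a 22 sigma deviation
print(f"p(158) = {p158:.4f}, one sigma = {sigma:.2e}")
print(f"22 sigma = {excess:.2e} absolute, {excess / p158:.1%} of p(158)")
```

Under these assumptions a 22σ deviation corresponds to only about a 1.2% excess in that single bin. Because sigma scales as 1/sqrt(N), the same relative excess with 1E6 samples would be only about 2.2σ, which is why a very large sample is needed to expose it.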

    Close-up of the previous plot.

1 answer:

Answer 0 (score: 18)

My system setup is as follows (Debian testing):

libstdc++-7-dev:
  Installed: 7.2.0-16

libc++-dev:
  Installed: 3.5-2

clang:
  Installed: 1:3.8-37

g++:
  Installed: 4:7.2.0-1d1

I can confirm the bug when using libstdc++. I built the program three ways:

g++ -o pois_gcc -std=c++11 pois.cpp
clang++ -o pois_clang -std=c++11 -stdlib=libstdc++ pois.cpp
clang++ -o pois_clang_libc -std=c++11 -stdlib=libc++ pois.cpp
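Since the three builds differ only in the compiler and standard library, their outputs can also be compared with each other directly, without reference to the expected pmf. A minimal sketch of a per-bin two-sample z-score, assuming each build's numbers have already been read into a list of ints (the function name `two_sample_bin_z` and the toy data are hypothetical):

```python
import math
from collections import Counter

def two_sample_bin_z(samples_a, samples_b):
    # Per-bin z-score of the difference between two relative-frequency histograms,
    # using the same sqrt(count) / N error estimate as the question's script.
    count_a, count_b = Counter(samples_a), Counter(samples_b)
    n_a, n_b = len(samples_a), len(samples_b)
    z = {}
    for k in sorted(set(count_a) | set(count_b)):
        fa, fb = count_a[k] / n_a, count_b[k] / n_b
        err = math.sqrt(count_a[k] / n_a**2 + count_b[k] / n_b**2)
        if err > 0:
            z[k] = (fa - fb) / err
    return z

# Toy usage (hypothetical data): bin 1 is over-represented in the first sample
z = two_sample_bin_z([1] * 100 + [2] * 100, [1] * 50 + [2] * 150)
print({k: round(v, 2) for k, v in z.items()})  # prints {1: 4.08, 2: -3.16}
```

If one build shows the anomaly and another does not, bins with |z| > 3 (for example n = 158) would be expected to stand out when the two outputs are compared.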

Results:
