Question

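I am working in probability and statistics. To describe my problem clearly, please bear with this long introduction. Thanks!

Background of my problem

Suppose I have several independent random variables, say $X$ and $Y$, whose distributions are known, for example $X \sim N(0,1)$ and $Y \sim N(0,1)$. Now consider a function of $X$ and $Y$, $$g(X, Y) = X^2 + Y^2.$$ $g(X, Y)$ is itself a random variable; let $Z = g(X, Y)$ and treat its distribution as unknown. My goal is to obtain the expectation of that distribution, $E[Z]$.

Here $g(X, Y)$ has a simple form, and I can tell that its distribution is chi2(2). But when $g(\cdot)$ has a more complicated form, I cannot obtain its distribution directly, and it is even harder to get the expectation from that distribution. However, I found the Wikipedia page on the "Law of the unconscious statistician". Then I realized that I can obtain the expectation of $g(\cdot)$ without knowing its distribution, by integrating against the joint density of $X$ and $Y$: $$E[Z] = \iint g(x, y)\, f_{X,Y}(x, y)\, dx\, dy.$$ In this example $X$ and $Y$ are independent, so the joint density factorizes as $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$, with $f_X$ and $f_Y$ both standard normal densities.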
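The Law of the unconscious statistician can be checked numerically by averaging g over draws from the joint distribution, without ever using the distribution of Z. The following is a minimal sketch under that reading; the helper name lotus_estimate, the sample size, and the seed are assumptions for illustration, not part of the original post.

```python
import numpy as np

def g(x, y):
    # Same function as in the question: g(X, Y) = X^2 + Y^2
    return x**2 + y**2

def lotus_estimate(n, seed=0):
    # Hypothetical helper: average g over n independent draws of (X, Y),
    # with X and Y both standard normal as in the question.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    return g(x, y).mean()

estimate = lotus_estimate(200_000)
print("plain Monte Carlo estimate of E[Z]:", estimate)
```

Because the draws come straight from the joint distribution, no weights or acceptance step are needed; the estimate should land near the chi2(2) mean of 2.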

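To evaluate $E[Z]$, I think the Monte Carlo (MC) method is a suitable approach. I rewrote the equation in importance-sampling form:

$$E[Z] = \iint g(x, y)\, \frac{f_{X,Y}(x, y)}{h(x, y)}\, h(x, y)\, dx\, dy,$$

where $f_{X,Y}(x, y)$ is the joint density of $X$ and $Y$ and $h(x, y)$ is a proposal distribution of known form. This looks like a good place to use the Metropolis-Hastings method.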
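For comparison, plain importance sampling draws points from the proposal h and reweights g by the ratio of the joint density to h. The sketch below assumes a wider normal proposal, N(0, 1.5^2 I); that choice, the helper name importance_sampling_estimate, and the parameter proposal_scale are illustrative assumptions, not taken from the original post.

```python
import numpy as np
import scipy.stats as st

def g(points):
    # g(X, Y) = X^2 + Y^2, evaluated row-wise on an (n, 2) array
    return points[:, 0]**2 + points[:, 1]**2

def importance_sampling_estimate(n, proposal_scale=1.5, seed=0):
    # Hypothetical helper: estimate E[g(X, Y)] with draws from a proposal h.
    rng = np.random.default_rng(seed)
    f = st.multivariate_normal(mean=[0, 0], cov=np.eye(2))  # joint density of (X, Y)
    h = st.multivariate_normal(mean=[0, 0],
                               cov=proposal_scale**2 * np.eye(2))  # assumed proposal
    points = proposal_scale * rng.standard_normal((n, 2))  # draws from h
    weights = f.pdf(points) / h.pdf(points)                 # importance weights f / h
    return np.mean(g(points) * weights)

is_estimate = importance_sampling_estimate(200_000)
print("importance sampling estimate of E[Z]:", is_estimate)
```

A proposal wider than the target keeps the weights bounded, which is why the scale is set above 1 here.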

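What I want to test is this: the expectation computed from samples of the chi2(2) distribution should equal the expectation computed by the MC method. I implemented all of the above in Python, as follows: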

import numpy as np
import time
import matplotlib.pyplot as plt
import scipy.stats as st
from scipy.stats import chi2
%matplotlib inline
# np.random.seed(123)

def sampler(mean=[0,0], cov=[[1,0],[0,1]], n=10):
    return st.multivariate_normal(mean, cov).rvs(n) 

def g(point):
    return point[0]**2+point[1]**2
k=2
samples_from_chi2 = chi2.rvs(k,loc=0, scale=1, size=400)
mean_chi2 = np.mean(samples_from_chi2)

iterN = 10000
begin = np.array([2**.5, 2**.5])
samples = np.array(begin)
mulN_mean = np.array([0,0])
mulN_cov = np.array([[1,0],[0,1]])
multiNormal = st.multivariate_normal
p = multiNormal(mulN_mean, mulN_cov)
timeTest_begin = time.perf_counter()  # time.clock() was removed in Python 3.8
for i in range(iterN):
    #print("------------round{a:2d}---------------".format(a=i))
    father = begin
    pdf_father = p.pdf(father)
    son = multiNormal(father, mulN_cov).rvs(1)
    pdf_son = p.pdf(son)
    # print("father={a:%f}, son={b:%f}".format(a=father, b=son))

    # In this example, P(father|son) == P(son|father) is true,
    # so just ignore the conditional probability of the
    # Metropolis–Hastings algorithm.
    # q_son_father = multiNormal(father, mulN_cov).pdf(son)
    # q_father_son = multiNormal(son, mulN_cov).pdf(father)
    # print("q(son|father)=", q_son_father)
    # print("q(father|son)=", q_father_son)

    g_father = g(father)
    g_son = g(son)

    r_up = g_son * pdf_son
    r_down = g_father * pdf_father
    r = r_up / r_down
    u = np.random.rand()

    if r >= u:
        samples = np.vstack((samples, son))
    else:
        samples = np.vstack((samples, father))

    begin = samples[-1]

timeTest_end = time.perf_counter()
T = timeTest_end - timeTest_begin
#print("cost time:%d\n", T)
# print("samples=\n", samples)

G_samples = samples[:,0]**2 + samples[:,1]**2
mean_from_sample = np.mean(G_samples)
print("chi2 mean:{a}    sample_mean:{b}".format(a=mean_chi2, 
b=mean_from_sample))

x = range(iterN+1)
y_max = np.max(G_samples)
y = np.linspace(0, y_max, int(y_max*40))
fig, (ax1, ax2) = plt.subplots(2,1)
ax1.plot(x,G_samples,'-')
ax1.set_title('Trace plot')
ax1.set_xlabel('Number of points')
ax1.set_ylabel('Points location')
ax2.hist(G_samples, int(iterN/100), density=True)  # normed= was removed from matplotlib
ax2.set_title('Histogram plot')
ax2.set_xlabel('Points location')
ax2.set_ylabel('Probability')
ax2.plot(y, st.chi2.pdf(y, 2), 'r')  # pdf(x, df): evaluate the chi2(2) density at y
fig.subplots_adjust(left=None, bottom=None, right=None, top=None,
            wspace=None, hspace=0.8)

Result of the run:

chi2 mean:2.0085551240837773    sample_mean:4.069302525056825

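Here comes my question

Whether the number of sample points is 400 or 10000, the mean of the samples drawn directly from the chi-squared distribution is about 2. I am sure this is correct, because a chi2 distribution with k degrees of freedom has expectation k and variance 2k. The expectation computed by the MC method, however, is about 4, as the figure above shows. I would like to know why the two results differ by a factor of 2. And if it is correct for such a constant to exist, how can I find its value when I do not know the distribution of the function of the random variables (in this example, the chi-squared distribution)?

Thank you for reading this far. Please give me some advice, thanks a lot!

Update

The problem has been solved. My assumption was confirmed: the two approaches give the same result, and the M-H sampling approach is faster. Here is the new picture from the M-H approach (the red line is the pdf of $\chi^2(2)$), together with the expectations computed by the two approaches.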

chi2 mean:2.0159110138904883    sample_mean:1.9693578501381137
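The update does not show what was changed in the code. One change that would produce this agreement, offered here as an assumption rather than as the author's actual fix, is to drop g from the acceptance ratio so that the chain targets the joint density itself. With g in the ratio, the chain targets a density proportional to g times the joint density, and the mean of g under that density is $E[Z^2]/E[Z] = (4 + 4)/2 = 4$ for chi2(2), which is consistent with the value of about 4 reported earlier. A minimal sketch comparing the two ratios follows; metropolis_mean_of_g, log_target, and the weight_by_g flag are hypothetical names.

```python
import numpy as np

def log_target(point):
    # Log of the standard bivariate normal density, up to an additive constant.
    return -0.5 * np.dot(point, point)

def g(point):
    return point[0]**2 + point[1]**2

def metropolis_mean_of_g(n_iter, weight_by_g=False, seed=0):
    # Random-walk Metropolis with a symmetric N(current, I) proposal.
    rng = np.random.default_rng(seed)
    current = np.array([2**0.5, 2**0.5])  # same starting point as the question
    total = 0.0
    for _ in range(n_iter):
        proposal = current + rng.standard_normal(2)
        log_r = log_target(proposal) - log_target(current)
        if weight_by_g:
            # Ratio used in the question's loop: (g * pdf) / (g * pdf).
            log_r += np.log(g(proposal)) - np.log(g(current))
        if np.log(rng.random()) < log_r:
            current = proposal  # accept the move
        total += g(current)     # a rejected move repeats the current point
    return total / n_iter

print("ratio pdf only :", metropolis_mean_of_g(20_000))
print("ratio g * pdf  :", metropolis_mean_of_g(20_000, weight_by_g=True))
```

The first estimate should be close to 2 and the second close to 4, up to Monte Carlo error. Working with log densities avoids underflow in the tails and does not change which moves are accepted.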

Answer 1

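You are right, the expectation values must be the same, so we have to assume that there is a mistake somewhere in the code. Going through it, I found some steps that I do not understand (maybe I am wrong, but I am trying to understand how you implemented the MC).

First:

father = begin
son = multiNormal(father, mulN_cov).rvs(1)

Here you set father to begin on every iteration, but if the move is accepted then, to build a chain, you have to update it, i.e. move to the son. Then you construct the variable son starting from father, but you pass the argument 1 to the rvs method. Doesn't this shift the mean to 1?

Second. At the end there is, correctly, the Metropolis accept-reject step

if r >= u:
    samples = np.vstack((samples, son))
else:
    samples = np.vstack((samples, father))

but where is u initialised? It has to be drawn from a uniform distribution on the range [0,1], but I cannot find it in the code.

Putting these doubts together (if I am right), a possible answer is the following: the son variable is defined from father, but its mean is shifted by 1. Then, with u not initialised, it may be that you always accept the son move, so your MC chain becomes a collection of multivariate normal random variables with mean 1. That is how you would get the additional 2 in your result. Note that this would also explain why setting father to begin does not spoil the MC chain, since it is always discarded and creates no bias.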
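The doubt about rvs(1) can be checked directly. The sketch below assumes the installed scipy follows the documented rvs(size=1, random_state=None) signature for a frozen multivariate_normal; the variable names and the seed are illustrative. As for u, the loop in the question does assign it with u = np.random.rand() just before the comparison.

```python
import numpy as np
import scipy.stats as st

father = np.array([2**0.5, 2**0.5])
cov = np.eye(2)

# The argument to rvs is the number of draws (size), not a mean shift.
son = st.multivariate_normal(father, cov).rvs(1)
print("shape of one draw:", np.shape(son))

# Averaging many draws recovers the mean passed to the constructor.
draws = st.multivariate_normal(father, cov).rvs(100_000, random_state=0)
print("sample mean of draws:", draws.mean(axis=0))
print("mean used to build the distribution:", father)
```

If the sample mean matches father rather than father plus 1, the proposal is centred where the chain currently is, and the extra factor has to come from somewhere else in the loop.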

Getting the expectation of the distribution of a function of random variables by sampling from the joint distribution
