Question

I have a 10000 x 22 array (observations x features), and I fit a Gaussian mixture with a single component to it as follows:

mixture = sklearn.mixture.GaussianMixture(n_components=1, covariance_type='full').fit(my_array)

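Next, I want to compute the mean and covariance of the conditional distribution of the first two features, following equations 2.81 and 2.82 on p. 87 of Bishop's Pattern Recognition and Machine Learning. My work so far: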

import numpy as np

covariances = mixture.covariances_  # shape (1, 22, 22): one fitted component, 22x22 covariance matrix
means = mixture.means_  # shape (1, 22): one mean per feature
dependent_data = my_array[:, 0:2]  # shape (10000, 2)
conditional_data = my_array[:, 2:]  # shape (10000, 20)
mu_a = means[:, 0:2]  # mean of the dependent variables
mu_b = means[:, 2:]  # mean of the independent variables
cov_aa = covariances[0, 0:2, 0:2]  # covariance of the dependent variables
cov_bb = covariances[0, 2:, 2:]  # covariance of the independent variables
cov_ab = covariances[0, 0:2, 2:]  # cross-covariance, dependent vs. independent
cov_ba = covariances[0, 2:, 0:2]  # cross-covariance, independent vs. dependent
A = (conditional_data.transpose() - mu_b.transpose())  # shape (20, 10000): centered independent features
B = cov_ab.dot(np.linalg.inv(cov_bb))  # shape (2, 20)
conditional_mu = mu_a + B.dot(A).transpose()
conditional_cov = cov_aa - cov_ab.dot(np.linalg.inv(cov_bb)).dot(cov_ba)

My problem is that when I compute conditional_mu and conditional_cov, I get the following shapes:

conditional_mu.shape
(10000, 2)
conditional_cov.shape
(2,2)

I expected conditional_mu to have shape (1, 2), since I only want the means of the first two features. Why do I get a mean for every observation?

Answer 1

Yes, these are the expected dimensions.

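For each data point, the independent features are fixed and the dependent features follow a normal distribution. Because the conditional mean depends on the values of the independent features, each data point yields a different mean for the dependent features.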
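For reference, these are the standard conditioning formulas for a partitioned Gaussian (the equations 2.81 and 2.82 cited in the question), written with a for the first two features and b for the remaining twenty:

$$
\mu_{a|b} = \mu_a + \Sigma_{ab}\,\Sigma_{bb}^{-1}\,(x_b - \mu_b), \qquad
\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\,\Sigma_{bb}^{-1}\,\Sigma_{ba}
$$

The mean depends on the observed value x_b, while the covariance does not, which matches the shapes (10000, 2) and (2, 2) reported in the question.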

Since you have 10000 data points, you should get 10000 means for the dependent features, one per data point.
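A minimal sketch of the same computation may make the shapes easier to see. It is an illustration under assumptions, not the asker's exact pipeline: the helper name conditional_gaussian is hypothetical, the data is synthetic (1000 x 5 rather than 10000 x 22), and the sample mean and covariance stand in for mixture.means_[0] and mixture.covariances_[0], which should be close to what a one-component fit produces.

```python
import numpy as np

def conditional_gaussian(mean, cov, x_b, n_a=2):
    """Condition the first n_a variables of a Gaussian on the remaining ones.

    x_b is either one observation of shape (d - n_a,) or a batch of
    shape (n, d - n_a). Returns (conditional means, conditional covariance).
    """
    mu_a, mu_b = mean[:n_a], mean[n_a:]
    cov_aa = cov[:n_a, :n_a]
    cov_ab = cov[:n_a, n_a:]
    cov_bb = cov[n_a:, n_a:]
    # gain = cov_ab @ inv(cov_bb), via solve instead of an explicit inverse
    gain = np.linalg.solve(cov_bb, cov_ab.T).T
    # One conditional mean per row of x_b
    cond_mu = mu_a + (np.atleast_2d(x_b) - mu_b) @ gain.T
    # The conditional covariance does not depend on x_b
    cond_cov = cov_aa - gain @ cov_ab.T
    return cond_mu, cond_cov

# Synthetic, correlated data standing in for the real array (assumption)
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))
mean = data.mean(axis=0)
cov = np.cov(data, rowvar=False)

# Conditioning on every observation gives one mean per row
mu_all, cov_cond = conditional_gaussian(mean, cov, data[:, 2:])
# Conditioning on a single observation gives a single mean
mu_one, _ = conditional_gaussian(mean, cov, data[0, 2:])

print(mu_all.shape)    # (1000, 2)
print(mu_one.shape)    # (1, 2)
print(cov_cond.shape)  # (2, 2)
```

If a single (1, 2) result is what you are after, either condition on one specific observation as in mu_one, or use the marginal mean mu_a directly. In this sketch, averaging the per-observation conditional means over the data recovers mu_a, because the centered independent features average to zero.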
