Function for estimating Naive Bayes probabilities

Date: 2017-04-20 10:08:16

Tags: python matlab numpy

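I've been having trouble converting Matlab code to Python. I have Matlab code that I wrote last year (it works), and I'm now trying to convert the functions to Python. Five of them work and four don't. I'm really stuck and would be grateful for any help. This one is about estimating Naive Bayes probabilities. Here is the function in Matlab: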

function [ p_x_y ] = estimate_p_x_y_NB(Xtrain,ytrain,a,b )

% Function calculates probability distribution p(x|y), assuming that x is binary
% and its elements are independent from each other

% Xtrain - training dataset NxD
% ytrain - training dataset class labels 1xN
% p_x_y - binomial distribution estimators - element at position(m,d)
% represents estimator p(x_d=1|y=m) MxD
% N - number of elements in training dataset
D = size(Xtrain,2);
M = length(unique(ytrain));
p_x_y = zeros(M,D);
for i=1:M
    for j=1:D
        numerator = sum((ytrain==i).*((Xtrain(:,j)==1))')+a-1;
        denominator = sum(ytrain==i)+a+b-2;
        p_x_y(i,j) = numerator/denominator;
    end
end
end
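For reference, each entry computed by the double loop can be written as the formula below, where [·] equals 1 when the condition holds and 0 otherwise, the labels run from 1 to M, and N_m is the number of training examples in class m. This is the mode of the Beta(a, b) posterior for a Bernoulli parameter (the MAP estimate); with a = b = 1 it reduces to the plain relative frequency.

```latex
\hat{p}(x_d = 1 \mid y = m)
  = \frac{\sum_{n=1}^{N} [y_n = m]\,[x_{nd} = 1] + a - 1}
         {\underbrace{\sum_{n=1}^{N} [y_n = m]}_{N_m} + a + b - 2}
```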

Here is my Python translation:

def estimate_p_x_y_nb(Xtrain, ytrain, a, b):
    """
    :param Xtrain: training data NxD
    :param ytrain: class labels for training data 1xN
    :param a: parameter a of Beta distribution
    :param b: parameter b of Beta distribution
    :return: Function calculates the probability p(x|y), assuming that x takes binary values and that the elements
    of x are independent of each other. Function returns matrix p_x_y of size MxD.
    """
    D = Xtrain.shape[1]
    M = len(np.unique(ytrain))
    p_x_y = np.zeros((M, D))
    for i in range (M):
        for j in range(D):
            up = np.sum((ytrain == i+1).dot((Xtrain[:, j]==1)).conjugate().T) + a - 1
            down = np.sum((ytrain == i+1) + a + b -2)
            p_x_y[i,j] = up/down
    return p_x_y

Traceback:

    p_x_y[i,j] = up/down
ValueError: setting an array element with a sequence.
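This error means that the right-hand side of p_x_y[i,j] = up/down is not a single number: NumPy cannot store a multi-element array in one element of a float array. The small sketch below reproduces the message with made-up values; p_x_y here is just a toy array, not the real result.

```python
import numpy as np

p_x_y = np.zeros((2, 3))

# A plain scalar fits into a single element without problems.
p_x_y[0, 0] = 3 / 4

# A multi-element array does not: NumPy refuses to pack a sequence
# into one float slot and raises ValueError.
try:
    p_x_y[0, 1] = np.array([0.5, 0.25])
    raised = False
except ValueError:
    raised = True

print(p_x_y[0, 0], raised)
```

So when this error appears, it is worth printing np.shape(up) and np.shape(down) inside the loop to see which one is not a scalar.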

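If you can see any problem with this function, I'd be very glad if you pointed it out. Also, in the up variable I used .dot instead of just *, because with * I got an error about mismatched dimensions, while with .dot it seems to work. Thanks.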
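The question does not show the shapes of Xtrain and ytrain, so the following is only one possible source of dimension trouble, under an assumed shape. If ytrain were a 2-D column of shape (N, 1), the * operator would broadcast it against Xtrain[:, j] (shape (N,)) and silently produce an N x N array instead of an element-wise product. A 1xN row of shape (1, N) broadcasts to (1, N) and is harmless. The toy data is hypothetical.

```python
import numpy as np

# Hypothetical toy data: 4 examples, labels stored as a column of shape (4, 1).
ytrain_col = np.array([[1], [2], [1], [1]])
x_col = np.array([1, 0, 1, 0])  # one feature column, shape (4,)

# (4, 1) * (4,) broadcasts to a (4, 4) matrix, so the sum counts pairs
# of examples rather than examples that satisfy both conditions.
outer = (ytrain_col == 1) * (x_col == 1)
wrong = np.sum(outer)

# Flattening the labels to shape (4,) gives the intended element-wise product.
ytrain_flat = ytrain_col.ravel()
right = np.sum((ytrain_flat == 1) * (x_col == 1))

print(outer.shape, wrong, right)
```

Calling np.asarray(ytrain).ravel() once at the top of the function is a cheap way to rule this out.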

1 answer:

Answer 0 (score: 1)

I think the problem is in the statement that assigns the denominator. You have put the parentheses in the wrong place:

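down = np.sum((ytrain == i+1) + a + b - 2)

It should be the following, so that a + b - 2 is added once to the class count rather than once per training example:

down = np.sum(ytrain == i+1) + a + b - 2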
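A quick numeric comparison of the two denominators, using hypothetical labels and prior parameters chosen only for illustration:

```python
import numpy as np

ytrain = np.array([1, 2, 1])  # hypothetical labels
a, b = 2, 3
i = 0  # loop index, i.e. label 1

# Misplaced parenthesis: a + b - 2 is added to every element before summing,
# so the prior term is counted once per training example.
down_wrong = np.sum((ytrain == i + 1) + a + b - 2)

# Intended: count the examples of the class, then add a + b - 2 once.
down_right = np.sum(ytrain == i + 1) + a + b - 2

print(down_wrong, down_right)
```

With two examples of class 1 out of three, the intended denominator is 2 + 3 = 5, while the misplaced version gives 4 + 3 + 4 = 11.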

Also, try changing

up = np.sum((ytrain == i+1).dot((Xtrain[:, j]==1)).conjugate().T) + a - 1

to

up = np.sum((ytrain == i+1) * (Xtrain[:, j]==1)) + a - 1
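The switch from .dot to * matters for dense arrays. For 1-D boolean arrays, NumPy's dot is evaluated in boolean arithmetic (an OR of element-wise ANDs), so it returns True or False rather than a count, and wrapping it in np.sum gives at most 1. The masks below stand in for ytrain == i+1 and Xtrain[:, j] == 1 and are made up for illustration.

```python
import numpy as np

in_class = np.array([True, False, True, True])     # stands in for ytrain == i+1
feature_on = np.array([True, False, True, False])  # stands in for Xtrain[:, j] == 1

# Boolean dot: OR of element-wise ANDs, so the result is True/False, not a count.
via_dot = in_class.dot(feature_on)

# Element-wise product followed by sum counts the matching examples.
via_mul = np.sum(in_class * feature_on)

# Casting to integers first also makes dot return the count.
via_int_dot = in_class.astype(int).dot(feature_on.astype(int))

print(via_dot, np.sum(via_dot), via_mul, via_int_dot)
```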

I hope that works. I don't see any other problems with your code.
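Putting both suggested changes together, a corrected version of the loop might look like the sketch below. It assumes Xtrain and ytrain are dense NumPy arrays (or array-likes) and that the labels are 1..M, as in the MATLAB code; the np.asarray and ravel() calls are additions not present in the original.

```python
import numpy as np


def estimate_p_x_y_nb(Xtrain, ytrain, a, b):
    """Estimate p(x_d = 1 | y = m) for binary features and labels 1..M.

    Returns an M x D array whose (m, d) entry is
    (count + a - 1) / (N_m + a + b - 2), as in the MATLAB version.
    """
    Xtrain = np.asarray(Xtrain)
    # Flatten labels so a 1xN row or an Nx1 column both become shape (N,).
    ytrain = np.asarray(ytrain).ravel()
    D = Xtrain.shape[1]
    M = len(np.unique(ytrain))
    p_x_y = np.zeros((M, D))
    for i in range(M):
        in_class = ytrain == i + 1  # labels are 1-based, as in MATLAB
        down = np.sum(in_class) + a + b - 2  # same for every feature j
        for j in range(D):
            up = np.sum(in_class * (Xtrain[:, j] == 1)) + a - 1
            p_x_y[i, j] = up / down
    return p_x_y
```

Since the denominator does not depend on j, it is computed once per class here instead of inside the inner loop.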

After making these changes, I tested it with the values

Xtrain = np.array([[1,2,3,4,5], [1,2,3,4,5]])
ytrain = np.array([1,2])
a = 1
b = 1

This gives the output

array([[ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.]])

in both MATLAB and Python. You can check your own results against these values to confirm they are as expected.
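The double loop can also be replaced by matrix operations. The sketch below is one possible vectorized formulation, not part of the original answer; the function name estimate_p_x_y_nb_vectorized is hypothetical, and it makes the same assumptions (dense arrays, labels 1..M). It builds a one-hot class matrix Y and a binary feature matrix, so that Y.T @ X1 gives all the per-class counts at once, and it reproduces the output above for the same test values.

```python
import numpy as np


def estimate_p_x_y_nb_vectorized(Xtrain, ytrain, a, b):
    """Hypothetical vectorized variant: same formula, no Python loops."""
    Xtrain = np.asarray(Xtrain)
    ytrain = np.asarray(ytrain).ravel()
    M = len(np.unique(ytrain))
    # One-hot class membership, shape (N, M): column m is ytrain == m + 1.
    Y = (ytrain[:, None] == np.arange(1, M + 1)).astype(float)
    # Binary feature indicator, shape (N, D).
    X1 = (Xtrain == 1).astype(float)
    counts = Y.T @ X1                     # (M, D): class-m examples with x_d == 1
    class_sizes = Y.sum(axis=0)[:, None]  # (M, 1): N_m for each class
    return (counts + a - 1) / (class_sizes + a + b - 2)


Xtrain = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
ytrain = np.array([1, 2])
print(estimate_p_x_y_nb_vectorized(Xtrain, ytrain, 1, 1))
```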