Logistic regression by composing linear regression with a sigmoid function

Date: 2015-02-06 16:40:05

Tags: matlab, logistic-regression

I am trying to implement the logistic regression algorithm without calling any of the functions MATLAB provides for it; afterwards I call the MATLAB logistic regression function mnrfit so that I can cross-check that my algorithm works well.

The procedure I am implementing is as follows. I first create a vector x with the input data and a vector y of [0,1] values holding the class of each data point in x. I fit a linear regression to these data using gradient descent, and once I have extracted the coefficients I pass the fitted line through the sigmoid function. Afterwards I make a prediction for x = 10 to find the probability of class 1 for this input. Simple enough...
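In symbols (my notation, not part of the original post), the procedure above is

\[
h(x) = \alpha x + \beta \quad\text{(fitted by least-squares gradient descent)}, \qquad
\hat{p}(y = 1 \mid x) = \frac{1}{1 + e^{-h(x)}},
\]

and the prediction I want is \(\hat{p}(y = 1 \mid x = 10)\).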

After that I call the MATLAB function mnrfit and extract the coefficients of the logistic regression. To make the same prediction I call the function mnrval with the argument 10, since I want to predict for the input x = 10 as before. My results differ, and I do not know why...
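A side note on the mnrval call: since x = (1:100)', indexing row 10 of the returned probability matrix does correspond to the input x = 10. The prediction could presumably also be made directly on a single input; a hedged sketch (not part of my original code), assuming B comes from the mnrfit call further down:

% Hypothetical direct prediction at x = 10 instead of indexing row 10.
% mnrval returns one column of probabilities per category.
p = mnrval(B, 10);    % [P(first category) P(second category)] for x = 10
Prediction2 = p(2);   % probability of the second category (class "1")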

The two resulting plots are shown at the end, displaying the fitted probability curve in each case.

I also attach the code of my implementation.

% x is the continuous input and y is the category of every output [1 or 0]
x = (1:100)';   % independent variables x(s)
y(1:10)  = 0;    % Dependent variables y(s) -- class 0
y(11:100) = 1;    % Dependent variables y(s) -- class 1
y=y';
y = y(randperm(length(y))); % Random order of y array
x=[ones(length(x),1) x]; % This is done for vectorized code

%% Initialize Linear regression parameters

m = length(y); % number of training examples
% initialize fitting parameters - all zeros
Alpha = 0; % gradient
Beta = 0;  % offset
% Some gradient descent settings
% iterations must be a big number because we are taking very small steps .
iterations = 100000;
% Learning step must be small because the line must fit the data between 
% [0 and 1]
Learning_step_a = 0.0005;  % step parameter

%% Run Gradient descent 

fprintf('Running Gradient Descent ...\n')
for iter = 1:iterations
    % In every iteration evaluate the current linear hypothesis
    h = Alpha.*x(:,2) + Beta.*x(:,1);
    % Update line variables (plain linear-regression gradient step)
    Alpha = Alpha - Learning_step_a * (1/m) * sum((h-y).*x(:,2));
    Beta  = Beta  - Learning_step_a * (1/m) * sum((h-y).*x(:,1));
end

% This is my linear Model
LinearModel=Alpha.*x(:,2)+ Beta.*x(:,1);
% I pass it through a sigmoid !
LogisticRegressionPDF = 1 ./ (1 + exp(-LinearModel));
% Make a prediction for p(y==1|x==10)
Prediction1=LogisticRegressionPDF(10);

%% Confirmation with matlab function mnrfit

B=mnrfit(x(:,2),y+1); % Find Logistic Regression Coefficients
mnrvalPDF = mnrval(B,x(:,2));
% Make a prediction .. p(y==1|x==10)
Prediction2=mnrvalPDF(10,2);

%% Plotting Results 

% Plot Logistic Regression Results ...
figure;
plot(x(:,2),y,'g*');
hold on
plot(x(:,2),LogisticRegressionPDF,'k--');
hold off
title('My Logistic Regression PDF')
xlabel('continuous input');
ylabel('probability density function');

% Plot Logistic Regression Results (mnrfit) ...      
figure,plot(x(:,2),y,'g*');
hold on   
plot(x(:,2),mnrvalPDF(:,2),'--k') 
hold off   
title('mnrval Logistic Regression PDF')
xlabel('continuous input');
ylabel('probability density function') 

Why are my plots (and my predictions) not the same in each case?

  • The output you get will be different on every run, because the order of the 1s and 0s in the y vector is random (see the sketch below).
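(Not part of the original comment.) Seeding MATLAB's random number generator before the permutation should make the shuffled labels, and therefore the fitted coefficients, reproducible across runs; a minimal sketch:

rng(0);                       % fix the seed so randperm gives the same order every run
y = y(randperm(length(y)));   % this shuffle is now reproducible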

[figure: the two fitted probability curves]

2 Answers:

Answer 0 (score: 1)

I developed my own logistic regression algorithm using gradient descent. For "good" training data my algorithm had no choice but to converge on the same solution as mnrfit. For "not so good" training data my algorithm did not come close to mnrfit. The coefficients and the resulting model predict outcomes reasonably well, but not as well as mnrfit. Plotting the residuals shows that mnrfit's residuals are practically zero (on the order of 9x10^-200), whereas mine are merely close to zero (about 0.00001). I tried varying alpha, the number of steps and the initial theta guess, but doing so only produced different theta results. When I tuned these parameters on a good data set, my thetas started to agree much better with mnrfit's.
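A minimal sketch of the kind of residual comparison described above, assuming the variables y, LogisticRegressionPDF and mnrvalPDF from the question's code are still in the workspace (the names are taken from that code, not from this answer):

% Compare residuals of the hand-rolled model against mnrfit/mnrval
res_mine   = y - LogisticRegressionPDF;   % residuals of the custom model
res_mnrfit = y - mnrvalPDF(:,2);          % residuals of the mnrfit model
fprintf('Sum of squared residuals: mine = %g, mnrfit = %g\n', ...
        sum(res_mine.^2), sum(res_mnrfit.^2));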

Answer 1 (score: 1)

Many thanks to user3779062 for the information; the PDF file contains exactly what I was looking for. I had already implemented gradient descent, so the only differences needed to turn it into logistic regression are to pass the hypothesis through the sigmoid function inside the for loop and to flip the sign convention in the theta update rule (written out below). The result is identical to mnrval. I ran the code on many examples and the results are the same most of the time (especially if the data set is good and carries plenty of information about both classes). I attach the final code and a random sample of the results.
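For reference (my notation, not spelled out in the code below), the update being described is

\[
h_\theta(x) = \frac{1}{1 + e^{-\theta^{\top} x}}, \qquad
\theta_j \leftarrow \theta_j + \alpha \, \frac{1}{m} \sum_{i=1}^{m} \bigl(y^{(i)} - h_\theta(x^{(i)})\bigr) \, x_j^{(i)},
\]

i.e. the linear hypothesis is replaced by the sigmoid and the residual becomes \(y - h_\theta(x)\) instead of \(h - y\).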

% Machine Learning : Logistic Regression

% Logistic regression works like linear regression, but its output
% specifies the probability of belonging to one category or the other.
% At the beginning we create a well-defined data set that can easily
% be fitted by a sigmoid function.

clear all; close all; clc;

% This example runs many times to compare a lot of results
for examples=1:10:100
clearvars -except examples

%% Create Training Data

% x is the continuous input and y is the category of every output [1 or 0]
x = (1:100)';   % independent variables x(s)
y(1:examples)  = 0;    % Dependent variables y(s) -- class 0
y(examples+1:100) = 1;    % Dependent variables y(s) -- class 1
y=y';
y = y(randperm(length(y))); % Random order of y array
x=[ones(length(x),1) x]; % This is done for vectorized code

%% Initialize Linear regression parameters

m = length(y); % number of training examples
% initialize fitting parameters - all zeros
Alpha = 0; % gradient
Beta = 0;  % offset
% Some gradient descent settings
% iterations must be a big number because we are taking very small steps .
iterations = 100000;
% Learning step must be small because the line must fit the data between 
% [0 and 1]
Learning_step_a = 0.0005;  % step parameter

%% Run Gradient descent 

fprintf('Running Gradient Descent ...\n')
for iter = 1:iterations

    % Linear hypothesis function
    h = Alpha.*x(:,2) + Beta.*x(:,1);

    % Non-linear (sigmoid) hypothesis function
    hx = 1 ./ (1 + exp(-h));

    % Update coefficients with the logistic residual (y - hx)
    Alpha = Alpha + Learning_step_a * (1/m) * sum((y-hx).*x(:,2));
    Beta  = Beta  + Learning_step_a * (1/m) * sum((y-hx).*x(:,1));

end

% Make a prediction for p(y==1|x==10)
Prediction1=hx(10)

%% Confirmation with matlab function mnrfit

B=mnrfit(x(:,2),y+1); % Find Logistic Regression Coefficients
mnrvalPDF = mnrval(B,x(:,2));
% Make a prediction .. p(y==1|x==10)
Prediction2=mnrvalPDF(10,2)

%% Plotting Results 

% Plot Logistic Regression Results ...
figure;
subplot(1,2,1),plot(x(:,2),y,'g*');
hold on
subplot(1,2,1),plot(x(:,2),hx,'k--');
hold off
title('My Logistic Regression PDF')
xlabel('continuous input');
ylabel('probability density function');

% Plot Logistic Regression Results (mnrfit) ...      
subplot(1,2,2),plot(x(:,2),y,'g*');
hold on   
subplot(1,2,2),plot(x(:,2),mnrvalPDF(:,2),'--k') 
hold off   
title('mnrval Logistic Regression PDF')
xlabel('continuous input');
ylabel('probability density function')    
end

The results:

[figure: sample result plots]

Thank you very much!!