Predicting future values with OLS regression (Python, StatsModels, Pandas)

Asked: 2015-05-11 21:52:52

Tags: python pandas statsmodels

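I'm currently trying to implement multiple linear regression (MLR) in Python, and I'm not sure how to apply the coefficients I've found to future values.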

So, let's say I want to predict "Sales" for the following DataFrame:

import pandas as pd
import statsmodels.formula.api as sm
import statsmodels.api as sm2

TV = [230.1, 44.5, 17.2, 151.5, 180.8]
Radio = [37.8,39.3,45.9,41.3,10.8]
Newspaper = [69.2,45.1,69.3,58.5,58.4]
Sales = [22.1, 10.4, 9.3, 18.5,12.9]
df = pd.DataFrame({'TV': TV, 
                   'Radio': Radio, 
                   'Newspaper': Newspaper, 
                   'Sales': Sales})

Y = df.Sales
X = df[['TV','Radio','Newspaper']]
X = sm2.add_constant(X)
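model = sm2.OLS(Y, X).fit()  # OLS is exposed by statsmodels.api (sm2); recent releases no longer expose it in statsmodels.formula.api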
>>> model.params
const       -0.141990
TV           0.070544
Radio        0.239617
Newspaper   -0.040178
dtype: float64

I've been trying an approach I found here, but I can't seem to get it to work: Forecasting using Pandas OLS

Thanks!

1 Answer:

Answer 0: (score: 6)

Assuming df2 is your DataFrame of new samples:

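model = sm2.OLS(Y, X).fit()  # sm2 = statsmodels.api
# Select the same feature columns, in the same order, that the model was fit on.
new_x = df2[['TV', 'Radio', 'Newspaper']].values
# has_constant='add' forces the intercept column even if a column of new data happens to be constant.
new_x = sm2.add_constant(new_x, has_constant='add')
y_predict = model.predict(new_x)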

>>> y_predict
array([ 4.61319034,  5.88274588,  6.15220225])

You can assign the result directly to df2 as follows:

df2.loc[:, 'Sales'] = model.predict(new_x)
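
As a possible alternative (not what the answer above uses), the formula interface in statsmodels.formula.api adds the intercept for you, so predict can be called on the new DataFrame directly. The sketch below assumes the question's training data and a hypothetical df2 with made-up values.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Training data from the question.
df = pd.DataFrame({
    'TV': [230.1, 44.5, 17.2, 151.5, 180.8],
    'Radio': [37.8, 39.3, 45.9, 41.3, 10.8],
    'Newspaper': [69.2, 45.1, 69.3, 58.5, 58.4],
    'Sales': [22.1, 10.4, 9.3, 18.5, 12.9],
})

# The formula adds the intercept automatically; no add_constant call is needed.
model = smf.ols('Sales ~ TV + Radio + Newspaper', data=df).fit()

# Hypothetical new observations (illustrative values, not from the source).
df2 = pd.DataFrame({
    'TV': [100.0, 200.0],
    'Radio': [20.0, 30.0],
    'Newspaper': [50.0, 60.0],
})

# predict looks the columns up by name and returns a Series aligned to df2's index.
df2['Sales'] = model.predict(df2)
print(df2)
```

With this interface the intercept is reported under the name Intercept rather than const in model.params.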

To fill in missing Sales values in the original DataFrame with predictions from the regression, try:

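# Fit only on rows where Sales is known.
X = df.loc[df.Sales.notnull(), ['TV', 'Radio', 'Newspaper']]
X = sm2.add_constant(X)
Y = df[df.Sales.notnull()].Sales

model = sm2.OLS(Y, X).fit()  # sm2 = statsmodels.api
# Predict for rows where Sales is missing.
new_x = df.loc[df.Sales.isnull(), ['TV', 'Radio', 'Newspaper']]
# has_constant='add' forces the intercept column even if a column of new_x happens to be constant.
new_x = sm2.add_constant(new_x, has_constant='add')

df.loc[df.Sales.isnull(), 'Sales'] = model.predict(new_x)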