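Summarizing loop results in a pandas table

Asked: 2019-12-02 16:00:09

Tags: python pandas loops linear-regression

I have code that downloads price data and runs a linear regression for each stock in the download list. I'm stuck on the last step: displaying the predicted and residual values of each stock for the last date in the data.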

import pandas as pd
import numpy as np
import yfinance as yf
import datetime as dt
from sklearn import linear_model

tickers = ['EXPE','MSFT']

data = yf.download(tickers, start="2012-04-03", end="2017-07-07")['Close']
data = data.reset_index()
data = data.dropna()

df = pd.DataFrame(data, columns = ["Date"])
df["Date"]=df["Date"].apply(lambda x: x.toordinal())

for ticker in tickers:
    data[ticker] = pd.DataFrame(data, columns=[ticker])
    X = df
    y = data[ticker]
    lm = linear_model.LinearRegression()
    model = lm.fit(X, y)
    predictions = lm.predict(X)
    residuals = y - lm.predict(X)
    print(predictions[-1:])
    print(residuals[-1:])

The current output is:

[136.28856636]
1323    13.491432
Name: EXPE, dtype: float64
[64.19943648]
1323    5.260563
Name: MSFT, dtype: float64

But I would like it displayed like this (as a pandas table):

        Predictions Residuals
EXPE    136.29      13.49
MSFT    64.20       5.26

1 Answer:

Answer 0: (score: 1)

You can do the following, storing the values in lists:

import pandas as pd
import numpy as np
import yfinance as yf
import datetime as dt
from sklearn import linear_model

tickers = ['EXPE','MSFT']

data = yf.download(tickers, start="2012-04-03", end="2017-07-07")['Close']
data = data.reset_index()
data = data.dropna()

df = pd.DataFrame(data, columns = ["Date"])
df["Date"]=df["Date"].apply(lambda x: x.toordinal())

predictions_output = []
residuals_output = []

for ticker in tickers:
    X = df              # single feature: the date as an ordinal number
    y = data[ticker]    # closing prices for this ticker
    lm = linear_model.LinearRegression()
    lm.fit(X, y)
    predictions = lm.predict(X)
    residuals = y - predictions
    # Keep only the values for the last date in the data
    predictions_output.append(float(predictions[-1]))
    residuals_output.append(float(residuals.iloc[-1]))

# One row per ticker, indexed by the ticker symbols
expectation_df = pd.DataFrame(
    {'Predictions': predictions_output, 'Residuals': residuals_output},
    index=tickers)
print(expectation_df)

The output is:

      Predictions  Residuals
EXPE   136.288566  13.491432
MSFT    64.199436   5.260563
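
As a self-contained illustration of the same pattern that needs no network access, here is a minimal sketch. It assumes synthetic prices for two made-up tickers (`AAA` and `BBB`, not real data) and swaps scikit-learn's `LinearRegression` for `numpy.polyfit`, which fits the same kind of least-squares line when there is a single feature. The per-ticker results are collected in a dict and turned into a table with `pd.DataFrame.from_dict`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded closing prices (hypothetical values).
dates = pd.date_range("2020-01-01", periods=5, freq="D")
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0, 13.0, 14.5],
     "BBB": [50.0, 49.0, 48.0, 47.0, 45.0]},
    index=dates,
)

# Same feature as in the question: the date as an ordinal integer.
x = np.array([d.toordinal() for d in prices.index], dtype=float)
# Centre x so the least-squares fit is numerically well conditioned.
x_centered = x - x.mean()

rows = {}
for ticker in prices.columns:
    y = prices[ticker].to_numpy()
    slope, intercept = np.polyfit(x_centered, y, deg=1)
    fitted = slope * x_centered + intercept
    # Store only the last date's fitted value and residual for this ticker.
    rows[ticker] = {
        "Predictions": fitted[-1],
        "Residuals": y[-1] - fitted[-1],
    }

# Outer keys become the index, inner keys become the columns.
summary = pd.DataFrame.from_dict(rows, orient="index")
print(summary)
```

Keying the dict by ticker keeps each pair of values attached to its label, so there is no need to keep two parallel lists in step with the ticker list.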

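Edit: I went too fast. Looking back, I realized that tickers is already defined, so you can use it to set the index here; the index then has no "Tickers" header, which matches your desired output.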

If you want to round the values, replace the two append lines in the loop with the following:

predictions_output.append(round(float(predictions[-1]), 2))
residuals_output.append(round(float(residuals.iloc[-1]), 2))
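
Alternatively, the rounding can be left until display time, so the stored values keep their full precision. The sketch below assumes a frame shaped like `expectation_df`, rebuilt here by hand from the output shown above, and uses `DataFrame.round`.

```python
import pandas as pd

# Values copied from the output shown above.
expectation_df = pd.DataFrame(
    {"Predictions": [136.288566, 64.199436],
     "Residuals": [13.491432, 5.260563]},
    index=["EXPE", "MSFT"],
)

# round() returns a new DataFrame; expectation_df itself is left unchanged.
rounded = expectation_df.round(2)
print(rounded)
```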