Fitted values of a time-series trend in Python

Asked: 2015-04-30 06:51:22

Tags: python pandas time-series statsmodels trend

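I have daily stock price data from Yahoo Finance in a DataFrame called price_data.

I would like to add a column to it that holds the fitted values of a time-series trend for the Adj Close column.

Here is the structure of the data I am working with: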

In [41]: type(price_data)
Out[41]: pandas.core.frame.DataFrame

In [42]: list(price_data.columns.values)
Out[42]: ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']

In [45]: type(price_data.index)
Out[45]: pandas.tseries.index.DatetimeIndex

What is the best way to do this in Python?

For reference, this is how I do it in R:

all_time_fitted <- function(data)
{
    all_time_model  <- lm(Adj.Close ~ Date, data=data)
    fitted_value    <- predict(all_time_model)

    return(fitted_value)
}

Here is some sample data:

In [3]: price_data
Out[3]: 
             Open   High    Low  Close     Volume  Adj Close  
Date                                                                     
2005-09-27  21.05  21.40  19.10  19.30     961200   19.16418
2005-09-28  19.30  20.53  19.20  20.50    5747900   20.35573
2005-09-29  20.40  20.58  20.10  20.21    1078200   20.06777
2005-09-30  20.26  21.05  20.18  21.01    3123300   20.86214
2005-10-03  20.90  21.75  20.90  21.50    1057900   21.34869
2005-10-04  21.44  22.50  21.44  22.16    1768800   22.00405
2005-10-05  22.10  22.31  21.75  22.20     904300   22.04377

1 Answer:

Answer 0 (score: 6)

Quick and dirty...

# get some data
# (pandas.io.data was moved to the separate pandas-datareader package)
import pandas_datareader.data as web
import datetime
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2015, 4, 30)
df = web.DataReader("F", 'yahoo', start, end)

# a bit of munging - better column name - Day as integer 
df = df.rename(columns={'Adj Close':'AdjClose'})
dayZero = df.index[0]
df['Day'] = (df.index - dayZero).days

# fit a linear regression of price on the day count
import statsmodels.formula.api as smf
fit = smf.ols(formula="AdjClose ~ Day", data=df).fit()
print(fit.summary())
# fitted values of the trend line, one per row of df
df['fitted'] = fit.predict(df)

# plot
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8,4))
ax.scatter(df.index, df.AdjClose)
ax.plot(df.index, df.fitted, 'r')
ax.set_ylabel('$')
fig.suptitle('Yahoo')

plt.show()
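
If downloading data or installing statsmodels is not an option, the same idea can be sketched offline with numpy alone. This is only an illustration, not the method used above: it swaps the statsmodels OLS fit for np.polyfit of degree 1, and the function name add_linear_trend and the output column name fitted are made up for the example. The rows are the Adj Close values from the question's sample data.

```python
import numpy as np
import pandas as pd


def add_linear_trend(price_data, column="Adj Close", out="fitted"):
    """Return a copy of price_data with a least-squares linear trend column.

    The function name and the `out` column name are illustrative only.
    """
    result = price_data.copy()
    # Days elapsed since the first observation, so weekends and holidays
    # are reflected in the spacing of the x values.
    days = np.asarray((result.index - result.index[0]).days, dtype=float)
    y = result[column].to_numpy(dtype=float)
    # A degree-1 polyfit returns (slope, intercept) of the least-squares line.
    slope, intercept = np.polyfit(days, y, 1)
    result[out] = intercept + slope * days
    return result


# Adj Close values copied from the sample rows in the question.
price_data = pd.DataFrame(
    {"Adj Close": [19.16418, 20.35573, 20.06777, 20.86214,
                   21.34869, 22.00405, 22.04377]},
    index=pd.DatetimeIndex(
        ["2005-09-27", "2005-09-28", "2005-09-29", "2005-09-30",
         "2005-10-03", "2005-10-04", "2005-10-05"], name="Date"),
)
price_data = add_linear_trend(price_data)
print(price_data)
```

Both approaches minimise the squared error of a straight line through the same (Day, price) points, so the fitted column should match what the `AdjClose ~ Day` regression predicts. The numpy version also leaves the original 'Adj Close' column name untouched, since no formula string is involved.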
