Question

How does the autocorrelation calculation in matplotlib differ from the one in other libraries, such as pandas.tools.plotting and sm.graphics.tsa.plot_acf?

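The code below shows that the two libraries return different autocorrelation values: matplotlib returns autocorrelation values that are all greater than zero, whereas pandas.tools.plotting returns some negative values (setting aside other differences, such as the confidence bands and the negative half of the x-axis).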

import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
# In older pandas releases this lived in pandas.tools.plotting
from pandas.plotting import autocorrelation_plot

# Load the yearly sunspot data and index it by date
dta = sm.datasets.sunspots.load_pandas().data
dta.index = pd.Index(sm.tsa.datetools.dates_from_range('1700', '2008'))
del dta["YEAR"]

# matplotlib: autocorrelation of the raw series at every lag
plt.acorr(dta['SUNACTIVITY'], maxlags=len(dta['SUNACTIVITY']) - 1,
          linestyle="solid", usevlines=False, marker='')
plt.show()

# pandas: autocorrelation plot of the same series
autocorrelation_plot(dta['SUNACTIVITY'])
plt.show()

Answer 1

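Before computing the autocorrelation, the pandas plotting and statsmodels graphics functions standardize the data: they subtract the mean and divide by the standard deviation of the data.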
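As far as I can tell from the default settings (plt.acorr with normed=True and no detrending, and pandas' autocorrelation_plot), the two estimators at lag $k$ for a series $x_1, \dots, x_n$ with sample mean $\bar{x}$ can be written as follows. The labels "raw" and "centered" are my own names, not the libraries':

```latex
r_{\text{raw}}(k) = \frac{\sum_{t=1}^{n-k} x_t \, x_{t+k}}{\sum_{t=1}^{n} x_t^{2}},
\qquad
r_{\text{centered}}(k) = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^{2}}
```

If every $x_t$ is non-negative, as with sunspot counts, every term of $r_{\text{raw}}(k)$ is non-negative, so it can never drop below zero. Once the mean is subtracted, the products can be negative. A constant scale factor such as the standard deviation cancels between numerator and denominator, so it is the mean subtraction that changes the sign.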

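By standardizing, they assume that your data were generated by a Gaussian law (with some mean and standard deviation). In practice that may not be the case.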

Correlation is a sensitive measure. Both approaches (matplotlib and pandas plotting) have their drawbacks.

With the following code, matplotlib produces a plot that matches the one generated by pandas plotting or statsmodels graphics:

# Standardize the series: subtract the mean, divide by the standard deviation
dta['SUNACTIVITY_2'] = (dta['SUNACTIVITY'] - dta['SUNACTIVITY'].mean()) / dta['SUNACTIVITY'].std()
plt.acorr(dta['SUNACTIVITY_2'], maxlags=len(dta['SUNACTIVITY_2']) - 1,
          linestyle="solid", usevlines=False, marker='')
plt.show()
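To check the effect numerically without any plotting, here is a minimal NumPy-only sketch. The helper names acorr_raw and acorr_centered are hypothetical, and the series is synthetic (a strictly positive, roughly 11-step cycle standing in for the sunspot data), so treat it as an illustration of the idea rather than the libraries' actual code:

```python
import numpy as np

def acorr_raw(x):
    """Autocorrelation without mean removal, normalized by the lag-0 value."""
    x = np.asarray(x, dtype=float)
    full = np.correlate(x, x, mode="full")   # lags -(n-1) .. (n-1)
    return full[len(x) - 1:] / np.dot(x, x)  # keep lags 0 .. n-1

def acorr_centered(x):
    """Same calculation after subtracting the sample mean."""
    x = np.asarray(x, dtype=float)
    return acorr_raw(x - x.mean())

# Synthetic stand-in for the sunspot series: strictly positive and cyclic
rng = np.random.default_rng(0)
t = np.arange(200)
series = 100 + 40 * np.sin(2 * np.pi * t / 11) + rng.normal(0, 5, size=t.size)

raw = acorr_raw(series)
centered = acorr_centered(series)
standardized = acorr_raw((series - series.mean()) / series.std())

print("min raw autocorrelation:     ", raw.min())
print("min centered autocorrelation:", centered.min())
print("standardized == centered:    ", np.allclose(standardized, centered))
```

On a positive series the raw values stay above zero, while the centered values dip below zero around half a cycle. Dividing by the standard deviation makes no further difference once the result is normalized by its lag-0 value.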

Source code:

Matplotlib

Pandas
