Question

The data file is here.

I just want to compute the pairwise correlations between the columns of two DataFrames:

In [7]: import os

In [8]: import pandas as pd

In [9]: import numpy as np

In [10]: from pandas import Series, DataFrame

In [12]: blog_dat = pd.read_table("blogdata.txt", index_col="Blog")

In [13]: blog_dat = blog_dat.astype(float)

In [14]: all(blog_dat.notnull())
Out[14]: True

In [15]: x = DataFrame(np.random.randn(99*4).reshape((99, 4)))

In [16]: pd.expanding_corr(blog_dat.iloc[:, :4], blog_dat.iloc[:, :4], pairwise=True)[-1, :, :]
Out[16]:
          china      kids     music     yahoo
china  1.000000  0.053069  0.026599  0.246957
kids   0.053069  1.000000  0.409978  0.094636
music  0.026599  0.409978  1.000000  0.055923
yahoo  0.246957  0.094636  0.055923  1.000000

In [17]: pd.expanding_corr(blog_dat.iloc[:, :4], x, pairwise=True)[-1, :, :]
/usr/local/lib/python3.4/site-packages/pandas/core/index.py:1240: RuntimeWarning: unorderable types: str() < int(), sort order is undefined for incomparable objects
  "incomparable objects" % e, RuntimeWarning)
/usr/local/lib/python3.4/site-packages/pandas/core/index.py:1240: RuntimeWarning: unorderable types: int() < str(), sort order is undefined for incomparable objects
  "incomparable objects" % e, RuntimeWarning)
/usr/local/lib/python3.4/site-packages/pandas/core/index.py:1254: RuntimeWarning: unorderable types: str() > int(), sort order is undefined for incomparable objects
  "incomparable objects" % e, RuntimeWarning)
/usr/local/lib/python3.4/site-packages/pandas/core/index.py:1254: RuntimeWarning: unorderable types: int() > str(), sort order is undefined for incomparable objects
  "incomparable objects" % e, RuntimeWarning)
Out[17]:
        0   1   2   3
china NaN NaN NaN NaN
kids  NaN NaN NaN NaN
music NaN NaN NaN NaN
yahoo NaN NaN NaN NaN

Even if I give x an index and column names, the NaNs do not go away.

Answer 1

Give x and blog_dat the same index:

import pandas as pd
import numpy as np
np.random.seed(1)

blog_dat = pd.read_table("data", sep=r'\s+')
x = pd.DataFrame(np.random.randn(4*4).reshape((4, 4)),
                 index=blog_dat.index)

pd.expanding_corr(blog_dat.iloc[:, :4], x, pairwise=True)[-1, :, :]

yields

              0         1         2         3
china  0.684896  0.260795 -0.990586  0.281298
kids   0.077209 -0.871448  0.702822  0.241313
music -0.203808  0.071436  0.581267 -0.783753
yahoo -0.630744  0.373339 -0.060623  0.258728

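Merely giving x some index labels is not enough; they must match blog_dat's index. pandas aligns the two frames on their index labels before computing the correlations, so if no labels are shared there are no paired observations and every entry comes out as NaN.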
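The alignment behaviour can be reproduced without the blog data. The following is a minimal sketch, not the answer's original code: it uses synthetic data, a hypothetical helper named cross_corr, and Series.corr in place of pd.expanding_corr (which is not available in recent pandas releases). Series.corr aligns both inputs on their index labels in the same way, so it shows the same effect.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(1)

# Stand-in for blog_dat: string row labels, two word-count columns (synthetic data).
blog = pd.DataFrame(rng.randn(6, 2),
                    index=list("abcdef"),
                    columns=["china", "kids"])

# x with the default integer index 0..5: shares no labels with blog.
x_default = pd.DataFrame(rng.randn(6, 2))

# Same values, but reusing blog's index so the rows line up.
x_aligned = pd.DataFrame(x_default.values, index=blog.index)


def cross_corr(left, right):
    """Correlate every column of `left` with every column of `right`.

    Hypothetical helper for illustration. Series.corr aligns the two
    Series on their index labels and returns NaN when nothing overlaps.
    """
    values = [[left[lc].corr(right[rc]) for rc in right.columns]
              for lc in left.columns]
    return pd.DataFrame(values, index=left.columns, columns=right.columns)


print(cross_corr(blog, x_default))  # every entry is NaN: no common labels
print(cross_corr(blog, x_aligned))  # finite correlations
```

Comparing the two indexes first, for example with blog.index.equals(x_default.index), is a quick way to spot this kind of mismatch before computing anything.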

The expanding_corr function in pandas gives NaN
