DataFrame does not print correctly

Time: 2016-11-03 15:59:46

Tags: python python-2.7 pandas dataframe

I saved a DataFrame to CSV, made some changes, and then tried to load it again. For some reason, the date column is now all mixed up.

Can someone help me understand why this is happening? Before saving to CSV, my df looked like this:


After reading the CSV back in, it now looks like this:

import pandas as pd
import pandas_datareader.data as web

# start and end are the date bounds of the query; they are assumed to be defined earlier (not shown in the question)
aapl = web.DataReader("AAPL", "yahoo", start, end)
bbry = web.DataReader("BBRY", "yahoo", start, end)
lulu = web.DataReader("LULU", "yahoo", start, end)
amzn = web.DataReader("AMZN", "yahoo", start, end)

# Build a DataFrame of the adjusted closing prices of these stocks from a dict of Series,
# indexed by business month-end dates
stocks = pd.DataFrame({"AAPL": aapl["Adj Close"],
                       "BBRY": bbry["Adj Close"],
                       "LULU": lulu["Adj Close"],
                       "AMZN": amzn["Adj Close"]},
                      index=pd.date_range(start, end, freq='BM'))

stocks.head()

Out[60]:
                 AAPL        AMZN       BBRY       LULU
2011-11-30  49.987684  192.289993  17.860001  49.700001
2011-12-30  52.969683  173.100006  14.500000  46.660000
2012-01-31  59.702715  194.440002  16.629999  63.130001
2012-02-29  70.945373  179.690002  14.170000  67.019997
2012-03-30  78.414750  202.509995  14.700000  74.730003

In [74]:

stocks.to_csv('A5.csv', encoding='utf-8')

Why isn't the date column recognized as dates?

Thanks for your help.

1 Answer:

Answer 0: (score: 1)

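I would suggest using HDF storage instead of CSV: it is faster, it preserves your dtypes, you can conditionally select subsets of the data, it supports fast compression, and so on.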

import pandas as pd
import pandas_datareader.data as web

stocklist = ['AAPL','BBRY','LULU','AMZN']
p = web.DataReader(stocklist, 'yahoo', '2011-11-01', '2012-04-01')
df = p['Adj Close'].resample('M').last()
print(df)

# saving DF to HDF file
store = pd.HDFStore(r'd:/temp/stocks.h5')
store.append('stocks', df, data_columns=True, complib='blosc', complevel=5)
store.close()

Output:

                 AAPL        AMZN       BBRY       LULU
Date
2011-11-30  49.987684  192.289993  17.860001  49.700001
2011-12-31  52.969683  173.100006  14.500000  46.660000
2012-01-31  59.702715  194.440002  16.629999  63.130001
2012-02-29  70.945373  179.690002  14.170000  67.019997
2012-03-31  78.414750  202.509995  14.700000  74.730003

Let's read our data back from the HDF file:

In [9]: store = pd.HDFStore(r'd:/temp/stocks.h5')

In [10]: x = store.select('stocks')

In [11]: x
Out[11]:
                 AAPL        AMZN       BBRY       LULU
Date
2011-11-30  49.987684  192.289993  17.860001  49.700001
2011-12-31  52.969683  173.100006  14.500000  46.660000
2012-01-31  59.702715  194.440002  16.629999  63.130001
2012-02-29  70.945373  179.690002  14.170000  67.019997
2012-03-31  78.414750  202.509995  14.700000  74.730003

You can select data conditionally:

In [12]: x = store.select('stocks', where="AAPL >= 50 and AAPL <= 70")

In [13]: x
Out[13]:
                 AAPL        AMZN       BBRY       LULU
Date
2011-12-31  52.969683  173.100006  14.500000  46.660000
2012-01-31  59.702715  194.440002  16.629999  63.130001
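
To make the effect of that where clause concrete, here is a minimal in-memory sketch of the same condition, using a hand-built frame with the AAPL and AMZN values from the output above (the frame is only an illustration and does not touch the HDF file). As I understand it, the practical difference is that HDFStore applies the condition while reading from disk, whereas this filters a DataFrame that is already loaded.

```python
import pandas as pd

# Hand-built copy of two columns from the output above (illustration only)
df = pd.DataFrame(
    {"AAPL": [49.987684, 52.969683, 59.702715, 70.945373, 78.414750],
     "AMZN": [192.289993, 173.100006, 194.440002, 179.690002, 202.509995]},
    index=pd.to_datetime(["2011-11-30", "2011-12-31", "2012-01-31",
                          "2012-02-29", "2012-03-31"]),
)

# Boolean mask equivalent to where="AAPL >= 50 and AAPL <= 70"
mask = (df["AAPL"] >= 50) & (df["AAPL"] <= 70)
subset = df[mask]

# The same condition written as a query string
same = df.query("AAPL >= 50 and AAPL <= 70")

print(subset)
```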

Check the index dtype:

In [14]: x.index.dtype
Out[14]: dtype('<M8[ns]')

In [15]: x.index.dtype_str
Out[15]: 'datetime64[ns]'
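
If you would rather stay with CSV, the sketch below shows why the dates come back as plain text and one way to restore them. It assumes a small hand-built frame with a few values from the question and writes to an in-memory buffer instead of 'A5.csv'; the variable names are only illustrative. CSV stores no dtype information, so read_csv has to be told which column is the index and that it holds dates.

```python
import io
import pandas as pd

# Hypothetical sample using a few rows from the question
idx = pd.to_datetime(["2011-11-30", "2011-12-30", "2012-01-31"])
stocks = pd.DataFrame(
    {"AAPL": [49.987684, 52.969683, 59.702715],
     "AMZN": [192.289993, 173.100006, 194.440002]},
    index=idx,
)

# Write to an in-memory buffer (stands in for a file on disk)
buf = io.StringIO()
stocks.to_csv(buf)

# Plain read: the old index comes back as an ordinary column of strings
buf.seek(0)
naive = pd.read_csv(buf)

# Name the index column and ask pandas to parse it as dates
buf.seek(0)
restored = pd.read_csv(buf, index_col=0, parse_dates=True)

print(naive.dtypes)
print(restored.index.dtype)
```

This sketch inspects index.dtype rather than dtype_str, since as far as I know dtype_str is no longer available in recent pandas releases.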