Indexing a DataFrame with a PeriodIndex

Date: 2014-03-27 00:05:08

Tags: python pandas

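I am new to both Pandas and Python, but I have read Wes McKinney's book "Python for Data Analysis" and experimented quite a bit. Even so, I have run into a problem when trying to select rows from my DataFrame. It is probably a simple misunderstanding on my part, but I don't know what to try next.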

Pandas version 0.13.1

>>> pd.__version__
'0.13.1'
>>> 

The DataFrame is loaded like this.

minData = pd.read_csv(
                currentSymbol["fullpath"],
                header = None,
                names = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Split Factor', 'Earnings', 'Dividends'], 
                parse_dates = [["Date", "Time"]],
                date_parser = lambda x : datetime.datetime.strptime(x, '%Y%m%d %H%M'), 
                index_col = "Date_Time",
                sep=' ')

minData = minData.to_period(freq="min")
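
For readers without the original CSV, here is a minimal sketch of the same setup using synthetic data. The name `frame`, the timestamps, and the column values are hypothetical stand-ins, not taken from the question's file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data: five one-minute bars.
stamps = pd.to_datetime(
    ["1998-01-02 09:30", "1998-01-02 09:35", "1998-01-02 09:42",
     "1998-01-05 09:30", "1998-02-02 09:30"]
)
frame = pd.DataFrame(
    {"Open": np.arange(5.0), "Close": np.arange(5.0) + 0.5},
    index=stamps,
)

# Convert the DatetimeIndex to a PeriodIndex with one-minute periods,
# mirroring the to_period call in the question.
frame = frame.to_period(freq="min")
```

After the conversion, each index entry is a `Period` covering one minute rather than a point-in-time `Timestamp`.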

The first 10 rows of the DataFrame now look like this

>>> minData.head(10)
                     Open     High      Low    Close    Volume  Split Factor  \
1998-01-02 09:30  8.70630  8.70630  8.70630  8.70630    420.73             4   
1998-01-02 09:35  8.82514  8.82514  8.82514  8.82514    420.73             4   
1998-01-02 09:42  8.79424  8.79424  8.79424  8.79424    420.73             4   
1998-01-02 09:43  8.76572  8.76572  8.76572  8.76572   1262.19             4   
1998-01-02 09:44  8.76572  8.76572  8.76572  8.76572    420.73             4   
1998-01-02 09:45  8.73482  8.73482  8.73482  8.73482  21877.90             4   
1998-01-02 09:46  8.73482  8.79424  8.73482  8.74908  31554.70             4   
1998-01-02 09:47  8.79424  8.79424  8.76572  8.76572  12621.90             4   
1998-01-02 09:48  8.76572  8.76572  8.76572  8.76572   4207.30             4   
1998-01-02 09:54  8.74908  8.74908  8.73482  8.73482  12201.20             4   

                  Earnings  Dividends  
1998-01-02 09:30         0          0  
1998-01-02 09:35         0          0  
1998-01-02 09:42         0          0  
1998-01-02 09:43         0          0  
1998-01-02 09:44         0          0  
1998-01-02 09:45         0          0  
1998-01-02 09:46         0          0  
1998-01-02 09:47         0          0  
1998-01-02 09:48         0          0  
1998-01-02 09:54         0          0  

[10 rows x 8 columns]
>>> 

It has a PeriodIndex, and each Period is one minute (freq="T")

>>> minData.index
<class 'pandas.tseries.period.PeriodIndex'>
freq: T
[1998-01-02 09:30, ..., 2013-12-09 16:00]
length: 1373036
>>> 

I don't understand how to index by date.

Something like this does not work, and I don't understand why

>>> dummy = minData.index[123123]
>>> dummy
Period('2000-02-24 13:01', 'T')
>>> minData[dummy]
Traceback (most recent call last):
  File "<debug input>", line 1, in <module>
  File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\frame.py", line 1658, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\frame.py", line 1665, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\generic.py", line 1005, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\internals.py", line 2874, in get
    _, block = self._find_block(item)
  File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\internals.py", line 3186, in _find_block
    self._check_have(item)
  File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\internals.py", line 3193, in _check_have
    raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named 2000-02-24 13:01'
>>> 
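
The `_getitem_column` frames in the traceback suggest that `minData[dummy]` is being treated as a column lookup, so the `Period` is searched for among the column names. A minimal sketch of label-based row selection with `.loc` follows, assuming a small hypothetical frame (`frame`, `key`, and the values are illustrative and not from the question's data).

```python
import pandas as pd

# Hypothetical three-row frame with a one-minute PeriodIndex.
idx = pd.period_range("2000-02-24 13:00", periods=3, freq="min")
frame = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=idx)

key = frame.index[1]        # a Period, like `dummy` in the question
row = frame.loc[key]        # label-based row lookup; returns a Series
close_value = row["Close"]  # pick one column out of that row
```

With a unique index, `frame.loc[key]` returns the single matching row as a Series indexed by column name.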

However, this works

>>> minData["1999"]
                     Open     High      Low    Close      Volume  \
1999-01-04 09:30  10.2934  10.2934  10.2646  10.2646    6683.910   
1999-01-04 09:31  10.3077  10.3245  10.2646  10.3245   13367.800   
1999-01-04 09:32  10.2646  10.2646  10.2646  10.2646     417.745   
1999-01-04 09:33  10.2192  10.2192  10.2192  10.2192     417.745   
1999-01-04 09:34  10.3245  10.3245  10.3245  10.3245     417.745   
....

As does this

>>> minData["1999-05"]
                     Open     High      Low    Close     Volume  Split Factor  \
1999-05-03 09:30  7.05430  7.05430  6.99427  6.99427  10412.100             4   
1999-05-03 09:31  6.96306  7.05430  6.94865  6.94865  20824.200             4   
1999-05-03 09:32  6.94865  6.99427  6.93425  6.99427   3331.870             4   
1999-05-03 09:33  6.93425  6.96306  6.90303  6.96306  16242.900             4   
1999-05-03 09:34  6.90303  6.90303  6.90303  6.90303   2915.390             4   
...

But not this..

>>> minData["1999-05-03"]
Traceback (most recent call last):
  File "<debug input>", line 1, in <module>
  File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\frame.py", line 1658, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\frame.py", line 1665, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\generic.py", line 1005, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\internals.py", line 2874, in get
    _, block = self._find_block(item)
  File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\internals.py", line 3186, in _find_block
    self._check_have(item)
  File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\internals.py", line 3193, in _check_have
    raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named 1999-05-03'
>>> 
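
One way to select a single day is to go through `.loc`, which always selects rows, instead of plain `[]`. The sketch below assumes a recent pandas release and a small hypothetical frame. Whether the `.loc` form behaves the same on 0.13.1 is not confirmed here, so an explicit boolean mask is shown as an alternative.

```python
import pandas as pd

# Hypothetical frame: one-minute periods spanning two calendar days.
idx = pd.PeriodIndex(
    ["1999-05-03 09:30", "1999-05-03 09:31", "1999-05-04 09:30"],
    freq="min",
)
frame = pd.DataFrame({"Close": [6.99, 6.95, 7.01]}, index=idx)

# Partial-string row selection through .loc: all periods within the day.
one_day = frame.loc["1999-05-03"]

# Explicit alternative without string parsing: convert each minute
# period to its containing day and compare against a daily Period.
day = pd.Period("1999-05-03", freq="D")
same_day = frame[frame.index.asfreq("D") == day]
```

Both selections keep only the two rows that fall on 1999-05-03. The mask form is more verbose but makes the day-level comparison explicit.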

0 answers:

No answers yet