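I'm new to both Pandas and Python, but I have read Wes McKinney's book "Python for Data Analysis" and done a lot of experimenting. Even so, I've run into a problem when trying to select rows from my DataFrame. It's probably a simple misunderstanding on my part, but I don't know what to try next.
Pandas version 0.13.1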
>>> pd.__version__
'0.13.1'
>>>
The DataFrame is loaded like this.
import datetime
import pandas as pd

minData = pd.read_csv(
    currentSymbol["fullpath"],
    header = None,
    names = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Split Factor', 'Earnings', 'Dividends'],
    parse_dates = [["Date", "Time"]],
    date_parser = lambda x : datetime.datetime.strptime(x, '%Y%m%d %H%M'),
    index_col = "Date_Time",
    sep=' ')
# Convert the DatetimeIndex to a PeriodIndex with one-minute periods
minData = minData.to_period(freq="min")
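For anyone who wants to reproduce the setup without my data file, here is a minimal sketch that builds the same kind of frame from an inline sample. The three sample rows and the names `sample`, `frame`, `stamps` and `sample_data` are made up for illustration, and the file layout (space-separated, no header, date as YYYYMMDD, time as HHMM) is my assumption based on the format string above. It combines the two columns by hand rather than through `parse_dates`, so it should not depend on the 0.13.1 behaviour of that argument.

```python
import io

import pandas as pd

# Hypothetical sample: three rows in the layout I assume my files use
# (space-separated, no header, date as YYYYMMDD, time as HHMM).
sample = io.StringIO(
    "19980102 0930 8.70630 8.70630 8.70630 8.70630 420.73 4 0 0\n"
    "19980102 0935 8.82514 8.82514 8.82514 8.82514 420.73 4 0 0\n"
    "19980102 0942 8.79424 8.79424 8.79424 8.79424 420.73 4 0 0\n"
)

columns = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume',
           'Split Factor', 'Earnings', 'Dividends']

# Read Date and Time as strings so the leading zero in "0930" survives.
frame = pd.read_csv(sample, header=None, names=columns, sep=' ',
                    dtype={'Date': str, 'Time': str})

# Combine the two columns into one timestamp per row.
stamps = pd.to_datetime(frame['Date'] + ' ' + frame['Time'],
                        format='%Y%m%d %H%M')

# Drop the raw columns and use the combined timestamp as the index.
frame = frame.drop(columns=['Date', 'Time'])
frame.index = pd.DatetimeIndex(stamps, name='Date_Time')

# Convert the DatetimeIndex to a PeriodIndex with one-minute periods.
sample_data = frame.to_period(freq='min')
print(sample_data.index)
```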
The first 10 rows of the DataFrame now look like this
>>> minData.head(10)
Open High Low Close Volume Split Factor \
1998-01-02 09:30 8.70630 8.70630 8.70630 8.70630 420.73 4
1998-01-02 09:35 8.82514 8.82514 8.82514 8.82514 420.73 4
1998-01-02 09:42 8.79424 8.79424 8.79424 8.79424 420.73 4
1998-01-02 09:43 8.76572 8.76572 8.76572 8.76572 1262.19 4
1998-01-02 09:44 8.76572 8.76572 8.76572 8.76572 420.73 4
1998-01-02 09:45 8.73482 8.73482 8.73482 8.73482 21877.90 4
1998-01-02 09:46 8.73482 8.79424 8.73482 8.74908 31554.70 4
1998-01-02 09:47 8.79424 8.79424 8.76572 8.76572 12621.90 4
1998-01-02 09:48 8.76572 8.76572 8.76572 8.76572 4207.30 4
1998-01-02 09:54 8.74908 8.74908 8.73482 8.73482 12201.20 4
Earnings Dividends
1998-01-02 09:30 0 0
1998-01-02 09:35 0 0
1998-01-02 09:42 0 0
1998-01-02 09:43 0 0
1998-01-02 09:44 0 0
1998-01-02 09:45 0 0
1998-01-02 09:46 0 0
1998-01-02 09:47 0 0
1998-01-02 09:48 0 0
1998-01-02 09:54 0 0
[10 rows x 8 columns]
>>>
It has a PeriodIndex, and each Period is one minute (freq = "T")
>>> minData.index
<class 'pandas.tseries.period.PeriodIndex'>
freq: T
[1998-01-02 09:30, ..., 2013-12-09 16:00]
length: 1373036
>>>
I don't understand how to index by date.
Something like this doesn't work, and I don't understand why
>>> dummy = minData.index[123123]
>>> dummy
Period('2000-02-24 13:01', 'T')
>>> minData[dummy]
Traceback (most recent call last):
File "<debug input>", line 1, in <module>
File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\frame.py", line 1658, in __getitem__
return self._getitem_column(key)
File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\frame.py", line 1665, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\generic.py", line 1005, in _get_item_cache
values = self._data.get(item)
File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\internals.py", line 2874, in get
_, block = self._find_block(item)
File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\internals.py", line 3186, in _find_block
self._check_have(item)
File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\internals.py", line 3193, in _check_have
raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named 2000-02-24 13:01'
>>>
This works, however
>>> minData["1999"]
Open High Low Close Volume \
1999-01-04 09:30 10.2934 10.2934 10.2646 10.2646 6683.910
1999-01-04 09:31 10.3077 10.3245 10.2646 10.3245 13367.800
1999-01-04 09:32 10.2646 10.2646 10.2646 10.2646 417.745
1999-01-04 09:33 10.2192 10.2192 10.2192 10.2192 417.745
1999-01-04 09:34 10.3245 10.3245 10.3245 10.3245 417.745
....
As does this
>>> minData["1999-05"]
Open High Low Close Volume Split Factor \
1999-05-03 09:30 7.05430 7.05430 6.99427 6.99427 10412.100 4
1999-05-03 09:31 6.96306 7.05430 6.94865 6.94865 20824.200 4
1999-05-03 09:32 6.94865 6.99427 6.93425 6.99427 3331.870 4
1999-05-03 09:33 6.93425 6.96306 6.90303 6.96306 16242.900 4
1999-05-03 09:34 6.90303 6.90303 6.90303 6.90303 2915.390 4
...
But not this..
>>> minData["1999-05-03"]
Traceback (most recent call last):
File "<debug input>", line 1, in <module>
File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\frame.py", line 1658, in __getitem__
return self._getitem_column(key)
File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\frame.py", line 1665, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\generic.py", line 1005, in _get_item_cache
values = self._data.get(item)
File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\internals.py", line 2874, in get
_, block = self._find_block(item)
File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\internals.py", line 3186, in _find_block
self._check_have(item)
File "C:\Users\Jason\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\internals.py", line 3193, in _check_have
raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named 1999-05-03'
>>>
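My working guess is that plain `[]` on a DataFrame treats a single key as a column label, which would explain why the tracebacks end in `_getitem_column`, and that row lookups by label are meant to go through `.loc`. Here is a minimal sketch of that idea on made-up data. The frame `df`, its values and the other names are hypothetical, and I'm assuming a pandas version where `.loc` supports partial string indexing on a PeriodIndex, so I can't promise it behaves the same way on 0.13.1.

```python
import pandas as pd

# Hypothetical data: five one-minute periods on each of two days.
day_one = pd.period_range("1999-05-03 09:30", periods=5, freq="min")
day_two = pd.period_range("1999-05-04 09:30", periods=5, freq="min")
index = day_one.append(day_two)
df = pd.DataFrame({"Close": list(range(10))}, index=index)

# Plain [] with a single key is a column lookup, so a Period raises KeyError.
try:
    df[index[2]]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# .loc looks the key up in the row index instead.
one_row = df.loc[index[2]]      # a Series holding that single row
one_day = df.loc["1999-05-03"]  # every row whose period falls on that day

print(column_lookup_failed)
print(one_row["Close"])
print(len(one_day))
```

If that reading is right, the fact that `minData["1999"]` and `minData["1999-05"]` work would be a convenience fallback of `[]` in my version, not something to rely on.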