pandas: iterate over a DataFrame day-wise

Time: 2019-12-27 18:03:32

Tags: python python-3.x pandas

I am trying to iterate over a DataFrame one day at a time.

My data looks like this:

                         open    high     low   close   volume
date                                                              
2019-12-18 09:15:00+05:30  182.10  182.30  180.55  181.30  4252638
2019-12-18 09:30:00+05:30  181.30  183.45  181.00  183.20  5869850
2019-12-18 09:45:00+05:30  183.35  184.50  183.05  183.25  5201947
2019-12-18 10:00:00+05:30  183.25  183.30  182.45  182.90  2029440
2019-12-18 10:15:00+05:30  182.95  183.25  181.50  182.00  2613453
...                           ...     ...     ...     ...      ...
2019-12-24 14:15:00+05:30  175.40  175.70  175.10  175.40   480322
2019-12-24 14:30:00+05:30  175.40  175.60  174.65  174.80  1193108
2019-12-24 14:45:00+05:30  174.80  176.10  174.75  175.55  1765370
2019-12-24 15:00:00+05:30  175.50  175.75  174.90  175.50  1369208
2019-12-24 15:15:00+05:30  175.45  175.75  175.20  175.40  2010583

I tried:

(df['date'] >= "18-12-2019 09:00:00") & (df['date'] <= "18-12-2019 16:00:00")

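But I don't want the data for one specific date. I want to split the current DataFrame into multiple DataFrames by date and store them in a list. How can I do that?

Expected output: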

res = [] # list of dataframes length =  number of days


res[0] = 

                         open    high     low   close   volume
date                                                              
2019-12-18 09:15:00+05:30  182.10  182.30  180.55  181.30  4252638
2019-12-18 09:30:00+05:30  181.30  183.45  181.00  183.20  5869850
2019-12-18 09:45:00+05:30  183.35  184.50  183.05  183.25  5201947
2019-12-18 10:00:00+05:30  183.25  183.30  182.45  182.90  2029440
2019-12-18 10:15:00+05:30  182.95  183.25  181.50  182.00  2613453


res[1] = 
                          open    high     low   close   volume
date                                                              
2019-12-19 09:15:00+05:30  182.10  182.30  180.55  181.30  4252638
2019-12-19 09:30:00+05:30  181.30  183.45  181.00  183.20  5869850
2019-12-19 09:45:00+05:30  183.35  184.50  183.05  183.25  5201947
2019-12-19 10:00:00+05:30  183.25  183.30  182.45  182.90  2029440
2019-12-19 10:15:00+05:30  182.95  183.25  181.50  182.00  2613453

...
...
res[n] = 
2019-12-24 14:15:00+05:30  175.40  175.70  175.10  175.40   480322
2019-12-24 14:30:00+05:30  175.40  175.60  174.65  174.80  1193108
2019-12-24 14:45:00+05:30  174.80  176.10  174.75  175.55  1765370
2019-12-24 15:00:00+05:30  175.50  175.75  174.90  175.50  1369208
2019-12-24 15:15:00+05:30  175.45  175.75  175.20  175.40  2010583

That is, separate the data for each date and put the pieces in a list.

1 answer:

Answer 0: (score: 1)

You can use the groupby method and aggregate the other columns however you like. For example:

df.groupby(df.date.dt.date)['open'].sum()

For OHLC data, that becomes:

df.groupby(df.date.dt.date).agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
})
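
The snippets above aggregate each day into a single row, while the question asks for one DataFrame per day. A minimal sketch of that split follows, assuming `date` is a timezone-aware DatetimeIndex as the printed data suggests; the sample values and the reduced column set are made up for illustration. If `date` is a regular column instead, as the answer's code assumes, grouping on `df['date'].dt.date` should work the same way.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data: a tz-aware
# DatetimeIndex named "date" holding 15-minute bars from two days.
idx = pd.DatetimeIndex(
    [
        "2019-12-18 09:15:00+05:30",
        "2019-12-18 09:30:00+05:30",
        "2019-12-19 09:15:00+05:30",
        "2019-12-19 09:30:00+05:30",
        "2019-12-19 09:45:00+05:30",
    ],
    name="date",
)
df = pd.DataFrame(
    {
        "open": [182.10, 181.30, 183.35, 183.25, 182.95],
        "close": [181.30, 183.20, 183.25, 182.90, 182.00],
    },
    index=idx,
)

# Group rows by the calendar date of each timestamp. Iterating over a
# groupby object yields (key, sub-DataFrame) pairs, sorted by key.
res = [day_df for _, day_df in df.groupby(df.index.date)]

for day_df in res:
    print(day_df, end="\n\n")
```

Each element of `res` keeps the original index and columns, so `res[0]` holds the rows for the first day, `res[1]` the second, and so on. A dict keyed by date (`dict(iter(df.groupby(df.index.date)))`) may be more convenient if the days need to be looked up by date rather than by position.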