How can I slice a pandas DataFrame/Series into hourly blocks based on its index?

Asked: 2018-06-04 23:16:45

Tags: python pandas dataframe matplotlib

I want to save plots of 6-hour blocks from a DataFrame that contains several days of data (from 01.05.2018 to 18.05.2018).

My DataFrame "EperDtPanda" looks like this:

                               ldr
timestamp                         
2018-05-01T00:00:03.454+02:00  972
2018-05-01T00:00:08.532+02:00  972
2018-05-01T00:00:13.462+02:00  973
2018-05-01T00:00:18.467+02:00  973
2018-05-01T00:00:23.472+02:00  968
2018-05-01T00:00:28.480+02:00  972
2018-05-01T00:00:33.487+02:00  973
2018-05-01T00:00:38.484+02:00  970

My index is of type "timestamp".

I use the following code to plot the whole period:

indicies = map(lambda t: np.datetime64(t), EperEtPanda.index)
newIndextValues = map(lambda v: v[0], EperEtPanda.values)

ts = pd.Series(newIndextValues, index=indicies)
series2 = ts.resample('H').mean()
plt.plot(series2.index, series2.values)
plt.xticks(rotation='vertical');

This gives me the attached plot of the 18 days of data.

[Figure: plot of whole period of 18 days]
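As a side note, the hourly averaging can also be done without the `map` calls, by parsing the index with `pd.to_datetime` and resampling the column directly. A minimal sketch, assuming a small made-up frame (`raw` and its values are hypothetical) shaped like the one above:

```python
import pandas as pd

# Hypothetical sample mirroring the "ldr" column shown above
raw = pd.DataFrame(
    {"ldr": [972, 972, 973, 968]},
    index=[
        "2018-05-01T00:00:03.454+02:00",
        "2018-05-01T00:30:08.532+02:00",
        "2018-05-01T01:00:13.462+02:00",
        "2018-05-01T01:30:18.467+02:00",
    ],
)

# Turn the ISO 8601 strings into a timezone-aware DatetimeIndex
raw.index = pd.to_datetime(raw.index)

# Average the readings within each hour
hourly = raw["ldr"].resample("h").mean()
```

`hourly` can then be passed to `plt.plot(hourly.index, hourly.values)` as in the snippet above.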

Now I want to cut this plot into 6-hour plots and save the figures. This is the code I use to cut the plot into 6h blocks:

startDate = '2018-05-01T00:00:00+02:00'
endDate = '2018-05-18T00:00:00+02:00'
blockLength = 6
i = 0

while (str_to_ts(startDate) < str_to_ts(endDate)):
    mask = (EperEtPanda.index >= str_to_ts(startDate)) & (EperEtPanda.index <= (str_to_ts(startDate) + timedelta(hours=blockLength)))
    EperDtPanda6h = EperDtPanda.loc[mask]
    slice6h = EperDtPanda6h.plot()
    slice6h.get_figure().savefig('figure6h' + i + '.png')
    startDate = str_to_ts(startDate) + timedelta(hours=blockLength)
    i += 1

str_to_ts is a function that converts strings to timestamps:

str_to_ts =  udf (lambda d: datetime.strptime(d, "%Y-%m-%dT%H:%M:%S.%f+02:00"), TimestampType())

But it doesn't work.

Does anyone know how to get this done?

1 Answer:

Answer 0: (score: 0)

I think you can do this:

import pandas as pd
import matplotlib.pyplot as plt

# ensure the index holds timestamps (not necessary if that is already the case)
EperDtPanda.index = pd.to_datetime(EperDtPanda.index)
# start and end date as timestamps
startDate = pd.to_datetime('2018-05-01T00:00:00+02:00')
endDate = pd.to_datetime('2018-05-18T00:00:00+02:00')
# all block boundaries between start and end, 6 hours apart
list_time = pd.date_range(startDate, endDate, freq='6h')
# zip pairs each boundary with the next one: (start_time, end_time)
for i, (start_time, end_time) in enumerate(zip(list_time[:-1], list_time[1:])):
    # select the 6h range; .loc label slicing includes both end points
    EperDtPanda6h = EperDtPanda.loc[start_time:end_time, 'ldr']
    # skip blocks without data, since an empty selection cannot be plotted
    if EperDtPanda6h.empty:
        continue
    # plot on a fresh figure, save it, then close it to free memory
    fig, ax = plt.subplots()
    EperDtPanda6h.plot(ax=ax)
    # i is an int, so convert it before concatenating it into the file name
    fig.savefig('figure6h' + str(i) + '.png')
    plt.close(fig)
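An alternative that avoids building the boundaries by hand is to let pandas bin the index itself with `pd.Grouper`. This is only a sketch under assumptions: the function name `six_hour_blocks` and the sample values are made up for illustration, and the bins are left-closed (`[start, end)`), so unlike the `.loc` slice above a row that sits exactly on a boundary lands in one block only.

```python
import pandas as pd


def six_hour_blocks(series, hours=6):
    """Yield (block_start, block) pairs for consecutive fixed-length windows."""
    grouper = pd.Grouper(freq=f"{hours}h")
    for block_start, block in series.groupby(grouper):
        # Windows that contain no rows are skipped
        if not block.empty:
            yield block_start, block


# Hypothetical sample data with a timezone-aware index like the one in the question
sample = pd.Series(
    [972, 973, 968, 970],
    index=pd.to_datetime(
        [
            "2018-05-01T01:00:00+02:00",
            "2018-05-01T05:59:00+02:00",
            "2018-05-01T06:00:00+02:00",
            "2018-05-01T13:00:00+02:00",
        ]
    ),
    name="ldr",
)

blocks = list(six_hour_blocks(sample))
```

Each `block` can then be plotted and saved the same way as in the loop above, for example under a name like `'figure6h' + str(i) + '.png'`.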

Hope it works for you.