Finding minimum value in successive continuous time intervals using pandas

Time: 2015-10-31 09:41:49

Tags: python pandas

I'm a bit stuck on this one. I have a dataframe that has samples of a variable, each with a timestamp. The data are sorted in order of increasing time:

import pandas as pd

dates = [#Continuous Block
         pd.Timestamp('2012-05-03 09:00:01'), 
         pd.Timestamp('2012-05-03 09:00:02'), 
         pd.Timestamp('2012-05-03 09:00:03'),
         pd.Timestamp('2012-05-03 09:00:04'),
         #Non Continuous Block
         pd.Timestamp('2012-05-03 16:00:00'),
         pd.Timestamp('2012-05-03 17:00:04'),
         #Continuous Block
         pd.Timestamp('2012-05-03 18:00:01'), 
         pd.Timestamp('2012-05-03 18:00:02'), 
         pd.Timestamp('2012-05-03 18:00:03'),
         #Non Continuous Block
         pd.Timestamp('2012-05-03 19:00:03')]     


vars = [-0.105, -1.08, -1.08, -1.03, -1.0, -1.1, -0.15, -0.14, -0.13, -0.11]
df = pd.DataFrame({'A' : vars}, index=dates)

This gives:

                    A
2012-05-03 09:00:01 -0.105
2012-05-03 09:00:02 -1.080
2012-05-03 09:00:03 -1.080
2012-05-03 09:00:04 -1.030
2012-05-03 16:00:00 -1.000
2012-05-03 17:00:04 -1.100
2012-05-03 18:00:01 -0.150
2012-05-03 18:00:02 -0.140
2012-05-03 18:00:03 -0.130
2012-05-03 19:00:03 -0.110

As you can see, there are often successive entries that are separated by one second. I want to pull out the lowest value of A within each run of timestamps that are one second apart. So for the above example, running a function should give:

2012-05-03 09:00:03,    -1.080
2012-05-03 16:00:00,    -1.000
2012-05-03 17:00:04,    -1.100
2012-05-03 18:00:01,    -0.150
2012-05-03 19:00:03,    -0.110

Appreciate any help!

1 answer:

Answer 0 (score: 1)

I did this by creating a column named 'Time':

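df['Time'] = df.index
# Group rows by the hour of their timestamp and take the column-wise minimum
df2 = df.groupby([df.index.hour]).apply(lambda x: x.min())
# inplace must be the boolean True, not the string 'True'
df2.reset_index(drop=True, inplace=True)
print(df2.head())

This gives: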

     A                Time
0 -1.08 2012-05-03 09:00:01
1 -1.00 2012-05-03 16:00:00
2 -1.10 2012-05-03 17:00:04
3 -0.15 2012-05-03 18:00:01
4 -0.11 2012-05-03 19:00:03
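
Note that `min()` is applied to each column separately, so the Time shown for hour 9 is the earliest timestamp in that hour (09:00:01), not the row where A is smallest (-1.08 first occurs at 09:00:02). A minimal sketch, assuming you want the timestamp of the minimal row itself, could use `idxmin` instead. The names `idx` and `result` are just illustrative:

```python
import pandas as pd

dates = pd.to_datetime([
    '2012-05-03 09:00:01', '2012-05-03 09:00:02',
    '2012-05-03 09:00:03', '2012-05-03 09:00:04',
    '2012-05-03 16:00:00', '2012-05-03 17:00:04',
    '2012-05-03 18:00:01', '2012-05-03 18:00:02',
    '2012-05-03 18:00:03', '2012-05-03 19:00:03',
])
df = pd.DataFrame(
    {'A': [-0.105, -1.08, -1.08, -1.03, -1.0, -1.1, -0.15, -0.14, -0.13, -0.11]},
    index=dates,
)

# idxmin returns the index label of the first minimal A within each hour
idx = df.groupby(df.index.hour)['A'].idxmin()

# Look the full rows up again so the timestamp and the value stay paired
result = df.loc[idx]
print(result)
```

When several rows tie for the minimum, `idxmin` keeps the first one.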

If you only need to group by hour, you don't need the Time column; group by the hour of the timestamp index directly:

df2 = df.groupby([df.index.hour]).apply(lambda x: x.min())
print(df2.head())

The output is:

       A
9  -1.08
16 -1.00
17 -1.10
18 -0.15
19 -0.11
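
Grouping by hour happens to match the sample data, but it would merge two separate runs that fall within the same hour and split a run that crosses an hour boundary. A rough sketch of grouping by runs of one-second steps instead, as the question describes, is below. It assumes the index is sorted, and the names `gaps`, `block_id`, `min_labels` and `result` are illustrative, not from the original answer:

```python
import pandas as pd

dates = pd.to_datetime([
    '2012-05-03 09:00:01', '2012-05-03 09:00:02',
    '2012-05-03 09:00:03', '2012-05-03 09:00:04',
    '2012-05-03 16:00:00', '2012-05-03 17:00:04',
    '2012-05-03 18:00:01', '2012-05-03 18:00:02',
    '2012-05-03 18:00:03', '2012-05-03 19:00:03',
])
df = pd.DataFrame(
    {'A': [-0.105, -1.08, -1.08, -1.03, -1.0, -1.1, -0.15, -0.14, -0.13, -0.11]},
    index=dates,
)

# Time elapsed since the previous row (NaT for the first row)
gaps = df.index.to_series().diff()

# A new block starts wherever the gap is not exactly one second;
# the cumulative sum turns those start flags into block numbers.
block_id = (gaps != pd.Timedelta(seconds=1)).cumsum().to_numpy()

# Label of the first row holding the minimum of A in each block
min_labels = df.groupby(block_id)['A'].idxmin()
result = df.loc[min_labels]
print(result)
```

For the sample data this yields five rows, one per block. The first block's minimum of -1.08 is reported at 09:00:02 rather than the 09:00:03 listed in the question, because the two rows tie and `idxmin` returns the first.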