Pandas datetime index selection

Time: 2018-07-03 17:43:52

Tags: python pandas datetime

I have a DataFrame with a datetime index (the code that builds it is given further down).

I managed to create an 'interval' column that indicates whether or not the hour of the index falls between 16h and 18h.

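My problem is as follows:

  • My DataFrame has a half-hourly frequency
  • I would like to create an 'interval' column with half-hour resolution: for example, if the index is between 16h30 and 18h30, the interval column should equal 1.

How can I do this efficiently?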

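Expected result: 'interval' equal to 1 from 16h30 through 18h30. The code below builds the DataFrame and my current hour-based column: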

date  = ['2015-02-03 23:00:00','2015-02-03 23:30:00','2015-02-04 00:00:00','2015-02-04 00:30:00','2015-02-04 01:00:00','2015-02-04 01:30:00','2015-02-04 02:00:00','2015-02-04 02:30:00','2015-02-04 03:00:00','2015-02-04 03:30:00','2015-02-04 04:00:00','2015-02-04 04:30:00','2015-02-04 05:00:00','2015-02-04 05:30:00','2015-02-04 06:00:00','2015-02-04 06:30:00','2015-02-04 07:00:00','2015-02-04 07:30:00','2015-02-04 08:00:00','2015-02-04 08:30:00','2015-02-04 09:00:00','2015-02-04 09:30:00','2015-02-04 10:00:00','2015-02-04 10:30:00','2015-02-04 11:00:00','2015-02-04 11:30:00','2015-02-04 12:00:00','2015-02-04 12:30:00','2015-02-04 13:00:00','2015-02-04 13:30:00','2015-02-04 14:00:00','2015-02-04 14:30:00','2015-02-04 15:00:00','2015-02-04 15:30:00','2015-02-04 16:00:00','2015-02-04 16:30:00','2015-02-04 17:00:00','2015-02-04 17:30:00','2015-02-04 18:00:00','2015-02-04 18:30:00','2015-02-04 19:00:00','2015-02-04 19:30:00','2015-02-04 20:00:00','2015-02-04 20:30:00','2015-02-04 21:00:00','2015-02-04 21:30:00','2015-02-04 22:00:00','2015-02-04 22:30:00','2015-02-04 23:00:00','2015-02-04 23:30:00']
value = [33.24  , 31.71  , 34.39  , 34.49  , 34.67  , 34.46  , 34.59  , 34.83  , 35.78  , 33.03  , 35.49  , 33.79  , 36.12  , 37.09  , 39.54  , 41.19  , 45.99  , 50.23  , 46.72  , 47.47  , 48.46  , 48.38  , 48.40  , 48.13  , 38.35  , 38.19  , 38.12  , 38.05  , 38.06  , 37.83  , 37.49  , 37.41 , 41.84  , 42.26 , 44.09  , 48.85  , 50.07 , 50.94  , 51.09  , 50.60  , 47.39  , 45.57  , 45.03  , 44.98  , 41.32  , 40.37  , 41.12  , 39.33  , 35.38  , 33.44  ]
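import pandas as pd

df = pd.DataFrame({'value':value,'index':date})
# the date strings include seconds, so the format needs %S
df.index = pd.to_datetime(df['index'],format='%Y-%m-%d %H:%M:%S')
df.drop(['index'],axis=1,inplace=True)

# 1 where the hour is 16 or 17, else 0 (whole hours only)
df['interval'] = ((df.index.hour >= 16) & (df.index.hour <18 ))*1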
print(df.head(50))

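Many thanks,

2 answers:

Answer 0: (score: 5)

You can also use the pandas method indexer_between_time, which returns the integer positions of the rows whose time of day lies in the given range: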

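df["interval"] = 0  # reset first, otherwise the 16:00 row keeps the 1 from the hour-based column
df.loc[df.index[df.index.indexer_between_time("16:30", "18:30")], "interval"] = 1  # .loc, since .at is for single scalar cells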
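
As a minimal sketch of what this does: indexer_between_time returns integer positions, and both endpoints are included by default, so the column can also be built from an array of zeros in one pass. The small date_range frame and the names pos and interval below are assumptions for illustration, not the question's data.

```python
import numpy as np
import pandas as pd

# Illustrative half-hourly index covering one day (not the question's data)
idx = pd.date_range("2015-02-04 00:00", periods=48, freq="30min")
df = pd.DataFrame({"value": np.arange(48.0)}, index=idx)

# Integer positions of the rows whose time of day lies in [16:30, 18:30]
pos = df.index.indexer_between_time("16:30", "18:30")

# Start from zeros, then flag the matching positions
interval = np.zeros(len(df), dtype=int)
interval[pos] = 1
df["interval"] = interval

# Show which times of day were flagged
print(df.index[pos].strftime("%H:%M").tolist())
```

Building the whole column at once avoids depending on whatever an earlier 'interval' column contained.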

Answer 1: (score: 3)

There may be a cleaner way (edit: for example, @vealkind's solution), but this does what you ask:

df['interval'] = (pd.Series(df.index.time)
              .between(pd.to_datetime('16:30:00').time(),
                       pd.to_datetime('18:30:00').time())
              .astype(int)
              .tolist())


>>> df.iloc[30:42]
                     value  interval
index                               
2015-02-04 14:00:00  37.49         0
2015-02-04 14:30:00  37.41         0
2015-02-04 15:00:00  41.84         0
2015-02-04 15:30:00  42.26         0
2015-02-04 16:00:00  44.09         0
2015-02-04 16:30:00  48.85         1
2015-02-04 17:00:00  50.07         1
2015-02-04 17:30:00  50.94         1
2015-02-04 18:00:00  51.09         1
2015-02-04 18:30:00  50.60         1
2015-02-04 19:00:00  47.39         0
2015-02-04 19:30:00  45.57         0
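
For comparison, the same column can be sketched with plain integer arithmetic on the index, which avoids creating Python time objects. This is an alternative that neither answer gives; the small frame and the name minutes are assumptions for illustration.

```python
import pandas as pd

# Illustrative frame covering 14:00 to 19:30 in half-hour steps
idx = pd.date_range("2015-02-04 14:00", periods=12, freq="30min")
df = pd.DataFrame({"value": range(12)}, index=idx)

# Minutes since midnight for every timestamp in the index
minutes = df.index.hour * 60 + df.index.minute

# 16:30 is minute 990 and 18:30 is minute 1110; both ends are included
df["interval"] = ((minutes >= 16 * 60 + 30) & (minutes <= 18 * 60 + 30)).astype(int)

print(df["interval"].tolist())
```

The comparison runs on integer arrays, so it stays vectorized whatever the length of the index.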