Datetimelike index in pandas

Time: 2020-02-17 06:52:27

Tags: python pandas

I have a DataFrame loaded with a time series of cumulative rainfall:

df = pd.read_csv(csv_file, parse_dates=[['date', 'time']], dayfirst=True, index_col=0)

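(I can't share the source data. It is read through an adapter object that presents the data to read_csv as a text file with .csv content, although the source file is in some proprietary format. That is not relevant to the question, though: the end result is a DataFrame with a datetime index and float values; dates may be dropped.)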

The rainfall is then resampled to one-minute intervals:

rainfall_differences = df['rainfall'].diff()
rainfall_differences = rainfall_differences.resample('1min', label='right', closed='right').sum()
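
Since the source data cannot be shared, the two steps above can be reproduced with a small stand-in. This is only a sketch: the timestamps and rainfall values below are made up for illustration and are not from the original data set.

```python
import pandas as pd

# Made-up cumulative rainfall readings at irregular timestamps (assumption).
index = pd.to_datetime([
    "2020-02-17 06:00:20",
    "2020-02-17 06:00:50",
    "2020-02-17 06:01:30",
    "2020-02-17 06:03:10",
])
df = pd.DataFrame({"rainfall": [0.0, 0.2, 0.5, 0.9]}, index=index)

# Turn the cumulative totals into per-reading increments.
rainfall_differences = df["rainfall"].diff()

# Sum the increments into one-minute bins; each bin includes its right edge
# and is labelled with that right edge.
rainfall_differences = rainfall_differences.resample(
    "1min", label="right", closed="right"
).sum()
print(rainfall_differences)
```

The first increment is NaN (there is nothing to subtract from), and sum() skips it, so the resampled total matches the last cumulative reading minus the first.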

All of this works as expected. However, my question is about the difference between these two statements:

x = rainfall_differences.rolling('90min').sum()
y = rainfall_differences.rolling('1.5h').sum()

The first works, but the second throws an exception:

  File "<<path>>/my_file.py", line 68, in load_rainfalls
    result[duration_label] = rainfall_differences.rolling(duration_label).sum()
  File "<<path>>\lib\site-packages\pandas\core\generic.py", line 10386, in rolling
    closed=closed,
  File "<<path>>\lib\site-packages\pandas\core\window\rolling.py", line 94, in __init__
    self.validate()
  File "<<path>>\lib\site-packages\pandas\core\window\rolling.py", line 1836, in validate
    freq = self._validate_freq()
  File "<<path>>\lib\site-packages\pandas\core\window\rolling.py", line 1888, in _validate_freq
    f"passed window {self.window} is not "
ValueError: passed window 1.5h is not compatible with a datetimelike index

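My question: why is a window of '90min' compatible with a datetimelike index while a window of '1.5h' is not, even though both yield the same value, Timedelta('0 days 01:30:00'), when passed to pandas.to_timedelta()?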
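
The equivalence claimed here can be checked directly. A minimal sketch; the variable names are only illustrative:

```python
import pandas as pd

short = pd.to_timedelta("90min")
fractional = pd.to_timedelta("1.5h")

# Both strings parse to the same 90-minute duration.
print(short, fractional, short == fractional)
```

So the two strings agree as timedeltas; the difference only shows up when rolling() interprets the string as a frequency (offset) rather than as a timedelta.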

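Note: I know how to work around this problem, but that is not what I am asking. I want to know why a workaround is necessary at all. An example workaround: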

index_duration = str(int(pd.to_timedelta('1.5 hour').total_seconds() / 60)) + 'min'
y = rainfall_differences.rolling(index_duration).sum()
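
The same workaround can be wrapped in a small helper. This is a sketch under assumptions: to_minute_window is a hypothetical name, the sample series is made up, and passing a Timedelta object straight to rolling() is shown as an alternative that should behave the same way on a datetime index.

```python
import pandas as pd

def to_minute_window(duration):
    """Hypothetical helper: express a duration string as a whole-minute window."""
    # Floor division by one minute gives an integer number of minutes.
    minutes = pd.to_timedelta(duration) // pd.Timedelta(minutes=1)
    return f"{minutes}min"

# Made-up sample series on a 10-minute datetime index (assumption).
index = pd.date_range("2017-04-03", periods=5, freq="10min")
series = pd.Series(range(5), index=index, dtype="float64")

# Window given as a normalised string such as '90min'.
by_string = series.rolling(to_minute_window("1.5h")).sum()

# Window given as a Timedelta object, which skips offset-string parsing.
by_timedelta = series.rolling(pd.to_timedelta("1.5h")).sum()

print(by_string.equals(by_timedelta))
```

Note that the helper floors to whole minutes, so a duration such as '90.5s' would lose its fractional part.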

1 answer:

Answer 0: (score: 3)

I think you need to change h to H:

y = rainfall_differences.rolling('1.5H').sum()

I think the reason is that h is not a valid offset alias:

Alias   Description
H       hourly frequency
T, min  minutely frequency
S       secondly frequency

Example:

import pandas as pd

rng = pd.date_range('2017-04-03', periods=5, freq='10T')
rainfall_differences = pd.DataFrame({'a': range(5)}, index=rng)
print(rainfall_differences)
                     a
2017-04-03 00:00:00  0
2017-04-03 00:10:00  1
2017-04-03 00:20:00  2
2017-04-03 00:30:00  3
2017-04-03 00:40:00  4

y = rainfall_differences.rolling('1.5H').sum()
print(y)
                        a
2017-04-03 00:00:00   0.0
2017-04-03 00:10:00   1.0
2017-04-03 00:20:00   3.0
2017-04-03 00:30:00   6.0
2017-04-03 00:40:00  10.0