Time series with an alternative index

Date: 2014-08-01 22:11:22

Tags: python r time-series data-analysis

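I am working with time-series-like data that has the following shape:

It has three columns: t_1, t_2 and att. t_1 and t_2 are ordered time observations, and att is a numeric value.

Toy data example: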

    t_1           t_2        att
    12:30:32      12:33:12   1
    12:30:55      12:33:43   3
    12:31:21      12:34:34   2

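The object I want to build follows these rules:

  1. If t_1 is "continuous", build a time series object with t_1 as the time index and att and t_2 as the values.

  2. If t_1 is not continuous but t_2 is, build a time series object with t_2 as the time index and t_1 and att as the values.

  3. If neither t_1 nor t_2 is continuous, report an error message and build nothing.

  4. Define "continuous" as, say, every interval between consecutive observations being < 1 hour.

  5. An example of a non-continuous t_1 with a continuous t_2: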

        t_1           t_2        att
        12:30:32      12:33:12   1
        12:30:55      12:33:43   3
        14:31:21      12:34:34   2
        14:33:24      12:35:34   -12
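As a rough illustration of rule 4, the sketch below computes the gaps between consecutive observations for the example in rule 5, using only the Python standard library. The helper names `gaps` and `is_continuous` are hypothetical, and treating a negative gap as a break is an assumption based on the columns being ordered.

```python
from datetime import datetime, timedelta

# Threshold from rule 4: a gap must be strictly less than one hour
MAX_GAP = timedelta(hours=1)


def gaps(times, fmt="%H:%M:%S"):
    """Return the differences between consecutive time-of-day strings."""
    parsed = [datetime.strptime(t, fmt) for t in times]
    return [b - a for a, b in zip(parsed, parsed[1:])]


def is_continuous(times, max_gap=MAX_GAP):
    """True if every gap is non-negative (ordered data assumed) and below max_gap."""
    return all(timedelta(0) <= g < max_gap for g in gaps(times))


t_1 = ["12:30:32", "12:30:55", "14:31:21", "14:33:24"]
t_2 = ["12:33:12", "12:33:43", "12:34:34", "12:35:34"]

print([g.total_seconds() for g in gaps(t_1)])  # [23.0, 7226.0, 123.0]
print(is_continuous(t_1), is_continuous(t_2))  # False True
```

The 7226-second jump (just over two hours) is what makes t_1 non-continuous, while the largest t_2 gap is 60 seconds.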
    

Any ideas for implementing this in Python or R would be very welcome. The data will be imported as a data frame, either a pandas DataFrame or an R data.frame.

By "time series object" I mean, for example, xts or ts in R.
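Since the data arrive as a pandas DataFrame, rules 1 to 3 could be sketched roughly as follows. This is a minimal sketch, not either answerer's method: it assumes the times are `HH:MM:SS` strings from a single day, and `build_time_series` is a hypothetical helper name. The result is a DataFrame with a `DatetimeIndex`, which is the usual way to represent a time series in pandas.

```python
import pandas as pd

# Toy data from rule 5: t_1 has a two-hour jump, t_2 does not
df = pd.DataFrame({
    "t_1": ["12:30:32", "12:30:55", "14:31:21", "14:33:24"],
    "t_2": ["12:33:12", "12:33:43", "12:34:34", "12:35:34"],
    "att": [1, 3, 2, -12],
})


def build_time_series(df, candidates=("t_1", "t_2"), max_gap=pd.Timedelta(hours=1)):
    """Index df by the first continuous column in `candidates` (rules 1-3).

    Hypothetical helper: "continuous" means every consecutive gap is
    non-negative and strictly less than max_gap (rule 4).
    """
    for col in candidates:
        times = pd.to_datetime(df[col], format="%H:%M:%S")
        gaps = times.diff().dropna()  # first diff is NaT, so drop it
        if ((gaps >= pd.Timedelta(0)) & (gaps < max_gap)).all():
            out = df.drop(columns=col)  # the remaining columns become the values
            out.index = pd.DatetimeIndex(times, name=col)
            return out
    raise ValueError("Neither t_1 nor t_2 is continuous; nothing built.")


ts = build_time_series(df)
print(ts.index.name)     # t_2
print(list(ts.columns))  # ['t_1', 'att']
```

Checking the candidates in order gives rule 1 priority over rule 2, and raising an exception is one way to satisfy rule 3.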

2 Answers:

Answer 0 (score: 0)

This builds a dictionary with the continuous time column as the keys and the other data as the values:

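    from datetime import datetime

    data = [['12:30:32', '12:33:12', 1],
            ['12:30:55', '12:33:43', 3],
            ['14:31:21', '12:34:34', 2],
            ['14:33:24', '12:35:34', -12]]


    def test(ind):
        '''Return True if column `ind` (1 = t_1, 2 = t_2) is continuous,
        i.e. every step forward between consecutive rows is under one hour.'''
        last = None
        for row in data:
            t = datetime.strptime(row[ind - 1], '%H:%M:%S')
            if last is not None:
                gap = (t - last).total_seconds()
                # a negative gap or a gap of one hour or more breaks continuity
                if not 0 <= gap < 3600:
                    return False
            last = t
        return True

    if test(1):
        # t_1 is the key, (t_2, att) the value
        obj = {row[0]: (row[1], row[2]) for row in data}
    elif test(2):
        # t_2 is the key, (t_1, att) the value
        obj = {row[1]: (row[0], row[2]) for row in data}
    else:
        print('Error: neither t_1 nor t_2 is continuous')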
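One caveat: the code above parses only the time of day with `'%H:%M:%S'`, so `strptime` puts every timestamp on the default date 1900-01-01. If an ordered series crosses midnight, the step from 23:59 to 00:10 looks like a negative gap and the series is rejected. One possible workaround, sketched here under the assumption that the input is ordered and any backwards step on the clock means a new day has started, is to carry a day offset forward. `parse_clock_times` is a hypothetical name.

```python
from datetime import datetime, timedelta


def parse_clock_times(times, fmt="%H:%M:%S"):
    """Parse ordered time-of-day strings, adding a day whenever the clock
    goes backwards (assumed to mean midnight was crossed)."""
    result = []
    offset = timedelta(0)
    previous = None
    for s in times:
        t = datetime.strptime(s, fmt) + offset
        if previous is not None and t < previous:
            offset += timedelta(days=1)  # every later row is on the next day too
            t += timedelta(days=1)
        result.append(t)
        previous = t
    return result


times = parse_clock_times(["23:50:00", "23:59:30", "00:10:00"])
print([(b - a).total_seconds() for a, b in zip(times, times[1:])])  # [570.0, 630.0]
```

The gaps computed from these datetimes can then be fed to the same continuity check.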

Answer 1 (score: 0)

Hopefully you can build your time series objects from the tuples constructed below:

    import datetime

    data = [['12:30:32', '12:33:12', 1],
            ['12:30:55', '12:33:43', 3],
            ['14:31:21', '12:34:34', 2],
            ['14:33:24', '12:35:34', -12]]

    def continuous(series, time_format='%H:%M:%S', criteria=3600):
        '''Returns True if time series is continuous.

        series -- sequence of strings
        time_format -- str (default '%H:%M:%S')
        criteria -- int, maximum allowed gap in seconds, exclusive (default 3600)
        '''
        # make datetime objects
        t = [datetime.datetime.strptime(thing, time_format) for thing in series]
        # find the deltas between consecutive observations
        deltas = (two - one for one, two in zip(t, t[1:]))
        # apply the criteria: each gap must be non-negative and under `criteria`
        return all(0 <= item.total_seconds() < criteria for item in deltas)

    # extract the time series data (one tuple per column)
    one, two, values = zip(*data)
    if continuous(one):
        # make tuples - (t1, (t2, att))
        time_series_data = [(t1, (t2, att)) for t1, t2, att in zip(one, two, values)]
    elif continuous(two):
        # make tuples - (t2, (t1, att))
        time_series_data = [(t2, (t1, att)) for t1, t2, att in zip(one, two, values)]
    else:
        raise ValueError('No Continuous Data')
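If pandas is the target, the `(key, (value, value))` tuples produced above could be turned into a DataFrame with a `DatetimeIndex` roughly like this. The tuples below are what the code produces for the sample data (the `t_2` branch); the column names `t_1` and `att` and the index name `t_2` are assumed for that branch.

```python
import pandas as pd

# Output of the answer above for the sample data: (t2, (t1, att)) tuples
time_series_data = [
    ("12:33:12", ("12:30:32", 1)),
    ("12:33:43", ("12:30:55", 3)),
    ("12:34:34", ("14:31:21", 2)),
    ("12:35:34", ("14:33:24", -12)),
]

keys = [k for k, _ in time_series_data]
values = [v for _, v in time_series_data]

frame = pd.DataFrame(
    values,
    index=pd.to_datetime(keys, format="%H:%M:%S"),
    columns=["t_1", "att"],  # assumed names for the t_2 branch
)
frame.index.name = "t_2"
print(frame)
```

For the t_1 branch the column names would be `t_2` and `att` instead.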