Iterating over intervals of time series data

Time: 2018-04-28 04:02:01

Tags: python pandas

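What is the best way in pandas to chunk data into specific time periods for analysis?

I have a dataset where each row represents 1 second, and I want to find the highest average of a value over any window of a given length.

I tried resampling, but that does not use every second as a starting point. For example, if I have 2 minutes of data and compare 30-second intervals, I want to compare 0-30 seconds, 1-31, 2-32, and so on. With resampling I only get 0-30, 30-60, 60-90, and 90-120.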

import pandas as pd

# df_samples is assumed to be loaded already, with one row per second
# 1 second intervals from 0-60 seconds
interval_lengths = list(range(1, 61))
# 15 second intervals from 1:15 - 5:00 mins
interval_lengths += list(range(75, 301, 15))
# 30 second intervals for everything after 5 mins
interval_lengths += list(range(330, df_samples['ride_length'].max() + 1, 30))

# Rows of the workout that the last row belongs to (index.max must be called)
latest_df = df_samples[df_samples['workoutId'] == df_samples.loc[df_samples.index.max(), 'workoutId']]

# DataFrame.append was removed in pandas 2.0, so rows are collected in lists
best_rows = []
latest_rows = []
# Resample by intervals and get max power for each interval
for i in interval_lengths:
    # Lowercase 's' is the seconds alias ('S' is deprecated in pandas 2.2+)
    resample_chunk = str(i) + 's'
    # Get intervals for all time
    best_samplechunks = df_samples.groupby(['workoutId']).resample(resample_chunk).mean().reset_index()
    best_samplechunks['interval'] = resample_chunk[:-1]
    # Keep the row with the max power for the given interval
    best_rows.append(best_samplechunks.loc[best_samplechunks['power'].idxmax()])

    # Get intervals for latest workout
    latest_samplechunks = latest_df.groupby(['workoutId']).resample(resample_chunk).mean().reset_index()
    latest_samplechunks['interval'] = resample_chunk[:-1]
    # Keep the row with the max power for the given interval
    latest_rows.append(latest_samplechunks.loc[latest_samplechunks['power'].idxmax()])

best_interval_df = pd.DataFrame(best_rows)
latest_interval_df = pd.DataFrame(latest_rows)
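
The difference between the fixed bins that resample() produces and the overlapping windows described in the question can be sketched on synthetic data. Everything in this example is an assumption made for illustration: the 120 one-second samples, the 30-second burst of 300 that straddles a bin boundary, and the variable names. It is a minimal sketch, not a tested solution for the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumed for illustration): 120 one-second samples,
# flat at 100 with a 30-second burst of 300 from second 15 to second 44.
index = pd.date_range("2018-01-01", periods=120, freq="s")
values = np.full(120, 100.0)
values[15:45] = 300.0
power = pd.Series(values, index=index)

# resample() cuts the series into non-overlapping 30-second bins:
# 0-30, 30-60, 60-90, 90-120.
binned = power.resample("30s").mean()

# rolling() slides a 30-row window forward one row at a time.
# min_periods=30 leaves NaN until a full window is available.
sliding = power.rolling(window=30, min_periods=30).mean().dropna()

print(len(binned), binned.max())    # fixed bins and their best average
print(len(sliding), sliding.max())  # overlapping windows and their best average
```

Because the burst runs from second 15 to second 44, each resample() bin catches only half of it, while one of the rolling() windows covers it completely.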

Update

Here is a link to the data: https://www.dropbox.com/s/f8vd8lducriki5l/sample.csv?dl=0

I also tried setting this up with rolling(), but I don't think I am getting the right results:

import pandas as pd

df_samples = pd.read_csv('sample.csv')
# 1 second intervals from 0-60 seconds
interval_lengths = list(range(1, 61))
# 15 second intervals from 1:15 - 5:00 mins
interval_lengths += list(range(75, 301, 15))
# 30 second intervals for everything after 5 mins
interval_lengths += list(range(330, df_samples['ride_length'].max() + 1, 30))

intervals = df_samples
# Fill gaps in the power readings by linear interpolation
intervals['power'] = intervals['power'].interpolate()
# Rows of the workout that the last row belongs to (index.max must be called);
# copy() so that adding columns below does not write into a slice
latest_df = intervals[intervals['workoutId'] == intervals.loc[intervals.index.max(), 'workoutId']].copy()

# DataFrame.append was removed in pandas 2.0, so rows are collected in lists
best_rows = []
latest_rows = []
for i in interval_lengths:
    # Get intervals for all time
    temp_df = intervals
    temp_df['best_power'] = intervals.groupby(['workoutId'])['power'].rolling(int(i), min_periods=1).mean().reset_index(0, drop=True)
    temp_df['interval'] = i
    best_rows.append(temp_df.loc[temp_df['best_power'].idxmax()])

    # Get intervals for latest workout
    latest_temp_df = latest_df
    latest_temp_df['best_power'] = latest_df.groupby(['workoutId'])['power'].rolling(int(i), min_periods=1).mean().reset_index(0, drop=True)
    latest_temp_df['interval'] = i
    latest_rows.append(latest_temp_df.loc[latest_temp_df['best_power'].idxmax()])

best_interval_df = pd.DataFrame(best_rows).set_index('interval')
latest_interval_df = pd.DataFrame(latest_rows).set_index('interval')
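
One possible reason the rolling() results look wrong is min_periods=1: it lets the first rows of each workout produce averages over fewer samples than the requested window, and those short windows can beat every full-length window. The sketch below requires full windows instead. It assumes the `workoutId` and `power` columns from the code above, rows that are exactly one second apart and already in time order within each workout, and no missing seconds. The function name `best_average_power`, the output column `end_row`, and the sample data are hypothetical and exist only for illustration.

```python
import pandas as pd

def best_average_power(df, interval_lengths):
    """Highest mean power over any full window of each length (in rows).

    Hypothetical helper: assumes one row per second, ordered in time
    within each workoutId, with no gaps.
    """
    columns = ["interval", "workoutId", "end_row", "best_power"]
    records = []
    grouped = df.groupby("workoutId")["power"]
    for n in interval_lengths:
        # min_periods=n: windows shorter than n rows become NaN and are ignored.
        rolled = grouped.rolling(window=n, min_periods=n).mean()
        if rolled.notna().any():
            # The result is indexed by (workoutId, original row label).
            workout_id, end_row = rolled.idxmax()
            records.append({
                "interval": n,
                "workoutId": workout_id,
                "end_row": end_row,  # label of the last row in the best window
                "best_power": rolled.max(),
            })
    return pd.DataFrame(records, columns=columns).set_index("interval")

# Made-up sample: workout "a" has 6 seconds of data, workout "b" has 4.
samples = pd.DataFrame({
    "workoutId": ["a"] * 6 + ["b"] * 4,
    "power": [100, 200, 300, 300, 200, 100, 150, 400, 400, 150],
})
best = best_average_power(samples, [1, 2, 3, 5])
print(best)
```

In this sample, workout "b" is too short to contain a 5-second window, so the best 5-second average comes from workout "a". With min_periods=1, a 3-sample partial window from "b" would have been reported as the best 5-second interval instead.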

0 answers:

No answers