Find the nearest values at given intervals

Time: 2019-06-26 15:40:50

Tags: python pandas numpy vectorization

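I take a measurement every 100 ms. I want to reduce the data by keeping one value every 10 s, or at least the value closest to each 10 s mark.

Here is a small example series in which 10 s corresponds to 10 units. My loop keeps, for each 10 s mark, the closest sample if it lies within delta_time of the mark. It works, but I'd like a simpler way that avoids the loop.

Suggestions?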

import pandas as pd
import numpy as np

data = pd.Series([0, 1, 2, 8, 11, 12, 26, 27, 28, 31, 40, 49])

time_span = 10
delta_time = 3

# Target times: multiples of time_span up to the last measurement
time_10s = np.arange(0, int(data.max() // time_span) * time_span + 1, time_span)
index_list = []

for elt in time_10s:
    diff = (data - elt).abs()  # distance of every sample to this target
    if diff.min() < delta_time:
        index_list.append(diff.idxmin())

print(data[index_list])

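I also tried some modulo arithmetic, but it didn't get me anywhere:

# Flags every sample within delta_time of some multiple of time_span,
# so several samples can pass per multiple (e.g. 0, 1 and 2 for t = 0)
A = data % time_span < delta_time
B = data % time_span > (time_span - delta_time)
C = A | B
D = data[C].index.values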
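The modulo idea can be completed without a loop: round each sample to its nearest multiple of time_span, keep those within delta_time, and then keep only the closest sample per multiple with a groupby. This is a minimal sketch, not the asker's or answerer's method; it assumes the data is in the same units as time_span, that delta_time <= time_span / 2 (so a sample within delta_time of a multiple is always assigned to that multiple), and it restricts targets to the same range the loop uses. The helper name nearest_per_interval is hypothetical.

```python
import pandas as pd


def nearest_per_interval(data: pd.Series, time_span: float, delta_time: float) -> pd.Series:
    """Hypothetical helper: for each multiple of time_span (0 up to the last
    full multiple covered by the data), keep the sample closest to it if it
    lies strictly within delta_time. Assumes delta_time <= time_span / 2."""
    # Nearest multiple of time_span for every sample
    target = (data / time_span).round() * time_span
    dist = (data - target).abs()

    # Same target range as the loop: 0 .. last full multiple of time_span
    last = (data.max() // time_span) * time_span
    mask = (dist < delta_time) & (target >= 0) & (target <= last)
    if not mask.any():
        return data.iloc[:0]

    # Within each target, keep the index label with the smallest distance
    best = dist[mask].groupby(target[mask]).idxmin()
    return data.loc[best.to_numpy()]


data = pd.Series([0, 1, 2, 8, 11, 12, 26, 27, 28, 31, 40, 49])
print(nearest_per_interval(data, time_span=10, delta_time=3))
```

On the example series this keeps the same samples as the loop (labels 0, 4, 9 and 10). Note that 49 rounds to 50, which is outside the loop's target range, so the range check is what excludes it.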

Thanks

1 answer:

Answer 0 (score: 1)

We can use np.searchsorted:

# Get array data for better performance (assumes data is sorted,
# which searchsorted requires)
a = data.to_numpy(copy=False) # data.values on older pandas versions
last = len(a) - 1

# Use searchsorted to get the right-side closest index for each bin
idx0 = np.searchsorted(a, time_10s, 'right')

# Get the left- and right-side distances for each bin; clip the indices
# so bins before the first or after the last sample stay in bounds
v1 = np.abs(time_10s - a[(idx0-1).clip(min=0)])
v2 = np.abs(a[idx0.clip(max=last)] - time_10s)

# Where the left neighbour is closer, step the index back by 1
idx1 = (idx0 - (v1 < v2)).clip(max=last)

# Use those positions to get the data and keep only the values
# within the threshold delta_time
data_f = data.iloc[idx1]
out = data_f[np.abs(data_f.to_numpy() - time_10s) < delta_time]
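pandas also has a built-in for "nearest match within a tolerance": pd.merge_asof with direction="nearest". The sketch below runs it on the question's example data; it assumes both key columns are sorted and share a dtype (merge_asof requires sorted keys), and the column names t, value and pos are illustrative only. The strict < delta_time test from the question is re-applied afterwards, so the result does not depend on whether the tolerance bound is inclusive.

```python
import numpy as np
import pandas as pd

data = pd.Series([0, 1, 2, 8, 11, 12, 26, 27, 28, 31, 40, 49])
time_span = 10
delta_time = 3

# Left side: the target times; right side: the samples with their positions
targets = pd.DataFrame({"t": np.arange(0, data.max() // time_span * time_span + 1, time_span)})
samples = pd.DataFrame({"t": data.to_numpy(), "value": data.to_numpy(), "pos": np.arange(len(data))})

# For every target, take the nearest sample within the tolerance (NaN if none)
m = pd.merge_asof(targets, samples, on="t", direction="nearest", tolerance=delta_time)

# Re-apply the strict threshold from the question and drop unmatched targets
keep = (m["value"] - m["t"]).abs() < delta_time
out = data.iloc[m.loc[keep, "pos"].astype(int).to_numpy()]
print(out)
```

With real timestamps (one sample every 100 ms), the same call should work with datetime keys and a pd.Timedelta tolerance.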