Question

This question builds on a previous one: create new column that compares across rows in pandas dataframe

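I want to create a new column that checks whether any of the next n rows differ from the current row by more than X. For example, given a dataframe, if the next 4 rows differ by more than 1, the new value is 0; if the next 4 rows differ by less than or equal to 1, the new value is 1.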

>>> df = pandas.DataFrame({"A": [5,6,4,3,5]})
>>> df
   A
0  5
1  6
2  4
3  3
4  5
>>> desired_result = pandas.DataFrame({"A": [5,6,4,3,5], "new": [1,0,1,0,0]})
>>> desired_result
   A  new
0  5    1
1  6    0
2  4    1
3  3    0
4  5    0

In the example above, the value 5 becomes 1 because the absolute differences from the next two values are <= 1 (abs(5-6) = 1 and abs(5-4) = 1).

Similar to the answer given in the post above, I tried to solve this with the following code:

df['new'] = 1
df.loc[abs(df.A -  df.A.shift(-1)) > 1 , 'new'] = 0

This code works when looking only at the next row, but I am not sure of the best way to extend it to n rows.

Answer 1

import pandas as pd

n = 2  # Number of following rows to compare against
x = 1  # Threshold: flag rows that differ by more than x

>>> pd.concat([(df.A - df.A.shift(-i - 1)).abs().le(x) 
               for i in range(n)], axis=1).any(axis=1) * 1
0    1
1    0
2    1
3    0
4    0
dtype: int64

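The shift is performed n times and each shifted column is compared with the current value, which gives the following booleans: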

# shift-1 shift-2
       A      A
0   True   True
1  False  False
2   True   True
3  False  False
4  False  False

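These results are then combined across each row, looking for any True value. Finally, the boolean result is multiplied by 1 to turn it into 1s and 0s.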

`pd.concat(...).any(axis=1) * 1`
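
For reuse, the same shift-and-compare idea can be wrapped in a function. This is a minimal sketch rather than part of the original answer: `within_next_n` and its `require_all` flag are hypothetical names. One point worth checking against your requirement: `any(axis=1)` yields 1 as soon as one of the next n rows is within x, whereas `all(axis=1)` requires every one of them to be. The sample data gives the same column either way.

```python
import pandas as pd


def within_next_n(s, n, x, require_all=False):
    """Flag rows whose following n values lie within x of the current value.

    Hypothetical helper, not part of pandas.
    """
    # One boolean column per look-ahead distance 1..n. Comparisons against
    # the NaN values introduced by shift() evaluate to False.
    close = pd.concat(
        [(s - s.shift(-i)).abs().le(x) for i in range(1, n + 1)], axis=1
    )
    # any: at least one following row is close; all: every following row is.
    reduce_rows = close.all if require_all else close.any
    return reduce_rows(axis=1).astype(int)


df = pd.DataFrame({"A": [5, 6, 4, 3, 5]})
df["new"] = within_next_n(df["A"], n=2, x=1)
print(df)
```

With `require_all=True`, rows that have fewer than n following rows always get 0, because the missing comparisons count as False.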

Answer 2

You can use a combination of rolling_max and shift.

For example, if the period is 2, then

df = pd.DataFrame({"A": [5,6,4,3,5]})
>>> pd.rolling_max(df.A.shift(-1), 2).shift(-1)    
0     6
1     4
2     5
3   NaN
4   NaN
Name: A, dtype: float64

gives the maximum of the next 2 periods (note the two NaNs at the end, where it is undefined).

In general, for a window of size k, you can use

pd.rolling_max(df.A.shift(-1), k).shift(1 - k)

From this point, you can compare the original series minus 1 against the result:

df.A - 1 > pd.rolling_max(...
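
In current pandas releases the module-level `pd.rolling_max` is no longer available; the method form `Series.rolling(k).max()` is its equivalent. The sketch below is one possible way to carry this idea through to the 0/1 column, not the answerer's finished code. It assumes that every one of the next k values must be within x, which holds exactly when both the largest and the smallest of them are. `next_k_extremes` is a hypothetical helper name.

```python
import pandas as pd


def next_k_extremes(s, k):
    """Return (max, min) of the k values that follow each row.

    Hypothetical helper. Rows with fewer than k following values get NaN.
    """
    # Drop the current row with shift(-1), take a backward-looking window of
    # size k, then shift up by k - 1 so the window aligns with the row it follows.
    ahead = s.shift(-1).rolling(k)
    return ahead.max().shift(1 - k), ahead.min().shift(1 - k)


df = pd.DataFrame({"A": [5, 6, 4, 3, 5]})
k, x = 2, 1
nxt_max, nxt_min = next_k_extremes(df["A"], k)

# NaN comparisons are False, so rows without k following values become 0.
df["new"] = ((nxt_max - df["A"] <= x) & (df["A"] - nxt_min <= x)).astype(int)
print(df)
```

Compared with concatenating n shifted columns, this keeps only two intermediate series regardless of k, at the cost of handling the upper and lower bounds separately.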

Measuring deviation across the next N rows in a dataframe
