Question

I want to express the following on a pandas DataFrame, but I don't know of any way to do it other than slow manual iteration over every cell.

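For context: I have a DataFrame with two categories of columns; let's call them read_columns and non_read_columns. Given a column name, I have a function that returns True or False to tell you which category that column belongs to.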
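
A minimal sketch of that setup, for concreteness. The predicate name is_read_column, the "read_" prefix, and the sample data are all hypothetical placeholders chosen only for illustration:

```python
import pandas as pd

# Hypothetical predicate: in this sketch, a column counts as a "read" column
# when its name starts with "read_". The real rule may be anything.
def is_read_column(name: str) -> bool:
    return name.startswith("read_")

# Made-up sample data, only to show the partitioning step.
df = pd.DataFrame({"read_a": [1, 5], "read_b": [4, 2], "other_c": [2, 7]})

# Split the column names into the two categories using the predicate.
read_columns = [c for c in df.columns if is_read_column(c)]
non_read_columns = [c for c in df.columns if not is_read_column(c)]
```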

Given a specific read column A:
    For each row:
        1. Inspect the read column A to get the value X
        2. Find the read column with the smallest value Y that is greater than X.
            If no read column has a value greater than X, then substitute the largest value
            found in all of the *non*-read columns, call it Z, and skip to step 4.
        3. Find the non-read column with the greatest value between X and Y and call its value Z.
        4. Compute Z - X

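In the end, I want a Series of Z - X values with the same index as the original DataFrame. Note that the sort order of the column values is not consistent from row to row.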

What is the best way to do this?

Answer 1

It's hard to give an answer without seeing an example DataFrame, but you could do the following:

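Separate the read columns, which hold the candidate Y values, into a new DataFrame.
Transpose this new DataFrame so that the Y values run down the columns rather than across the rows.
Use the built-in vectorized functions on the Series of Y values instead of manually iterating over rows and columns. You can first filter for the values greater than X and then apply min() to the filtered Series.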
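
The filter-then-reduce idea can be sketched roughly as follows. This is one possible reading of the question, not a tested solution for the real data: it skips the transpose and uses row-wise reductions (axis=1) instead, it treats "between X and Y" as strictly between, and it returns NaN when no non-read value falls in that range. The function name z_minus_x, the predicate is_read_column, and the sample frame are all hypothetical.

```python
import pandas as pd

def z_minus_x(df, a, is_read_column):
    """Row-wise Z - X for read column `a` (a sketch under the assumptions above)."""
    read = df[[c for c in df.columns if is_read_column(c)]]
    non_read = df[[c for c in df.columns if not is_read_column(c)]]
    x = df[a]

    # Step 2: mask out read values that are not greater than X, then take the
    # row-wise minimum. Rows with no qualifying value end up as NaN.
    y = read.where(read.gt(x, axis=0)).min(axis=1)

    # Step 3: largest non-read value strictly between X and Y.
    # Comparisons against a NaN Y are False, so those rows give NaN here.
    between = non_read.gt(x, axis=0) & non_read.lt(y, axis=0)
    z = non_read.where(between).max(axis=1)

    # Fallback from step 2: where no Y exists, use the largest non-read value.
    z = z.where(y.notna(), non_read.max(axis=1))

    # Step 4: the result keeps the index of the original DataFrame.
    return z - x

# Made-up data: columns starting with "r" are the read columns.
df = pd.DataFrame(
    {"r1": [1, 5, 3], "r2": [4, 2, 9], "n1": [2, 7, 10], "n2": [3, 1, 20]}
)
result = z_minus_x(df, "r1", lambda name: name.startswith("r"))
```

Masking with where() and reducing along axis=1 avoids a Python-level loop over rows, and it does not depend on the columns being sorted the same way in every row.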

Pandas: for each row, get the value of the largest column between two columns
