Question

I'm working on expanding my pandas skills. I have a pandas DataFrame that looks like this:

df

      Group 1     Group 2            Product ID
0   Products      International      X11
1   Products      International      X11
2   Products      Domestic           X11
3   Products      Domestic           X23
4   Services      Professional       X23
5   Services      Professional       X23
6   Services      Analytics          X25

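I'm trying to use some pandas functionality to get the indices where the values in Group 1 and Group 2 change. I understand I may have to go column by column and append those indices to separate lists.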

I referred to the question "Find index where elements change value pandas dataframe", which is the closest similar question I could find.

I would like to get output like this:

 Group 1 changes = [0,4]
 Group 2 changes = [0,2,4,6]

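What built-in pandas functionality can quickly check whether two consecutive values in a column are the same, and then return that index?

All of my data is sorted by group, so a solution that iterates row by row should not run into any problems.

Any help is greatly appreciated!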

Answer 1

Use

In [91]: df.ne(df.shift()).apply(lambda x: x.index[x].tolist())
Out[91]:
Group 1             [0, 4]
Group 2       [0, 2, 4, 6]
Product ID       [0, 3, 6]
dtype: object

In [92]: df.ne(df.shift()).filter(like='Group').apply(lambda x: x.index[x].tolist())
Out[92]:
Group 1          [0, 4]
Group 2    [0, 2, 4, 6]
dtype: object
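
To make the one-liner above easier to follow, here is a minimal sketch that rebuilds the question's frame and runs the same steps one at a time. The intermediate names (`previous`, `changed`, `group_1_changes`, `group_2_changes`) are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Group 1": ["Products"] * 4 + ["Services"] * 3,
    "Group 2": ["International", "International", "Domestic", "Domestic",
                "Professional", "Professional", "Analytics"],
    "Product ID": ["X11", "X11", "X11", "X23", "X23", "X23", "X25"],
})

# shift() moves every column down one row, so row i holds the value from row i-1;
# row 0 has no predecessor and becomes a missing value.
previous = df.shift()

# ne() is element-wise "not equal": True wherever a value differs from the row above.
# Row 0 is always True because a missing value never compares equal to anything.
changed = df.ne(previous)

# For each column, keep the index labels where the mask is True.
group_1_changes = changed.index[changed["Group 1"]].tolist()
group_2_changes = changed.index[changed["Group 2"]].tolist()

print(group_1_changes)
print(group_2_changes)
```

Because the mask is indexed with the frame's own index, the result contains index labels, so it should also behave sensibly when the index is not the default 0..n-1 range.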

Or, as a dict:

In [107]: {k: s.index[s].tolist() for k, s in df.ne(df.shift()).filter(like='Group').items()}
Out[107]: {'Group 1': [0, 4], 'Group 2': [0, 2, 4, 6]}

Answer 2

Here is a non-pandas solution. I like it because it is intuitive and does not require understanding the large pandas library.

changes = {}

for col in df.columns:
    changes[col] = [0] + list(idx for idx, (i, j) in enumerate(zip(df[col], df[col][1:]), 1) if i != j)

# {'Group 1': [0, 4], 'Group 2': [0, 2, 4, 6], 'Product ID': [0, 3, 6]}
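
The same idea can be tried without a DataFrame at all. The sketch below wraps the comparison in a small helper; the name `change_positions` is hypothetical and does not come from the answer. Note that it returns positions rather than index labels. The two coincide for the question's frame only because it uses the default 0..n-1 index.

```python
def change_positions(values):
    """Return the positions where a value differs from the one before it."""
    values = list(values)
    if not values:
        return []
    # Position 0 always starts a run; after that, compare each value
    # with its predecessor and keep the positions where they differ.
    return [0] + [i for i in range(1, len(values)) if values[i] != values[i - 1]]


group_1 = ["Products"] * 4 + ["Services"] * 3
group_2 = ["International", "International", "Domestic", "Domestic",
           "Professional", "Professional", "Analytics"]

print(change_positions(group_1))  # [0, 4]
print(change_positions(group_2))  # [0, 2, 4, 6]
```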
