Question

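I have a df with two columns, uid and p. I want to add a new column to the existing df, or create a brand-new df with the extra column, whose values are based on the values of the "uid" column and a list of indices x: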

x = [2,9,12]

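x contains the indices at which a new id should be generated; each new id is the previous one incremented by one. So there are two cases: a new id is generated whenever an index from the list x is encountered, and a new id is also generated whenever the id in the uid column changes, as shown below: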

     uid       expected_newid     p     

0      1       1                 10     
1      1       1                 23    
2      1       2                 20  #new id generated at index 2    
3      1       2                 40
4      2       3                 21  #newid generated as "uid" changes
5      2       3                 89
6      2       3                 45
7      3       4                 50
8      3       4                 32
9      3       5                 76  #new id generated at index 9
10     3       5                 71 
11     3       5                 90
12     3       6                 56  #new id generated at index 12
13     3       6                 87

Please let me know if anything is unclear.

I can handle the case of generating a new id whenever uid changes with the following code:

df['newid'] = (df.uid.diff() != 0).cumsum()

But it should also generate a new id at the indices listed in x, as shown in the "expected_newid" column.

Answer 1

IIUC, you can simply extend the condition you are already using with an "or" (written | here) to include the possibility that the index is in x:

In [12]: df["newid"] = ((df.uid.diff() != 0) | (df.index.isin(x))).cumsum()

In [13]: df
Out[13]: 
    uid  expected_newid   p  newid
0     1               1  10      1
1     1               1  23      1
2     1               2  20      2
3     1               2  40      2
4     2               3  21      3
5     2               3  89      3
6     2               3  45      3
7     3               4  50      4
8     3               4  32      4
9     3               5  76      5
10    3               5  71      5
11    3               5  90      5
12    3               6  56      6
13    3               6  87      6
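For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. The data is copied from the question; the intermediate names `uid_changed` and `forced` are only for illustration and do not appear in the original answer.

```python
import pandas as pd

# Data copied from the question; x holds the index labels that force a new id.
df = pd.DataFrame({
    "uid": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3],
    "p": [10, 23, 20, 40, 21, 89, 45, 50, 32, 76, 71, 90, 56, 87],
})
x = [2, 9, 12]

# True where uid differs from the previous row. The first row's diff is NaN,
# and NaN != 0 evaluates to True, so the numbering starts at 1.
uid_changed = df["uid"].diff() != 0

# True where the row's index label is listed in x.
forced = df.index.isin(x)

# Each True starts a new group; cumsum turns the flags into running ids.
df["newid"] = (uid_changed | forced).cumsum()

print(df["newid"].tolist())
```

Because the two conditions are combined with `|`, a row that is both a uid change and listed in x still increments the id only once.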

Answer 2

How about this:

import pandas as pd

df = pd.DataFrame({'uid': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3]})
# Add column uidd that is True (counts as 1) when the existing id changes
df['uidd'] = (df.uid.diff() != 0)
# Add a column that is 0 except at the indices in your list
# where it is 1
df['byidx'] = 0
df.loc[[2, 9, 12], 'byidx'] = 1
# Now combine them so we get a 1 where either has changed
df['both'] = df.uidd + df.byidx
# And finally, cumsum will generate the correct ids
df['newuid'] = df.both.cumsum()

The result is:

    uid   uidd  byidx  both  newuid
0     1   True      0     1       1
1     1  False      0     0       1
2     1  False      1     1       2
3     1  False      0     0       2
4     2   True      0     1       3
5     2  False      0     0       3
6     2  False      0     0       3
7     3   True      0     1       4
8     3  False      0     0       4
9     3  False      1     1       5
10    3  False      0     0       5
11    3  False      0     0       5
12    3  False      1     1       6
13    3  False      0     0       6

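Of course, you don't need to add the columns uidd, byidx, or both to the dataframe; you could combine all of these steps into one. I just wanted to separate them, which might make things clearer. You could also add them as above and then just drop the temporary columns.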
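The "add them, then drop the temporary columns" variant might look roughly like the sketch below. It assumes the same sample uid values and index list as above. It also swaps the `+` for `|`, which is a substitution on my part rather than what the answer wrote: with `+`, a row that is both a uid change and listed in the index list would contribute 2 and skip an id.

```python
import pandas as pd

df = pd.DataFrame({"uid": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3]})
x = [2, 9, 12]

# Temporary helper columns, as in the step-by-step version above.
df["uidd"] = df["uid"].diff() != 0
df["byidx"] = df.index.isin(x)

# Combine with | rather than + so a row flagged by both rules still adds 1.
df["newuid"] = (df["uidd"] | df["byidx"]).cumsum()

# Remove the helpers once the ids are computed.
df = df.drop(columns=["uidd", "byidx"])

print(df.columns.tolist())
```

For this sample data none of the listed indices coincides with a uid change, so both `+` and `|` give the same ids.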

Changing the values of a pandas column based on an index column
