Question

My data frame looks like this; it is basically users' visits to a website by month:

month user_id
 1     1
 1     2
 1     1
 1     3
 2     1
 2     2
 2     4
 3     2
 3     5
 3     1

I want to create a column containing 0 or 1. Each user_id should get a 1 only once, on its first appearance, and a 0 every other time.

Example of the expected output

month user_id new_column
  1     1       1    
  1     2       1 
  1     1       0 
  1     3       1
  2     1       0 
  2     2       0 
  2     4       1 
  3     2       0
  3     5       1
  3     1       0

Answer 1

I think you need to set 0 for the duplicated values in the user_id column:

df['new'] = (~df.duplicated('user_id')).astype(int)

Or:

import numpy as np

df['new'] = np.where(df.duplicated('user_id'), 0, 1)

print(df)
   month  user_id  new
0      1        1    1
1      1        2    1
2      1        1    0
3      1        3    1
4      2        1    0
5      2        2    0
6      2        4    1
7      3        2    0
8      3        5    1
9      3        1    0
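
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the frame has only the two columns from the question; the names `is_repeat` and `alt` are introduced here purely for illustration. It relies on `duplicated` using `keep='first'` by default, so the first row for each user_id is the one not flagged as a duplicate.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "month":   [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "user_id": [1, 2, 1, 3, 1, 2, 4, 2, 5, 1],
})

# True for every row whose user_id has already appeared in an earlier row
is_repeat = df.duplicated("user_id")

# Variant 1: invert the mask and cast the booleans to 0/1
df["new"] = (~is_repeat).astype(int)

# Variant 2: the same flags built with np.where (illustrative name)
alt = np.where(is_repeat, 0, 1)

print(df)
```

Both variants should give the same flags, so which one to use is mostly a matter of taste.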

Answer 2

Here is another piece of code that uses only basic operations:

seen = set()
# user_id values encountered so far
df['new'] = 0
# new column, initialised to 0
for a in range(len(df)):
    uid = df.iloc[a, 1]
    # a-th entry of the column at position 1 (user_id)
    if uid not in seen:
        # first occurrence: set the a-th entry of the column at position 2 (new) to 1
        df.iloc[a, 2] = 1
        seen.add(uid)
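
A loop like this can be checked against the vectorised answer on ids that do not arrive in ascending order, which is the case where a simple counter comparison would go wrong. The sketch below is only an illustration: the helper name `first_seen_flags` and the sample ids are hypothetical, not taken from the question.

```python
import pandas as pd


def first_seen_flags(values):
    """Return 1 for the first occurrence of each value and 0 for repeats."""
    seen = set()
    flags = []
    for v in values:
        flags.append(0 if v in seen else 1)
        seen.add(v)
    return flags


# Hypothetical ids that are deliberately not in ascending order
ids = [7, 3, 7, 1, 3]

loop_flags = first_seen_flags(ids)
vector_flags = (~pd.Series(ids).duplicated()).astype(int).tolist()

print(loop_flags)
print(vector_flags)
```

If the two lists agree, the loop is flagging first occurrences in the same way as `duplicated`.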
