Question

I have a two-column DataFrame df with the columns phone and label, where label can only be 0 or 1. Here is an example:

phone  label
   a       0
   b       1
   a       1
   a       0
   c       0
   b       0

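What I want to do is count the number of '1's for each 'phone' and replace the 'phone' column with that number. The approach I came up with is groupby, but I'm not familiar with it.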

The answer should be:

Count the number of each 'phone'
phone    count
   a         1
   b         1
   c         0

replace the 'phone' with 'count' in the original table
phone
   1
   1
   1
   1
   0
   1

Answer 1

Given that the label column can only contain 0 or 1, you can use the .transform('sum') method:

In [4]: df.label = df.groupby('phone')['label'].transform('sum')

In [5]: df
Out[5]:
  phone  label
0     a      1
1     b      1
2     a      1
3     a      1
4     c      0
5     b      1

Explanation:

In [2]: df
Out[2]:
  phone  label
0     a      0
1     b      1
2     a      1
3     a      0
4     c      0
5     b      0

In [3]: df.groupby('phone')['label'].transform('sum')
Out[3]:
0    1
1    1
2    1
3    1
4    0
5    1
dtype: int64
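
The same idea as a self-contained sketch, assuming pandas is imported as pd and the frame is rebuilt from the question's sample. The names ones_per_phone and result are hypothetical. Unlike the snippet above, which overwrites label, this version overwrites phone, which is what the question asked for:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "phone": ["a", "b", "a", "a", "c", "b"],
    "label": [0, 1, 1, 0, 0, 0],
})

# label is only ever 0 or 1, so the per-group sum equals the number of 1s.
# transform returns a Series aligned with the original rows.
ones_per_phone = df.groupby("phone")["label"].transform("sum")

# Replace the phone column with the per-phone count of 1s
result = df.assign(phone=ones_per_phone)
print(result["phone"].tolist())
```

Using assign leaves the original df untouched, which may be convenient if the phone values are still needed later.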

Answer 2

You can filter and group data in pandas. For your case, it would look like this:

Assume the following data:

  phone  label
0     a      0
1     b      1
2     a      1
3     a      1
4     c      1
5     d      1
6     a      0
7     c      0
8     b      0

df.groupby(['phone','label'])['label'].count()
phone  label
a      0        2
       1        2
b      0        1
       1        1
c      0        1
       1        1
d      1        1
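
If a wide table is easier to read, the grouped counts above can probably be pivoted so that each label value becomes a column. A minimal sketch, assuming the same sample data; the name counts is hypothetical:

```python
import pandas as pd

# Sample data assumed in this answer
df = pd.DataFrame({
    "phone": ["a", "b", "a", "a", "c", "d", "a", "c", "b"],
    "label": [0, 1, 1, 1, 1, 1, 0, 0, 0],
})

# size() counts rows per (phone, label) pair; unstack turns the label
# level into columns, and fill_value=0 fills pairs that never occur
counts = df.groupby(["phone", "label"]).size().unstack(fill_value=0)
print(counts)
```

Here phone d, which has no label==0 rows, gets an explicit 0 in that column instead of being absent.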

If you need the count per phone for rows where label==1 only, do this:

#first filter to get only label==1 rows
phone_rows_label_one_df = df[df.label==1]

#then do groupby
phone_rows_label_one_df.groupby(['phone'])['label'].count()

phone
a    2
b    1
c    1
d    1

To get the count as a new column in a DataFrame, do this:

phone_rows_label_one_df.groupby(['phone'])['label'].count().reset_index(name='count')
  phone  count
0     a      2
1     b      1
2     c      1
3     d      1
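
One caveat with filtering first: a phone that has no label==1 rows (such as c in the question's data) disappears from the result instead of showing a count of 0. A minimal sketch of one way around that, assuming the question's sample data; the names filtered, counts and phone_count are hypothetical:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "phone": ["a", "b", "a", "a", "c", "b"],
    "label": [0, 1, 1, 0, 0, 0],
})

# Filtering first drops phone 'c', which has no label==1 rows
filtered = df[df.label == 1].groupby("phone")["label"].count()

# Summing the 0/1 labels keeps every phone, including those with zero 1s
counts = df.groupby("phone")["label"].sum().reset_index(name="count")

# Map the per-phone count back onto the original rows
df["phone_count"] = df["phone"].map(counts.set_index("phone")["count"])
print(counts)
print(df)
```

This relies on label being strictly 0 or 1, as stated in the question; with other label values the sum would no longer be a count.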
