Question

I have a DataFrame with 2 date columns:

 ---------------------------- 
| date_created |  date_ended |
|--------------| ----------- |
|20/12/01      | 20/11/01    |
|20/12/01      | 20/12/02    |
|20/12/02      | 20/12/02    |
|20/12/02      | 20/12/03    |
|20/12/02      | 20/12/03    |
|20/12/03      | 20/12/03    |
|20/12/03      | 20/12/04    |
 ----------------------------

For each date, I need to count how many rows have that date in each of the two columns, i.e. the output I need is:

 ------------------------------------------
| date_index   |created_count| ended_count |
|--------------| ----------- | ----------- |
|20/11/01      |      0      |      1      |
|20/12/01      |      2      |      0      |
|20/12/02      |      3      |      2      |
|20/12/03      |      2      |      3      |
|20/12/04      |      0      |      1      |
 ------------------------------------------

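I have been counting each column separately and then matching the results on the shared date index. Is there a cleaner way to do this? Any help would be appreciated.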

Answer 1

You can do this:

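import pandas as pd

# Count each column separately, then align the two results on the date values.
# keys= names the output columns explicitly; newer pandas versions name every
# value_counts() result 'count', so both columns would otherwise share a name.
res = pd.concat((df['date_created'].value_counts(),
                 df['date_ended'].value_counts()),
                axis=1, sort=True,
                keys=['date_created', 'date_ended']).fillna(0).astype(int)
print(res)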

Output

          date_created  date_ended
20/11/01             0           1
20/12/01             2           0
20/12/02             3           2
20/12/03             2           3
20/12/04             0           1
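
For reference, here is a self-contained sketch of the same idea that rebuilds the sample data from the question and renames the result to match the requested layout. The names date_index, created_count and ended_count are taken from the desired output in the question; the final sort_index() and rename_axis() steps are additions for illustration, not part of the snippet above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date_created": ["20/12/01", "20/12/01", "20/12/02", "20/12/02",
                     "20/12/02", "20/12/03", "20/12/03"],
    "date_ended":   ["20/11/01", "20/12/02", "20/12/02", "20/12/03",
                     "20/12/03", "20/12/03", "20/12/04"],
})

# Count each column separately, then align the counts on the date values.
# keys= sets the output column names.
res = pd.concat(
    [df["date_created"].value_counts(), df["date_ended"].value_counts()],
    axis=1,
    keys=["created_count", "ended_count"],
)

# Dates missing from one column become NaN after alignment; treat them as 0.
res = res.fillna(0).astype(int).sort_index().rename_axis("date_index")
print(res)
```

Sorting the index as strings gives chronological order here only because the dates are zero-padded and written year first.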

Answer 2

Use DataFrame.apply with value_counts, replace the NaN values for non-matching dates with 0, and finally convert to integers:

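# pd.Series.value_counts is used because the top-level pd.value_counts
# is deprecated in recent pandas releases
df = df.apply(pd.Series.value_counts).fillna(0).astype(int)
print(df)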
         date_created  date_ended
20/11/01             0           1
20/12/01             2           0
20/12/02             3           2
20/12/03             2           3
20/12/04             0           1

If you want to process only selected columns:

cols = ['date_created', 'date_ended']
# Restrict the count to the listed columns before applying value_counts
df = df[cols].apply(pd.Series.value_counts).fillna(0).astype(int)
print(df)

          date_created  date_ended
20/11/01             0           1
20/12/01             2           0
20/12/02             3           2
20/12/03             2           3
20/12/04             0           1
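
If the columns should be real datetimes rather than strings, the same apply-based approach can be run after parsing them. This is only a sketch: it assumes the strings are year/month/day with a two-digit year (format "%y/%m/%d"), which the question does not state explicitly, and the names parsed and counts are made up for the example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date_created": ["20/12/01", "20/12/01", "20/12/02", "20/12/02",
                     "20/12/02", "20/12/03", "20/12/03"],
    "date_ended":   ["20/11/01", "20/12/02", "20/12/02", "20/12/03",
                     "20/12/03", "20/12/03", "20/12/04"],
})

cols = ["date_created", "date_ended"]

# Assumption: two-digit year, then month, then day.
parsed = df[cols].apply(pd.to_datetime, format="%y/%m/%d")

# Count per column, align on the parsed dates, and fill missing dates with 0.
counts = (parsed.apply(pd.Series.value_counts)
                .fillna(0)
                .astype(int)
                .sort_index())
print(counts)
```

With a datetime index, the sort order no longer depends on how the dates were written as text.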
