Question

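I have a DataFrame with two columns, A and B.

For each A, I want to count how often two values of B occur together, to find the most common combinations of two B values (order does not matter). I expect the result to be:

[1,2],[1,5],[1,2] and [3,4]

because these B values appear together most often (I mean for the same A). I have tried:

oc=pd.DataFrame(columns=['A','B_combination'])
oc['B_combination']=df.astype('str').groupby('A')['B'].agg([ ';'.join,lambda x: set(x.tolist())])['<lambda_0>'].values
oc['A']=df.astype('str').groupby('A')['B'].agg([ ';'.join,lambda x: set(x.tolist())])['<lambda_0>'].index

to get the different combinations, like this:

|A |B_combination|
---+--------------
|1 |{2, 1, 3, 5} |
|2 |{2, 1, 5}    |
|3 |{2, 1, 5}    |
|4 |{4, 3}       |
|5 |{4, 3}       |
|6 |{2, 1, 3, 5} |
|7 |{4, 3, 5}    |

But when I apply

oc.groupby('B_combination').count()

to get the most common combinations, it does not work because the values are sets. I tried converting them to lists, but that did not work either.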

Answer 1

Try itertools.combinations together with groupby():

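from itertools import combinations

# For each A, build every unordered pair of B values and count its occurrences
(df.groupby('A')['B']
   .apply(lambda s: pd.Series([tuple(sorted(pair)) for pair in combinations(s, 2)]).value_counts())
   .reset_index()
)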

Output:

    A level_1  B
0   1  (3, 5)  1
1   1  (2, 5)  1
2   1  (3, 1)  1
3   1  (3, 2)  1
4   1  (1, 5)  1
5   1  (1, 2)  1
6   2  (2, 1)  1
7   2  (5, 1)  1
8   2  (5, 2)  1
9   3  (2, 5)  1
10  3  (1, 5)  1
11  3  (1, 2)  1
12  4  (3, 4)  1
13  5  (4, 3)  1
14  6  (5, 3)  1
15  6  (2, 1)  1
16  6  (2, 3)  1
17  6  (1, 3)  1
18  6  (2, 5)  1
19  6  (5, 1)  1
20  7  (5, 3)  1
21  7  (5, 4)  1
22  7  (4, 3)  1
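
The output above counts pairs within each A separately, which is why every count is 1. To see which pairs occur together across the most A values, the same idea can be combined with collections.Counter. This is only a minimal sketch: the original data is not shown, so the df below is a hypothetical frame rebuilt from the B_combination table in the question, and the names groups and pair_counts are illustrative.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical sample data, rebuilt from the B_combination table in the question.
groups = {1: [1, 2, 3, 5], 2: [1, 2, 5], 3: [1, 2, 5], 4: [3, 4],
          5: [3, 4], 6: [1, 2, 3, 5], 7: [3, 4, 5]}
df = pd.DataFrame(
    [(a, b) for a, b_list in groups.items() for b in b_list],
    columns=["A", "B"],
)

pair_counts = Counter()
for _, b_values in df.groupby("A")["B"]:
    # Sorting the unique values makes each pair order-independent
    # and ensures a pair is counted at most once per A.
    unique_sorted = sorted(set(b_values.tolist()))
    pair_counts.update(combinations(unique_sorted, 2))

# Pairs ordered from most to least frequent across all A values.
ranked_pairs = pair_counts.most_common()
```

With this sample data, the pairs (1, 2), (1, 5) and (2, 5) each occur under four different A values, and (3, 4) occurs under three.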

Answer 2

df.groupby('A')['B'].apply(set).reset_index(name='B_combination')
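
A likely reason the groupby on B_combination fails in the question is that set objects (and lists) are unhashable, and grouping needs hashable keys. One way around this is to convert each set to a frozenset before counting. The sketch below assumes a hypothetical df rebuilt from the table in the question and uses collections.Counter for the counting step; combo_counts is an illustrative name.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data, rebuilt from the B_combination table in the question.
groups = {1: [1, 2, 3, 5], 2: [1, 2, 5], 3: [1, 2, 5], 4: [3, 4],
          5: [3, 4], 6: [1, 2, 3, 5], 7: [3, 4, 5]}
df = pd.DataFrame(
    [(a, b) for a, b_list in groups.items() for b in b_list],
    columns=["A", "B"],
)

# Same line as in this answer: one set of B values per A.
oc = df.groupby("A")["B"].apply(set).reset_index(name="B_combination")

# frozenset is hashable, so identical combinations can be counted.
combo_counts = Counter(frozenset(s) for s in oc["B_combination"])

# Combinations ordered from most to least frequent.
ranked_combos = combo_counts.most_common()
```

Note that this counts whole combinations per A, not pairs. With this sample data, {1, 2, 5} is shared by two A values, while {3, 4, 5} appears only once.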

Group by combination of column values
