Clustering one column based on several other columns in Python

Time: 2021-06-28 15:32:48

Tags: python data-mining

I have a dataset with the columns cat1, cat2, cat3, and city. I want to group the cities into clusters. Is it possible to cluster df['city'] based on the other three columns?

1 answer:

Answer 0 (score: 1)

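You can cluster the cat columns first. Then, since each row of cat values corresponds to exactly one city, use the resulting labels to group the cities: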

>>> import pandas as pd
>>> from sklearn.cluster import KMeans
>>> df = pd.DataFrame({'cat1': [-1, -2, -1, 3, 2], 'cat2': [-2, -1, -3, 1, 2], 'city': ['London', 'Paris', 'Lyon', 'Washington', 'Rome']})
>>> # some pairs of cats are all negative,
>>> # some pairs are all positive,
>>> # so we should clearly get two clusters
>>> df
   cat1  cat2        city
0    -1    -2      London
1    -2    -1       Paris
2    -1    -3        Lyon
3     3     1  Washington
4     2     2        Rome
>>> X = df[['cat1', 'cat2']].values
>>> X # the cats
array([[-1, -2],
       [-2, -1],
       [-1, -3],
       [ 3,  1],
       [ 2,  2]])
>>> # cluster the cats and get their labels
>>> # (label numbering is arbitrary, so 0 and 1 may be swapped on your run)
>>> lab = KMeans(2).fit(X).labels_
>>> lab
array([0, 0, 0, 1, 1], dtype=int32)
>>> # use labels to cluster cities
>>> # London, Paris and Lyon have all-negative cats
>>> df['city'][lab == 0]
0    London
1     Paris
2      Lyon
Name: city, dtype: object
>>> # Washington and Rome have all-positive cats
>>> df['city'][lab == 1]
3    Washington
4          Rome
Name: city, dtype: object
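
The same idea can be written as a standalone script that covers all three cat columns from the question and collects the cities of every cluster in one step. This is only a minimal sketch: the cat3 values are made up for illustration, and the names feature_cols, cluster, and cities_by_cluster are hypothetical, not part of the original answer. The number of clusters (2) is also an assumption that fits this toy data.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy data; the cat3 values are invented for this example.
df = pd.DataFrame({
    'cat1': [-1, -2, -1, 3, 2],
    'cat2': [-2, -1, -3, 1, 2],
    'cat3': [-3, -2, -1, 2, 3],
    'city': ['London', 'Paris', 'Lyon', 'Washington', 'Rome'],
})

# Columns used as clustering features (everything except 'city').
feature_cols = ['cat1', 'cat2', 'cat3']

# Setting n_init and random_state makes repeated runs reproducible.
model = KMeans(n_clusters=2, n_init=10, random_state=0)

# Store the cluster label of each row next to its city.
df['cluster'] = model.fit_predict(df[feature_cols])

# Map each cluster label to the list of cities assigned to it.
cities_by_cluster = df.groupby('cluster')['city'].apply(list).to_dict()
print(cities_by_cluster)
```

Keeping the labels as a column of df, rather than as a separate array, lets groupby do the grouping for any number of clusters. Which group is numbered 0 and which is numbered 1 is arbitrary, so only the grouping itself is meaningful.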