pandas groupby + list

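Time: 2017-01-25 16:02:51

Tags: python pandas

Pandas newbie here, so apologies if this is old hat. What I'm trying to accomplish is similar to what's covered in "grouping rows in list in pandas groupby", but I have more than two columns and can't figure out how to get all of the columns displayed along with the grouped value. Here's what I'm trying to do.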

data = [{'ip': '192.168.1.1', 'make': 'Dell', 'model': 'UltraServ9000'},
{'ip': '192.168.1.3', 'make': 'Dell', 'model': 'MiniServ'},
{'ip': '192.168.1.5', 'make': 'Dell', 'model': 'UltraServ9000'},
{'ip': '192.168.1.6', 'make': 'HP', 'model': 'Thinger3000'},
{'ip': '192.168.1.8', 'make': 'HP', 'model': 'Thinger3000'}]

In [2]: df = pd.DataFrame(data)
In [3]: df
Out[3]:
            ip  make          model
0  192.168.1.1  Dell  UltraServ9000
1  192.168.1.3  Dell       MiniServ
2  192.168.1.5  Dell  UltraServ9000
3  192.168.1.6    HP    Thinger3000
4  192.168.1.8    HP    Thinger3000    

<magic>

Out[?]:
                         ip  make          model
0  192.168.1.1, 192.168.1.5  Dell  UltraServ9000
1               192.168.1.3  Dell       MiniServ
3  192.168.1.6, 192.168.1.8    HP    Thinger3000

Thanks in advance :)

1 answer:

Answer 0 (score: 2)

groupby takes a parameter, by, through which you can specify a list of columns to group over. So the answer to that question can be modified as follows (here a and c are the columns to keep as they are, and b is the column to collect into lists):

df.groupby(by = ["a", "c"])["b"].apply(list).reset_index()
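Applied to the data from the question, the same idea might look like the sketch below. It assumes that make and model play the role of a and c, and that ip is the column to collect (b). The final join step is an optional addition, not part of the original answer, which turns each list into the comma-separated string shown in the desired output.

```python
import pandas as pd

data = [{'ip': '192.168.1.1', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.3', 'make': 'Dell', 'model': 'MiniServ'},
        {'ip': '192.168.1.5', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.6', 'make': 'HP', 'model': 'Thinger3000'},
        {'ip': '192.168.1.8', 'make': 'HP', 'model': 'Thinger3000'}]
df = pd.DataFrame(data)

# Group on every column that should stay unchanged, then collect the
# ip values of each group into a Python list.
grouped = df.groupby(by=["make", "model"])["ip"].apply(list).reset_index()

# Optional (assumption about the desired display): join each list into
# a single comma-separated string.
grouped["ip"] = grouped["ip"].apply(", ".join)

print(grouped)
```

Note that groupby sorts the group keys by default, so the rows come back ordered by make and model rather than by first appearance, and the index is renumbered by reset_index.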

EDIT: Looking at your comment: since all the other columns have the same values within each group of a, you can simply add them to the by parameter, because they won't affect the result. To save yourself from typing all the names, you could do something like this:

df.groupby(by = list(set(df.columns) - set(["b"])))["b"].apply(list).reset_index()
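One thing to keep in mind with the set-based version is that a Python set has no guaranteed order, so the key columns may come out in a different order from run to run. As a possible variation (an alternative to the set approach, using the question's column names as assumptions), Index.difference returns the remaining column names in sorted order:

```python
import pandas as pd

data = [{'ip': '192.168.1.1', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.3', 'make': 'Dell', 'model': 'MiniServ'},
        {'ip': '192.168.1.5', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.6', 'make': 'HP', 'model': 'Thinger3000'},
        {'ip': '192.168.1.8', 'make': 'HP', 'model': 'Thinger3000'}]
df = pd.DataFrame(data)

# Every column except the one being collected becomes a grouping key.
# Index.difference returns the remaining names sorted, so the order
# of the keys is deterministic.
keys = df.columns.difference(["ip"]).tolist()

grouped = df.groupby(by=keys)["ip"].apply(list).reset_index()
print(grouped)
```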

Alternatively, you could use the agg function, passing a dictionary that takes the max of every other column and returns a list for b:

aggregate_functions = {x: "max" for x in df.columns if x != "a" and x != "b"}
aggregate_functions["b"] = lambda x: list(x)
df.groupby(by = "a").agg(aggregate_functions)

Which one you prefer is up to you; the latter is probably more readable.
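The agg approach relies on the other columns being constant within each group. A small sketch on the question's data (assuming make as the single grouping key and ip as the list column) shows what happens when that does not hold: the Dell group contains two different models, and max keeps only one of them.

```python
import pandas as pd

data = [{'ip': '192.168.1.1', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.3', 'make': 'Dell', 'model': 'MiniServ'},
        {'ip': '192.168.1.5', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.6', 'make': 'HP', 'model': 'Thinger3000'},
        {'ip': '192.168.1.8', 'make': 'HP', 'model': 'Thinger3000'}]
df = pd.DataFrame(data)

# "max" for every column that is neither the key nor the list column.
aggregate_functions = {col: "max" for col in df.columns
                       if col not in ("make", "ip")}
# Collect the ip values of each group into a list.
aggregate_functions["ip"] = lambda s: list(s)

result = df.groupby(by="make").agg(aggregate_functions).reset_index()
print(result)
# The Dell row now lists all three Dell ips under a single model,
# so the MiniServ row from the desired output is lost.
```

For this particular data set, grouping on both make and model is therefore the safer choice.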