Question

A column has more than 50 distinct levels, and each level needs to be split out into its own dataframe and written to a file (Excel or CSV).

I think this is one possible solution:

df1, df2, df3, df4 = [x for _, x in df.groupby(df['column_of_interest'])]

But is there a way to do this without hard-coding the number of dataframes?
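To illustrate why hard-coding the count is fragile, here is a minimal sketch using hypothetical toy data with only three levels. The four-way unpacking from the question then raises a ValueError instead of producing dataframes.

```python
import pandas as pd

# Hypothetical toy data with only 3 distinct levels
df = pd.DataFrame({
    "column_of_interest": ["a", "b", "c", "a"],
    "value": [1, 2, 3, 4],
})

error_message = None
try:
    # The hard-coded unpacking from the question expects exactly 4 groups
    df1, df2, df3, df4 = [x for _, x in df.groupby(df["column_of_interest"])]
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

With 50+ levels the same problem appears in the other direction ("too many values to unpack"), so the number of names on the left would have to be kept in sync with the data by hand.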

Answer 1

Is there a way to not hard-code the number of dataframes?

Yes, there is. Use a dictionary or a list. Using a dict:


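dfs = {i: x for i, (_, x) in enumerate(df.groupby('column_of_interest'), 1)}

Then access the dataframes via dfs[1], dfs[2], etc.

Or, using a list:

dfs = [x for _, x in df.groupby('column_of_interest')]

Then use dfs[0], dfs[1], etc.

If you don't need to store the dataframe slices, just iterate over the groupby object and write each group out as you go. Using f-strings (PEP 498, Python 3.6+) is convenient for building the file names.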
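A minimal sketch of that loop, assuming hypothetical toy data and writing into a temporary directory. The `df{i}.csv` file names and `index=False` are illustrative choices, not necessarily the answer's exact code.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical example data; substitute your own DataFrame
df = pd.DataFrame({
    "column_of_interest": ["a", "b", "c", "a", "b"],
    "value": [1, 2, 3, 4, 5],
})

# A temporary directory keeps the example from littering the working directory
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    # Iterate the groupby object directly; no dict or list of slices is built
    for i, (_, group) in enumerate(df.groupby("column_of_interest"), 1):
        group.to_csv(out_dir / f"df{i}.csv", index=False)
    written = sorted(p.name for p in out_dir.glob("*.csv"))

print(written)  # ['df1.csv', 'df2.csv', 'df3.csv']
```

For Excel, `group.to_excel(out_dir / f"df{i}.xlsx", index=False)` works the same way, provided an Excel writer engine such as openpyxl is installed.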

Answer 2

You can save the dataframes directly:

[df1.to_csv("coi_%s.csv"%val) for val, df1 in df.groupby(df['column_of_interest'])]

or with an explicit for loop:

for val, df1 in df.groupby(df['column_of_interest']):
    # Write df1 to CSV (use df1.to_excel instead for Excel)
    df1.to_csv("coi_%s.csv"%val)
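A minimal sketch of the explicit-loop version that names each file after the level value, using hypothetical toy data and a temporary directory. It reads one file back to check that it holds only its own level. `index=False` is an added choice here (the answer keeps the index), and level values containing characters that are invalid in file names (such as `/`) would need sanitizing first.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "column_of_interest": ["x", "y", "x", "z"],
    "value": [10, 20, 30, 40],
})

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    # One file per level, named after the level value as in the answer
    for val, group in df.groupby("column_of_interest"):
        group.to_csv(out_dir / f"coi_{val}.csv", index=False)

    # Read one file back to confirm it contains only rows for level "x"
    x_rows = pd.read_csv(out_dir / "coi_x.csv")
    written = sorted(p.name for p in out_dir.glob("coi_*.csv"))

print(written)  # ['coi_x.csv', 'coi_y.csv', 'coi_z.csv']
print(x_rows)
```

The explicit loop is usually preferred over the list comprehension, since `to_csv` with a path returns `None` and the comprehension only builds a throwaway list of `None` values.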

Answer 3

One way to do this is with locals(), but it is not recommended. Personally, I think jpp's answer is the right approach for this kind of request.

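variables = locals()
# At module level locals() is the same dict as globals(), so this creates variables
# such as dfa, dfb, ...; inside a function, writes to locals() do not create real variables
for key, value in df.groupby(df['column_of_interest']):
    variables["df{0}".format(key)] = value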
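For comparison, a minimal sketch (hypothetical toy data) that keeps the same `df<level>` names but stores them as dictionary keys, in the spirit of jpp's answer, instead of injecting variables into the namespace:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "column_of_interest": ["a", "b", "a", "c"],
    "value": [1, 2, 3, 4],
})

# Same "df<level>" names as the locals() version, stored as dict keys
dfs = {f"df{key}": group for key, group in df.groupby("column_of_interest")}

print(sorted(dfs))  # ['dfa', 'dfb', 'dfc']
print(dfs["dfa"])   # the rows where column_of_interest == 'a'
```

If the raw level values are acceptable keys, `dict(tuple(df.groupby('column_of_interest')))` builds the equivalent mapping keyed directly by level.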

Create a new dataframe for each factor level in a column
