Python 3: move every 5 duplicate values in a CSV to a new CSV

Time: 2017-08-29 16:29:00

Tags: python csv

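I have a large .csv file of businesses, business contacts, and contact information. My problem is that many companies have 20-50 contacts, and I want at most 5 per CSV. Any suggestions on how to do this would be greatly appreciated. Thanks!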

1 answer:

Answer 0: (score: 1)

Pandas is well suited to this. Here is how you could use it to do what you want:

import pandas as pd

# load the csv data into a dataframe
df = pd.read_csv("link_to_csv_file", sep=",")
# group by the "businesses" column and keep only the first 5 rows of each group
df = df.groupby("businesses", as_index=False).head(5)
# write the results back to a csv file
df.to_csv("cleaned_csv_file.csv", sep=",", index=False)
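Note that head(5) keeps only the first five contacts per business and discards the rest. If the goal is instead to spread all contacts over several files, with at most five per business in each file, one possible approach is to number the rows within each group with cumcount() and integer-divide by the batch size. The following is a minimal sketch under that assumption; the function names split_into_batches and write_batches and the output file prefix are hypothetical and not part of the original answer.

```python
import pandas as pd


def split_into_batches(df, key, size):
    """Return a list of DataFrames, each holding at most `size` rows per `key` value."""
    # cumcount() numbers the rows inside each group 0, 1, 2, ...;
    # integer division by `size` turns that position into a batch number.
    batch = df.groupby(key).cumcount() // size
    # Grouping by the batch number yields one DataFrame per batch, in order.
    return [part for _, part in df.groupby(batch)]


def write_batches(batches, prefix):
    """Write each batch to '<prefix>_<n>.csv' (hypothetical naming scheme)."""
    for i, part in enumerate(batches, start=1):
        part.to_csv(f"{prefix}_{i}.csv", index=False)
```

With the dataframe loaded above, this could be used as write_batches(split_into_batches(df, "businesses", 5), "contacts"). A business with 12 contacts would then appear in three files with 5, 5, and 2 rows.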

You can install pandas as follows:

pip install pandas 

Example

Here is a reproducible example:

>>> import pandas as pd 
>>> f = pd.DataFrame({'id':[1,1,1,2,2,2,2,3,4],'value':[1,2,3,1,2,3,4,1,1], 'business': ["google", "google", "IBM", "Microsoft", "google","IBM", "google", "IBM","Microsoft" ]})
>>> f
    business  id  value
0     google   1      1
1     google   1      2
2        IBM   1      3
3  Microsoft   2      1
4     google   2      2
5        IBM   2      3
6     google   2      4
7        IBM   3      1
8  Microsoft   4      1
>>> f.groupby("business",as_index=False).head(2)
    business  id  value
0     google   1      1
1     google   1      2
2        IBM   1      3
3  Microsoft   2      1
5        IBM   2      3
8  Microsoft   4      1
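To see which rows a head(2) call like the one in this example leaves out, the kept index labels can be dropped from the original frame. This is a small sketch on the same sample data; the variable names kept and left_over are illustrative only.

```python
import pandas as pd

f = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 2, 3, 4],
    "value": [1, 2, 3, 1, 2, 3, 4, 1, 1],
    "business": ["google", "google", "IBM", "Microsoft", "google",
                 "IBM", "google", "IBM", "Microsoft"],
})

# First two rows of every business; the original index labels are preserved.
kept = f.groupby("business", as_index=False).head(2)

# Everything that was not kept: the third and later rows of each business.
left_over = f.drop(kept.index)
print(left_over)
```

Here left_over holds the rows with index labels 4, 6 (google) and 7 (IBM), which could be written to a separate file with to_csv.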