Iterating over a CSV file and creating a table

Time: 2019-05-12 02:24:39

Tags: python pandas csv pandas-groupby

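I am trying to read a .csv file and extract specific columns so that I can output a table that essentially performs a "GROUP BY" on one column and aggregates certain other columns of interest (much as you could in SQL), but I am not very familiar with how to do this easily in Python.

The csv file is formatted as follows: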

age,education,balance,approved
30,primary,1850,yes
54,secondary,800,no
24,tertiary,240,yes

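I tried importing and reading the csv file to parse the three columns I care about, iterating over the rows to put them into three separate lists. I am not very familiar with the packages, or with how to get these into a three-column data frame or matrix so that I can then iterate over them to mutate values or compute all the aggregated output fields (see the expected result below).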

import csv

with open('loans.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')

    next(readCSV)  # skip the header row

    education = []
    balance = []
    loan_approved = []

    for row in readCSV:
        educat = row[1]
        bal = row[2]
        approve = row[3]

        education.append(educat)
        balance.append(bal)
        loan_approved.append(approve)

    print(education)
    print(balance)
    print(loan_approved)

The output would be a 4x7 table (four rows, grouped by education level) with the following headers:

Education|#Applicants|Min Bal|Max Bal|#Approved|#Rejected|%Apps Approved
Primary  ...
Secondary  ...
Tertiary ...

1 Answer:

Answer 0: (score: 1)

Using pandas instead seems much simpler. For example, you can read only the columns you need rather than all of them:

import pandas as pd

# The file name comes from the question; the column names match the CSV header.
df = pd.read_csv('loans.csv', usecols=['education', 'balance', 'approved'])
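As a quick check of what this gives you, here is a minimal sketch that reads the three sample rows from the question out of an in-memory string instead of loans.csv (the `sample_csv` and `sample_df` names are only for illustration). Unlike `csv.reader`, which returns every field as a string, `read_csv` should infer a numeric type for `balance`:

```python
import io

import pandas as pd

# The three sample rows from the question, standing in for loans.csv.
sample_csv = io.StringIO(
    "age,education,balance,approved\n"
    "30,primary,1850,yes\n"
    "54,secondary,800,no\n"
    "24,tertiary,240,yes\n"
)

# usecols keeps only the named columns, so 'age' is not loaded.
sample_df = pd.read_csv(sample_csv, usecols=["education", "balance", "approved"])

print(sample_df.columns.tolist())
print(sample_df.dtypes)
```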

Now, to group by education level, you can find all the unique entries in that column and collect the matching rows for each one:

groupby_education = {}
for level in list(set(df['education'])):
    groupby_education[level] = df.loc[df['education'] == level]

print(groupby_education)
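If the goal is the summary table from the question, pandas' built-in `groupby` can probably compute the aggregates directly, without building the dictionary by hand. The following is a minimal sketch rather than a definitive solution. It assumes the column names from the sample file and that approvals are recorded as the literal string "yes"; the last two data rows and the output column names are made up for illustration, and named aggregation needs pandas 0.25 or later.

```python
import io

import pandas as pd

# First three rows are from the question; the last two are hypothetical,
# added only so that some groups contain more than one applicant.
sample_csv = io.StringIO(
    "age,education,balance,approved\n"
    "30,primary,1850,yes\n"
    "54,secondary,800,no\n"
    "24,tertiary,240,yes\n"
    "41,primary,300,no\n"
    "37,secondary,1200,yes\n"
)
df = pd.read_csv(sample_csv, usecols=["education", "balance", "approved"])

# Boolean helper column so that approvals can be summed per group.
df["is_approved"] = df["approved"].eq("yes")

# One output row per education level, one named aggregate per column.
summary = df.groupby("education").agg(
    applicants=("approved", "count"),   # counts non-missing values
    min_bal=("balance", "min"),
    max_bal=("balance", "max"),
    approved=("is_approved", "sum"),
)
summary["rejected"] = summary["applicants"] - summary["approved"]
summary["pct_approved"] = 100 * summary["approved"] / summary["applicants"]

print(summary)
```

To run this against the real file, replace `sample_csv` with the path to loans.csv; the rest should work unchanged as long as the header matches.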

I hope this helps. Let me know if you still need help. Cheers!