How do I merge two CSV files?

Time: 2014-03-03 22:01:57

Tags: python python-2.7 csv

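I have two CSV files. Employees contains a dictionary for every employee at the company, with 10 fields of information per row. Social contains a dictionary for each employee who filled out a survey, with 8 fields of information. Every employee in the survey is also in the master dictionary. Both dicts share a unique identifier (the EXTENSION).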

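I want to say: "If the employee is in the Social dict, add fields 4, 5 and 6 to their entry in the Employee dict." In other words, if an employee filled out the survey, the extra information should be appended to their record in the master dictionary.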

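Currently, my program pulls all the information from EMPLOYEES for the employees who took the survey, but I don't know how to add the extra fields to the EMPLOYEES csv. I have spent a lot of time reading StackOverflow posts about DictReader and dictionaries, but I am still confused.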

Thanks in advance for your guidance.

Sample EMPLOYEE:

Name  Extension   Job
Bill  1111        plumber
Alice 2222        fisherman
Carl  3333        rodeo clown

Sample SURVEY:

Extension   Favorite Color    Book
 2222          blue          A Secret Garden
 3333          green         To Kill a Mockingbird

Sample OUTPUT:

Name  Extension   Job           Favorite Color     Favorite Book
Bill  1111        plumber
Alice 2222        fisherman         blue             A Secret Garden
Carl  3333        rodeo clown       green            To Kill a Mockingbird


import csv

with open('employees.csv', "rU") as npr_employees:
   employees = csv.DictReader(npr_employees)
   all_employees = {}
   total_employees = {}
   for employee in employees:
       all_employees[employee['Extension']] = employee

with open('social.csv', "rU") as social_employees:
   social_employee = csv.DictReader(social_employees) 
   for row in social_employee:
       print all_employees.get(row['Extension'], None)

2 answers:

Answer 0: (score: 0)

You could try:

for row in social_employee:
    employee = all_employees.get(row['Extension'], None)
    if employee is not None:
        # Copy the survey fields onto the matching employee record
        employee['additionalinfo1'] = row['additionalinfo1']
        employee['additionalinfo2'] = row['additionalinfo2']
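
A minimal, self-contained sketch of this idea, written for Python 3 rather than the Python 2.7 used in the question. It assumes comma-delimited data and uses the question's sample rows held in in-memory strings so that it runs without files on disk. The helper name `merge_survey` and the two string constants are hypothetical, introduced only for illustration.

```python
import csv
import io

# Sample data from the question as in-memory CSV text
# (assumption: the real files are comma-delimited).
EMPLOYEES_CSV = """Name,Extension,Job
Bill,1111,plumber
Alice,2222,fisherman
Carl,3333,rodeo clown
"""

SURVEY_CSV = """Extension,Favorite Color,Book
2222,blue,A Secret Garden
3333,green,To Kill a Mockingbird
"""


def merge_survey(employees_file, survey_file):
    """Return {extension: row}, with survey fields copied onto matching employees."""
    all_employees = {}
    for employee in csv.DictReader(employees_file):
        all_employees[employee['Extension']] = dict(employee)

    for row in csv.DictReader(survey_file):
        employee = all_employees.get(row['Extension'])
        if employee is not None:
            # Copy every survey field except the shared key onto the employee record.
            for field, value in row.items():
                if field != 'Extension':
                    employee[field] = value
    return all_employees


merged = merge_survey(io.StringIO(EMPLOYEES_CSV), io.StringIO(SURVEY_CSV))
print(merged['2222'])  # Alice's record, now including the survey fields
print(merged['1111'])  # Bill did not take the survey, so his record is unchanged
```

Looping over `row.items()` avoids hard-coding the survey column names, which may be convenient if the survey has more fields than shown in the sample.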

Answer 1: (score: 0)

You can merge two dictionaries in Python using:

dict(d1.items() + d2.items())
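
Note that `d1.items() + d2.items()` only works on Python 2, where `items()` returns lists; on Python 3 it raises a `TypeError`. A hedged sketch of equivalent merges follows, with `d1` and `d2` as placeholder dictionaries filled with the question's sample values:

```python
d1 = {'Name': 'Alice', 'Extension': '2222', 'Job': 'fisherman'}
d2 = {'Extension': '2222', 'Favorite Color': 'blue', 'Book': 'A Secret Garden'}

# Python 3.5+: unpack both dicts into a new one.
# On duplicate keys, the value from d2 wins.
merged = {**d1, **d2}

# Works on both Python 2 and Python 3: copy d1, then update the copy.
merged_portable = dict(d1)
merged_portable.update(d2)

print(merged == merged_portable)
```

Both forms leave `d1` and `d2` untouched and build a new dictionary.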

Using a dict, all_employees, keyed on "Extension" works well for associating each "social employee" row with its corresponding "employee" row.

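Then you need to go through all the updated employee records and output their fields in a consistent order. Since dictionaries are inherently unordered, we keep a list of the headers, output_headers, in the order we encounter them.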

import csv

# Store all the info about the employees
all_employees = {}
output_headers = []

# First, get all employee record info
with open('employees.csv', 'rU') as npr_employees:
    employees = csv.DictReader(npr_employees)
    for employee in employees:
        ext = employee['Extension']
        all_employees[ext] = employee
    # Add headers from "all employees"
    output_headers.extend(employees.fieldnames)

# Then, get all info from social, and update employee info
with open('social.csv', 'rU') as social_employees:
    social_employees = csv.DictReader(social_employees) 
    for social_employee in social_employees:
        ext = social_employee['Extension']

        # Combine the two dictionaries.
        all_employees[ext] = dict(
                all_employees[ext].items() + social_employee.items()
        )

    # Add headers from "social employees", but don't add duplicate fields
    output_headers.extend(
            [field for field in social_employees.fieldnames
            if field not in output_headers]
    )

# Finally, output the records ordered by extension
with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(output_headers)

    # Write the new employee rows.  If a field doesn't exist, 
    # write an empty string.
    for employee in sorted(all_employees.values(), key=lambda e: e['Extension']):
        writer.writerow(
                [employee.get(field, '') for field in output_headers]
        )

Output:

Name,Extension,Job,Favorite Color,Book
Bill,1111,plumber,,
Alice,2222,fisherman,blue,A Secret Garden
Carl,3333,rodeo clown,green,To Kill a Mockingbird
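
For readers on Python 3, the same steps might be sketched as follows. This is an adaptation under stated assumptions, not the code above verbatim: the function name `merge_csv` is hypothetical, survey rows with an unknown extension are skipped instead of raising `KeyError`, and `csv.DictWriter` with `restval=''` takes the place of the manual `employee.get(field, '')` lookup.

```python
import csv
import io


def merge_csv(employees_file, survey_file, output_file, key='Extension'):
    """Merge survey fields into employee rows and write them sorted by key."""
    employees_reader = csv.DictReader(employees_file)
    all_employees = {row[key]: dict(row) for row in employees_reader}
    headers = list(employees_reader.fieldnames)

    survey_reader = csv.DictReader(survey_file)
    for row in survey_reader:
        # Assumption: silently skip survey rows whose key is not in employees.
        if row[key] in all_employees:
            all_employees[row[key]].update(row)

    # Append survey headers that are not already present, preserving order.
    new_fields = [f for f in survey_reader.fieldnames if f not in headers]
    headers.extend(new_fields)

    # restval='' writes an empty string for any field a row does not have.
    writer = csv.DictWriter(output_file, fieldnames=headers,
                            restval='', lineterminator='\n')
    writer.writeheader()
    for ext in sorted(all_employees):
        writer.writerow(all_employees[ext])


# Usage with the question's sample data held in memory.
employees = io.StringIO(
    "Name,Extension,Job\n"
    "Bill,1111,plumber\n"
    "Alice,2222,fisherman\n"
    "Carl,3333,rodeo clown\n"
)
survey = io.StringIO(
    "Extension,Favorite Color,Book\n"
    "2222,blue,A Secret Garden\n"
    "3333,green,To Kill a Mockingbird\n"
)
out = io.StringIO()
merge_csv(employees, survey, out)
print(out.getvalue())
```

With real files on Python 3, the csv module documentation recommends opening them in text mode with `newline=''`, for example `open('output.csv', 'w', newline='')`, in place of the `'rU'` and `'wb'` modes used on Python 2.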

Let me know if you have any questions!