Merging two CSV files in Python

Date: 2015-08-21 15:35:39

Tags: python python-2.7 csv

I have two CSV files and I want to create a third CSV by merging them. This is what my files look like:

Num | Status
1213 | closed
4223 | open
2311 | open

and the other file has this:

Num | Code
1002 | 9822
1213 | 1891
4223 | 0011

So here is the small loop I tried, but it does not print the output with a third column containing the matching values.

import csv, time

def links():
    first = open('closed.csv')
    csv_file = csv.reader(first)

    second = open('links.csv')
    csv_file2 = csv.reader(second)

    for row in csv_file:  
        for secrow in csv_file2:                             
            if row[0] == secrow[0]:
                print row[0]+"," +row[1]+","+ secrow[0]
                time.sleep(1)

So what I want is:

Num | Status | Code
1213 | closed | 1891
4223 | open | 0011
2311 | open | blank no match

5 answers:

Answer 0 (score: 4)

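This is definitely a job for pandas. You can easily read both CSV files into DataFrames and combine them with merge or concat. It is likely to be faster, especially on large files, and takes only a few lines of code.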

Answer 1 (score: 3)

If you decide to use pandas, you can do it in just five lines.

import pandas as pd

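# dtype=str keeps values such as 0011 as text instead of converting them to 11
first = pd.read_csv('closed.csv', dtype=str)
second = pd.read_csv('links.csv', dtype=str)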

merged = pd.merge(first, second, how='left', on='Num')
merged.to_csv('merged.csv', index=False)

Answer 2 (score: 1)

You can read the values from the second file into a dictionary and then append them to the rows of the first file.

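import csv

Code = {}
with open('links.csv') as f:
    for row in csv.reader(f):
        Code[row[0]] = row[1]

# append the matching code, or a placeholder when the Num has no match
with open('closed.csv') as f:
    for row in csv.reader(f):
        row.append(Code.get(row[0], "blank no match"))
        print(row)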
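
Since the question asks for a third CSV file rather than printed output, here is a minimal sketch of the same dictionary approach that writes the merged rows out with csv.writer. It assumes Python 3, comma-delimited files with a header row, and that Num is the first column of both files; the function name merge_csv and the output name merged.csv are illustrative, not from the answer.

```python
import csv


def merge_csv(first_path, second_path, out_path, missing="blank no match"):
    """Left-join second_path onto first_path using the first column (Num)."""
    # Build a Num -> Code lookup from the second file
    with open(second_path, newline="") as f:
        reader = csv.reader(f)
        code_header = next(reader)[1]  # name of the second file's value column
        codes = {row[0]: row[1] for row in reader if row}

    # Copy each row of the first file, appending the matching code (or `missing`)
    with open(first_path, newline="") as f, open(out_path, "w", newline="") as out:
        reader = csv.reader(f)
        writer = csv.writer(out)
        writer.writerow(next(reader) + [code_header])
        for row in reader:
            if row:  # skip blank lines
                writer.writerow(row + [codes.get(row[0], missing)])
```

Calling merge_csv('closed.csv', 'links.csv', 'merged.csv') on the sample data should give rows Num,Status,Code / 1213,closed,1891 / 4223,open,0011 / 2311,open,blank no match. Because the csv module never converts values, 0011 keeps its leading zeros.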

Answer 3 (score: 1)

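The problem is that you can iterate over a csv reader only once, so csv_file2 is already exhausted after the first pass of the outer loop and yields nothing for the remaining rows. To fix this, save the rows of csv_file2 to a list and iterate over that list instead. It might look like this: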

import time, csv


def links():
    first = open('closed.csv')
    csv_file = csv.reader(first, delimiter="|")

    second = open('links.csv')
    csv_file2 = csv.reader(second, delimiter="|")

    # A csv reader can only be consumed once, so store the rows of links.csv
    # in a list (not named `list`, to avoid shadowing the built-in)
    link_rows = list(csv_file2)

    for row in csv_file:
        match = False
        for secrow in link_rows:
            # compare Num values with the padding spaces around "|" removed
            if row[0].replace(" ", "") == secrow[0].replace(" ", ""):
                print row[0] + "," + row[1] + "," + secrow[1]
                match = True
        if not match:
            print row[0] + "," + row[1] + ", blank no match"
        time.sleep(1)

Output:

Num , status, code
1213 , closed, 1891
4223 , open, 0011
2311 , open, blank no match
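
The point this answer starts from, that a csv reader can be consumed only once, is easy to check in isolation. In this small sketch the reader is built over an in-memory list of lines standing in for links.csv rather than over a file; csv.reader accepts any iterable of strings, and the reader object it returns is itself a one-shot iterator.

```python
import csv

# Stand-in for the lines of links.csv, using the question's "|" delimiter
lines = ["1002|9822", "1213|1891", "4223|0011"]

reader = csv.reader(lines, delimiter="|")
first_pass = list(reader)   # consumes every row
second_pass = list(reader)  # the reader is already exhausted, so this is empty

# Materialising the rows once gives a list that can be looped over repeatedly
rows = list(csv.reader(lines, delimiter="|"))

print(len(first_pass), len(second_pass))  # 3 0
print(rows[1])                            # ['1213', '1891']
```

This is exactly what happens to csv_file2 in the question's nested loop: the inner loop drains it while processing the first row of closed.csv, and every later row sees an empty reader.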

Answer 4 (score: 1)

This code will do it for you:

import csv

def links():

    # open both files
    with open('closed.csv') as closed, open('links.csv') as links:

        # using DictReader instead to be able more easily access information by num
        csv_closed = csv.DictReader(closed)
        csv_links = csv.DictReader(links)

        # create dictionaries out of the two CSV files using dictionary comprehensions
        num_dict = {row['num']:row['status'] for row in csv_closed}
        link_dict = {row['num']:row['code'] for row in csv_links}   

    # print header, each column has width of 8 characters
    print("{0:8} | {1:8} | {2:8}".format("Num", "Status", "Code"))

    # print the information
    for num, status in num_dict.items():

        # note this call to link_dict.get() - we are getting values out of the link dictionary,
        # but specifying a default return value of an empty string if num is not found in it
        # to avoid an exception
        print("{0:8} | {1:8} | {2:8}".format(num, status, link_dict.get(num, '')))

links()

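Here I'm making use of dictionaries, which let you access information by key. I also use implicit loops (dictionary comprehensions), which tend to be faster and need less code.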

There are two quirks in this code to be aware of, although your example suggests they are acceptable:

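  1. Order is not preserved (because we are using dictionaries, which are unordered in Python 2.7)
  2. Num entries that appear in links.csv but not in closed.csv are not included in the printout

Final note: since you called your input files "CSV" files, I made some assumptions about their format. Here are my input files: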

    closed.csv

    num,status
    1213,closed
    4223,open
    2311,open

    links.csv

    num,code
    1002,9822
    1213,1891
    4223,0011

    Given these input files, the result looks like this:

    Num      | Status   | Code  
    1213     | closed   | 1891  
    2311     | open     |  
    4223     | open     | 0011
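
If either of the two quirks matters, one way around both is to walk closed.csv row by row instead of iterating over a dictionary, and then collect the links.csv entries that were never matched. This is a sketch, not the answerer's code: it assumes lowercase num/status/code headers matching the keys the DictReader code above uses, and the function name links_ordered is hypothetical. It returns the lines instead of printing them so the result is easy to inspect.

```python
import csv


def links_ordered(closed_path='closed.csv', links_path='links.csv'):
    with open(closed_path) as closed, open(links_path) as links:
        closed_rows = list(csv.DictReader(closed))  # a list keeps file order
        link_dict = {row['num']: row['code'] for row in csv.DictReader(links)}

    # same fixed-width layout as above, but in closed.csv order
    lines = ["{0:8} | {1:8} | {2:8}".format("Num", "Status", "Code")]
    for row in closed_rows:
        lines.append("{0:8} | {1:8} | {2:8}".format(
            row['num'], row['status'], link_dict.get(row['num'], '')))

    # Nums present in links.csv but absent from closed.csv
    seen = {row['num'] for row in closed_rows}
    unmatched = [num for num in link_dict if num not in seen]
    return lines, unmatched
```

Usage: lines, unmatched = links_ordered(), then print("\n".join(lines)) for the table and print(unmatched) for the leftovers. With the input files above, the rows come out as 1213, 4223, 2311 and unmatched is ['1002'].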