Merging multiple CSV files

Date: 2014-01-10 09:33:09

Tags: python csv pandas

I have 3 CSV files and want to write them into a single CSV file. How can I do this? For example:

file1.csv

a b c d
1 2 3 4 
5 6 7 8

file2.csv

e f g h
13 14 15 16
17 18 19 20

file3.csv

i j k l 
9 10 11 12 
21 22 23 24

The desired output is:

a b c d e f g h i j k l
1 2 3 4 13 14 15 16 9 10 11 12
5 6 7 8 17 18 19 20 21 22 23 24

8 answers:

Answer 0 (score: 5)

You can use pandas, a data manipulation library:

import pandas as pd

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
df3 = pd.read_csv('file3.csv')

df_combined = pd.concat([df1, df2, df3],axis=1)
df_combined.to_csv('output.csv', index=None)

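This gives you the combined CSV file output.csv.

Answer 1 (score: 1)

The others are right that you should not simply ask for code. Still, I found the task interesting enough to spend three minutes on a solution: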

import csv

allColumns = []
for dataFileName in [ 'a.csv', 'b.csv', 'c.csv' ]:
  with open(dataFileName) as dataFile:
    fileColumns = zip(*list(csv.reader(dataFile, delimiter=' ')))
    allColumns += fileColumns

allRows = zip(*allColumns)

with open('combined.csv', 'w') as resultFile:
  writer = csv.writer(resultFile, delimiter=' ')
  for row in allRows:
    writer.writerow(row)

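Note that this solution reads every file fully into memory, so it may not suit very large inputs. It also assumes that all files have the same number of rows (lines). If they do not, zip stops at the shortest column, so rows from the longer files are silently dropped.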
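If the inputs might differ in length, one option is to pad the shorter files instead of losing rows. The following is a minimal sketch, assuming Python 3 and space-delimited inputs; `merge_columns` and its `fill` parameter are hypothetical names introduced for illustration, not part of the answer above.

```python
import csv
from itertools import zip_longest


def merge_columns(paths, out_path, delimiter=' ', fill=''):
    """Place the columns of several delimited files side by side.

    Files with fewer rows are padded with `fill` instead of truncating the rest.
    """
    tables = []
    for path in paths:
        with open(path, newline='') as f:
            tables.append(list(csv.reader(f, delimiter=delimiter)))

    # Width of each file, taken from its widest row, so padding keeps columns aligned.
    widths = [max((len(row) for row in table), default=0) for table in tables]

    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out, delimiter=delimiter)
        # zip_longest yields None for files that have already run out of rows.
        for rows in zip_longest(*tables):
            merged = []
            for row, width in zip(rows, widths):
                row = list(row) if row is not None else []
                merged.extend(row + [fill] * (width - len(row)))
            writer.writerow(merged)
```

Like the original, this still loads every file into memory; it only changes what happens when the row counts differ.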

Answer 2 (score: 1)

The Python pandas way.

(A slightly improved version of the code above)

import pandas as pd

files = ['file1.csv', 'file2.csv', 'file3.csv']

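# axis=1 places the frames side by side; the default (axis=0) would stack them vertically
df_combined = pd.concat([pd.read_csv(f) for f in files], axis=1)
df_combined.to_csv('output.csv', index=False)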

This gives you the combined CSV file output.csv.

The Unix command-line way.

paste -d" " file1.csv file2.csv file3.csv

If you are on a Unix-like operating system and only care about merging the files, see "how to merge two files consistently line by line".

Smooth sailing.

Answer 3 (score: 0)

One idea is to use the zip function:

file1 = "a b c d\n1 2 3 4\n5 6 7 8"
file2 = "e f g h\n13 14 15 16\n17 18 19 20"
file3 = "i j k l\n9 10 11 12\n21 22 23 24"

# Pair up the corresponding lines of the three inputs and join each triple with spaces
merged_file = [i + " " + j + " " + k
               for i, j, k in zip(file1.split('\n'), file2.split('\n'), file3.split('\n'))]
for i in merged_file:
    print(i)

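Answer 4 (score: 0)

This assumes all files have the same number of lines. The solution also works for large inputs, because only 3 lines (one per file) are held in memory at a time.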

import csv
with open('foo1.txt') as f1, open('foo2.txt') as f2, \
     open('foo3.txt') as f3, open('out.txt', 'w') as f_out:

    writer = csv.writer(f_out, delimiter=' ')
    readers = [csv.reader(x, delimiter=' ') for x in (f1, f2, f3)]
    while True:
        try:
            # Take the next row from each reader and join them into one output row
            writer.writerow([y for w in readers for y in next(w)])
        except StopIteration:
            break

A for-loop-based version of the code above, though it first has to iterate over one of the files to get the line count:

import csv
with open('foo1.txt') as f1, open('foo2.txt') as f2, \
     open('foo3.txt') as f3, open('out.txt', 'w') as f_out:

    writer = csv.writer(f_out, delimiter=' ')
    lines = sum(1 for _ in f1)  # Number of lines in f1
    f1.seek(0)                  # Move the file pointer back to the start of the file
    readers = [csv.reader(x, delimiter=' ') for x in (f1, f2, f3)]
    for _ in range(lines):
        writer.writerow([y for w in readers for y in next(w)])
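
Both versions above hard-code three input files. A minimal sketch of the same row-at-a-time idea for any number of files, assuming Python 3, might look like the following; `merge_streaming` is a hypothetical name chosen for illustration, and `contextlib.ExitStack` is used here only so that every opened file gets closed.

```python
import csv
from contextlib import ExitStack


def merge_streaming(paths, out_path, delimiter=' '):
    """Merge files column-wise, holding one row per input in memory at a time."""
    with ExitStack() as stack:
        # ExitStack closes every file opened here, even if an error occurs.
        readers = [
            csv.reader(stack.enter_context(open(p, newline='')), delimiter=delimiter)
            for p in paths
        ]
        out = stack.enter_context(open(out_path, 'w', newline=''))
        writer = csv.writer(out, delimiter=delimiter)
        # zip stops at the shortest input, so extra rows in longer files are dropped.
        for rows in zip(*readers):
            writer.writerow([field for row in rows for field in row])
```

A call such as `merge_streaming(['foo1.txt', 'foo2.txt', 'foo3.txt'], 'out.txt')` would then cover the case shown above.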

Answer 5 (score: 0)

inputs = 'file1.csv', 'file2.csv', 'file3.csv'

with open('out.csv','w') as output:
    for line in zip(*map(open, inputs)):
        output.write('%s\n'%' '.join(i.strip() for i in line))

Edit:
Here is a more verbose version.

inputs = 'file1.csv', 'file2.csv', 'file3.csv'

# open all input files
inputs = map(open, inputs)

with open('out.csv','w') as output:

    # iter over all the input files at the same time
    for line in zip(*inputs):

        # format the output line from input lines
        line = ' '.join(i.strip() for i in line)

        output.write('%s\n' % line)

Answer 6 (score: 0)

In addition to the first answer, which is the correct one, you can generalize it to any number of CSV files in a folder like this:

import os
import pandas as pd

folder = r"C:\MyFolder"

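# Read every .csv file in the folder; sorting keeps the column order deterministic
frames = [pd.read_csv(os.path.join(folder, name))
          for name in sorted(os.listdir(folder)) if name.endswith('.csv')]

# axis=1 joins the frames side by side, as in the first answer
merged = pd.concat(frames, axis=1)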

Documentation: http://pandas.pydata.org/pandas-docs/dev/merging.html

Answer 7 (score: 0)

First consider using the pandas module, as in waitingkuo's answer. But I suppose you could also use DictWriter...

import csv

header = list('abcdefghijkl')

# Compile contents of all three files into a single dictionary, outputdict
outputdict = {key: [] for key in header}
for fname in ['file1.csv', 'file2.csv', 'file3.csv']:
    with open(fname, 'r', newline='') as f:
        for line in csv.DictReader(f):
            for k in line:
                outputdict[k].append(line[k])

# Transfer the contents of outputdict into a csv file:
# write the header first, then one row per position across all columns
with open('final_output.csv', 'w', newline='') as out:
    output = csv.DictWriter(out, fieldnames=header)
    output.writeheader()
    for values in zip(*(outputdict[k] for k in header)):
        output.writerow(dict(zip(header, values)))