Update the city in a csv file by merging on the zip code from another file

Date: 2018-10-19 13:36:32

Tags: python merge

I have a csv file, file1.csv:

Territory   Sales     Zipcode    city   statename
00001000      10         99764    

and another file that contains the city details:

Zipcode   city      Statename 
99764     Northway   Alaska

I want to update file1.csv so that it looks like this:

Territory   Sales     Zipcode    city      statename
00001000      10         99764   Northway   Alaska

This is like a typical UPDATE statement in SQL:

UPDATE file1 SET file1.value = (SELECT file2.CODE
                                  FROM file2
                                  WHERE file1.value = file2.DESC)

How can I do this in Python?

3 Answers:

Answer 0: (score: 3)

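import pandas as pd

# Assumes comma-separated files. Territory is read as text so its leading zeros survive,
# and file1's empty city/statename columns are skipped so the merge does not duplicate them.
file1 = pd.read_csv('file1.csv', dtype={'Territory': str}, usecols=['Territory', 'Sales', 'Zipcode'])
file2 = pd.read_csv('file2.csv')
# Left join: every row of file1 is kept; city and state come from file2
df = pd.merge(file1, file2, how='left', on='Zipcode')
df.to_csv('new_file.csv', index=False)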
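
If the files are whitespace-padded exactly as shown in the question, a variant along these lines may work. It is only a sketch, assuming that no field contains a space; the helper name fill_city and the in-memory sample data are made up for illustration.

```python
import io
import pandas as pd

# Sample data copied from the question; io.StringIO stands in for the real files.
FILE1 = """Territory   Sales     Zipcode    city   statename
00001000      10         99764
"""
FILE2 = """Zipcode   city      Statename
99764     Northway   Alaska
"""

def fill_city(file1, file2):
    """Hypothetical helper: left-join file1 onto file2 by Zipcode."""
    # sep=r"\s+" treats any run of whitespace as one delimiter;
    # dtype=str keeps leading zeros in Territory and Zipcode.
    left = pd.read_csv(file1, sep=r"\s+", dtype=str,
                       usecols=["Territory", "Sales", "Zipcode"])
    right = pd.read_csv(file2, sep=r"\s+", dtype=str)
    return left.merge(right, how="left", on="Zipcode")

result = fill_city(io.StringIO(FILE1), io.StringIO(FILE2))
print(result.to_csv(index=False))
```

Passing real file paths instead of the io.StringIO objects should behave the same way, since read_csv accepts either.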

Answer 1: (score: 1)

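If you don't have access to pandas or don't want to install it, you can use the csv module. Note the intermediate dict d2, which maps each zip code to the city and state names from file2.csv: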

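import csv

with open('file1.csv', newline='') as file1, open('file2.csv', newline='') as file2, \
        open('output.csv', 'w', newline='') as outfile:
    output = csv.writer(outfile, delimiter=' ')
    # d2 maps each zip code to its remaining columns: [city, statename]
    d2 = {zipcode: cols for zipcode, *cols in csv.reader(file2, delimiter=' ', skipinitialspace=True)}
    # The zip code is the last field of every file1 data row
    for *cols, zipcode in csv.reader(file1, delimiter=' ', skipinitialspace=True):
        output.writerow([*cols, zipcode, *d2.get(zipcode, [])])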

Given file1.csv with the following content:

Territory   Sales     Zipcode    city   statename
00001000      10         99764
00001001      11         99999

and file2.csv with the following content:

Zipcode   city      Statename
99764     Northway   Alaska
99999     Somewhere  CoolState

output.csv will contain:

Territory Sales Zipcode city statename
00001000 10 99764 Northway Alaska
00001001 11 99999 Somewhere CoolState

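Also note that since city and state names can contain spaces, you should avoid using a space as the delimiter and use actual commas instead, in which case you can drop the delimiter=' ' and skipinitialspace=True arguments, because the comma is the csv module's default delimiter.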
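
With comma-separated files, the same lookup could be sketched with csv.DictReader and csv.DictWriter, so that columns are addressed by name rather than by position. The function name update_cities, the in-memory sample data, and the unmatched zip code 12345 are assumptions for illustration only.

```python
import csv
import io

def update_cities(file1, file2, outfile):
    """Hypothetical helper: fill city/statename in file1 rows from file2 by Zipcode."""
    # Map each zip code to its full file2 row.
    lookup = {row["Zipcode"]: row for row in csv.DictReader(file2)}
    reader = csv.DictReader(file1)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        match = lookup.get(row["Zipcode"])
        if match:  # leave the row untouched when the zip code is unknown
            row["city"] = match["city"]
            row["statename"] = match["Statename"]
        writer.writerow(row)

# In-memory stand-ins for comma-separated versions of the two files.
file1 = io.StringIO(
    "Territory,Sales,Zipcode,city,statename\n"
    "00001000,10,99764,,\n"
    "00001001,11,12345,,\n"
)
file2 = io.StringIO("Zipcode,city,Statename\n99764,Northway,Alaska\n")
out = io.StringIO()
update_cities(file1, file2, out)
print(out.getvalue())
```

Rows whose zip code is missing from file2 pass through unchanged, which mirrors the d2.get(zip, []) fallback in the code above.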

Answer 2: (score: 0)

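The files you provided are not well-formed, because they contain runs of multiple spaces. In a DSV (delimiter-separated values) file, each column must be separated by a single special character (for example, a comma).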

In this example I use Pandas, but since Pandas sometimes has trouble with spaces as separators, I converted the files as follows:

file1.csv

Territory,Sales,Zipcode
00001000,10,99764

file2.csv

Zipcode,city,Statename
99764,Northway,Alaska

A script that uses Pandas to write file3.csv looks like this:

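import pandas as pd

# Load both files via pandas; read Territory as text to keep its leading zeros
file1 = pd.read_csv('file1.csv', sep=',', dtype={'Territory': str})
file2 = pd.read_csv('file2.csv', sep=',')

# Merge results and save them (inner join by default: rows without a matching Zipcode are dropped)
merge = file1.merge(file2, on='Zipcode')
merge.to_csv('file3.csv', sep=',', index=False)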

You could also use sep=' ', but I advise against it because the DSV files are malformed, as noted above.
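
The conversion from space-padded to comma-separated files described above could be done with a few lines of standard-library Python. This is a minimal sketch, assuming no field contains a space (a city such as "New York" would be split in two); the helper name to_comma_separated is hypothetical.

```python
import csv
import io

def to_comma_separated(src, dst):
    """Hypothetical helper: rewrite whitespace-padded rows as comma-separated rows."""
    writer = csv.writer(dst, lineterminator="\n")
    for line in src:
        fields = line.split()  # splits on any run of whitespace, ignoring trailing blanks
        if fields:             # skip blank lines
            writer.writerow(fields)

# In-memory stand-in for the space-padded file2.csv from the question.
src = io.StringIO("Zipcode   city      Statename \n99764     Northway   Alaska\n")
dst = io.StringIO()
to_comma_separated(src, dst)
print(dst.getvalue())
```

For real files, open the source and destination with open(..., newline='') and pass the file objects in place of the io.StringIO buffers.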
