Query a CSV and write the original CSV and the results to a single CSV in Python

Posted: 2013-02-13 21:37:23

Tags: python regex parsing csv

I'm trying to parse a CSV and, if a condition is met in either column, write the row to a new CSV.

For example, if my CSV looks like this:

123 Some Street
Flat 1, 21 Other road
House, Someother street
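If these lines are read with Python's csv module, the first line comes back as a single field, and the second field of the other lines keeps its leading space unless skipinitialspace=True is passed. A small Python 3 sketch, assuming the sample is held in a string rather than in a file:

```python
import csv
import io

# The three sample lines from the question, held in memory instead of myData.csv
sample = "123 Some Street\nFlat 1, 21 Other road\nHouse, Someother street\n"

# skipinitialspace=True drops the blank that follows each comma
rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))
for row in rows:
    print(row)
```

The first row is a one-element list, so any code that reads row[1] has to allow for that.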

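I need to analyse each row: if a number appears in the first column but not the second, I need to extract that number; if there are numbers in both columns, I need to extract both; and if there are no numbers, I need to extract the text in the first column. Then I write a new CSV with the 2 original columns and 3 new ones: number 1, number 2 and text, i.e. flat number, house number and house name. So the new CSV would look like: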

123 Some Street, , 123, 
Flat 1, 21 Other road, 1, 21,
House, Someother street, , , House.

Any guidance would be really helpful.

Thanks

Edit

import csv

csvFile = 'myData.csv'
csvOut = 'myOut.csv'

reader = csv.reader(csvFile)
writer = csv.writer(csvOut)

for row in reader:
    num = \d | \d\d | \d\d\d
     if row [0] || row [1] == num
        if row [1] == num
            writer.row [3]
        else row [0] == num
            writer.row [2]
            writer.row [3]
    else writer.row [0] [2]

csvOut.close()
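In this first attempt, csv.reader and csv.writer are given the file names as plain strings. Both expect file-like objects: even with the syntax fixed, csv.reader would iterate over the characters of 'myData.csv', csv.writer raises TypeError because a string has no write method, and csvOut.close() fails for the same reason. A minimal Python 3 sketch of only the file handling, where extra_columns is a hypothetical placeholder for whatever logic produces the three new fields:

```python
import csv


def extra_columns(row):
    # Hypothetical placeholder: return the three new fields for this row
    return ["", "", ""]


def copy_with_extra_columns(in_path, out_path):
    # The csv module docs recommend newline="" for files it reads and writes
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, skipinitialspace=True)
        writer = csv.writer(dst)
        for row in reader:
            # original columns first, then the new ones
            writer.writerow(row + extra_columns(row))
    # both files are closed automatically when the with block ends


# Usage: copy_with_extra_columns("myData.csv", "myOut.csv")
```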

Edit again

I hope this is a clearer explanation:

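I want the output to be a new CSV with the original data in row[0] and row[1]. Then, if there is only one number in the row, i.e. a house number, it should be written to row[3]; if there are 2 numbers in the row (one in row[0] and one in row[1]), they should be written to row[2] and row[3] respectively; and if there are no numbers, the string from row[0] should be written to row[4]. In the end, I need the flat number, house number and house name in 3 separate columns.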
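The column mapping in this edit can be sketched as a function that turns one parsed row into the five output fields. This assumes each input row has at most two columns, pads a one-column row such as "123 Some Street" with an empty second column, and follows this edit in treating a lone number as the house number (index 3). build_output_row is a hypothetical name:

```python
import re

NUMBER = re.compile(r"\d+")  # one or more digits, so 1, 21 and 123 all match


def build_output_row(row):
    # Pad to exactly two original columns; "123 Some Street" has only one
    first, second = (list(row) + ["", ""])[:2]
    m1 = NUMBER.search(first)
    m2 = NUMBER.search(second)
    flat_no = house_no = house_name = ""
    if m1 and m2:
        # numbers in both columns: flat number, then house number
        flat_no, house_no = m1.group(), m2.group()
    elif m1 or m2:
        # a single number is treated as the house number
        house_no = (m1 or m2).group()
    else:
        # no numbers at all: the first column is the house name
        house_name = first
    return [first, second, flat_no, house_no, house_name]


for row in (["123 Some Street"], ["Flat 1", "21 Other road"], ["House", "Someother street"]):
    print(build_output_row(row))
```

Each returned list can be passed straight to csv.writer's writerow.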

Further edit

I've been working on the code and now have the following. I feel I'm getting closer, but I'm still some way off.

import csv
import re

csvFile = open(myData.csv, 'rb')
csvOut = open(myOut.csv, 'wb')

reader = csv.reader(csvFile)
writer = csv.writer(csvOut)

for row in reader:
    a = row [0] re.compile('\d' | '\d\d' | '\d\d\d')
    a1 = row [0] re.compile('\d' | '\d\d' | '\d\d\d')
    b = row [1] 
    b1 = row [1] re.compile('\d' | '\d\d' | '\d\d\d')
        if b = re.compile('\d' | '\d\d' | '\d\d\d')
            writer.writerow(a,b,a1,b1, )
        elif a = re.compile('\d' | '\d\d' | '\d\d\d')
            witer.writerow(a,b, , b1, )
        else
            writer.writerow(a,b, , ,a)

csvOut.close()

Thanks
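In the second attempt, '\d' | '\d\d' | '\d\d\d' applies Python's | operator to three strings, which raises TypeError. The alternation has to live inside a single pattern string, and a compiled pattern is tested against text with its search method rather than compared with =. Note also that alternatives are tried left to right, so even the correctly written pattern stops after one digit. A short illustration (the names literal and digits are only for this example):

```python
import re

# Alternation goes inside ONE pattern string, not between Python strings.
literal = re.compile(r"\d|\d\d|\d\d\d")
# Alternatives are tried left to right, so \d wins and only "1" is found.
print(literal.search("123 Some Street").group())

# A greedy quantifier takes the whole run of digits.
digits = re.compile(r"\d+")
print(digits.search("123 Some Street").group())

# search() returns None when the text contains no digit at all.
print(digits.search("House"))
```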

2 answers:

Answer 0 (score: 0)

This might give you a clue; I'm not entirely sure what you need.

$ cat t1

123 Some Street
Flat 1, 21 Other road
House, 23 Someother street

Example

import csv
import re
p = re.compile('\d+')
for row in csv.reader(open('t1')):
    print "ROW", row
    match = p.search(row[0])
    if match:
        print "\t#1", match.group()
    if len(row) > 1:
        match = p.search(row[1])
        if match:
            print "\t#2", match.group()

Output

ROW ['123 Some Street']
    #1 123
ROW ['Flat 1', ' 21 Other road']
    #1 1
    #2 21
ROW ['House', ' 23 Someother street']
    #2 23
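The example above is Python 2 (print statements). A Python 3 sketch of the same search that collects the numbers instead of printing them, assuming the contents of t1 are passed in as a string; numbers_per_row is a hypothetical name:

```python
import csv
import io
import re

p = re.compile(r"\d+")


def numbers_per_row(text):
    # Collect (row, number in column 1, number in column 2); None where nothing matched
    found = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        m1 = p.search(row[0])
        m2 = p.search(row[1]) if len(row) > 1 else None
        found.append((row, m1.group() if m1 else None, m2.group() if m2 else None))
    return found


t1 = "123 Some Street\nFlat 1, 21 Other road\nHouse, 23 Someother street\n"
for row, n1, n2 in numbers_per_row(t1):
    print("ROW", row, n1, n2)
```

Like the original, this keeps the leading space in ' 21 Other road', because skipinitialspace is not set.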

Answer 1 (score: 0)

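The following code may do everything you need. For the output, just index the tuple and write out the components you want. Each result has 6 elements: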

#(flat str, flat #, street str, street #, street, street type)

a = """
123 Some Street
Flat 1, 21 Other road
House, Someother street
"""

import re
#flat gets a word, 0 or more spaces, 0 or more digits
flat    = r"([a-z]+ *(\d+)*)"
#street gets 0 or more digits, 1 or more spaces, then 1 or more words (each followed by a space), lazily, until it hits street, road or drive
street  = r"((\d+)* +([a-z]+ )+?(street|road|drive))"
address = r"%s*.*?%s" % (flat, street)
m       = re.compile(address, re.I)
results = m.findall(a)
with open('output.csv', 'w') as fout:
    #whatever you wish to name your columns
    fout.write("Building,Address,Suite Number,Building Number\n")
    for r in results:
        fout.write("%s,%s,%s,%s\n" % (r[0], r[2], r[1], r[3]))

Results

[('', '', '123 Some Street', '123', 'Some ', 'Street'),
 ('Flat 1', '1', '21 Other road', '21', 'Other ', 'road'),
 ('House', '', ' Someother street', '', 'Someother ', 'street')]
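A Python 3 sketch that runs the same pattern but lets csv.writer handle separators and quoting instead of formatting each line by hand. It writes to an in-memory buffer rather than output.csv, and the .strip() on the street text is an addition that drops the leading space seen in ' Someother street':

```python
import csv
import io
import re

a = """
123 Some Street
Flat 1, 21 Other road
House, Someother street
"""

flat = r"([a-z]+ *(\d+)*)"
street = r"((\d+)* +([a-z]+ )+?(street|road|drive))"
m = re.compile(r"%s*.*?%s" % (flat, street), re.I)
results = m.findall(a)

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Building", "Address", "Suite Number", "Building Number"])
for r in results:
    # same column order as the answer: flat str, street str, flat #, street #
    writer.writerow([r[0], r[2].strip(), r[1], r[3]])
print(buf.getvalue())
```

csv.writer also quotes any field that happens to contain a comma, which the manual "%s,%s" formatting would not.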