Extracting various pieces of information

Time: 2016-09-24 15:22:54

Tags: python csv

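Overview

I want to extract various pieces of information, such as name, date and address, from a 2-column CSV file before writing them to another CSV file.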

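Conditions

  1. Extract the name from the first row, since the name is always the first row.
  2. Extract the date with a regular expression (is there regex in Python?); the format is ##/##/####.
  3. Extract the address by the constant keyword 'road'.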

Dummy source CSV data, as viewed in Excel (reference file format):

           ID,DATA
         88888,DADDY            
         88888,2/06/2016        
         88888,new issac road        
         99999,MUMMY            
         99999,samsung road   
         99999,12/02/2016      
    

Desired CSV result:

    ID,Name,Address,DATE
    88888,DADDY,new issac road,2/06/2016
    99999,MUMMY,samsung road,12/02/2016
    

What I have so far:

    import csv
    from collections import defaultdict
    
    columns = defaultdict(list) # each value in each column is appended to a list
    
    with open('dummy_data.csv') as f:
        reader = csv.DictReader(f) # read rows into a dictionary format
        for row in reader: # read a row as {column1: value1, column2: value2,...}
            for (k,v) in row.items(): # go over each column name and value 
                columns[k].append(v) # append the value into the appropriate list
                                     # based on column name k
    uniqueidstatement = columns['receipt_id']
    
    print uniqueidstatement
    
    resultFile = open("wtf.csv",'wb')
    wr = csv.writer(resultFile, dialect='excel')
    wr.writerow(uniqueidstatement)
    

1 Answer:

Answer 0 (score: 0)

You can group the rows by ID; then, within each group, some simple logic determines which value is the date and which is the address.

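import csv
from itertools import groupby
from operator import itemgetter

# newline="" lets the csv module handle line endings itself (Python 3)
with open("test.csv", newline="") as f, open("out.csv", "w", newline="") as out:
    reader = csv.reader(f)
    next(reader)  # skip the header row (ID,DATA)
    writer = csv.writer(out)
    writer.writerow(["ID", "NAME", "ADDRESS", "DATE"])
    # reuse the same reader; groupby only merges adjacent rows with the same ID
    groups = groupby(reader, key=itemgetter(0))
    for k, v in groups:
        # the first row of each group is always the name
        id_, name = next(v)
        # the next two rows hold the date and the address, in either order
        add_date_1, add_date_2 = next(v)[1].strip(), next(v)[1].strip()
        date, add = (add_date_1, add_date_2) if "road" in add_date_2 else (add_date_2, add_date_1)
        writer.writerow([id_.strip(), name.strip(), add, date])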
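
Python does have regular expressions, in the standard `re` module, so the date can also be picked out by pattern instead of by elimination. Below is a minimal sketch of that variant, assuming Python 3, that the rows for one ID are adjacent, and that the first row of each group is the name. `DATE_RE` and `extract_records` are names made up for this example, and the sample text simply repeats the dummy data from the question.

```python
import csv
import io
import re
from itertools import groupby
from operator import itemgetter

# Assumed date shape: 1-2 digits / 1-2 digits / 4 digits, e.g. 2/06/2016
DATE_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")


def extract_records(rows):
    """Yield [id, name, address, date] for each run of rows sharing an ID."""
    # Strip padding and skip rows that do not have both columns
    cleaned = ([cell.strip() for cell in row] for row in rows if len(row) >= 2)
    for id_, group in groupby(cleaned, key=itemgetter(0)):
        values = [row[1] for row in group]
        name, rest = values[0], values[1:]
        # First remaining value that looks like a date, else an empty string
        date = next((v for v in rest if DATE_RE.match(v)), "")
        # First remaining value containing the keyword "road", else empty
        address = next((v for v in rest if "road" in v.lower()), "")
        yield [id_, name, address, date]


sample = """ID,DATA
88888,DADDY
88888,2/06/2016
88888,new issac road
99999,MUMMY
99999,samsung road
99999,12/02/2016
"""

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
records = list(extract_records(reader))
for record in records:
    print(record)
```

To work on real files, pass a `csv.reader` over the opened input file to `extract_records` and hand each yielded list to `csv.writer(...).writerow`. Unlike the elimination approach, a group with a missing date or address yields an empty field rather than raising an error.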