Converting a text file to CSV

Time: 2019-04-16 18:31:06

Tags: python-3.x

I have a text file whose contents look like this:

Name: Aar saa
 Last Name: sh
 DOB: 1997-03-22
 Phone: 1212222
 Graduation: B.Tech
 Specialization: CSE
 Graduation Pass Out: 2019
 Graduation Percentage: 60
 Higher Secondary Percentage: 65
 Higher Secondary School Name: Guru Nanak Dev University,amritsar
 City: hyd
 Venue Details: CMR College of Engineering & Technology (CMRCET) Medchal Road, TS - 501401

Name: bfdg df
 Last Name: df
 DOB: 2005-12-16
 Phone: 2222222
 Graduation: B.Tech
 Specialization: EEE
 Graduation Pass Out: 2018
 Graduation Percentage: 45
 Higher Secondary Percentage: 45
 Higher Secondary School Name: asddasd
 City: vjd
 Venue Details: Prasad V. Potluri Siddhartha Institute Of Technology, Kanuru, AP - 520007

Name: cc dd ee
 Last Name: ee
 DOB: 1995-07-28
 Phone: 444444444
 Graduation: B.Tech
 Specialization: ECE
 Graduation Pass Out: 2019
 Graduation Percentage: 75
 Higher Secondary Percentage: 93
 Higher Secondary School Name: Sasi institute of technology and engineering
 City: hyd
 Venue Details: CMR College of Engineering & Technology (CMRCET) Medchal Road, TS - 501401

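I want to convert it into a CSV file with the header

["Name", "Last Name", "DOB", "Phone", "Graduation", "Specialization", "Graduation Pass Out", "Higher Secondary School Name", "City", "Venue Details"]

and with everything after the ':' on each line as the values.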

I have done something like this:

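import csv

def parselines(lines, writer):
    # Keep only the text after the first ": " on each non-empty line.
    data = []
    for line in lines.split('\n'):
        if line.strip():
            data.append(line.split(": ", 1)[1])
    if data:  # skip empty chunks, e.g. after a trailing blank line
        writer.writerow(data)

# newline='' stops the csv module from emitting blank rows on Windows.
with open('result.csv', 'a', newline='') as out, open('Name2.txt') as f:
    writer = csv.writer(out)
    writer.writerow(['Name', 'Last Name', 'DOB', 'Phone', 'Graduation', 'Specialization', 'Graduation Pass Out', 'Graduation Percentage', 'Higher Secondary Percentage', 'Higher Secondary School Name', 'City', 'Venue Details'])
    # Records are separated by a blank line.
    for text1 in f.read().split("\n\n"):
        parselines(text1, writer)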

It works, but any more efficient approach would be appreciated.

1 Answer:

Answer 0: (score: 0)

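This algorithm works (it is a kind of state machine):

  1. If the line is empty, start a new row.
  2. Otherwise, add the field to the current row, collecting all headers and fields along the way.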
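
import csv
import sys

def parselines(lines):
    header = []
    csvrows = [{}]
    for line in lines:
        line = line.strip()
        if not line:
            csvrows.append({})  # blank line: start a new row, in dict form
        else:
            field, data = line.split(":", 1)
            csvrows[-1][field] = data.strip()  # drop the space after ':'
            if field not in header:
                header.append(field)
    # Format as CSV; csv.writer quotes values that contain commas.
    writer = csv.writer(sys.stdout, lineterminator="\n")
    writer.writerow(header)
    for row in csvrows:
        if row:  # skip empty rows left by consecutive or trailing blank lines
            writer.writerow([row.get(h, "") for h in header])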
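
The same state-machine idea can also be sketched as a self-contained variant that returns the parsed records and writes them with csv.DictWriter, which quotes values containing commas (such as "Guru Nanak Dev University,amritsar") and fills in fields that a record lacks. The names parse_records and write_csv and the shortened sample data are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import io


def parse_records(lines):
    """Group 'Key: value' lines into one dict per blank-line-separated record."""
    header = []   # field names in first-seen order
    records = []
    current = {}
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current record, if it has any fields.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first ':' only, so values may contain colons.
        field, _, value = line.partition(":")
        field = field.strip()
        current[field] = value.strip()
        if field not in header:
            header.append(field)
    if current:  # the last record may not be followed by a blank line
        records.append(current)
    return header, records


def write_csv(header, records, out):
    """Write the records to the file-like object `out`, quoting where needed."""
    writer = csv.DictWriter(out, fieldnames=header, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)


# Shortened sample in the same layout as the question's file (assumed data).
sample = """Name: Aar saa
 Last Name: sh
 Higher Secondary School Name: Guru Nanak Dev University,amritsar

Name: bfdg df
 Last Name: df
 City: vjd
"""

header, records = parse_records(sample.splitlines())
buffer = io.StringIO()
write_csv(header, records, buffer)
print(buffer.getvalue())
```

To work with the files named in the question, one could pass open('Name2.txt') to parse_records and open('result.csv', 'w', newline='') to write_csv instead of the in-memory buffer.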