Parsing text files

Time: 2016-03-03 15:37:16

Tags: python parsing text

I'm new to Python, and I'm looking to parse a number of text files (~5000) containing data like this:

  

Random text......
ID:ABC123456
Random text......
Title
Contained text
End
Random text......

Each file is about 3,000 lines long. I want to extract the ID and the text between Title and End into a CSV file that looks like this:

  

ID Text
ABC123456 Contained text 1
ABC123457 Contained text 2

Any help is greatly appreciated!

This is what I have so far:

{
  "name": "example-mean-app-client",
  "dependencies": {},
  "devDependencies": {},
  "ambientDependencies": {
    "bootstrap": "github:DefinitelyTyped/DefinitelyTyped/bootstrap/bootstrap.d.ts#4de74cb527395c13ba20b438c3a7a419ad931f1c",
    "es6-promise": "github:DefinitelyTyped/DefinitelyTyped/es6-promise/es6-promise.d.ts#830e8ebd9ef137d039d5c7ede24a421f08595f83",
    "es6-shim": "github:DefinitelyTyped/DefinitelyTyped/es6-shim/es6-shim.d.ts#4de74cb527395c13ba20b438c3a7a419ad931f1c",
    "jasmine": "github:DefinitelyTyped/DefinitelyTyped/jasmine/jasmine.d.ts#dd638012d63e069f2c99d06ef4dcc9616a943ee4",
    "karma": "github:DefinitelyTyped/DefinitelyTyped/karma/karma.d.ts#02dd2f323e1bcb8a823269f89e0909ec9e5e38b5",
    "karma-jasmine": "github:DefinitelyTyped/DefinitelyTyped/karma-jasmine/karma-jasmine.d.ts#661e01689612eeb784e931e4f5274d4ea5d588b7",
    "systemjs": "github:DefinitelyTyped/DefinitelyTyped/systemjs/systemjs.d.ts#83af898254689400de8fb6495c34119ae57ec3fe",
    "zone.js": "github:DefinitelyTyped/DefinitelyTyped/zone.js/zone.js.d.ts#9027703c0bd831319dcdf7f3169f7a468537f448"
  }
}

1 answer:

Answer 0 (score: 0)

Try something like this in the while loop, after the readline line:

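record_id = None  # renamed from id to avoid shadowing the built-in
title_set = False  # becomes True only between the "Title" and "End" lines
with open("test.txt", "r") as f:
    while True:
        text = f.readline()
        if not text:  # readline() returns "" at end of file
            break
        text = text.strip()  # remove the newline and any indentation
        if text.startswith("ID:"):
            record_id = text[3:].strip()
        elif text == "End":
            title_set = False
        elif text == "Title":
            title_set = True
        elif title_set and text and record_id is not None:
            print(record_id + " " + text)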

This should print all the lines you need (apart from the formatting).

Writing these lines to another file boils down to replacing print(...) with other_file.write(...), where other_file is a handle to another file opened with write permission.
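Since the goal is a CSV file built from many input files, the same idea could be sketched with the standard csv module instead of a bare write(). This is only a rough sketch under a few assumptions: each file has its ID line before the Title line, the markers are exactly "Title" and "End", and the names extract_records and write_csv are made up for illustration.

```python
import csv


def extract_records(lines):
    """Yield (record_id, text) for each non-empty line between "Title" and "End".

    Assumes the "ID:" line appears before the "Title" line (hypothetical layout
    based on the sample in the question).
    """
    record_id = None
    in_block = False
    for raw_line in lines:
        text = raw_line.strip()  # drop the newline and any indentation
        if text.startswith("ID:"):
            record_id = text[3:].strip()
        elif text == "Title":
            in_block = True
        elif text == "End":
            in_block = False
        elif in_block and text and record_id is not None:
            yield record_id, text


def write_csv(input_paths, output_path):
    """Parse every input file and write one CSV row per extracted line."""
    with open(output_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(["ID", "Text"])  # header row
        for path in input_paths:
            with open(path, "r") as in_file:
                writer.writerows(extract_records(in_file))
```

To process a whole folder, something like write_csv(glob.glob("data/*.txt"), "output.csv") should work after importing glob; both the folder pattern and the output name are placeholders. Using csv.writer rather than write() means that commas or quotes inside the extracted text get escaped properly.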