Cannot parse blocks of text with Python: text file parsing is incomplete

Posted: 2019-07-25 05:38:04

Tags: python python-3.x file parsing replace

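I'm a chemist and very new to programming. I'm trying to write programs that make processing my data easier. After a whole day of searching Stack Overflow, I was finally able to write a short Python script that parses a text file containing similar blocks of data separated by blank lines. My code works well, except that it fails to parse the last block, and I don't know why. I've searched for an answer but couldn't find one that helped.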

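A typical text file contains 361 data blocks. Each block holds the information needed to build one molecule in 3-D space, with a different torsion angle for one set of four atoms. Here is an example of the text file I'm trying to parse, containing only the first two blocks: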

!Coordinate: -51.45857  Energy: *****
6 0.006074 0.000915 0.000760
6 0.003070 -0.004811 1.496641
6 1.065644 -0.015789 2.367841
6 2.500078 -0.010542 1.993114
6 3.043633 -0.885454 1.109936
6 2.319723 -2.061360 0.571949
6 1.651211 -3.009615 1.308815
16 0.964940 -4.223294 0.280714
6 1.598121 -3.476004 -1.156548
6 2.300403 -2.353600 -0.830192
1 2.774538 -1.713316 -1.566133
6 1.370973 -4.039010 -2.492108
6 2.306097 -3.847669 -3.514857
6 2.051238 -4.378854 -4.772466
7 0.959825 -5.084236 -5.080872
6 0.075629 -5.271691 -4.098835
6 0.226680 -4.776825 -2.808825
1 -0.547454 -4.952070 -2.067650
1 -0.811208 -5.846075 -4.358490
1 2.771093 -4.237936 -5.576037
1 3.231185 -3.312215 -3.327250
6 1.484740 -3.110171 2.791981
1 2.271126 -2.537323 3.291578
1 0.521994 -2.699519 3.116631
1 1.545489 -4.149268 3.130100
6 4.425208 -0.728995 0.567929
6 5.293981 -1.825349 0.536092
6 6.575924 -1.699782 0.012540
6 7.002467 -0.480078 -0.506308
6 6.138453 0.611969 -0.498798
6 4.860426 0.488085 0.033453
1 4.189564 1.341843 0.040929
1 6.459401 1.563510 -0.912065
1 8.000697 -0.382509 -0.922563
1 7.242127 -2.557541 0.005802
1 4.957135 -2.781274 0.928240
6 3.298894 1.044689 2.682189
6 2.806965 2.352662 2.756428
6 3.525634 3.346796 3.410575
6 4.740700 3.044040 4.018965
6 5.230033 1.741208 3.969123
6 4.514468 0.749369 3.308424
1 4.901734 -0.264238 3.270300
1 6.171693 1.494548 4.450468
1 5.300110 3.817950 4.536063
1 3.131670 4.358007 3.451132
1 1.851909 2.586965 2.294231
6 0.644628 0.032167 3.735978
6 -0.708788 0.041750 3.903716
16 -1.501825 0.018225 2.355367
6 -1.460523 0.074589 5.163238
6 -0.916630 -0.463354 6.334489
6 -1.645855 -0.393694 7.514376
7 -2.861426 0.150339 7.612820
6 -3.380262 0.652483 6.490232
6 -2.733763 0.643955 5.260195
1 -3.211536 1.093615 4.394957
1 -4.369681 1.095963 6.579511
1 -1.232419 -0.806908 8.432018
1 0.055022 -0.946356 6.323493
1 1.348290 0.078304 4.560069
1 -0.126732 -1.007882 -0.406234
1 -0.790297 0.637669 -0.396423
1 0.964526 0.378020 -0.366958

!Coordinate: -52.45859  Energy: *****
6 0.016006 0.016117 -0.001167
6 0.008091 0.004202 1.494640
6 1.068924 -0.017801 2.367520
6 2.503392 -0.009246 1.992562
6 3.048080 -0.887580 1.113704
6 2.322345 -2.062968 0.576734
6 1.653555 -3.010561 1.314091
16 0.963790 -4.222595 0.286393
6 1.595670 -3.475347 -1.151441
6 2.300257 -2.354228 -0.825550
1 2.774156 -1.714212 -1.561877
6 1.365619 -4.037046 -2.487061
6 2.299829 -3.846714 -3.510831
6 2.042363 -4.376373 -4.768547
7 0.949180 -5.079357 -5.076134
6 0.065835 -5.265841 -4.093142
6 0.219443 -4.772314 -2.802916
1 -0.554143 -4.946542 -2.060928
1 -0.822495 -5.838195 -4.352173
1 2.761473 -4.236208 -5.572914
1 3.226175 -3.313192 -3.323941
6 1.489754 -3.111703 2.797517
1 2.276398 -2.538063 3.295797
1 0.527124 -2.702199 3.123917
1 1.552391 -4.150812 3.135284
6 4.429609 -0.733119 0.571119
6 5.297405 -1.830292 0.541209
6 6.579288 -1.706863 0.016976
6 7.006698 -0.488561 -0.504432
6 6.143617 0.604255 -0.498839
6 4.865654 0.482526 0.034036
1 4.195506 1.336862 0.039937
1 6.465258 1.554673 -0.914138
1 8.004858 -0.392683 -0.921241
1 7.244728 -2.565225 0.011647
1 4.959792 -2.785154 0.935276
6 3.299443 1.049518 2.679214
6 2.802410 2.355625 2.752465
6 3.517994 3.353520 3.404255
6 4.735140 3.056448 4.011255
6 5.229631 1.755519 3.962371
6 4.517166 0.759897 3.304042
1 4.908495 -0.252167 3.266846
1 6.172959 1.513307 4.442708
1 5.292160 3.833294 4.526530
1 3.119965 4.363158 3.444095
1 1.845729 2.585480 2.291417
6 0.646126 0.024358 3.735114
6 -0.707598 0.038379 3.900610
16 -1.498235 0.027194 2.350780
6 -1.461222 0.067452 5.159060
6 -0.920246 -0.476558 6.328874
6 -1.650941 -0.410067 7.508021
7 -2.865451 0.136232 7.607012
6 -3.381624 0.644001 6.485729
6 -2.733382 0.639158 5.256592
1 -3.209059 1.093310 4.392548
1 -4.370246 1.089176 6.575392
1 -1.239629 -0.827839 8.424551
1 0.050370 -0.961638 6.317360
1 1.348629 0.063341 4.560569
1 -0.118307 -0.990869 -0.412258
1 -0.776594 0.657152 -0.398933
1 0.977453 0.391011 -0.363519

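Each block contains the following information:

  1. A header line containing the torsion angle.
  2. Every line after the header has 4 columns: atomic number, x, y, z.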
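For reference, both line types can be parsed with plain string methods. This is a minimal sketch, assuming the header always has the form `!Coordinate: <angle>  Energy: ...` and that every atom line has exactly four whitespace-separated fields; `parse_header` and `parse_atom_line` are hypothetical helper names.

```python
def parse_header(line):
    """Return the torsion angle from a '!Coordinate: <angle>  Energy: ...' line, as text."""
    fields = line.split()
    # e.g. ['!Coordinate:', '-51.45857', 'Energy:', '*****']
    return fields[1]


def parse_atom_line(line):
    """Split an atom line into (atomic_number, x, y, z), keeping every field as text."""
    atomic_number, x, y, z = line.split()
    return atomic_number, x, y, z


print(parse_header('!Coordinate: -51.45857  Energy: *****'))  # -51.45857
print(parse_atom_line('16 0.964940 -4.223294 0.280714'))      # ('16', '0.964940', '-4.223294', '0.280714')
```

Keeping the angle as the original string (instead of `str(float(...))`) preserves its exact formatting in the file name, e.g. a value like `-50.00000` is not shortened to `-50.0`.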

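For each block I need to:

  1. Extract the torsion angle, then delete the header line.
  2. Replace each atomic number with the corresponding element symbol.
  3. Write a separate *.xyz file that has element symbols instead of atomic numbers, with the number of atoms at the top.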
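Steps 2 and 3 amount to building an XYZ file: the atom count on the first line, a comment line (left empty here), then one `symbol x y z` line per atom. A minimal sketch, assuming the atoms arrive as `(atomic_number, x, y, z)` string tuples and that only H, C, N and S occur (`SYMBOLS` mirrors the question's `replacements` dictionary); `make_xyz` is a hypothetical name.

```python
# Atomic number -> element symbol (same table as the question's `replacements`)
SYMBOLS = {'1': 'H', '6': 'C', '7': 'N', '16': 'S'}


def make_xyz(step, angle, atoms):
    """Return (filename, file_text) for one block.

    step:  1-based block index
    angle: torsion angle as text, e.g. '-51.45857'
    atoms: list of (atomic_number, x, y, z) string tuples
    """
    filename = 'Step_{}_TorsionAngle_{}.xyz'.format(step, angle)
    lines = [str(len(atoms)), '']  # atom count, then an empty comment line
    for number, x, y, z in atoms:
        lines.append(' '.join([SYMBOLS[number], x, y, z]))
    return filename, '\n'.join(lines) + '\n'


name, text = make_xyz(1, '-51.45857', [('6', '0.006074', '0.000915', '0.000760'),
                                       ('1', '0.964526', '0.378020', '-0.366958')])
print(name)  # Step_1_TorsionAngle_-51.45857.xyz
print(text)
```

Counting the atoms with `len(atoms)` rather than writing a fixed `64` keeps the header correct if a block ever has a different number of atoms.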

Here is my code:

import os
import re

#I just paste the file path for now. And change \ to \\ 
filepath = os.path.normpath("file.xyz") 

#Dictionary for atomic number and element
replacements = {'1': 'H', '6': 'C', '7': 'N', '16':'S'} 

#Open read and write files
originalFile = open(filepath, 'r') 
writeEditedFile = open('output_all(edited).txt', 'w')
readEditedFile = open('output_all(edited).txt', 'r')

#Replace atomic numbers with element symbol
for lines in originalFile:
    writeEditedFile.write(re.sub(r'(^\d+)', lambda m: replacements[m.group()], lines)) 

#Extract torsion angle and append to array
with open('output_all(edited).txt', 'r') as wEF: 
    torsionAngles = []
    for line in wEF:
        if '!' in line:
            for number in line.split():
                try:
                    torsionAngles.append(str(float(number)))
                except ValueError:
                    pass

#Write each line into a new file until a blank line
#The file is closed and a new one is opened
#This should continue until the last block
with readEditedFile as rEF:
    record = 0
    separateFile = open('Step_' + str(record+1) + '_TorsionAngle_' + torsionAngles[record] + '.xyz', 'w')
    separateFile.write('64 \n \n')
    for lines in rEF:
        if lines == "\n":
            record += 1
            separateFile.close()
            separateFile = open('Step_'+ str(record+1) + '_TorsionAngle_' + torsionAngles[record] + '.xyz', 'w')
            separateFile.write('64 \n \n')
        else:
            if '!' in lines:
                lines = ''
            else:
                separateFile.write(lines)

Sorry for the sloppy code! Here are the first two files it outputs:

Filename: Step_1_TorsionAngle_-51.45857.xyz

64 

C 0.006074 0.000915 0.000760
C 0.003070 -0.004811 1.496641
C 1.065644 -0.015789 2.367841
C 2.500078 -0.010542 1.993114
C 3.043633 -0.885454 1.109936
C 2.319723 -2.061360 0.571949
C 1.651211 -3.009615 1.308815
S 0.964940 -4.223294 0.280714
C 1.598121 -3.476004 -1.156548
C 2.300403 -2.353600 -0.830192
H 2.774538 -1.713316 -1.566133
C 1.370973 -4.039010 -2.492108
C 2.306097 -3.847669 -3.514857
C 2.051238 -4.378854 -4.772466
N 0.959825 -5.084236 -5.080872
C 0.075629 -5.271691 -4.098835
C 0.226680 -4.776825 -2.808825
H -0.547454 -4.952070 -2.067650
H -0.811208 -5.846075 -4.358490
H 2.771093 -4.237936 -5.576037
H 3.231185 -3.312215 -3.327250
C 1.484740 -3.110171 2.791981
H 2.271126 -2.537323 3.291578
H 0.521994 -2.699519 3.116631
H 1.545489 -4.149268 3.130100
C 4.425208 -0.728995 0.567929
C 5.293981 -1.825349 0.536092
C 6.575924 -1.699782 0.012540
C 7.002467 -0.480078 -0.506308
C 6.138453 0.611969 -0.498798
C 4.860426 0.488085 0.033453
H 4.189564 1.341843 0.040929
H 6.459401 1.563510 -0.912065
H 8.000697 -0.382509 -0.922563
H 7.242127 -2.557541 0.005802
H 4.957135 -2.781274 0.928240
C 3.298894 1.044689 2.682189
C 2.806965 2.352662 2.756428
C 3.525634 3.346796 3.410575
C 4.740700 3.044040 4.018965
C 5.230033 1.741208 3.969123
C 4.514468 0.749369 3.308424
H 4.901734 -0.264238 3.270300
H 6.171693 1.494548 4.450468
H 5.300110 3.817950 4.536063
H 3.131670 4.358007 3.451132
H 1.851909 2.586965 2.294231
C 0.644628 0.032167 3.735978
C -0.708788 0.041750 3.903716
S -1.501825 0.018225 2.355367
C -1.460523 0.074589 5.163238
C -0.916630 -0.463354 6.334489
C -1.645855 -0.393694 7.514376
N -2.861426 0.150339 7.612820
C -3.380262 0.652483 6.490232
C -2.733763 0.643955 5.260195
H -3.211536 1.093615 4.394957
H -4.369681 1.095963 6.579511
H -1.232419 -0.806908 8.432018
H 0.055022 -0.946356 6.323493
H 1.348290 0.078304 4.560069
H -0.126732 -1.007882 -0.406234
H -0.790297 0.637669 -0.396423
H 0.964526 0.378020 -0.366958

Filename: Step_2_TorsionAngle_-52.45859.xyz

64 

C 0.016006 0.016117 -0.001167
C 0.008091 0.004202 1.494640
C 1.068924 -0.017801 2.367520
C 2.503392 -0.009246 1.992562
C 3.048080 -0.887580 1.113704
C 2.322345 -2.062968 0.576734
C 1.653555 -3.010561 1.314091
S 0.963790 -4.222595 0.286393
C 1.595670 -3.475347 -1.151441
C 2.300257 -2.354228 -0.825550
H 2.774156 -1.714212 -1.561877
C 1.365619 -4.037046 -2.487061
C 2.299829 -3.846714 -3.510831
C 2.042363 -4.376373 -4.768547
N 0.949180 -5.079357 -5.076134
C 0.065835 -5.265841 -4.093142
C 0.219443 -4.772314 -2.802916
H -0.554143 -4.946542 -2.060928
H -0.822495 -5.838195 -4.352173
H 2.761473 -4.236208 -5.572914
H 3.226175 -3.313192 -3.323941
C 1.489754 -3.111703 2.797517
H 2.276398 -2.538063 3.295797
H 0.527124 -2.702199 3.123917
H 1.552391 -4.150812 3.135284
C 4.429609 -0.733119 0.571119
C 5.297405 -1.830292 0.541209
C 6.579288 -1.706863 0.016976
C 7.006698 -0.488561 -0.504432
C 6.143617 0.604255 -0.498839
C 4.865654 0.482526 0.034036
H 4.195506 1.336862 0.039937
H 6.465258 1.554673 -0.914138
H 8.004858 -0.392683 -0.921241
H 7.244728 -2.565225 0.011647
H 4.959792 -2.785154 0.935276
C 3.299443 1.049518 2.679214
C 2.802410 2.355625 2.752465
C 3.517994 3.353520 3.404255
C 4.735140 3.056448 4.011255
C 5.229631 1.755519 3.962371
C 4.517166 0.759897 3.304042
H 4.908495 -0.252167 3.266846
H 6.172959 1.513307 4.442708
H 5.292160 3.833294 4.526530
H 3.119965 4.363158 3.444095
H 1.845729 2.585480 2.291417
C 0.646126 0.024358 3.735114
C -0.707598 0.038379 3.900610
S -1.498235 0.027194 2.350780
C -1.461222 0.067452 5.159060
C -0.920246 -0.476558 6.328874
C -1.650941 -0.410067 7.508021
N -2.865451 0.136232 7.607012
C -3.381624 0.644001 6.485729
C -2.733382 0.639158 5.256592
H -3.209059 1.093310 4.392548
H -4.370246 1.089176 6.575392
H -1.239629 -0.827839 8.424551
H 0.050370 -0.961638 6.317360
H 1.348629 0.063341 4.560569
H -0.118307 -0.990869 -0.412258
H -0.776594 0.657152 -0.398933
H 0.977453 0.391011 -0.363519

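This simple code does what I need for every block except the last one! Any suggestions or tips would be greatly appreciated. Thanks for reading my post!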
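One likely cause worth checking: `writeEditedFile` is never closed or flushed before `output_all(edited).txt` is read back through `wEF` and `readEditedFile`. Python buffers writes in memory, so the tail end of the edited file (which includes the last block) may not be on disk yet when it is read. The sketch below reproduces the effect on a small temporary file; `read_back` is a hypothetical helper, and how much data stays buffered depends on the Python implementation and buffer size.

```python
import os
import tempfile


def read_back(close_writer_first):
    """Write three lines with one file object, then read the file through a second one."""
    path = os.path.join(tempfile.mkdtemp(), 'output_all(edited).txt')
    writer = open(path, 'w')
    for i in range(3):
        writer.write('H 0.0 0.0 {}\n'.format(i))
    if close_writer_first:
        writer.close()  # close() flushes Python's write buffer to disk
    with open(path, 'r') as reader:
        text = reader.read()
    writer.close()  # no effect if the file is already closed
    return text


print(repr(read_back(close_writer_first=False)))  # on CPython: some or all lines missing
print(repr(read_back(close_writer_first=True)))   # all three lines
```

In the question's script, the corresponding fix would be to call `writeEditedFile.close()` right after the replacement loop (or write the edited file inside a `with` block) and only then read it back.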

1 Answer

Answer (score: 1)

This script reads the sample input data (as given in the question) from file.txt and writes two files: Step_1_TorsionAngle_-51.45857.xyz and Step_2_TorsionAngle_-52.45859.xyz.

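import re

# Atomic number -> element symbol
replacements = {'1': 'H', '6': 'C', '7': 'N', '16': 'S'}

# Read the whole file into one string
with open('file.txt', 'r') as f_in:
    data = f_in.read()

# Torsion angle: the text between "!Coordinate:" and "Energy"
torsion_angles = re.findall(r'!Coordinate:\s+(.*?)\s+Energy', data)
# Blocks: from the first line starting with a digit up to the next "!" header or the end of the file;
# blank lines (e.g. trailing ones at the end of the file) are dropped so they are not counted as atoms
blocks = [[ln for ln in b.splitlines() if ln.strip()]
          for b in re.findall(r'^(\d.*?)(?=\s*!|\Z)', data, flags=re.DOTALL | re.M)]

for step, (angle, block) in enumerate(zip(torsion_angles, blocks), 1):
    with open('Step_{}_TorsionAngle_{}.xyz'.format(step, angle), 'w') as f_out:
        # XYZ header: number of atoms, then an empty comment line
        f_out.write(str(len(block)) + '\n\n')
        # Swap the atomic number in the first column for the element symbol
        lines = [' '.join([replacements[s[0]], *s[1:]]) for s in [v.split() for v in block]]
        f_out.write('\n'.join(lines) + '\n')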
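To see what the two regular expressions return, they can be run on a small in-memory string. This sketch uses a shortened two-block sample (two atoms per block, an assumption for brevity) instead of the real file.

```python
import re

sample = (
    "!Coordinate: -51.45857  Energy: *****\n"
    "6 0.006074 0.000915 0.000760\n"
    "1 0.964526 0.378020 -0.366958\n"
    "\n"
    "!Coordinate: -52.45859  Energy: *****\n"
    "6 0.016006 0.016117 -0.001167\n"
    "1 0.977453 0.391011 -0.363519\n"
)

# Capture the text between "!Coordinate:" and "Energy"
angles = re.findall(r'!Coordinate:\s+(.*?)\s+Energy', sample)
# A block starts at a line beginning with a digit and runs (lazily) until the
# next "!" header, possibly preceded by whitespace, or the end of the string
blocks = re.findall(r'^(\d.*?)(?=\s*!|\Z)', sample, flags=re.DOTALL | re.M)

print(angles)  # ['-51.45857', '-52.45859']
for block in blocks:
    print(block.splitlines())
```

Because every block is matched independently on the full text, the last block is handled like the others: the `\Z` alternative in the lookahead lets it end at the end of the file rather than at a following header.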

The first output file, Step_1_TorsionAngle_-51.45857.xyz, starts like this:

64

C 0.006074 0.000915 0.000760
C 0.003070 -0.004811 1.496641
C 1.065644 -0.015789 2.367841
C 2.500078 -0.010542 1.993114

...etc.
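An alternative that avoids regular expressions is to split the text on blank lines with `itertools.groupby`; it also does not matter whether the file ends with a blank line. A sketch under the same assumptions as the answer (header line first, then `atomic_number x y z` lines, only H, C, N and S); `split_blocks` and `convert` are hypothetical names, and the code works on a string so it can be tried without any files.

```python
from itertools import groupby

SYMBOLS = {'1': 'H', '6': 'C', '7': 'N', '16': 'S'}


def split_blocks(text):
    """Yield each block as a list of lines; runs of blank lines separate blocks."""
    for is_blank, group in groupby(text.splitlines(), key=lambda line: not line.strip()):
        if not is_blank:
            yield list(group)


def convert(text):
    """Return a list of (filename, xyz_text) pairs, one per block."""
    results = []
    for step, block in enumerate(split_blocks(text), 1):
        header, atom_lines = block[0], block[1:]
        angle = header.split()[1]  # header looks like '!Coordinate: <angle>  Energy: ...'
        body = []
        for line in atom_lines:
            number, *coords = line.split()
            body.append(' '.join([SYMBOLS[number], *coords]))
        xyz = '{}\n\n{}\n'.format(len(body), '\n'.join(body))
        results.append(('Step_{}_TorsionAngle_{}.xyz'.format(step, angle), xyz))
    return results


# The last block has no trailing newline here and is still converted
sample = ("!Coordinate: -51.45857  Energy: *****\n6 0.0 0.0 0.0\n\n"
          "!Coordinate: -52.45859  Energy: *****\n16 1.0 2.0 3.0")
for name, xyz in convert(sample):
    print(name)  # Step_1_TorsionAngle_-51.45857.xyz, then Step_2_TorsionAngle_-52.45859.xyz
```

To use it on the real data, pass the result of reading file.txt to `convert` and write each `xyz` string to its file name inside a `with open(name, 'w')` block.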