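How can I extract data from a text file with a Python script?

Time: 2013-07-22 08:43:05

Tags: python

I am writing a Python script that should search a data file and copy the relevant data into separate files. Here is the script: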

#!/usr/bin/env python

import os

os.system("grep \"x \" dynamics.out | awk '{print $2}' > coord.dat")
os.system("grep \"Total\" dynamics.out | awk '{print $4}' > total.dat")
os.system("grep \"Kinetic\" dynamics.out | awk '{print $4}' > kinetic.dat")

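The problem is that the part that writes coord.dat produces a completely wrong file. The output in dynamics.out is not laid out the way this script assumes. The data file is actually one long series of data blocks like this: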

    time: 0.2fs
coordinates
C         3.952444338331        0.353499658087        0.155475597879
C         2.898759709487        0.271561183058        2.878962426315
C         0.377507660095        1.575527713456        2.766723501812
N        -0.435656339866        0.616843403256        0.264424997127
C         1.700335308734        1.369156629701       -1.411382740946
C        -2.337147095089       -0.967913098150       -0.045537023463
C        -3.526272967903       -1.434075863003       -2.507321890479
C         1.622297308900        0.380583237194       -4.021983342405
O        -3.540891745414       -1.784144627448        2.005202557948
H         4.590691590007       -1.467822752968       -0.627674161136
H         5.486618188590        1.704246328926        0.014750183919
H         2.660849255805       -1.743362985878        3.501798747714
H         4.277029595067        1.121286334364        4.194254865266
H         0.568970284045        3.642407977900        2.660909012456
H        -1.014510536177        1.242297828699        4.266572018582
H        -3.406669591714        0.378282552422       -3.550366695442
H        -5.529437662690       -2.075200692969       -2.212384192799
H        -2.490387114770       -2.906665564518       -3.579439523150
H         1.675087738572        3.514639806992       -1.458115996333
H        -0.116965875674        1.068581149519       -5.163647181683
H         1.470748269634       -1.655673714451       -4.142927345712
H         3.361564138064        1.115048483423       -4.937771405417
H        -5.134728946067       -2.640023263298        1.567623789643
velocities
        0.000241908669        0.000039611121       -0.000250932377
       -0.000163805243       -0.000115366290       -0.000017375326
       -0.000047784448        0.000248119899       -0.000074616012
        0.000272673498       -0.000017362735        0.000399681421
       -0.000326634443       -0.000254296236        0.000120448584
       -0.000094363714        0.000239927614       -0.000271069374
        0.000122625277        0.000053803004       -0.000088144918
       -0.000112099948       -0.000143815691        0.000140925518
       -0.000020483349       -0.000161160777        0.000050721656
        0.000277228119        0.000550968890       -0.000249788972
        0.000308946542        0.000944826745        0.000083253008
       -0.001453065687        0.000249483273       -0.000194390979
        0.000370071103        0.000328142273       -0.000594811431
        0.000983242907       -0.000247664001       -0.000337676641
        0.000702749595       -0.000531050917       -0.000068247339
       -0.000913913436       -0.000822599342       -0.000519543480
        0.000657300149       -0.001239306947        0.000033192915
        0.000763780031        0.000151892085       -0.000106941733
       -0.000111349513        0.000591872099        0.000360147787
        0.000283007739        0.000537032161        0.000183614425
       -0.001766985000        0.001017499281       -0.000870068723
        0.001560592306       -0.000636221326        0.001124910644
       -0.000596019125        0.001094375746        0.000048984716

Kinetic energy    :        0.030613110934
Electronic energy :      -60.105483063648
Total energy      :      -60.074869952714
Conservation      :       -0.000000051487
self.forces:
       -0.000751933584       -0.004126331042       -0.004033882094
       -0.034302855990        0.029127675777       -0.007001211293
        0.037731564948       -0.009915059812        0.020878531238
       -0.109763802365       -0.102520873021       -0.034608644850
        0.033373305433        0.018949006487       -0.015434320612
        0.078807110369        0.101440274624        0.031960385836
        0.027868883444       -0.012844760956        0.009625682828
        0.011817203866       -0.011548503873        0.038027933611
       -0.016951256413       -0.005848802217       -0.020755575427
        0.002823354740        0.003214778324       -0.005974478408
       -0.012101005124       -0.007850077809        0.000381372379
        0.001958908572       -0.006446492464        0.003077496955
        0.005728700900       -0.005220923285       -0.001710604936
       -0.006358072353       -0.016410380723       -0.003938145281
        0.000121143012        0.012930928986       -0.005592639661
        0.008318475112        0.004530628154        0.009023640965
       -0.010548513939       -0.005006070272        0.008756275583
       -0.000601535778       -0.003075790288       -0.006209965764
       -0.002729816846       -0.003390850759       -0.001421821138
       -0.023967939963        0.000603482820       -0.016983439682
       -0.006731466272        0.010586445711       -0.001984503303
        0.009694983786        0.008555900046        0.002598629870
        0.006564564487        0.004265795793        0.001319282998
state for next step: 1

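So what I need, from every data block (there are thousands in the file), is the part between the "coordinates" line and the "velocities" line. I need to copy it into a file laid out as the number 23 (the number of atoms), then an empty line, then the data, then 23 again, an empty line, the data, and so on.

How can I rewrite the script to do this? Or can someone recommend some literature I could learn from? I don't know the first thing about Python, but I do have some experience with C, C++ and Matlab, and I understand the basic logic of programming. Sorry for the long post, but I thought a sample of the data would be useful. Thanks in advance for any help.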

2 answers:

Answer 0 (score: 0)

You could try something like this:

whole_data = []
grab_lines = False
with open('input','r') as atom_file:
    molecule_data = ['23\n\n']
    for line in atom_file:
        if line.startswith('coordinates'):
            grab_lines = True
            continue
        elif line.startswith('velocities'):
            grab_lines = False
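            if len(molecule_data) > 1:
                #only store blocks that collected at least one coordinate line
                #(molecule_data always starts with the '23' header, so a bare truth test never fails).
                molecule_data.append('\n')
                whole_data.append(molecule_data)
                molecule_data = ['23\n\n']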
        if grab_lines: #in python 'is True' is implicit for many types.
            molecule_data.append(line)

with open('output','w') as out_file:
    for molecule in whole_data:
        out_file.write(''.join(molecule))

I copied and pasted the input data a few times, so the contents of atom_file look like this:

    time: 0.2fs
coordinates
C         3.952444338331        0.353499658087        0.155475597879
C         2.898759709487        0.271561183058        2.878962426315
C         0.377507660095        1.575527713456        2.766723501812
N        -0.435656339866        0.616843403256        0.264424997127
C         1.700335308734        1.369156629701       -1.411382740946
C        -2.337147095089       -0.967913098150       -0.045537023463
C        -3.526272967903       -1.434075863003       -2.507321890479
C         1.622297308900        0.380583237194       -4.021983342405
O        -3.540891745414       -1.784144627448        2.005202557948
H         4.590691590007       -1.467822752968       -0.627674161136
H         5.486618188590        1.704246328926        0.014750183919
H         2.660849255805       -1.743362985878        3.501798747714
H         4.277029595067        1.121286334364        4.194254865266
H         0.568970284045        3.642407977900        2.660909012456
H        -1.014510536177        1.242297828699        4.266572018582
H        -3.406669591714        0.378282552422       -3.550366695442
H        -5.529437662690       -2.075200692969       -2.212384192799
H        -2.490387114770       -2.906665564518       -3.579439523150
H         1.675087738572        3.514639806992       -1.458115996333
H        -0.116965875674        1.068581149519       -5.163647181683
H         1.470748269634       -1.655673714451       -4.142927345712
H         3.361564138064        1.115048483423       -4.937771405417
H        -5.134728946067       -2.640023263298        1.567623789643
velocities
        0.000241908669        0.000039611121       -0.000250932377
       -0.000163805243       -0.000115366290       -0.000017375326
       -0.000047784448        0.000248119899       -0.000074616012
        0.000272673498       -0.000017362735        0.000399681421
       -0.000326634443       -0.000254296236        0.000120448584
       -0.000094363714        0.000239927614       -0.000271069374
        0.000122625277        0.000053803004       -0.000088144918
       -0.000112099948       -0.000143815691        0.000140925518
       -0.000020483349       -0.000161160777        0.000050721656
        0.000277228119        0.000550968890       -0.000249788972
        0.000308946542        0.000944826745        0.000083253008
       -0.001453065687        0.000249483273       -0.000194390979
        0.000370071103        0.000328142273       -0.000594811431
        0.000983242907       -0.000247664001       -0.000337676641
        0.000702749595       -0.000531050917       -0.000068247339
       -0.000913913436       -0.000822599342       -0.000519543480
        0.000657300149       -0.001239306947        0.000033192915
        0.000763780031        0.000151892085       -0.000106941733
       -0.000111349513        0.000591872099        0.000360147787
        0.000283007739        0.000537032161        0.000183614425
       -0.001766985000        0.001017499281       -0.000870068723
        0.001560592306       -0.000636221326        0.001124910644
       -0.000596019125        0.001094375746        0.000048984716

Kinetic energy    :        0.030613110934
Electronic energy :      -60.105483063648
Total energy      :      -60.074869952714
Conservation      :       -0.000000051487
self.forces:
       -0.000751933584       -0.004126331042       -0.004033882094
       -0.034302855990        0.029127675777       -0.007001211293
        0.037731564948       -0.009915059812        0.020878531238
       -0.109763802365       -0.102520873021       -0.034608644850
        0.033373305433        0.018949006487       -0.015434320612
        0.078807110369        0.101440274624        0.031960385836
        0.027868883444       -0.012844760956        0.009625682828
        0.011817203866       -0.011548503873        0.038027933611
       -0.016951256413       -0.005848802217       -0.020755575427
        0.002823354740        0.003214778324       -0.005974478408
       -0.012101005124       -0.007850077809        0.000381372379
        0.001958908572       -0.006446492464        0.003077496955
        0.005728700900       -0.005220923285       -0.001710604936
       -0.006358072353       -0.016410380723       -0.003938145281
        0.000121143012        0.012930928986       -0.005592639661
        0.008318475112        0.004530628154        0.009023640965
       -0.010548513939       -0.005006070272        0.008756275583
       -0.000601535778       -0.003075790288       -0.006209965764
       -0.002729816846       -0.003390850759       -0.001421821138
       -0.023967939963        0.000603482820       -0.016983439682
       -0.006731466272        0.010586445711       -0.001984503303
        0.009694983786        0.008555900046        0.002598629870
        0.006564564487        0.004265795793        0.001319282998
state for next step: 1

    time: 0.2fs
coordinates
C         3.952444338331        0.353499658087        0.155475597879
C         2.898759709487        0.271561183058        2.878962426315
C         0.377507660095        1.575527713456        2.766723501812
N        -0.435656339866        0.616843403256        0.264424997127
C         1.700335308734        1.369156629701       -1.411382740946
C        -2.337147095089       -0.967913098150       -0.045537023463
C        -3.526272967903       -1.434075863003       -2.507321890479
C         1.622297308900        0.380583237194       -4.021983342405
O        -3.540891745414       -1.784144627448        2.005202557948
H         4.590691590007       -1.467822752968       -0.627674161136
H         5.486618188590        1.704246328926        0.014750183919
H         2.660849255805       -1.743362985878        3.501798747714
H         4.277029595067        1.121286334364        4.194254865266
H         0.568970284045        3.642407977900        2.660909012456
H        -1.014510536177        1.242297828699        4.266572018582
H        -3.406669591714        0.378282552422       -3.550366695442
H        -5.529437662690       -2.075200692969       -2.212384192799
H        -2.490387114770       -2.906665564518       -3.579439523150
H         1.675087738572        3.514639806992       -1.458115996333
H        -0.116965875674        1.068581149519       -5.163647181683
H         1.470748269634       -1.655673714451       -4.142927345712
H         3.361564138064        1.115048483423       -4.937771405417
H        -5.134728946067       -2.640023263298        1.567623789643
velocities
        0.000241908669        0.000039611121       -0.000250932377
       -0.000163805243       -0.000115366290       -0.000017375326
       -0.000047784448        0.000248119899       -0.000074616012
        0.000272673498       -0.000017362735        0.000399681421
       -0.000326634443       -0.000254296236        0.000120448584
       -0.000094363714        0.000239927614       -0.000271069374
        0.000122625277        0.000053803004       -0.000088144918
       -0.000112099948       -0.000143815691        0.000140925518
       -0.000020483349       -0.000161160777        0.000050721656
        0.000277228119        0.000550968890       -0.000249788972
        0.000308946542        0.000944826745        0.000083253008
       -0.001453065687        0.000249483273       -0.000194390979
        0.000370071103        0.000328142273       -0.000594811431
        0.000983242907       -0.000247664001       -0.000337676641
        0.000702749595       -0.000531050917       -0.000068247339
       -0.000913913436       -0.000822599342       -0.000519543480
        0.000657300149       -0.001239306947        0.000033192915
        0.000763780031        0.000151892085       -0.000106941733
       -0.000111349513        0.000591872099        0.000360147787
        0.000283007739        0.000537032161        0.000183614425
       -0.001766985000        0.001017499281       -0.000870068723
        0.001560592306       -0.000636221326        0.001124910644
       -0.000596019125        0.001094375746        0.000048984716

Kinetic energy    :        0.030613110934
Electronic energy :      -60.105483063648
Total energy      :      -60.074869952714
Conservation      :       -0.000000051487
self.forces:
       -0.000751933584       -0.004126331042       -0.004033882094
       -0.034302855990        0.029127675777       -0.007001211293
        0.037731564948       -0.009915059812        0.020878531238
       -0.109763802365       -0.102520873021       -0.034608644850
        0.033373305433        0.018949006487       -0.015434320612
        0.078807110369        0.101440274624        0.031960385836
        0.027868883444       -0.012844760956        0.009625682828
        0.011817203866       -0.011548503873        0.038027933611
       -0.016951256413       -0.005848802217       -0.020755575427
        0.002823354740        0.003214778324       -0.005974478408
       -0.012101005124       -0.007850077809        0.000381372379
        0.001958908572       -0.006446492464        0.003077496955
        0.005728700900       -0.005220923285       -0.001710604936
       -0.006358072353       -0.016410380723       -0.003938145281
        0.000121143012        0.012930928986       -0.005592639661
        0.008318475112        0.004530628154        0.009023640965
       -0.010548513939       -0.005006070272        0.008756275583
       -0.000601535778       -0.003075790288       -0.006209965764
       -0.002729816846       -0.003390850759       -0.001421821138
       -0.023967939963        0.000603482820       -0.016983439682
       -0.006731466272        0.010586445711       -0.001984503303
        0.009694983786        0.008555900046        0.002598629870
        0.006564564487        0.004265795793        0.001319282998
state for next step: 1

    time: 0.2fs
coordinates
C         3.952444338331        0.353499658087        0.155475597879
C         2.898759709487        0.271561183058        2.878962426315
C         0.377507660095        1.575527713456        2.766723501812
N        -0.435656339866        0.616843403256        0.264424997127
C         1.700335308734        1.369156629701       -1.411382740946
C        -2.337147095089       -0.967913098150       -0.045537023463
C        -3.526272967903       -1.434075863003       -2.507321890479
C         1.622297308900        0.380583237194       -4.021983342405
O        -3.540891745414       -1.784144627448        2.005202557948
H         4.590691590007       -1.467822752968       -0.627674161136
H         5.486618188590        1.704246328926        0.014750183919
H         2.660849255805       -1.743362985878        3.501798747714
H         4.277029595067        1.121286334364        4.194254865266
H         0.568970284045        3.642407977900        2.660909012456
H        -1.014510536177        1.242297828699        4.266572018582
H        -3.406669591714        0.378282552422       -3.550366695442
H        -5.529437662690       -2.075200692969       -2.212384192799
H        -2.490387114770       -2.906665564518       -3.579439523150
H         1.675087738572        3.514639806992       -1.458115996333
H        -0.116965875674        1.068581149519       -5.163647181683
H         1.470748269634       -1.655673714451       -4.142927345712
H         3.361564138064        1.115048483423       -4.937771405417
H        -5.134728946067       -2.640023263298        1.567623789643
velocities
        0.000241908669        0.000039611121       -0.000250932377
       -0.000163805243       -0.000115366290       -0.000017375326
       -0.000047784448        0.000248119899       -0.000074616012
        0.000272673498       -0.000017362735        0.000399681421
       -0.000326634443       -0.000254296236        0.000120448584
       -0.000094363714        0.000239927614       -0.000271069374
        0.000122625277        0.000053803004       -0.000088144918
       -0.000112099948       -0.000143815691        0.000140925518
       -0.000020483349       -0.000161160777        0.000050721656
        0.000277228119        0.000550968890       -0.000249788972
        0.000308946542        0.000944826745        0.000083253008
       -0.001453065687        0.000249483273       -0.000194390979
        0.000370071103        0.000328142273       -0.000594811431
        0.000983242907       -0.000247664001       -0.000337676641
        0.000702749595       -0.000531050917       -0.000068247339
       -0.000913913436       -0.000822599342       -0.000519543480
        0.000657300149       -0.001239306947        0.000033192915
        0.000763780031        0.000151892085       -0.000106941733
       -0.000111349513        0.000591872099        0.000360147787
        0.000283007739        0.000537032161        0.000183614425
       -0.001766985000        0.001017499281       -0.000870068723
        0.001560592306       -0.000636221326        0.001124910644
       -0.000596019125        0.001094375746        0.000048984716

Kinetic energy    :        0.030613110934
Electronic energy :      -60.105483063648
Total energy      :      -60.074869952714
Conservation      :       -0.000000051487
self.forces:
       -0.000751933584       -0.004126331042       -0.004033882094
       -0.034302855990        0.029127675777       -0.007001211293
        0.037731564948       -0.009915059812        0.020878531238
       -0.109763802365       -0.102520873021       -0.034608644850
        0.033373305433        0.018949006487       -0.015434320612
        0.078807110369        0.101440274624        0.031960385836
        0.027868883444       -0.012844760956        0.009625682828
        0.011817203866       -0.011548503873        0.038027933611
       -0.016951256413       -0.005848802217       -0.020755575427
        0.002823354740        0.003214778324       -0.005974478408
       -0.012101005124       -0.007850077809        0.000381372379
        0.001958908572       -0.006446492464        0.003077496955
        0.005728700900       -0.005220923285       -0.001710604936
       -0.006358072353       -0.016410380723       -0.003938145281
        0.000121143012        0.012930928986       -0.005592639661
        0.008318475112        0.004530628154        0.009023640965
       -0.010548513939       -0.005006070272        0.008756275583
       -0.000601535778       -0.003075790288       -0.006209965764
       -0.002729816846       -0.003390850759       -0.001421821138
       -0.023967939963        0.000603482820       -0.016983439682
       -0.006731466272        0.010586445711       -0.001984503303
        0.009694983786        0.008555900046        0.002598629870
        0.006564564487        0.004265795793        0.001319282998
state for next step: 1

The contents of out_file look like this:

23

C         3.952444338331        0.353499658087        0.155475597879
C         2.898759709487        0.271561183058        2.878962426315
C         0.377507660095        1.575527713456        2.766723501812
N        -0.435656339866        0.616843403256        0.264424997127
C         1.700335308734        1.369156629701       -1.411382740946
C        -2.337147095089       -0.967913098150       -0.045537023463
C        -3.526272967903       -1.434075863003       -2.507321890479
C         1.622297308900        0.380583237194       -4.021983342405
O        -3.540891745414       -1.784144627448        2.005202557948
H         4.590691590007       -1.467822752968       -0.627674161136
H         5.486618188590        1.704246328926        0.014750183919
H         2.660849255805       -1.743362985878        3.501798747714
H         4.277029595067        1.121286334364        4.194254865266
H         0.568970284045        3.642407977900        2.660909012456
H        -1.014510536177        1.242297828699        4.266572018582
H        -3.406669591714        0.378282552422       -3.550366695442
H        -5.529437662690       -2.075200692969       -2.212384192799
H        -2.490387114770       -2.906665564518       -3.579439523150
H         1.675087738572        3.514639806992       -1.458115996333
H        -0.116965875674        1.068581149519       -5.163647181683
H         1.470748269634       -1.655673714451       -4.142927345712
H         3.361564138064        1.115048483423       -4.937771405417
H        -5.134728946067       -2.640023263298        1.567623789643

23

C         3.952444338331        0.353499658087        0.155475597879
C         2.898759709487        0.271561183058        2.878962426315
C         0.377507660095        1.575527713456        2.766723501812
N        -0.435656339866        0.616843403256        0.264424997127
C         1.700335308734        1.369156629701       -1.411382740946
C        -2.337147095089       -0.967913098150       -0.045537023463
C        -3.526272967903       -1.434075863003       -2.507321890479
C         1.622297308900        0.380583237194       -4.021983342405
O        -3.540891745414       -1.784144627448        2.005202557948
H         4.590691590007       -1.467822752968       -0.627674161136
H         5.486618188590        1.704246328926        0.014750183919
H         2.660849255805       -1.743362985878        3.501798747714
H         4.277029595067        1.121286334364        4.194254865266
H         0.568970284045        3.642407977900        2.660909012456
H        -1.014510536177        1.242297828699        4.266572018582
H        -3.406669591714        0.378282552422       -3.550366695442
H        -5.529437662690       -2.075200692969       -2.212384192799
H        -2.490387114770       -2.906665564518       -3.579439523150
H         1.675087738572        3.514639806992       -1.458115996333
H        -0.116965875674        1.068581149519       -5.163647181683
H         1.470748269634       -1.655673714451       -4.142927345712
H         3.361564138064        1.115048483423       -4.937771405417
H        -5.134728946067       -2.640023263298        1.567623789643

23

C         3.952444338331        0.353499658087        0.155475597879
C         2.898759709487        0.271561183058        2.878962426315
C         0.377507660095        1.575527713456        2.766723501812
N        -0.435656339866        0.616843403256        0.264424997127
C         1.700335308734        1.369156629701       -1.411382740946
C        -2.337147095089       -0.967913098150       -0.045537023463
C        -3.526272967903       -1.434075863003       -2.507321890479
C         1.622297308900        0.380583237194       -4.021983342405
O        -3.540891745414       -1.784144627448        2.005202557948
H         4.590691590007       -1.467822752968       -0.627674161136
H         5.486618188590        1.704246328926        0.014750183919
H         2.660849255805       -1.743362985878        3.501798747714
H         4.277029595067        1.121286334364        4.194254865266
H         0.568970284045        3.642407977900        2.660909012456
H        -1.014510536177        1.242297828699        4.266572018582
H        -3.406669591714        0.378282552422       -3.550366695442
H        -5.529437662690       -2.075200692969       -2.212384192799
H        -2.490387114770       -2.906665564518       -3.579439523150
H         1.675087738572        3.514639806992       -1.458115996333
H        -0.116965875674        1.068581149519       -5.163647181683
H         1.470748269634       -1.655673714451       -4.142927345712
H         3.361564138064        1.115048483423       -4.937771405417
H        -5.134728946067       -2.640023263298        1.567623789643
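
Since the real file holds thousands of blocks, the same flag-based idea can also be written so that each block is written out as soon as it is complete, instead of collecting everything in whole_data first. The following is only a sketch under a few assumptions: the names iter_coordinate_blocks and write_blocks are made up for illustration, the atom count is taken from the number of lines in each block rather than hard-coded as 23, and no extra blank line is written after the data.

```python
def iter_coordinate_blocks(lines):
    """Yield one list of lines per 'coordinates' ... 'velocities' block."""
    block = None  # None means we are currently outside a coordinates block
    for line in lines:
        if line.startswith('coordinates'):
            block = []          # start collecting a new block
        elif line.startswith('velocities'):
            if block:           # skip empty or unmatched blocks
                yield block
            block = None        # stop collecting until the next block
        elif block is not None:
            block.append(line)


def write_blocks(in_path, out_path):
    """Write each block as: atom count, blank line, coordinate lines."""
    count = 0
    with open(in_path) as src, open(out_path, 'w') as dst:
        for block in iter_coordinate_blocks(src):
            dst.write('%d\n\n' % len(block))
            dst.writelines(block)
            count += 1
    return count  # number of blocks written
```

Calling write_blocks('input', 'output') would then read and write the same two files as the loop above, while keeping only one block in memory at a time.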

Edit: you can replace the if grab_lines statement with:

if grab_lines: #in python 'is True' is implicit for many types.
    line = [line.split()[0]]+[str(float(element)*.529) for element in line.split()[1:]]+['\n']
    molecule_data.append('\t'.join(line))

The middle part of the line assignment is called a list comprehension. If any of this is really unclear, please ask.
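
For readers coming from C or Matlab, it may help to see the list comprehension next to the explicit loop it replaces. This is a small sketch, not the answer's exact code: the function name scale_coordinate_line is invented here, the default factor 0.529 is simply copied from the snippet above (it looks like a Bohr-to-Angstrom conversion, but that is an assumption), and unlike the snippet it does not leave a trailing tab before the newline.

```python
def scale_coordinate_line(line, factor=0.529):
    """Return the line with its numeric fields multiplied by factor, tab-separated."""
    fields = line.split()   # e.g. ['C', '3.95', '0.35', '0.15']
    symbol = fields[0]      # element symbol stays unchanged

    # List comprehension: build a new list by applying an expression to each item.
    scaled = [str(float(value) * factor) for value in fields[1:]]

    # The comprehension above is equivalent to this explicit loop:
    #     scaled = []
    #     for value in fields[1:]:
    #         scaled.append(str(float(value) * factor))

    return '\t'.join([symbol] + scaled) + '\n'
```

Inside the main loop this would be used as molecule_data.append(scale_coordinate_line(line)).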

Answer 1 (score: 0)

grab_lines = False
with open('input','r') as atom_file:
    molecule_data = ['23\n\n']
    for line in atom_file:
        if line.startswith('coordinates'):
            grab_lines = True
            continue
        elif line.startswith('velocities'):
            grab_lines = False
            #if molecule_data:
                #just checks that we aren't appending an empty list.
                #molecule_data.append('\n')
                #whole_data.append(molecule_data)
                #molecule_data = ['23\n\n']
        if grab_lines: #in python 'is True' is implicit for many types.
            molecule_data.append(line)  #list.append returns None, so there is nothing useful to assign

with open('output','w') as out_file:
    for molecule in molecule_data:
        out_file.write(molecule)
        #out_file.write(''.join(molecule))
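
Neither answer covers the kinetic and total energies, which the script in the question extracts with grep and awk through os.system. A pure-Python version of that step might look like the sketch below. It assumes the energy lines always have the "label : value" shape shown in the sample, and read_energies is a hypothetical helper name, not something from either answer.

```python
def read_energies(lines, label):
    """Collect the numeric value from every line that starts with label.

    Assumes lines shaped like 'Total energy      :      -60.0748'.
    """
    values = []
    for line in lines:
        if line.startswith(label):
            # The value is whatever follows the last colon on the line;
            # float() ignores the surrounding spaces and the newline.
            values.append(float(line.rsplit(':', 1)[1]))
    return values
```

For example, opening dynamics.out and passing the file object with the label 'Total energy' would return one float per data block, which can then be written to total.dat one value per line.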