Deleting lines from a text file with Python

Time: 2015-02-02 22:37:51

Tags: python regex

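I am processing a very large log file to extract information with Python regular expressions. However, I only want to process the lines that come after a specific string is found, in this case Starting time loop. A minimal version of the log file looks like this: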

Pstream initialized with:
floatTransfer      : 0
nProcsSimpleSum    : 0
commsType          : nonBlocking
polling iterations : 0
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Disallowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0


PIMPLE: Operating solver in PISO mode


Reading g

Reading relaxProperties
Reading field p_rgh

Reading field alpha1

Reading field Urel

Reading/calculating face flux field phi

Reading transportProperties

Selecting incompressible transport model Newtonian
Selecting incompressible transport model Newtonian
Selecting turbulence model type LESModel
Selecting LES turbulence model Smagorinsky
Selecting LES delta type vanDriest
Selecting LES delta type cubeRootVol
SmagorinskyCoeffs
{
   ce              1.048;
   ck              0.02;
}

Reading STFProperties

Calculating field g.h

time step continuity errors : sum local = 8.4072346e-06, global = -1.5271655e-21, cumulative = -1.5271655e-21
GAMGPCG:  Solving for pcorr, Initial residual = 1, Final residual = 4.7194845e-06, No Iterations 9
GAMGPCG:  Solving for pcorr, Initial residual = 0.13716381, Final residual = 2.9068099e-06, No Iterations 6
time step continuity errors : sum local = 1.3456802e-10, global = -6.7890391e-13, cumulative = -6.7890392e-13
Courant Number mean: 0.021611246 max: 0.39023401
fieldAverage fieldAverage1:
Starting averaging at time 0


Starting time loop

Courant Number mean: 0.02156811 max: 0.3894551
Interface Courant Number mean: 0 max: 0
deltaT = 0.00022522523
Time = 0.000225225

Currently, the test script looks like this:

logf = open(logName, 'r')
p = logf.tell()
logf.seek(0, 0)
for l in logf:
    if l.startswith('Starting time loop'):
        print(l)

However, the print call prints every line in the log file. Note that the log file is opened as logf.

3 Answers:

Answer 0 (score: 4)

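The nice thing about Python iterators (and file objects are iterators) is that they keep their state, so if you write two for loops, the second one starts where the first one stopped. This leads to the following conventional pattern: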

for line in logf:
    if line.startswith('Starting time loop'):
        break

for line in logf:
    # process the lines after the marker line
    print(line, end='')
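A self-contained sketch of the same two-loop pattern, run against an in-memory io.StringIO stand-in for logf (an assumption, so the snippet runs without a real log file). The function name process_after_marker is hypothetical. The for/else clause is an optional addition: its else branch runs only when the first loop finishes without hitting break, i.e. when the marker never appears.

```python
import io


def process_after_marker(f, marker):
    """Return the lines that follow the first line starting with `marker`.

    Hypothetical helper; relies on the file object (or any line iterator)
    keeping its position between the two loops.
    """
    for line in f:
        if line.startswith(marker):
            break
    else:
        # The first loop reached the end without finding the marker.
        return []

    # This second loop resumes right after the marker line.
    return [line.rstrip('\n') for line in f]


logf = io.StringIO(
    "Calculating field g.h\n"
    "Starting time loop\n"
    "\n"
    "deltaT = 0.00022522523\n"
    "Time = 0.000225225\n"
)
print(process_after_marker(logf, 'Starting time loop'))
# ['', 'deltaT = 0.00022522523', 'Time = 0.000225225']
```

Without the else branch, a missing marker would silently exhaust the file in the first loop and the second loop would do nothing.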

Another, more concise approach is itertools.dropwhile.
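A minimal sketch of the itertools.dropwhile approach, again using io.StringIO in place of the real log file; the helper name from_marker is hypothetical. dropwhile skips items while the predicate is true and then yields everything that remains, so the marker line itself is included. Wrapping the result in itertools.islice(..., 1, None) drops the marker line too.

```python
import io
from itertools import dropwhile, islice

MARKER = 'Starting time loop'


def from_marker(lines, marker=MARKER):
    """Yield the first line starting with `marker` and every line after it."""
    return dropwhile(lambda l: not l.startswith(marker), lines)


logf = io.StringIO(
    "Starting averaging at time 0\n"  # starts with "Starting", but not with MARKER
    "Starting time loop\n"
    "Courant Number mean: 0.02156811 max: 0.3894551\n"
    "deltaT = 0.00022522523\n"
)

# islice(..., 1, None) skips the marker line and keeps only what follows it.
for line in islice(from_marker(logf), 1, None):
    print(line, end='')
```

Because the predicate uses startswith with the full marker, the earlier "Starting averaging at time 0" line is still skipped.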

Answer 1 (score: 1)

The code below reads one line at a time. When the end of the file is reached, line is an empty string and the loop breaks.

with open('your_file.txt', 'r') as opened_file:

    while True:
        line = opened_file.readline()
        # readline() returns '' only at end of file; a blank line is '\n'
        if not line:
            break

        else:
            # Your code goes here
            if line.startswith('Starting time loop'):
                print(line, end='')

                break

It is probably better to use with open(), because it closes the file automatically when you are done with it.
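As written, the loop above breaks right after printing the marker line, so nothing after it gets processed. A hedged variant that keeps reading past the marker is sketched below. It uses the two-argument form iter(callable, sentinel), which keeps calling opened_file.readline until it returns the empty string at end of file. The helper read_after_marker is hypothetical, and the sketch writes a small temporary sample file so it runs as-is.

```python
import os
import tempfile

MARKER = 'Starting time loop'


def read_after_marker(path, marker=MARKER):
    """Return the lines after the first line starting with `marker`."""
    result = []
    started = False
    with open(path, 'r') as opened_file:
        # iter(readline, '') stops when readline() returns '' at end of file;
        # blank lines come back as '\n', so they do not end the loop.
        for line in iter(opened_file.readline, ''):
            if started:
                result.append(line.rstrip('\n'))
            elif line.startswith(marker):
                started = True
    return result


# Write a small sample log to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Create time\nStarting time loop\n\ndeltaT = 0.00022522523\n")
    path = tmp.name

try:
    print(read_after_marker(path))  # ['', 'deltaT = 0.00022522523']
finally:
    os.remove(path)
```

The file is still closed automatically by the with block inside read_after_marker, even if processing raises an exception.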

Answer 2 (score: 1)

Without seeing exactly how you open the log file, it is hard to give good feedback on your little script.

However, here is a small script that does what you ask:

#!/usr/bin/env python3
logfile = 'logfile'

start_line = 'Starting time loop'
started = False

with open(logfile) as f:
    # Iterate over the file directly instead of f.readlines(),
    # so a large log is not read into memory all at once.
    for l in f:
        if l.startswith(start_line):
            started = True
        if started:
            print(l.strip())

Here is a sample log file:

$ cat logfile
This is the first line
This is the 2nd line

This is the 3rd non-blank line

Starting time loop and here we go

Here are some more lines
and some more
yadda yadda yadda
yadda yadda yadda
yadda yadda yadda
...
And.. we're done

Finally, here is a run of the little log script:

$ ./log.py
Starting time loop and here we go

Here are some more lines
and some more
yadda yadda yadda
yadda yadda yadda
yadda yadda yadda
...
And.. we're done
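Since the question mentions extracting information with regular expressions, here is one hedged way to combine the started flag from this answer with re. The helpers lines_after and extract_times are hypothetical, and the pattern ^Time = (\S+) is an assumption based only on the "Time = 0.000225225" line in the question's sample log; the first sample line below is made up to show that matches before the start line are ignored.

```python
import io
import re

START_LINE = 'Starting time loop'
# Assumed format, taken from the "Time = 0.000225225" line in the sample log.
TIME_RE = re.compile(r'^Time = (\S+)')


def lines_after(f, start_line=START_LINE):
    """Yield every line after the first one that starts with start_line."""
    started = False
    for line in f:
        if started:
            yield line
        elif line.startswith(start_line):
            started = True


def extract_times(f):
    """Return the simulation times found after the start line, as floats."""
    times = []
    for line in lines_after(f):
        match = TIME_RE.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


sample = io.StringIO(
    "Time = 0\n"  # hypothetical line before the start line; skipped
    "Starting time loop\n"
    "\n"
    "deltaT = 0.00022522523\n"
    "Time = 0.000225225\n"
)
print(extract_times(sample))  # [0.000225225]
```

Because lines_after is a generator reading the file lazily, this works on a real file object opened with open() without loading the whole log into memory.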