Extract text from a text file and write it in a different format

Time: 2013-04-29 20:20:28

Tags: python parsing text

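Hello, I am trying to extract some lines of text from a file generated by a program and write them to another text file in a different format, using Python.

This is what I have done so far: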

import os

# Raw string so that the backslashes in the Windows path are taken literally
path = r"D:\Programming\Python\Examples\Home\GainWizard\MassLynx\VxWorks\TargetRegistryFiles"
os.chdir(path)
print(os.getcwd())
print(os.listdir(path))

# Keep only regular files, then pick the most recently modified one
filelist = os.listdir(os.getcwd())
filelist = [x for x in filelist if not os.path.isdir(x)]
newest = max(filelist, key=lambda x: os.stat(x).st_mtime)

print(newest)
with open(newest, 'r') as f:
    data = f.readlines()
print(data)

This puts all of the text into a list.

What I have is:

Autotune Ion Energy:Fixed Ion Energy 1,2.000000,Autotune Ion Energy:Fixed Ion Energy      2,2.000000,Autotune Ion Energy:MS1-Neg Opt,0.3,Autotune Ion Energy:MS1-Pos Opt,-0.2,Autotune Ion Energy:MS2-Neg Opt,0.4,Autotune Ion Energy:MS2-Pos Opt,0.6,Autotune Ion Energy:MSMS Mode Fixed Ion Energy 1,0.500000,Autotune Ion Energy:MSMS Mode Fixed Ion Energy 2,2.000000,Autotune Ion Energy:OptimumValuesSet,true,Debug:Use old bunching method,true,Detector Gain Negative:High Gain,368.861012,Detector Gain Negative:Low Gain,73.523644,Detector Gain Negative:a,1.865677e-021,Detector Gain Negative:b,8.441605,Detector Gain Postitve:High Gain,613.662847,Detector Gain Postitve:Low Gain,124.065398,Detector Gain Postitve:a,4.973557e-021,Detector Gain Postitve:b,8.367407,DivertValve:ValveZone,0,Engineers Settings:MS1 DC Balance -,0.300000,Engineers Settings:MS1 DC Polarity,1,Engineers Settings:MS1 High Mass Position,174.000000,Engineers Settings:MS1 High Mass Resolution,1801.000000,Engineers Settings:MS1 Low Mass Position,519.000000,Engineers Settings:MS1 Low Mass Resolution,511.000000,Engineers Settings:MS1 Resolution Linearity,873.000000,Engineers Settings:MS2 DC Balance -,-0.200000,Engineers Settings:MS2 DC Polarity,0,Engineers Settings:MS2 High Mass Position,190.000000,Engineers Settings:MS2 High Mass Resolution,1744.000000,Engineers Settings:MS2 Low Mass Position,519.000000,Engineers Settings:MS2 Low Mass Resolution,514.000000,Engineers Settings:MS2 Resolution Linearity,857.000000,Engineers Settings:PIC MS Scan CE,4.000000,Engineers Settings:PIC Threshold Calc Scan Delay,3,Engineers Settings:PIC decreasing data points,3,Engineers Settings:PIC nonDefault Scan Speed,5000.000000,Engineers Settings:PMT Type,Hamamatsu,Engineers Settings:RF Offset Negative,0.000000,Engineers Settings:RF Offset Positive,0.000000,Failure:Gas failed state,OK,Failure:Leak detected state,Tripped,Fluidics:AcknowledgeCountThreshold,5,Fluidics:ActiveReservoir,2,Fluidics:Aspirate Rate,1000,Fluidics:Draw 
Rate,1000,Fluidics:Fill Volume,250,Fluidics:Flow Rate,10,Fluidics:Flow State,Waste,Fluidics:Inject-Flow Rate,400,Fluidics:Inject-MethodType,4,Fluidics:Inject-Pump Time1,5,Fluidics:Inject-Pump Time2,6,Fluidics:Inject-Pump Time3,10,Fluidics:Max Flow Rate,1500,Fluidics:Pending Active TimeOut,10,Fluidics:Pending Complete TimeOut,1200,Fluidics:Pending Response TimeOut,10,Fluidics:Power Cycle Delay,3.000000,Fluidics:Precompression Dispense Rate,300,Fluidics:Precompression Dispense Volume,30,Fluidics:Precompression Enable,TRUE,Fluidics:Precompression Max Fill Volume,280,Fluidics:Purge Delay Length,1,Fluidics:Refill Wait Time,60.000000,Fluidics:Sample Purge Count,0,Fluidics:Wash Purge Count,1,Instrument:Collision gas status,off,Instrument:EPC Version,Feb 15 2012,Instrument:Serial Number,QCA331,Instrument:Unique Name,,Ion Energy Settings:Fixed Ion Energy 1,3.000000,Ion Energy Settings:Fixed Ion Energy 2,3.000000,Maintenance Counters:DAYS_SINCE_LAST_SERVICE_THRESHOLD,0,Maintenance Counters:OPERATE_SWITCHES,28,Maintenance Counters:OPERATE_SWITCHES_THRESHOLD,0,Maintenance Counters:OPERATE_TIME,141233,Maintenance Counters:OPERATE_TIME_THRESHOLD,0,Maintenance Counters:POLARITY_SWITCHES,187,Maintenance Counters:POLARITY_SWITCHES_THRESHOLD,0,Maintenance Counters:VACUUM_TIME,763973,Maintenance Counters:VACUUM_TIME_THRESHOLD,0,Protective Actions:ENABLE_DIVERT_TO_WASTE,1,Scan Parameters:Interchannel Delay,0.020000,Scan Parameters:Interscan Delay,0.020000,Scan Parameters:Manual Mode,true,Scan Parameters:Polarity Switching Interscan Delay,0.020000,Scan Parameters:Scan Speed Options,1000\,2000\,5000\,10000,Scan speed adjust::DefaultsVersionLevel,2,Scan speed adjust:HIGH_SCALE_MASS_ADJUST_MS1_SETTING,-60.000000,Scan speed adjust:HIGH_SCALE_MASS_ADJUST_MS2_SETTING,-32.000000,Scan speed adjust:ION_ENERGY_1_RAMP_SETTING,2.000000,Scan speed adjust:ION_ENERGY_2_RAMP_SETTING,2.000000,Scan speed adjust:LINEARITY_ADJUST_MS1_SETTING,0.000000,Scan speed 
adjust:LINEARITY_ADJUST_MS2_SETTING,0.000000,Scan speed adjust:LOW_MASS_RESOLUTION_MS1_SETTING,10.000000,Scan speed adjust:LOW_MASS_RESOLUTION_MS2_SETTING,20.000000,Scan speed adjust:LOW_SCALE_MASS_ADJUST_MS1_SETTING,-15.000000,Scan speed adjust:LOW_SCALE_MASS_ADJUST_MS2_SETTING,-15.000000,Scan speed adjust:MS1_ION_ENERGY_SETTING,1.000000,Scan speed adjust:MS1_ION_ENERGY_WRITE_SETTING,1.000000,Scan speed adjust:MS2_ION_ENERGY_SETTING,0.700000,Scan speed adjust:MS2_ION_ENERGY_WRITE_SETTING,0.700000,Scan speed adjust:RESOLUTION_ADJUST_MS1_SETTING,-15.000000,Scan speed adjust:RESOLUTION_ADJUST_MS2_SETTING,0.000000

What I need is:

START_TARGET_REGISTRY
Detector Gain Negative:a,1.087668e-021
Detector Gain Negative:b,8.536190
Detector Gain Negative:High Gain,392.233021 
Detector Gain Negative:Low Gain,76.782164
Detector Gain Postitve:a,4.061385e-021 
Detector Gain Postitve:b,8.398445
Detector Gain Postitve:High Gain,610.368775
Detector Gain Postitve:Low Gain,122.669833
END_TARGET_REGISTRY

Thanks

1 answer:

Answer 0: (score: 0)

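A few things are not quite clear, such as whether you need more parameters than just "Detector Gain", or where the numbers come from (since they do not appear in your example).

However, this might get you where you need to be: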

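from collections import OrderedDict

# data is the list of lines returned by f.readlines(); join it into one string
text = "".join(data)

D = OrderedDict()
k = None
for field in text.split(','):
    if ':' in field:
        k = field.strip()      # a CATEGORY:KEY field
    elif k is not None:
        D[k] = field.strip()   # the value that follows the key

with open(r"C:\temp\detector_gain.txt", 'w') as outfile:
    print("START_TARGET_REGISTRY", file=outfile)
    for k, v in D.items():
        if "Detector Gain" in k:
            print(k, v, sep=',', file=outfile)
    print("END_TARGET_REGISTRY", file=outfile)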

Since the data format appears to be CATEGORY_1:KEY_1,VALUE_1,CATEGORY_2:KEY_2,VALUE_2..., we use the split method to break the data into fields at each comma.
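To make the alternating layout concrete, here is a small sketch that splits a shortened sample into fields. It assumes that a backslash before a comma (as in the Scan Speed Options value in the question's data) marks a comma that belongs to the value; that is a guess from the sample, not something the file format confirms. split_fields is a hypothetical helper name, and a plain text.split(',') would instead cut that value into four pieces.

```python
import re

def split_fields(text):
    """Split on commas that are not preceded by a backslash (assumed escape)."""
    return re.split(r'(?<!\\),', text)

# Shortened sample taken from the question's data
sample = (r"Detector Gain Negative:a,1.865677e-021,"
          r"Scan Parameters:Scan Speed Options,1000\,2000\,5000\,10000")

fields = split_fields(sample)
for field in fields:
    print(field)
```

With this sample the keys land at even positions and the values at odd positions, and the escaped value stays in one piece.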

We then iterate over each field, looking for a : character, which tells us that we are reading a CATEGORY:KEY field.

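Once we have a CATEGORY:KEY field, we know that the next field will be the associated value. So we add it to a Python dictionary, which maps keys to values. I chose an OrderedDict in case the order of the fields matters.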
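The pairing step can be tried in isolation on a short sample. This is only a sketch of the idea described above; parse_registry is a hypothetical name, and the sample string is cut down from the question's data (including one key with an empty value).

```python
from collections import OrderedDict

def parse_registry(text):
    """Map each CATEGORY:KEY field to the field that follows it."""
    pairs = OrderedDict()
    key = None
    for field in text.split(','):
        field = field.strip()
        if ':' in field:
            key = field            # remember the key until its value arrives
        elif key is not None:
            pairs[key] = field     # the value for the most recent key
    return pairs

sample = ("Detector Gain Negative:High Gain,368.861012,"
          "Instrument:Unique Name,,"
          "Detector Gain Postitve:b,8.367407")

pairs = parse_registry(sample)
for key, value in pairs.items():
    print(repr(key), repr(value))
```

One limitation of detecting keys by the colon: a value that itself contains a : would be mistaken for a key, so this only works while values stay colon-free.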

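Finally, we read through the dictionary we built, looking for the "Detector Gain" fields. We then print them to an outfile; you can see how we open it with a context manager.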
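The writing step can also be sketched on its own. The helper name write_registry, its wanted parameter, and the temporary output path are all made up for illustration; the sketch uses outfile.write rather than print so that it runs unchanged on Python 2 and Python 3.

```python
import os
import tempfile
from collections import OrderedDict

def write_registry(pairs, path, wanted="Detector Gain"):
    """Write the matching key,value lines between the two marker lines."""
    with open(path, 'w') as outfile:
        outfile.write("START_TARGET_REGISTRY\n")
        for key, value in pairs.items():
            if wanted in key:
                outfile.write("{0},{1}\n".format(key, value))
        outfile.write("END_TARGET_REGISTRY\n")

# A few entries from the question's data, in file order
pairs = OrderedDict([
    ("Debug:Use old bunching method", "true"),
    ("Detector Gain Negative:High Gain", "368.861012"),
    ("Detector Gain Negative:Low Gain", "73.523644"),
])

out_path = os.path.join(tempfile.mkdtemp(), "detector_gain.txt")
write_registry(pairs, out_path)

with open(out_path) as result_file:
    written = result_file.read()
print(written)
```

Note that the lines come out in the order in which they appear in the input, which in the question's sample is High Gain, Low Gain, a, b rather than the a, b, High Gain, Low Gain order shown in the desired output. If that order matters, the matching keys would need to be sorted or listed explicitly before writing.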

If you are using Python 2, put from __future__ import print_function at the top of the file.