Python: How to match a string only if the next line contains a given string?

Time: 2012-04-05 15:26:23

Tags: python regex

I have a text file that looks like this:

node13 
    state = free 
np = 8 
properties = beta,eightcores 
ntype = cluster 
status = opsys=linux,uname=Linux node13 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=? 15201,nsessions=? 01,nusers=0,idletime=6837317,totmem=20506268kb,availmem=20259728kb,physmem=20506268kb,ncpus=8,loadave=0.00,gres=,netload=17130666575,se=free,jobs=,varattr=,rectime=1333639375 

node14 
    state = job-exclusive 
np = 8 
properties = beta,eightcores 
ntype = cluster

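I want to grab a node only when it is free. To do that, I need a regular expression that matches node(..) only if the following line contains state = free. Can you help me with this?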

Edit

Nothing has worked so far. Maybe that is because I am not reading from a file, but instead using

proc = subprocess.Popen("pbsnodes", stdout=subprocess.PIPE)
listOfFreeNodes = proc.stdout.read()

Could that affect the solution? Here is the complete pbsnodes output:

node01                                                   
     state = free                                        
     np = 8                                              
     properties = alpha,eightcores                       
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node01 2.6.27.19-5-01,nusers=0,idletime=861913,totmem=16432576kb,availmem=16=free,jobs=,varattr=,rectime=1333641123                  

node02                                                   
     state = free                                        
     np = 8                                              
     properties = alpha,eightcores                       
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node02 2.6.27.19-5-nusers=2,idletime=5357510,totmem=16432576kb,availmem=1617ree,jobs=,varattr=,rectime=1333641107                    

node03                                                   
     state = free                                        
     np = 8                                              
     properties = alpha,eightcores                       
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node03 2.6.27.19-5-s=1,idletime=8564681,totmem=16432576kb,availmem=16029924kobs=60966.hpchead.linux,varattr=,rectime=1333641119      

node04                                                   
     state = free                                        
     np = 8                                              
     properties = alpha,eightcores                       
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node04 2.6.27.19-5-01,nusers=0,idletime=8564678,totmem=16432576kb,availmem=1e=free,jobs=,varattr=,rectime=1333641124                 

node05                                                   
     state = free                                        
     np = 8                                              
     properties = alpha,eightcores                       
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node05 2.6.27.19-5-01,nusers=0,idletime=2072593,totmem=16432652kb,availmem=1=free,jobs=,varattr=,rectime=1333641091                  

node06                                                   
     state = free                                        
     np = 8                                              
     properties = alpha,eightcores                       
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node06 2.6.27.19-5-s=1,idletime=9038,totmem=16432576kb,availmem=16200960kb,p,varattr=,rectime=1333641096                             

node07                                                   
     state = free                                        
     np = 8                                              
     properties = alpha,eightcores                       
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node07 2.6.27.19-5-s=1,idletime=8564671,totmem=16432576kb,availmem=16173848kobs=,varattr=,rectime=1333641134                         

node08                                                   
     state = free                                        
     np = 8                                              
     properties = alpha,eightcores                       
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node08 2.6.27.19-5- 21356,nsessions=5,nusers=1,idletime=8564604,totmem=1643219260329746,state=free,jobs=,varattr=,rectime=1333641095 

node09                                                   
     state = free                                        
     np = 8                                              
     properties = alpha,eightcores                       
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node09 2.6.27.19-5-01,nusers=0,idletime=8564648,totmem=16432552kb,availmem=1e=free,jobs=,varattr=,rectime=1333641126                 

node10                                                   
     state = free                                        
     np = 8                                              
     properties = alpha,eightcores                       
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node10 2.6.27.19-5-2,nsessions=5,nusers=1,idletime=6821493,totmem=16432552kb036941,state=free,jobs=,varattr=,rectime=1333641133      

node11                                                   
     state = free                                        
     np = 8                                              
     properties = alpha,eightcores                       
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node11 2.6.27.19-5-01,nusers=0,idletime=8564599,totmem=16432556kb,availmem=1e=free,jobs=,varattr=,rectime=1333641120                 

node12                                                   
     state = free                                        
     np = 8                                              
     properties = alpha,eightcores                       
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node12 2.6.27.19-5-01,nusers=0,idletime=8564627,totmem=16432556kb,availmem=1e=free,jobs=,varattr=,rectime=1333641121                 

node13                                                   
     state = free                                        
     np = 8                                              
     properties = beta,eightcores                        
     ntype = cluster                                     
     status = opsys=linux,uname=Linux node13 2.6.27.19-5-01,nusers=0,idletime=6839072,totmem=20506268kb,availmem=2e=free,jobs=,varattr=,rectime=1333641130                 

node14                                                   
     state = job-exclusive                               
     np = 8                                              
     properties = beta,eightcores                        
     ntype = cluster                                     
     jobs = 0/66481.hpchead.linux, 1/66481.hpchead.linux,chead.linux, 6/66481.hpchead.linux, 7/66481.hpchead.linux
     status = opsys=linux,uname=Linux node14 2.6.27.19-5-,nusers=1,idletime=8568052,totmem=24635060kb,availmem=206free,jobs=66481.hpchead.linux,varattr=,rectime=1333641132

node15                                                   
     state = job-exclusive                               
     np = 8                                              
     properties = beta,eightcores                        
     ntype = cluster                                     
     jobs = 0/66482.hpchead.linux, 1/66482.hpchead.linux,chead.linux, 6/66482.hpchead.linux, 7/66482.hpchead.linux
     status = opsys=linux,uname=Linux node15 2.6.27.19-5-,nusers=1,idletime=8567636,totmem=24635012kb,availmem=212free,jobs=66482.hpchead.linux,varattr=,rectime=1333641092

node16                                                   
     state = job-exclusive                               
     np = 8                                              
     properties = beta,eightcores                        
     ntype = cluster                                     
     jobs = 0/66481.hpchead.linux, 1/66481.hpchead.linux,chead.linux, 6/66481.hpchead.linux, 7/66481.hpchead.linux
     status = opsys=linux,uname=Linux node16 2.6.27.19-5-=1,idletime=8564418,totmem=24634928kb,availmem=20700104kbbs=66481.hpchead.linux,varattr=,rectime=1333641117       

node17                                                   
     state = job-exclusive                               
     np = 8                                              
     properties = beta,eightcores                        
     ntype = cluster                                     
     jobs = 0/66482.hpchead.linux, 1/66482.hpchead.linux,chead.linux, 6/66482.hpchead.linux, 7/66482.hpchead.linux
     status = opsys=linux,uname=Linux node17 2.6.27.19-5-s=1,idletime=6824915,totmem=24634928kb,availmem=20598068kbs=66482.hpchead.linux,varattr=,rectime=1333641113       

node21                                                   
     state = job-exclusive                               
     np = 12                                             
     properties = blade                                  
     ntype = cluster                                     
     jobs = 0/66483.hpchead.linux, 1/66483.hpchead.linux,chead.linux, 6/66483.hpchead.linux, 7/66483.hpchead.linux.hpchead.linux                                           
     status = opsys=linux,uname=Linux node21 2.6.27.19-5-,nusers=1,idletime=8569176,totmem=26790348kb,availmem=203e=free,jobs=66483.hpchead.linux,varattr=,rectime=13336411

node22                                                   
     state = job-exclusive                               
     np = 12                                             
     properties = blade                                  
     ntype = cluster                                     
     jobs = 0/66475.hpchead.linux, 1/66475.hpchead.linux,chead.linux, 6/66475.hpchead.linux, 7/66475.hpchead.linux.hpchead.linux                                           
     status = opsys=linux,uname=Linux node22 2.6.27.19-5-users=1,idletime=8569178,totmem=26790348kb,availmem=21384free,jobs=66475.hpchead.linux,varattr=,rectime=1333641118

node23                                                   
     state = job-exclusive                               
     np = 12                                             
     properties = blade
     ntype = cluster
     jobs = 0/66484.hpchead.linux, 1/66484.hpchead.linux, 2/66484.hpchead.linux, 3/66484.hpchead.linux, 4/66484.hpchead.linux, 5/66484.hpchead.linux, 6/66484.hpchead.linux, 7/66484.hpchead.linux, 8/66484.hpchead.linux, 9/66484.hpchead.linux, 10/66484.hpchead.linux, 11/66484.hpchead.linux
     status = opsys=linux,uname=Linux node23 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=10309 10370,nsessions=2,nusers=1,idletime=8569255,totmem=26790348kb,availmem=20165484kb,physmem=24685876kb,ncpus=12,loadave=12.01,gres=,netload=21742922098,state=free,jobs=66484.hpchead.linux,varattr=,rectime=1333641120

node24
     state = job-exclusive
     np = 12
     properties = blade
     ntype = cluster
     jobs = 0/66485.hpchead.linux, 1/66485.hpchead.linux, 2/66485.hpchead.linux, 3/66485.hpchead.linux, 4/66485.hpchead.linux, 5/66485.hpchead.linux, 6/66485.hpchead.linux, 7/66485.hpchead.linux, 8/66485.hpchead.linux, 9/66485.hpchead.linux, 10/66485.hpchead.linux, 11/66485.hpchead.linux
     status = opsys=linux,uname=Linux node24 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=11157 11218,nsessions=2,nusers=1,idletime=8569254,totmem=26790348kb,availmem=21489804kb,physmem=24685876kb,ncpus=12,loadave=12.05,gres=,netload=18486923435,state=free,jobs=66485.hpchead.linux,varattr=,rectime=1333641114

node25
     state = job-exclusive
     np = 12
     properties = blade
     ntype = cluster
     jobs = 0/66469.hpchead.linux, 1/66469.hpchead.linux, 2/66469.hpchead.linux, 3/66469.hpchead.linux, 4/66469.hpchead.linux, 5/66469.hpchead.linux, 6/66469.hpchead.linux, 7/66469.hpchead.linux, 8/66469.hpchead.linux, 9/66469.hpchead.linux, 10/66469.hpchead.linux, 11/66469.hpchead.linux
     status = opsys=linux,uname=Linux node25 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=6711 6772,nsessions=2,nusers=1,idletime=8569282,totmem=26790348kb,availmem=21082316kb,physmem=24685876kb,ncpus=12,loadave=12.00,gres=,netload=15199518313,state=free,jobs=66469.hpchead.linux,varattr=,rectime=1333641095

Edit

Thanks to everyone who answered.

6 answers:

Answer 0 (score: 4)

This should return the correct node values:

r'node\d+(?=[^\n]*\n\s*state\s*=\s*free)'

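This uses a positive lookahead to peek past the end of the line without capturing anything it finds there. Only the node value itself is matched.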

import re

# s holds the text shown in the question
l = re.findall(r'node\d+(?=[^\n]*\n\s*state\s*=\s*free)', s)
print(l)
# ['node13']

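Edit: Inspired by @hexparrot's comment, I realized there is a simpler way. The regex r'node\d+(?=\s*state\s*=\s*free)' is simpler and also works, even though it does not explicitly search for the newline (because \s includes EOL characters). But... it also does not guarantee that state = free is found on the following line, as stated in the OP's requirement. It would also match node99 state=free on a single line. So explicitly searching for \n is closer to what the OP asked for.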
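
To make the difference between the two patterns concrete, here is a small sketch. The sample text, including the node99 line, is made up for illustration and is not taken from the real pbsnodes output.

```python
import re

# Made-up sample: node99 has "state=free" on the SAME line,
# node13 has "state = free" on the FOLLOWING line.
sample = "node99 state=free\nnode13 \n    state = free \n"

# Simpler pattern: \s* may or may not cross a newline.
simple = r'node\d+(?=\s*state\s*=\s*free)'
# Stricter pattern: the lookahead must consume exactly one newline.
strict = r'node\d+(?=[^\n]*\n\s*state\s*=\s*free)'

simple_hits = re.findall(simple, sample)
strict_hits = re.findall(strict, sample)

print(simple_hits)  # ['node99', 'node13']
print(strict_hits)  # ['node13']
```

Only the stricter pattern ignores the same-line case, which is the behaviour the question describes.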

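Answer 1 (score: 3)

If the file is reliably structured (for example, it always follows the format you have shown), a regular expression can be heavier than necessary.

So here is an approach that uses simple iteration: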

with open('yourfile.txt', 'r') as fp:
    node_dict = {}
    node = None
    for line in fp:
        if line[0:4] == 'node':
            node = line.strip()
            node_dict[node] = 0
        # Only the "state = ..." line; the status line also contains "state=".
        elif line.strip().startswith('state'):
            node_dict[node] = line.split('=')[1].strip()

print(node_dict)

This returns:

{'node13': 'free', 'node14': 'job-exclusive'}

From there it is easy to get the "free" nodes:

>>> print([k for k, v in node_dict.items() if v == 'free'])
['node13']

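Answer 2 (score: 2)

I would suggest parsing the text into a Python structure first and then working with that structure. Regular expressions are too complicated and too fragile for this job. Consider: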

doc = """
node13 
    state = free 
np = 8 
properties = beta,eightcores 
ntype = cluster 
status = opsys=linux,uname=Linux node13 2.6.27.19-5-default etc

node14 
    state = job-exclusive 
np = 8 
properties = beta,eightcores 
ntype = cluster
"""

data = {}
lastkey = None
for line in map(str.strip, doc.splitlines()):
    if ' = ' in line and lastkey:
        k, v = line.split(' = ', 1)
        data[lastkey][k] = v
    elif len(line):
        lastkey = line
        data[lastkey] = {}

This creates a dictionary like this:

{'node13': {'np': '8',
            'ntype': 'cluster',
            'properties': 'beta,eightcores',
            'state': 'free',
            'status': 'opsys=linux,uname=Linux node13 2.6.27.19-5-default etc'},
 'node14': {'np': '8',
            'ntype': 'cluster',
            'properties': 'beta,eightcores',
            'state': 'job-exclusive'}}

which you can then work with in the usual Python way:

 # names of the nodes whose state is 'free'
 free_nodes = [k for k, v in data.items() if v.get('state') == 'free']
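
Since the question's edit reads the text from a subprocess rather than from a file, the same parsing idea can be wrapped in a function that takes a string. The following is only a sketch under a few assumptions: Python 3.7 or later, pbsnodes available on PATH, and output in the format shown in the question. The names parse_nodes, free_node_names and read_pbsnodes are made up for this example. Note that in Python 3 the pipe yields bytes unless text mode is requested, which is why text=True is passed.

```python
import subprocess


def parse_nodes(text):
    """Parse pbsnodes-style text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in map(str.strip, text.splitlines()):
        if not line:
            continue  # blank line between node records
        if ' = ' in line:
            if current is not None:
                key, value = line.split(' = ', 1)
                nodes[current][key] = value
        else:
            # Any non-empty line without ' = ' is treated as a node name.
            current = line
            nodes[current] = {}
    return nodes


def free_node_names(text):
    """Return the names of all nodes whose 'state' attribute is 'free'."""
    return [name for name, attrs in parse_nodes(text).items()
            if attrs.get('state') == 'free']


def read_pbsnodes():
    """Assumption: `pbsnodes` is on PATH; text=True decodes stdout to str."""
    result = subprocess.run(['pbsnodes'], capture_output=True,
                            text=True, check=True)
    return result.stdout
```

On a machine where pbsnodes exists, free_node_names(read_pbsnodes()) would give the list of free nodes. The parsing functions can be tried on any string without running the command.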

Answer 3 (score: 1)

You can use the re.DOTALL flag so that . matches everything, including newlines. Here is an example:

>>> st="""
node13 
    state = free 
np = 8 
properties = beta,eightcores 
ntype = cluster 
status = opsys=linux,uname=Linux node13 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=? 15201,nsessions=? 01,nusers=0,idletime=6837317,totmem=20506268kb,availmem=20259728kb,physmem=20506268kb,ncpus=8,loadave=0.00,gres=,netload=17130666575,se=free,jobs=,varattr=,rectime=1333639375 

node14 
    state = job-exclusive 
np = 8 
properties = beta,eightcores 
ntype = cluster
"""

>>> re.findall(r"(node\d+).*?state.*?free", st, re.DOTALL)
['node13']

Note that this can also be done without a regex:

>>> stlines = st.splitlines()
>>> [stlines[i].strip() for i in range(len(stlines) - 1) if stlines[i + 1].partition("=")[-1].strip() == 'free']
['node13']
>>> 

Note: if you need a more robust regex, as Francis shows in his example, you can use the following:

>>> re.findall(r"(node\d+).*?state[ ]*=[ ]*free", st, re.DOTALL)
['node13']
>>> 
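
One thing worth checking with the DOTALL patterns above: because .*? is allowed to cross line and record boundaries, a busy node may end up paired with a later node's state = free. The sketch below uses a made-up two-node sample in which the busy node comes first, and compares the DOTALL pattern with the lookahead pattern from Answer 0.

```python
import re

# Made-up sample: node14 is busy, node15 is free.
sample = (
    "node14\n"
    "     state = job-exclusive\n"
    "     np = 8\n"
    "\n"
    "node15\n"
    "     state = free\n"
    "     np = 8\n"
)

dotall_pattern = r"(node\d+).*?state[ ]*=[ ]*free"
lookahead_pattern = r"node\d+(?=[^\n]*\n\s*state\s*=\s*free)"

# With DOTALL, the lazy .*? keeps growing past node14's own state line
# until it reaches node15's "state = free".
dotall_hits = re.findall(dotall_pattern, sample, re.DOTALL)
# The lookahead only inspects the line directly after the node name.
lookahead_hits = re.findall(lookahead_pattern, sample)

print(dotall_hits)     # ['node14']
print(lookahead_hits)  # ['node15']
```

On the short sample from the question both patterns return ['node13'], so the difference only shows up when a busy node is followed by a free one.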

Answer 4 (score: 1)

I agree with @thg435: regular expressions are overkill for this job. I prefer a very simple solution:

lines = data.split('\n')
num_lines = len(lines)
[lines[i].strip() for i in range(num_lines - 1) if 'state = free' in lines[i + 1]]

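This captures the essence of what you are trying to do: if the next line (lines[i+1]) contains the desired text, the current line (presumably the name of the node) goes into the list.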
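
The same "current line plus next line" idea can also be written without index arithmetic by pairing each line with its successor via zip. This is just a sketch; the function name nodes_before_free is made up, and it compares the whole stripped line with 'state = free' instead of using a substring test.

```python
def nodes_before_free(text):
    """Return each line that is directly followed by a 'state = free' line."""
    lines = text.splitlines()
    # zip(lines, lines[1:]) yields (current_line, next_line) pairs.
    return [cur.strip()
            for cur, nxt in zip(lines, lines[1:])
            if nxt.strip() == 'state = free']
```

Called on the sample text from the question, this should give ['node13'].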

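Answer 5 (score: 1)

Looking backward is often easier than looking forward. So instead of thinking about grabbing the current line when the next line contains something, think about grabbing the previous line when the current line contains something. Put in those terms, it is easy to conceive and implement:

def find_free_node(doc):
    prevline = ""
    for line in doc.splitlines():
        if line.strip() == "state = free" and prevline.startswith("node"):
            return prevline.strip()
        prevline = line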

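Another approach is to keep track of which node you are in rather than the previous line. This works even if the state = free line does not immediately follow the node name line.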

def find_free_node(doc):
    node = ""
    for line in doc.splitlines():
        if line.startswith("node"):
            node = line.strip()
        elif line.strip() == "state = free" and node:
            return node

To me, these are much clearer than a solution based on a multi-line regex.
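
Both functions above return as soon as they find the first free node. If every free node is wanted, the second approach can be turned into a generator. This is a sketch only; the name iter_free_nodes is made up for this example.

```python
def iter_free_nodes(doc):
    """Yield the name of every node that has a 'state = free' line."""
    node = None
    for line in doc.splitlines():
        if line.startswith("node"):
            # A new node record starts; remember its name.
            node = line.strip()
        elif line.strip() == "state = free" and node:
            yield node
```

list(iter_free_nodes(doc)) then collects all free nodes in the order they appear.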