Question

I want to use Python to loop through a file and extract the data from two specific columns. Sample data is shown below.

----------------------------------
Local Cell ID  Cell Name                        Physical cell ID  Additional spectrum emission  Cell active state  Cell admin state  Cell middle block timer(min)  Cell FDD TDD indication  Subframe assignment  Special subframe patterns  

11             12345678912345678912345678912    427               1                             Active             Unblock           NULL                          TDD                      SA2                  SSP6                       
12             12345678912345678912345678912    130               1                             Active             Unblock           NULL                          TDD                      SA2                  SSP6                       
14             12345678912345678912345678912    94                1                             Active             Unblock           NULL                          TDD                      SA2                  SSP6                       
15             12345678912345678912345678912    37                1                             Active             Unblock           NULL                          TDD                      SA2                  SSP6                       
21             12345678912345678912345678912    188               1                             Active             Unblock           NULL                          TDD                      SA2                  SSP6                       
22             12345678912345678912345678912    203               1                             Active             Unblock           NULL                          TDD                      SA2                  SSP6                       
24             12345678912345678912345678912    209               1                             Active             Unblock           NULL                          TDD                      SA2                  SSP6                       
25             12345678912345678912345678912    230               1                             Active             Unblock           NULL                          TDD                      SA2                  SSP6                       
(Number of results = 8)


---    END

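I already use the script below to pull out every line that contains a specific value, but I would like to know whether it is possible to pull only the data under "Cell Name" and "Physical cell ID", for example 12345678912345678912345678912 and 427 for line 4.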

signal = open('signal.txt', 'r') 
newFile = open('results2.txt', 'w') 
for line in signal: 
    if 'False' in line: 
        print('.', end="") 
        newFile.write(line) 
    else: 
        print(" ", end="") 
newFile.close() 
signal.close() 
print('Done')

Answer 1

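@J.Byrne, another approach is to load the data into a pandas DataFrame with read_csv (skipping the lines above the data and the footer, and supplying the column names yourself), then select the columns you are interested in.

Here is the code to extract the data: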

import pandas as pd
df=pd.read_csv('signal.txt', skiprows=2,skipfooter=4, sep=r'\s+', 
            names=[
                'Local Cell ID',  
                'Cell Name',                        
                'Physical cell ID',  
                'Additional spectrum emission',  
                'Cell active state',  
                'Cell admin state',  
                'Cell middle block timer(min)',  
                'Cell FDD TDD indication',  
                'Subframe assignment',  
                'Special subframe patterns'], 
            engine='python')

df

Result:

Local Cell ID   Cell Name   Physical cell ID    Additional spectrum emission    Cell active state   Cell admin state    Cell middle block timer(min)    Cell FDD TDD indication Subframe assignment Special subframe patterns
0   11  12345678912345678912345678912   427 1   Active  Unblock NaN TDD SA2 SSP6
1   12  12345678912345678912345678912   130 1   Active  Unblock NaN TDD SA2 SSP6
2   14  12345678912345678912345678912   94  1   Active  Unblock NaN TDD SA2 SSP6
3   15  12345678912345678912345678912   37  1   Active  Unblock NaN TDD SA2 SSP6
4   21  12345678912345678912345678912   188 1   Active  Unblock NaN TDD SA2 SSP6
5   22  12345678912345678912345678912   203 1   Active  Unblock NaN TDD SA2 SSP6
6   24  12345678912345678912345678912   209 1   Active  Unblock NaN TDD SA2 SSP6
7   25  12345678912345678912345678912   230 1   Active  Unblock NaN TDD SA2 SSP6

Then select the two columns:

df[["Cell Name","Physical cell ID"]]

Result:

Cell Name   Physical cell ID
0   12345678912345678912345678912   427
1   12345678912345678912345678912   130
2   12345678912345678912345678912   94
3   12345678912345678912345678912   37
4   12345678912345678912345678912   188
5   12345678912345678912345678912   203
6   12345678912345678912345678912   209
7   12345678912345678912345678912   230
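
If pandas is not available, the whitespace split that sep=r'\s+' performs can be approximated with the standard library. The following is only a sketch: the function name extract_pairs and the inline SAMPLE text are made up for illustration, and it assumes that every data row starts with a numeric Local Cell ID and that the first three values contain no spaces.

```python
import io

# Shortened stand-in for signal.txt (illustrative data, two rows only).
SAMPLE = """\
----------------------------------
Local Cell ID  Cell Name  Physical cell ID  Additional spectrum emission

11  12345678912345678912345678912  427  1
12  12345678912345678912345678912  130  1
(Number of results = 2)


---    END
"""


def extract_pairs(lines):
    """Return (cell_name, physical_cell_id) for every data row."""
    pairs = []
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        # Data rows start with a numeric Local Cell ID; the separator,
        # header, blank and footer lines do not, so they are skipped.
        if len(fields) >= 3 and fields[0].isdigit():
            pairs.append((fields[1], fields[2]))
    return pairs


pairs = extract_pairs(io.StringIO(SAMPLE))
for name, pci in pairs:
    print(name, pci)
```

With a real file, pass the open file object instead of io.StringIO(SAMPLE), for example extract_pairs(open('signal.txt')) inside a with block.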

Answer 2

Here is another approach. You can loop over the lines of signal.txt and call a search function on each one to get the CellName or the PhysicalCellID.

import re
import pandas as pd
mydicts = []

def FindCellName(line):  # return the 29-digit cell name on this line, or None
    CellName = None
    j = re.findall(r'\d{29}', line)  # find runs of 29 digits
    if len(j) > 0:
        CellName = j[0]  # take the first match
    return CellName

def FindPhysicalCellID(line):  # return the number that follows the cell name, or None
    PhysicalCellID = None
    # capture only the digits after the cell name, without the surrounding padding
    res = re.search(r'\d{29}\s+(\d+)', line)
    if res:
        PhysicalCellID = res.group(1)
    return PhysicalCellID

with open('signal.txt') as topo_file:
    for line in topo_file:
        CellName = FindCellName(line)
        if CellName:  # only data rows contain a cell name
            mydicts.append((CellName, FindPhysicalCellID(line)))  # collect one (name, id) tuple per row

# build the DataFrame once, after the whole file has been read
df = pd.DataFrame(mydicts, columns=['CellName', 'PhysicalCellID'])
df

Result:

CellName    PhysicalCellID
0   12345678912345678912345678912   427
1   12345678912345678912345678912   130
2   12345678912345678912345678912   94
3   12345678912345678912345678912   37
4   12345678912345678912345678912   188
5   12345678912345678912345678912   203
6   12345678912345678912345678912   209
7   12345678912345678912345678912   230
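
The regex approach depends on the cell name always being exactly 29 digits. If that is not guaranteed, one alternative is to rely on the column alignment instead: find where each column name starts in the header line and slice every data row at the same offsets. This is a sketch under assumptions, not a tested parser for the real export: the helper names (column_slices, extract, make_row) are hypothetical, the sample is rebuilt with the column widths seen in the question, and the data is assumed to end at the line starting with "(".

```python
COLUMNS = ["Local Cell ID", "Cell Name", "Physical cell ID",
           "Additional spectrum emission"]
WANTED = ["Cell Name", "Physical cell ID"]
NAME = "12345678912345678912345678912"


def column_slices(header, columns):
    """Map each column name to the character range it spans in the header."""
    starts = []
    pos = 0
    for name in columns:
        pos = header.index(name, pos)  # search left to right, in column order
        starts.append(pos)
        pos += len(name)
    ends = starts[1:] + [None]  # each column ends where the next one begins
    return {name: slice(s, e) for name, s, e in zip(columns, starts, ends)}


def extract(lines, columns, wanted):
    """Return one tuple of the wanted column values per data row."""
    slices = None
    rows = []
    for line in lines:
        if slices is None:
            # Everything before the header line is ignored.
            if line.startswith(columns[0]):
                slices = column_slices(line, columns)
            continue
        if line.startswith("("):  # "(Number of results = N)" ends the table
            break
        if line.strip():  # skip blank lines
            rows.append(tuple(line[slices[name]].strip() for name in wanted))
    return rows


def make_row(local_id, name, pci, extra):
    # Illustrative helper: pads cells to the widths used in the question's data.
    return f"{local_id:<15}{name:<33}{pci:<18}{extra}"


sample_lines = [
    "-" * 34,
    make_row(*COLUMNS),
    "",
    make_row("11", NAME, "427", "1"),
    make_row("14", NAME, "94", "1"),
    "(Number of results = 2)",
    "",
    "---    END",
]

rows = extract(sample_lines, COLUMNS, WANTED)
for cell_name, pci in rows:
    print(cell_name, pci)
```

COLUMNS only needs to list the columns up to the one that follows the last wanted column, because each slice ends where the next listed column begins.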

Loop script to extract specific data from a text file
