Question

I am writing a program to extract data from a text file, from lines such as New Revision: 39772 (mbox.txt link - google drive link for file).

I have completed the task with a plain string-splitting approach, but I would like to do it with the re.findall method instead.

import re
print "Please enter file path only"
text_file = raw_input ("Enter the file name:")
print "Trying to open the file that you have entered"
try:
    open_file = open ( text_file )
    print "Text file " + text_file + " is opened"

except:
    print "File not found"
    raise SystemExit
# using normal method
count = 0
total = 0.0
for line in open_file: 
    if 'New Revision:' in line:   
        print line
        total += float(line.split()[-1])
        count = count + 1
        Avg = total/count
print "The number of lines with 'New Revision:' is:", count
print "The total of the floating point numbers at the end of the 'New Revision:' lines is:", total
print "Average:",round(Avg,1)

# using findall() method

numlist = []
for line in open_file:
   line = line.rstrip()
   Extract_data = re.findall('^New Revision:([0-9]+)',line)
   number = int(Extract_data[0])
   numlist.append(Extract_data)

print numlist

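I want to extract the number at the end of New Revision: 39772 and save it to a list using the re.findall method. I have read all the documentation available on this site so far, but I cannot work out how to do this, and the code above fails with an error.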

Answer 1

Use the following regular expression:

reg = r'^New Revision:\s([0-9]+)'

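Your pattern is missing the whitespace that follows the colon, so it never matches. Also, use a raw string when writing regular expressions.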
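As a rough sketch of how this pattern could be used with re.findall, the snippet below collects the numbers into a list. The helper name extract_revisions and the sample lines are made up for illustration and stand in for the contents of mbox.txt. Because re.findall returns an empty list for a line that does not match, the result is checked before indexing; indexing [0] unconditionally is what raises an IndexError in the question's loop. The code is written so that it should run under both Python 2 and Python 3.

```python
import re

# Raw string; \s matches the single space after the colon.
PATTERN = r'^New Revision:\s([0-9]+)'

def extract_revisions(lines):
    """Return the revision numbers found in an iterable of lines (hypothetical helper)."""
    numbers = []
    for line in lines:
        # findall returns the captured groups, or [] when the line does not match
        matches = re.findall(PATTERN, line.rstrip())
        if matches:
            numbers.append(int(matches[0]))
    return numbers

# Made-up sample lines standing in for the real file
sample = [
    "From: someone@example.com",
    "New Revision: 39772",
    "Details: ...",
    "New Revision: 39771",
]
numlist = extract_revisions(sample)
print(numlist)
```

With a real file, the open file object can be passed in place of sample. Note that in the question the first for loop has already read the file to the end, so the second loop would see no lines; calling open_file.seek(0) or reopening the file before the second pass avoids that.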
