Extracting data using regular expressions: Python

Asked: 2016-10-18 16:41:31

Tags: regex python-2.7 data-extraction

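The basic outline of this problem is to read a file, look for integers using re.findall() with the regular expression [0-9]+, then convert the extracted strings to integers and sum them.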

I am having trouble appending to the list. With my code below, only the first (index 0) match on a line is appended. Please help. Thank you.

import re
hand = open ('a.txt')
lst = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('[0-9]+', line) 
    if len(stuff)!= 1  : continue
    num = int (stuff[0])
    lst.append(num)
print sum(lst)

2 Answers:

Answer 0 (score: 0)

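Great, thanks for including the whole txt file! Your main problem is in the if len(stuff)... line: it skips a line when stuff is empty, but it also skips the line when stuff has 2, 3, or more elements. You are only keeping the lists where stuff has length exactly 1. I have put comments in the code, but please ask if anything is unclear.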

import re
hand = open ('a.txt')
str_num_lst = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('[0-9]+', line)
    #If we didn't find anything on this line then continue
    if len(stuff) == 0: continue
    #if len(stuff)!= 1: continue #<-- This line was wrong because it skips lists with more than 1 element

    #If we did find something, stuff will be a list of strings:
    #(i.e. stuff = ['9607', '4292', '4498'] or stuff = ['4563'])
    #For now let's just add this list onto our str_num_lst
    #without worrying about converting to int.
    #We use '+=' instead of 'append' since both stuff and str_num_lst are lists
    str_num_lst += stuff

#Print out the str_num_lst to check if everything's ok
print str_num_lst

#Get an overall sum by looping over the string numbers in the str_num_lst
#Can convert to int inside the loop
overall_sum = 0
for str_num in str_num_lst:
    overall_sum += int(str_num)

#Print sum
print 'Overall sum is:'
print overall_sum
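
To see the effect of that filter concretely, here is a minimal sketch that runs both versions of the loop over the same input. The sample lines and the function names sum_with_filter and sum_all_matches are made up for illustration and are not taken from your a.txt. print is called with parentheses so the snippet also runs on Python 3.

```python
import re

# Hypothetical sample lines, NOT the contents of the asker's a.txt
sample_lines = [
    "no digits here",
    "one number: 4563",
    "three numbers: 9607 4292 4498",
]

def sum_with_filter(lines):
    """Mimic the question's loop: keep only lines with exactly one match."""
    total = 0
    for line in lines:
        stuff = re.findall('[0-9]+', line)
        if len(stuff) != 1:
            continue  # drops empty lines AND lines with 2+ numbers
        total += int(stuff[0])
    return total

def sum_all_matches(lines):
    """Mimic the corrected loop: keep every match on every line."""
    total = 0
    for line in lines:
        stuff = re.findall('[0-9]+', line)
        # sum() over an empty list is 0, so no explicit skip is needed
        total += sum(int(s) for s in stuff)
    return total

filtered_total = sum_with_filter(sample_lines)
full_total = sum_all_matches(sample_lines)
print(filtered_total)
print(full_total)
```

With these sample lines the filtered version only counts the line holding a single number, while the second version also picks up the three numbers that share a line.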

Edit:

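You are right, reading in the whole file as one string is a good solution and not hard to do. Check out this post. Here is what the code would look like.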

import re

hand = open('a.txt')
all_lines = hand.read() #Reads in all lines as one long string
all_str_nums_as_one_line = re.findall('[0-9]+',all_lines)
hand.close() #<-- can close the file now since we've read it in

#Go through all the matches to get a total
tot = 0
for str_num in all_str_nums_as_one_line:
    tot += int(str_num)

print 'Overall sum is:',tot
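
If you want it shorter still, the same whole-file idea can be sketched with a with block (which closes the file for you) and sum() over a generator expression. This is only one possible variant under some assumptions: the helper name sum_numbers_in_file is hypothetical, and the temporary file below is a made-up stand-in for a.txt so that the snippet runs on its own.

```python
import os
import re
import tempfile

def sum_numbers_in_file(path):
    """Read the whole file as one string and sum every run of digits in it."""
    with open(path) as hand:  # the file is closed when the block exits
        all_lines = hand.read()
    return sum(int(s) for s in re.findall('[0-9]+', all_lines))

# Build a small throwaway file with made-up contents (a stand-in for a.txt)
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as demo_file:
    demo_file.write("abc 12 def 7\nno digits\n100\n")

demo_total = sum_numbers_in_file(demo_path)
os.remove(demo_path)  # clean up the throwaway file
print(demo_total)
```

In your own script you would just call the function with 'a.txt' instead of the temporary path.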

Answer 1 (score: 0)

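import re

ls = []
text = open('C:/Users/pvkpu/Desktop/py4e/file1.txt')
for line in text:
    line = line.rstrip()
    #Find every run of digits on this line (list of strings)
    l = re.findall('[0-9]+', line)
    if len(l) == 0:
        continue
    #Extend the running list with all matches from this line
    ls += l
#Convert the collected strings to integers in place
for i in range(len(ls)):
    ls[i] = int(ls[i])
print(sum(ls))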