Question

I already have a working solution to my problem, but for my own improvement I would like to know a better way to solve it.

As part of a quiz in the course, we search the text for lines that start like this:

From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008

and extract the email string from each one, first calling split() on the line when a match is found. The sample output should be:

louis@media.berkeley.edu
louis@media.berkeley.edu
ray@media.berkeley.edu
cwen@iupui.edu
cwen@iupui.edu
cwen@iupui.edu
There were 27 lines in the file with From as the first word

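This is the code I came up with for the assignment. I am open to suggestions for better ways to write my program to extract the email strings.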

import re

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"

fh = open(fname)
count = 0
matches = []

for lines in fh :
    # look for specific characters in document text
    if not lines.startswith("From ") : continue
    # increment the count variable for each match found
    count += 1
    # append the required lines to the matches list
    matches.append(lines)
    # loop through the list to access each line individually
    for email in matches :
        # place values in variable
        out = email
        # look through each line for any email address found
        found = re.findall(r'[\w\.-]+@[\w\.-]+', out)
        # loop through the found emails and print them out
        for i in found :
            ans = i
    print ans
    # print count
print "There were", count, "lines in the file with From as the first word"
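
For comparison, here is a minimal sketch of the split() approach the assignment describes. It assumes the address is always the second whitespace-separated word on a line that starts with "From ". It is written for Python 3, unlike the Python 2 code above. The function name extract_senders and the sample lines are made up for illustration and are not part of the course material.

```python
def extract_senders(lines):
    """Return the second word of every line whose first word is 'From'."""
    senders = []
    for line in lines:
        # The trailing space in "From " skips "From:" header lines.
        if not line.startswith("From "):
            continue
        words = line.split()
        # words[0] is "From"; words[1] is assumed to be the address.
        if len(words) > 1:
            senders.append(words[1])
    return senders


# Illustrative input only; a real run would pass an open file instead.
sample = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n",
    "From: stephen.marquard@uct.ac.za\n",
    "Subject: hypothetical subject line\n",
    "From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\n",
]

senders = extract_senders(sample)
for address in senders:
    print(address)
print("There were", len(senders), "lines in the file with From as the first word")
```

Because a file object yields its lines one at a time, the same function can be called as extract_senders(open(fname)). Each line is visited once, so no list of matched lines and no inner loop are needed. The regular expression is also unnecessary under the stated assumption about where the address sits.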

Looking through a text file for a specific string

0 Answers: