Delete lines from a txt file after reading them

Date: 2013-01-24 03:50:34

Tags: python scripting

I'm trying to write a script that sends requests to random URLs listed in a txt file:

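import urllib2

with open('urls.txt') as urls:
    for url in urls:
        url = url.strip()  # remove the trailing newline before requesting
        try:
            r = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            r = e  # HTTPError carries the HTTP status code in .code
        except urllib2.URLError as e:
            # No HTTP response at all (DNS failure, refused connection, ...)
            print '[{}]: '.format(url), "Error: {}".format(e.reason)
            continue
        if r.code in (200, 401):
            print '[{}]: '.format(url), "Up!"
        elif r.code == 404:
            print '[{}]: '.format(url), "Not Found!"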
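For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the status check can be sketched as below. The helper name http_status is hypothetical. It shows the distinction the loop depends on: an HTTPError (4xx/5xx) still carries a status code, while a plain URLError, such as a DNS failure or refused connection, has no status at all.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no HTTP response arrived.

    http_status is a hypothetical helper name used for illustration.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status  # successful response, e.g. 200
    except urllib.error.HTTPError as e:
        # 4xx/5xx: urlopen raises, but the exception still has the status code
        return e.code
    except OSError:
        # URLError (DNS failure, refused connection) and socket timeouts are
        # both OSError subclasses; there is no HTTP status to report
        return None
```

With this helper, the "delete on 404" rule becomes a plain comparison, `http_status(url) == 404`, and a None result can be treated separately instead of crashing on a missing `.code`.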

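But what I want is for a URL to be removed from the file when it returns 404 Not Found. Each URL is on its own line, so basically I want to erase the lines whose URLs come back 404. How do I do that?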

2 answers:

Answer 0 (score: 1)

You could write the URLs you want to keep to a second file:

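import urllib2

with open('urls.txt', 'r') as urls, open('urls2.txt', 'w') as urls2:
    for url in urls:
        url = url.strip()  # remove the trailing newline before requesting
        try:
            r = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            r = e  # HTTPError carries the HTTP status code in .code
        except urllib2.URLError as e:
            # No HTTP response at all (DNS failure, refused connection, ...)
            print '[{}]: '.format(url), "Error: {}".format(e.reason)
            continue

        if r.code in (200, 401):
            print '[{}]: '.format(url), "Up!"
            urls2.write(url + '\n')  # url was stripped, so add exactly one newline
        elif r.code == 404:
            print '[{}]: '.format(url), "Not Found!"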
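The second-file approach boils down to a line filter. Here is a minimal Python 3 sketch with the URL check passed in as a callback, so it can be tried without network access. filter_url_file and keep are hypothetical names; in the real script, keep would request the URL and return False for a 404. The strip() matters: iterating over a file yields each line with its trailing newline, so writing url + '\n' without stripping first would double the newlines.

```python
def filter_url_file(src_path, dst_path, keep):
    """Copy each non-blank line of src_path to dst_path if keep(url) is true.

    filter_url_file and the keep callback are hypothetical names used for
    illustration. Returns the number of URLs written.
    """
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            url = line.strip()  # iteration yields lines with their trailing "\n"
            if not url:
                continue  # ignore blank lines
            if keep(url):
                dst.write(url + "\n")  # exactly one newline per URL
                kept += 1
    return kept
```

For example, `filter_url_file('urls.txt', 'urls2.txt', lambda u: 'dead' not in u)` copies every URL except those containing "dead". Once urls2.txt looks right, it can be renamed over urls.txt.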

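Answer 1 (score: 0)

To delete lines from a file, you have to rewrite the file's entire contents. The safest way to do that is to write out a new file in the same directory and then rename it over the old one. I would modify your code like this: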

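import os
import sys
import tempfile
import urllib2

good_urls = set()

with open('urls.txt') as urls:
    for url in urls:
        url = url.strip()  # drop the trailing newline; it is added back when writing
        try:
            r = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            r = e  # HTTPError carries the HTTP status code in .code
        except urllib2.URLError as e:
            # No HTTP response at all (DNS failure, refused connection, ...);
            # like the else branch below, the URL is not kept
            sys.stdout.write('[{}]: Error: {}\n'.format(url, e.reason))
            continue
        if r.code in (200, 401):
            sys.stdout.write('[{}]: Up!\n'.format(url))
            good_urls.add(url)
        elif r.code == 404:
            sys.stdout.write('[{}]: Not found!\n'.format(url))
        else:
            sys.stdout.write('[{}]: Unexpected response code {}\n'.format(url, r.code))

# Write the surviving URLs to a temporary file in the same directory, then
# rename it over urls.txt (on POSIX the rename replaces the old file atomically).
tmp = None
try:
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', dir='.', delete=False)
    for url in sorted(good_urls):  # note: sorting does not preserve the original order
        tmp.write(url + "\n")
    tmp.close()
    os.rename(tmp.name, 'urls.txt')
    tmp = None  # rename succeeded, nothing to clean up
finally:
    if tmp is not None:
        os.unlink(tmp.name)  # something failed: remove the leftover temp file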

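You may also want to add good_urls.add(url) to the else clause in the first loop, so that URLs returning some other unexpected status are kept rather than deleted. If anyone knows a simpler way to do what I did with the try/finally there, I'd like to hear it.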
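One possibly simpler shape for the closing question, sketched in Python 3 as an assumption (it relies on os.replace, added in Python 3.3, so it is not a drop-in for the Python 2 code), is to create the temp file with tempfile.mkstemp and delete it only if something raises. That removes the tmp = None sentinel. rewrite_lines is a hypothetical helper name.

```python
import os
import tempfile


def rewrite_lines(path, lines):
    """Replace the contents of path with the given lines, one per line.

    rewrite_lines is a hypothetical helper name. The new contents go to a
    temporary file in the same directory (so the final rename stays on one
    filesystem) and are then swapped in with os.replace.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line + "\n")
        # Atomic on POSIX; unlike os.rename, also overwrites the target on Windows
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # leave no half-written temp file behind
        raise
```

In a Python 3 port of the script, this would be called as rewrite_lines('urls.txt', sorted(good_urls)) after the checking loop. One side effect shared with the NamedTemporaryFile version: the temporary file is created with owner-only permissions (0600 on POSIX), so the replaced urls.txt ends up with those permissions unless the original mode is copied over first, for example with shutil.copymode.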