Question

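So I am trying to check whether a URL exists, and if it does, I want to write the URL to a file using Python. I also want each URL to be on its own line in the file. Here is the code I already have: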

import urllib2

# Create a blank TXT file on the desktop

urlhere = "http://www.google.com"   
print "for url: " + urlhere + ":"  

try: 
    fileHandle = urllib2.urlopen(urlhere)
    data = fileHandle.read()
    fileHandle.close()
    print "It exists"

    # Then, if the URL does exist, write the URL on a new line in the text file

except urllib2.URLError, e:
    print 'PAGE 404: It Doesnt Exist', e

# If the URL does not exist, do not write anything to the file.

Answer 1

How about something like this:

import urllib2

url  = 'http://www.google.com'
data = ''

try:
    data = urllib2.urlopen(url).read()
except urllib2.URLError, e:
    # str(e) is needed: concatenating a string and an exception raises TypeError
    data = 'PAGE 404: It Doesnt Exist ' + str(e)

with open('outfile.txt', 'w') as out_file:
   out_file.write(data)
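
Note that the snippet above writes the page body (or the error text) to the file, and mode 'w' overwrites the file on every run. If the goal is only to record the URL on its own line, a minimal sketch might look like the following. It assumes Python 3, where urllib2 was split into urllib.request and urllib.error. The function name append_url_if_exists and its opener parameter are made up for illustration and are not part of the original answer.

```python
# Python 3 sketch (assumption: the thread itself uses Python 2's urllib2).
import urllib.error
import urllib.request


def append_url_if_exists(url, out_path, opener=urllib.request.urlopen):
    """Append url on its own line in out_path if it can be opened.

    `opener` is a hypothetical hook so the check can be swapped out;
    by default it is urllib.request.urlopen.
    """
    try:
        response = opener(url)
        response.close()
    except (urllib.error.URLError, ValueError) as e:
        # URLError covers unreachable hosts and HTTP errors such as 404
        # (HTTPError is a subclass). ValueError is raised by urlopen for
        # strings that are not URLs at all.
        print("It doesn't exist:", e)
        return False
    # Mode "a" creates the file if needed and keeps lines from earlier runs.
    with open(out_path, "a") as out_file:
        out_file.write(url + "\n")
    return True
```

Calling append_url_if_exists("http://www.google.com", "outfile.txt") would then add one line to outfile.txt only when the URL opens without error.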

Answer 2

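The way you phrased the question is a bit confusing, but if I understand you correctly, you want to test whether a URL is valid using urllib2 and, if it is, write the URL to a file? If that is correct, the following should work.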

import urllib2
f = open("url_file.txt","a+")
urlhere = "http://www.google.com"   
print "for url: " + urlhere + ":"  

try: 
    fileHandle = urllib2.urlopen(urlhere)
    data = fileHandle.read()
    fileHandle.close()
    f.write(urlhere + "\n")
    print "It exists"

except urllib2.URLError, e:
    print 'PAGE 404: It Doesnt Exist', e

# Close the file whether or not the URL exists
f.close()

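If you want to test multiple URLs without editing the Python script, you can use the following script by typing python python_script.py "http://url_here.com". This works through the sys module, where sys.argv[1] is the first argument passed to python_script.py. In this example that is the URL ('http://url_here.com').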

import urllib2,sys
f = open("url_file.txt","a+")
urlhere = sys.argv[1]   
print "for url: " + urlhere + ":"  

try: 
    fileHandle = urllib2.urlopen(urlhere)
    data = fileHandle.read()
    fileHandle.close()
    f.write(urlhere + "\n")
    print "It exists"

except urllib2.URLError, e:
    print 'PAGE 404: It Doesnt Exist', e

# Close the file whether or not the URL exists
f.close()

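Or, if you really want to make your job easy, you can use the following script and type python python_script.py http://url1.com,http://url2.com on the command line, where all the URLs you want to test are separated by commas with no spaces.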

import urllib2,sys
f = open("url_file.txt","a+")
urlhere_list = sys.argv[1].split(",")   

for urls in urlhere_list:
    print "for url: " + urls + ":" 
    try: 
        fileHandle = urllib2.urlopen(urls)
        data = fileHandle.read()
        fileHandle.close()
        f.write(urls+ "\n")

        print "It exists"

    except urllib2.URLError, e:
        print 'PAGE 404: It Doesnt Exist', e
    except:
        print "invalid url"
f.close()

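If you do not want to use the command-line functionality, sys.argv[1].split(",") can also be replaced by a Python list in the script. Hope this works for you, and good luck with your program.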

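Note: the scripts that use command-line input were tested on Ubuntu Linux, so if you are using Windows or another operating system I cannot guarantee that they will work with the given instructions, but they should.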
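
As a rough illustration of replacing sys.argv[1].split(",") with a list inside the script, here is a sketch. It assumes Python 3 (urllib.request and urllib.error instead of urllib2). The helper names url_exists and write_existing_urls are hypothetical, and the URLs in the list are placeholders.

```python
# Python 3 sketch (assumption: the scripts above are Python 2 with urllib2).
import urllib.error
import urllib.request


def url_exists(url):
    """Return True if url can be opened, False otherwise."""
    try:
        urllib.request.urlopen(url).close()
        return True
    except (urllib.error.URLError, ValueError):
        # URLError covers unreachable hosts and HTTP errors such as 404;
        # ValueError is raised for strings that are not URLs at all.
        return False


def write_existing_urls(urls, out_path, exists=url_exists):
    """Append every URL in urls that passes `exists`, one per line."""
    with open(out_path, "a") as out_file:
        for url in urls:
            if exists(url):
                out_file.write(url + "\n")


# This list takes the place of sys.argv[1].split(","); entries are placeholders.
urlhere_list = ["http://www.google.com", "http://example.com"]
# Uncomment to run the check against the network:
# write_existing_urls(urlhere_list, "url_file.txt")
```

Opening the file once with a with block means it is closed even if one of the checks raises, and passing the check in as a parameter keeps the file-writing logic separate from the network call.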

Answer 3

Using requests:

import requests

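def url_checker(urls):
    with open('somefile.txt', 'a') as f:
        for url in urls:
            try:
                r = requests.get(url)
            except requests.RequestException:
                # Unreachable host or malformed URL: skip it
                continue
            # Only write URLs that answered with HTTP 200
            if r.status_code == 200:
                f.write('{0}\n'.format(url))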

url_checker(['http://www.google.com','http://example.com'])

Write text to a txt file on a new line with Python?
