Scraping multiple URLs from a CSV file

Time: 2018-07-30 20:00:11

Tags: python python-3.x

I currently have the following code:

from bs4 import BeautifulSoup 
import requests
import csv

with open("safari history.csv") as f_urls, open("Titles.txt", "w", newline="") as f_output:

    csv_output = csv.writer(f_output)
    csv_output.writerow(['Title'])

    for url in f_urls:
        url = url.strip()  # strip the trailing newline so requests gets a clean URL
        response = requests.get(url)
        soup = BeautifulSoup(response.text, "lxml")
        # pull the content of any <meta name="description"> tags
        titles = soup.find_all('meta')
        descriptions = [meta.attrs['content'] for meta in titles
                        if meta.attrs.get('name') == 'description']
        print(descriptions)
        csv_output.writerow(descriptions)

However, the connection dies partway through and I get an error. Is there some code that will "skip" a failed scrape, or something along those lines?

My "end goal" is to group the keywords from my web history into a few categories:


location, gender, age, etc.

This is to see how accurately our web history represents us.
Thanks in advance.

1 answer:

Answer 0 (score: 0):

If a specific error keeps being thrown, you can wrap the call in a try/except block so the successful cases are handled and the error is simply passed over:

try:
    do_work(url)
except YourExceptionType:
    #Do nothing
    pass

A small example in the shell:

>>> float("not a float")
Traceback (most recent call last):
  File "<pyshell#51>", line 1, in <module>
    float("not a float")
ValueError: could not convert string to float: 'not a float'
>>> s = "not a float"
>>> try:
    print(float(s))
except ValueError:
    print("Exception caught")


Exception caught
>>>
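Applied to the loop in the question, a minimal sketch might look like the following. It assumes the failures come from the network call itself, so it catches requests.exceptions.RequestException (the base class for errors raised by requests) and moves on to the next URL; the timeout value and variable names are illustrative, not part of the original code.

from bs4 import BeautifulSoup
import requests
import csv

with open("safari history.csv") as f_urls, open("Titles.txt", "w", newline="") as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Title'])

    for url in f_urls:
        url = url.strip()
        try:
            # any network-level failure (DNS error, timeout, dropped connection, ...)
            # raises a subclass of requests.exceptions.RequestException
            response = requests.get(url, timeout=10)
            response.raise_for_status()  # also treat HTTP error codes as failures
        except requests.exceptions.RequestException:
            # skip this URL and carry on with the rest of the file
            continue

        soup = BeautifulSoup(response.text, "lxml")
        descriptions = [meta.attrs['content'] for meta in soup.find_all('meta')
                        if meta.attrs.get('name') == 'description']
        csv_output.writerow(descriptions)

Catching the narrower RequestException rather than a bare except keeps genuine programming errors visible while still skipping URLs that fail to load.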