Downloading a captcha image that has no file extension

Date: 2017-01-20 00:37:06

Tags: python io download beautifulsoup python-requests

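How can I download this captcha image using PIL or another image-processing library? I have tried several approaches, but none of them managed to download the image.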

from PIL import Image
import urllib2 as urllib
import io

fd = urllib.urlopen("https://notacarioca.rio.gov.br/senhaweb/CaptchaImage.aspx?guid=9759fc80-d385-480a-aa6e-8e00ef20be7b&s=1")
image_file = io.BytesIO(fd.read())
im = Image.open(image_file)
print im

1 answer:

Answer 0 (score: 0)

The image you are trying to download does not have a static URL.

While the link works: Link working. Once the same link has stopped working: Link not working.

This means you cannot refer to the image by a static URL (urllib.urlopen("https://notacarioca.rio.gov.br/senhaweb/CaptchaImage.aspx?guid=9759fc80-d385-480a-aa6e-8e00ef20be7b&s=1") will not work).
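To see which part of the address is likely the short-lived piece, the URL from the question can be split into its components. This is only an illustrative sketch: that the `guid` value is what expires is an assumption based on the behaviour described above, not something the site documents.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the question.
url = ("https://notacarioca.rio.gov.br/senhaweb/CaptchaImage.aspx"
       "?guid=9759fc80-d385-480a-aa6e-8e00ef20be7b&s=1")

parts = urlsplit(url)
query = parse_qs(parts.query)

print(parts.path)     # /senhaweb/CaptchaImage.aspx
print(query["guid"])  # ['9759fc80-d385-480a-aa6e-8e00ef20be7b']
print(query["s"])     # ['1']
```

The path itself carries no file extension, which is why the file type has to come from somewhere else, such as the response headers.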

Here is a solution using Requests and BeautifulSoup:

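import requests
from mimetypes import guess_extension
from bs4 import BeautifulSoup
from urllib.parse import urljoin
# from PIL import Image
# from io import BytesIO

BASE_URL = "https://notacarioca.rio.gov.br/senhaweb/"

# Reuse one session so the captcha request shares state with the login page request.
s = requests.Session()
r = s.get(urljoin(BASE_URL, "login.aspx"))

if r.status_code == 200:
    soup = BeautifulSoup(r.content, "html.parser")
    # The captcha <img> sits inside this div; its src is not a fixed URL.
    div = soup.find("div", attrs={"class": "captcha", "style": "color:Red;width:100%;"})

    # Guard against the page layout changing and the div or img being absent.
    if div is not None and div.img is not None:
        r = s.get(urljoin(BASE_URL, div.img["src"]))
        if r.status_code == 200:
            # Drop parameters such as "; charset=..." before guessing the extension.
            content_type = r.headers.get("content-type", "").split(";")[0].strip()
            guess = guess_extension(content_type)
            if guess:
                with open("captcha" + guess, "wb") as f:
                    f.write(r.content)
                # Image.open(BytesIO(r.content)).show()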
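If the server sends a missing or generic Content-Type, `guess_extension` returns `None` and the code above writes nothing. One possible fallback, sketched here, is to inspect the first bytes of the download and only then consult the header. The helper name `pick_extension`, the `MAGIC_NUMBERS` table and the `.bin` default are assumptions made for illustration; they are not part of the original answer or of any library.

```python
from mimetypes import guess_extension

# Leading bytes ("magic numbers") of a few common image formats.
MAGIC_NUMBERS = (
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"\xff\xd8\xff", ".jpg"),
)


def pick_extension(content_type, data, default=".bin"):
    """Hypothetical helper: choose a file extension for downloaded bytes."""
    # Trust the file contents first when they match a known signature.
    for magic, ext in MAGIC_NUMBERS:
        if data.startswith(magic):
            return ext
    # Otherwise use the header, without parameters such as "; charset=utf-8".
    mime = (content_type or "").split(";")[0].strip().lower()
    return guess_extension(mime) or default
```

With the response from the solution above, it could be called as `pick_extension(r.headers.get("content-type"), r.content)` and the result appended to `"captcha"` when opening the output file.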