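Python: download files and keep the original filename

Time: 2020-08-25 13:26:53

Tags: python

I used Python and Beautiful Soup to detect the links on a website. Now I want to download the image files from the detected URLs and store them in a specific folder. What is the simplest way to do this?

The code I have so far: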

from bs4 import BeautifulSoup as soup  # HTML data structure
from urllib.request import urlopen as uReq  # Web client
from PIL import Image
import requests
my_url = "https://abc/videos/vod/movies/actress/letter=a/sort=popular/page=1/"

uClient = uReq(my_url)
page_html=uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

for div in page_soup.findAll('div', attrs={'class':'main'}):
    for ul in div.findAll('ul'):
        for li in ul.findAll('li'):
            for img in li.findAll('img', alt=True):
                link=img['src']

Detected URLs:

https://abcde/mono/actjpgs/abb1.jpg
https://abcde/mono/actjpgs/t31sw.jpg
https://abcde/mono/actjpgs/beaas.jpg

Desired resulting filenames:

abb1.jpg
t31sw.jpg
beaas.jpg

2 answers:

Answer 0 (score: 0)

import os
import shutil
import urllib.request  # required for urllib.request.urlopen below
from urllib.parse import urlparse

# get filename from URL
url = "https://abcde/mono/actjpgs/abb1.jpg"
url_parsed = urlparse(url)
filename = os.path.basename(url_parsed.path)    # will contain abb1.jpg 

# download file
with urllib.request.urlopen(url) as response, open(filename, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
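
The question also asks for a specific target folder, which the snippet above does not cover. A minimal sketch of one way to do it with the standard library only; the helper names `filename_from_url` and `download_to_folder` and the folder name `images` are assumptions for illustration, not part of the original answer:

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path component of a URL, e.g. 'abb1.jpg'."""
    return os.path.basename(urlparse(url).path)


def download_to_folder(url, folder):
    """Download url into folder, keeping the original file name.

    Returns the path of the saved file.
    """
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    target = os.path.join(folder, filename_from_url(url))
    with urllib.request.urlopen(url) as response, open(target, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return target
```

In the question's loop, this could be called as `download_to_folder(link, "images")` right after `link = img['src']`, provided `src` holds an absolute URL.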

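Answer 1 (score: 0)

As Karl suggested, a quick Google search would have told you this, but since I got a lot of help early in my own time on SO, I will do my best to help you with it.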

import requests

link = "your/example/link.jpg"  # placeholder: replace with a real image URL

# Get image and file name
r = requests.get(link, allow_redirects=True) 
fname = link.split('/')[-1]

# Save the file; the with block closes it once writing finishes
with open(fname, 'wb') as out_file:
    out_file.write(r.content)

I have not tested this code.
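
One caveat with `link.split('/')[-1]`: if a URL carries a query string, that string ends up in the file name. The sketch below compares it with parsing the URL first; the `?size=large` suffix is a made-up example and does not appear in the question's URLs:

```python
import os
from urllib.parse import urlparse

# Hypothetical URL with a query string, for illustration only
link = "https://abcde/mono/actjpgs/abb1.jpg?size=large"

naive_name = link.split('/')[-1]                    # keeps the query string
clean_name = os.path.basename(urlparse(link).path)  # uses the path component only

print(naive_name)  # abb1.jpg?size=large
print(clean_name)  # abb1.jpg
```

For the plain URLs listed in the question, both approaches give the same name.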
