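I am downloading PDF files from URLs. The list of URLs is in a .csv file. The code below works. However, because all of my URLs end in /filename1.pdf, each download is written over the previously downloaded filename1.pdf. I have about 15,000 URLs but end up with only one file (filename1). Is there a way to rename the downloaded PDF files with an incrementing number?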
import os
import csv
import requests

os.chdir('C:\\Users\\dul\\Dropbox\\CTO\\ctos')
write_path = 'C:\\Users\\dul\\Dropbox\\CTO\\ctos\\'

with open('urls.csv', 'r') as csvfile:
    spamreader = csv.reader(csvfile)
    for link in spamreader:
        print('-'*72)
        pdf_file = link[0].split('/')[-1]
        with open(os.path.join(write_path, pdf_file), 'wb') as pdf:
            try:
                # Try to request PDF from URL
                print('TRYING {}...'.format(link[0]))
                a = requests.get(link[0], stream=True)
                for block in a.iter_content(512):
                    if not block:
                        break
                    pdf.write(block)
                print('OK.')
            except requests.exceptions.RequestException as e:
                print('REQUESTS ERROR:')
                print(e)
Answer 0 (score: 1)
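Use enumerate() to get the index of each item yielded by the csv iterator, then prefix the output filename with that number so that every file name is unique: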
with open("urls.csv", "r") as csvfile:
    for idx, link in enumerate(csv.reader(csvfile)):
        print("-" * 72)
        pdf_file = "{idx:05}_{link}".format(idx=idx, link=link[0].split('/')[-1])
        print(pdf_file)
The {idx:05} component of the format string tells the formatter to render idx five characters wide, padded with leading zeros.
Result:
------------------------------------------------------------------------
00000_filename1.pdf
------------------------------------------------------------------------
00001_filename1.pdf
...
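The snippet above only prints the new names. As a rough sketch of how the numbering could be folded back into the download loop from the question, something like the following might work. The names numbered_filename and download_all, the width parameter, the 30-second timeout, and the 8192-byte chunk size are all assumptions made for illustration; none of them come from the question or the answer.

```python
import csv
import os


def numbered_filename(idx, url, width=5):
    """Prefix the last path segment of `url` with a zero-padded index."""
    base = url.rstrip("/").split("/")[-1]
    # The nested {width} field sets the zero-padded width of idx.
    return "{idx:0{width}}_{base}".format(idx=idx, width=width, base=base)


def download_all(csv_path, out_dir):
    """Download each URL in the first CSV column into `out_dir`."""
    # Imported here so numbered_filename() works without requests installed.
    import requests

    with open(csv_path, "r", newline="") as csvfile:
        for idx, row in enumerate(csv.reader(csvfile)):
            if not row:
                continue  # skip blank lines in the CSV
            url = row[0]
            target = os.path.join(out_dir, numbered_filename(idx, url))
            try:
                response = requests.get(url, stream=True, timeout=30)
                response.raise_for_status()
                # Open the file only once the status is OK, so a request
                # that fails up front does not create an empty file.
                with open(target, "wb") as pdf:
                    for block in response.iter_content(8192):
                        pdf.write(block)
            except requests.exceptions.RequestException as exc:
                print("REQUESTS ERROR: {}".format(exc))
```

Calling download_all("urls.csv", write_path) would then write 00000_filename1.pdf, 00001_filename1.pdf, and so on. With about 15,000 URLs the default width of five digits is enough to keep the files sorted in download order.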