Downloading .pdf files from a website using Python

Time: 2018-01-13 01:25:57

Tags: python scripting

I am trying to download all the PDFs from the website below, using the following code:

import mechanize
from time import sleep
br = mechanize.Browser()


br.open('http://www.nerc.com/comm/CCC/Pages/AgendasHighlightsandMinutes-.aspx')

f=open("source.html","w")
f.write(br.response().read()) 

filetypes=[".pdf"] 
myfiles=[]
for l in br.links(): 
    for t in filetypes:
        if t in str(l): 
            myfiles.append(l)


def downloadlink(l):
    f=open(l.text,"w") 
    br.click_link(l)
    f.write(br.response().read())
    print l.text," has been downloaded"


for l in myfiles:
    sleep(1) 
    downloadlink(l)

I keep getting the following error and cannot figure out why.

legal and privacy  has been downloaded
Traceback (most recent call last):
  File "downloads-pdfs.py", line 29, in <module>
    downloadlink(l)
  File "downloads-pdfs.py", line 21, in downloadlink
    f=open(l.text,"w")
IOError: [Errno 13] Permission denied: u'/trademark policy'

1 answer:

Answer 0 (score: 1)

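The problem is that you are using the link text as the file name. In the failing case the text is "/trademark policy": the leading "/" turns it into an absolute path, so Python tries to create the file in the root directory and fails with "Permission denied". More generally, "/" is a path separator and is not valid inside a file name. Try changing the downloadlink function to the following: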

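def downloadlink(l):
    # Keep only the part after the last "/" so the name is a plain file name
    filename = l.text.split('/')[-1]
    # PDFs are binary data, so open the output file in binary mode
    with open(filename, "wb") as f:
        # click_link() only builds a request; follow_link() actually opens it
        br.follow_link(l)
        f.write(br.response().read())
    print("%s has been downloaded" % l.text)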
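
Link text can still be empty or contain other characters that some file systems reject, so it may be more robust to build the file name from the link's URL instead. The following is only a minimal sketch of that idea, not part of the original answer: `safe_filename` and its `default` parameter are hypothetical names introduced here, and the set of replaced characters is an assumption based on what Windows disallows.

```python
import posixpath
import re

try:
    from urllib.parse import urlparse, unquote  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2
    from urllib import unquote


def safe_filename(url, default="download.pdf"):
    """Derive a plain file name from the last path segment of a URL.

    Hypothetical helper for illustration; `default` is used when the
    URL path has no usable last segment.
    """
    # Drop the query string and fragment, and decode escapes such as %20
    path = unquote(urlparse(url).path)
    # Take the last path segment, ignoring any trailing "/"
    name = posixpath.basename(path.rstrip("/"))
    # Replace characters that are unsafe in file names on common systems
    name = re.sub(r'[\\/:*?"<>|]', "_", name).strip()
    return name or default
```

Inside downloadlink this could be called as `safe_filename(l.absolute_url)`, assuming the mechanize link object exposes the resolved address through its `absolute_url` attribute.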