Closing a urllib2 connection

Time: 2011-03-26 12:22:55

Tags: python ftp connection urllib2

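I'm using urllib2 to load files from FTP and HTTP servers.

Some of the servers allow only one connection per IP address. The problem is that urllib2 does not close the connection immediately. Look at the example program.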

from urllib2 import urlopen
from time import sleep

url = 'ftp://user:pass@host/big_file.ext'

def load_file(url):
    f = urlopen(url)
    loaded = 0
    while True:
        data = f.read(1024)
        if data == '':
            break
        loaded += len(data)
    f.close()
    #sleep(1)
    print('loaded {0}'.format(loaded))

load_file(url)
load_file(url)

The code loads two files (here the two files are the same) from an FTP server that supports only one connection. It prints the following log:

loaded 463675266
Traceback (most recent call last):
  File "conection_test.py", line 20, in <module>
    load_file(url)
  File "conection_test.py", line 7, in load_file
    f = urlopen(url)
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1331, in ftp_open
    fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
  File "/usr/lib/python2.6/urllib2.py", line 1352, in connect_ftp
    fw = ftpwrapper(user, passwd, host, port, dirs, timeout)
  File "/usr/lib/python2.6/urllib.py", line 854, in __init__
    self.init()
  File "/usr/lib/python2.6/urllib.py", line 860, in init
    self.ftp.connect(self.host, self.port, self.timeout)
  File "/usr/lib/python2.6/ftplib.py", line 134, in connect
    self.welcome = self.getresp()
  File "/usr/lib/python2.6/ftplib.py", line 216, in getresp
    raise error_temp, resp
urllib2.URLError: <urlopen error ftp error: 421 There are too many connections from your internet address.>

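So the first file is loaded and the second one fails, because the first connection has not been closed.

But when I call sleep(1) after f.close(), the error does not occur: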

loaded 463675266
loaded 463675266
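The delay can also be written as a retry loop instead of an unconditional sleep. This is only a workaround sketch, written for Python 3: `open_with_retry`, its parameters, and the check for "421" in the error text are assumptions for illustration, and `opener` stands for any callable such as `urlopen`.

```python
import time


def open_with_retry(opener, url, attempts=3, delay=1.0, sleep=time.sleep):
    """Call opener(url), retrying while the error text mentions FTP code 421.

    attempts is expected to be at least 1.
    """
    for attempt in range(attempts):
        try:
            return opener(url)
        except IOError as exc:  # URLError is a subclass of IOError
            # Re-raise unrelated errors, or the 421 error once attempts run out.
            if "421" not in str(exc) or attempt == attempts - 1:
                raise
            sleep(delay)
```

This does not close anything sooner; it only gives the server time to notice that the previous connection is gone.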

Is there a way to force the connection closed so that the second download does not fail?

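4 answers:

Answer 0 (score: 4)

The cause is indeed a file descriptor leak. We also found that the problem is much more obvious with Jython than with CPython. A colleague proposed this solution:
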

    import urllib2

    # Build the request first, then open it.
    req = urllib2.Request(url, header)
    try:
        fdurl = urllib2.urlopen(req, timeout=self.timeout)
    except urllib2.URLError, e:
        print "urlopen exception", e
    else:
        # Keep a reference to the "real" socket so we can close it ourselves.
        # These are private attributes and may differ between versions.
        realsock = fdurl.fp._sock.fp._sock
        realsock.close()
        fdurl.close()

The fix is ugly, but it does the job: no more "too many open connections".
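A slightly more defensive variant of the same idea might look like the sketch below. The helper names `find_real_socket` and `close_hard` are made up for illustration. The attribute chain is the one used in the snippet above; it is private and may not exist for every handler or Python version, so a missing link is tolerated rather than raising AttributeError.

```python
def find_real_socket(response, chain=("fp", "_sock", "fp", "_sock")):
    """Follow the private attribute chain; return None if any link is missing."""
    obj = response
    for name in chain:
        obj = getattr(obj, name, None)
        if obj is None:
            return None
    return obj


def close_hard(response):
    """Close the response, then the underlying socket if one was found."""
    sock = find_real_socket(response)  # look it up before close() drops references
    try:
        response.close()
    finally:
        if sock is not None:
            sock.close()
```

If the chain is not present, `close_hard` simply falls back to a plain `close()`.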

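Answer 1 (score: 3)

Biggie: I think it's because the connection is not shutdown().

"Note: close() releases the resource associated with a connection but does not necessarily close the connection immediately. If you want to close the connection in a timely fashion, call shutdown() before close()."

You could try something like this before f.close():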

import socket
f.fp._sock.fp._sock.shutdown(socket.SHUT_RDWR)

(And yes, if that works, it's not Right(tm), but you will know what the problem is.)
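The difference between close() and shutdown() can be seen without any server. The following is a small sketch for Python 3 using a local socket pair; `peer_sees_eof` and `demo` are illustrative names, and the duplicated socket stands in for a second, leaked reference to the same connection.

```python
import socket


def peer_sees_eof(sock):
    """Return True if a non-blocking read on sock reports end-of-stream."""
    sock.setblocking(False)
    try:
        return sock.recv(1) == b""
    except BlockingIOError:
        # No data and no EOF yet: the other end is still open.
        return False


def demo():
    a, b = socket.socketpair()
    extra_ref = a.dup()  # a second handle on the same connection
    a.close()            # releases this handle only
    after_close = peer_sees_eof(b)
    extra_ref.shutdown(socket.SHUT_RDWR)  # ends the connection for every handle
    after_shutdown = peer_sees_eof(b)
    extra_ref.close()
    b.close()
    return after_close, after_shutdown
```

On a typical POSIX system `demo()` should report that the peer sees no end-of-stream after close() alone, and does see it after shutdown(), which matches the note quoted above.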

Answer 2 (score: 3)

For Python 2.7.1, urllib2 does indeed leak a file descriptor: https://bugs.pypy.org/issue867

Answer 3 (score: 0)

Alex Martelli answered a similar question. Read: should I call close() after urllib.urlopen()?

In short:

import contextlib
import urllib

with contextlib.closing(urllib.urlopen(u)) as x:
    # ... (x.close() is called automatically when the block exits)
    pass
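What contextlib.closing guarantees can be checked without a network connection. In this sketch `FakeResponse` is a hypothetical stand-in for the object returned by urlopen(); only `contextlib.closing` itself is real library API.

```python
import contextlib


class FakeResponse(object):
    """Hypothetical stand-in for the object returned by urlopen()."""

    def __init__(self):
        self.closed = False

    def read(self):
        return b"payload"

    def close(self):
        self.closed = True


def read_all(resp):
    # close() runs when the with block exits normally.
    with contextlib.closing(resp) as x:
        return x.read()


def read_and_fail(resp):
    # close() also runs when the block is left through an exception.
    with contextlib.closing(resp):
        raise ValueError("simulated failure while reading")
```

Note that this only guarantees that close() is called; whether close() releases the underlying socket promptly is the separate problem discussed in the other answers.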