HTTP 500 error in wget

Time: 2011-05-21 16:46:43

Tags: http http-headers wget

Take a look at this page:

http://www.ptmytrade.com/product.asp?id=61363

It loads fine (at least here). Now I want to fetch it with wget.

$ wget http://www.ptmytrade.com/product.asp?id=61363 --debug
DEBUG output created by Wget 1.12 on linux-gnu.

--2011-05-21 18:24:51--  http://www.ptmytrade.com/product.asp?id=61363
Resolving www.ptmytrade.com... 205.209.150.134
Caching www.ptmytrade.com => 205.209.150.134
Connecting to www.ptmytrade.com|205.209.150.134|:80... connected.
Created socket 3.
Releasing 0x0890e260 (new refcount 1).

---request begin---
GET /product.asp?id=61363 HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: www.ptmytrade.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 500 Internal Server Error
Connection: keep-alive
Date: Sat, 21 May 2011 16:24:56 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Content-Length: 471822
Content-Type: text/html
Set-Cookie: ASPSESSIONIDSCACCAQA=FOCCMJODFHHMOKNKPAIHJCIL; path=/
Cache-control: private

---response end---
500 Internal Server Error

Stored cookie www.ptmytrade.com -1 (ANY) / <session> <insecure> [expiry none] ASPSESSIONIDSCACCAQA FOCCMJODFHHMOKNKPAIHJCIL
Registered socket 3 for persistent reuse.
Disabling further reuse of socket 3.
Closed fd 3
2011-05-21 18:24:57 ERROR 500: Internal Server Error.

OK, so I checked the headers while fetching the page in a browser (using the Live HTTP Headers add-on):

http://www.ptmytrade.com/product.asp?id=61361

GET /product.asp?id=61361 HTTP/1.1
Host: www.ptmytrade.com
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:2.0) Gecko/20100101 Firefox/4.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: ASPSESSIONIDSCACBBRA=AMPBLLNDGMFLNPNCPEBPNNLB; ASPSESSIONIDSCACCAQA=FJNBMJODLHHJNDHPFBIEEPEM

HTTP/1.1 500 Internal Server Error
Date: Sat, 21 May 2011 16:20:46 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Content-Length: 471822
Content-Type: text/html
Cache-Control: private
----------------------------------------------------------
http://www.ptmytrade.com/images/index_117.jpg

GET /images/index_117.jpg HTTP/1.1
Host: www.ptmytrade.com
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:2.0) Gecko/20100101 Firefox/4.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://www.ptmytrade.com/product.asp?id=61361
Cookie: ASPSESSIONIDSCACBBRA=AMPBLLNDGMFLNPNCPEBPNNLB; ASPSESSIONIDSCACCAQA=FJNBMJODLHHJNDHPFBIEEPEM

HTTP/1.1 404 Not Found
Content-Length: 1635
Content-Type: text/html
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Sat, 21 May 2011 16:20:48 GMT

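I'm not sure what is going on here. The page displays fine, but the headers show a 500 status code.

I worked around the problem by using curl (which also gets a 500 but still fetches the page), but I'm curious what is happening here.

3 answers:

Answer 0: (score: 2)

This is a bug in the web page. The HTTP status really does appear to be wrongly set to 500; Firefox/Firebug confirms it. Basically, you are looking at an HTTP 500 error page with "normal" content. Browsers and curl show the body of an error response, whereas wget by default discards it and only reports the error, which is why the page looks fine in one and fails in the other.

Report it to the site's administrator.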
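
To make the "500 with normal content" case concrete, here is a minimal Python sketch (not part of the original answers) that keeps the response body whatever the status code is, roughly what curl does by default. The helper name `fetch_keep_body` is hypothetical; only the standard library's `urllib` is used.

```python
import urllib.error
import urllib.request


def fetch_keep_body(url, timeout=10):
    """Return (status, body) for url, keeping the body even for 4xx/5xx replies.

    Hypothetical helper for illustration; not something wget or curl provide.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urlopen raises on error statuses, but the exception object is also
        # a response: it still carries the status code and the payload.
        return err.code, err.read()
```

Calling `fetch_keep_body("http://www.ptmytrade.com/product.asp?id=61363")` against a server that behaves as in the question would be expected to return the 500 status together with the page's HTML, assuming the site still responds the same way.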

Answer 1: (score: 0)

Using this option will solve the problem:

--content-on-error
           If this is set to on, wget will not skip the content when the
           server responds with a http status code that indicates error.

So the command looks like this:

wget --content-on-error "https://stackoverflow.com"

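Note: it is important to put the URL in double quotes. Otherwise, if the URL contains an `&`, the shell runs wget in the background and it appears to get stuck at Redirecting output to ‘wget-log’.

Alternatively, as noted in the comments and as the OP ended up doing, use curl.

I should note, however, that curl cannot download a complete web page (CSS, JS, images, etc.), because it does not parse HTML.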

Answer 2: (score: 0)

Try wrapping the URL in quotes:

wget "http://www.ptmytrade.com/product.asp?id=61363"

instead of:

wget http://www.ptmytrade.com/product.asp?id=61363
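
Why quoting matters can be sketched in Python (an illustration, not from the original answers): `shlex.quote` from the standard library wraps a string in single quotes whenever it contains characters a POSIX shell treats specially, such as `?` (a glob wildcard) or `&` (run the command in the background). The helper name `wget_command` and the example.com URL are made up for this example.

```python
import shlex


def wget_command(url):
    """Build a wget command line with the URL quoted for a POSIX shell.

    Hypothetical helper for illustration only.
    """
    return "wget " + shlex.quote(url)


# '?' is a glob wildcard, so the URL from the question gets quoted.
print(wget_command("http://www.ptmytrade.com/product.asp?id=61363"))
# '&' would otherwise end the command and start wget in the background
# (the example.com URL is made up).
print(wget_command("http://example.com/list.asp?id=1&page=2"))
```

Both printed commands have the URL wrapped in single quotes, so the shell passes it to wget unchanged.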