Dumping all network requests and responses in Python

Time: 2014-10-22 21:12:44

Tags: python

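How can I dump all network requests and responses using Python? What I want to do is comparable to the following (this example is JavaScript, from the PhantomJS repository): https://github.com/ariya/phantomjs/blob/master/examples/netlog.js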

I have been trying a variety of tools, including the following:

Example:

import requests
import logging

logging.basicConfig(level=logging.DEBUG)
r = requests.get('http://www.google.com')

Example:

import urllib2

request = urllib2.Request('http://jigsaw.w3.org/HTTP/300/302.html')
response = urllib2.urlopen(request)
print "Response code was: %d" % response.getcode()

Example:

import urllib2

# Basic-auth handler; note that no credentials are registered on passman here.
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
authhandler = urllib2.HTTPBasicAuthHandler(passman)
# debuglevel=1 makes httplib print the request and response headers to stdout.
opener = urllib2.build_opener(authhandler, urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)
response = urllib2.urlopen('http://groupon.com')
print response

...and more.
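The debuglevel=1 attempt prints raw traffic to stdout, which is hard to post-process. As a rough sketch only, and assuming Python 3 (where urllib2 became urllib.request), a custom handler can record each request and response as it passes through the opener. The names NetLogHandler, fetch_with_netlog, and demo are hypothetical, and the tiny local server exists only so the sketch runs without Internet access.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NetLogHandler(urllib.request.BaseHandler):
    """Hypothetical handler that records each request and response."""

    # Sort after HTTPHandler (500) and before HTTPErrorProcessor (1000),
    # so redirect and error responses are recorded as well.
    handler_order = 900

    def __init__(self):
        self.entries = []

    def http_request(self, request):
        self.entries.append(("request", request.get_method(), request.full_url))
        return request

    def http_response(self, request, response):
        self.entries.append(
            ("response", response.status, response.headers.get("Content-Type"))
        )
        return response

    # Reuse the same hooks for HTTPS traffic.
    https_request = http_request
    https_response = http_response


class _DemoServer(BaseHTTPRequestHandler):
    """Tiny local server used only by the demo."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_with_netlog(url):
    """Fetch url and return (body, recorded entries)."""
    netlog = NetLogHandler()
    # ProxyHandler({}) ignores proxy environment variables; drop it if a
    # proxy is actually needed.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), netlog)
    with opener.open(url, timeout=5) as response:
        body = response.read()
    return body, netlog.entries


def demo():
    """Start the local server, fetch one page through the logging opener."""
    server = HTTPServer(("127.0.0.1", 0), _DemoServer)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        return fetch_with_netlog(url)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    body, entries = demo()
    for entry in entries:
        print(entry)
```

Like the urllib2 attempts, this only sees traffic that goes through this opener. It does not load the stylesheets, scripts, and images that a browser would request for the same page.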

Here is an example of the kind of information I want to capture (I used Fiddler2 to get it; all of this and more comes from visiting groupon.com):

# | Result | Protocol | Host | URL | Body | Caching | Content-Type | Process | Comments | Custom
6 | 200 | HTTP | www.groupon.com | / | 23,236 | private, max-age=0, no-cache, no-store, must-revalidate | text/html; charset=utf-8 | chrome:6080 | |
7 | 200 | HTTP | www.groupon.com | /homepage-assets/styles-6fca4e9f48.css | 6,766 | public, max-age=31369910 | text/css; charset=UTF-8 | chrome:6080 | |
8 | 200 | HTTP | Tunnel to | img.grouponcdn.com:443 | 0 | | | chrome:6080 | |
9 | 200 | HTTP | img.grouponcdn.com | /deal/gsPCLbbqioFVfvjT3qbBZo/The-Omni-Mount-Washington-Resort_01-960x582/v1/c550x332.jpg | 94,555 | public, max-age=315279127; Expires: Fri, 18 Oct 2024 22:20:20 GMT | image/jpeg | chrome:6080 | |
10 | 200 | HTTP | img.grouponcdn.com | /deal/d5YmjhxUBi2mgfCMoriV/pE-700x420/v1/c220x134.jpg | 17,832 | public, max-age=298601213; Expires: Mon, 08 Apr 2024 21:35:06 GMT | image/jpeg | chrome:6080 | |
11 | 200 | HTTP | www.groupon.com | /homepage-assets/main-fcfaf867e3.js | 9,604 | public, max-age=31369913 | application/javascript | chrome:6080 | |
12 | 200 | HTTP | www.groupon.com | /homepage-assets/locale.js?locale=en_US&country=US | 1,507 | public, max-age=994 | application/javascript | chrome:6080 | |
13 | 200 | HTTP | www.groupon.com | /tracky | 3 | | application/octet-stream | chrome:6080 | |
14 | 200 | HTTP | www.groupon.com | /cart/widget?consumerId=b577c9c2-4f07-11e4-8305-0025906127fe | 17 | private, max-age=0, no-cache, no-store, must-revalidate | application/json; charset=utf-8 | chrome:6080 | |
15 | 200 | HTTP | www.googletagmanager.com | /gtm.js?id=GTM-B76Z | 39,061 | private, max-age=911; Expires: Wed, 22 Oct 2014 20:48:14 GMT | text/javascript; charset=UTF-8 | chrome:6080 | |
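To make the target output concrete, here is a minimal sketch (Python 3 assumed) of how one such summary row could be assembled from a URL, a status code, and a header mapping. NetLogRow and make_row are hypothetical names, the Process, Comments, and Custom columns are left out because they are specific to Fiddler, and the way Expires is folded into the Caching column is only a guess based on the rows shown.

```python
from collections import namedtuple
from urllib.parse import urlsplit

# Hypothetical row type mirroring some of the Fiddler columns.
NetLogRow = namedtuple(
    "NetLogRow", "result protocol host url body caching content_type"
)


def make_row(url, status, headers, body_length):
    """Build one summary row from a URL, a status code and a header mapping.

    headers is assumed to be a mapping with canonical header names as keys
    (a plain dict is case-sensitive).
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    # Combine Cache-Control and Expires the way the Caching column shows them.
    caching_parts = []
    if headers.get("Cache-Control"):
        caching_parts.append(headers["Cache-Control"])
    if headers.get("Expires"):
        caching_parts.append("Expires: " + headers["Expires"])

    return NetLogRow(
        result=status,
        protocol=parts.scheme.upper(),
        host=parts.netloc,
        url=path,
        body=body_length,
        caching="; ".join(caching_parts),
        content_type=headers.get("Content-Type", ""),
    )


if __name__ == "__main__":
    # Values copied from the locale.js row of the capture.
    row = make_row(
        "http://www.groupon.com/homepage-assets/locale.js?locale=en_US&country=US",
        200,
        {"Cache-Control": "public, max-age=994",
         "Content-Type": "application/javascript"},
        1507,
    )
    print(" | ".join(str(field) for field in row))
```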

1 answer:

Answer 0 (score: 0)

This isn't exactly it, but it's close enough, and yes, it uses urllib2:

from bs4 import BeautifulSoup
import urllib2

# Download the page body; this fetches only the one URL, not its sub-resources.
data = urllib2.urlopen("http://stackoverflow.com").read()
# Parse the HTML with the parser from the standard library.
soup = BeautifulSoup(data, "html.parser")

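.read() returns the response body for the URL. The HTTP headers are not part of that data; they are available from the response object through .info().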
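The split between body and headers can be illustrated with a small sketch, assuming Python 3's urllib.request rather than urllib2. Here the .status and .headers attributes play the role of urllib2's .getcode() and .info(). The names dump_response and demo are hypothetical, and the local server is only there so the example does not depend on a real site.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>hi</body></html>"


class _Page(BaseHTTPRequestHandler):
    """Local stand-in for a real web page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Cache-Control", "no-cache")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the demo quiet


def dump_response(url):
    """Return (status, headers as a dict, body) for a single URL."""
    # ProxyHandler({}) ignores proxy environment variables for this demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        status = response.status
        headers = dict(response.headers.items())
        body = response.read()  # body only, no headers
    return status, headers, body


def demo():
    """Serve PAGE locally and dump the response for it."""
    server = HTTPServer(("127.0.0.1", 0), _Page)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return dump_response("http://127.0.0.1:%d/" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, headers, body = demo()
    print(status)
    for name, value in headers.items():
        print("%s: %s" % (name, value))
    print(len(body), "bytes of body")
```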
