Python POST request (web scraping)

Asked: 2012-11-23 11:52:01

Tags: python post screen-scraping

I can't get a POST request to work with the following parameters:

Website: www.zoover.it

Here is my code:

import requests

Request_URL = "http://www.zoover.it/services/Testimonials/TestimonialQueryService.asmx/AccommodationTestimonialQuery"
# Paging and sorting options for the testimonial query
serviceRequest = {"CurrentLanguage":"Language_NL","PartyFilter":"","CurrentPage":"0","PageSize":"10","SortOption":"date-of-visit"}
# Context of the accommodation page whose testimonials are requested
pageContext = {"EntityLevel":"accommodation","NewEntityLevel":"accommodation","EntityId":151433,"EntityName":"Residence Belmonte Vacanze****","SemanticName":"accommodation-testimonials","PhysicalUrl":"/accommodation/testimonials.aspx","CurrentSiteVariation":"it","CmsAccommodationTypeFilter":"","PageCode":"accommodation","PageSubcode":"testimonials","CmsEntity":{"Level":1,"Id":151433},"NewCmsEntity":{"Level":{"EntityLevel":"accommodation"},"Id":151433},"Path":"/accommodation/testimonials.aspx","PageSemantic":{"SemanticName":"accommodation-testimonials","PhysicalUrl":"/accommodation/testimonials.aspx","KnownFriendlyParams":["accommodationId"],"HasFriendlyUrl":True},"EntityType":"Appartamento","PageRequestUrl":"/italia/toscana/montaione/residence-belmonte-vacanze/appartamento"}

# This is the call that comes back with HTTP 411
r = requests.post(Request_URL, params=serviceRequest)

print(r.text)

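I have two problems:

1) In r.text I always get "HTTP Error 411. The request must be chunked or have a content length."

2) I don't know how to make a POST with two dictionaries (serviceRequest and pageContext).

My goal is to scrape the site by changing the parameters in the dictionaries.

Thanks for your help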

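1 answer:

Answer 0: (score: 2)

Use the data keyword instead of params. With data, the dictionary is sent as the request body and the Content-Length header is set automatically: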

r = requests.post(Request_URL, data=serviceRequest)
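To make the Content-Length point concrete, here is a minimal sketch using only the standard library. It is not what requests runs internally; it only approximates the form encoding applied to a flat dictionary of strings passed as data, and the variable names service_request, body and content_length are chosen here for illustration.

```python
from urllib.parse import urlencode

# Values taken from the question's serviceRequest dictionary.
service_request = {
    "CurrentLanguage": "Language_NL",
    "PartyFilter": "",
    "CurrentPage": "0",
    "PageSize": "10",
    "SortOption": "date-of-visit",
}

# Form-encode the dictionary into the text that would travel in the POST body.
body = urlencode(service_request)

# Content-Length is the size of that encoded body in bytes.
content_length = len(body.encode("utf-8"))

print(body)
print("Content-Length:", content_length)
```

With params, the same key/value pairs would be appended to the URL as a query string instead, leaving the POST without a body, which is consistent with the 411 response described in the question.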

To send both, you have to merge the two dictionaries into one:

data = serviceRequest.copy()
data.update(pageContext)
r = requests.post(Request_URL, data=data)
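The merge step can be tried on its own without touching the network. The sketch below uses shortened versions of the two dictionaries, and merge_payloads is a hypothetical helper name introduced here, not part of requests. It shows that copy() followed by update() leaves the originals untouched, and that the same pattern can override a single parameter, which is what the question wants for paging through results.

```python
# Shortened stand-ins for the question's two dictionaries (illustrative subset).
service_request = {"CurrentPage": "0", "PageSize": "10"}
page_context = {"EntityId": 151433, "PageCode": "accommodation"}


def merge_payloads(first, second):
    """Return a new dict with the keys of both; `second` wins on duplicate keys."""
    merged = first.copy()   # shallow copy, so `first` is not modified
    merged.update(second)   # add (or overwrite) keys from `second`
    return merged


# Combined payload, suitable for passing as data=... to requests.post.
data = merge_payloads(service_request, page_context)
print(data)

# Changing one parameter for the next request reuses the same helper.
next_page = merge_payloads(data, {"CurrentPage": "1"})
print(next_page["CurrentPage"])
```

Note that update() overwrites any key present in both dictionaries. The two dictionaries in the question share no keys, so nothing is lost there.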