FormRequest Scrapy

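Time: 2016-09-19 10:43:33

Tags: python scrapy scrapy-spider

I am new to Scrapy and Python. I am trying to use FormRequest from the Scrapy examples, but the formdata parameter does not seem to parse the '[]' (the list) under "Air". Any ideas for a workaround? Here is the code: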

import scrapy
import re
import json
from scrapy.http import FormRequest

class AirfareSpider(scrapy.Spider):
    name = 'airfare'
    start_urls = [
        'http://www.viajanet.com.br/busca/voos-resultados#/POA/MEX/RT/01-03-2017/15-03-2017/-/-/-/1/0/0/-/-/-/-'
    ]

    def parse(self, response):
        return [FormRequest(
            url='http://www.viajanet.com.br/busca/resources/api/AvailabilityStatusAsync',
            formdata={
                "Partner": {
                    "Token": "p0C6ezcSU8rS54+24+zypDumW+ZrLkekJQw76JKJVzWUSUeGHzltXDhUfEntPPLFLR3vJpP7u5CZZYauiwhshw==",
                    "Key": "OsHQtrHdMZPme4ynIP4lcsMEhv0=",
                    "Id": "52",
                    "ConsolidatorSystemAccountId": "80",
                    "TravelAgencySystemAccountId": "80",
                    "Name": "B2C"
                },
                "Air": [{
                    "Arrival": {
                        "Iata": "MEX",
                        "Date": "2017-03-15T15:00:00.000Z"
                    },
                    "Departure": {
                        "Iata": "POA",
                        "Date": "2017-03-01T15:00:00.000Z"
                    },
                    "InBoundTime": "0",
                    "OutBoundTime": "0",
                    "CiaCodeList": "[]",
                    "BookingClass": "-1",
                    "IsRoundTrip": "true",
                    "Stops": "-1",
                    "FareType": "-"
                }],
                "Pax": {
                    "adt": "1",
                    "chd": "0",
                    "inf": "0"
                },
                "DisplayTotalAmount": "false",
                "GetDeepLink": "false",
                "GetPriceMatrixOnly": "false",
                "PageLength": "10",
                "PageNumber": "2"
            },
            callback=self.parse_airfare)]

    def parse_airfare(self, response):
        data = json.loads(response.body)
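
A note on the likely cause, offered as a sketch rather than a confirmed fix: formdata is sent as application/x-www-form-urlencoded pairs, which is a flat key/value format, so the nested dict under "Partner" and the list under "Air" have no direct representation in it. If the endpoint accepts a JSON body (an assumption, not verified against this site), the payload can be serialized with json.dumps and sent as the request body instead. The payload below is a trimmed, hypothetical version of the one in the question, and json_request_kwargs is a helper name made up for illustration.

```python
import json

# Hypothetical, trimmed-down payload modeled on the question's structure.
# Assumption: the endpoint wants real JSON types (a list for CiaCodeList,
# a boolean for IsRoundTrip) rather than the strings "[]" and "true".
payload = {
    "Air": [{
        "Arrival": {"Iata": "MEX", "Date": "2017-03-15T15:00:00.000Z"},
        "Departure": {"Iata": "POA", "Date": "2017-03-01T15:00:00.000Z"},
        "CiaCodeList": [],
        "IsRoundTrip": True,
    }],
    "Pax": {"adt": "1", "chd": "0", "inf": "0"},
    "PageLength": "10",
    "PageNumber": "2",
}


def json_request_kwargs(data):
    """Build keyword arguments for a JSON POST (hypothetical helper)."""
    return {
        "method": "POST",
        "body": json.dumps(data),  # nested dicts and lists survive intact
        "headers": {"Content-Type": "application/json"},
    }


kwargs = json_request_kwargs(payload)
print(kwargs["body"])
```

Inside the spider, these keyword arguments could be passed to a plain request, for example scrapy.Request(url, callback=self.parse_airfare, **kwargs). Newer Scrapy releases also provide scrapy.http.JsonRequest, which takes the payload through its data argument and does the serialization itself.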

2 answers:

Answer 0: (score: 6)

Try using the FormRequest.from_response function.

https://doc.scrapy.org/en/latest/topics/request-response.html#using-formrequest-from-response-to-simulate-a-user-login

Answer 1: (score: 0)

Adding to the answer to @Uday's question: if a page has multiple forms, use formid or formname to select the correct one:

def parse(self, response):
    return scrapy.FormRequest.from_response(
        response,
        formid='form_id_of_the_form',
        formdata={'username': 'john', 'password': 'secret'},
        callback=self.after_login
    )

If neither is given, FormRequest.from_response uses the first form on the page by default.