Scraping web pages with Python

Date: 2016-10-21 12:31:24

Tags: python python-3.x web-scraping web-crawler

I need to scrape all of the results on this page: http://www.carnival.com.au/Find-A-Cruise/search-results.aspx?ShipCode=LE&

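The problem is that there is no option to show all of them at once. So far I have managed to scrape the first page, but I can't get to the other pages. How can I do this?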

1 answer:

Answer 0 (score: 2)

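The subsequent pages are loaded by JavaScript pagination. If you watch the network requests (for example in your browser's developer tools), you'll see a POST request with a few parameters being sent to "http://www.carnival.com.au/DomainData/SailingSearch/Get/". If you replicate that same request, you get back JSON data containing the cruise information.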

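import requests

# Reuse one session so any cookies set by the search page are sent with the POST
sesh = requests.Session()
first_page = sesh.get("http://www.carnival.com.au/Find-A-Cruise/search-results.aspx?ShipCode=LE&")
# The same form fields the page's JavaScript pagination sends
data = {"ShipCode": "LE", "CurrencyCode": "AUD", "PageSize": 5, "PageNumber": 2, "SortExpression": "FirstSailDate"}
page_2 = sesh.post("http://www.carnival.com.au/DomainData/SailingSearch/Get/", data=data)
page_2.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error page
cruise_data = page_2.json()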
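Passing a dict as data= makes requests send it as an application/x-www-form-urlencoded body, and that body is what you would compare against the "Form Data" of the request captured in the browser. Below is a small standard-library sketch that previews the body; the helper name build_search_form and its default arguments are hypothetical, and the field names are simply the ones used in the request above.

```python
from urllib.parse import urlencode


def build_search_form(page_number, page_size=5, ship_code="LE",
                      currency="AUD", sort="FirstSailDate"):
    """Hypothetical helper returning the same fields the answer's POST sends."""
    return {
        "ShipCode": ship_code,
        "CurrencyCode": currency,
        "PageSize": page_size,
        "PageNumber": page_number,
        "SortExpression": sort,
    }


# urlencode builds the form-encoded body that requests sends for data=<dict>
body = urlencode(build_search_form(2))
print(body)
# ShipCode=LE&CurrencyCode=AUD&PageSize=5&PageNumber=2&SortExpression=FirstSailDate
```

Changing only page_number is all it takes to ask for a different page.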

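The JSON response even reports the total number of results ('TotalResultsCount') and the last page number ('LastPage'), which you can use to request the remaining pages more efficiently.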
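One way to use those fields is sketched below: fetch page 1, read 'LastPage', then loop over the remaining pages and collect every entry of 'Voyages'. This is a sketch under assumptions. The page fetcher is passed in as a callable so the loop can be tested without the network, the names fetch_all_voyages, make_requests_fetcher and page_count are hypothetical, and whether the 2016 endpoint still answers this way is not verified. The page count can also be derived from 'TotalResultsCount': with 44 results and 5 per page, ceil(44 / 5) = 9, which matches 'LastPage' in the sample output.

```python
import math


def page_count(total_results, page_size):
    """Pages needed to cover total_results at page_size results per page."""
    return math.ceil(total_results / page_size)


def fetch_all_voyages(fetch_page):
    """Collect 'Voyages' from every page.

    fetch_page(page_number) must return the decoded JSON dict for that page
    (hypothetical callable; make_requests_fetcher below builds one).
    """
    first = fetch_page(1)
    voyages = list(first.get("Voyages", []))
    # 'LastPage' is a string in the sample output, e.g. '9'
    last_page = int(first["LastPage"])
    for number in range(2, last_page + 1):
        voyages.extend(fetch_page(number).get("Voyages", []))
    return voyages


def make_requests_fetcher(page_size=5):
    """Build a fetch_page callable that replays the site's POST request."""
    import requests  # imported here so the helpers above work without it

    session = requests.Session()
    # Visit the search page first, as in the answer, so any cookies are set
    session.get("http://www.carnival.com.au/Find-A-Cruise/search-results.aspx?ShipCode=LE&")

    def fetch_page(number):
        data = {"ShipCode": "LE", "CurrencyCode": "AUD", "PageSize": page_size,
                "PageNumber": number, "SortExpression": "FirstSailDate"}
        resp = session.post("http://www.carnival.com.au/DomainData/SailingSearch/Get/",
                            data=data)
        resp.raise_for_status()
        return resp.json()

    return fetch_page
```

Against the live site this would be called as voyages = fetch_all_voyages(make_requests_fetcher()). A larger page_size could cut the number of requests, but it is untested whether the server honours values other than the 5 the page itself sends.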

Some sample output from that JSON:

{'CurrentPage': '2',
 'CurrentResultsCount': '6 - 10',
 'LastPage': '9',
 'SortExpression': 'FirstSailDate',
 'TotalResultsCount': 44,
 'Voyages': [{'BookNowUrl': 'http://booking.carnival.com.au/index.asp?AIID=44&overridePageID=651&currentPageID=650&processingObjectIDList=21604&searchMode=searchByNumber&searchByNumberCriteria=G639&searchByCriteriaStatus=go&voyageCode=G639&voyageName=G639&shipCode=LE&shipName=Legend&brandCode=CL&brandName=Carnival%20Cruise%20Lines&homeCityCode=SYD&airCityCode=SYD&homeCityName=Sydney&airCityNameSydney&tDef=&tourName=&duration=10&embarkDate=20161217&tType=O&tDirection=R&destinationCode=I&destinationName=Pacific+Islands&cruiseSelected=yes&unbundling=-&switchPolarRegion=prd&currencyCode=AUD',
              'CruiseCode': 'G639 ',
              'CruiseNights': 10,
              'DateRangeText': '17 Dec 2016 (Sat - Tue)',
              'DeparturePortCode': 'SYD',
              'DeparturePortName': 'Sydney',
              'FromBPrice': '1,699.00 AUD',
              'FromIPrice': '1,549.00 AUD',
              'FromOPrice': '1,649.00 AUD',
              'FromQuadPrice': '1,689.00 AUD',
              'FromSPrice': '2,649.00 AUD',
              'FromTwinPrice': '1,549.00 AUD',
              'MetaCategory': 'P',
              'MetaCategoryDescription': 'Pacific Islands',
              'PortsVisited': [{'CruiseDay': 0,
                                'PortCode': 'SYD',
                                'PortName': 'Sydney'},
                               {'CruiseDay': 1,
                                'PortCode': 'NOU',
                                'PortName': 'Noumea'},
                               {'CruiseDay': 2,
                                'PortCode': 'MY2',
                                'PortName': 'Mystery Island'},
                               {'CruiseDay': 3,
                                'PortCode': 'LIF',
                                'PortName': 'Lifou Isle'},
                               {'CruiseDay': 4,
                                'PortCode': 'MEE',
                                'PortName': 'Mare'},
                               {'CruiseDay': 5,
                                'PortCode': 'SYD',
                                'PortName': 'Sydney'}],
              'RegionCode': 'I',
              'RegionName': 'Pacific Islands',
              'SailDate': '/Date(1481950800000)/',
              'ShipCode': 'LE',
              'ShipName': 'Legend',
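Some fields in that output need converting before they are useful, for example when writing rows to a CSV file. Below is a minimal sketch; the helper names parse_ms_date, parse_price and summarize are hypothetical, and the date handling assumes 'SailDate' follows the ASP.NET "/Date(milliseconds since the Unix epoch)/" convention. Under that assumption '/Date(1481950800000)/' is 2016-12-17 05:00 UTC, consistent with the '17 Dec 2016' in 'DateRangeText'.

```python
import re
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# Assumption: SailDate uses the ASP.NET "/Date(<ms since Unix epoch>)/" form,
# optionally followed by a +hhmm/-hhmm offset, which is ignored here.
_MS_DATE = re.compile(r"^/Date\((-?\d+)(?:[+-]\d{4})?\)/$")
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_ms_date(value):
    """'/Date(1481950800000)/' -> timezone-aware UTC datetime."""
    match = _MS_DATE.match(value)
    if match is None:
        raise ValueError(f"unexpected date format: {value!r}")
    return _EPOCH + timedelta(milliseconds=int(match.group(1)))


def parse_price(value):
    """'1,699.00 AUD' -> (Decimal('1699.00'), 'AUD')."""
    amount, currency = value.rsplit(" ", 1)
    return Decimal(amount.replace(",", "")), currency


def summarize(voyage):
    """Flatten one entry of 'Voyages' into a flat dict suitable for a CSV row."""
    price, currency = parse_price(voyage["FromTwinPrice"])
    return {
        "code": voyage["CruiseCode"].strip(),  # sample value has a trailing space
        "nights": voyage["CruiseNights"],
        "sail_date_utc": parse_ms_date(voyage["SailDate"]),
        "from_twin_price": price,
        "currency": currency,
        "ports": " -> ".join(p["PortName"] for p in voyage["PortsVisited"]),
    }
```

Decimal is used for prices so that amounts such as 1,699.00 are kept exactly rather than as binary floats.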