Data not scraped correctly

Posted: 2019-07-02 19:06:24

Tags: python scrapy

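I'm trying to scrape the following page with Scrapy: https://www2.trollandtoad.com/buylist/?_ga=2.123753418.115346513.1562026676-1813285172.1559913561#!/M/10591. Some of the data comes through correctly, but I can't get the card names. The selector for the card name is the same as the one for the set name, so I only ever get the set name where the card name should be.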

    def parse(self, response):
        for data in response.css("tr.ng-scope"):
            # Create a fresh GameItem (defined in items.py) for each table row
            item = GameItem()
            # Set name: usually a link, but some rows render it as a plain span
            item["Set"] = data.css("a.ng-binding.ng-scope::text").get()
            if item["Set"] is None:
                item["Set"] = data.css("span.ng-binding.ng-scope::text").get()
            # Same selector as the set name above, so this also returns the set name
            item["Card_Name"] = data.css("a.ng-binding.ng-scope::text").get()
            # Condition, quantity and price for the same row
            # (Quantity and Price share one selector, so they get the same value)
            item["Condition"] = data.css("td\.5557170.buylist_condition::text").get()
            item["Quantity"] = data.css("span.ng-binding::text").get()
            item["Price"] = data.css("span.ng-binding::text").get()
            yield item

Update #1

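I switched to XPath and can now get the card name instead of the set name, but it returns the same card name for every row instead of a different one per row.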

item["Card_Name"] = data.xpath("/html/body/div[2]/div[2]/div[1]/table[1]/tbody/tr[1]/td[2]/a/text()").get()

2 answers:

Answer 0 (score: 0)

card_names = response.xpath("//div/table/tbody/tr/td[contains(@class,'buylist_productname item')]/a/text()").getall()

This returns a list of the different card names, in the order they appear on the page.

Answer 1 (score: 0)

The code below finally got it working. I had to trim the XPath down and make it relative instead of absolute.

item["Card_Name"] = data.xpath(".//td[2]/a/text()").get()