How do I extract links from a web page with Scrapy?

Time: 2014-06-20 13:29:01

Tags: python web-scraping scrapy

This is my seed URL:

http://www.amazon.com/s/ref=sr_nr_n_0?rh=n%3A133140011%2Cn%3A%21133141011%2Cn%3A154606011%2Cn%3A668010011%2Cn%3A158591011%2Cn%3A158592011&bbn=158591011&ie=UTF8&qid=1403264414&rnid=158591011

How can I extract all the Kindle book links from this page with Scrapy?

Here is my code, but I am not getting the expected results:

import scrapy


class MySpider(scrapy.Spider):
    # A plain Spider is used here: CrawlSpider reserves parse() for its own rule handling.
    name = "scraper"
    allowed_domains = ["amazon.com"]
    start_urls = ["http://www.amazon.com/s/ref=sr_nr_n_0?rh=n%3A133140011%2Cn%3A%21133141011%2Cn%3A154606011%2Cn%3A668010011%2Cn%3A158591011%2Cn%3A158592011&bbn=158591011&ie=UTF8&qid=1403264414&rnid=158591011"]

    def parse(self, response):
        # .re() returns a list of plain strings, not selectors,
        # so the matches are used directly instead of calling .extract() on them.
        links = response.xpath('//*[@id="resultsCol"]').re(r'/dp/B00.*digital-text')
        for link in links:
            print(link)
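
For reference, the matching step can be tried in isolation from the crawl. The block below is only a minimal sketch using the standard library: `SAMPLE_HTML` is a made-up fragment standing in for the real results column, and `LINK_PATTERN` and `extract_kindle_links` are hypothetical names, not part of Scrapy. It assumes the book links appear as `href` values containing `/dp/B00` and ending in `digital-text`, which is what the regular expression in the question implies; the real page markup may differ.

```python
import re
from urllib.parse import urljoin

# Hypothetical HTML fragment standing in for the results column of the page.
SAMPLE_HTML = (
    '<div id="resultsCol">'
    '<a href="/Some-Title/dp/B00AAAAAAA/ref=sr_1_1?s=digital-text">A</a>'
    '<a href="/Other-Title/dp/B00BBBBBBB/ref=sr_1_2?s=digital-text">B</a>'
    '<a href="/gp/help/customer/display.html">Help</a>'
    '</div>'
)

# [^"]* keeps each match inside a single href value. A greedy .* would run
# from the first /dp/B00 to the last digital-text on the same line.
LINK_PATTERN = re.compile(r'href="([^"]*/dp/B00[^"]*digital-text)"')


def extract_kindle_links(html, base_url="http://www.amazon.com"):
    """Return absolute URLs for every href that matches LINK_PATTERN."""
    return [urljoin(base_url, path) for path in LINK_PATTERN.findall(html)]


if __name__ == "__main__":
    for link in extract_kindle_links(SAMPLE_HTML):
        print(link)
```

If the links on the live page do share one line of HTML, the greedy `.*` in the original pattern would merge them into a single long match, which may be one reason the output looks wrong.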

0 answers:

No answers yet