10049: The requested address is not valid in its context. Scrapy-Splash not reading URL correctly

Time: 2019-01-09 22:01:32

Tags: python-3.x scrapy splash scrapy-splash

I am trying to get my code to read a web page through Splash for a more complicated site, but I can't even get it to run for this simple site. I started the Docker container and have the 8050 port mapped to 0.0.0.0 in my settings.py file. Any help would be greatly appreciated. Please state the version you used for each package, as I fear versioning may be the issue.

I have tried numerous fixes along the way, including changing the versions of Splash, Scrapy, and Twisted. Scrapy only works on Python 3.x with a newer version of Twisted, but Splash reports that it is incompatible with Twisted > 16.2. I tried several version combinations, but none of them fixed the error.

import scrapy
import scrapy_splash


class ExampleSpider(scrapy.Spider):
    name = "test"
    # allowed_domains = ["Monster.com"]
    start_urls = [
        'http://quotes.toscrape.com/page/1/'
    ]

    def start_requests(self):
        # Send each start URL through Splash so the page is rendered first.
        for url in self.start_urls:
            yield scrapy_splash.SplashRequest(
                url,
                self.parse,
                endpoint='render.html',
                args={'wait': 0.5},
            )

    def parse(self, response):
        # Print the text of every quote found on the rendered page.
        for quote in response.css('div.quote'):
            print(quote.css('span.text::text').extract())

I should just receive the quote texts; this is the same URL used in the documentation.

1 answer:

Answer 0 (score: 0)

There is nothing wrong with your code. Your problem is this:

I ... have the 8050 port mapped to 0.0.0.0 in my settings.py file
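
A quick way to catch this kind of value before running the spider is to check the host in the URL. The sketch below uses only the standard library; `has_usable_host` is a hypothetical helper name, not part of Scrapy or scrapy-splash, and it only checks the form of the address, not whether Splash is actually running:

```python
import ipaddress
from urllib.parse import urlparse


def has_usable_host(url):
    """Return False when the URL's host is missing or is a wildcard such as 0.0.0.0."""
    host = urlparse(url).hostname
    if host is None:
        return False  # no host could be parsed, e.g. the scheme is missing
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return True  # a name such as "localhost" rather than an IP literal
    # 0.0.0.0 (and :: for IPv6) are "unspecified" addresses, meant for listening only.
    return not address.is_unspecified


print(has_usable_host('http://0.0.0.0:8050'))
print(has_usable_host('http://localhost:8050'))
```

The helper could be called on the SPLASH_URL value from settings.py as a sanity check before starting a crawl.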

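0.0.0.0 is a wildcard address that a server listens on; it is not a destination a client can connect to, which is what error 10049 is reporting. The correct mapping in settings.py should be: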

SPLASH_URL = 'http://localhost:8050'

or

SPLASH_URL = 'http://127.0.0.1:8050'
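
For context, here is a fuller settings.py sketch. It assumes the middleware setup described in the scrapy-splash README and a Splash container reachable locally on port 8050; the priority numbers are the README's suggested values and may need adjusting for your project:

```python
# settings.py (sketch; assumes Splash is reachable on the local machine, port 8050)

# Address Scrapy connects to. Use localhost or 127.0.0.1, never 0.0.0.0.
SPLASH_URL = 'http://localhost:8050'

# Middlewares that route SplashRequest objects through the Splash HTTP API.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Avoids storing duplicate Splash arguments in the request queue.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Duplicate filter and cache storage that take Splash arguments into account.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```

Only SPLASH_URL is needed to fix the 10049 error; the other settings are listed in case they are missing from the project.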