How to use scrapy shell with a username and password on a URL (the site requires login)

Date: 2014-12-01 15:45:57

Tags: python-2.7 xpath scrapy scrapyd scrapy-spider

I want to scrape a website that requires login, and use the scrapy shell from the Python Scrapy framework to check whether my XPath expressions are correct.

    C:\Users\Ranvijay.Sachan>scrapy shell https://www.google.co.in/?gfe_rd=cr&ei=mIl8V6LovC8gegtYHYDg&gws_rd=ssl
    :0: UserWarning: You do not have a working installation of the service_identity module: 'No module named service_identity'.  Please install it from <https://pypi.python.org/pypi/service_identity> and make sure all of its dependencies are satisfied.  Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification.  Many valid certificate/hostname mappings may be rejected.
    2014-12-01 21:00:04-0700 [scrapy] INFO: Scrapy 0.24.2 started (bot: scrapybot)
    2014-12-01 21:00:04-0700 [scrapy] INFO: Optional features available: ssl, http11
    2014-12-01 21:00:04-0700 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
    2014-12-01 21:00:05-0700 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
    2014-12-01 21:00:05-0700 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
    2014-12-01 21:00:05-0700 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
    2014-12-01 21:00:05-0700 [scrapy] INFO: Enabled item pipelines:
    2014-12-01 21:00:05-0700 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:60
    2014-12-01 21:00:05-0700 [scrapy] DEBUG: Web service listening on 127.0.0.1:6081
    2014-12-01 21:00:05-0700 [default] INFO: Spider opened
    2014-12-01 21:00:06-0700 [default] DEBUG: Crawled (200) <GET https://www.google.co.in/?gfe_rd=cr> (referer: None)
    [s] Available Scrapy objects:
    [s]   crawler    <scrapy.crawler.Crawler object at 0x01B71910>
    [s]   item       {}
    [s]   request    <GET https://www.google.co.in/?gfe_rd=cr>
    [s]   response   <200 https://www.google.co.in/?gfe_rd=cr>
    [s]   settings   <scrapy.settings.Settings object at 0x023CBC90>
    [s]   spider     <Spider 'default' at 0x29402f0>
    [s] Useful shortcuts:
    [s]   shelp()           Shell help (print this help)
    [s]   fetch(req_or_url) Fetch request (or URL) and update local objects
    [s]   view(response)    View response in a browser

    >>> response.xpath("//div[@id='_eEe']/text()").extract()

    [u'Google.co.in offered in: ', u'  ', u'  ', u'  ', u'  ', u'  ', u'  ', u'  ', u'  ', u' ']
    >>>
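
(As a side note, only https://www.google.co.in/?gfe_rd=cr was actually crawled above, because the Windows shell treats the unquoted & as a command separator. Quoting the URL keeps the full query string:)

    C:\Users\Ranvijay.Sachan>scrapy shell "https://www.google.co.in/?gfe_rd=cr&ei=mIl8V6LovC8gegtYHYDg&gws_rd=ssl"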

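For the login part, here is a minimal sketch of the approach, assuming a hypothetical login page at https://example.com/login whose form has fields named username and password (the URL, field names, and credentials below are placeholders, not from a real site):

    C:\Users\Ranvijay.Sachan>scrapy shell "https://example.com/login"
    >>> from scrapy.http import FormRequest
    >>> # Build a POST request from the login <form> on the page we just fetched,
    >>> # filling in the credential fields by name.
    >>> req = FormRequest.from_response(
    ...     response,
    ...     formdata={'username': 'myuser', 'password': 'mypass'},
    ... )
    >>> fetch(req)  # submit the form; this updates `response` to the post-login page
    >>> # Now XPath expressions can be tested against the logged-in page:
    >>> response.xpath("//div[@id='account-name']/text()").extract()

Because CookiesMiddleware is enabled (see the log above), the session cookie set by the login response is reused by later fetch() calls in the same shell session.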
0 Answers:

No answers