Selenium::WebDriver::Error::InvalidSelectorError when using an XPath selector to find the next-page link

Date: 2013-12-16 15:05:55

Tags: selenium xpath selenium-webdriver capybara screen-scraping

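I get an error when I try to run a screen scraper I've been building with Capybara and the Selenium WebDriver. This is the exact error: "The given selector # is either invalid or does not result in a WebElement. The following error occurred: (Selenium::WebDriver::Error::InvalidSelectorError)". It comes from the selector that is supposed to find the next-page link and click it. Judging by the page source it is a valid XPath selector, but Capybara and Selenium disagree.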

require "capybara/dsl"
require "spreadsheet"
require "fileutils"
require "open-uri"


Capybara.run_server = false
Capybara.default_driver = :selenium
Capybara.default_selector = :xpath
Spreadsheet.client_encoding = 'UTF-8'

class Tomtop
  include Capybara::DSL

  def initialize
    @LOCAL_DIR = "data-hold/images"
    @excel = Spreadsheet::Workbook.new
    @work_list = @excel.create_worksheet
    @row = 0
    @links = []
    FileUtils.makedirs(@LOCAL_DIR) unless File.exist? @LOCAL_DIR
  end

  def go
    visit_main_link
  end

  def retryable(options = {}, &block)
    opts = { :tries => 1, :on => Exception }.merge(options) #possible bug (remove this line and take options from options hash)

    retry_exception, retries = opts[:on], opts[:tries]

    begin
      return yield
    rescue retry_exception
      retry if (retries -= 1) > 0
    end

    yield
  end

  def visit_main_link
    visit "http://www.example.com/clothing-accessories?dir=asc&limit=72&order=position"
    @results = all("//h5/a[contains(@onclick, 'analyticsLog')]")
    while page.has_selector?("//td[contains(@class, 'pages')]//a[img/@alt='Next Page']")
      retryable(:tries => 1, :on => OpenURI::HTTPError) do
        find.first("//td[contains(@class, 'pages')]//a[img/@alt='Next Page']").click
        @results = all("//h5/a[contains(@onclick, 'analyticsLog')]")
        @results.each do |a|
          @links << a[:href]
        end
        @links.each do |link|
          visit link
          save_item
        end
        @excel.write "inventory.csv"
      end
    end
  end

  def save_item
    data = all("//*[@id='content-wrapper']/div[2]/div/div")
    data.each do |info|
      @work_list[@row, 0] = info.find("//*[@id='productright']/div/div[1]/h1").text

      price = info.first("//div[contains(@class, 'price font left')]")
      @work_list[@row, 1] = (price.text.to_f * 1.33).round(2) if price

      @work_list[@row, 2] = info.find("//*[@id='productright']/div/div[11]").text

      @work_list[@row, 3] = info.find("//*[@id='tabcontent1']/div/div").text.strip

      color = info.all("//dd[1]//select[contains(@name, 'options')]//*[@price='0']")
      @work_list[@row, 4] = color.collect(&:text).join(', ')

      size = info.all("//dd[2]//select[contains(@name, 'options')]//*[@price='0']")
      @work_list[@row, 5] = size.collect(&:text).join(', ')

      sku = File.basename(info.find("//*[@id='content-wrapper']/div[2]/div/div/div[1]/div[1]/a")['href'])
      @work_list[@row, 6] = sku.gsub(/\D/, "")#.join(([*('A'..'Z'),*('0'..'9')]-%w(0 1 I O)).sample(4).join)

      @work_list[@row, 7] = File.basename(info.find("//*[@id='content-wrapper']/div[2]/div/div/div[1]/div[1]/a")['href'])

      imagelink = info.all("//*[@rel='lightbox[rotation]']")
      @work_list[@row, 8] = imagelink.map { |link| File.basename(link['href']) }.join(', ')  

      images = imagelink.map { |link| link['href'] }
      images.each do |image|
        # Save into @LOCAL_DIR; File.basename of the joined path dropped the directory
        File.open(File.join(@LOCAL_DIR, File.basename(image)), 'wb') do |f|
          f.write(URI.open(image).read)
        end
      end
      @row = @row + 1
    end
  end
end

tomtop = Tomtop.new
tomtop.go
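Separately from the error, most XPaths in save_item start with //. In XPath a leading // is absolute: it searches from the document root even when evaluated from a node such as info, so every pass through data.each sees the same matches. Capybara's README warns about this and recommends .// for searches scoped to a node. The minimal sketch below shows the difference with REXML, the XML library bundled with Ruby, chosen only because it runs without a browser; the markup is a hypothetical, well-formed stand-in, not the real page.

```ruby
require "rexml/document"

# Hypothetical, well-formed stand-in for two product blocks on one page.
html = <<~XML
  <html>
    <body>
      <div class="product"><h1>First</h1></div>
      <div class="product"><h1>Second</h1></div>
    </body>
  </html>
XML

doc = REXML::Document.new(html)
products = REXML::XPath.match(doc, "//div[@class='product']")

# Leading // is absolute: the search starts at the document root,
# so every product div finds the same (first) h1 on the page.
absolute = products.map { |div| REXML::XPath.first(div, "//h1").text }

# Leading .// is relative to the context node, so each div finds its own h1.
scoped = products.map { |div| REXML::XPath.first(div, ".//h1").text }

p absolute
p scoped
```

The same rule applies when an XPath is passed to find, first or all on a Capybara node.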

1 answer:

Answer 0 (score: 0)

I think your problem is here:

find.first("//td[contains(@class, 'pages')]//a[img/@alt='Next Page']").click

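More specifically, the call starts with find.first(...). Here find is called with no locator at all, and first is chained onto whatever it returns. With no locator, Capybara most likely hands Selenium an empty XPath, which Selenium rejects with InvalidSelectorError. Calling first directly, without the bare find in front of it, avoids that:

first("//td[contains(@class, 'pages')]//a[img/@alt='Next Page']").click

The XPath itself is fine.

a[img/@alt='Next Page']

is valid XPath: it selects <a> elements that have an <img> child whose alt attribute is 'Next Page'. Your while condition already evaluates the same expression with page.has_selector? before this line is ever reached.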

Adjust the selector to whatever element you actually need. For example, if you are looking for the image under the <a />, use

a/img[@alt='Next Page']
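To see what the two expressions select, the minimal sketch below evaluates both with REXML, the XML library bundled with Ruby. REXML is used only because it runs without a browser (Selenium evaluates XPath in the browser itself), and the pagination markup is a hypothetical, well-formed stand-in for the real page.

```ruby
require "rexml/document"

# Hypothetical, well-formed stand-in for the pagination cell;
# the real page's markup is an assumption.
pager = <<~XML
  <table><tr>
    <td class="pages">
      <a href="?p=1"><img alt="Previous Page" src="prev.gif"/></a>
      <a href="?p=3"><img alt="Next Page" src="next.gif"/></a>
    </td>
  </tr></table>
XML

doc = REXML::Document.new(pager)

# The question's expression: <a> elements that have an <img> child
# whose alt attribute is "Next Page".
links = REXML::XPath.match(doc, "//td[contains(@class, 'pages')]//a[img/@alt='Next Page']")

# The alternative: the <img> element itself, inside an <a>.
images = REXML::XPath.match(doc, "//td[contains(@class, 'pages')]//a/img[@alt='Next Page']")

p links.map { |a| [a.name, a.attributes["href"]] }
p images.map { |img| [img.name, img.attributes["alt"]] }
```

Both expressions parse and match. The first returns the link itself and the second returns the image inside it, so the choice depends on which element you want to click.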

One more suggestion: consider using CSS selectors. They are often faster and more readable.
