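I'm getting an error in a screen scraper I've been building with Capybara and the Selenium web driver. The exact error is: The given selector # is invalid or does not result in a WebElement. The following error occurred: (Selenium::WebDriver::Error::InvalidSelectorError). It relates to the selector that is supposed to find the next-page link and click it. Judging by the page source it is a valid XPath selector, but Capybara and Selenium disagree.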
require "capybara/dsl"
require "spreadsheet"
require "fileutils"
require "open-uri"
Capybara.run_server = false
Capybara.default_driver = :selenium
Capybara.default_selector = :xpath
Spreadsheet.client_encoding = 'UTF-8'
class Tomtop
  include Capybara::DSL

  def initialize
    @LOCAL_DIR = "data-hold/images"
    @excel = Spreadsheet::Workbook.new
    @work_list = @excel.create_worksheet
    @row = 0
    # File.exists? was removed in Ruby 3.2; File.exist? works on old and new versions.
    FileUtils.makedirs(@LOCAL_DIR) unless File.exist?(@LOCAL_DIR)
  end

  def go
    visit_main_link
  end
  # Runs the block, retrying up to :tries times when the :on exception is raised.
  def retryable(options = {})
    opts = { :tries => 1, :on => Exception }.merge(options)
    retry_exception, retries = opts[:on], opts[:tries]
    begin
      yield
    rescue retry_exception
      retry if (retries -= 1) > 0
      # Out of attempts: re-raise instead of running the block one more time unprotected.
      raise
    end
  end
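  def visit_main_link
    next_page = "//td[contains(@class, 'pages')]//a[img/@alt='Next Page']"
    @links = [] # was never initialized, so @links << ... would fail on nil
    visit "http://www.example.com/clothing-accessories?dir=asc&limit=72&order=position"
    loop do
      # Collect the product links on the current listing page, including the first one.
      all("//h5/a[contains(@onclick, 'analyticsLog')]").each { |a| @links << a[:href] }
      break unless page.has_selector?(next_page)
      # The original find.first(...) called find with no locator, the likely source
      # of InvalidSelectorError; first(...) takes the selector directly.
      first(next_page).click
    end
    # Visit product pages only after pagination is done, so the listing page is not lost.
    @links.each do |link|
      retryable(:tries => 1, :on => OpenURI::HTTPError) do
        visit link
        save_item
      end
    end
    # Note: the spreadsheet gem writes Excel's binary format whatever the extension.
    @excel.write "inventory.csv"
  end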
  def save_item
    data = all("//*[@id='content-wrapper']/div[2]/div/div")
    data.each do |info|
      @work_list[@row, 0] = info.find("//*[@id='productright']/div/div[1]/h1").text
      price = info.first("//div[contains(@class, 'price font left')]")
      @work_list[@row, 1] = (price.text.to_f * 1.33).round(2) if price
      @work_list[@row, 2] = info.find("//*[@id='productright']/div/div[11]").text
      @work_list[@row, 3] = info.find("//*[@id='tabcontent1']/div/div").text.strip
      color = info.all("//dd[1]//select[contains(@name, 'options')]//*[@price='0']")
      @work_list[@row, 4] = color.collect(&:text).join(', ')
      size = info.all("//dd[2]//select[contains(@name, 'options')]//*[@price='0']")
      @work_list[@row, 5] = size.collect(&:text).join(', ')
      # Look the product link up once and reuse it for both columns.
      href = info.find("//*[@id='content-wrapper']/div[2]/div/div/div[1]/div[1]/a")['href']
      # gsub rather than gsub!, because gsub! returns nil when nothing is replaced.
      @work_list[@row, 6] = File.basename(href).gsub(/\D/, "")
      @work_list[@row, 7] = File.basename(href)
      imagelink = info.all("//*[@rel='lightbox[rotation]']")
      @work_list[@row, 8] = imagelink.map { |link| File.basename(link['href']) }.join(', ')
      images = imagelink.map { |link| link['href'] }
      images.each do |image|
        # Save into @LOCAL_DIR; the original File.basename call dropped the directory.
        # 'wb' keeps the image bytes intact on every platform.
        File.open(File.join(@LOCAL_DIR, File.basename(image)), 'wb') do |f|
          # Kernel#open no longer opens URLs on Ruby 3.0+; URI.open is the open-uri entry point.
          f.write(URI.open(image).read)
        end
      end
      @row = @row + 1
    end
  end
end
tomtop = Tomtop.new
tomtop.go
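For what it's worth, the retry logic can be exercised on its own, without Capybara or a browser. Below is a minimal sketch, assuming keyword arguments instead of the options hash and an attempt counter that counts upward; this standalone `retryable` is a hypothetical variant for illustration, not the code from the scraper, and the `IOError` in the usage part is just a stand-in for a flaky network call.

```ruby
# Hypothetical standalone variant of the retry helper (illustration only).
# Runs the block up to `tries` times, retrying only when `on` is raised.
def retryable(tries: 1, on: StandardError)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue on
    retry if attempts < tries
    raise # attempts exhausted: let the last error propagate
  end
end

# Usage: a block that fails twice and succeeds on the third attempt.
calls = 0
result = retryable(tries: 3, on: IOError) do
  calls += 1
  raise IOError, "flaky" if calls < 3
  :ok
end

puts "result=#{result.inspect} after #{calls} calls"
```

Re-raising once the attempts run out means a persistent failure surfaces as an exception rather than as one extra, unprotected run of the block.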
Answer 0 (score: 0)
I think your problem is here:
find.first("//td[contains(@class, 'pages')]//a[img/@alt='Next Page']").click
More specifically:
a[img/@alt='Next Page']
I have never seen this notation in XPath before, so my guess is that it is invalid. To me, it reads as "find":
<a img='Next Page' alt='Next Page' />
However, since you are matching an attribute with @, I'm sure this is not the correct notation.
Fix your selector so that it matches what you actually need. For example, if you are looking for the image under <a />, you should use
a/img[@alt='Next Page']
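For comparison, here is a small sketch of what the two expressions select, run against a made-up pager fragment with REXML (shipped with standard Ruby installs) rather than Selenium. The markup and the `/page/2` URL are assumptions, not the real page. In XPath 1.0 a predicate may contain a relative path, so `a[img/@alt='Next Page']` returns the `a` that has such an `img` child, while `a/img[@alt='Next Page']` returns the `img` itself:

```ruby
require "rexml/document"

# Made-up pager markup; the real page structure is an assumption.
html = <<~XML
  <table><tr><td class="pages">
    <a href="/page/1">1</a>
    <a href="/page/2"><img alt="Next Page" src="next.gif"/></a>
  </td></tr></table>
XML
doc = REXML::Document.new(html)

# Predicate form: keeps the <a> elements that have a matching <img> child.
links = REXML::XPath.match(doc, "//a[img/@alt='Next Page']")

# Path form: steps into the <a> and returns the <img> itself.
images = REXML::XPath.match(doc, "//a/img[@alt='Next Page']")

puts links.map(&:name).inspect
puts links.first.attributes["href"]
puts images.map(&:name).inspect
```

In a browser, clicking either node should follow the link, so the choice mostly affects which element you get back to read attributes from.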
One more suggestion: use CSS selectors. They are faster and more readable.