Attribute doesn't update when run in the background, but does when run inline

Time: 2016-04-28 00:38:28

Tags: ruby-on-rails redis mechanize sidekiq

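I'm writing a screen scraper that takes a list of URLs from a post, then visits those URLs and collects every link on each page. It then visits all of the links (both the original ones and those found by scraping) and collects a list of images. When I run the job inline, everything works (except that it takes 30 seconds to finish, which is a problem because the API call takes forever to respond). For some reason, when I run the same code in a background worker, two URLs never get updated to complete. It's always the same two URLs.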

Even stranger, I get this error message:

3 TID-ov9t89ido WARN: NoMethodError: undefined method `search' for #<Mechanize::File:0x007f9d86e77a40>

3 TID-ov9t89ido WARN: /app/app/models/scraper.rb:16:in `scrape_images'
/app/app/workers/image_worker.rb:5:in `perform'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/processor.rb:151:in `execute_job'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/processor.rb:133:in `block (2 levels) in process'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/middleware/chain.rb:127:in `block in invoke'
/app/vendor/bundle/ruby/2.2.0/gems/newrelic_rpm-3.12.1.298/lib/new_relic/agent/instrumentation/sidekiq.rb:33:in `block in call'
/app/vendor/bundle/ruby/2.2.0/gems/newrelic_rpm-3.12.1.298/lib/new_relic/agent/instrumentation/controller_instrumentation.rb:361:in `perform_action_with_newrelic_trace'
/app/vendor/bundle/ruby/2.2.0/gems/newrelic_rpm-3.12.1.298/lib/new_relic/agent/instrumentation/sidekiq.rb:29:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/middleware/server/active_record.rb:6:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/middleware/server/retry_jobs.rb:74:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/middleware/server/logging.rb:11:in `block in call'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/logging.rb:31:in `with_context'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/middleware/server/logging.rb:7:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/middleware/chain.rb:132:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/middleware/chain.rb:132:in `invoke'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/processor.rb:128:in `block in process'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/processor.rb:167:in `stats'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/processor.rb:127:in `process'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/processor.rb:79:in `process_one'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/processor.rb:67:in `run'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/util.rb:16:in `watchdog'
/app/vendor/bundle/ruby/2.2.0/gems/sidekiq-4.1.1/lib/sidekiq/util.rb:24:in `block in safe_thread'

It comes from this code:

  def self.scrape_images(uri)
    page = get_page(uri)
    base_url = page.uri.to_s
    images = page.search('//img') || []
    qualify_images(uri, images).push(base_url)
  end

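I've read that Mechanize isn't thread-safe, and I thought that might be my problem, but I don't understand why it would give me this error when it works for everything else. Any help would be glorious. Thanks for reading.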

1 answer:

Answer 0: (score: 0)

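I'm adding an answer because I didn't find one on SO when I searched. If Mechanize visits a page with a content type of .txt, it doesn't return a Page object; it returns a File object instead, and a File has no `search` method. In my case, I solved it with a guard clause: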

return [] unless page.class == Mechanize::Page
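A duck-typed variant of the same guard might look like the sketch below. It assumes only what the error message shows: HTML pages respond to `search`, and the `Mechanize::File` returned for other content does not. `image_nodes` is a hypothetical helper name, not part of the original code or of Mechanize.

```ruby
# Hypothetical helper: returns the <img> nodes of a fetched page, or an
# empty array when the response is not a searchable HTML page.
def image_nodes(page)
  # Mechanize::File (e.g. a plain-text response) has no #search method,
  # so bail out early instead of raising NoMethodError in the worker.
  return [] unless page.respond_to?(:search)

  # For HTML pages, collect every <img> element via XPath.
  page.search('//img').to_a
end
```

Checking `respond_to?(:search)` rather than the exact class also accepts subclasses of `Mechanize::Page`. Whether that is preferable to the explicit class check depends on how strict the scraper needs to be.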