Nokogiri: check whether a value exists

Time: 2014-12-23 01:03:03

Tags: ruby-on-rails ruby

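I am scraping some web content and I get the following error

scrape.rb:27:in `block in <main>': undefined method `text' for nil:NilClass (NoMethodError)

when running my Ruby task, because the CSS selector matches nothing on the page.

Is there a way to check whether the CSS result is undefined so that it doesn't stop the crawl? My code doesn't work :(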

products.each do |product|

     web = Nokogiri::HTML(open(product))

      counter = products.index(product)

      if web.at_css('.entry-title').text != undefined
      puts "CSS content is not undefined"
      else
      puts "Error"
      end

2 Answers:

Answer 0 (score: 3)

You can test the result object with an if before calling text on it:
result = web.at_css('.entry-title')
if result
  puts "CSS content is not undefined"
  puts result.text
else
  puts "Error"
end

Answer 1 (score: 0)

I agree that at_css plus an if is the best way to test whether a class exists. Here is an example I whipped up:

require 'nokogiri'
require 'open-uri'

user_agents = ["Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0",
            "Mozilla/5.0 (compatible; Konqueror/3; Linux)",             
            "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0",
            "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4a) Gecko/20030401",
            "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; de-at) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.125 Safari/537.36",
            "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
            "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
            "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
            "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
            "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)",
            "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko", 
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586",
            "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6",
            "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0",
            "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B5110e Safari/601.1",
            "Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1",
            "Mozilla/5.0 (Linux; Android 5.1.1; Nexus 7 Build/LMY47V) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.76 Safari/537.36"]
user_agent = user_agents.sample

good_2_go = "https://gomovies.to/genre/action/1"
my_bad = "https://gomovies.to/genre/action/100"

crawls = []
crawls.push(good_2_go, my_bad)

crawls.each do |crawl|
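  # URI.open comes from open-uri (Ruby 2.5+); a bare open(url) no longer fetches URLs as of Ruby 3.0
  doc = Nokogiri::HTML(URI.open(crawl, 'User-Agent' => user_agent).read, nil, 'utf-8')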

  entries = doc.at_css('.ml-item')

  if entries
      puts crawl
      puts "Found entries class, proceeding with scrape.."
  else
      puts crawl
      puts "Could not find base class for entries"
  end
end

This would be the STDOUT output:

=> https://gomovies.to/genre/action/1
   Found entries class, proceeding with scrape..
   https://gomovies.to/genre/action/100
   Could not find base class for entries