Saving scraped data to the database

Time: 2013-04-26 01:03:28

Tags: ruby-on-rails ruby web-scraping

Heyo,

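So I built a working scraper and added the file to my app. Now I'm trying to get the scraped information into my database. I tried to use the find_or_create method, but I keep getting the following error: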

breads_scraper.rb:49:in `block in summary': uninitialized constant Scraper::Bread (NameError)
from /Users/Cameron/.rvm/gems/ruby-1.9.3-p392/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:239:in `block in each'
from /Users/Cameron/.rvm/gems/ruby-1.9.3-p392/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:238:in `upto'
from /Users/Cameron/.rvm/gems/ruby-1.9.3-p392/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:238:in `each'
from breads_scraper.rb:24:in `map'
from breads_scraper.rb:24:in `summary'
from breads_scraper.rb:57:in `<class:Scraper>'
from breads_scraper.rb:9:in `<main>'

My code is below. My theory is that either I'm using find_or_create incorrectly, or the file doesn't know how to reach the Bread model and controller.

require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'uri'
require 'json'

url = Nokogiri::HTML(open("http://en.wikipedia.org/wiki/List_of_breads"))

class Scraper

  def initialize
    @url = "http://en.wikipedia.org/wiki/List_of_breads"
    @nodes = Nokogiri::HTML(open(@url))
  end

  def summary
    bread_data = @nodes

    breads = bread_data.css('div.mw-content-ltr table.wikitable tr')
    bread_data.search('sup').remove

    bread_hashes = breads.map { |x|
      if content = x.css('td')[0]
        name = content.text
      end
      if content = x.css('td a.image').map { |link| link['href'] }
        image = content[0]
      end
      if content = x.css('td')[2]
        type = content.text
      end
      if content = x.css('td')[3]
        country = content.text
      end
      if content = x.css('td')[4]
        description = content.text
      end

      {
        :name => name,
        :image => image,
        :type => type,
        :country => country,
        :description => description,
      }
      Bread.find_or_create(:title => name, :description => description, :image_url => image, :country_origin => country, :type => type)
    }
  end

  bready = Scraper.new
  bready.summary
  puts "atta boy"
end

Thanks!
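A side note on the `find_or_create` call itself: ActiveRecord does not define a `find_or_create` class method (that name comes from other ORMs such as Sequel). Rails 4 provides `find_or_create_by(attributes)`, and Rails 3.2 offers dynamic finders such as `find_or_create_by_title(...)` as well as `where(...).first_or_create`. Separately, ActiveRecord reserves a column named `type` for single-table inheritance, so storing the bread type under a different column name is likely safer. The sketch below only illustrates find-or-create semantics; the `Bread` class here is a hypothetical in-memory stand-in, not ActiveRecord, and the sample rows are made up.

```ruby
# Hypothetical in-memory stand-in for an ActiveRecord model; it only
# illustrates find-or-create semantics and never touches a database.
class Bread
  @records = []

  class << self
    attr_reader :records

    # Return an existing record with exactly these attributes,
    # or store and return a new one.
    def find_or_create_by(attrs)
      existing = records.find { |record| record == attrs }
      return existing if existing

      records << attrs
      attrs
    end
  end
end

# Sample rows shaped like the scraper's output (made-up values).
rows = [
  { title: "Bagel", country_origin: "Poland" },
  { title: "Bagel", country_origin: "Poland" }, # duplicate row is not re-created
  { title: "Naan", country_origin: "India" }
]

rows.each { |row| Bread.find_or_create_by(row) }
puts Bread.records.size # => 2
```

Running the scraper twice against a real table with this pattern should not duplicate rows, which is the main reason to prefer find-or-create over a plain `create` here.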

2 answers:

Answer 0 (score: 2)

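Call the scraper from a rake task. The `=> :environment` prerequisite loads your Rails app first, so models such as Bread are available when the scraper runs.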

lib/tasks/scraper.rake

  namespace :app do
    desc "Scrape breads"
    task :scrape_breads => :environment do
      Scraper.new.summary
    end
  end

Now you can run the rake task like this:

rake app:scrape_breads
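The ordering that makes this work can be seen without Rails. Below is a minimal plain-Rake sketch, assuming only the `rake` gem that ships with Ruby: the `environment` task here is a hypothetical stand-in that just defines a placeholder `Bread` constant, whereas in a Rails app the real `environment` task boots the application so that models like `Bread` can be loaded.

```ruby
require "rake"
include Rake::DSL

LOG = []

# Hypothetical stand-in for Rails' :environment task. The real one boots
# the app; this one only defines a placeholder Bread constant.
task :environment do
  LOG << :environment
  Object.const_set(:Bread, Class.new) unless Object.const_defined?(:Bread)
end

namespace :app do
  desc "Scrape breads"
  task scrape_breads: :environment do
    LOG << :scrape_breads
    # Bread resolves here because :environment has already run.
    LOG << Bread.name
  end
end

Rake::Task["app:scrape_breads"].invoke
p LOG
```

Inside the `app` namespace, Rake resolves the bare `:environment` prerequisite by looking for `app:environment` first and then falling back to the top-level `environment` task, which is why the Rails task can be referenced this way.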

Answer 1 (score: 0)

It looks like the Bread class isn't being loaded.
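A minimal plain-Ruby sketch of why the constant is missing (no Rails involved): `Bread` is looked up only when the method body runs, so the file loads fine, but the call fails with the same kind of `NameError` shown in the trace when no `Bread` class has been loaded. Once some `Bread` class exists, the identical call succeeds. The `save` method and the tiny `Bread` class are hypothetical stand-ins for the question's `summary` method and the real ActiveRecord model.

```ruby
class Scraper
  # The Bread constant is resolved only when this line runs, so the file
  # loads without complaint even though no Bread class exists yet.
  def save(attrs)
    Bread.find_or_create_by(attrs)
  end
end

message = nil
begin
  Scraper.new.save(title: "Bagel")
rescue NameError => e
  message = e.message
end
puts message # prints something like "uninitialized constant Scraper::Bread"

# Hypothetical stand-in for the model Rails would load from app/models.
class Bread
  def self.find_or_create_by(attrs)
    attrs # a real ActiveRecord model would query or insert a row here
  end
end

result = Scraper.new.save(title: "Bagel")
puts result.inspect
```

Running the original script directly with `ruby breads_scraper.rb` corresponds to the first call; running it through a rake task that depends on `:environment` corresponds to the second.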