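I've just started learning Rails. Could you help me understand how to parse a link? Pointers to good tutorials would also help.
Question:
When you submit a link on Digg, Facebook, and so on, the site parses the link once you attach it and pulls out the title, content, and images for that specific URL. Could you help me work out how something similar can be implemented in Rails?
I've looked at feed parsers such as feedzirra, but they seem to fetch the whole site's feed, not just the single link we're interested in. Or am I making a mistake somewhere?
Thanks a lot.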
Answer 0 (score: 6)
It looks like you might be looking for something like Pismo: https://github.com/peterc/pismo
require 'pismo'
# Load a Web page (you could pass an IO object or a string with existing HTML data along, as you prefer)
doc = Pismo::Document.new('http://www.rubyinside.com/cramp-asychronous-event-driven-ruby-web-app-framework-2928.html')
doc.title # => "Cramp: Asychronous Event-Driven Ruby Web App Framework"
doc.author # => "Peter Cooper"
doc.lede # => "Cramp (GitHub repo) is a new, asynchronous evented Web app framework by Pratik Naik of 37signals (and the Rails core team). It's built around Ruby's EventMachine library and was designed to use event-driven I/O throughout - making it ideal for situations where you need to handle a large number of open connections (such as Comet systems or streaming APIs.)"
doc.keywords # => [["cramp", 7], ["controllers", 3], ["app", 3], ["basic", 2], ..., ... ]
One caveat regarding images:
Image extraction only handles images with absolute URLs.
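If you collect image sources yourself (for example from the page's img tags), relative paths can be turned into absolute URLs before further processing. The following is a minimal sketch using only Ruby's standard uri library; the helper name absolute_image_url and the example.com addresses are made up for illustration and are not part of Pismo.

```ruby
require 'uri'

# Hypothetical helper: resolve an image src against the URL of the page
# it was found on. Returns nil if either value cannot be parsed as a URI.
def absolute_image_url(page_url, src)
  URI.join(page_url, src).to_s
rescue URI::Error
  nil
end

page_url = 'http://example.com/blog/post.html'

puts absolute_image_url(page_url, 'images/a.png')                  # relative to the page's directory
puts absolute_image_url(page_url, '/img/b.png')                    # relative to the site root
puts absolute_image_url(page_url, 'http://cdn.example.com/c.png')  # already absolute, left unchanged
```

URI.join follows the usual relative-reference rules, so sources that are already absolute pass through untouched.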
Answer 1 (score: 4)
ootoovak's answer is correct, but I prefer mechanize as an alternative. With mechanize, this should work for you:
require 'mechanize'

agent = Mechanize.new                     # Create a new Mechanize agent
agent.get("http://domain.de/page.html")   # Fetch the page at the given URL
agent.page.title                          # Return the title of the fetched page
To install mechanize, add gem 'mechanize' to your Gemfile and run bundle install.
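Since the question asks for the title, content, and images of a link, here is a rough sketch of how those pieces might be collected from a page that mechanize has already fetched. The method name link_preview, the max_images parameter, and the choice of the meta description tag as the "content" are assumptions made for illustration; many pages lack that tag, in which case the description is simply nil.

```ruby
# Hypothetical helper: build a small preview hash from a fetched page.
# `page` is expected to behave like a Mechanize::Page, i.e. respond to
# #title, #at (CSS lookup) and #images.
def link_preview(page, max_images: 3)
  description = page.at('meta[name="description"]')

  {
    title: page.title,
    # nil when the page has no <meta name="description"> tag
    description: description && description['content'],
    # Each image's url is converted to a plain string
    images: page.images.first(max_images).map { |img| img.url.to_s }
  }
end

# Usage (needs the mechanize gem and network access):
#   require 'mechanize'
#   page = Mechanize.new.get('http://domain.de/page.html')
#   p link_preview(page)
```

Keeping the extraction in a method that only receives the page makes it easy to call from a Rails model or background job once the URL has been fetched.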
Answer 2 (score: 2)
> Mechanize.new.get('http://google.com').title
=> "Google"
Make sure you either require 'mechanize' or add the gem to your Gemfile.