Question

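As part of my website, I need to use Open-URI to fetch the source of a web page. For some reason, whenever I try to grab the source of the page at http://learning.blogs.nytimes.com/2010/08/23/teaching-with-infographics-places-to-start/?_php=true&_type=blogs&_php=true&_type=blogs&_r=1, I get a NoMethodError stating "undefined method `+' for nil:NilClass". I'm not sure what is causing this. The page seems to load fine when I visit it in my web browser. Here is a snippet that can be run in the console to reproduce the error.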

require 'open-uri'
open("http://learning.blogs.nytimes.com/2010/08/23/teaching-with-infographics-places-to-start/?_php=true&_type=blogs&_php=true&_type=blogs&_r=1")

Thanks in advance!

Edit: In case anyone is interested, here is the full error message.

NoMethodError: undefined method `+' for nil:NilClass
    from /usr/lib64/ruby/2.1.0/net/http.rb:1530:in `addr_port'
    from /usr/lib64/ruby/2.1.0/net/http.rb:1463:in `begin_transport'
    from /usr/lib64/ruby/2.1.0/net/http.rb:1405:in `transport_request'
    from /usr/lib64/ruby/2.1.0/net/http.rb:1379:in `request'
    from /usr/lib64/ruby/2.1.0/open-uri.rb:343:in `block in open_http'
    from /usr/lib64/ruby/2.1.0/net/http.rb:854:in `start'
    from /usr/lib64/ruby/2.1.0/open-uri.rb:336:in `open_http'
    from /usr/lib64/ruby/2.1.0/open-uri.rb:751:in `buffer_open'
    from /usr/lib64/ruby/2.1.0/open-uri.rb:214:in `block in open_loop'
    from /usr/lib64/ruby/2.1.0/open-uri.rb:211:in `catch'
    from /usr/lib64/ruby/2.1.0/open-uri.rb:211:in `open_loop'
    from /usr/lib64/ruby/2.1.0/open-uri.rb:152:in `open_uri'
    from /usr/lib64/ruby/2.1.0/open-uri.rb:731:in `open'
    from /usr/lib64/ruby/2.1.0/open-uri.rb:34:in `open'
    from (irb):2
    from /usr/bin/irb:11:in `<main>'

So far I have started looking through the source code of the files listed above, to no avail.

Answer 1

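This isn't a problem with your code; rather, it's a case of the New York Times paywall messing up your day. The traceback you're getting lies entirely within the standard library (see how every path begins with /usr/lib64?), which is a strong indicator that your code is not at fault. Sometimes you get errors like this when you use a library incorrectly, but you've already established that your code works for other URLs. So how can we figure out what's going on?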

Ruby's open-uri module is a wrapper around the net/http module. We can learn more by using net/http directly:

require 'net/http'
uri = URI("http://learning.blogs.nytimes.com/2010/08/23/teaching-with-infographics-places-to-start/?_php=true&_type=blogs&_php=true&_type=blogs&_r=1")
response = Net::HTTP.get_response(uri)
p response # #<Net::HTTPSeeOther 303 See Other readbody=true>
p response['location'] # "http://www.nytimes.com/glogin?URI=http://learning.blogs.nytimes.com/2010/08/23/teaching-with-infographics-places-to-start/&OQ=_phpQ3DtrueQ26_typeQ3DblogsQ26_phpQ3DtrueQ26_typeQ3DblogsQ26_phpQ3DtrueQ26_typeQ3DblogsQ26_rQ3D2Q26&OP=e8954d71Q2FgyQ2BvgdMvgQ27Q27Q27gEQ2BQ2BQ51JQ23yiuPQ2BUQ2B"

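When retrieved from Ruby, the URL responds with 303 See Other and tries to redirect us to a login page. This isn't directly related to the paywall, but it's on a similar theme: the New York Times protects its content and would rather it be read by people than by computers.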
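To see where a chain of redirects leads, one option is to follow the Location headers by hand and record each hop. The following is only a sketch: `trace_redirects` and its `fetcher` parameter are hypothetical names introduced for illustration, and the hop limit of 5 is an arbitrary assumption.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of net/http): follows redirects manually
# and returns a list of [url, status_code] pairs, one per hop.
# `fetcher` is injectable so the logic can be exercised without a network.
def trace_redirects(url, limit: 5, fetcher: ->(uri) { Net::HTTP.get_response(uri) })
  hops = []
  uri = URI(url)
  limit.times do
    response = fetcher.call(uri)
    hops << [uri.to_s, response.code]
    # Stop as soon as the server answers with something other than a redirect.
    return hops unless response.is_a?(Net::HTTPRedirection)

    location = response['location']
    return hops if location.nil?

    # Resolve relative Location headers against the current URL.
    uri = URI.join(uri.to_s, location)
  end
  hops
end
```

Calling `trace_redirects(url)` with the URL from the question should list the 303 hop first, followed by whatever the login page returns.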

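Sometimes you can trick a website into serving you content by spoofing the user agent, but it seems the NYT is wise to that. I couldn't get the site to send me anything other than the 303 response, but if you're persistent you might find a way.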
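For reference, setting a custom User-Agent with net/http could look roughly like this. It is a minimal sketch: the helper names `build_request` and `fetch_with_user_agent` are hypothetical, the user agent string is whatever you choose to pass in, and, as noted above, this did not get past the redirect for this particular site.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a GET request carrying a custom User-Agent header.
def build_request(url, user_agent)
  request = Net::HTTP::Get.new(URI(url))
  request['User-Agent'] = user_agent
  request
end

# Hypothetical helper: sends the request built above and returns the response.
# It is only defined here, not called, so loading the sketch needs no network.
def fetch_with_user_agent(url, user_agent)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_request(url, user_agent))
  end
end
```

The returned response can be inspected the same way as in the earlier snippet, for example with `p response` and `p response['location']`.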

But if this web page isn't critical to your app and you just want to stop it from crashing, I'd write something like this:

require 'net/http'
uri = URI("http://learning.blogs.nytimes.com/2010/08/23/teaching-with-infographics-places-to-start/?_php=true&_type=blogs&_php=true&_type=blogs&_r=1")
response = Net::HTTP.get_response(uri)

if response.body.empty?
  # Show the user an error message
else
  # Process the contents of the webpage here, accessed via response.body
end
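A redirect response can still carry a short body, so checking the response class may be a more reliable test than checking for an empty body. A possible variant, where `page_status` is a hypothetical helper name:

```ruby
require 'net/http'

# Hypothetical helper: classifies a response by its class rather than
# by whether its body happens to be empty.
def page_status(response)
  case response
  when Net::HTTPSuccess     then :ok         # 2xx: the body holds the page
  when Net::HTTPRedirection then :redirected # 3xx: see response['location']
  else                           :failed     # anything else: show an error
  end
end
```

With this in place, only the `:ok` branch would process `response.body`; the other two would show the user an error message.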
