Confused by Nokogiri::XML::Text#text output

Time: 2013-04-14 18:32:58

Tags: ruby nokogiri

I wrote the code below:

require 'nokogiri'
require 'pp'

html = <<-END
<html>

    <head>

    <title> A Dirge </title>

    <link rel     = "schema.DC"
          href    = "http://purl.org/DC/elements/1.0/">

    <meta name    = "DC.Title"
          content = "A Dirge">

    <meta name    = "DC.Creator"
          content = "Shelley, Percy Bysshe">

    <meta name    = "DC.Type"
          content = "poem">

    <meta name    = "DC.Date"
          content = "1820">

    <meta name    = "DC.Format"
          content = "text/html">

    <meta name    = "DC.Language"
          content = "en">

    </head>

    <body><pre>

            Rough wind, that moanest loud
              Grief too sad for song;
            Wild wind, when sullen cloud
              Knells all the night long;
            Sad storm, whose tears are vain,
            Bare woods, whose branches strain,
            Deep caves and dreary main, -
              Wail, for the world's wrong!

    </pre></body>

    </html>
 END

doc = Nokogiri::HTML::DocumentFragment.parse(html)
pp doc 
doc.children.each do |ch|
    p ch.text if ch.text?
end

But it outputs:

"\n\n    \n\n    "
"\n\n    "

Now my question is: why are the lines inside <pre> ... </pre> not printed?

Can anyone help me fix this?

1 answer:

Answer 0 (score: 1)

For me, the doc.children.each block prints more than that:

"\n\n    \n\n    "
"\n\n    "
"\n\n    "
"\n\n    "
"\n\n    "
"\n\n    "
"\n\n    "
"\n\n    "
"\n\n    \n\n    "
"\n\n    \n"

This is the correct output; these are the text nodes among the direct children of <html>.

I'm not sure which "lines" you expected to see and didn't. If, for example, you want the contents of <pre>, you can do

doc.xpath("pre").text

to get it. If this doesn't answer your question, you'll have to clarify what you are asking.