How do I get this set of TR tags?

Time: 2013-03-23 01:10:36

Tags: ruby nokogiri

Here is the link to the actual HTML:

doc = Nokogiri::HTML(URI.open('https://www.google.com/finance?q=NYSE%3AAA&fstype=iii'))

An approximate HTML snippet:

<div id="incannualdiv">
  <table id="fs-table">
    <tbody>
      <tr>..</tr>
      ...
      <tr>
        <td>Net Income</td>
        <td>100</td>
      </tr>
      <tr>..</tr>
    </tbody>
  </table>
</div>

Here is the code I'm using:

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open('https://www.google.com/finance?q=NYSE%3AAA&fstype=iii'))
div = doc.at "div[@id='incannualdiv']" # div containing the table I want
table = div.at 'table' # table containing the tbody I want
tbody = table.at 'tbody' # tbody containing the tr's I want
trs = tbody.at 'tr' # SHOULD be all tr's of that table/tbody, but it's only the first TR?

I expected that last step to give me all the <TR> tags, including the one with the <TD> I'm looking for, but it actually only gives me the first <TR>.

Here's the follow-up:

irb(main):023:0> a = nil
=> nil
irb(main):024:0> doc.css('#incannualdiv > #fs-table tr').each { |e| if e.text.include? "Net Income\n"; a = e.text; end}
=> 0
irb(main):025:0> a
=> "Net Income\n\n191.00\n611.00\n254.00\n-1,151.00\n"
irb(main):026:0> a.split"\n"
=> ["Net Income", "", "191.00", "611.00", "254.00", "-1,151.00"]
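The split result still mixes the row label, an empty string and comma-formatted numbers. A pure-Ruby sketch (no Nokogiri needed) that turns the string captured in the irb session above into floats; it assumes the first line is always the label.

```ruby
# Row text exactly as captured in the irb session above.
row_text = "Net Income\n\n191.00\n611.00\n254.00\n-1,151.00\n"

# The first entry is the row label; the rest are the column cells.
label, *cells = row_text.split("\n")

# Drop blank entries (from consecutive newlines) and strip thousands separators.
values = cells.reject { |s| s.strip.empty? }.map { |s| s.delete(',').to_f }

puts label # prints Net Income
p values   # prints [191.0, 611.0, 254.0, -1151.0]
```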

1 answer:

Answer 0 (score: 2)

This should do it:

doc.css('#incannualdiv > #fs-table tr')

at_css (and, I'd guess, at) returns only the first matching element, whereas css returns all matches.

Edit: answering the OP's comment

You can get the text of a tr with its text method; that text actually comes from its child td elements.

trs = doc.css('#incannualdiv > #fs-table tr')
# Get column labels from table headers
labels = trs.first.css('th')[1..-1].map(&:text)
net_income_tr = trs.detect { |tr|
  tr.css('td').any? { |td| td.text.strip =~ /^Net Income$/ }
}
# Skip the first td, which only holds the "Net Income" label.
# I gave up on error checking (detect returns nil if no row matches). Exercise left to reader ;)
# This gives you an array of floats for your Net Income columns;
# delete(',') strips thousands separators, so "-1,151.00" becomes -1151.0.
net_income_columns = net_income_tr.css('td')[1..-1].map { |td| td.text.delete(',').to_f }

labeled_values = net_income_columns.each_with_index.map { |value, i| { label: labels[i].strip, value: value } }
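The each_with_index / labels[i] pairing in the last line can also be written with zip. A sketch using hypothetical column labels ("Col A" and so on are placeholders, not the real headers) and the values from the irb session above:

```ruby
# Hypothetical column headers (stray whitespace included, as th text often has)
# and the Net Income values from the irb session above.
labels = ["\nCol A\n", "\nCol B\n", "\nCol C\n", "\nCol D\n"]
values = [191.0, 611.0, 254.0, -1151.0]

# zip pairs each label with the value in the same column position.
labeled_values = labels.map(&:strip).zip(values).map { |label, value| { label: label, value: value } }

# If a plain lookup table is enough, turn the pairs into a Hash instead.
by_label = labels.map(&:strip).zip(values).to_h

p by_label['Col D'] # prints -1151.0
```

zip keeps the pairing positional, exactly like labels[i]; if there are fewer values than labels, the extra labels are paired with nil.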