Sidekiq job doesn't seem to be parsing

Date: 2016-05-25 13:57:02

Tags: ruby-on-rails ruby xml nokogiri sidekiq

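I'm trying to mimic what a previous developer did to parse XML files in my Rails app, and I'm stuck. As far as I can tell, my job completes, but nothing gets posted, so I'm guessing my parser file is incorrect (yet when I test it on localhost against the raw file, it works fine). So where am I going wrong?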

Here's the Sidekiq log output, just to confirm that the job runs and that no errors show up during processing:

2016-05-25T13:51:04.499Z 8977 TID-oxs3s9lng ParseTestData JID-2a01971539c887cac3bf3374:1 INFO: start
2016-05-25T13:51:04.781Z 8977 TID-oxs3s9l3g GenerateNotifications JID-2a01971539c887cac3bf3374:2 INFO: start
2016-05-25T13:51:04.797Z 8977 TID-oxs3s9lng ParseTestData JID-2a01971539c887cac3bf3374:1 INFO: done: 0.297 sec
2016-05-25T13:51:04.824Z 8977 TID-oxs3s9l3g GenerateNotifications JID-2a01971539c887cac3bf3374:2 INFO: done: 0.043 sec

Here's my Sidekiq job, which iterates over the compressed tarball submitted through my API. The relevant file I'm working with is nmap_poodle_scan.xml:

class ParseTestData
  include Sidekiq::Worker

  # Order matters. Parse network hosts first to ensure we uniquely identify network hosts by their mac address.
  PARSERS = {
    "network_hosts.xml" => Parsers::NetworkHostParser,
    "nmap_tcp_service_scan.xml" => Parsers::TcpServiceScanParser,
    "nmap_shellshock_scan.xml" => Parsers::ShellshockScanParser,
    "hydra.out" => Parsers::HydraParser,
    "events.log" => Parsers::EventParser,
    "nmap_poodle_scan.xml" => Parsers::PoodleScanParser
  }

  def perform(test_id)
    test = Test.find(test_id)

    gzip = if Rails.env.development?
      Zlib::GzipReader.open(test.data.path)
    else
      file = Net::HTTP.get(URI.parse(test.data.url))
      Zlib::GzipReader.new(StringIO.new(file))
    end

    # Collect entries from tarball
    entries = {}
    tar_extract = Gem::Package::TarReader.new(gzip)
    tar_extract.rewind
    tar_extract.each do |entry|
      entries[File.basename(entry.full_name)] = entry.read
    end

    # Preserve parse order by using the parser hash to initiate parser executions.
    PARSERS.each_pair do |filename, parser|
      next unless entry = entries[filename]
      parser.run!(test, entry)
    end
  end
end
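Because the line next unless entry = entries[filename] silently skips any expected file that is missing from the tarball, a job like this can finish quickly without calling a single parser. A minimal sketch of the same "collect entries" step as a standalone helper, using only the standard library, makes it easy to print which expected files were actually found. The helpers collect_entries and missing_files are hypothetical, and the in-memory tarball only stands in for the uploaded test data:

```ruby
require 'zlib'
require 'stringio'
require 'rubygems/package'

# Hypothetical helper: read a .tar.gz blob and return { basename => contents },
# the same shape as the `entries` hash built in ParseTestData#perform.
def collect_entries(gz_bytes)
  entries = {}
  Zlib::GzipReader.wrap(StringIO.new(gz_bytes)) do |gzip|
    Gem::Package::TarReader.new(gzip).each do |entry|
      next unless entry.file? # skip directory entries
      entries[File.basename(entry.full_name)] = entry.read.to_s
    end
  end
  entries
end

# Hypothetical helper: which expected filenames never made it into the archive?
def missing_files(entries, expected_names)
  expected_names.reject { |name| entries.key?(name) }
end

# Usage: build a small tarball in memory as a stand-in for the uploaded data.
tar_io = StringIO.new("".b)
Gem::Package::TarWriter.new(tar_io) do |tar|
  { "results/network_hosts.xml" => "<hosts/>", "results/events.log" => "boot\n" }.each do |name, body|
    tar.add_file_simple(name, 0o644, body.bytesize) { |io| io.write(body) }
  end
end
entries = collect_entries(Zlib.gzip(tar_io.string))

expected = ["network_hosts.xml", "nmap_poodle_scan.xml", "events.log"]
puts "found:   #{entries.keys.sort.inspect}"
puts "missing: #{missing_files(entries, expected).inspect}"
```

In the job itself, the same information could be logged just before the PARSERS.each_pair loop, for example by comparing entries.keys with PARSERS.keys.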

A snippet of nmap_poodle_scan.xml:

<host starttime="1464180941" endtime="1464180941"><status state="up" reason="arp-response" reason_ttl="0"/>
<address addr="10.10.10.1" addrtype="ipv4"/>
<address addr="4C:E6:76:3F:2F:77" addrtype="mac" vendor="Buffalo.inc"/>
<hostnames>
<hostname name="DD-WRT" type="PTR"/>
</hostnames>
Nmap scan report for DD-WRT (10.10.10.1)
<ports><extraports state="closed" count="996">
<extrareasons reason="resets" count="996"/>
</extraports>
<table key="CVE-2014-3566">
<elem key="title">SSL POODLE information leak</elem>
<elem key="state">VULNERABLE</elem>
<table key="ids">
<elem>OSVDB:113251</elem>
<elem>CVE:CVE-2014-3566</elem>
</table>
<table key="description">
<elem>    The SSL protocol 3.0, as used in OpenSSL through 1.0.1i and&#xa;    other products, uses nondeterministic CBC padding, which makes it easier&#xa;    for man-in-the-middle attackers to obtain cleartext data via a&#xa;    padding-oracle attack, aka the &quot;POODLE&quot; issue.</elem>
</table>
<table key="dates">
<table key="disclosure">
<elem key="year">2014</elem>
<elem key="month">10</elem>
<elem key="day">14</elem>
</table>
</table>
<elem key="disclosure">2014-10-14</elem>
<table key="check_results">
<elem>TLS_RSA_WITH_3DES_EDE_CBC_SHA</elem>
</table>
<table key="refs">
<elem>https://www.imperialviolet.org/2014/10/14/poodle.html</elem>
<elem>http://osvdb.org/113251</elem>
<elem>https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-3566</elem>
<elem>https://www.openssl.org/~bodo/ssl-poodle.pdf</elem>
</table>
</table>
</script></port>
</ports>
<times srtt="4665" rttvar="556" to="100000"/>
</host>

That file should be handed to PoodleScanParser:

module Parsers
  class PoodleScanParser < NmapScanParser
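    def self.run!(test, content)
      # Keep the predicate on <host> so each match is a <host> element with <address> children.
      super(test, content, "//host[.//ports//elem[@key='state'][contains(text(), 'VULNERABLE')]]") do |host, network_host_test|
        # A plain class has no `logger` method, so use the Rails logger explicitly.
        Rails.logger.info "Something cool"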
        IssueFinder.match(cve_id: "CVE-2014-3566").each do |issue|
          Result.generate!(network_host_test.id, issue.id)
        end
      end
    end
  end
end
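Where the predicate sits in an XPath decides which element gets handed to the block. In an expression of the form //host//ports[...], the predicate belongs to the ports step, so each match is a <ports> element; in //host[.//ports//...] it belongs to the host step, so each match is a <host> element, which is what NmapScanParser.run! relies on when it reads the address children of each match. A small sketch of the difference, using Ruby's bundled REXML library instead of Nokogiri only so it runs without extra gems (XPath 1.0 location paths select the same nodes in both), on a trimmed, hypothetical document with the VULNERABLE text test left out for brevity:

```ruby
require 'rexml/document'

# Trimmed, hypothetical stand-in for nmap_poodle_scan.xml.
SAMPLE = <<~XML
  <nmaprun>
    <host>
      <address addr="10.10.10.1" addrtype="ipv4"/>
      <ports>
        <port><script><table key="CVE-2014-3566">
          <elem key="state">VULNERABLE</elem>
        </table></script></port>
      </ports>
    </host>
  </nmaprun>
XML

doc = REXML::Document.new(SAMPLE)

# Predicate on the `ports` step: the matches are <ports> elements.
ports_matches = REXML::XPath.match(doc, "//host//ports[.//elem[@key='state']]")
# Predicate on the `host` step: the matches are <host> elements.
host_matches = REXML::XPath.match(doc, "//host[.//ports//elem[@key='state']]")

puts ports_matches.map(&:name).inspect
puts host_matches.map(&:name).inspect

# Only a <host> match has an <address> child to read the IP address from.
puts REXML::XPath.first(ports_matches.first, "address[@addrtype='ipv4']").inspect
puts REXML::XPath.first(host_matches.first, "address[@addrtype='ipv4']").attributes["addr"]
```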

This inherits from NmapScanParser. That parser file has been confirmed to work correctly, so I know it isn't the problem:

module Parsers
  class NmapScanParser

    def self.run!(test, content, xpath)
      document = Nokogiri::XML(content)
      document.remove_namespaces!

      document.xpath(xpath).each do |host|
        ip_address = host.at_xpath("address[@addrtype='ipv4']").at_xpath("@addr").value
        vendor = host.at_xpath("address[@addrtype='mac']").at_xpath("@vendor").value rescue "Unknown"
        hostname = host.at_xpath("hostnames/hostname").at_xpath("@name").value rescue "Hostname Unknown"
        os = host.at_xpath("os/osmatch").at_xpath("@name").value rescue "Unknown"
        os_vendor = host.at_xpath("os/osmatch/osclass").at_xpath("@vendor").value rescue "Unknown"

        network_host_test = NetworkHostTest.generate!(test, ip_address: ip_address, hostname: hostname, vendor: vendor, os: os, os_vendor: os_vendor)

        # If we didn't find a network host, that's because our network_hosts file didn't have this entry.
        next unless network_host_test

        yield(host, network_host_test)
      end
    end

  end
end
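The rescue "Unknown" modifiers in NmapScanParser work, but a modifier rescue also swallows any other StandardError raised on that line, which can hide real problems. One hedged alternative is to let a missing node or attribute fall through to a default with safe navigation (&.). The sketch uses REXML so it runs on a stock Ruby; the helper attr_or is hypothetical, and with Nokogiri the equivalent would chain at_xpath(...)&.[]("name"):

```ruby
require 'rexml/document'

# Hypothetical helper: value of attribute `name` on the first node matching
# `xpath` under `node`, or `default` when the node or the attribute is absent.
def attr_or(node, xpath, name, default)
  match = REXML::XPath.first(node, xpath)
  match&.attributes&.[](name) || default
end

host = REXML::Document.new(<<~XML).root
  <host>
    <address addr="10.10.10.1" addrtype="ipv4"/>
    <address addr="4C:E6:76:3F:2F:77" addrtype="mac" vendor="Buffalo.inc"/>
  </host>
XML

ip_address = attr_or(host, "address[@addrtype='ipv4']", "addr", nil) # nil if absent; caller decides
vendor     = attr_or(host, "address[@addrtype='mac']", "vendor", "Unknown")
hostname   = attr_or(host, "hostnames/hostname", "name", "Hostname Unknown") # no <hostnames> here
os         = attr_or(host, "os/osmatch", "name", "Unknown")                   # no <os> here

puts [ip_address, vendor, hostname, os].inspect
```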

I've confirmed on my localhost that the parsing logic works, using a plain Ruby file against the same raw output as above and running ruby poodle_parser.rb:

require 'nokogiri'

document = Nokogiri::XML(File.read("poodle_results.xml"))
document.remove_namespaces!

document.xpath("//host[.//elem/@key='state']").each do |host|
  ip_address = host.at_xpath("address[@addrtype='ipv4']").at_xpath("@addr").value
  # ".//" keeps the search inside the current <host>; a bare "//" would search the whole document.
  result = host.at_xpath(".//ports//elem[@key='state']").content
  puts "#{ip_address} #{result}"
end
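A related XPath detail for scripts that loop over hosts: a path that starts with // searches from the document root even when it is evaluated on an element, while .// stays below that element. With a single host the two give the same answer; with several hosts, the absolute form reports the first match in the whole file for every host. A sketch of this with REXML (XPath 1.0 treats a leading // the same way in Nokogiri) on a hypothetical two-host document:

```ruby
require 'rexml/document'

# Hypothetical two-host document: only the second host is vulnerable.
doc = REXML::Document.new(<<~XML)
  <nmaprun>
    <host><address addr="10.10.10.1" addrtype="ipv4"/>
      <ports><elem key="state">NOT VULNERABLE</elem></ports></host>
    <host><address addr="10.10.10.2" addrtype="ipv4"/>
      <ports><elem key="state">VULNERABLE</elem></ports></host>
  </nmaprun>
XML

absolute = []
relative = []
REXML::XPath.each(doc, "//host") do |host|
  ip = REXML::XPath.first(host, "address[@addrtype='ipv4']").attributes["addr"]
  # Leading "//" ignores the context node and searches the whole document.
  absolute << "#{ip} #{REXML::XPath.first(host, "//ports//elem[@key='state']").text}"
  # Leading ".//" searches only below the current <host>.
  relative << "#{ip} #{REXML::XPath.first(host, ".//ports//elem[@key='state']").text}"
end

puts absolute
puts relative
```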

This prints what I expect to the terminal:

10.10.10.1 VULNERABLE

So, in the end, I expect a Result to be generated, but none is. I don't see any errors in the Rails log on localhost, and nothing in the Sidekiq log points to an error either!

I added a logger.info line to PoodleScanParser to check whether the parser runs at all. Assuming I did that correctly, the parser doesn't appear to be running.

1 answer:

Answer 0 (score: 0)

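Well, the answer had nothing to do with Sidekiq; Nokogiri was choking on the output. It turns out Nmap had added a non-XML line, "Starting Nmap 7.12", at the beginning of the XML file, so Nokogiri simply died there. It died quietly: by default Nokogiri parses in recover mode, so instead of raising it returned a document without a root element (the problem is only recorded in document.errors), the XPath matched nothing, and the parser block never ran. That is why neither the Rails log nor the Sidekiq log showed an error.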
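A minimal sketch of a defensive workaround, assuming the only problem is plain text (such as the "Starting Nmap 7.12" line) in front of the XML: cut everything before the XML declaration, or before the first line that starts with "<", and pass the rest to the parser. strip_preamble is a hypothetical helper; it could, for instance, be applied to content at the top of NmapScanParser.run!.

```ruby
# Hypothetical helper: drop any non-XML text that precedes the XML document,
# e.g. a "Starting Nmap 7.12" banner line written before the XML.
# Assumes the document begins at "<?xml" or, failing that, at the first line
# whose first non-blank character is "<".
def strip_preamble(content)
  start = content.index("<?xml") || content.index(/^\s*</)
  raise ArgumentError, "no XML found in content" if start.nil?

  content[start..-1]
end

raw = "Starting Nmap 7.12\n<?xml version=\"1.0\"?>\n<nmaprun><host/></nmaprun>\n"
cleaned = strip_preamble(raw)
puts cleaned.lines.first
```

Making sure only XML is written to the file in the first place is the cleaner fix; stripping is a fallback for files that have already been produced.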

I guess the moral of the story is: make sure your XML output is what Nokogiri expects it to be!
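One way to act on that inside a background job is to fail loudly when the content does not look like XML, so the exception shows up in the Sidekiq log and the job goes through Sidekiq's normal retry handling instead of finishing silently. A minimal sketch, assuming a first-character check is good enough for these Nmap files; ensure_xml! and NotXMLError are hypothetical names:

```ruby
# Hypothetical error type for content that is not an XML document.
class NotXMLError < StandardError; end

# Hypothetical guard: return the content unchanged if, after leading whitespace,
# it starts with "<"; otherwise raise with the offending first line.
def ensure_xml!(content, name: "content")
  head = content.to_s.lstrip
  return content if head.start_with?("<")

  first_line = head.lines.first.to_s.chomp
  raise NotXMLError, "#{name} does not start with XML: #{first_line[0, 60].inspect}"
end

begin
  ensure_xml!("Starting Nmap 7.12\n<nmaprun/>", name: "nmap_poodle_scan.xml")
rescue NotXMLError => e
  puts e.message
end
```

With Nokogiri itself, a comparable effect comes from parsing in strict mode, Nokogiri::XML(content) { |config| config.strict }, which raises Nokogiri::XML::SyntaxError instead of recovering; checking document.errors after a normal parse is another option.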