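Mechanize not submitting form

Time: 2013-07-24 14:12:43

Tags: ruby web-scraping mechanize

I am trying to log in to a website with Mechanize. I fill in all the form fields and submit, but every time I end up on the same page, so either I am being redirected back (with no error message) or the form is not being submitted. Why is that?

Code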

require 'mechanize'
class Scraper

  def initialize
    @a = Mechanize.new { |agent|
      agent.follow_meta_refresh = true
    }
  end

  def login


    @a.get("https://login.salesforce.com/") do |page|
      red = page.forms.first do |form|
        form.field_with(:type => "email").value = username
        form.field_with(:type => "password").value  =  password
      end.click_button
      pp red
      #puts main.title
    end
  end
end
s = Scraper.new
s.login

Initial page:

   #<Mechanize::Page
   {url #<URI::HTTPS:0x007f8f39d6fb30 URL:https://login.salesforce.com/>}
   {meta_refresh}
   {title "salesforce.com - Customer Secure Login Page"}
   {iframes
    #<Mechanize::Page::Frame
     "marketing"
     "https://www.salesforce.com/login-messages/messages.html?noroundedcorner">}
   {frames}
   {links
    #<Mechanize::Page::Link "Salesforce" "http://www.salesforce.com">
    #<Mechanize::Page::Link
     "Forgot your password?"
     "/secur/forgotpassword.jsp?locale=us">
    #<Mechanize::Page::Link
     "Sign up for free."
     "https://www.salesforce.com/form/trial/freetrial.jsp?d=70130000000Enus">}
   {forms
    #<Mechanize::Form
     {name "login"}
     {method "POST"}
     {action "https://login.salesforce.com/"}
     {fields
      [hidden:0x3fc79cec63ac type: hidden name: un value: ]
      [hidden:0x3fc79cec6244 type: hidden name: width value: ]
      [hidden:0x3fc79cec60c8 type: hidden name: height value: ]
      [hidden:0x3fc79cec5efc type: hidden name: hasRememberUn value: true]
      [hidden:0x3fc79cec5d58 type: hidden name: startURL value: ]
      [hidden:0x3fc79cec5bc8 type: hidden name: loginURL value: ]
      [hidden:0x3fc79cec5a38 type: hidden name: loginType value: ]
      [hidden:0x3fc79cec987c type: hidden name: useSecure value: true]
      [hidden:0x3fc79cec969c type: hidden name: local value: ]
      [hidden:0x3fc79cec9520 type: hidden name: lt value: standard]
      [hidden:0x3fc79cec9340 type: hidden name: qs value: ]
      [hidden:0x3fc79cec9174 type: hidden name: locale value: ]
      [hidden:0x3fc79cec8f80 type: hidden name: oauth_token value: ]
      [hidden:0x3fc79cec8db4 type: hidden name: oauth_callback value: ]
      [hidden:0x3fc79cec8be8 type: hidden name: login value: ]
      [hidden:0x3fc79cec89cc type: hidden name: serverid value: ]
      [hidden:0x3fc79cec8814 type: hidden name: display value: page]
      [field:0x3fc79cec8670 type: email name: username value: ]
      [field:0x3fc79cec84e0 type: password name: pw value: ]}
     {radiobuttons}
     {checkboxes
      [checkbox:0x3fc79cec833c type: checkbox name: rememberUn value: ]}
     {file_uploads}
     {buttons [button:0x3fc79cecac2c type:  name: Login value: ]}>}>

Final page:

   #<Mechanize::Page
   {url #<URI::HTTPS:0x007f9d1d250960 URL:https://login.salesforce.com/>}
   {meta_refresh}
   {title "salesforce.com - Customer Secure Login Page"}
   {iframes
    #<Mechanize::Page::Frame
     "marketing"
     "https://www.salesforce.com/login-messages/messages.html?r=https%3A%2F%2Flogin.salesforce.com%2F&noroundedcorner">}
   {frames}
   {links
    #<Mechanize::Page::Link "Salesforce" "http://www.salesforce.com">
    #<Mechanize::Page::Link
     "Forgot your password?"
     "/secur/forgotpassword.jsp?locale=us">
    #<Mechanize::Page::Link
     "Sign up for free."
     "https://www.salesforce.com/form/trial/freetrial.jsp?d=70130000000Enus">}
   {forms
    #<Mechanize::Form
     {name "login"}
     {method "POST"}
     {action "https://login.salesforce.com/"}
     {fields
      [hidden:0x3fce8e93aaac type: hidden name: un value: ]
      [hidden:0x3fce8e93a8a4 type: hidden name: width value: ]
      [hidden:0x3fce8e93a638 type: hidden name: height value: ]
      [hidden:0x3fce8e93a390 type: hidden name: hasRememberUn value: true]
      [hidden:0x3fce8e93a19c type: hidden name: startURL value: null]
      [hidden:0x3fce8e939f58 type: hidden name: loginURL value: null]
      [hidden:0x3fce8e939cc4 type: hidden name: loginType value: ]
      [hidden:0x3fce8e9399a4 type: hidden name: useSecure value: true]
      [hidden:0x3fce8e93979c type: hidden name: local value: ]
      [hidden:0x3fce8e939648 type: hidden name: lt value: standard]
      [hidden:0x3fce8e93d414 type: hidden name: qs value: r=https%3A%2F%2Flogin.salesforce.com%2F]
      [hidden:0x3fce8e93d284 type: hidden name: locale value: ]
      [hidden:0x3fce8e93d0cc type: hidden name: oauth_token value: ]
      [hidden:0x3fce8e93cf50 type: hidden name: oauth_callback value: ]
      [hidden:0x3fce8e93cd98 type: hidden name: login value: ]
      [hidden:0x3fce8e93cc44 type: hidden name: serverid value: ]
      [hidden:0x3fce8e93cab4 type: hidden name: display value: page]
      [field:0x3fce8e93c780 type: email name: username value: ]
      [field:0x3fce8e93c4c4 type: password name: pw value: ]}
     {radiobuttons}
     {checkboxes
      [checkbox:0x3fce8e93c334 type: checkbox name: rememberUn value: ]}
     {file_uploads}
     {buttons [button:0x3fce8e93b9d4 type:  name: Login value: ]}>}>

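What is wrong with my code?

2 Answers:

Answer 0: (score: 1)

The site uses JavaScript to handle the login, which Mechanize cannot process. You can use something like Selenium to access the site.

Answer 1: (score: 0)

The form has hidden fields un, width, and height. In addition to the username field, which holds the user name, these hidden fields also need to be filled in: with the user name and some numbers.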

  [hidden:0x3fce8e93aaac type: hidden name: un value: ]
  [hidden:0x3fce8e93a8a4 type: hidden name: width value: ]
  [hidden:0x3fce8e93a638 type: hidden name: height value: ]

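You can use the Chrome inspector, under the "Network" tab (with the "Preserve log" option enabled), to monitor what is actually sent to the server after JavaScript has modified the form.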
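
A minimal sketch of that idea, in case it helps: it sets the hidden fields by hand before submitting. The helper names `fill_login_form` and `login` are hypothetical, and the values used for `un`, `width`, and `height` (the user name and a screen size) are assumptions; check the real request in the Network tab and copy what the browser actually sends. Note also that in the question's code the `do |form| ... end` block is passed to `Array#first`, which ignores blocks, so the field assignments there likely never run. The sketch therefore assigns the fields directly on the form object. Whether this is enough to log in to this particular site is not verified.

```ruby
# Fills the visible and hidden login fields on a form-like object.
# Works with anything that supports form['name'] = value, which
# Mechanize::Form does.
# ASSUMPTION: `un` mirrors the user name and `width`/`height` are screen
# dimensions; confirm the real values in the browser's Network tab.
def fill_login_form(form, username, password, width: 1280, height: 800)
  form['username'] = username
  form['pw']       = password
  form['un']       = username
  form['width']    = width.to_s
  form['height']   = height.to_s
  form
end

# Hypothetical helper: submits the login form and returns the resulting page.
def login(username, password)
  require 'mechanize' # loaded lazily so fill_login_form works without the gem
  agent = Mechanize.new { |a| a.follow_meta_refresh = true }
  page  = agent.get('https://login.salesforce.com/')
  form  = page.form_with(name: 'login')
  fill_login_form(form, username, password)
  form.click_button # submits using the form's first button
end
```

Calling `pp login('user@example.com', 'secret')` would then print the page returned after submission, which can be compared with the dumps in the question.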