Scraping with Python when the site redirects to a new session

Time: 2017-12-11 18:27:34

Tags: python session web python-requests

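I am trying to get data from a website that requires you to be at a certain location or to be logged in. My problem is that it seems to redirect to a new session that I cannot access programmatically with Python. Here is how I am trying to access it...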

import requests

# Login form fields (placeholder credentials)
payload = {
    'user': 'myusername',
    'pass': 'mypassword',
}

# A Session object keeps cookies across requests
session = requests.Session()
r = session.post(
    "http://apps.webofknowledge.com/WOS_CitedReferenceSearch_input.do?SID=1DtxhgpRsI16gPP7tRC&product=WOS&search_mode=CitedReferenceSearch",
    data=payload,
)

print(r.text)

This produces the following output, which suggests that the redirect is not being followed correctly...

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><!-- !DOCTYPE HTML PUBLIC "-/W3C/DTD HTML 4.01 Transitional/EN" -->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="stylesheet" href='http://login.webofknowledge.com/error/WOK5/WoKcommon.css' type="text/css" />
<link rel="stylesheet" href='http://login.webofknowledge.com/error/WOK5/main.css' type="text/css" />
<script language="javascript" src="http://login.webofknowledge.com/error/WOK5/jquery.js"></script>
<script language="javascript" src="http://login.webofknowledge.com/error/WOK5/main.js"></script>
<title>Web of Science - Starting New Session...</title>
    <script>
      function autoredirect() {
        var s = "true";
        document.cookie = "SID=1; expires=15/02/2000 00:00:00; domain=www.webofknowledge.com";
        if (false == s)
        {
            setTimeout("this.form.submit()", null);
        }
        else
        {             
            setTimeout("top.location.href='http://www.webofknowledge.com?'", null);
        }        

      }      
    </script>
</head>

<body id="WoKerror" onload="javascript:autoredirect()">


  <form action='http://www.webofknowledge.com'>


<div class="main-container">




<div class="navBar clearfix">
  <ul class="userCabinet nav-list">
    <li class="nav-item">
      <a title="" class="nav-link" href="javascript: void(0)">English <i class="icon-arrow"></i></a>
      <ul class="subnav">




              <li class="subnav-item">
               <a class="subnav-link" title="简体中文" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=zh_CN"> 简体中文</a>
              </li>






              <li class="subnav-item">
               <a class="subnav-link" title="繁體中文" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=zh_TW"> 繁體中文</a>
              </li>





              <li class="subnav-item language-active-option">
               <a class="subnav-link" title="English" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=en_US"> English</a>
              </li>







              <li class="subnav-item">
               <a class="subnav-link" title="日本語" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=ja"> 日本語</a>
              </li>






              <li class="subnav-item">
               <a class="subnav-link" title="한국어" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=ko_KR"> 한국어</a>
              </li>






              <li class="subnav-item">
               <a class="subnav-link" title="Português" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=pt_BR"> Português</a>
              </li>






              <li class="subnav-item">
               <a class="subnav-link" title="Español" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=es_LA"> Español</a>
              </li>






              <li class="subnav-item">
               <a class="subnav-link" title="Pусский" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=ru_RU"> Pусский</a>
              </li>



      </ul>
    </li>
  </ul>
</div>  
<div class="logoBar">
  <h1 class="titleh1"><a href="http://www.webofknowledge.com/"> <span title="Web of Science">Web of Science</span> </a></h1>
  <span><img alt="Clarivate Analytics" title="Clarivate Analytics" src="http://login.webofknowledge.com/error/WOK5/images/trlogo.png" /></span>
</div>


<!-- Begin : Module Title Shell -->
<table border="0" cellpadding="0" cellspacing="0" width="100%">
  <tbody>
    <tr>
      <td class="NEWleftOuterEdge"><img src="http://login.webofknowledge.com/error/WOK5/images/spacer.gif" class="NEWleftOuterEdge" width="8"></td>
      <td class="NEWwokErrorContainer">
          <div class="NEWpageTitle"><H1>Thank you for using Web of Science</H1></div>
     </td>
      <td class="NEWrightOuterEdge"><img src="http://login.webofknowledge.com/error/WOK5/images/spacer.gif" class="NEWrightOuterEdge"></td>
    </tr>
  </tbody>
</table>
<!-- End : Module Title Shell -->

<!-- Begin : WoK Error Shell -->
<table width="100%" border="0" cellspacing="0" cellpadding="0" valign="top">
  <tr>
    <td class="NEWleftOuterEdge"><img src="http://login.webofknowledge.com/error/WOK5/images/spacer.gif" width="8" class="NEWleftOuterEdge" /></td>
    <td class="NEWwokErrorContainer SignInLeftColumn ">
       <!-- Begin : Error -->
       <h2>STARTING A NEW SESSION...</h2>
       <p>


       <p>If a new session is not started automatically in a few seconds, click <a href="http://www.webofknowledge.com?" target="_top">establish a new session</a>.


       <!-- End : Error --></td>
    <td class="NEWrightOuterEdge"><img src="http://login.webofknowledge.com/error/WOK5/images/spacer.gif" class="NEWrightOuterEdge" /></td>
  </tr>
</table>
<!-- End : WoK Error Shell -->
 </form>  




<div id="skip-to-footer" class="footer">
  <div class="footerContent">
    <ul>
      <li><span>&copy; 2017</span>&nbsp;<a id="TRcopyright" title="Clarivate Analytics" href="http://clarivate.com" name="Clarivate Analytics" target="_new"> Clarivate Analytics</a></li>
      <li><a id="TRpolicy" title="Terms of Use" href="http://wokinfo.com/terms" name="Terms of Use" target="_new"> Terms of Use</a></li>
      <li><a id="TRprivacy" title="Privacy Policy" href="http://ip-science.thomsonreuters.com/privacy" name="Privacy Policy" target="_new"> Privacy Policy</a></li>
      <li><a id="TRfeedback" title="Feedback" href="http://science.thomsonreuters.com/info/wokfeedback" name="Feedback" target="_new"> Feedback</a></li>
    </ul>
  </div>
</div>


</body></html>
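
Note that the page above does its redirect in JavaScript (`top.location.href=...` inside `autoredirect()`), and requests follows HTTP redirects only; it does not execute scripts. As a minimal sketch, the target of such a redirect could be pulled out of the returned HTML with a regular expression. The helper name `find_js_redirect` is hypothetical, and the pattern assumes the single-quoted form seen in this particular page.

```python
import re

# Matches assignments such as: top.location.href='http://example.com'
# (assumption: the URL is wrapped in single quotes, as in the page above)
JS_REDIRECT = re.compile(r"top\.location\.href\s*=\s*'([^']+)'")


def find_js_redirect(html):
    """Return the URL assigned to top.location.href, or None if not found."""
    match = JS_REDIRECT.search(html)
    return match.group(1) if match else None


# The relevant line copied from the output above
sample = '''setTimeout("top.location.href='http://www.webofknowledge.com?'", null);'''
print(find_js_redirect(sample))
```

If the helper returns a URL, that URL could then be requested with the same `requests.Session` so that any cookies set along the way are kept.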

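I have tried sending a POST request first and then a GET request to the URL that my browser is redirected to when I do the same thing there, passing in the cookies from the POST request. However, r.cookies gives me an empty cookie jar, so I end up with the same HTML output as shown above. The problem seems to be that I cannot follow, from Python, the redirect to the new session that the site initiates.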
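
Two details may be relevant here. First, the links in the output contain `Error=Server.sessionNotFound`, which suggests the hard-coded `SID` in the posted URL is no longer valid. Second, `r.cookies` holds only the cookies set by the final response; cookies set by earlier hops of a redirect chain end up in `session.cookies` (and on the responses in `r.history`). The sketch below starts from the site's entry page instead and reads a fresh `SID` from the final URL. It is untested against the real site: that the `SID` appears as a query parameter after the redirects is an assumption, and `extract_sid` and `start_session` are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlparse


def extract_sid(url):
    """Return the value of the SID query parameter in url, or None if absent."""
    values = parse_qs(urlparse(url).query).get("SID")
    return values[0] if values else None


def start_session(entry_url="http://www.webofknowledge.com"):
    """Open the entry page and return (session, sid).

    Assumption: the site answers with HTTP redirects that end on a URL
    carrying a fresh SID. If it redirects via JavaScript instead, sid is None.
    """
    # Imported here so extract_sid can be used without requests installed
    import requests

    session = requests.Session()
    response = session.get(entry_url, timeout=30)

    # Each intermediate redirect response is kept in response.history
    for hop in response.history:
        print(hop.status_code, hop.url)

    # Cookies from every hop accumulate on the session, not on response.cookies
    print(session.cookies.get_dict())

    return session, extract_sid(response.url)
```

Calling `session, sid = start_session()` and then building the search URL with the returned `sid` (rather than a pasted one) would be the next step, provided the assumption about the redirect chain holds.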

0 answers:

No answers