Question

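I'm new to Python and I'm trying to use BeautifulSoup to parse an HTML page and extract some content. The problem is that the URL I need to parse is dynamic, so I can't hard-code it into urllib2.urlopen the way all the BeautifulSoup examples show.

I tried to use SELF to extract the current URL from the browser, but I couldn't get it to work. Could anyone post an example of how to use SELF to extract the current URL from the browser, or how to attach BeautifulSoup to the current URL?

Any help is much appreciated.

Here is my code so far: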

import os
import time

import win32api
import win32com.client
import win32con

from pywinauto import application

class A(object):
  def __init__(self):
    self.x = self.request.url

  def method_a(self):
    print self.x

#start IE with a start URL of what was passed in
app = application.Application()
app.Start(r"c:\program files\internet explorer\iexplore.exe %s"% "http://www.cyclestreets.net/journey")
time.sleep(3)
#ie = app.window_(title_re = "CycleStreets Cycle journey planner")
ie = app.window_(title_re = ".*CycleStreets.*")

a = A()
a.method_a()

When I run this, I get a message saying AttributeError: 'A' object has no attribute 'request'

Answer 1

I think you're a bit confused. In your class 'A' you have this:

class A(object):
  def __init__(self):
    self.x = self.request.url

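In the __init__ function you set x to self.request.url. That is what Python is complaining about: nothing has ever assigned self.request, so the attribute doesn't exist on your object at that point.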
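One way around this, as a minimal sketch: pass the URL in when the object is created instead of reading it from an attribute that was never set. The `url` parameter is an addition for illustration and is not part of the original code.

```python
class A(object):
    def __init__(self, url):
        # Store the URL supplied by the caller instead of reading
        # self.request, which was never assigned on this object.
        self.x = url

    def method_a(self):
        # Return the stored URL so the caller can print or reuse it.
        return self.x


a = A("http://www.cyclestreets.net/journey")
print(a.method_a())
```

This only removes the AttributeError. It does not read the address from a running browser window.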

Answer 2

You can use urllib to get the current URL. See the example below:

from urllib import request

url = "http://www.example.com"
# Build a Request object; nothing is sent over the network at this point.
req = request.Request(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'})
# get_full_url() returns the URL the Request was constructed with.
print(req.get_full_url())

This might help you!
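If the goal is simply to avoid hard-coding the URL, a rough sketch along the same lines is to treat it as a function argument and hand the downloaded page to BeautifulSoup. The helper names `build_request` and `fetch_title` and the shortened User-Agent value are assumptions made for illustration, and this assumes Python 3 with the `bs4` package installed. It does not read the address bar of a running browser.

```python
from urllib import request


def build_request(url):
    """Build a Request for a URL that is only known at runtime."""
    headers = {"User-Agent": "Mozilla/5.0"}  # placeholder header value
    return request.Request(url, headers=headers)


def fetch_title(url):
    """Download the page at `url` and return its <title> text, or None."""
    # Imported here so the rest of the sketch loads without bs4 installed.
    from bs4 import BeautifulSoup

    with request.urlopen(build_request(url)) as resp:
        soup = BeautifulSoup(resp.read(), "html.parser")
    return soup.title.string if soup.title else None
```

The URL can then come from wherever it is available at runtime, for example `fetch_title(sys.argv[1])` to take it from the command line.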

How to get the current URL in Python, or attach BeautifulSoup to the current URL
