Question

from bs4 import BeautifulSoup
import urllib2
page = urllib2.urlopen("http://www.@@@@@@.com/@@/")
soup = BeautifulSoup(page)
for link in soup.findAll('a'):
    if link['href'].startswith('http://'):
        print(link)

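I am using this code to parse the href attribute of anchor tags, but when I try the same approach with iframe tags it produces no output. I don't know what is going wrong there. Can anyone advise, please?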

Answer 1

How about looking for iframe tags and their src attribute? Also, requests is nicer to work with than urllib2:

from bs4 import BeautifulSoup
import requests

# Placeholder URL taken from the question; replace it with the real page.
url = "http://www.@@@@@@.com/@@/"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
page = requests.get(url, headers=headers).text
soup = BeautifulSoup(page, 'lxml')
# src=True keeps only the <iframe> tags that actually have a src attribute.
for iframe in soup.find_all('iframe', src=True):
    print(iframe['src'])

Answer 2

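Oh, so you want to find all the iframes on the page? Everything looks fine, except that for an iframe you should read the src attribute instead of href. If that doesn't help, please provide a sample page.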
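
To make the href-versus-src point concrete, here is a minimal sketch that mirrors the filter from the question but applies it to iframe tags. The sample markup, the example.com URLs, and the helper name iframe_sources are made up for illustration and are not from the original page. It uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = """
<html><body>
  <a href="http://example.com/page">a link</a>
  <iframe src="http://example.com/embed/1"></iframe>
  <iframe src="/relative/embed"></iframe>
  <iframe></iframe>
</body></html>
"""

def iframe_sources(markup, prefix="http://"):
    """Return the src of every <iframe> whose src starts with prefix."""
    soup = BeautifulSoup(markup, "html.parser")
    return [
        tag["src"]
        for tag in soup.find_all("iframe", src=True)  # skip iframes without src
        if tag["src"].startswith(prefix)
    ]

print(iframe_sources(html))
```

Passing src=True also avoids the KeyError that tag['src'] would raise on an iframe with no src attribute, which is the same risk link['href'] carries for anchors without href.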

How do I filter iframe tags with Python BeautifulSoup4?
