Question

I am currently scraping pages formatted like this:

<div id="container">
   <script>Script that cause iframe contents to load correctly</script>
   <iframe>Contents of iFrame</iframe>
   <script>More scripts</script>
</div>

I can scrape the page easily, but that doesn't capture the iframe's contents, so I switched frames with the following command:

driver.switch_to.frame(iframeElement)

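This lets me get the iframe's contents. That brings me to my question: how can I grab the container div and then insert the scraped iframe contents into the scraped div? The page is set up so that dynamic scripts placed before the iframe make the iframe's contents work correctly, which is why I need the iframe's contents embedded in the scraped div.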
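Once both pieces of HTML are in hand, the merge itself can be done offline with BeautifulSoup. A minimal sketch, assuming you already have the main page's HTML and the iframe document's HTML as strings: it finds the container div and replaces its <iframe> tag with the iframe's <body> children wrapped in a div. The function name embed_iframe_html and the iframe-content class are hypothetical, and it uses the built-in "html.parser" instead of lxml so bs4 is the only dependency.

```python
from bs4 import BeautifulSoup


def embed_iframe_html(page_html, iframe_html, container_id="container"):
    """Return the container div's HTML with its <iframe> replaced by the iframe's body."""
    page = BeautifulSoup(page_html, "html.parser")
    container = page.find("div", id=container_id)
    if container is None:
        raise ValueError(f"no <div id={container_id!r}> found")

    iframe_tag = container.find("iframe")
    if iframe_tag is None:
        raise ValueError("container has no <iframe>")

    frame_doc = BeautifulSoup(iframe_html, "html.parser")
    # Use the iframe document's <body> if it has one, otherwise the whole fragment
    source = frame_doc.body if frame_doc.body is not None else frame_doc

    # Hypothetical wrapper element that marks where the iframe used to be
    wrapper = page.new_tag("div")
    wrapper["class"] = "iframe-content"
    # Copy the list first: extract() removes each node from frame_doc as we go
    for child in list(source.contents):
        wrapper.append(child.extract())

    # Swap the <iframe> tag for the wrapper; the surrounding <script> tags stay in place
    iframe_tag.replace_with(wrapper)
    return str(container)
```

Replacing the <iframe> tag, rather than putting markup between its tags, matters here: browsers treat anything between <iframe> tags as raw text and do not render it, so nested markup would not show up in the saved file.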

The relevant Python is below:

import time

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

# driver is an already-created WebDriver and url is the page being scraped
driver.get(url)
iframeElement = driver.find_element(By.TAG_NAME, 'iframe')
driver.switch_to.frame(iframeElement)
time.sleep(3)  # Wait for the contents to generate
# driver.switch_to.default_content()  # Commented out, but I know to use this to exit out of the iframe

html = driver.page_source
soup = BeautifulSoup(html, features="lxml")
print(soup)
print(soup.find("div", {"id": "container"}))  # Let's see the HTML of the container
soupStr = str(soup)
Con = str(soup.find("div", {"id": "container"}))  # Create a variable with JUST the container HTML

with open('iframeWithinDiv.html', 'w', encoding='utf-8') as f_out:  # Save the file
    f_out.write(soupStr)

Answer 1

You can use execute_script with a little jQuery to append it to the div (plain JS works too) :)

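html = driver.page_source
soup = BeautifulSoup(html, features="lxml")
print(soup)
print(soup.find("div", {"id": "container"}))  # Let's see the HTML of the container
soupStr = str(soup)
Con = str(soup.find("div", {"id": "container"}))

#### Append your variable (here the scraped iframe HTML) to the container div ###
# #container is in the main document, so leave the iframe first
driver.switch_to.default_content()
# .append() inserts HTML (.val() only sets form-field values); this requires jQuery on the page
driver.execute_script("$('#container').append(arguments[0])", soupStr)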
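Since plain JS works too, here is a minimal sketch without jQuery, assuming driver is a Selenium WebDriver and the HTML to insert has already been scraped. insertAdjacentHTML('beforeend', ...) appends the markup as the last children of the container, and passing the id and HTML as execute_script arguments avoids escaping them inside the script string. The function name append_html_to_container is hypothetical.

```python
def append_html_to_container(driver, html, container_id="container"):
    """Append html to the end of the element with id container_id in the live page.

    Returns True if the element was found, False otherwise.
    """
    # The container lives in the main document, not inside the iframe
    driver.switch_to.default_content()
    # Values passed after the script are available in JS as arguments[0], arguments[1]
    return driver.execute_script(
        "var el = document.getElementById(arguments[0]);"
        "if (!el) { return false; }"
        "el.insertAdjacentHTML('beforeend', arguments[1]);"
        "return true;",
        container_id,
        html,
    )
```

This changes the live page, not your saved copy: read the merged div back afterwards, for example with driver.find_element(By.ID, "container").get_attribute("outerHTML"), before writing it to a file. Note that <script> elements inserted this way are not executed by the browser.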

How do I scrape the contents of a div and an iframe inside that div using Selenium and BeautifulSoup?
