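Open a web page in a new tab with Selenium + Python

Date: 2015-02-10 12:31:10

Tags: python selenium webdriver phantomjs

I'm trying to open websites in new tabs inside my WebDriver. I want to do this because opening a new WebDriver for each website takes about 3.5 seconds with PhantomJS, and I want more speed...

I'm using a multiprocess Python script, and I want to get some elements from each page, so the workflow is like this:

Open Browser

Loop through my array
For element in array -> Open website in new tab -> do my business -> close it

But I can't find any way to achieve this.

Here's the code I'm using. It takes forever between websites, and I need it to be fast... Other tools are allowed, but I don't know of many tools that can scrape website content that is loaded with JavaScript (divs created when some event is triggered on load, etc.). That's why I need Selenium... BeautifulSoup can't be used for some of my pages.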

#!/usr/bin/env python
import multiprocessing, time, pika, json, traceback, logging, sys, os, itertools, urllib, urllib2, cStringIO, mysql.connector, shutil, hashlib, socket, urllib2, re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from PIL import Image
from os import listdir
from os.path import isfile, join
from bs4 import BeautifulSoup
from pprint import pprint

def getPhantomData(parameters):
    browser = None
    try:
        # We create WebDriver
        browser = webdriver.Firefox()
        # Navigate to URL
        browser.get(parameters['target_url'])
        # Find all links by Selector
        links = browser.find_elements_by_css_selector(parameters['selector'])

        result = []
        for link in links:
            # Extract link attribute and append to our list
            result.append(link.get_attribute(parameters['attribute']))
        # Return a dict, because callback() indexes the result with ['data']
        return {'data': result}
    except Exception, err:
        print err
        # Empty result, so callback() rejects and requeues the message
        return {'data': []}
    finally:
        # quit() closes all windows and ends the session; skip it if the
        # browser never started
        if browser is not None:
            browser.quit()

def callback(ch, method, properties, body):
    parameters = json.loads(body)
    message = getPhantomData(parameters)

    if message['data']:
        ch.basic_ack(delivery_tag=method.delivery_tag)
    else:
        ch.basic_reject(delivery_tag=method.delivery_tag, requeue=True)

def consume():
    credentials = pika.PlainCredentials('invitado', 'invitado')
    rabbit = pika.ConnectionParameters('localhost',5672,'/',credentials)
    connection = pika.BlockingConnection(rabbit)
    channel = connection.channel()

    # Connect to the channel
    channel.queue_declare(queue='com.stuff.images', durable=True)
    channel.basic_consume(callback,queue='com.stuff.images')

    print ' [*] Waiting for messages. To exit press CTRL^C'
    try:
        channel.start_consuming()
    except KeyboardInterrupt:
        pass

workers = 5
pool = multiprocessing.Pool(processes=workers)
for i in xrange(0, workers):
    pool.apply_async(consume)

try:
    while True:
        time.sleep(1)  # idle in the parent without busy-waiting
except KeyboardInterrupt:
    print ' [*] Exiting...'
    pool.terminate()
    pool.join()

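14 answers:

Answer 0: (score: 25)

You can open/close a tab with the key combination COMMAND + T / COMMAND + W (OSX). On other OSs you can use CONTROL + T / CONTROL + W.

In Selenium you can emulate this behavior. You will need to create one webdriver and as many tabs as the tests you need.

Here is the code.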

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("http://www.google.com/")

#open tab
driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 't') 
# You can use (Keys.CONTROL + 't') on other OSs

# Load a page 
driver.get('http://stackoverflow.com/')
# Make the tests...

# close the tab
# (Keys.CONTROL + 'w') on other OSs.
driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 'w') 


driver.close()

Answer 1: (score: 7)

  • OS: Win 10
  • Python 3.8.1
  • selenium==3.141.0

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path=r'TO\Your\Path\geckodriver.exe')
driver.get('https://www.google.com/')

# Open a new window
driver.execute_script("window.open('');")
# Switch to the new window
driver.switch_to.window(driver.window_handles[1])
driver.get("http://stackoverflow.com")
time.sleep(3)

# Open a new window
driver.execute_script("window.open('');")
# Switch to the new window
driver.switch_to.window(driver.window_handles[2])
driver.get("https://www.reddit.com/")
time.sleep(3)
# close the active tab
driver.close()
time.sleep(3)

# Switch back to the first tab
driver.switch_to.window(driver.window_handles[0])
driver.get("https://bing.com")
time.sleep(3)

# Close the only tab, will also close the browser.
driver.close()

Reference: Need Help Opening A New Tab in Selenium
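
The open / switch / close / switch-back sequence above maps onto the workflow in the question: one browser, one short-lived tab per URL. A minimal sketch of that pattern, assuming a single long-lived `driver`; the helper name `temporary_tab` is hypothetical and not part of Selenium, and it assumes the new handle is already listed right after `window.open` returns:

```python
from contextlib import contextmanager


@contextmanager
def temporary_tab(driver, url):
    """Open url in a new tab, yield the driver, then close that tab.

    Hypothetical helper: it only relies on execute_script, window_handles,
    switch_to.window, get and close of the driver that is passed in.
    """
    original = driver.current_window_handle
    before = list(driver.window_handles)
    driver.execute_script("window.open('');")
    # Pick the handle that did not exist before window.open ran.
    new_handle = [h for h in driver.window_handles if h not in before][0]
    driver.switch_to.window(new_handle)
    try:
        driver.get(url)
        yield driver
    finally:
        # Make sure we close the tab we opened, then return to the start.
        driver.switch_to.window(new_handle)
        driver.close()
        driver.switch_to.window(original)
```

Inside the loop over URLs this would be used as `with temporary_tab(driver, url) as tab:` followed by the scraping code, so the tab is closed even if the scraping raises.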

Answer 2: (score: 4)

The other solutions do not work with chrome driver v83.

Instead, it works as follows, assuming there is only 1 open tab:

driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[1])
driver.get("https://www.example.com")

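If there is already more than 1 open tab, you should first get the index of the most recently created tab and switch to that tab before loading the URL (credit to tylerl):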

driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[len(driver.window_handles)-1])
driver.get("https://www.example.com")

Answer 3: (score: 3)

After struggling for a long time, the following approach worked for me:

driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 't')
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.TAB)

windows = driver.window_handles

time.sleep(3)
driver.switch_to.window(windows[1])

Answer 4: (score: 3)

Opening a website in a new tab through Python is now much easier with Selenium v3.x. Here is a solution where you open http://www.google.co.in in the initial TAB and https://www.yahoo.com in the adjacent TAB:

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("http://www.google.co.in")
    print("Initial Page Title is : %s" %driver.title)
    windows_before  = driver.current_window_handle
    print("First Window Handle is : %s" %windows_before)
    driver.execute_script("window.open('https://www.yahoo.com')")
    WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
    windows_after = driver.window_handles
    new_window = [x for x in windows_after if x != windows_before][0]
    driver.switch_to.window(new_window)
    print("Page Title after Tab Switching is : %s" %driver.title)
    print("Second Window Handle is : %s" %new_window)

  • Console output: [output not preserved]

  • Browser snapshot: [image: multiple__tabs]

Answer 5: (score: 2)

Try this, it will work:

# Open a new Tab
driver.execute_script("window.open('');")

# Switch to the new window and open URL B
driver.switch_to.window(driver.window_handles[1])
driver.get(tab_url)

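Answer 6: (score: 0)

I spent a long time trying to duplicate tabs in Chrome using action_keys and send_keys on the body. The only thing that worked for me was the answer here. This is what my duplicate-tabs def ended up looking like; it is probably not the best, but it works fine for me.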

def duplicate_tabs(number, chromewebdriver):
    # Once on the page we want to open a bunch of tabs
    url = chromewebdriver.current_url
    for i in range(number):
        print('opened tab: '+str(i))
        chromewebdriver.execute_script("window.open('"+url+"', 'new_window"+str(i)+"')")

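It basically runs some JavaScript from inside Python, which is very useful. Hope this helps somebody.

Note: I'm using Ubuntu. It shouldn't make a difference, but if this doesn't work for you, that could be the reason.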
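
Building the script by string concatenation breaks if the URL contains a quote character. A hedged variant of the same idea passes the values through the extra arguments of `execute_script`, which the script sees as `arguments[0]`, `arguments[1]`, and so on. The function name `duplicate_tabs_safe` is made up for this sketch:

```python
def duplicate_tabs_safe(number, driver):
    """Open `number` copies of the current page, each in its own named window.

    Hypothetical variant of duplicate_tabs: the URL and the window name are
    passed as script arguments instead of being spliced into the JS source.
    """
    url = driver.current_url
    for i in range(number):
        driver.execute_script(
            "window.open(arguments[0], arguments[1]);",
            url,
            "new_window" + str(i),
        )
```

The window names match the `new_window0`, `new_window1`, ... scheme used above, so the behaviour should be the same for ordinary URLs.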

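Answer 7: (score: 0)

As far as I know, it is not possible to open a new blank tab in the same window in Chrome, but you can open a new tab from a web link.

I have searched the web and found a working approach for this problem. Please follow the steps below and do not skip any.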

import selenium.webdriver as webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get('https://www.google.com?q=python#q=python')
first_link = driver.find_element_by_class_name('l')

# Use: Keys.CONTROL + Keys.SHIFT + Keys.RETURN to open tab on top of the stack 
first_link.send_keys(Keys.CONTROL + Keys.RETURN)

# Switch tab to the new tab, which we will assume is the next one on the right
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.TAB)

driver.quit()

I think this is the best solution so far.

Credits: https://gist.github.com/lrhache/7686903

Answer 8: (score: 0)

tabs = {}

def new_tab():
    global browser
    hpos = browser.window_handles.index(browser.current_window_handle)
    browser.execute_script("window.open('');")
    browser.switch_to.window(browser.window_handles[hpos + 1])
    return(browser.current_window_handle)
    
def switch_tab(name):
    global tabs
    global browser
    if not name in tabs.keys():
        tabs[name] = {'window_handle': new_tab(), 'url': url+name}
        browser.get(tabs[name]['url'])
    else:
        browser.switch_to.window(tabs[name]['window_handle'])

Answer 9: (score: 0)

I would stick with ActionChains for this.

Here is a function that opens a new tab and switches to it:

import time
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

def open_in_new_tab(driver, element, switch_to_new_tab=True):
    base_handle = driver.current_window_handle
    # Do some actions
    ActionChains(driver) \
        .move_to_element(element) \
        .key_down(Keys.COMMAND) \
        .click() \
        .key_up(Keys.COMMAND) \
        .perform()
    
    # Should you switch to the new tab?
    if switch_to_new_tab:
        new_handle = [x for x in driver.window_handles if x!=base_handle]
        assert len(new_handle) == 1 # assume you are only opening one tab at a time
        
        # Switch to the new window
        driver.switch_to.window(new_handle[0])

        # I like to wait after switching to a new tab for the content to load
        # Do that either with time.sleep() or with WebDriverWait until a basic
        # element of the page appears (such as "body") -- reference for this is 
        # provided below
        time.sleep(0.5)        

        # NOTE: if you choose to switch to the window/tab, be sure to close
        # the newly opened window/tab after using it and that you switch back
        # to the original "base_handle" --> otherwise, you'll experience many
        # errors and a painful debugging experience...

Here is how you would apply the function:

# Remember your starting handle
base_handle = driver.current_window_handle

# Say we have a list of elements and each is a link:
links = driver.find_elements_by_css_selector('a[href]')

# Loop through the links and open each one in a new tab
for link in links:
    open_in_new_tab(driver, link, True)
    
    # Do something on this new page
    print(driver.current_url)
    
    # Once you're finished, close this tab and switch back to the original one
    driver.close()
    driver.switch_to.window(base_handle)
    
    # You're ready to continue to the next item in your loop

Here is how you can wait until the page is loaded.
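
One way to do that wait, sketched here as a plain polling loop and not Selenium's `WebDriverWait`, is to ask the browser for `document.readyState` until it reports `complete`. The helper name, timeout and poll interval are assumptions, and a page that keeps adding content through JavaScript after that point may still need a check for a specific element:

```python
import time


def wait_for_page_load(driver, timeout=10.0, poll=0.25):
    """Return True once document.readyState is 'complete', False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

It could replace the fixed `time.sleep(0.5)` after switching to the new tab, for example `wait_for_page_load(driver, timeout=5)`.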

Answer 10: (score: 0)

As has been mentioned several times already, the following approaches no longer work:

driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 't')
ActionChains(driver).key_down(Keys.CONTROL).send_keys('t').key_up(Keys.CONTROL).perform()

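Also, driver.execute_script("window.open('');") works, but it is limited by the popup blocker. I process hundreds of tabs in parallel (web scraping with scrapy). However, the popup blocker became active after 20 new tabs had been opened with JavaScript's window.open(''), and that broke my crawler.

As a workaround, I declared one tab as the "master", which opened the following helper.html: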

<!DOCTYPE html>
<html><body>
<a id="open_new_window" href="about:blank" target="_blank">open a new window</a>
</body></html>

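Now my (simplified) crawler can open as many tabs as necessary by deliberately clicking a link, which the popup blocker does not consider at all: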

# master
master_handle = driver.current_window_handle
helper = os.path.join(os.path.dirname(os.path.abspath(__file__)), "helper.html")
driver.get(helper)

# open new tabs
for _ in range(100):
    window_handle = driver.window_handles          # current state
    driver.switch_to.window(master_handle)
    driver.find_element_by_id("open_new_window").click()
    window_handle = set(driver.window_handles).difference(window_handle).pop()
    print("new window handle:", window_handle)

Closing these windows via JavaScript's window.close() is no problem.
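
For completeness, the cleanup can also be done from the driver side in place of `window.close()`. A minimal sketch, assuming `master_handle` is the handle saved above; `close_all_but` is a hypothetical helper name:

```python
def close_all_but(driver, keep_handle):
    """Close every window/tab except keep_handle, then switch back to it.

    Returns the number of windows that were closed.
    """
    closed = 0
    # Iterate over a copy, since closing windows changes window_handles.
    for handle in list(driver.window_handles):
        if handle == keep_handle:
            continue
        driver.switch_to.window(handle)
        driver.close()
        closed += 1
    driver.switch_to.window(keep_handle)
    return closed
```

Calling `close_all_but(driver, master_handle)` after a batch leaves only the master tab open and focused.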

Answer 11: (score: 0)

#Change the method of finding the element if needed
self.find_element_by_xpath(element).send_keys(Keys.CONTROL + Keys.ENTER)

This finds the element and opens it in a new tab. self is just the name used for the webdriver object.

Answer 12: (score: 0)

from selenium import webdriver
import time

driver = webdriver.Firefox()
driver.get('https://www.google.com')

driver.execute_script("window.open('');")
time.sleep(5)

driver.switch_to.window(driver.window_handles[1])
driver.get("https://facebook.com")
time.sleep(5)

driver.close()
time.sleep(5)

driver.switch_to.window(driver.window_handles[0])
driver.get("https://www.yahoo.com")
time.sleep(5)

#driver.close()

https://www.edureka.co/community/52772/close-active-current-without-closing-browser-selenium-python

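Answer 13: (score: -1)

It's strange that there are so many answers, and all of them use workarounds such as JS and keyboard shortcuts instead of simply using a Selenium feature: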

def newTab(driver, url="about:blank"):
    wnd = driver.execute(selenium.webdriver.common.action_chains.Command.NEW_WINDOW)
    handle = wnd["value"]["handle"]
    driver.switch_to.window(handle)
    driver.get(url) # changes the handle
    return driver.current_window_handle
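
In Selenium 4 and later the same operation is exposed directly as `driver.switch_to.new_window('tab')`, which opens a tab and switches to it in one call. A short sketch under that version assumption (the call is not available in Selenium 3.x); `new_tab_v4` is a made-up wrapper name:

```python
def new_tab_v4(driver, url="about:blank"):
    """Open url in a new tab using Selenium 4's switch_to.new_window."""
    # 'tab' asks for a tab; 'window' would request a separate window.
    driver.switch_to.new_window("tab")
    driver.get(url)
    return driver.current_window_handle
```

The returned handle can be stored and passed to `driver.switch_to.window(...)` later to come back to this tab.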