Python and Selenium web scraping: text from multiple dropdown menus (1, 2, 3, or 4 dropdowns)

Time: 2019-05-21 23:37:21

Tags: python selenium selenium-webdriver

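I am trying to output the text of the dropdown menus on eBay. I want to output the text and then select each dropdown option in turn to read its price (which is why I don't want to scrape the whole list of dropdown values in one go). I have code that only works for one dropdown and the price.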

from selenium.webdriver.support.select import Select

sel = Select(driver.find_element_by_xpath("//select[@id='msku-sel-1']"))

for index in range(1, len(sel.options)):
    # Skip index 0 because it is not a valid option
    sel.select_by_index(index)
    print("{}: {}".format(sel.first_selected_option.text, driver.find_element_by_xpath("//span[@id='prcIsum']").text))

https://www.ebay.co.uk/itm/APPLE-iPHONE-5S-16GB-32GB-64GB-Unlocked-EE-O2-Voda-Smartphone-Mobile/323645059709?epid=168509016&hash=item4b5abfb27d%3Am%3AmPVOlUVEGK642jC7sPt_4Yg&LH_BIN=1

https://www.ebay.co.uk/itm/UK-Women-Off-Shoulder-Floral-Bodycon-Backless-Ladies-Summer-Beach-Midi-Sun-Dress/254198776097?hash=item3b2f6d8121:m:m9B15WsfVx5zTh_73LlzBGA

I would like the output to be:

Colour: White, Size: S, Price: £4.99

1 answer:

Answer 0 (score: 0)

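You will need to generate every possible combination of dropdown values (the code calls them permutations), select each combination on the page, and read the price for each one.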
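To show the idea without a browser, here is a minimal sketch that uses plain lists in place of the real dropdown options. The function name `all_combinations` and the sample values are assumptions made for illustration. `itertools.product` yields one tuple per combination, so the same call covers one, two, three, or four dropdowns.

```python
import itertools


def all_combinations(option_lists):
    """Return every combination that picks one value from each list.

    option_lists holds one list of option texts per dropdown
    (hypothetical sample data, not scraped from a real page).
    """
    return list(itertools.product(*option_lists))


if __name__ == "__main__":
    colours = ["White", "Blue"]
    sizes = ["S", "M", "L"]
    # One tuple per (colour, size) pair, in dropdown order
    for combo in all_combinations([colours, sizes]):
        print(combo)
```

The number of combinations is the product of the list lengths, so four dropdowns with ten options each already mean 10,000 page interactions.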

Here is the code for a listing on that site with two dropdowns:

from selenium.webdriver.support.select import Select
from selenium.webdriver import Chrome
from selenium.common.exceptions import NoSuchElementException

import itertools
from pprint import pformat


def apply_values(dropdowns, values):
    """
    :param dropdowns: list of Select objects, each with a ``name`` attribute
    :param values: option texts to set, one per dropdown, in the same order
    :return: dict with key=dropdownName, value=dropDownValue,
             or False if one of the values cannot be selected
    """
    r = {}
    # itertools.product keeps the dropdown order, so pair each value
    # with its own dropdown instead of searching for a matching one
    for dropdown, value in zip(dropdowns, values):
        try:
            dropdown.select_by_visible_text(value)
            r[dropdown.name] = value
        except NoSuchElementException:
            # print("Unable to set the following values {}..Skipping".format(values))
            # Raised when an option is not available together with the
            # values already selected in the other dropdowns.
            # You can also check the option's disabled attribute for this.
            return False
    return r


driver = Chrome()
driver.get('yourUrl')
els = driver.find_elements_by_css_selector(".msku-sel")
selects = []
for el in els:
    # Adding the dropdown name to the select object
    name = el.get_attribute('name')
    select = Select(el)
    select.name = name
    selects.append(select)

print("Get all options for each dropdown")
for sel in selects:
    # Skip the first option, which is a placeholder rather than a real value
    sel.values = [option.text for option in sel.options][1:]

print("Get all possible permutations")
permutations = list(itertools.product(*[sel.values for sel in selects]))

# Iterate over every combination
print("Select all possible permutation and get price for each")
results = []
for permutation in permutations:
    # Reset every dropdown to its default option
    for sel in selects:
        sel.select_by_index(0)
    # Set each dropdown to its value from this combination
    result = apply_values(selects, permutation)
    if result:
        # Once all dropdown values are set, read the final price
        result['Price'] = driver.find_element_by_id("prcIsum").text
        results.append(result)


print(pformat(results))

# quit() closes every window and ends the WebDriver session
driver.quit()

Result:

Get all options for each dropdown
Get all possible permutations
Select all possible permutation and get price for each
[{'Colour': 'White', 'Price': '£6.99', 'Size': '6'},
 {'Colour': 'White', 'Price': '£6.99', 'Size': '8'},
 {'Colour': 'White', 'Price': '£6.99', 'Size': '10'},
 {'Colour': 'White', 'Price': '£6.99', 'Size': '12'},
 {'Colour': 'White', 'Price': '£6.99', 'Size': '14'},
 {'Colour': 'White', 'Price': '£6.99', 'Size': '16'},
 {'Colour': 'White', 'Price': '£6.99', 'Size': 'S'},
 {'Colour': 'White', 'Price': '£6.99', 'Size': 'M'},
 {'Colour': 'White', 'Price': '£6.99', 'Size': 'L'},
 {'Colour': 'White', 'Price': '£6.99', 'Size': 'XL'},
 {'Colour': 'Blue', 'Price': '£6.99', 'Size': '6'},
 {'Colour': 'Blue', 'Price': '£6.99', 'Size': '8'},
 {'Colour': 'Blue', 'Price': '£6.99', 'Size': '10'},
 {'Colour': 'Blue', 'Price': '£6.99', 'Size': '12'},
 {'Colour': 'Blue', 'Price': '£6.99', 'Size': '14'},
 {'Colour': 'Blue', 'Price': '£6.99', 'Size': '16'},
 {'Colour': 'Blue', 'Price': '£6.99', 'Size': 'S'},
 {'Colour': 'Blue', 'Price': '£6.99', 'Size': 'M'},
 {'Colour': 'Blue', 'Price': '£6.99', 'Size': 'L'},
 {'Colour': 'Blue', 'Price': '£6.99', 'Size': 'XL'},
 {'Colour': 'Light Green', 'Price': '£6.99', 'Size': '6'},
 {'Colour': 'Light Green', 'Price': '£6.99', 'Size': '8'},
 {'Colour': 'Light Green', 'Price': '£6.99', 'Size': '10'},
 {'Colour': 'Light Green', 'Price': '£6.99', 'Size': '12'},
 {'Colour': 'Light Green', 'Price': '£6.99', 'Size': '14'},
 {'Colour': 'Light Green', 'Price': '£6.99', 'Size': '16'},
 {'Colour': 'Light Green', 'Price': '£6.99', 'Size': 'S'},
 {'Colour': 'Light Green', 'Price': '£6.99', 'Size': 'M'},
 {'Colour': 'Light Green', 'Price': '£6.99', 'Size': 'L'},
 {'Colour': 'Light Green', 'Price': '£6.99', 'Size': 'XL'}]

I used pformat to print the list of result dictionaries, but you can easily reformat it into the output you expect.
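As a rough sketch of that reformatting, the hypothetical helper `format_result` below (not part of the code above) joins one result dictionary into a single line. It assumes each dictionary has a 'Price' key plus one key per dropdown, as in the printed results, and it relies on dictionaries keeping insertion order (Python 3.7+).

```python
def format_result(result):
    """Format one result dict as a single 'Name: value, ..., Price: ...' line.

    Assumes the dict has a 'Price' key plus one key per dropdown.
    """
    # Dropdown values first, in the order they were stored; price last
    names = [key for key in result if key != "Price"]
    parts = ["{}: {}".format(name, result[name]) for name in names]
    parts.append("Price: {}".format(result["Price"]))
    return ", ".join(parts)


if __name__ == "__main__":
    sample = {"Colour": "White", "Price": "£6.99", "Size": "6"}
    print(format_result(sample))  # Colour: White, Size: 6, Price: £6.99
```

Calling `print(format_result(r))` for each `r` in `results` would print one line per combination instead of the pformat dump.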

I have added comments to the code, but feel free to ask if you need clarification.
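The code comment about checking the disabled attribute could be sketched as follows. The helper name `enabled_option_texts` is hypothetical, and whether the page really marks unavailable options as disabled is an assumption you would need to verify in the browser. The helper only relies on `Select.options` and on each option element's `text` and `is_enabled()`.

```python
def enabled_option_texts(select):
    """Return the visible text of every enabled option, skipping the placeholder.

    `select` is expected to behave like selenium's Select: it exposes
    `.options`, a list of elements with `.text` and `.is_enabled()`.
    """
    # Index 0 is assumed to be the placeholder entry, as in the code above
    return [option.text for option in select.options[1:] if option.is_enabled()]
```

For example, after choosing a colour you could call `enabled_option_texts(size_select)` (where `size_select` is the second Select object) to try only the sizes that are still selectable, instead of catching the exception afterwards.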
