Python BeautifulSoup: scraping dependent dropdown menus

Time: 2021-02-26 10:42:11

Tags: python web-scraping beautifulsoup

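I'm trying to scrape search results from this link: https://www.inecnigeria.org/elections/polling-units/. The page requires me to pick a value from one dropdown, which then reveals another dropdown I must choose from before searching. I can get the values from the first dropdown, but not from the others. Here is what I have so far: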

from bs4 import BeautifulSoup
import requests

base = 'https://www.inecnigeria.org/elections/polling-units/'

base_req = requests.get(base, verify=False)

soup = BeautifulSoup( base_req.text, "html.parser" )

# states
states = soup.find('select', id = "statePoll")

stateItems = states.select('option[value]')

stateValues = [ stateItem.text for stateItem in stateItems ]


# print(stateValues)

lgas = soup.find('select', id = "lgaPoll")

lgaItems = lgas.select('option[value]')

lgaValues = [ lgaItem.text for lgaItem in lgaItems ]


print(lgas)

1 answer:

Answer 0 (score: 1)

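You actually cannot get these values by scraping the HTML of that page. The page uses JavaScript to request the options from another page and inserts them into the page dynamically. You will have to make those requests yourself, using the information that you can scrape. Here is an example of how to take the next step, which should show you the general idea: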

from bs4 import BeautifulSoup
import requests

base = 'https://www.inecnigeria.org/elections/polling-units/'
# Endpoint the page's JavaScript calls to load the LGA options for a state
lga_view = 'https://www.inecnigeria.org/wp-content/themes/independent-national-electoral-commission/custom/views/lgaView.php'
base_req = requests.get(base, verify=False)
soup = BeautifulSoup(base_req.text, "html.parser")

# Map each state name to its numeric id, skipping options without a value
states = soup.find('select', id="statePoll")
state_options = states.find_all('option')
states = {opt.text: int(opt['value']) for opt in state_options if 'value' in opt.attrs}

# POST each state id to the endpoint and keep the decoded JSON response
lga = {k: requests.post(lga_view, {'state_id': v}, verify=False).json() for k, v in states.items()}

print(lga)
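
The dict comprehension above stops at the first failed request. As a rough sketch of the same idea with per-state error handling, the loop can be pulled into a helper. The names `fetch_children` and `post_lga` are hypothetical (not from the original answer), the `timeout` value is an arbitrary choice, and the code assumes, as the answer does, that the endpoint returns JSON:

```python
from typing import Any, Callable, Dict, Optional


def fetch_children(
    parents: Dict[str, int],
    fetch: Callable[[int], Any],
) -> Dict[str, Optional[Any]]:
    """Call fetch(parent_id) for every parent; store None when a call fails."""
    children: Dict[str, Optional[Any]] = {}
    for name, parent_id in parents.items():
        try:
            children[name] = fetch(parent_id)
        except Exception as exc:  # e.g. network error, bad status, invalid JSON
            print(f"skipping {name!r}: {exc}")
            children[name] = None
    return children


def post_lga(state_id: int) -> Any:
    """POST one state_id to the lgaView.php endpoint used in the answer."""
    import requests  # imported lazily so fetch_children can be used offline

    lga_view = (
        "https://www.inecnigeria.org/wp-content/themes/"
        "independent-national-electoral-commission/custom/views/lgaView.php"
    )
    # verify=False mirrors the answer; it disables TLS certificate checks.
    response = requests.post(
        lga_view, data={"state_id": state_id}, verify=False, timeout=30
    )
    response.raise_for_status()
    return response.json()  # assumption: the endpoint replies with JSON
```

With the `states` dict built as in the answer, `lga = fetch_children(states, post_lga)` should give the same mapping, with `None` for any state whose request failed. Passing the fetch function as a parameter also makes it possible to try the loop with a stub before hitting the real site.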