Print the class name along with the text in Beautiful Soup (Python)

Time: 2021-04-05 07:14:21

Tags: python beautifulsoup

I want to print the class name next to every option. For a wrong answer the class is just radio-button-click-target, but for the correct option it is radio-button-click-target correctquestions.

import requests
from bs4 import BeautifulSoup
addresses = ['https://www.ilmkidunya.com/online-test/5th-class-science-english-meduim-mcqs-with-answers?startfrom=0&last=92']
subjects = ['05th-Science-EM-']
for runscript in range(0, len(addresses)):
    response = requests.get(addresses[runscript])
    soup = BeautifulSoup(response.text, 'lxml')
    ques_id = soup.find_all('div', class_='q-title')
    ques_det = soup.find_all('div', class_='q-desc')
    optn_det = soup.find_all('div', class_='choose-answer-block')
    for i in range(0, len(ques_id)):
        unformated_ques_id = (ques_id[i].text)
        formated_ques_id = unformated_ques_id.replace("Question #  ", subjects[runscript])
        print(formated_ques_id)
        #print('\n')
        print(str(ques_det[i].text).strip())
        #print('\n')
        options = optn_det[i].find_all('label', class_='radio-button-click-target')
        for opn in options:
            print(str(opn.text).strip())
            #print('\n')
        print('<----->')


Current result
05th-Science-EM-1
Female dengue mosquito lays eggs:
On walls
In soil
On wood
In water
<----->
05th-Science-EM-2
Snake is an example of:
Vertebrates
Worms
Insects
Snails
<----->
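Expected result
05th-Science-EM-1
Female dengue mosquito lays eggs:
radio-button-click-target: On walls
radio-button-click-target: In soil
radio-button-click-target: On wood
radio-button-click-target correctquestions: In water
<----->
05th-Science-EM-2
Snake is an example of:
radio-button-click-target correctquestions: Vertebrates
radio-button-click-target: Worms
radio-button-click-target: Insects
radio-button-click-target: Snails
<----->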

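3 Answers:

Answer 0 (score: 2)

Change your selector as shown below and drop the str calls on values that are already strings. In addition, use ['class'] to extract the class names. Because class is a multi-valued attribute, Beautiful Soup returns it as a list, so I use join to combine the names into one string: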

import requests
from bs4 import BeautifulSoup

addresses = ['https://www.ilmkidunya.com/online-test/5th-class-science-english-meduim-mcqs-with-answers?startfrom=0&last=92']
subjects = ['05th-Science-EM-']

with requests.Session() as s:
    for runscript in range(0, len(addresses)):
        response = s.get(addresses[runscript])
        soup = BeautifulSoup(response.text, 'lxml')
        ques_id = soup.find_all('div', class_='q-title')
        ques_det = soup.find_all('div', class_='q-desc')
        optn_det = soup.find_all('div', class_='choose-answer-block')

        for i in range(0, len(ques_id)):
            unformated_ques_id = (ques_id[i].text)
            formated_ques_id = unformated_ques_id.replace("Question #  ", subjects[runscript])
            print(formated_ques_id)
            print(ques_det[i].text.strip())
            options = optn_det[i].select('label.radio-button-click-target')

            for opn in options:
                print(' '.join(opn['class']), opn.text.strip())
            print('<----->')
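
To see why the join is needed, here is a minimal sketch that runs on an inline snippet instead of the live page. The markup is a hypothetical approximation of one answer block (it is not copied from the site), `describe_options` is an illustrative helper name, and the built-in `html.parser` is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup approximating one answer block on the page.
html = """
<div class="choose-answer-block">
  <label class="radio-button-click-target">On walls</label>
  <label class="radio-button-click-target correctquestions">In water</label>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def describe_options(block):
    """Return 'class names: text' for every option label in a block."""
    lines = []
    for label in block.select("label.radio-button-click-target"):
        # 'class' is a multi-valued attribute, so bs4 returns a list of names.
        class_names = " ".join(label["class"])
        lines.append(f"{class_names}: {label.get_text(strip=True)}")
    return lines


block = soup.select_one("div.choose-answer-block")
for line in describe_options(block):
    print(line)
```

Formatting the pair with an f-string gives the `class: text` layout from the expected result, whereas passing both values to print separates them with a single space.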

Answer 1 (score: 1)

You should try a little harder.

for opn in options:
    # Print the class names first, then the option text on the same line
    print(' '.join(opn['class']), end=' : ')
    print(opn.text.strip())

This will give you the result you want.

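Answer 2 (score: 0)

Your code actually contains several mistakes:

  1. You don't need to query the soup object multiple times. You can extract the question container (goal in the code below) once and then parse inside it.
  2. You don't need two calls where one will do. Something like .text.strip() can be handled in a single call with .get_text(strip=True).
  3. There are other mistakes as well, but take a look at my code to get the full idea: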
from bs4 import BeautifulSoup
import requests


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    goal = soup.select_one('#divQuestion1')

    title = goal.select_one('.q-desc').get_text(strip=True)
    can = goal.select_one(
        'label[class$=correctquestions]').get_text(strip=True)
    wan = [x.get_text(strip=True)
           for x in goal.select('label[class$=click-target]')]
    print("Question: {}\nCorrect-Answer: {}\nWrong-Answers: {}".format(title, can, wan))


main('https://www.ilmkidunya.com/online-test/5th-class-science-english-meduim-mcqs-with-answers?startfrom=0&last=92')
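
The selectors above can be tried offline with a small sketch. The markup below is a hypothetical approximation of one question container, and the "Garden snails" option is invented only to show nested tags. The sketch illustrates two points: the suffix selector `[class$=click-target]` depends on `click-target` being the last name in the class attribute, while class selectors do not; and `get_text(strip=True)` joins stripped fragments without a separator, which differs from `.text.strip()` when a label contains nested tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup approximating one question container on the page.
html = """
<div id="divQuestion1">
  <div class="q-desc"> Snake is an example of: </div>
  <label class="radio-button-click-target correctquestions">Vertebrates</label>
  <label class="radio-button-click-target">Worms</label>
  <label class="radio-button-click-target"> <b>Garden</b> snails </label>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
goal = soup.select_one("#divQuestion1")

title = goal.select_one(".q-desc").get_text(strip=True)

# Class selectors match regardless of the order of names in the attribute.
correct = goal.select_one("label.correctquestions").get_text(strip=True)
wrong_labels = goal.select(
    "label.radio-button-click-target:not(.correctquestions)")

# The suffix selector relies on "click-target" ending the class attribute.
wrong_by_suffix = goal.select('label[class$="click-target"]')

# With nested tags, strip=True strips each fragment and joins them without
# a separator; pass a separator to keep the words apart.
nested = wrong_labels[-1]
joined = nested.get_text(strip=True)
spaced = nested.get_text(" ", strip=True)

print(title, correct, [x.get_text(" ", strip=True) for x in wrong_labels])
print(repr(joined), repr(spaced))
```

If the site ever reordered the class names, the suffix selector would start matching the correct answer instead of the wrong ones, so the class-based form is the more defensive choice in this sketch.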