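Extracting HTML content with Python using Beautiful Soup

Time: 2017-12-23 20:10:52

Tags: python html web-scraping beautifulsoup

Hello, I am using the Beautiful Soup library to parse content from an HTML page.

I use the following script to get the part of the page I want: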

review_list = soup.find(class_="review_list_score_breakdown_right")



<span class=" review_list_score_breakdown_right">
 <ul class="review_score_breakdown_list list_tighten clearfix" data-et-view="bLTQHcXJVNRCSPOMcAQJO:1 bLTQHcXJVNRCSPOMcAQJO:3 " id="review_list_score_breakdown">
  <li class="clearfix one_col" data-question="hotel_clean">
   <p class="review_score_name">
    Cleanliness
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_comfort">
   <p class="review_score_name">
    Comfort
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_services">
   <p class="review_score_name">
    Facilities
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_staff">
   <p class="review_score_name">
    Staff
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_value">
   <p class="review_score_name">
    Value for money
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_wifi">
   <p class="review_score_name">
    Free WiFi
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_location">
   <p class="review_score_name">
    Location
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
 </ul>
</span>

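I need to extract the score from the items identified by the data-question attribute. For example, if I want to know the hotel's comfort score, I need to access data-question="hotel_confort". I have tried using the find() function, but it does not work.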

2 answers:

Answer 0 (score: 0)

I think what you need is an attrs lookup in find(). Your question is similar to "Extracting an attribute value with beautifulsoup".

I will illustrate it for your case.

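review = soup.find(class_="review_list_score_breakdown_right")
# The attribute value uses an underscore: "hotel_comfort", not "hotel-comfort".
item = review.find(attrs={"data-question": "hotel_comfort"})
# The <li> has no "value" attribute; the score is the text of the nested <p>.
output = item.find(class_="review_score_value").get_text(strip=True)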

It has been a while since I used bs4, so please debug the code.

Edit: here is some working code based on your sample string:

review = soup.find('span', {'class': "review_list_score_breakdown_right"})
matches = review.find_all(attrs={"data-question": "hotel_comfort"})
print(matches)  # prints the matching HTML fragment(s), which you can drill into further
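
To go one step further than printing the extract, the sketch below drills into the matched element and reads the score. It assumes the markup looks like the sample in the question (trimmed here to the "hotel_comfort" item) and that the built-in "html.parser" is acceptable; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup, keeping only the "hotel_comfort" item.
html = """
<span class=" review_list_score_breakdown_right">
 <ul class="review_score_breakdown_list list_tighten clearfix" id="review_list_score_breakdown">
  <li class="clearfix one_col" data-question="hotel_comfort">
   <p class="review_score_name">
    Comfort
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
 </ul>
</span>
"""

soup = BeautifulSoup(html, "html.parser")
review = soup.find("span", {"class": "review_list_score_breakdown_right"})

# find_all() returns a list-like ResultSet, so take the first match explicitly.
matches = review.find_all(attrs={"data-question": "hotel_comfort"})
item = matches[0]

# The visible score is the text of the nested <p class="review_score_value">.
score_text = item.find("p", class_="review_score_value").get_text(strip=True)

# The bar value is stored in the data-score attribute of the inner <div>.
bar_score = item.find("div", class_="score_bar_value")["data-score"]

print(score_text, bar_score)
```

Both values come back as strings, so convert them with int() or float() if you need numbers. If only one element is expected, find() avoids the indexing step because it returns a single Tag (or None when nothing matches).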

Answer 1 (score: 0)

There is no hotel_confort in your HTML; the attribute value is hotel_comfort.

    review = soup.find(class_="review_list_score_breakdown_right")
    hotel = review.find(attrs={"data-question" : "hotel_comfort"})

This code returns

<li class="clearfix one_col" data-question="hotel_comfort"> ..... </li>
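
If you want every category rather than just one, a possible approach is to loop over all <li> elements that carry a data-question attribute and build a dictionary. This is only a sketch: score_breakdown is a hypothetical helper name, the markup is a shortened copy of the sample in the question, and it assumes each item contains a <p class="review_score_value"> with a numeric text.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup with two of the seven items.
html = """
<span class=" review_list_score_breakdown_right">
 <ul class="review_score_breakdown_list list_tighten clearfix" id="review_list_score_breakdown">
  <li class="clearfix one_col" data-question="hotel_clean">
   <p class="review_score_name">
    Cleanliness
   </p>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_comfort">
   <p class="review_score_name">
    Comfort
   </p>
   <p class="review_score_value">
    10
   </p>
  </li>
 </ul>
</span>
"""


def score_breakdown(soup):
    """Map each data-question key to its score as a float (hypothetical helper)."""
    review = soup.find(class_="review_list_score_breakdown_right")
    scores = {}
    # attrs={"data-question": True} matches any tag that has the attribute at all.
    for li in review.find_all("li", attrs={"data-question": True}):
        value = li.find("p", class_="review_score_value")
        scores[li["data-question"]] = float(value.get_text(strip=True))
    return scores


soup = BeautifulSoup(html, "html.parser")
scores = score_breakdown(soup)
print(scores["hotel_comfort"])
```

A lookup with a misspelled key such as scores["hotel_confort"] raises a KeyError, which makes this kind of typo easier to spot than a find() call that silently returns None.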
