Trouble scraping product hrefs from a website

Date: 2021-04-01 13:43:09

Tags: excel vba web-scraping screen-scraping

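I'm having trouble extracting the href from a website and have been stuck on it for a few days now. I can get all of the other information I need. I've tried several variations of the class name and also tried getting the link through the a tag, but I can't work it out.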


Here is my latest attempt, which still doesn't work. Can anyone point me to the right class?

        If element.getElementsByClassName("product-card ")(0).getElementsByClassName("listing-fpa-link")(0) Is Nothing Then
            wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = "-"
        Else
            HtmlText = element.getElementsByClassName("product-card")(0).getElementsByClassName("listing-fpa-link")(0).href
            wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = HtmlText
        End If

The HTML for one of the listings:

<article data-standout-type="featured-listing" class="product-card js-standout-listing">
  <div class="product-card__inner">
    <section class="product-card-image ">
      <div class="listing-image-count">
        <i data-label="search appearance click " class="listing-image-icon">
         <svg>
         <use xlink:href="/templates/_generated/svg_icons/search-listings.svg#icon-camera"></use>
         </svg>
         </i> 12
         <!--replace with hasSpin-->
      </div>
      <img class="product-card-image__image product-card-image__main-image" alt="" srcset="
       https://m.atcdn.co.uk/a/media/w262h198pf7f7f5/db56983925c647b6bbcaa526f9248a99.jpg 262w,
       https://m.atcdn.co.uk/a/media/w340h255pf7f7f5/db56983925c647b6bbcaa526f9248a99.jpg
       " sizes=" (max-width: 1024px) 262px, 340px
       " src="https://m.atcdn.co.uk/a/media/w340h255pf7f7f5/db56983925c647b6bbcaa526f9248a99.jpg" loading="lazy" data-label="search appearance click ">
      <div class="product-card-image__thumbnails">
        <img class="product-card-image__image" loading="lazy" data-label="search appearance click " srcset="
       https://m.atcdn.co.uk/a/media/w84h63pf7f7f5/e301a4730db94209afbd2f44d4cfb314.jpg 84w,
       https://m.atcdn.co.uk/a/media/w108h81pf7f7f5/e301a4730db94209afbd2f44d4cfb314.jpg
       " sizes="(max-width: 1024px) 84px,108px
         " src="https://m.atcdn.co.uk/a/media/w108h81pdfdfdf/e301a4730db94209afbd2f44d4cfb314.jpg" alt="">
        <img class="product-card-image__image" loading="lazy" data-label="search appearance click " srcset="
      https://m.atcdn.co.uk/a/media/w84h63pf7f7f5/28f4022b7f20457b9fc76a392d11191d.jpg 84w,
      https://m.atcdn.co.uk/a/media/w108h81pf7f7f5/28f4022b7f20457b9fc76a392d11191d.jpg " sizes="
      (max-width: 1024px) 84px, 108px
    " src="https://m.atcdn.co.uk/a/media/w108h81pdfdfdf/28f4022b7f20457b9fc76a392d11191d.jpg" alt="">
     <img class="product-card-image__image" loading="lazy" data-label="search appearance click " srcset="
      https://m.atcdn.co.uk/a/media/w84h63pf7f7f5/fa16468ed0e54f6fb0dbd6661d84016e.jpg 84w,
     https://m.atcdn.co.uk/a/media/w108h81pf7f7f5/fa16468ed0e54f6fb0dbd6661d84016e.jpg " sizes="
     (max-width: 1024px) 84px, 108px" 
      src="https://m.atcdn.co.uk/a/media/w108h81pdfdfdf/fa16468ed0e54f6fb0dbd6661d84016e.jpg" alt="">
      </div>
    </section>
    <div class="product-card-content">
      <div class="product-card-content__car-info">
        <ol class="badge-group">
          <li class="badge-group__item" data-category="priceIndicator-GOOD">
            Good price
          </li>
          <li class="badge-group__item">
            No admin fees
          </li>
          <li class="badge-group__item">
            Finance available
          </li>
        </ol>
        <section class="product-card-pricing">
          <div class="product-card-pricing__content">
            <div class="product-card-pricing__price">
              <span>£6,607</span>
            </div>
          </div>
        </section>
        <section class="product-card-details">
          <h3 class="product-card-details__title">
            Renault Clio
          </h3>
          <p class="product-card-details__subtitle">
            0.9 TCe Play (s/s) 5dr
          </p>
          <p class="product-card-details__attention-grabber">
            £550 OF EXTRAS • BLUETOOTH
          </p>
          <ul class="listing-key-specs">
            <li class="atc-type-picanto--medium">2018 (68 reg)</li>
            <li class="atc-type-picanto--medium">Hatchback</li>
            <li class="atc-type-picanto--medium">61,671 miles</li>
            <li class="atc-type-picanto--medium">0.9L</li>
            <li class="atc-type-picanto--medium">76PS</li>
            <li class="atc-type-picanto--medium">Manual</li>
            <li class="atc-type-picanto--medium">Petrol</li>
            <li class="atc-type-picanto--medium">1 owner</li>
            <li class="atc-type-picanto--medium">ULEZ</li>
          </ul>
        </section>
      </div>
      <div class="product-card-seller-info">
        <div class="product-card-seller-info__details">
          <div class="product-card-seller-info__name-container">
            <h3 class="product-card-seller-info__name atc-type-picanto">Carbase Bristol</h3>
            <a class="dealer-profile-link atc-type-picanto" href="/dealers/gloucestershire/bristol/carbase-bristol-2445554" rel="nofollow" target="_blank">See all 780 cars</a>
          </div>
          <ul class="product-card-seller-info__specs">
            <li class="product-card-seller-info__spec-item atc-type-picanto">
              <svg width="16" height="16" viewBox="0 0 16 16" fill="#242D3D" xmlns="http://www.w3.org/2000/svg">
                                                                                        <path fill-rule="evenodd" clip-rule="evenodd" d="M10.6689 8.907L9.93491 9.47L10.2069 10.355L11.1309 13.363L8.81291 11.582L7.99991 10.958L7.18791 11.582L4.86891 13.363L5.79291 10.355L6.06491 9.47L5.33091 8.907L2.80991 6.969H5.84891H6.83391L7.12291 6.027L7.99991 3.173L8.87691 6.027L9.16591 6.969H10.1509H13.1899L10.6689 8.907ZM15.4299 5.636H10.1509L8.54391 0.403C8.46191 0.135 8.23091 0 7.99991 0C7.76991 0 7.53891 0.135 7.45591 0.403L5.84891 5.636H0.569911C0.0249108 5.636 -0.208089 6.331 0.224911 6.664L4.51891 9.965L2.89391 15.255C2.77091 15.657 3.08791 16 3.43991 16C3.55591 16 3.67491 15.963 3.78391 15.88L7.99991 12.639L12.2159 15.88C12.3249 15.963 12.4439 16 12.5599 16C12.9119 16 13.2299 15.657 13.1069 15.255L11.4809 9.965L15.7749 6.664C16.2079 6.331 15.9749 5.636 15.4299 5.636Z"></path></svg>
             <span class="product-card-seller-info__spec-item-copy">4.5</span> (
              <a class="product-card-seller-info__review-count dealer-profile-link" href="/dealers/gloucestershire/bristol/carbase-bristol-2445554#reviews-container">5409 reviews</a>)
            </li>
            <li class="product-card-seller-info__spec-item atc-type-picanto">
              <svg width="16" height="16" viewBox="0 0 16 16" fill="#242D3D" xmlns="http://www.w3.org/2000/svg">
            <path fill-rule="evenodd" clip-rule="evenodd" d="M8 4.0001C6.896 4.0001 6 4.8961 6 6.0001C6 7.1041 6.896 8.0001 8 8.0001C9.104 8.0001 10 7.1041 10 6.0001C10 4.8961 9.104 4.0001 8 4.0001ZM7.999 14.13C6.174 12.23 3.333 8.699 3.333 6C3.333 3.427 5.427 1.333 8 1.333C10.573 1.333 12.667 3.427 12.667 6C12.667 8.692 9.824 12.228 7.999 14.13ZM8 0C4.686 0 2 2.687 2 6C2 10.5 8 16 8 16C8 16 14 10.5 14 6C14 2.687 11.313 0 8 0Z"></path></svg>
              <span class="product-card-seller-info__spec-item-copy">bristol</span> (77 miles)
            </li>
          </ul>
        </div>
        <div class="product-card-seller-info__logo">
          <img src="https://dealerlogo.atcdn.co.uk/at2/adbranding/2445554/images/searchlogo.gif" alt="Advertiser Logo Carbase Bristol">
        </div>
      </div>
    </div>
  </div>
  <a class="js-click-handler listing-fpa-link tracking-standard-link" data-results-nav-fpa="" data-label="search appearance click " rel="nofollow" href="/car-details/202103170253503?include-delivery-option=on&amp;postcode=b94ta&amp;radius=1500&amp;sort=relevance&amp;onesearchad=New&amp;onesearchad=Nearly%20New&amp;onesearchad=Used&amp;advertising-location=at_cars&amp;page=1"></a>
</article>

Thanks in advance, as always.

1 Answer

Answer 0 (score: 0)

OK, I've solved it. I changed the parent class to:

    Set elements = HTML.getElementsByClassName("search-page__result")

and then changed my code to:

        Dim link As Object, nextRow As Long
        ' The anchor carries all three classes; a space-separated list matches elements that have every one of them
        Set link = element.getElementsByClassName("js-click-handler listing-fpa-link tracking-standard-link")(0)
        ' Next empty row in column A (assumes wsSheet and sht were meant to be the same worksheet)
        nextRow = wsSheet.Cells(wsSheet.Rows.Count, "A").End(xlUp).Row + 1
        If link Is Nothing Then
            wsSheet.Cells(nextRow, "A").Value = "-"   ' no link in this result
        Else
            wsSheet.Cells(nextRow, "A").Value = link.href
        End If
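Why the class string matters less than the starting element: per the DOM specification, getElementsByClassName takes a space-separated list of class names and returns descendants that carry all of them, in any order, with extra spaces (like the trailing one in "product-card ") ignored. It searches only descendants, never the element it is called on. So if element in the question was already the <article class="product-card ..."> card, element.getElementsByClassName("product-card")(0) would be Nothing, and calling .getElementsByClassName on Nothing raises run-time error 91 before the Is Nothing test is ever reached. The function below is only an illustration of the matching rule in plain VBA, assuming class names are separated by spaces; HasAllClasses is a hypothetical name and this is not how MSHTML implements the lookup.

```vba
Option Explicit

' Illustration only (hypothetical helper): mimics the DOM rule that
' getElementsByClassName("a b c") matches an element whose class attribute
' contains every listed class, in any order. Not MSHTML's own code.
Public Function HasAllClasses(ByVal classAttr As String, ByVal required As String) As Boolean
    Dim wanted() As String, have() As String
    Dim i As Long, j As Long, found As Boolean

    ' Split both lists on spaces; empty pieces from extra spaces are skipped below
    wanted = Split(Trim$(required), " ")
    have = Split(Trim$(classAttr), " ")

    For i = LBound(wanted) To UBound(wanted)
        If Len(wanted(i)) > 0 Then
            found = False
            ' Class names are compared case-sensitively (Option Compare Binary)
            For j = LBound(have) To UBound(have)
                If have(j) = wanted(i) Then found = True: Exit For
            Next j
            If Not found Then Exit Function   ' a required class is missing: returns False
        End If
    Next i
    HasAllClasses = True
End Function
```

By this rule, "listing-fpa-link" on its own matches the same anchor as the three-class string, so the change that presumably made the difference was starting from a container (search-page__result) that has the anchor as a descendant.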

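The href in the listing HTML is relative (/car-details/...). Depending on how the page was loaded (for example, HTML taken from an XMLHTTP response and assigned to an HTMLDocument rather than a page open in Internet Explorer), reading .href can return it as about:/car-details/... instead of a full URL. Below is a minimal sketch of a hypothetical helper that normalizes the value; AbsoluteListingUrl and its baseUrl parameter are names introduced here, and the base URL is an assumption you must supply, since the post does not show the site's address.

```vba
Option Explicit

' Hypothetical helper: turns the value read from an anchor's href into an
' absolute URL. baseUrl is supplied by the caller (an assumption; the post
' does not name the site).
Public Function AbsoluteListingUrl(ByVal rawHref As String, ByVal baseUrl As String) As String
    Dim urlPath As String
    urlPath = rawHref

    ' Documents built from a string can report relative links as "about:/..."
    If LCase$(Left$(urlPath, 6)) = "about:" Then urlPath = Mid$(urlPath, 7)

    ' Already absolute: return it unchanged
    If LCase$(Left$(urlPath, 7)) = "http://" Or LCase$(Left$(urlPath, 8)) = "https://" Then
        AbsoluteListingUrl = urlPath
        Exit Function
    End If

    ' Join base and root-relative path without doubling or dropping the slash
    If Right$(baseUrl, 1) = "/" Then baseUrl = Left$(baseUrl, Len(baseUrl) - 1)
    If Left$(urlPath, 1) <> "/" Then urlPath = "/" & urlPath
    AbsoluteListingUrl = baseUrl & urlPath
End Function
```

For example, Debug.Print AbsoluteListingUrl("about:/car-details/202103170253503", "https://www.example.com") prints https://www.example.com/car-details/202103170253503, where example.com stands in for the real site.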