Apify pagination gets stuck in an endless loop

Asked: 2020-06-04 20:29:07

Tags: apify

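I am trying to scrape a web page that uses pagination.

Without the pagination block the page function works fine, but as soon as I add the pagination code it gets stuck in an infinite loop.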

// The function accepts a single argument: the "context" object.
// For a complete list of its properties and functions,
// see https://apify.com/apify/web-scraper#page-function 
async function pageFunction(context) {
    const {
        log,
        jQuery: $,
        waitFor
    } = context;

    //Pagination:
    let timeoutMillis; // undefined
    const buttonSelector = 'div.moreblock a';
    while (true) {
        log.info('Waiting for the "Show more" button.');
        try {
            await waitFor(buttonSelector, {
                timeoutMillis
            }); // Default timeout first time.
            timeoutMillis = 3000; // 3 sec timeout after the first.
        } catch (err) {
            // Ignore the timeout error.
            log.info('Could not find the "Show more button", we\'ve reached the end.');
            break;
        }
        log.info('Clicking the "Show more" button.');
        $(buttonSelector).click();
    }

    //Export results:
    var result = [];
    $(".box").each(function() {
        result.push({
            pname: $(this).find('.top .fb .browsinglink').text(),
            pdesc: $(this).find('.Description').text(),
            pprice: $(this).find('.c2').text(),
        });
    });
    return result;
}

Log:

2020-06-04T20:23:02.535Z INFO  Waiting for the "Show more" button.
2020-06-04T20:23:02.585Z INFO  Clicking the "Show more" button.
2020-06-04T20:23:02.586Z INFO  Waiting for the "Show more" button.
2020-06-04T20:23:02.637Z INFO  Clicking the "Show more" button.
2020-06-04T20:23:02.637Z INFO  Waiting for the "Show more" button.
2020-06-04T20:23:02.688Z INFO  Clicking the "Show more" button.
2020-06-04T20:23:02.689Z INFO  Waiting for the "Show more" button.
2020-06-04T20:23:02.743Z INFO  Clicking the "Show more" button

It should work like this:

- get the results with $(".box").each(function() ...
- click the "Show more" button
- get the results with $(".box").each(function() ...
- click the "Show more" button
...
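
The flow described above could be sketched as a bounded loop that stops as soon as a click adds no new items. This is only a sketch under assumptions: `loadAllItems`, `countItems`, `clickMore`, `sleep` and `maxClicks` are hypothetical names that do not come from the Apify API, and the 2 second pause is a guess at how long the page needs to append new items.

```javascript
// Hypothetical helper: keeps clicking "Show more" until the button is gone,
// a click adds no new items, or maxClicks is reached.
async function loadAllItems({ countItems, clickMore, sleep, maxClicks = 50 }) {
    let clicks = 0;
    let previous = countItems();
    while (clicks < maxClicks) {
        // clickMore() returns false when there is no button left to click.
        if (!clickMore()) break;
        clicks++;
        // Give the page some time to append the new items (assumed delay).
        await sleep(2000);
        const current = countItems();
        // Nothing new appeared, so treat this as the end of the list.
        if (current <= previous) break;
        previous = current;
    }
    return { clicks, items: previous };
}

// Possible wiring inside pageFunction (assumes context.jQuery as $):
// await loadAllItems({
//     countItems: () => $('.box').length,
//     clickMore: () => {
//         const $btn = $('div.moreblock a.js-button-more');
//         if (!$btn.length) return false;
//         $btn.click();
//         return true;
//     },
//     sleep: (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
// });
```

Because the items accumulate on the same page, the `.box` elements would only need to be extracted once, after the loop has finished.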

The example page I am trying this on:

https://www.alza.sk/search.htm?exps=romet

It has both traditional pagination and a "Show more" button.

<div class="cpager bottom" id="pagerbottom">
    <a class="pgn sel" href="search.htm?exps=romet" id="pgb1">
        <span>
            1
        </span>
    </a>
    <a class="pgn" href="search-p2.htm?exps=romet" id="pgb2">
        <span>
            2
        </span>
    </a>
    <a class="pgn" href="search-p3.htm?exps=romet" id="pgb3">
        <span>
            3
        </span>
    </a>
    <span>
        ...
    </span>
    <a class="next fa fa-chevron-right" href="search-p2.htm?exps=romet" id="pgby2">
        <span>
        </span>
    </a>
</div>
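
Since the pager above exposes a plain "next" link, one alternative to clicking would be to follow that link as a separate request. The sketch below only resolves the relative `href` against the current page URL; `nextPageUrl` is a hypothetical helper name, and the commented wiring assumes that the Web Scraper context provides `request.url` and `enqueueRequest`.

```javascript
// Hypothetical helper: resolves the relative href of the "next" link
// against the URL of the page that is currently being scraped.
function nextPageUrl(nextHref, currentUrl) {
    if (!nextHref) return null; // no "next" link, so this is the last page
    return new URL(nextHref, currentUrl).href;
}

// Possible wiring inside pageFunction (assumed context properties):
// const href = $('#pagerbottom a.next').attr('href');
// const url = nextPageUrl(href, context.request.url);
// if (url) await context.enqueueRequest({ url });
```

With this approach each page is scraped on its own, so the page function would return only the `.box` items of the current page.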

And the "Show more" block:

<div class="moreblock" id="loadmoreInner">
    <a class="js-button-more button-more btnx normal" href="0-p2.htm">
        <span class="">
            24 ďalších...
        </span>
    </a>
    <a class="goToTop" href="javascript:void(0);">
        Hore
    </a>
</div>
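
One thing visible in this markup: the selector `div.moreblock a` matches both links in the block, including `a.goToTop`. If that link stays on the page, `waitFor` would always find a match and the loop would never hit the timeout, which would be consistent with the roughly 50 ms gaps in the log. A minimal sketch with a narrower selector follows; `clickShowMoreUntilGone` and `maxClicks` are hypothetical names, and it is an assumption that the site removes the `a.js-button-more` link after the last batch.

```javascript
// Narrower selector: only the "Show more" link, not the "goToTop" link.
const MORE_BUTTON = 'div.moreblock a.js-button-more';

// Hypothetical helper: clicks the button until waitFor times out.
// maxClicks is a safety cap in case the button is hidden instead of removed.
async function clickShowMoreUntilGone(context, maxClicks = 100) {
    const { log, jQuery: $, waitFor } = context;
    let timeoutMillis; // undefined, so the first wait uses the default timeout
    let clicks = 0;
    while (clicks < maxClicks) {
        try {
            await waitFor(MORE_BUTTON, { timeoutMillis });
            timeoutMillis = 3000; // shorter timeout after the first wait
        } catch (err) {
            log.info('"Show more" button not found, stopping.');
            break;
        }
        $(MORE_BUTTON).click();
        clicks++;
    }
    return clicks;
}
```

The cap does not fix a button that never disappears, but it keeps the run from looping forever while the cause is being investigated.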

0 answers:

No answers yet.