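I am trying to scrape a web page that uses pagination.
Without the pagination block the code works fine, but as soon as I add pagination it gets stuck in an infinite loop.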
// The function accepts a single argument: the "context" object.
// For a complete list of its properties and functions,
// see https://apify.com/apify/web-scraper#page-function
async function pageFunction(context) {
    const {
        log,
        jQuery: $,
        waitFor
    } = context;

    // Pagination:
    let timeoutMillis; // undefined, so the first wait uses the default timeout
    const buttonSelector = 'div.moreblock a';
    while (true) {
        log.info('Waiting for the "Show more" button.');
        try {
            await waitFor(buttonSelector, {
                timeoutMillis
            }); // Default timeout first time.
            timeoutMillis = 3000; // 3 sec timeout after the first.
        } catch (err) {
            // Ignore the timeout error.
            log.info('Could not find the "Show more button", we\'ve reached the end.');
            break;
        }
        log.info('Clicking the "Show more" button.');
        $(buttonSelector).click();
    }

    // Export results:
    const result = [];
    $(".box").each(function() {
        result.push({
            pname: $(this).find('.top .fb .browsinglink').text(),
            pdesc: $(this).find('.Description').text(),
            pprice: $(this).find('.c2').text(),
        });
    });
    return result;
}
Log:
2020-06-04T20:23:02.535Z INFO Waiting for the "Show more" button.
2020-06-04T20:23:02.585Z INFO Clicking the "Show more" button.
2020-06-04T20:23:02.586Z INFO Waiting for the "Show more" button.
2020-06-04T20:23:02.637Z INFO Clicking the "Show more" button.
2020-06-04T20:23:02.637Z INFO Waiting for the "Show more" button.
2020-06-04T20:23:02.688Z INFO Clicking the "Show more" button.
2020-06-04T20:23:02.689Z INFO Waiting for the "Show more" button.
2020-06-04T20:23:02.743Z INFO Clicking the "Show more" button
It should work like this:
- get results with $(".box").each(function() { ... })
- click the "Show more" button
- get results with $(".box").each(function() { ... })
- click the "Show more" button
...
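The intended flow above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the Apify API: `clickUntilExhausted` and its `countItems`, `clickMore`, `waitForMore`, and `maxClicks` parameters are hypothetical names introduced for illustration. The idea is to stop when a click no longer adds items, instead of stopping when the button disappears, and to cap the number of clicks so the loop cannot run forever.

```javascript
// Hypothetical helper (not part of Apify): the three callbacks are injected,
// so the loop logic does not depend on any particular scraper API.
//   countItems():        resolves to the number of items currently on the page
//   clickMore():         clicks the button; resolves to false if there is none
//   waitForMore(before): resolves to true once the item count exceeds `before`,
//                        or false if it gives up (for example after a timeout)
async function clickUntilExhausted({ countItems, clickMore, waitForMore, maxClicks = 50 }) {
    let clicks = 0;
    while (clicks < maxClicks) {
        const before = await countItems();

        const clicked = await clickMore();
        if (!clicked) break; // no button left, so we have reached the end

        clicks += 1;

        const grew = await waitForMore(before);
        if (!grew) break; // the click loaded nothing new, so stop
    }
    return clicks; // number of clicks actually performed
}
```

Inside `pageFunction`, `countItems` might be backed by `$('.box').length` and `clickMore` by a check on `$(buttonSelector).length` followed by `.click()`. `waitForMore` would need some way to wait until the `.box` count grows; whether `waitFor` accepts a predicate function for this should be checked against the Web Scraper documentation. Results would then be collected once, after the loop finishes.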
The example page I am trying this on:
https://www.alza.sk/search.htm?exps=romet
It has both traditional pagination and a "Show more" button.
<div class="cpager bottom" id="pagerbottom">
    <a class="pgn sel" href="search.htm?exps=romet" id="pgb1">
        <span>1</span>
    </a>
    <a class="pgn" href="search-p2.htm?exps=romet" id="pgb2">
        <span>2</span>
    </a>
    <a class="pgn" href="search-p3.htm?exps=romet" id="pgb3">
        <span>3</span>
    </a>
    <span>...</span>
    <a class="next fa fa-chevron-right" href="search-p2.htm?exps=romet" id="pgby2">
        <span></span>
    </a>
</div>
and:
<div class="moreblock" id="loadmoreInner">
    <a class="js-button-more button-more btnx normal" href="0-p2.htm">
        <span class="">24 ďalších...</span>
    </a>
    <a class="goToTop" href="javascript:void(0);">Hore</a>
</div>
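Both snippets use relative `href` values (`search-p2.htm?exps=romet`, `0-p2.htm`), and the second block also contains a `javascript:void(0);` link. As a small illustration of how such values resolve against the page URL, here is a sketch using the standard `URL` constructor. The helper name `resolvePageLink` is hypothetical and introduced only for this example; whether following the pager links is a better approach than clicking "Show more" on this site is an open question here, not an established fact.

```javascript
// Hypothetical helper: resolves an href taken from the page against the URL of
// the page itself. Returns null for missing hrefs and for non-HTTP links such
// as "javascript:void(0);".
function resolvePageLink(pageUrl, href) {
    if (!href) return null;
    const resolved = new URL(href, pageUrl); // standard WHATWG URL resolution
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') {
        return null;
    }
    return resolved.href;
}

const pageUrl = 'https://www.alza.sk/search.htm?exps=romet';
console.log(resolvePageLink(pageUrl, 'search-p2.htm?exps=romet'));
// prints "https://www.alza.sk/search-p2.htm?exps=romet"
```

One thing this makes visible: the selector `div.moreblock a` matches both links in the second block, including the `goToTop` one, so a narrower selector such as `div.moreblock a.js-button-more` may be worth trying.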