Question

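My task is to use bots to inflate the online-user count on a website. Conditions: 1. The bot should open the site and stay on the page for as long as possible (without disconnecting). 2. The site may use WebSockets or long polling to check the connection, so the bot must support JavaScript.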

I have a solution using a headless browser (puppeteer) + node.js:

const puppeteer = require('puppeteer');

async function runBot(botsCount = 10, secondsToWait = 60, interval = 1000) {
    let time = secondsToWait * 1000;
    console.log(`Starting chrome...`);
    const browser = await puppeteer.launch({
        args: [
            '--disable-gpu',
            '--no-sandbox',
            '--headless',
            '--disable-web-security',
            '--disable-dev-profile',
            '--disable-dev-shm-usage',
        ]
    })
    for (let i = 1; i <= botsCount; i++) {
        const page = await browser.newPage();
        await page.goto('https://www.example.page/');
        console.log(`Page ${i} created`);
    }
    console.log(`Awaiting for finish...`);
    const savedInterval = setInterval(() => {
        process.stdout.write("\rTime Left:" + (time / 1000) + "       ");
        time -= interval;
        if (time <= 0) {
            clearInterval(savedInterval);
            browser.close();
            console.log(`\nFinished`);
        }
    }, interval);
}

runBot();

But this is not a very good solution, because each browser window uses 60 MB to 120 MB of RAM. That is very expensive...
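To put a number on that cost, here is a rough back-of-the-envelope sketch. The function name `estimateRamMb` is made up for illustration, and the defaults are simply the 60 MB to 120 MB per-window range quoted above; real usage depends on the page being loaded.

```javascript
// Rough RAM estimate (in MB) for a given number of open pages.
// The per-page defaults are the 60-120 MB range observed in the question.
function estimateRamMb(botsCount, minMbPerPage = 60, maxMbPerPage = 120) {
    return {
        min: botsCount * minMbPerPage,
        max: botsCount * maxMbPerPage,
    };
}

console.log(estimateRamMb(10));  // { min: 600, max: 1200 }
```

So even the default of 10 bots could need somewhere between 600 MB and 1.2 GB.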

Maybe someone has run into this and knows a more efficient way to do it?

Any help is appreciated.

Answer 1

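The setRequestInterception API will help reduce memory consumption. Depending on your use case, you may not need images, fonts, or stylesheets for a visit to count as an online user.

The detailed API documentation can be found here: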

https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagesetrequestinterceptionvalue

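I ran a benchmark on my server, using your code to visit Google. It consumed about 90 MB of RAM on average.

After I implemented request interception, as shown below, RAM usage dropped by 10 MB per thread.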

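// Enable interception so every request can be inspected before it is sent.
await page.setRequestInterception(true);
page.on('request', (request) => {
    // Abort resource types that are not needed to keep the visit alive.
    if (['image', 'stylesheet', 'font', 'script'].includes(request.resourceType())) {
        request.abort();
    } else {
        request.continue();
    }
});

await page.goto('https://www.google.com/');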
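One caveat: aborting `script` requests stops the page's external JavaScript from loading, and the question says the site checks the connection via WebSockets or long polling. A minimal sketch of a variant that keeps scripts is below. The names `BLOCKED_TYPES`, `shouldBlock`, `handleRequest`, and `blockHeavyResources` are hypothetical helpers, not part of Puppeteer, and the list of blocked types is an assumption that should be tuned for the target site.

```javascript
// Resource types assumed to be unnecessary for presence tracking.
// 'script' is intentionally NOT listed, so the page's JavaScript still runs.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font', 'media']);

// Returns true when a request of this resource type should be aborted.
function shouldBlock(resourceType) {
    return BLOCKED_TYPES.has(resourceType);
}

// Request handler: abort blocked types, let everything else through.
function handleRequest(request) {
    if (shouldBlock(request.resourceType())) {
        request.abort();
    } else {
        request.continue();
    }
}

// Hypothetical helper: enable interception on a Puppeteer page object.
async function blockHeavyResources(page) {
    await page.setRequestInterception(true);
    page.on('request', handleRequest);
}
```

In the original loop this would be called as `await blockHeavyResources(page);` right after `browser.newPage()` and before `page.goto(...)`. The memory saving will probably be smaller than with scripts blocked, so it is worth measuring again.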

Hope that helps.

Viewing a website from an Ubuntu server (keep connection / inflate online-user count)
