Manually changing the response URL during Puppeteer request interception

Date: 2018-07-30 14:58:37

Tags: javascript web-scraping puppeteer

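For a specific use case, I'm having trouble navigating relative URLs with Puppeteer. Below are the basic setup and a pseudo-example that illustrates the problem.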

Basically, I want to change the URL the browser thinks it is currently on.
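The mismatch comes from how relative URLs are resolved: a path such as `/demo.html` is resolved against the document's URL, and that URL stays `https://start.de` when the redirect is followed outside the browser. The WHATWG `URL` constructor (global in Node.js and in browsers) shows the difference; the two URLs are the ones from the pseudo-example below.

```javascript
// Resolve the same relative href against the URL the browser believes it is on
// versus the URL the content actually came from after the external redirect.
const href = '/demo.html';

const believedUrl = 'https://start.de/';   // what the page reports after request.respond()
const actualUrl = 'https://redirect.de/';  // where the external fetcher ended up

const resolvedByBrowser = new URL(href, believedUrl).href;
const intended = new URL(href, actualUrl).href;

console.log(resolvedByBrowser); // https://start.de/demo.html
console.log(intended);          // https://redirect.de/demo.html
```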

What I've already tried:

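  1. Manipulating the response body by resolving all relative URLs myself. This conflicts with some JavaScript-based links.
  2. If the request URL doesn't match the response URL, triggering a new page.goto(response.url) and returning the response of the previous request. There seems to be no way to pass custom options to page.goto, so I can't tell which incoming request is the fake page.goto.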
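For approach 2, Puppeteer indeed has no way to attach custom data to a `page.goto()` call, but the interception handler can keep its own record instead. A minimal sketch, assuming the already-fetched response is stored under its final URL right before calling `page.goto(finalUrl)`; `PendingNavigations` is a hypothetical helper, not part of Puppeteer:

```javascript
// Hypothetical bookkeeping for self-initiated navigations: before calling
// page.goto(finalUrl), store the already-fetched response under that URL.
// When the matching request reaches the interception handler, it is
// recognised as "ours" and answered from the cache instead of being refetched.
class PendingNavigations {
  constructor() {
    this.byUrl = new Map();
  }

  // Normalise so 'https://redirect.de' and 'https://redirect.de/' share one key.
  static key(url) {
    return new URL(url).href;
  }

  // Remember the response to replay for the next request to `url`.
  remember(url, response) {
    this.byUrl.set(PendingNavigations.key(url), response);
  }

  // Return the cached response (and forget it) if `url` belongs to a
  // navigation we triggered ourselves; otherwise return undefined.
  take(url) {
    const key = PendingNavigations.key(url);
    const response = this.byUrl.get(key);
    this.byUrl.delete(key);
    return response;
  }
}

// Usage with stand-in data (no browser needed):
const pending = new PendingNavigations();
pending.remember('https://redirect.de', { status: 200, body: '<a href="/demo.html">demo</a>' });

const replay = pending.take('https://redirect.de/');  // our own goto: cached response
const unrelated = pending.take('https://other.de/'); // not ours: undefined
const again = pending.take('https://redirect.de/');  // already consumed: undefined
```

Inside the `request` listener this would look like `const cached = pending.take(request.url()); if (cached) return request.respond(cached);`. Keying by URL is the weak point of this design: two concurrent requests for the same URL cannot be told apart.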

Can anyone help me? Thanks in advance.

Setup:

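import puppeteer from 'puppeteer'; // ES module, so top-level await is allowed

const browser = await puppeteer.launch({
    headless: false,
});

const [page] = await browser.pages();

await page.setRequestInterception(true);

// The listener must be async because it awaits the external fetch
page.on('request', async (request) => {
    const resourceType = request.resourceType();

    if (['document', 'xhr', 'script'].includes(resourceType)) {

        // fetching takes place on a different instance and handles redirects internally
        // (fetch here stands for that custom fetcher, not the global Fetch API)
        const response = await fetch(request);

        request.respond({
            status: response.statusCode, // respond() expects `status`, not `statusCode`
            body: response.body,
            url: response.url // no effect: respond() has no `url` option
        });
    } else {
        request.abort('aborted');
    }
});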
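Because `request.respond()` cannot change the page URL, one alternative is not to resolve redirects outside the browser at all. If the external fetcher can be told not to follow redirects (an assumption about that fetcher), the handler can pass the 3xx status and `Location` header through; Chromium is then expected to follow the redirect itself, so the page ends up on the final URL and the follow-up request passes through the interception handler again. Verify this behaviour against your Puppeteer version. A minimal sketch of the mapping; `toRespondOptions` and the shape of `fetched` (`status`, lower-cased `headers`, `body`) are hypothetical:

```javascript
// Build options for Puppeteer's request.respond() from a response fetched
// elsewhere. Assumption: the external fetcher did NOT follow redirects, so
// `fetched` may be a raw 3xx response carrying a Location header.
function toRespondOptions(fetched, requestUrl) {
  const headers = { ...fetched.headers };
  const isRedirect = fetched.status >= 300 && fetched.status < 400;

  if (isRedirect && headers.location) {
    // Make the target absolute so the browser gets an unambiguous redirect.
    headers.location = new URL(headers.location, requestUrl).href;
    return { status: fetched.status, headers, body: '' };
  }

  // respond() accepts status, headers, contentType and body; there is no `url` option.
  return { status: fetched.status, headers, body: fetched.body };
}

const redirect = toRespondOptions(
  { status: 302, headers: { location: '/landing' }, body: '' },
  'https://start.de/go'
);
const page200 = toRespondOptions(
  { status: 200, headers: { 'content-type': 'text/html' }, body: '<p>ok</p>' },
  'https://redirect.de/landing'
);
```

In the question's handler this would replace the inline object: `request.respond(toRespondOptions(response, request.url()))`.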

Navigation:

await page.goto('https://start.de');

// redirects to https://redirect.de
await page.click('a'); 

// relative href '/demo.html' resolves to https://start.de/demo.html instead of https://redirect.de/demo.html
await page.click('a'); 

Update 1

Solution: change the browser's current location via window.location.

await page.goto('https://start.de');

// redirects to https://redirect.de internally
await page.click('a'); 

// changing the current window location starts a real navigation; wait for it to finish
await Promise.all([
    page.waitForNavigation(),
    page.evaluate(() => {
        window.location.href = 'https://redirect.de';
    }),
]);

// correctly resolves to https://redirect.de/demo.html instead of https://start.de/demo.html
await page.click('a');
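Setting `window.location.href` starts a second, real navigation to `https://redirect.de`, which passes through the interception handler again. A different technique, not the one used in the update, is to give the document its base URL directly: relative URLs in a document resolve against the `href` of its first `<base>` element when one exists. Note that `location.href`, the origin, and cookies still belong to `https://start.de`. A minimal sketch that inserts such an element into the fetched HTML before it is passed to `request.respond()`; `injectBaseHref` is hypothetical and, being regex-based, assumes the page has a `<head>` tag and no `<base>` of its own:

```javascript
// Insert <base href="..."> right after the opening <head> tag so relative
// URLs resolve against the post-redirect URL instead of the requested one.
// Assumptions: the document has a <head> tag and no existing <base>.
function injectBaseHref(html, baseUrl) {
  // Escape characters that could break out of the attribute value.
  const safe = baseUrl.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  const baseTag = `<base href="${safe}">`;

  // Matches <head> or <head ...attrs>, but not <header>.
  const headOpen = /<head(\s[^>]*)?>/i;
  if (!headOpen.test(html)) {
    return html; // no <head>: leave the document untouched
  }
  // A replacer function avoids `$` sequences in the URL being interpreted.
  return html.replace(headOpen, (match) => match + baseTag);
}

const html = '<html><head><title>t</title></head><body><a href="/demo.html">demo</a></body></html>';
const patched = injectBaseHref(html, 'https://redirect.de/');
```

Inserting the element first in `<head>` matters, because the base URL applies to every URL-bearing element that follows it.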

1 answer:

Answer 0 (score: 0)

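When you match a request whose body you want to edit, take its URL and fetch it yourself with the "node-fetch" or "request" module. Once the body arrives, edit it and send it back as the response to the original request.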

For example:

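// request-promise wraps the (now deprecated) request module in promises and
// supports resolveWithFullResponse; plain "request" only takes callbacks.
// Assumes page.setRequestInterception(true) has already been called.
const requestModule = require("request-promise");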
const cheerio = require("cheerio");

page.on("request", async (request) => {
  // Match the url that you want
  const isMatched = /page-12/.test(request.url());

  if (isMatched) {
    // Make a new call
    requestModule({
      url: request.url(),
      resolveWithFullResponse: true,
    })
      .then((response) => {
        const { body, headers, statusCode } = response;
        const contentType = headers["content-type"];

        // Edit body using cheerio module
        const $ = cheerio.load(body);
        $("a").each(function () {
          $(this).attr("href", "/fake_pathname");
        });

        // Send response (respond() accepts status, headers, contentType and body)
        request.respond({
          status: statusCode,
          contentType,
          body: $.html(),
        });
      })
      .catch(() => request.continue());
  } else request.continue();
});
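The answer's cheerio step replaces every `href` with a fixed path. For the question's redirect problem, the same hook could instead rewrite relative `href` values to absolute URLs resolved against the final URL, which is the question's approach 1 and, as the question notes, does not cover links built by JavaScript at runtime. A dependency-free sketch; `absolutizeHrefs` is hypothetical and its regex only handles double-quoted `href` attributes, so a real HTML parser such as cheerio is more robust:

```javascript
// Rewrite relative href values (double-quoted only) to absolute URLs
// resolved against the post-redirect URL.
function absolutizeHrefs(html, finalUrl) {
  // The leading \s keeps attributes such as data-href from matching.
  return html.replace(/(\shref=")([^"]*)"/gi, (match, prefix, value) => {
    // Leave in-page anchors alone; already-absolute URLs (https:, mailto:,
    // javascript:) come back from the URL resolver unchanged.
    if (value.startsWith('#')) return match;
    try {
      return `${prefix}${new URL(value, finalUrl).href}"`;
    } catch {
      return match; // unparsable value: keep as-is
    }
  });
}

const input = '<a href="/demo.html">d</a> <a href="#top">t</a> <a href="https://x.de/">x</a>';
const output = absolutizeHrefs(input, 'https://redirect.de/start');
```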