Timing issue when fetching data with Puppeteer

Date: 2019-07-01 17:07:38

Tags: javascript web-scraping puppeteer

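Question

Hello developers,

I have been using Puppeteer to scrape a specific page, in particular its video section. My problem is that getting the video's src takes more than 10 seconds.

Is there a way to reduce the waiting time?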

[Screenshot: waiting TTFB]

Code

As you can see, I tried to make the request faster by not loading fonts, stylesheets and images.

But the wait time is still more than 10 seconds.
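The blocking rule described above reduces to a check on each request's resource type. Below is a minimal sketch that factors it into a hypothetical `shouldBlock` helper (not part of Puppeteer) so the rule can be read and tested on its own; the strings are the values Puppeteer's `request.resourceType()` returns.

```typescript
// Resource types the question's handler aborts. Any type not listed here
// (document, script, xhr, fetch, media, ...) is allowed to continue.
const BLOCKED_RESOURCE_TYPES: ReadonlySet<string> = new Set([
  "stylesheet",
  "font",
  "image",
]);

// Hypothetical helper: true when a request of this type should be aborted.
function shouldBlock(resourceType: string): boolean {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Possible use inside a Puppeteer handler (request interception enabled):
// page.on("request", (req) =>
//   shouldBlock(req.resourceType()) ? req.abort() : req.continue());
```

Everything outside the set, including `script`, `xhr` and `media`, still goes through. Since the player may depend on scripts to create the video element, those types are probably not safe to block.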

import puppeteer from 'puppeteer';

// `url` is not shown here; it is assumed to be the site's base URL, defined elsewhere
const getAnimeVideo = async (id: string, chapter: number) => {
  const BASE_URL = `${url}${id}/${chapter}/`;
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36');
  await page.setRequestInterception(true);

  // Abort stylesheet, font and image requests; let every other request through
  page.on('request', (req) => {
    const type = req.resourceType();
    if (type === 'stylesheet' || type === 'font' || type === 'image') {
      req.abort();
    } else {
      req.continue();
    }
  });

  await page.goto(BASE_URL);
  // Pauses for a fixed 10 seconds before the selector wait below even starts
  await page.waitFor(10000);
  // The player lives in an iframe; collect the src of every <source> in its <video>
  const elementHandle = await page.waitForSelector('iframe.player_conte');
  const frame = await elementHandle.contentFrame();
  const video = await frame.$eval('#jkvideo_html5_api', el =>
    Array.from(el.getElementsByTagName('source')).map(e => e.getAttribute('src')));
  await page.close();
  await browser.close();
  return video;
}
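Note that `await page.waitFor(10000)` sleeps for a fixed 10,000 ms on every call, so `getAnimeVideo` cannot return in under 10 seconds however quickly the page loads; `page.waitForSelector` already polls until the iframe appears (up to its own timeout). The sketch below uses a hypothetical `waitUntil` helper (not a Puppeteer API) to illustrate the difference: it resolves as soon as a condition holds instead of after a fixed delay. The "element appears after ~200 ms" part is simulated with a timer, not a real page.

```typescript
// Hypothetical helper: polls `condition` every `intervalMs` and resolves with
// the elapsed milliseconds as soon as it returns true; rejects after `timeoutMs`.
async function waitUntil(
  condition: () => boolean,
  timeoutMs: number,
  intervalMs = 50,
): Promise<number> {
  const start = Date.now();
  while (!condition()) {
    if (Date.now() - start >= timeoutMs) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return Date.now() - start;
}

// Simulation: the "element" becomes available after about 200 ms.
let ready = false;
setTimeout(() => {
  ready = true;
}, 200);

// Same 10 s budget as the fixed sleep, but it finishes as soon as `ready` flips.
waitUntil(() => ready, 10000).then((elapsed) => {
  console.log(`condition met after ~${elapsed} ms`);
});
```

A fixed sleep always costs its full duration, while a polling wait only costs as long as the condition takes to become true, with the timeout acting as an upper bound.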

1 answer

Answer (score: 0)

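A solution using cheerio, which fetches the player iframe's HTML with axios and parses it instead of driving a browser: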

import axios from 'axios';
import * as cheerio from 'cheerio';

async function getVideoURL(url: string): Promise<string | undefined> {
  // This requests the underlying iframe page
  const { data } = await axios.get<string>(url);
  const $ = cheerio.load(data);
  const video = $('video');
  if (video.length) {
    // Sometimes the video is embedded directly: read the first <source> src
    return video.find('source').attr('src');
  } else {
    // If the video is not embedded, obfuscated code creates the video element.
    // Here we run that code to get the underlying video URL.
    const scripts = $('script');
    // The obfuscated code uses a variable called l, which is the window / global object
    const l = global;
    // The obfuscated code uses a variable called ll, which is String
    const ll = String;
    const $script2 = $(scripts[1]).html();
    if (!$script2) return undefined;
    // Risky: eval runs the page's code with full access to this process, but the
    // code is so obfuscated that it is hard to tell how it decrypts the URL
    eval($script2);
    // The code above sets a variable called ss that holds the mp4 URL
    return (l as any).ss;
  }
}
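The answer's own comment flags `eval` as dangerous: a direct `eval` runs the page's script inside the scraper's scope. One possible alternative is Node's built-in `vm.runInNewContext`, which runs the script in a separate context whose globals come from a plain object we supply. This sketch assumes, as the answer does, that the script treats `l` as the global object and `ll` as `String` and leaves the URL in `ss`; the name `runPlayerScript` and the 1-second default timeout are hypothetical. Node's documentation states that `vm` is not a security mechanism, so this only keeps the script away from the scraper's own variables and should not be treated as a safe sandbox for hostile code.

```typescript
import * as vm from "vm";

// Hypothetical helper: runs the player script in a fresh V8 context instead of
// calling eval() in the scraper's scope, then reads the URL it left in `ss`.
// NOTE: vm is NOT a security sandbox; it only isolates globals.
function runPlayerScript(script: string, timeoutMs = 1000): string | undefined {
  // Assumption carried over from the answer: the script calls String `ll`...
  const sandbox: Record<string, unknown> = { ll: String };
  // ...and uses `l` as its global object, so point `l` at the context's globals.
  sandbox.l = sandbox;
  // `timeout` aborts scripts that loop forever.
  vm.runInNewContext(script, sandbox, { timeout: timeoutMs });
  // Both `l.ss = ...` and a top-level `var ss = ...` end up on the sandbox.
  const ss = sandbox.ss;
  return typeof ss === "string" ? ss : undefined;
}
```

In `getVideoURL`, the `eval` branch could then become something like `return runPlayerScript($script2);`.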
