如何延迟fetch()直到网站加载动态内容

时间:2018-04-02 02:22:33

标签: javascript google-chrome-extension web-scraping fetch-api dynamic-content

我有一个chrome扩展程序。每当用户点击扩展程序的按钮时,它将下载以下URL的来源:“smmry.com/(用户当前活动标签的网址)”

我正在使用以下javascript代码以html文件的形式下载URL的来源。当用户单击我的扩展程序按钮时,此代码当前运行(变量URL是我的扩展程序可以下载的假设URL。在这种情况下,用户实际上将浏览cnn.com/(path_to_news_article),但扩展名将被下载:smmry.com/ {{}}}):

let URL = 'https://smmry.com/https://www.cnn.com/2018/04/01/politics/ronald-kessler-jake-tapper-interview/index.html#&SM_LENGTH=7'
    fetch(URL)
        .then((resp) => resp.text())
        .then(responseText => {
           download("website_source.html", responseText)
        })

function download(filename, text) {

    var element = document.createElement('a');
    element.setAttribute('href', 'data:text/plain;charset=utf-8,' + encodeURIComponent(text));
    element.setAttribute('download', filename);

    element.style.display = 'none';
    document.body.appendChild(element);

    element.click();

    document.body.removeChild(element);
}

以下是网页的来源:https://www.cnn.com/(path_to_news_article)

但是,正如您可以看到的那样,如果您访问该网页,有时网页会花费很少的时间(最多几秒钟)来总结文章。这篇文章不太引人注意 - 但通常粉红色的加载栏会在粉红色的框中上下移动,直到摘要被创建并显示在网站上。

我相信我的代码在完成文章总结之前正在下载网站的源代码,因此我的程序下载的HTML文件不包含文章的摘要。

如果fetch()网站https://smmry.com完成文章https://www.cnn.com/2018/04/01/politics/ronald-kessler-jake-tapper-interview/index.html的总结,我怎样才能确保manifest.json请求仅下载网站内容。

修改:我的{ "manifest_version": 2, "name": "Summarizer", "version": "1.0", "description": "Summarizes webpages", "permissions": [ "tabs", "downloads", "*://*.smmry.com/*" ], "icons": { "48": "icons/border-48.png" }, "browser_action": { "browser_style": true, "default_popup": "popup/choose_page.html", "default_icon": { "16": "icons/summarizer-icon-16.png", "32": "icons/summarizer-icon-32.png" } } } 文件。

formGroups

2 个答案:

答案 0 :(得分:0)

我认为您正在寻找document.onload

也许你需要像这样做一些事情:

document.onload = () => { 
    let URL = 'https://smmry.com/https://www.cnn.com/2018/04/01/politics/ronald-kessler-jake-tapper-interview/index.html#&SM_LENGTH=7'

    fetch(URL)
    .then((resp) => resp.text())
    .then(responseText => {
       download("website_source.html", responseText)
    })

    const download = (filename, text) => {

    const element = document.createElement('a');
    element.setAttribute('href', 'data:text/plain;charset=utf-8,' + encodeURIComponent(text));
    element.setAttribute('download', filename);

    element.style.display = 'none';
    document.body.appendChild(element);

    element.click();

     document.body.removeChild(element);
    }
};

onload将等待页面,然后您可以进行提取

答案 1 :(得分:0)

使用addEventListener并清理代码:

function main(){
  const URL = 'https://smmry.com/https://www.cnn.com/2018/04/01/politics/ronald-kessler-jake-tapper-interview/index.html#&SM_LENGTH=7'
  fetch(URL)
    .then(resp => resp.text())
    .then(responseText => download("website_source.html", responseText));

  function download(filename, text) {
    const element = document.createElement('a');
    element.href = 'data:text/plain;charset=utf-8,' + encodeURIComponent(text);
    element.setAttribute('download', filename);
    element.style.display = 'none';
    document.body.appendChild(element);
    element.click();
    element.remove();
  }
}
document.addEventListener('DOMContentLoaded', main);