HtmlAgilityPack Load() method does nothing

Time: 2016-02-17 10:26:58

Tags: c# winforms html-agility-pack

I am using HtmlAgilityPack to fetch the content of a website:

private String getImageUrl(String websiteUrl)
{
    HtmlAgilityPack.HtmlDocument docHtml = new HtmlWeb().Load(websiteUrl);
    // ...
}

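The breakpoint on the second line is never hit, and the first line does not throw an exception. The URL exists and is reachable.

docHtml is not null either; the line does not seem to execute at all, it just kills my thread.

What could cause this, and how can I get more information about what is happening?

Edit: The function is called from a class that my main form instantiates. That class runs the call on a worker thread. It works for the first instance of the class, but not for the second.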

class Image
{
    BackgroundWorker downloadWorker = new BackgroundWorker();

    private String fileName;
    private String directory;
    private String url;

    RichTextBox rtxtStatus;


    public Image(String _fileName, String _directory, String _url)
    {
        fileName = _fileName;
        directory = _directory;
        url = _url;

        downloadWorker.WorkerReportsProgress = true;

        downloadWorker.WorkerSupportsCancellation = true;

        downloadWorker.DoWork += new DoWorkEventHandler(worker_doWork);
        downloadWorker.ProgressChanged += new ProgressChangedEventHandler(worker_progressChanged);
        downloadWorker.RunWorkerCompleted += new RunWorkerCompletedEventHandler(worker_runWorkerCompleted);
    }

    private void worker_doWork(object sender, DoWorkEventArgs e)
    {
        download();
    }

    private void download()
    {
        WebClient downloadClient = new WebClient();

        if (!Directory.Exists(directory))
        {
            MessageBox.Show("Directory to save image not found.");
        }
        else
        {
            HttpWebRequest HttpReq = (HttpWebRequest)WebRequest.Create(url);

            HttpWebResponse response;
            try
            {
                response = (HttpWebResponse)HttpReq.GetResponse();
            }
            catch (WebException ex)
            {
                response = (HttpWebResponse)ex.Response;
            }

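            // ex.Response is null when no HTTP response arrived at all
            if (response != null && response.StatusCode == HttpStatusCode.OK)
            {
                string image = getImageUrl(url);

                // Decode HTML entities (e.g. &amp;) in the extracted URL
                image = WebUtility.HtmlDecode(image);
                string saveName = Path.Combine(directory, fileName + ".png");

                // image and saveName are only in scope inside this block,
                // so the download has to happen here
                try
                {
                    downloadClient.DownloadFile(image, saveName);
                }
                catch (Exception)
                {
                    MessageBox.Show("Error while downloading");
                }
            }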
        }
    }

    private void worker_progressChanged(object sender, ProgressChangedEventArgs e)
    {
        // Nothing to do
    }

    private void worker_runWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
    {
        // e.Error holds any exception thrown inside worker_doWork
        if (e.Error != null)
        {
            MessageBox.Show("Download cancelled, please check URL.");
        }
        rtxtStatus.AppendText("\nDownload finished.");
    }

    // Fragment from the original post; its enclosing method was not included:
    //     Properties.Settings.Default.SaveFileLocation = directory;
    //     Properties.Settings.Default.Save();

    // This method is called from outside on an Image object.
    public void downloadImage(RichTextBox _rtxtStatus)
    {
        rtxtStatus = _rtxtStatus;
        if (!downloadWorker.IsBusy)
        {
            downloadWorker.RunWorkerAsync();
        }
        else
        {
            MessageBox.Show("Download already running.");
        }
    }

    private String getImageUrl(String websiteUrl)
    {
        HtmlAgilityPack.HtmlDocument docHtml = new HtmlWeb().Load(websiteUrl);
        // SelectNodes returns null when nothing matches the XPath
        var nodes = docHtml.DocumentNode.SelectNodes("//img");
        return nodes[0].Attributes["src"].Value;
    }
}
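
On the question of how to get more information: a BackgroundWorker catches any exception thrown inside its DoWork handler and hands it to RunWorkerCompleted as e.Error, so a failure inside Load() can look as if the thread simply stopped. The following is a minimal sketch, not the poster's code, of turning that event argument into readable text. The names WorkerDiagnostics and Describe are hypothetical.

```csharp
using System;
using System.ComponentModel;

// Hypothetical helper, not part of the original post.
static class WorkerDiagnostics
{
    // Builds a status line from the arguments of RunWorkerCompleted.
    public static string Describe(RunWorkerCompletedEventArgs e)
    {
        if (e.Error != null)
        {
            // Exception.ToString() includes the type, message and stack trace.
            return "Worker failed: " + e.Error;
        }
        if (e.Cancelled)
        {
            return "Worker was cancelled.";
        }
        return "Worker finished.";
    }
}
```

Inside worker_runWorkerCompleted this could be used as rtxtStatus.AppendText("\n" + WorkerDiagnostics.Describe(e)); so the actual exception text is shown instead of a fixed message.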

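1 answer:

Answer 0 (score: 0)

Maybe the website you are trying to access requires cookies to be enabled. Attach a CookieContainer to the request through the HtmlWeb PreRequest handler, then try the Load method again.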

HtmlWeb htmlWeb = new HtmlWeb();
htmlWeb.PreRequest += request =>
    {
        request.CookieContainer = new System.Net.CookieContainer();
        return true;
    };
var htmlDoc = htmlWeb.Load(yourUrl);
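
Another thing that may be worth ruling out, offered here as an assumption and not something the thread confirms: the question's download() calls GetResponse() and never closes the HttpWebResponse. On .NET Framework, HttpWebRequest allows only a small number of concurrent connections per host by default (ServicePointManager.DefaultConnectionLimit, 2 for client applications), and an unclosed response keeps its connection occupied. That could make a later request to the same host, including the one HtmlWeb.Load() issues, wait until it times out. A minimal sketch of the status check with the response always released follows. The names ResponseCheck and IsOk are hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original post.
static class ResponseCheck
{
    // Returns true when the server answers 200 OK.
    // The response is always released so its connection returns to the pool.
    public static bool IsOk(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch (WebException ex)
        {
            // ex.Response is null when no HTTP response arrived
            // (DNS failure, refused connection, timeout).
            if (ex.Response != null)
            {
                ex.Response.Close();
            }
            return false;
        }
    }
}
```

With this shape, the check in download() would become if (ResponseCheck.IsOk(url)) { ... }, and no response object outlives the check.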