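Select specific text from a WebBrowser control in C#

Time: 2014-08-05 14:34:08

Tags: c# html winforms text webbrowser-control

I've been getting into using a browser in WinForms, and I've run into a specific problem.

What I want to do: the browser opens a page (I've gotten that far). Once the page is open, it has to navigate to a specific section (somewhere in the middle of the page) and select it, then copy it and store it for when I need it. I only need the text.

As an example, I've been able to select all of the text on the page with the following code: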

    WebBrowser wb = (WebBrowser)sender;
    wb.Document.ExecCommand("SelectAll", false, null);
    wb.Document.ExecCommand("Copy", false, null);
    richTextBox1.Text = Clipboard.GetText();

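It could work for my program, but I'd like to know whether there is a better way to select just the text or information I need and, if so, to put it in a text box or directly into my database.

Here is the link to the page: http://www.lolking.net/news/league-trends-jul30

I want to select and retrieve the information from these sections of the page:

Champion Pick Rates - Top 5 Increases and Decreases

Champion Win Rates - Top 5 Increases and Decreases

Champion Ban Rates - Top 5 Increases and Decreases

Any help would be appreciated.

4 Answers:

Answer 0: (score: 1)

Your foreach loop will look like this: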

foreach (var item in list_ban)
{
    string rtbpicker = item.ToString();

    foreach (var comp in list_Comp)
    {
        int count = 0; // Counts the number of occurrences

        foreach (Match m in Regex.Matches(rtbpicker, comp.ToString()))
        {
            int matchindex = m.Index;
            int matchlength = m.Length;

            // count moves the index forward by however many positions the original index was shifted
            rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, " ");

            if (Regex.Matches(rtbpicker, comp.ToString()).Count > 1)
            {
                count++;
            }
        }
    }

    richTextBox6.Text += rtbpicker + "\n";
    //rtbBan.AppendText(rtbpicker + System.Environment.NewLine);
}

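Answer 1: (score: 0)

I don't have a complete solution yet, but I can help you a bit:

Once you have the plain text from the FULLY LOADED webBrowser and have written it into richTextBox1, you can print the three sections into other text boxes: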

    private void button_Click(object sender, EventArgs e)
    {
        List<string> rawhtml = new List<string>(); //List for the whole page
        List<string> list_pick = new List<string>(); //PICK section
        List<string> list_win = new List<string>(); //WIN section
        List<string> list_ban = new List<string>(); //BAN section
        rawhtml = richTextBox1.Lines.ToList(); //FILL the page to list
        int ID_pick = 0;
        int ID_win = 0;
        int ID_ban = 0;
        int ID_cmt = 0; // We need to specify the end of BAN section
        for (int i = 0; i < rawhtml.Count; i++) //Search for the line number of section-start
        {
            if (rawhtml[i] == "Champion Pick Rates") ID_pick = i;
            if (rawhtml[i] == "Champion Win Rates") ID_win = i;
            if (rawhtml[i] == "Champion Ban Rates") ID_ban = i;
            if (rawhtml[i].Contains("Comments")) ID_cmt = i;
        }
        // PICK
        for (int i = ID_pick; i < ID_pick + (ID_win - ID_pick); i++) //Calculate the start and the end line-number
        {
            list_pick.AddRange(Regex.Split(rawhtml[i], "(?<=[)])")); //Split the five characters, without losing the ')'
        }
        foreach (var item in list_pick)
        {
            richTextBox2.AppendText(item + System.Environment.NewLine); //Optional: Add to richtextbox
        }
        // WIN
        for (int i = ID_win; i < ID_win + (ID_ban - ID_win); i++)
        {
            list_win.AddRange(Regex.Split(rawhtml[i], "(?<=[)])"));
        }
        foreach (var item in list_win)
        {
            richTextBox3.AppendText(item + System.Environment.NewLine);
        }
        // BAN
        for (int i = ID_ban; i < ID_ban + (ID_cmt - ID_ban); i++)
        {
            list_ban.AddRange(Regex.Split(rawhtml[i], "(?<=[)])"));
        }
        foreach (var item in list_ban)
        {
            richTextBox4.AppendText(item + System.Environment.NewLine);
        }
    }

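For "Champion Win Rates", this code produces output like:

Champion Win Rates

Top 5 Increases

Urgot41.38% -> 43.67%(+2.29%)

Kennen47.7% -> 49.28%(+1.58%)

Lucian51.61% -> 53.1%(+1.49%)

Singed48.95% -> 50.31%(+1.36%)

Fiora53.48% -> 54.71%(+1.23%)

Top 5 Decreases

Kassadin48.7% -> 46.67%(-2.03%)

Galio53.18% -> 51.42%(-1.76%)

Cho'Gath48.03% -> 46.37%(-1.66%)

Corki50.05% -> 48.43%(-1.62%)

Graves49.49% -> 47.98%(-1.51%)

Much better... ;)

I ran into a problem with the spacing (the champion name runs straight into the first number), and I haven't been able to solve it yet.

I hope you understand this; if you have any questions, please leave a comment!

P.S.: Sorry for my bad English.

P.P.S.: I know this isn't a complete solution, but I had to share it with you :)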
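As a side note on the section-splitting step above, the three nearly identical index loops could be folded into one helper. The following is only a minimal sketch under assumptions: `SectionSplitter` and `Split` are hypothetical names that do not appear in the answer, the header strings are the ones the answer compares against, and, unlike the answer's loops, the header line itself is left out of each section and splitting stops at the first line containing the end marker after a section has started.

```csharp
using System;
using System.Collections.Generic;

static class SectionSplitter
{
    // Hypothetical helper (not part of the original answer): groups each
    // non-empty line under the most recent header line that was seen.
    public static Dictionary<string, List<string>> Split(
        IEnumerable<string> lines, ISet<string> headers, string endMarker)
    {
        var sections = new Dictionary<string, List<string>>();
        List<string> current = null;

        foreach (var raw in lines)
        {
            string line = raw.Trim();

            // Stop at the end marker, but only once a section has started,
            // so an earlier mention of the marker does not end the scan.
            if (current != null && line.Contains(endMarker))
                break;

            if (headers.Contains(line))
            {
                current = new List<string>();
                sections[line] = current;
                continue;
            }

            if (current != null && line.Length > 0)
                current.Add(line);
        }

        return sections;
    }
}
```

It could be called with `richTextBox1.Lines`, a `HashSet<string>` holding "Champion Pick Rates", "Champion Win Rates" and "Champion Ban Rates", and "Comments" as the end marker; each returned list can then be split further with the same `Regex.Split` call the answer uses.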

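Answer 2: (score: 0)

Here is the complete solution now, with perfect spacing. Regular expressions are hard for me, so I think this approach is simpler, although it is also longer.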

private void btnspace_Click(object sender, EventArgs e)
{
    richTextBox6.Text = null;
    for (int i = 0; i < list_ban.Count; i++)
    {
        string rebuilder = ""; //for the output string (one line)
        List<char> temp_chars = list_ban[i].ToCharArray().ToList(); //split one line into char sequence
        int number_occur = 0; //occurence counter for numbers
        int minus_occur = 0;// occurence counter for '-'
        for (int j = 0; j < temp_chars.Count; j++)
        {
            // NUMBERS
            // I didn't want to hardcode the champions :/
            if (number_occur < 2 && char.IsDigit(temp_chars[j])) // only the first two numbers get a space
            {
                temp_chars.Insert(j, ' '); //insert a space into char seq
                j = j + 5; // in the longest case: 12.34, so skip 5 char, or 1 2. 3 4
                number_occur = number_occur + 1; //for the difference percentage we don't need spaces, so insert by number only twice
            }
            // NUMBERS DONE
        }
        for (int j = 0; j < temp_chars.Count; j++)
        {
            // ( and -
            if (temp_chars[j] == '-' || temp_chars[j] == '(')
            {
                if (temp_chars[j] == '-') minus_occur = minus_occur + 1; //if the difference is negative, there will be one more minus, which doesn't need space
                if (minus_occur <= 1) temp_chars.Insert(j, ' ');
                j = j + 1; //avoid endless loop
            }
            // ( and - DONE
        }
        foreach (var item in temp_chars)
        {
            rebuilder = rebuilder + item; //rebuild the line from the char list, with spaces
        }
        list_ban.RemoveAt(i); //replace the old spaceless lines...
        list_ban.Insert(i, rebuilder);
        richTextBox6.AppendText(list_ban[i] + System.Environment.NewLine); // write to the box that was cleared above
    }
}

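I hope it is clear; I tried to comment everything. Good luck, and feel free to ask. Please let me know whether it works, because I'd like to answer this question perfectly :D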
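For comparison, the spacing step above can also be written with a single regular expression. This is a rough sketch, not the answer's method: `RateLine` and `Format` are hypothetical names, and the pattern assumes every data line has the shape `Name12.34%->56.78%(+1.23%)` (spaces around the tokens are tolerated) and that champion names contain no digits. Lines that do not match, such as the section headers, are returned unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

static class RateLine
{
    // Assumed line shape: Name12.34%->56.78%(+1.23%), optional spaces allowed.
    private static readonly Regex Pattern = new Regex(
        @"^(?<name>\D+?)\s*(?<from>\d+(?:\.\d+)?%)\s*->\s*(?<to>\d+(?:\.\d+)?%)" +
        @"\s*\(\s*(?<sign>[+-])\s*(?<delta>\d+(?:\.\d+)?%)\s*\)$");

    // Hypothetical helper: rebuilds one line with a single space between fields.
    public static string Format(string line)
    {
        Match m = Pattern.Match(line.Trim());
        if (!m.Success)
            return line; // headers and unexpected lines are left untouched

        string name = m.Groups["name"].Value;
        string from = m.Groups["from"].Value;
        string to = m.Groups["to"].Value;
        string sign = m.Groups["sign"].Value;
        string delta = m.Groups["delta"].Value;

        return name + " " + from + " -> " + to + " (" + sign + delta + ")";
    }
}
```

Because the fields are captured by name, the same match could also feed typed columns in a database instead of a text box.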

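Answer 3: (score: 0)

OK, so this is my final solution, and it works 100%. As you can see, it takes your first answer ;p and uses my Regex.Matches. I think the part I added to the foreach loops could be moved into a method, so that you can call it whenever you need it. I haven't gotten to that yet! :)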

    private void button3_Click(object sender, EventArgs e)
    {

        List<string> rawhtml = new List<string>(); //List for the whole page
        List<string> list_pick = new List<string>(); //PICK section
        List<string> list_win = new List<string>(); //WIN section
        List<string> list_ban = new List<string>(); //BAN section
        List<string> list_Comp = new List<string>(); //Champion names
        fillchamplist(list_Comp);
        rawhtml = richTextBox1.Lines.ToList(); //FILL the page to list
        int ID_pick = 0;
        int ID_win = 0;
        int ID_ban = 0;
        int ID_cmt = 0; // We need to specify the end of BAN section
        for (int i = 0; i < rawhtml.Count; i++) //Search for the line number of section-start
        {
            if (rawhtml[i] == "Champion Pick Rates") ID_pick = i;
            if (rawhtml[i] == "Champion Win Rates") ID_win = i;
            if (rawhtml[i] == "Champion Ban Rates") ID_ban = i;
            if (rawhtml[i].Contains("Comments")) ID_cmt = i;
        }
        // PICK
        for (int i = ID_pick; i < ID_pick + (ID_win - ID_pick); i++) //Calculate the start and the end line-number
        {
            list_pick.AddRange(Regex.Split(rawhtml[i], "(?<=[)])")); //Split the five characters, without losing the ')'
        }
        foreach (var item in list_pick)
        {
            string rtbpicker = item.ToString();
            foreach (var comp in list_Comp)
            {
                int count = 0; //To see which match we working with later
                foreach (Match m in Regex.Matches(rtbpicker, "" + comp.ToString() + "")) // Checks for all matches and cycles through them
                {
                    if (count == 2) // count == 2 means this is the 3rd match (the one we don't want to put a space after)
                    {
                    }
                    else // puts the space in
                    {
                        int matchindex = m.Index;
                        int matchlength = m.Length;
                        if (m.Length >= 2) // only champ names are >=2
                        {
                            rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, "\t"); 
                        }
                        else
                        {
                            rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, " "); // the count variable updates the index so the space doesn't occur before the % sign
                        }


                        if (Regex.Matches(rtbpicker, "" + comp.ToString() + "").Count > 0)// just to update the index for the 2nd %
                        {
                            count++;


                        }
                    }
                }

            }
            rtbPick.AppendText(rtbpicker + System.Environment.NewLine); //Optional: Add to richtextbox
        }
        // WIN
        for (int i = ID_win; i < ID_win + (ID_ban - ID_win); i++)
        {
            list_win.AddRange(Regex.Split(rawhtml[i], "(?<=[)])"));
        }
        foreach (var item in list_win)
        {
            string rtbpicker = item.ToString();
            foreach (var comp in list_Comp)
            {
                int count = 0;
                foreach (Match m in Regex.Matches(rtbpicker, "" + comp.ToString() + ""))
                {
                    if (count == 2)
                    {
                    }
                    else
                    {
                        int matchindex = m.Index;
                        int matchlength = m.Length;
                        if (m.Length >= 2)
                        {
                            rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, "\t");
                        }
                        else
                        {
                            rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, " ");
                        }


                        if (Regex.Matches(rtbpicker, "" + comp.ToString() + "").Count > 0)
                        {
                            count++;


                        }
                    }
                }

            }
            rtbWin.AppendText(rtbpicker + System.Environment.NewLine);
        }
        // BAN
        for (int i = ID_ban; i < ID_ban + (ID_cmt - ID_ban); i++)
        {
            list_ban.AddRange(Regex.Split(rawhtml[i], "(?<=[)])"));
        }
        foreach (var item in list_ban)
        {
            string rtbpicker = item.ToString();
            foreach (var comp in list_Comp)
            {
                int count = 0;
                foreach (Match m in Regex.Matches(rtbpicker, "" + comp.ToString() + ""))
                {
                    if (count == 2)
                    {
                    }
                    else
                    {
                        int matchindex = m.Index;
                        int matchlength = m.Length;
                        if (m.Length >= 2)
                        {
                            rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, "\t");
                        }
                        else
                        {
                            rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, " ");
                        }


                        if (Regex.Matches(rtbpicker, "" + comp.ToString() + "").Count > 0)
                        {
                            count++;


                        }
                    }
                }

            }
            rtbBan.AppendText(rtbpicker + System.Environment.NewLine);
        }
    }

Here is the result (for some reason, the tabs don't show here):

Champion Pick Rates

Top 5 Increases

Lucian 27.75% -> 32.3%(+4.55%)

Ahri 8.7% -> 11.3%(+2.6%)

Rengar 11.25% -> 13.84%(+2.59%)

Nidalee 10.7% -> 12.93%(+2.23%)

Tristana 30.07% -> 32.02%(+1.95%)

Top 5 Decreases

Caitlyn 34.44% -> 30.63%(-3.81%)

Vayne 17.25% -> 15.69%(-1.56%)

Ezreal 15.08% -> 13.6%(-1.48%)

Renekton 13.84% -> 12.6%(-1.24%)

Lee Sin 30.54% -> 23.36%(-7.18%)

OK :D It's perfect for me, but that's because I know exactly what result I want for this specific case. Your approach works too, and I would actually recommend it for this scenario.

If you have any questions, don't be afraid to ask :)
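The idea mentioned at the top of this answer, moving the repeated loop body into one method, might look roughly like the sketch below. It rests on assumptions: `LineSpacer` and `AddSeparators` are hypothetical names, the contents of `list_Comp` are not shown in the answer, so the sketch takes only a list of champion names and handles the `%` signs separately, and it scans the string directly instead of using the `count` offset with `Regex.Matches`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class LineSpacer
{
    // Hypothetical helper: puts a tab after the leading champion name and a
    // space after the first two '%' signs, leaving the third one untouched.
    public static string AddSeparators(string line, IEnumerable<string> championNames)
    {
        var sb = new StringBuilder(line);

        // Pick the longest name the line starts with, so that a short name
        // that is a prefix of a longer one does not win.
        string best = null;
        foreach (string name in championNames)
        {
            if (line.StartsWith(name, StringComparison.Ordinal) &&
                (best == null || name.Length > best.Length))
            {
                best = name;
            }
        }
        if (best != null)
            sb.Insert(best.Length, '\t');

        int seen = 0;
        for (int i = 0; i < sb.Length && seen < 2; i++)
        {
            if (sb[i] == '%')
            {
                sb.Insert(i + 1, ' ');
                seen++;
                i++; // skip over the space that was just inserted
            }
        }

        return sb.ToString();
    }
}
```

Each of the three foreach bodies would then reduce to a single call, for example `rtbPick.AppendText(LineSpacer.AddSeparators(item, names) + Environment.NewLine);`, where `names` stands for whatever list of champion names the program already fills.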