Wrapping orphaned text in tags with HTML Agility Pack

Asked: 2014-08-28 12:09:48

Tags: c# html html-agility-pack

How can I convert a fragment of HTML like this

<div>
     some text
     <br/>
     goes in here
     <br/>
     with only br tags
     <br/>
     to separate it
     <br/>
</div>

into this

<div>
     <p>some text</p>
     <p>goes in here</p>
     <p>with only br tags</p>
     <p>to separate it</p>
</div>

using HTML Agility Pack in C#?

2 answers:

Answer 0 (score: 1)

One possible way:

using System;
using HtmlAgilityPack;

var html = @"<div>
     some text
     <br/>
     goes in here
     <br/>
     with only br tags
     <br/>
     to separate it
     <br/>
</div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
var div = doc.DocumentNode.SelectSingleNode("div");
// select all non-whitespace text nodes directly inside <div>
// (SelectNodes returns null when nothing matches)
var texts = div.SelectNodes("./text()[normalize-space()]");
if (texts != null)
{
    foreach (var text in texts)
    {
        // wrap the trimmed text in a <p> and put it where the text node was,
        // so the paragraphs keep their original order
        var p = doc.CreateElement("p");
        p.AppendChild(doc.CreateTextNode(text.InnerText.Trim()));
        div.ReplaceChild(p, text);
    }
}
// remove all <br/> tags directly inside <div>
var brs = div.SelectNodes("./br");
if (brs != null)
{
    foreach (var br in brs)
    {
        br.Remove();
    }
}
// print result
Console.WriteLine(doc.DocumentNode.OuterHtml);

Answer 1 (score: 0)

I took a slightly different approach: I treat the div's InnerHtml as plain text and split it on the <br> tags. It is a bit of a hack, but it works.

using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

var html = @"<div>
     some text
     <br/>
     goes in here
     <br/>
     with only br tags
     <br/>
     to separate it
     <br/>
</div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// materialize the list first, because the loop rewrites each div's children
var divs = doc.DocumentNode.Descendants("div").ToList();

foreach (var div in divs)
{
    // create a list of p nodes
    var ps = new List<HtmlNode>();

    // split the inner markup on <br>; InnerHtml may echo the source spelling,
    // so accept the common variants of the tag
    var texts = div.InnerHtml.Split(new[] { "<br>", "<br/>", "<br />" }, StringSplitOptions.None);

    // iterate over the split text
    foreach (var text in texts)
    {
        // skip empty pieces; trim the rest so the <p> holds only the text
        var trimmed = text.Trim();
        if (trimmed.Length > 0)
        {
            var p = doc.CreateElement("p");
            p.AppendChild(doc.CreateTextNode(trimmed));
            ps.Add(p);
        }
    }

    // join the p collection and paste it into the div
    div.InnerHtml = string.Join("", ps.Select(x => x.OuterHtml));
}
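
Both answers assume the div holds nothing but bare text and <br> tags. If a line could also contain inline markup such as <b>, one option is to walk the div's child nodes and start a new paragraph at every <br>. The following is only a sketch under that assumption: the `BrToParagraphs` class, its `Convert` method, and the sample markup are hypothetical names and data made up for illustration, not part of HTML Agility Pack or of either answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Usage: sample markup (made up for illustration) with an inline <b> element.
var doc = new HtmlDocument();
doc.LoadHtml("<div>some <b>bold</b> text<br/>second line<br/></div>");
BrToParagraphs.Convert(doc.DocumentNode.SelectSingleNode("div"));
Console.WriteLine(doc.DocumentNode.OuterHtml);

// Hypothetical helper, not part of HTML Agility Pack.
static class BrToParagraphs
{
    // Wraps each run of child nodes between <br> tags in its own <p>.
    public static void Convert(HtmlNode container)
    {
        var doc = container.OwnerDocument;
        var groups = new List<List<HtmlNode>> { new List<HtmlNode>() };

        // Snapshot the children, because the container is emptied below.
        foreach (var child in container.ChildNodes.ToList())
        {
            if (child.Name == "br")
                groups.Add(new List<HtmlNode>());    // a <br> ends the current run
            else
                groups[groups.Count - 1].Add(child);
        }

        container.RemoveAllChildren();

        foreach (var group in groups)
        {
            // Skip runs with no visible text (assumption: such runs are
            // only whitespace; a run holding just an <img> would be dropped too).
            if (group.All(n => string.IsNullOrWhiteSpace(n.InnerText)))
                continue;

            var p = doc.CreateElement("p");
            foreach (var node in group)
                p.AppendChild(node);
            container.AppendChild(p);
        }
    }
}
```

Unlike the string split, this keeps working on parsed nodes throughout, so it does not depend on how the <br> tag happens to be spelled in the source. Leading and trailing whitespace inside each run is left as it was.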