Using C#

Date: 2016-01-26 08:17:24

Tags: c# html xml xml-parsing linq-to-xml

I have the following XML file:

<?xml version="1.0" encoding="utf-8"?>
<html>
    <body>
        <p><p>
           <span class="screenitems">
               Close 
               <MCap:variable name="1052.zartzut"></MCap:variable> 
               without prompting if you launch a non-
               <MCap:variable name="zirtZat"></MCap:variable>
               measurement module. (You will be prompted to save any unsaved data.)
               <span lol="scs">dsfsfs</span>
            </span>
        </p></p>
    </body>
</html>

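I only want to remove <span class="screenitems"> and its matching closing tag </span>, keeping everything inside it, so the result after parsing should look like this: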

<?xml version="1.0" encoding="utf-8"?>
<html>
    <body>
        <p><p>

               Close 
               <MCap:variable name="1052.zartzut"></MCap:variable> 
               without prompting if you launch a non-
               <MCap:variable name="zirtZat"></MCap:variable>
               measurement module. (You will be prompted to save any unsaved data.)
               <span lol="scs">dsfsfs</span>

        </p></p>
    </body>
</html>

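<span class="screenitems"> is the only tag I can rely on being unique; anything else may appear between the <html> tags. Can you help me solve this in C# using the XDocument methods?

1 Answer:

Answer 0: (score: 3)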

using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        string html = @"<?xml version=""1.0"" encoding=""utf-8""?>
<html>
    <body>
        <p><p>
           <span class=""screenitems"">
               Close 
               <MCap:variable name=""1052.zartzut""></MCap:variable> 
               without prompting if you launch a non-
               <MCap:variable name=""zirtZat""></MCap:variable>
               measurement module. (You will be prompted to save any unsaved data.)
               <span lol=""scs"">dsfsfs</span>
            </span>
        </p></p>
    </body>
</html>";

        // Load the markup with Html Agility Pack rather than a strict XML parser.
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Find the wrapper span. GetAttributeValue returns the default ("")
        // for spans without a class attribute instead of throwing.
        var spanNode = doc.DocumentNode.Descendants("span")
            .First(x => x.GetAttributeValue("class", "") == "screenitems");

        // keepGrandChildren = true removes the span itself but keeps its children.
        spanNode.ParentNode.RemoveChild(spanNode, true);

        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}

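You should use an HTML parser; here you can do this with Html Agility Pack. The trick is done by parent.RemoveChild(node, keepGrandChildren): passing true for the second argument removes the node itself but leaves its children attached to the parent.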
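
Since the question asks about XDocument, here is a minimal sketch of the same unwrapping with LINQ to XML. It relies on one assumption that does not hold for the file as posted: the MCap prefix must be declared, because XDocument.Parse rejects an undeclared namespace prefix. The namespace URI "urn:example:mcap", the class name SpanUnwrapper, and the shortened sample input are all hypothetical and used only for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SpanUnwrapper
{
    // Removes the first <span class="screenitems"> wrapper but keeps its child nodes.
    public static string Unwrap(string xml)
    {
        // PreserveWhitespace keeps the original text layout inside the elements.
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);

        // The cast to string yields null when the attribute is missing,
        // so spans without a class attribute are skipped safely.
        XElement span = doc.Descendants("span")
            .FirstOrDefault(e => (string)e.Attribute("class") == "screenitems");

        if (span != null)
        {
            // Replace the element with its own child nodes, i.e. unwrap it.
            span.ReplaceWith(span.Nodes());
        }

        return doc.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // Assumption: the MCap prefix is declared on the root element.
        string xml = @"<html xmlns:MCap=""urn:example:mcap""><body><p>" +
                     @"<span class=""screenitems"">Close " +
                     @"<MCap:variable name=""zirtZat""></MCap:variable> module. " +
                     @"<span lol=""scs"">dsfsfs</span></span></p></body></html>";

        Console.WriteLine(Unwrap(xml));
    }
}
```

If the namespace declaration cannot be added to the input, this approach will fail at the parsing step, which is the reason the Html Agility Pack solution above is the safer choice for the original file.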