Find images with a specific HTML class name

Time: 2010-03-12 09:12:03

Tags: c# asp.net regex

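I have some markup that contains HTML image tags with a specific class. I need to find all of these images, wrap each one in an anchor tag, set the anchor's href attribute to the image's src value (the image path), and finally replace the image's src value with a new value (I call a method that returns this value).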

<p>Some text here <img src="/my/path/image.png" alt="image description" class="featured" />. Some more text and another image that should not be modified <img src="/my/path/image2.png" alt="image description" /></p>

should become:

<p>Some text here <a href="/my/path/image.png"><img src="/new/path/from/method.png" alt="image description" class="featured" /></a>. Some more text and another image that should not be modified <img src="/my/path/image2.png" alt="image description" /></p>

2 answers:

Answer 0 (score: 0)

Don't use RegEx to parse HTML. See this classic SO answer.

Use the HTML Agility Pack instead. It lets you query the HTML with XPath.
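To make the XPath suggestion concrete, here is a minimal sketch of how the question's transformation might look with the HTML Agility Pack. The class name `FeaturedImageLinker`, the method `WrapFeaturedImages`, and the `getThumbnailPath` delegate are hypothetical names chosen for illustration; the delegate stands in for the asker's own method that returns the new src value.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class FeaturedImageLinker
{
    // XPath 1.0 has no class-token test, so pad the class list with spaces
    // and look for " featured " to avoid matching names such as "unfeatured".
    private const string FeaturedImages =
        "//img[@src and contains(concat(' ', normalize-space(@class), ' '), ' featured ')]";

    // getThumbnailPath is a hypothetical stand-in for the asker's own method.
    public static string WrapFeaturedImages(string html, Func<string, string> getThumbnailPath)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var images = document.DocumentNode.SelectNodes(FeaturedImages);
        if (images == null)
        {
            return html;
        }

        // Copy to a list first, since the loop restructures the tree.
        foreach (var image in images.ToList())
        {
            string originalPath = image.GetAttributeValue("src", string.Empty);

            HtmlNode anchor = document.CreateElement("a");
            anchor.SetAttributeValue("href", originalPath);

            // Put the anchor where the image was, then move the image inside it.
            image.ParentNode.ReplaceChild(anchor, image);
            anchor.AppendChild(image);

            // Reusing the original <img> keeps alt, class, and other attributes.
            image.SetAttributeValue("src", getThumbnailPath(originalPath));
        }

        return document.DocumentNode.WriteTo();
    }
}
```

It could be called as `FeaturedImageLinker.WrapFeaturedImages(input, path => GetImageThumbnail(path))`, assuming a `GetImageThumbnail` method like the one the question describes. Images without the `featured` class are left untouched because the XPath query never selects them.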

Answer 1 (score: 0)

I ended up with this code.

using System;
using System.Reflection;
using HtmlAgilityPack;
using log4net;

namespace Company.Web.Util
{
public static class HtmlParser
{
    private static readonly ILog _log = LogManager.GetLogger(MethodBase.GetCurrentMethod().DeclaringType);
    private static HtmlDocument _htmlDocument;

    public static string Parse(string input)
    {
        _htmlDocument = new HtmlDocument();

        _htmlDocument.LoadHtml(input);
        ParseNode(_htmlDocument.DocumentNode);

        return _htmlDocument.DocumentNode.WriteTo().Trim();
    }

    private static void ParseChildren(HtmlNode parentNode)
    {
        for (int i = parentNode.ChildNodes.Count - 1; i >= 0; i--)
        {
            ParseNode(parentNode.ChildNodes[i]);
        }
    }

    private static void ParseNode(HtmlNode node)
    {
        if (node.NodeType == HtmlNodeType.Element)
        {
            if (node.Name == "img" && node.HasAttributes)
            {
                for (int i = node.Attributes.Count - 1; i >= 0; i--)
                {
                    HtmlAttribute currentAttribute = node.Attributes[i];
                    if ("class" == currentAttribute.Name && currentAttribute.Value.ToLower().Contains("featured"))
                    {
                        try
                        {
                            string originalImagePath = node.Attributes["src"].Value;

                            // GetImageThumbnail is the poster's own helper (not shown here).
                            string imageThumbnailPath = GetImageThumbnail(originalImagePath);

                            // Reuse the existing <img> so alt, class, and other attributes are kept.
                            node.SetAttributeValue("src", imageThumbnailPath);

                            var anchorNode = HtmlNode.CreateNode("<a>");
                            anchorNode.SetAttributeValue("href", originalImagePath);

                            // Put the anchor where the image was, then move the image inside it.
                            node.ParentNode.ReplaceChild(anchorNode, node);
                            anchorNode.AppendChild(node);

                            // The node is wrapped; stop scanning its attributes.
                            break;
                        }
                        catch (Exception exception)
                        {
                            if (_log.IsWarnEnabled)
                            {
                                _log.WarnFormat("Some message: {0}", exception);
                            }
                        }
                    }
                }
            }
        }

        if (node.HasChildNodes)
        {
            ParseChildren(node);
        }
    }
}

}