Parsing an XML document from the Microsoft SEO Toolkit

Time: 2013-11-04 20:17:01

Tags: c# xml class

I have an XML file generated by the Microsoft SEO Toolkit, in the following format:

<?xml version="1.0" encoding="utf-8"?>
<urls>
<url url="First URL">
<violations>
  <violation code="HasBrokenLinks" url2="First URL - 1st Broken Link" />
</violations>
</url>
<url url="Second URL">
<violations>
  <violation code="HasBrokenLinks" url2="Second URL - 1st Broken Link" />
  <violation code="HasBrokenLinks" url2="Second URL - 2nd Broken Link" />
  <violation code="HasBrokenLinks" url2="Second URL - 3rd Broken Link" />
  <violation code="HasBrokenLinks" url2="Second URL - 4th Broken Link" />
  <violation code="HasBrokenLinks" url2="Second URL - 5th Broken Link" />
  <violation code="HasBrokenLinks" url2="Second URL - 6th Broken Link" />
</violations>
</url>
</urls>

I am trying to parse the results with a C# app and produce output similar to this:

URL:  First URL
Broken Links: First URL - 1st Broken Link

URL: Second URL
Broken Links: Second URL - 1st Broken Link
Second URL - 2nd Broken Link
Second URL - 3rd Broken Link
Second URL - 4th Broken Link
Second URL - 5th Broken Link
Second URL - 6th Broken Link

My class is defined as follows:

public class WebpageErrors
{
    public String SourceURL { get; set; }
    private static List<string> BrokenLinkList = new List<string>();

    public void BrokenLinkStore(string BrokenLink)
    {
        BrokenLinkList.Add(BrokenLink);
    }
    public List<string> BrokenLinkReturner
    {
        get { return BrokenLinkList; }
    }

}

Then I start iterating over the XML:

        // Create a list to store one WebpageErrors object per URL
        List<WebpageErrors> errorList = new List<WebpageErrors>();

        // File to open up, can be an URL too
        string XmlFileUrl = @path;
        using (XmlReader reader = new XmlTextReader(XmlFileUrl))
        {
            //Define a new object to store errors in
            WebpageErrors Error = new WebpageErrors();

            // Loop until the reader has no more nodes to read
            while (reader.Read())
            {
                // An object with the type Element was found.
                if (reader.NodeType == XmlNodeType.Element)
                {

                    // Check name of the node and write the contents in the object accordingly.
                    if (reader.Name == "url")
                    {
                        //Define a new object to store errors in
                        Error = new WebpageErrors();

                        Error.SourceURL = reader["url"];
                    }

                    // Check name of the node and write the contents in the object accordingly.
                    if (reader.Name == "violation")
                    {
                        // Only record violations that report a broken link.
                        if (reader["code"] == "HasBrokenLinks")
                        {
                            Error.BrokenLinkStore(reader["url2"]);

                        }
                    }
                } 
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    if (Error.BrokenLinkReturner.Count > 0) 
                    {
                        errorList.Add(Error);
                    }
                }


            }
        }
        return errorList;

After that, I loop over the error list and print it:

    private static void PrintErrors(List<WebpageErrors> Errors)
    {

        StringBuilder Output = new StringBuilder();

        for (int i = 0; i < Errors.Count; i++)
        {
            Output.Append("Source URL: " + Errors[i].SourceURL + Environment.NewLine);

            List<string> BrokenLinkList = Errors[i].BrokenLinkReturner;
            foreach (String BrokenLink in BrokenLinkList)
            {
                Output.Append("Broken Link: " + BrokenLink + Environment.NewLine);
            }

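            Output.Append(Environment.NewLine);
        }

        // Assumed ending: the original snippet was cut off here, so writing
        // the report to the console is a guess at how it was displayed.
        Console.Write(Output.ToString());
    }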

However, instead of the expected output, I get something different:

 Source URL: First URL
 Broken Link: First URL - 1st Broken Link
 Broken Link: Second URL - 1st Broken Link
 Broken Link: Second URL - 2nd Broken Link
 Broken Link: Second URL - 3rd Broken Link
 Broken Link: Second URL - 4th Broken Link
 Broken Link: Second URL - 5th Broken Link
 Broken Link: Second URL - 6th Broken Link

 Source URL: First URL
 Broken Link: First URL - 1st Broken Link
 Broken Link: Second URL - 1st Broken Link
 Broken Link: Second URL - 2nd Broken Link
 Broken Link: Second URL - 3rd Broken Link
 Broken Link: Second URL - 4th Broken Link
 Broken Link: Second URL - 5th Broken Link
 Broken Link: Second URL - 6th Broken Link

 Source URL: Second URL
 Broken Link: First URL - 1st Broken Link
 Broken Link: Second URL - 1st Broken Link
 Broken Link: Second URL - 2nd Broken Link
 Broken Link: Second URL - 3rd Broken Link
 Broken Link: Second URL - 4th Broken Link
 Broken Link: Second URL - 5th Broken Link
 Broken Link: Second URL - 6th Broken Link

 Source URL: Second URL
 Broken Link: First URL - 1st Broken Link
 Broken Link: Second URL - 1st Broken Link
 Broken Link: Second URL - 2nd Broken Link
 Broken Link: Second URL - 3rd Broken Link
 Broken Link: Second URL - 4th Broken Link
 Broken Link: Second URL - 5th Broken Link
 Broken Link: Second URL - 6th Broken Link

 Source URL: Second URL
 Broken Link: First URL - 1st Broken Link
 Broken Link: Second URL - 1st Broken Link
 Broken Link: Second URL - 2nd Broken Link
 Broken Link: Second URL - 3rd Broken Link
 Broken Link: Second URL - 4th Broken Link
 Broken Link: Second URL - 5th Broken Link
 Broken Link: Second URL - 6th Broken Link

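I can't figure out why my output is so messed up. It must have something to do with how the WebpageErrors objects are created? Can anyone help me understand what I am doing wrong?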

Thanks, Brad

1 answer:

Answer 0: (score: 0)

My first problem, it seems, was that I did not want:

private static List<string> BrokenLinkList = new List<string>();

but rather:

private List<string> BrokenLinkList = new List<string>();

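(that is, without the static modifier on the field in my class). This gives each object its own list, instead of one list of violations shared by every object.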
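To make the difference concrete, here is a minimal sketch. SharedLinks, OwnLinks and StaticFieldDemo are hypothetical names used only for illustration; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo type: mirrors the original, buggy declaration.
public class SharedLinks
{
    // static: a single list belongs to the type and is shared by all instances
    private static readonly List<string> Links = new List<string>();

    public void Add(string link) { Links.Add(link); }
    public int Count { get { return Links.Count; } }
}

// Hypothetical demo type: mirrors the corrected declaration.
public class OwnLinks
{
    // instance field: every object gets its own list
    private readonly List<string> links = new List<string>();

    public void Add(string link) { links.Add(link); }
    public int Count { get { return links.Count; } }
}

public static class StaticFieldDemo
{
    public static void Main()
    {
        var a = new SharedLinks();
        var b = new SharedLinks();
        a.Add("x");
        b.Add("y");
        Console.WriteLine(a.Count); // 2: a also sees the entry added through b

        var c = new OwnLinks();
        var d = new OwnLinks();
        c.Add("x");
        d.Add("y");
        Console.WriteLine(c.Count); // 1: c only sees its own entry
    }
}
```

This is why every WebpageErrors object in the question printed all seven broken links: they were all reading the same static list.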

The second problem was that I had:

else if (reader.NodeType == XmlNodeType.EndElement)

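which matched more EndElement nodes than I expected: every closing tag (</violations>, </url> and </urls>) produces one, so the same object was added to the list several times. Instead, I needed to change it to: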

else if (reader.Name == "url" && reader.NodeType == XmlNodeType.EndElement)

so that it only matches when the element is url and it is also an EndElement.
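Putting both fixes together, the parsing loop could look roughly like the sketch below. It is only a sketch under a few assumptions: SeoReportParser and its Parse method are hypothetical names, the input is passed as a TextReader so the example can run on an inline string instead of a file, and the sample XML is a shortened stand-in for the real report.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class WebpageErrors
{
    public string SourceURL { get; set; }

    // Fix 1: instance field, so each page keeps its own list of broken links
    private readonly List<string> brokenLinks = new List<string>();

    public void BrokenLinkStore(string brokenLink) { brokenLinks.Add(brokenLink); }
    public List<string> BrokenLinkReturner { get { return brokenLinks; } }
}

// Hypothetical helper class, not from the original post.
public static class SeoReportParser
{
    public static List<WebpageErrors> Parse(TextReader input)
    {
        var errorList = new List<WebpageErrors>();
        WebpageErrors current = null;

        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "url")
                {
                    // Start collecting violations for a new page
                    current = new WebpageErrors { SourceURL = reader["url"] };
                }
                else if (reader.NodeType == XmlNodeType.Element && reader.Name == "violation")
                {
                    if (current != null && reader["code"] == "HasBrokenLinks")
                    {
                        current.BrokenLinkStore(reader["url2"]);
                    }
                }
                // Fix 2: react only to </url>, not to every closing tag
                else if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "url")
                {
                    if (current != null && current.BrokenLinkReturner.Count > 0)
                    {
                        errorList.Add(current);
                    }
                    current = null;
                }
            }
        }
        return errorList;
    }

    public static void Main()
    {
        // Shortened sample input; single-quoted attributes are valid XML
        string xml =
            "<urls>" +
            "<url url='A'><violations>" +
            "<violation code='HasBrokenLinks' url2='A-1' />" +
            "</violations></url>" +
            "<url url='B'><violations>" +
            "<violation code='HasBrokenLinks' url2='B-1' />" +
            "<violation code='HasBrokenLinks' url2='B-2' />" +
            "</violations></url>" +
            "</urls>";

        foreach (WebpageErrors page in Parse(new StringReader(xml)))
        {
            // Prints "A: 1" and then "B: 2"
            Console.WriteLine(page.SourceURL + ": " + page.BrokenLinkReturner.Count);
        }
    }
}
```

One caveat: a self-closing url element (with no child nodes) never produces an EndElement node, so it would simply be skipped here. That matches the intent of only listing pages that have broken links.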