XPath lookup by ID

Date: 2017-10-11 12:29:39

Tags: c# xml xpath xml-parsing lookup

I am writing a class that parses XML files using XPath queries. The XML might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<Doc>
    <Name id="aa">Alice</Name>
    <Name id="bb">Bob</Name>
    <Name id="cc">Candice</Name>
    <Person nameid="aa"></Person>
    <Person nameid="bb"></Person>
    <Person nameid="aa"></Person>
</Doc>

The desired output is:

Alice
Bob
Alice

I parse the persons with C#:

// These are dynamically defined elsewhere.
const string personXPath = "/Doc/Person";
const string nameXPath = "/Doc/Name[@id=current()/@nameid]"; // <== modify this line

void ParseXDocument(XDocument doc)
{
    foreach (var personElement in doc.XPathSelectElements(personXPath))
    {
        var nameElement = personElement.XPathSelectElement(nameXPath);
        Console.WriteLine(nameElement.Value);
    }
}

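Can this be achieved by modifying only the nameXPath variable? (My software should not "know" the XML structure; the only way to map the XML to my own classes is through XPaths, and those are configurable.)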

Another example:

[TestMethod]
public void TestLibrary()
{
    string xmlFromMessage = @"<Library>
        <Writer ID=""writer1""><Name>Shakespeare</Name></Writer>
        <Writer ID=""writer2""><Name>Tolkien</Name></Writer>
        <Book><WriterRef REFID=""writer1"" /><Title>Sonnet 18</Title></Book>
        <Book><WriterRef REFID=""writer2"" /><Title>The Hobbit</Title></Book>
        <Book><WriterRef REFID=""writer2"" /><Title>Lord of the Rings</Title></Book>
         </Library>"; 

    var titleXPathFromConfigurationFile = "./Title"; 
    var writerXPathFromConfigurationFile = "??? what to put here ???";

    var library = ExtractBooks(xmlFromMessage, titleXPathFromConfigurationFile, writerXPathFromConfigurationFile).ToDictionary(b => b.Key, b => b.Value);

    Assert.AreEqual("Shakespeare", library["Sonnet 18"]);
    Assert.AreEqual("Tolkien", library["The Hobbit"]);
    Assert.AreEqual("Tolkien", library["Lord of the Rings"]);
}

public IEnumerable<KeyValuePair<string,string>> ExtractBooks(string xml, string titleXPath,  string writerXPath)
{
    var library = XDocument.Parse(xml);
    foreach(var book in library.Descendants().Where(d => d.Name == "Book"))
    {
        var title = book.XPathSelectElement(titleXPath).Value;
        var writer = book.XPathSelectElement(writerXPath).Value;
        yield return new KeyValuePair<string, string>(title, writer);
    }
}

2 answers:

Answer 0: (score: 0)

You should put the value obtained from the first XPath into the second expression.

const string personXPath = "/Doc/Person";
const string nameXPath = "/Doc/Name[@id='{0}']";


foreach (var personElement in doc.XPathSelectElements(personXPath))
{
    var nameid = personElement.Attribute("nameid").Value;
    var nameElement = doc.XPathSelectElement(string.Format(nameXPath, nameid));
    Console.WriteLine(nameElement.Value);
}
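
One caveat with formatting a value straight into '{0}': an id that itself contains an apostrophe would break the expression. The following is a minimal sketch of how the value could be quoted first. XPathLiteral, Quote and FindName are hypothetical names that do not appear in the answer, and the template is assumed to be written without quotes around {0}, because the helper adds them.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper, not part of the original answer.
public static class XPathLiteral
{
    // Turns an arbitrary string into an XPath 1.0 string literal.
    // XPath 1.0 has no escape character, so a value that contains both
    // quote kinds has to be assembled with concat().
    public static string Quote(string value)
    {
        if (!value.Contains("'")) return "'" + value + "'";
        if (!value.Contains("\"")) return "\"" + value + "\"";

        // Split on apostrophes and glue the pieces back together,
        // inserting each apostrophe as a double-quoted literal.
        var parts = value.Split('\'').Select(p => "'" + p + "'");
        return "concat(" + string.Join(", \"'\", ", parts) + ")";
    }

    // Assumed template shape: "/Doc/Name[@id={0}]" (no quotes around {0}).
    public static string FindName(XDocument doc, XElement person, string nameXPathTemplate)
    {
        var nameid = (string)person.Attribute("nameid"); // null when the attribute is missing
        if (nameid == null) return null;

        var xpath = string.Format(nameXPathTemplate, Quote(nameid));
        return doc.XPathSelectElement(xpath)?.Value;     // null when nothing matches
    }
}
```

Inside the loop above this could be called as XPathLiteral.FindName(doc, personElement, "/Doc/Name[@id={0}]"). It returns null rather than throwing when a person has no nameid attribute or no matching Name element.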

Answer 1: (score: 0)

New answer (the old answer is further below).

Somebody correctly pointed out:

  • .NET does not support XPath 2.0, period.
  • The data model and the query language are separate.

So I solved it by using a third-party XPath 2 library, the XPath2 NuGet package. This allows expressions like:

for $c in . return ../Writer[@ID=$c/WriterRef/@REFID]/Name

Note that I needed to use a relative path from the book to the writer. The following does not work:

(: does not work due to the absolute path :)
for $c in . return /Library/Writer[@ID=$c/WriterRef/@REFID]/Name

For future reference, this code works after installing the NuGet package:

using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using Wmhelp.XPath2;

namespace My.Library
{
    [TestClass]
    public class WmhelpTests // MSTest requires test classes to be public
    {
        [TestMethod]
        public void LibraryTest()
        {
            string xmlFromMessage = @"<Library>
                <Writer ID=""writer1""><Name>Shakespeare</Name></Writer>
                <Writer ID=""writer2""><Name>Tolkien</Name></Writer>
                <Book><WriterRef REFID=""writer1"" /><Title>King Lear</Title></Book>
                <Book><WriterRef REFID=""writer2"" /><Title>The Hobbit</Title></Book>
                <Book><WriterRef REFID=""writer2"" /><Title>Lord of the Rings</Title></Book>
            </Library>";

            var titleXPathFromConfigurationFile = "./Title";
            var writerXPathFromConfigurationFile = "for $curr in . return ../Writer[@ID=$curr/WriterRef/@REFID]/Name";

            var library = ExtractBooks(xmlFromMessage, titleXPathFromConfigurationFile, writerXPathFromConfigurationFile).ToDictionary(b => b.Key, b => b.Value);

            Assert.AreEqual("Shakespeare", library["King Lear"]);
            Assert.AreEqual("Tolkien", library["The Hobbit"]);
            Assert.AreEqual("Tolkien", library["Lord of the Rings"]);
        }

        public IEnumerable<KeyValuePair<string, string>> ExtractBooks(string xml, string titleXPath, string writerXPath)
        {
            var library = XDocument.Parse(xml);
            foreach (var book in library.Descendants().Where(d => d.Name == "Book"))
            {
                var title = book.XPath2SelectElement(titleXPath).Value;
                var writer = book.XPath2SelectElement(writerXPath).Value;
                yield return new KeyValuePair<string, string>(title, writer);
            }
        }
    }
}

My old answer follows.

I used a dirty fix: in my XPath, I replace "current()" with the actual value. That way, the current function behaves as it does in the XSLT standard.

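using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using System.Xml.XPath;

class MyClass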
{

    // These are dynamically defined elsewhere.
    const string personXPath = "/Doc/Person";
    const string nameXPath = "/Doc/Name[@id=current()/@nameid]"; 
    XElement _node;

    void ParseXDocument(XDocument doc)
    {
        foreach (var personElement in doc.XPathSelectElements(personXPath))
        {
            _node = personElement; // my actual code is a bit cleaner
            var nameElement = personElement.XPathSelectElement(PreParse(nameXPath));
            Console.WriteLine(nameElement.Value);
        }
    }

    /// <summary>
    /// Pre-evaluates calls to current()
    /// </summary>
    /// <param name="xpath"></param>
    /// <returns></returns>
    private string PreParse(string xpath)
    {
        var sb = new StringBuilder();
        foreach (var part in Tokenize(xpath))
        {
            if (part.Trim().StartsWith("current()"))
            {
                var query = part.Replace("current()", ".");
                sb.Append("'")
                    .Append(EvaluateXPath(query))
                    .Append("'");
            }
            else
            {
                sb.Append(part);
            }
        }
        return sb.ToString();
    }

    private IEnumerable<string> Tokenize(string path)
    {
        var begin = 0;
        for (var i = 0; i < path.Length; i++)
        {
            if ("[=]".Contains(path[i]))
            {
                yield return path.Substring(begin, i - begin);
                yield return path[i].ToString();
                begin = i + 1;
            }
        }
        yield return path.Substring(begin);
    }

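    private string EvaluateXPath(string xpath)
    {
        var result = _node.XPathEvaluate(xpath);
        // A string result is also IEnumerable (of char), so exclude it
        // before treating the result as a node set.
        if (result is IEnumerable nodes && !(result is string))
        {
            // Use the value of the first node; an empty node set yields an empty string.
            foreach (var node in nodes)
                return (node as XElement)?.Value ?? (node as XAttribute)?.Value ?? string.Empty;
            return string.Empty;
        }
        return string.Format(CultureInfo.InvariantCulture, "{0}", result);
    }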
}
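
As an alternative to rewriting the XPath string, the built-in XPath 1.0 engine can, as far as I know, be given a custom XsltContext that resolves current() itself. The following is only a sketch under that assumption. CurrentNodeContext and SelectWithCurrent are hypothetical names that do not come from either answer, and the code assumes the engine passes an empty prefix for an unprefixed function such as current().

```csharp
using System.Xml.Linq;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical helper, not part of the original answers.
public sealed class CurrentNodeContext : XsltContext
{
    private readonly XPathNavigator _current;

    public CurrentNodeContext(XPathNavigator current)
    {
        _current = current.Clone(); // keep our own position on the "current" node
    }

    // Only the unprefixed current() function is resolved; returning null
    // for anything else lets the engine report an unknown function.
    public override IXsltContextFunction ResolveFunction(string prefix, string name, XPathResultType[] argTypes)
    {
        return string.IsNullOrEmpty(prefix) && name == "current" ? new CurrentFunction(_current) : null;
    }

    public override IXsltContextVariable ResolveVariable(string prefix, string name) { return null; }
    public override bool Whitespace { get { return true; } }
    public override bool PreserveWhitespace(XPathNavigator node) { return true; }
    public override int CompareDocument(string baseUri, string nextbaseUri) { return 0; }

    private sealed class CurrentFunction : IXsltContextFunction
    {
        private readonly XPathNavigator _current;
        public CurrentFunction(XPathNavigator current) { _current = current; }

        public int Minargs { get { return 0; } }
        public int Maxargs { get { return 0; } }
        public XPathResultType ReturnType { get { return XPathResultType.NodeSet; } }
        public XPathResultType[] ArgTypes { get { return new XPathResultType[0]; } }

        // Returns a node set that contains only the remembered node.
        public object Invoke(XsltContext xsltContext, object[] args, XPathNavigator docContext)
        {
            return _current.Select(".");
        }
    }

    // Evaluates xpath with `current` as both the context node and the
    // result of current(); returns null when nothing matches.
    public static XElement SelectWithCurrent(XElement current, string xpath)
    {
        var navigator = current.CreateNavigator();
        var expression = XPathExpression.Compile(xpath);
        expression.SetContext(new CurrentNodeContext(navigator));
        var match = navigator.SelectSingleNode(expression);
        return match == null ? null : match.UnderlyingObject as XElement;
    }
}
```

With this in place, the loop could call CurrentNodeContext.SelectWithCurrent(personElement, nameXPath) and leave the configured expression /Doc/Name[@id=current()/@nameid] unchanged. The trade-off against the string replacement is more boilerplate in exchange for not having to tokenize or quote anything by hand.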