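(C#) How do I modify attribute values in an existing XML file without loading or rewriting the whole file?

Time: 2017-04-03 14:07:19

Tags: c# .net xml xmlreader xmlwriter

I create some huge XML files (several GB) with the help of XmlWriter and Linq2Xml. The files have this form: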

<Table recCount="" recLength="">
<Rec recId="1">..</Rec>
<Rec recId="2">..</Rec>
..
<Rec recId="n">..</Rec>
</Table>

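I do not know the values of the recCount and recLength attributes of the Table node until I have written all the inner Rec nodes, so I have to write these attribute values at the very end.

Right now I write all the inner Rec nodes to a temporary file, compute the attribute values, and then write everything to the result file in the form shown above (copying all the Rec nodes over from the temporary file).

I would like to know whether there is a way to modify these attribute values without writing the content to another file (as I do now) and without loading the whole document into memory (which is obviously impossible given the size of these files).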
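
For reference, the two-pass approach described in the question can be sketched roughly as follows. This is only an illustration written as C# top-level statements (C# 9 or later): the file names, the three sample records, and the reading of recLength as "longest record text" are assumptions, not details from the question.

```csharp
using System;
using System.IO;
using System.Xml;

string tempPath = Path.GetTempFileName();
string resultPath = Path.Combine(Path.GetTempPath(), "table-demo.xml"); // hypothetical name
int recCount = 0, recLength = 0;

// Pass 1: stream the Rec elements into a temporary fragment file (no root element).
var writeFragment = new XmlWriterSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var temp = XmlWriter.Create(tempPath, writeFragment))
{
    for (int i = 1; i <= 3; i++)
    {
        string text = "record " + i; // placeholder payload
        temp.WriteStartElement("Rec");
        temp.WriteAttributeString("recId", XmlConvert.ToString(i));
        temp.WriteString(text);
        temp.WriteEndElement();
        recCount++;
        recLength = Math.Max(recLength, text.Length); // assumed meaning of recLength
    }
}

// Pass 2: the totals are known now, so write the real root and copy the records over.
var readFragment = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var reader = XmlReader.Create(tempPath, readFragment))
using (var result = XmlWriter.Create(resultPath))
{
    result.WriteStartElement("Table");
    result.WriteAttributeString("recCount", XmlConvert.ToString(recCount));
    result.WriteAttributeString("recLength", XmlConvert.ToString(recLength));

    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element)
            result.WriteNode(reader, false); // copies one Rec and advances the reader
        else
            reader.Read();                   // skip anything between records
    }
    result.WriteEndElement();
}
File.Delete(tempPath);
```

Both passes stream, so memory use stays flat, but every record is written twice, which is the cost the question wants to avoid.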

4 answers:

Answer 0 (score: 1)

Heavily commented code. The basic idea is that in the first pass we write:

<?xml version="1.0" encoding="utf-8"?>
<Table recCount="$1" recLength="$2">
<!--Reserved space:++++++++++++++++-->
<Rec...

Then we go back to the beginning of the file and rewrite the first three lines:

<?xml version="1.0" encoding="utf-8"?>
<Table recCount="1000" recLength="150">
<!--Reserved space:#############-->

The important "trick" here is that you cannot "insert" into a file, you can only overwrite it. So we "reserve" some space for the digits (the <!--Reserved space:...--> comment). There are many ways we could have done this. For example, in the first pass we could have written:

<Table recCount="              " recLength="          ">

and then (XML-legal but ugly):

<Table recCount="1000 " recLength="150 ">

Or we could have appended the spaces after the > of Table:

<Table recCount="" recLength="">                    

(there are 20 spaces after the >)

and then:

<Table recCount="1000" recLength="150">             

(now there are 13 spaces after the >)

Or we could simply have added the spaces on a new line, without the <!-- -->...

The code:

using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

int maxRecCountLength = 10; // int.MaxValue.ToString().Length
int maxRecLengthLength = 10; // int.MaxValue.ToString().Length
int tokenLength = 4; // 4 == $1 + $2, see below what $1 and $2 are

// Note that the reserved space will be in the form ++++++++++++++++
string reservedSpace = new string('+', maxRecCountLength + maxRecLengthLength - tokenLength);

// You have to manually open the FileStream
using (var fs = new FileStream("out.xml", FileMode.Create))
// and add a StreamWriter on top of it
using (var sw = new StreamWriter(fs, Encoding.UTF8, 4096, true))
{
    // Here you write on your StreamWriter however you want.
    // Note that recCount and recLength have a placeholder $1 and $2.
    int recCount = 0;
    int maxRecLength = 0;

    using (var xw = XmlWriter.Create(sw))
    {
        xw.WriteWhitespace("\r\n");
        xw.WriteStartElement("Table");
        xw.WriteAttributeString("recCount", "$1");
        xw.WriteAttributeString("recLength", "$2");

        // You have to add some reserved space that will be
        // partially consumed by the recCount and recLength values
        xw.WriteWhitespace("\r\n");
        xw.WriteComment("Reserved space:" + reservedSpace);

        // <--------- BEGIN YOUR CODE
        for (int i = 0; i < 100; i++)
        {
            xw.WriteWhitespace("\r\n");
            xw.WriteStartElement("Rec");

            string str = string.Format("Some number: {0}", i);

            if (str.Length > maxRecLength)
            {
                maxRecLength = str.Length;
            }

            xw.WriteValue(str);
            recCount++;
            xw.WriteEndElement();
        }
        // <--------- END YOUR CODE

        xw.WriteWhitespace("\r\n");
        xw.WriteEndElement();
    }

    sw.Flush();

    // Now we read the first lines to modify them (normally we will
    // read three lines: the xml header, the <Table element and the
    // <!--Reserved space: comment)
    fs.Position = 0;

    var lines = new List<string>();

    using (var sr = new StreamReader(fs, sw.Encoding, false, 4096, true))
    {
        while (true)
        {
            string str = sr.ReadLine();
            lines.Add(str);

            if (str.StartsWith("<Table"))
            {
                // We read the next line, the comment line
                str = sr.ReadLine();
                lines.Add(str);
                break;
            }
        }
    }

    string strCount = XmlConvert.ToString(recCount);
    string strMaxRecLength = XmlConvert.ToString(maxRecLength);

    // We do some replaces for the tokens
    int oldLen = lines[lines.Count - 2].Length;
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$1\"", string.Format("=\"{0}\"", strCount));
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$2\"", string.Format("=\"{0}\"", strMaxRecLength));
    int newLen = lines[lines.Count - 2].Length;

    // Shrink the reserved space by exactly the number of characters the <Table line grew
    lines[lines.Count - 1] = lines[lines.Count - 1].Replace(":" + reservedSpace, ":" + new string('#', reservedSpace.Length - newLen + oldLen));

    // We move back to just after the UTF8/UTF16 preamble
    fs.Position = sw.Encoding.GetPreamble().Length;

    // And we rewrite the lines
    foreach (string str in lines)
    {
        sw.Write(str);
        sw.Write("\r\n");
    }
}

The slower .NET 3.5 way

In .NET 3.5 the StreamReader / StreamWriter want to close the underlying FileStream, so I have to reopen the file several times. This is a little slower.
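
The idea of closing the writer and reopening the file can be sketched as follows. This is not the answer's own .NET 3.5 listing but a minimal illustration of the "padded attribute" variant described earlier, written as C# top-level statements (C# 9 or later). The helper name PatchAttribute, the constant Width, the temporary file name, and the assumption that the Table start tag sits within the first 4 KB of a UTF-8 (or other ASCII-compatible) file are all hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

const int Width = 10; // int.MaxValue.ToString().Length

// Overwrites the space-padded value of attribute `name` in place.
// The replacement has exactly the same byte length, so nothing after it moves.
void PatchAttribute(string path, string name, int value)
{
    byte[] marker = Encoding.ASCII.GetBytes(name + "=\"" + new string(' ', Width) + "\"");
    byte[] patched = Encoding.ASCII.GetBytes(name + "=\"" + XmlConvert.ToString(value).PadRight(Width) + "\"");

    using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
    {
        // Assumption: the placeholder lies within the first 4 KB of the file.
        byte[] head = new byte[4096];
        int read = 0, n;
        while (read < head.Length && (n = fs.Read(head, read, head.Length - read)) > 0)
            read += n;

        for (int i = 0; i + marker.Length <= read; i++)
        {
            int j = 0;
            while (j < marker.Length && head[i + j] == marker[j]) j++;
            if (j == marker.Length)
            {
                fs.Position = i;                      // jump back to the placeholder
                fs.Write(patched, 0, patched.Length); // overwrite, never insert
                return;
            }
        }
    }
    throw new InvalidOperationException("Placeholder for '" + name + "' not found.");
}

string path = Path.Combine(Path.GetTempPath(), "reserved-demo.xml"); // hypothetical name
string blank = new string(' ', Width);
int recCount = 0, maxRecLength = 0;

// First pass: stream the records, leaving blank fixed-width attribute values.
using (var xw = XmlWriter.Create(path))
{
    xw.WriteStartElement("Table");
    xw.WriteAttributeString("recCount", blank);
    xw.WriteAttributeString("recLength", blank);
    for (int i = 0; i < 100; i++)
    {
        string text = "Some number: " + i;
        maxRecLength = Math.Max(maxRecLength, text.Length);
        xw.WriteElementString("Rec", text);
        recCount++;
    }
    xw.WriteEndElement();
}

// Second pass: the writer is closed, so reopen the file and fill in the values.
PatchAttribute(path, "recCount", recCount);
PatchAttribute(path, "recLength", maxRecLength);
```

The attribute values keep their trailing padding, so a consumer may have to trim them before parsing; that is the "XML-legal but ugly" trade-off of this variant compared with the reserved comment.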

Answer 1 (score: 1)

Try the following approach.

You can set default values for the attributes in an external XML schema.

When creating the XML document, do not write these attributes at all. That is:

int count = 5;
int length = 42;

var writerSettings = new XmlWriterSettings { Indent = true };
using (var writer = XmlWriter.Create("data.xml", writerSettings))
{
    writer.WriteStartElement("Table");

    for (int i = 1; i <= count; i++)
    {
        writer.WriteStartElement("Rec");
        writer.WriteAttributeString("recId", i.ToString());
        writer.WriteString("..");
        writer.WriteEndElement();
    }
    // The open Table element is closed automatically when the writer is disposed.
}

So the XML looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Table>
  <Rec recId="1">..</Rec>
  <Rec recId="2">..</Rec>
  <Rec recId="3">..</Rec>
  <Rec recId="4">..</Rec>
  <Rec recId="5">..</Rec>
</Table>

Now create an XML schema for this document that specifies the default values of the desired attributes.

string ns = "http://www.w3.org/2001/XMLSchema";
using (var writer = XmlWriter.Create("data.xsd", writerSettings))
{
    writer.WriteStartElement("xs", "schema", ns);

    writer.WriteStartElement("xs", "element", ns);
    writer.WriteAttributeString("name", "Table");

    writer.WriteStartElement("xs", "complexType", ns);
    writer.WriteStartElement("xs", "sequence", ns);

    writer.WriteStartElement("xs", "any", ns);
    writer.WriteAttributeString("processContents", "skip");
    writer.WriteAttributeString("maxOccurs", "unbounded");
    writer.WriteEndElement();

    writer.WriteEndElement();

    writer.WriteStartElement("xs", "attribute", ns);
    writer.WriteAttributeString("name", "recCount");
    writer.WriteAttributeString("default", count.ToString()); // <--
    writer.WriteEndElement();

    writer.WriteStartElement("xs", "attribute", ns);
    writer.WriteAttributeString("name", "recLength");
    writer.WriteAttributeString("default", length.ToString()); // <--
    writer.WriteEndElement();
}

Or, much more simply, create the schema like this:

XNamespace xs = "http://www.w3.org/2001/XMLSchema";
var schema = new XElement(xs + "schema",
    new XElement(xs + "element", new XAttribute("name", "Table"),
        new XElement(xs + "complexType",
            new XElement(xs + "sequence",
                new XElement(xs + "any",
                    new XAttribute("processContents", "skip"),
                    new XAttribute("maxOccurs", "unbounded")
                )
            ),
            new XElement(xs + "attribute",
                new XAttribute("name", "recCount"),
                new XAttribute("default", count) // <--
            ),
            new XElement(xs + "attribute",
                new XAttribute("name", "recLength"),
                new XAttribute("default", length) // <--
            )
        )
    )
);
schema.Save("data.xsd");

Note where the variables count and length are substituted: that is where your data goes.

The generated schema looks like this:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Table">
    <xs:complexType>
      <xs:sequence>
        <xs:any processContents="skip" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="recCount" default="5" />
      <xs:attribute name="recLength" default="42" />
    </xs:complexType>
  </xs:element>
</xs:schema>

Now, when reading the XML document, you must add this schema; the default attribute values are taken from it.

XElement xml;

var readerSettings = new XmlReaderSettings();
readerSettings.ValidationType = ValidationType.Schema; // <--
readerSettings.Schemas.Add("", "data.xsd"); // <--

using (var reader = XmlReader.Create("data.xml", readerSettings)) // <--
{
    xml = XElement.Load(reader);
}
xml.Save(Console.Out);
Console.WriteLine();

Result: the Table element is printed with recCount="5" and recLength="42", both taken from the schema defaults.
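
Since XElement.Load builds the whole tree in memory, a multi-gigabyte file would more plausibly be consumed with a streaming reader. The following is a minimal sketch of that, assuming the same schema shape as above; the inline xsd and xml strings stand in for data.xsd and data.xml, and it is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Inline stand-ins for data.xsd and data.xml (assumed sample content).
string xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Table'>
    <xs:complexType>
      <xs:sequence>
        <xs:any processContents='skip' maxOccurs='unbounded' />
      </xs:sequence>
      <xs:attribute name='recCount' default='5' />
      <xs:attribute name='recLength' default='42' />
    </xs:complexType>
  </xs:element>
</xs:schema>";
string xml = "<Table><Rec recId='1'>..</Rec><Rec recId='2'>..</Rec></Table>";

var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
settings.Schemas.Add("", XmlReader.Create(new StringReader(xsd)));

string recCount = null, recLength = null;
int recsSeen = 0;

// Forward-only pass: only the current node is held in memory.
using (var reader = XmlReader.Create(new StringReader(xml), settings))
{
    while (reader.Read())
    {
        if (reader.NodeType != XmlNodeType.Element) continue;

        if (reader.Name == "Table")
        {
            // Not present in the document; supplied by the validating reader
            // from the schema defaults.
            recCount = reader.GetAttribute("recCount");
            recLength = reader.GetAttribute("recLength");
        }
        else if (reader.Name == "Rec")
        {
            recsSeen++;
        }
    }
}

Console.WriteLine("recCount={0}, recLength={1}, Rec elements seen={2}", recCount, recLength, recsSeen);
```

The defaults only appear when the reader validates against this schema; any consumer that opens the XML file without it sees a Table element with no attributes, so the schema has to travel with the data.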

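Answer 2 (score: 0)

You could try loading the XML file into a DataSet, since that makes it easier to compute the attributes. Memory management is also handled by the DataSet layer. Why not give it a try and let us all know the result?

Answer 3 (score: 0)

I think the FileStream class will help you. Take a look at its Read and Write methods.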
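
As a small illustration of what FileStream offers here: Write at a given position replaces the bytes already there and does not shift the rest of the file. The sketch below assumes a tiny ASCII file and a hypothetical helper named OverwriteAt; it is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.IO;
using System.Text;

// Overwrites bytes starting at `offset`. Existing content after the written
// range stays where it is; the file only grows if the write runs past its end.
void OverwriteAt(string path, long offset, byte[] data)
{
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
    {
        fs.Seek(offset, SeekOrigin.Begin);
        fs.Write(data, 0, data.Length);
    }
}

string path = Path.Combine(Path.GetTempPath(), "overwrite-demo.txt"); // hypothetical name
File.WriteAllText(path, "<Table recCount=\"    \">", Encoding.ASCII);
long lengthBefore = new FileInfo(path).Length;

// The four reserved characters start right after the opening quote.
long offset = "<Table recCount=\"".Length;
OverwriteAt(path, offset, Encoding.ASCII.GetBytes("1000"));

Console.WriteLine(File.ReadAllText(path));                    // <Table recCount="1000">
Console.WriteLine(new FileInfo(path).Length == lengthBefore); // True
```

This only works when the new value fits the space already present in the file, which is why the placeholder has to be written with a fixed, sufficiently large width up front.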