Question

Suppose I have a file format consisting of a series of objects, where each object has a header in the following format:

public struct FileObjectHeader {
    //The type of the object (not important for this question, but it exists)
    public byte TypeID;
    //The length of the object's data, which DOES NOT include the size of the header.
    public UInt16 Length;
}

followed by data of the specified length.

I first read this data by building a list of the locations of each object together with its header:

struct FileObjectReference {
    public FileObjectHeader Header;
    public long Location;
}

public List<FileObject> ReadObjects(Stream s) {
    List<FileObjectReference> objectRefs = new List<FileObjectReference>();

    try {
        while (true) {
            FileObjectHeader header = ReadObjectHeader(s); 
            //The above advances the stream by the size of the header as well.
            FileObjectReference reference = new FileObjectReference() { Header = header, Location = s.Position };
            objectRefs.Add(reference);
            //Advance the stream to the next object's header.
            s.Seek(header.Length, SeekOrigin.Current);
        }
    } catch (EndOfStreamException) {
        //Do nothing as this is an expected case
    }

    //Now we'd read all of the objects that we've previously located.
    //This code isn't too important for the question but I'm including it for reference.
    List<FileObject> objects = new List<FileObject>();
    foreach (var reference in objectRefs) {
        s.Seek(reference.Location, SeekOrigin.Begin);

        objects.Add(ReadObject(reference.Header, s));
    }

    return objects;
}

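Some notes:

The ReadObjectHeader and ReadObject methods will throw an EndOfStreamException if they cannot read all of the data they need (i.e., if they reach the end of the stream).
I'm using Seek here because objects can reference other objects, and there will be logic to ensure that parent objects are loaded before their children (the file format doesn't guarantee that a parent comes before its children). I've left that logic out of the sample code above because it would complicate the example without improving it.
In most cases this will probably be a read-only FileStream, but I can't guarantee that. For this question, though, FileStreams are my main concern.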
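
For reference, a helper with the behavior described in the first note might look like the sketch below. This is only an illustration, not the actual implementation: the 3-byte layout (1-byte TypeID followed by a 2-byte Length) follows the struct above, but the little-endian byte order and the HeaderReader class name are assumptions.

```csharp
using System;
using System.IO;

public struct FileObjectHeader {
    public byte TypeID;
    public UInt16 Length;
}

public static class HeaderReader {
    // Assumed on-disk layout: 1 byte TypeID, then a 2-byte little-endian Length.
    public const int HeaderSize = 3;

    public static FileObjectHeader ReadObjectHeader(Stream s) {
        byte[] raw = new byte[HeaderSize];
        int total = 0;
        // Stream.Read may return fewer bytes than requested, so loop until the
        // header is complete or the stream reports end-of-stream (returns 0).
        while (total < HeaderSize) {
            int n = s.Read(raw, total, HeaderSize - total);
            if (n == 0) {
                throw new EndOfStreamException("Stream ended inside an object header.");
            }
            total += n;
        }
        return new FileObjectHeader {
            TypeID = raw[0],
            Length = (UInt16)(raw[1] | (raw[2] << 8))
        };
    }

    public static void Main() {
        // One header: TypeID = 7, Length = 0x0102 = 258.
        var ms = new MemoryStream(new byte[] { 7, 0x02, 0x01 });
        FileObjectHeader h = ReadObjectHeader(ms);
        Console.WriteLine($"{h.TypeID} {h.Length}"); // prints "7 258"
    }
}
```

The point of the loop is that a return value of 0 from Read is the only end-of-stream signal; the helper turns that into the exception the indexing loop expects.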

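My question is:

Since I'm using FileStream.Seek, will this cause problems by seeking past the end of the stream and extending the file indefinitely? According to the documentation: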

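You can seek to any location beyond the length of the stream. When you seek beyond the length of the file, the file size grows. In Windows NT and later versions, data added to the end of the file is set to zero. In Windows 98 or earlier versions, data added to the end of the file is not set to zero, which means that previously deleted data is visible to the stream.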

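The way that is worded, it sounds as though the file could be extended without my intending it, so that it keeps growing each time the 3-byte header is read. In practice that doesn't seem to happen, but I'd like to confirm that it won't.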

Answer 1

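The documentation for FileStream.Read(), however, says:

Return Value
  Type: System.Int32
  The total number of bytes read into the buffer. This might be less than the number of bytes requested if that number of bytes are not currently available, or zero if the end of the stream is reached.

So I strongly suspect (but you should verify this yourself) that seeking past the end only matters if you write to the file afterwards. That makes sense: if you know you will need the space, you can reserve it without actually writing anything (which would be slow).

When reading, however, my guess is that you should get 0 back and no data will be read. The file will not be extended either.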
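
One way to check that guess is the small experiment below. It is a sketch under assumptions, not a proof for every platform: the SeekPastEndCheck class name and the temp-file setup are made up for illustration, and the stream is deliberately opened read/write to cover the case the question worries about.

```csharp
using System;
using System.IO;

public static class SeekPastEndCheck {
    // Seeks far past the end of a 5-byte file, attempts a read, and reports
    // the bytes read plus the file length before and after.
    public static (int BytesRead, long Before, long After) Run() {
        string path = Path.GetTempFileName();
        try {
            File.WriteAllBytes(path, new byte[] { 1, 2, 3, 4, 5 });
            long before = new FileInfo(path).Length;
            int bytesRead;
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite)) {
                fs.Seek(1000, SeekOrigin.Begin);          // position is now past the end
                bytesRead = fs.Read(new byte[10], 0, 10); // nothing to read there
            }
            long after = new FileInfo(path).Length;
            return (bytesRead, before, after);
        } finally {
            File.Delete(path);
        }
    }

    public static void Main() {
        var r = Run();
        Console.WriteLine($"{r.BytesRead} {r.Before} {r.After}"); // prints "0 5 5"
    }
}
```

Read reports 0 and the length stays at 5, which matches the suspicion that seeking alone does not grow the file.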

Answer 2

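To answer your question simply: the following code will not make your file grow. It will, however, throw a new EndOfStreamException(). Only writing at a position beyond the end of the file makes the file grow. When the file grows, the data between the current end of the file and the start of the write is filled with zeros (unless you have the sparse flag enabled, in which case it is marked as unallocated).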

using (var fileStream = new FileStream("f", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None))
{
    var buffer = new byte[10];
    fileStream.Seek(10, SeekOrigin.Begin);
    var bytesRead = fileStream.Read(buffer, 0, 10);
    if (bytesRead == 0) {
        throw new EndOfStreamException();
    }
}
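
The write case can be seen with a similar experiment. This is a minimal sketch, with the WritePastEndCheck class name and the temp file being assumptions made for illustration: it writes a single byte three positions past the end of a 3-byte file and then dumps the result.

```csharp
using System;
using System.IO;

public static class WritePastEndCheck {
    // Writes one byte at offset 6 of a 3-byte file and returns the file contents.
    public static byte[] Run() {
        string path = Path.GetTempFileName();
        try {
            File.WriteAllBytes(path, new byte[] { 1, 2, 3 });
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite)) {
                fs.Seek(6, SeekOrigin.Begin); // seeking alone changes nothing on disk
                fs.WriteByte(9);              // the write is what extends the file
            }
            return File.ReadAllBytes(path);
        } finally {
            File.Delete(path);
        }
    }

    public static void Main() {
        Console.WriteLine(string.Join(",", Run())); // prints "1,2,3,0,0,0,9"
    }
}
```

The gap between the old end of the file and the written byte reads back as zeros, and the file is now 7 bytes long.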

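Since you are reading/writing binary structured data, I'd suggest three things:

Your binary structured data should fit a whole number of elements into a disk block. On most systems that is 4096 bytes (MSDN). Doing so will allow the CLR to read data directly from the file system cache into your buffer.

Use a MemoryMappedFile and unsafe pointers to access your data (if your app only runs on Windows). You could also use a ViewAccessor, but you may find it slower than doing the caching yourself because of the extra copies incurred by interop. If you go the unsafe route, here is code that will fill your structures quickly: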

using System.Runtime.InteropServices;

internal static class Native
{
    [DllImport("kernel32.dll", EntryPoint = "CopyMemory", SetLastError = false)]
    private static unsafe extern void CopyMemory(void *dest, void *src, int count);

    // Copies the raw bytes of the struct array into a new byte array.
    private static unsafe byte[] Serialize(TestStruct[] index)
    {
        var buffer = new byte[Marshal.SizeOf(typeof(TestStruct)) * index.Length];
        fixed (void* src = &index[0])
        {
            fixed (void* dest = &buffer[0])
            {
                // Destination comes first: copy the structs into the buffer.
                CopyMemory(dest, src, buffer.Length);
            }
        }

        return buffer;
    }
}
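
For comparison, the safe ViewAccessor route mentioned above could look roughly like this. It is a sketch under assumptions rather than a recommendation: PackedHeader and MappedHeaderReader are hypothetical names, the struct is packed to 3 bytes to mirror the question's header, and the multi-byte field is read in the machine's native byte order.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

// Hypothetical packed mirror of the question's header: 1 + 2 bytes, no padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedHeader {
    public byte TypeID;
    public ushort Length;
}

public static class MappedHeaderReader {
    // Maps the whole file and reads one header at the given byte offset.
    public static PackedHeader ReadHeaderAt(string path, long offset) {
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read)) {
            view.Read(offset, out PackedHeader header);
            return header;
        }
    }

    public static void Main() {
        string path = Path.GetTempFileName();
        try {
            // TypeID = 7, Length bytes 0x02 0x01 (258 on a little-endian machine).
            File.WriteAllBytes(path, new byte[] { 7, 0x02, 0x01, 0xAA });
            PackedHeader h = ReadHeaderAt(path, 0);
            Console.WriteLine($"{h.TypeID} {h.Length}");
        } finally {
            File.Delete(path);
        }
    }
}
```

Mapping the file for every header, as this sketch does, would be wasteful in real code; a single accessor kept open for the whole indexing pass is the more likely design.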

Is it safe to use FileStream.Seek in this way?
