Find the shape of a saved numpy array (.npy or .npz) without loading it into memory

Date: 2016-03-14 14:51:25

Tags: python numpy io

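I have a huge compressed numpy array saved to disk (about 20GB in memory, much less when compressed). I need to know the shape of this array, but I don't have enough free memory to load it. How can I find the shape of a numpy array without loading it into memory?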

2 answers:

Answer 0 (score: 5)

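This does it. It opens each .npy member of the archive and reads only its header, so the array data is never decompressed: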

import numpy as np
import zipfile

def npz_headers(npz):
    """Takes a path to an .npz file, which is a Zip archive of .npy files.
    Generates a sequence of (name, shape, np.dtype).
    """
    with zipfile.ZipFile(npz) as archive:
        for name in archive.namelist():
            # Each saved array is a separate "<name>.npy" member.
            if not name.endswith('.npy'):
                continue

            # ZipFile.open decompresses lazily, so only the header bytes are read.
            with archive.open(name) as npy:
                version = np.lib.format.read_magic(npy)
                # _read_array_header is a private numpy helper (may change between versions).
                shape, fortran, dtype = np.lib.format._read_array_header(npy, version)
            yield name[:-4], shape, dtype
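If you'd rather not depend on the private `np.lib.format._read_array_header`, a sketch of the same idea can dispatch on the header version to the public `read_array_header_1_0` and `read_array_header_2_0` functions. `npz_shapes` is a hypothetical helper name, and this sketch simply rejects format 3.0 headers (used for UTF-8 structured field names) instead of handling them.

```python
import zipfile

import numpy as np


def npz_shapes(path):
    """Map each array name in an .npz archive to (shape, dtype).

    Works for both np.savez and np.savez_compressed output: ZipFile.open
    decompresses lazily, so only each member's header is read.
    """
    # Public header readers, keyed by (major, minor) format version.
    readers = {
        (1, 0): np.lib.format.read_array_header_1_0,
        (2, 0): np.lib.format.read_array_header_2_0,
    }
    result = {}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.endswith(".npy"):
                continue
            with archive.open(name) as npy:
                version = np.lib.format.read_magic(npy)
                if version not in readers:
                    raise ValueError(f"{name}: unsupported .npy version {version}")
                shape, _fortran, dtype = readers[version](npy)
            result[name[:-4]] = (shape, dtype)
    return result
```

For example, `npz_shapes("arrays.npz")["a"]` gives the shape and dtype of the array saved under the keyword `a`.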

Answer 1 (score: 3)

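Opening the file with `mmap_mode` might do the trick. This applies to a plain .npy file; for an .npz archive, `np.load` ignores `mmap_mode`. From the `np.load` documentation for that parameter: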

    If not None, then memory-map the file, using the given mode
    (see `numpy.memmap` for a detailed description of the modes).
    A memory-mapped array is kept on disk. However, it can be accessed
    and sliced like any ndarray.  Memory mapping is especially useful for
    accessing small fragments of large files without reading the entire
    file into memory.
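A minimal sketch of the `mmap_mode` route for a plain .npy file, assuming read-only access (`mmap_mode="r"`) is enough; `npy_shape_via_mmap` is a hypothetical helper name. Only the header is parsed eagerly, and the data stays on disk unless you index into the array.

```python
import numpy as np


def npy_shape_via_mmap(path):
    """Return (shape, dtype) of a .npy file without reading its data.

    np.load with mmap_mode returns a numpy.memmap backed by the file, so
    the data pages are only read if they are actually accessed. Plain
    .npy files only: for .npz archives np.load ignores mmap_mode.
    """
    arr = np.load(path, mmap_mode="r")
    # Return plain metadata so the memmap (and its file handle) can be released.
    return arr.shape, arr.dtype
```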

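It is also possible to read the header block without reading the data buffer, but that requires digging into the underlying `lib/npyio/format` code. I explored this in a recent SO question about storing multiple arrays in one file (and reading them back):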

https://stackoverflow.com/a/35752728/901925
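To make "reading the header block" concrete, here is a minimal sketch for a plain .npy file written directly against the documented NPY layout: a 6-byte magic string `\x93NUMPY`, one major and one minor version byte, a little-endian header length (2 bytes in version 1.0, 4 bytes in 2.0 and 3.0), then a Python dict literal with `descr`, `fortran_order` and `shape`. It uses only the standard library; `read_npy_header` is a hypothetical name, and it returns the raw `descr` string rather than an `np.dtype`.

```python
import ast
import struct

NPY_MAGIC = b"\x93NUMPY"


def read_npy_header(path):
    """Return (shape, fortran_order, descr) from a .npy file header.

    Only the header bytes are read; the data buffer is never touched.
    """
    with open(path, "rb") as f:
        if f.read(6) != NPY_MAGIC:
            raise ValueError(f"{path} is not a .npy file")
        major, minor = f.read(2)  # iterating bytes yields two ints
        if major == 1:
            (header_len,) = struct.unpack("<H", f.read(2))
            encoding = "latin1"
        elif major in (2, 3):
            (header_len,) = struct.unpack("<I", f.read(4))
            encoding = "latin1" if major == 2 else "utf8"
        else:
            raise ValueError(f"unsupported .npy format version {major}.{minor}")
        # The header is a space-padded, newline-terminated dict literal.
        header = ast.literal_eval(f.read(header_len).decode(encoding))
    return header["shape"], header["fortran_order"], header["descr"]
```

In practice the `np.lib.format` helpers do the same work and also convert `descr` to a dtype; the hand-rolled version just shows why this costs only a few bytes of I/O regardless of the array's size.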