Question

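I have many HDF5 files, each containing a large dataset of the same shape. I want to write all of these datasets into a single file, with an additional dimension that serves as the index of the respective dataset. The following code does exactly what I want: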

import h5py

# input_files, shape and dtype are assumed to be defined earlier
with h5py.File("merged.mat", "w") as output_file:
    new_data = output_file.create_dataset("data", shape + (len(input_files),), dtype=dtype)
    for index, file in enumerate(input_files):
        with h5py.File(file, "r") as m_file:
            # Copy this file's dataset into slot `index` of the last axis
            new_data[:,:,:,:,index] = m_file['data']
            output_file.flush()

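However, the step that assigns m_file['data'] into the output dataset takes a very long time (on the order of minutes for an input file of 1 GB). Is there any way to speed this process up? I know there are some low-level functions such as h5.h5o.copy, but I don't know how to apply them here.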
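One direction that may be worth trying, shown here only as an unmeasured sketch: with the default contiguous layout, writing along the last axis scatters each input over the whole output dataset, so aligning the chunk layout with the write pattern might help. The function name merge_datasets and the chunk shape shape + (1,) are assumptions for illustration; the chunk shape also assumes each input dataset is smaller than the 4 GiB limit HDF5 places on a single chunk, and such large chunks are far from the usual chunk-size guidance. read_direct is used to read each input into a reusable NumPy buffer.

```python
import h5py
import numpy as np


def merge_datasets(input_files, output_path, name="data"):
    """Stack the dataset `name` from each input file along a new last axis."""
    # Take shape and dtype from the first file; all inputs are assumed to match.
    with h5py.File(input_files[0], "r") as first:
        shape, dtype = first[name].shape, first[name].dtype

    with h5py.File(output_path, "w") as output_file:
        # Assumption: one chunk per input dataset (extent 1 on the last axis),
        # so each assignment fills whole chunks instead of strided regions.
        merged = output_file.create_dataset(
            name, shape + (len(input_files),), dtype=dtype, chunks=shape + (1,)
        )
        buffer = np.empty(shape, dtype=dtype)  # reused for every input file
        for index, path in enumerate(input_files):
            with h5py.File(path, "r") as m_file:
                # Read the whole input dataset into the preallocated buffer.
                m_file[name].read_direct(buffer)
            merged[..., index] = buffer
    return output_path
```

Calling merge_datasets(input_files, "merged.mat") would produce the same "data" layout as the loop above. Whether it is actually faster depends on the storage and on how the merged file is read afterwards, so it would need to be timed on real data.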

h5py: speeding up the concatenation of HDF5 datasets

0 answers: