Tarfile - open a tar file inside a tar archive and extract files from it - python

Time: 2019-08-09 17:44:43

Tags: python tar archive tarfile

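I have several tar archives... some of them contain another tar archive. I wrote some code to extract specific files from the archives. It works so far, but when the script extracts a file from a nested archive, the extracted file is still identified as an archive, and when I try to open it manually, I am told the archive is corrupted. When I extract the same file by hand, it is valid.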

# Files in one folder, without checking for existing files (works reliably!)
import tarfile
import os, os.path
from pathlib import Path

#time 
time = "2350"

#working dir
windows = "C:/Users/Elisabeth/Desktop"
ubuntu = "/home/elisabeth/Dokumente/master/radolan_data/raw"
download_directory = "/radolan_downloads" #Directory where files will be saved

os.chdir(ubuntu + download_directory) 

#Actual Working Dir
print("Actual Working dir:", os.getcwd()) 

#All files inside Working dir
files = os.listdir()
print("Files inside this folder: ", len(files))


# Collect the tar archive names, loop over them, and extract only the members matching the specified time
tar_files = [x for x in files if ".tar.gz" in x]
print("Tar files inside this folder: ", len(tar_files))
for file in tar_files:
    print("Open tar: ", file)
    tar = tarfile.open(file)
    names = tar.getnames()
    print(len(names), "files are inside the tar")
    names_f = [x for x in names if time in x]
    if len(names) == 1:
        tar_final = tarfile.open(fileobj=tar.extractfile(names[0]))
        names_final = tar_final.getnames()
        print(len(names_final), "files inside second tar")
        names_f_final = [x for x in names_final if time in x]
        tar.extractall(members=[x for x in tar_final.getmembers() if x.name in names_f_final])
        print("Finish with extraction of files: ", names_f_final)
        continue
    else:
        tar.extractall(members=[x for x in tar.getmembers() if x.name in names_f])
        print("Finish with extraction of files: ", names_f)
        continue

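The else branch works fine: it extracts the correct file, and that file is a readable binary. The if branch also extracts a file, and the file name is what it should be, but the file is reported as an archive type, and when I open it with an archive manager it says the archive is corrupted. I cannot upload the tar archive because it is several GB in size. Could it be because I open the tar object in memory in the if branch?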
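One possible explanation, offered as an assumption rather than a confirmed diagnosis: in the if branch the members come from `tar_final` (the inner archive), but `extractall` is called on `tar` (the outer archive). Each `TarInfo` stores a data offset relative to the archive it was read from, so the outer archive would write a file with the right name and size but fill it with bytes read from the wrong position. A minimal sketch of extracting through the inner archive instead is shown here; the function name `extract_matching` and its parameters `needle` and `dest` are hypothetical, and the sketch assumes that an archive with exactly one member always holds a nested tar.

```python
import tarfile


def extract_matching(archive_path, needle, dest="."):
    """Extract members whose name contains `needle` into `dest`.

    Assumption: an archive with exactly one regular-file member holds a
    nested tar archive, mirroring the `len(names) == 1` check in the question.
    Returns the names of the extracted members.
    """
    with tarfile.open(archive_path) as outer:
        members = outer.getmembers()
        if len(members) == 1 and members[0].isfile():
            # Open the nested archive directly from the outer archive's stream.
            inner_stream = outer.extractfile(members[0])
            with tarfile.open(fileobj=inner_stream) as inner:
                wanted = [m for m in inner.getmembers() if needle in m.name]
                # Extract via the archive the members belong to (inner, not outer).
                inner.extractall(path=dest, members=wanted)
                return [m.name for m in wanted]
        wanted = [m for m in members if needle in m.name]
        outer.extractall(path=dest, members=wanted)
        return [m.name for m in wanted]
```

With the variables from the question, a call would look like `extract_matching(file, time)` inside the loop over `tar_files`. If a single-member archive holds an ordinary file instead of a tar, `tarfile.open` raises `tarfile.ReadError`, which this sketch does not handle.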

0 answers:

No answers