Python: traversing subdirectories to find file pairs

Time: 2018-07-28 12:12:50

Tags: python path directory

I have a deep subfolder structure like this:

a/b/file1.txt
a/b/file1.doc
a/b/file2.txt
a/b/file2.doc
a/c/file3.txt
a/c/file3.doc
a/c/d/file4.txt
a/c/d/file4.doc

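I want to extract all pairs of .txt and .doc files (for example, as a list of tuples) where the file names are the same and only the file type differs.

The best approach I have come up with so far does not seem very efficient: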

import os

pairs = []  # renamed from "files" so the os.walk loop variable does not overwrite it
for root, dirs, files in os.walk(path):
    for filename in files:
        if os.path.isdir(os.path.join(os.path.abspath("."), filename)):
            file_list = os.listdir(filename)
            file_list_copy = file_list.copy()
            # for each file in file_list of type .txt,
            # find the .doc of the same name in file_list_copy,
            # then add the two to a tuple and append it to pairs

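1 answer:

Answer 0 (score: 0)

Probably not the most efficient, but it works:

Use a shell command to move each file type into its own folder (run it once for the txt extension and once for the doc extension, to create 2 folders):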

find /path-to-files-root/ -type f -name '*.txt' -exec mv -i {} /new-path-to-files/txt/ \;
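For anyone who would rather stay in Python for this step, here is a rough sketch of the same move using os.walk and shutil.move. The function name move_by_extension and its parameters are made up for illustration and are not part of the answer. Unlike mv -i, which prompts on a name clash, this sketch simply skips a file whose name already exists in the destination.

```python
import os
import shutil


def move_by_extension(src_root, dest_dir, ext):
    """Move every file under src_root whose name ends in ext into dest_dir.

    Hypothetical helper: returns the list of file names that were moved.
    """
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for folder, _dirs, files in os.walk(src_root):
        for name in files:
            if not name.endswith(ext):
                continue
            target = os.path.join(dest_dir, name)
            if os.path.exists(target):
                # Skip rather than overwrite (mv -i would prompt here).
                continue
            shutil.move(os.path.join(folder, name), target)
            moved.append(name)
    return moved
```

Calling it once with ".txt" and once with ".doc" (each with its own destination folder) would mirror running the find command twice. Note that flattening the tree this way loses the information about which subfolder a file came from, so files with the same name in different subfolders collide.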

After moving the files, I ran:

import fnmatch
import os
from os.path import isfile, join

def get_all_files(path, pattern):
    # see https://stackoverflow.com/questions/17282887/getting-files-with-same-name-irrespective-of-their-extension
    # collect the names of all files under path that match the pattern
    datafiles = []
    for root, dirs, files in os.walk(path):
        for file in fnmatch.filter(files, pattern):
            datafiles.append(file)
    return datafiles

# txt_path and doc_path are the two folders created by the shell command
txt_files = [f for f in os.listdir(txt_path) if isfile(join(txt_path, f))]
doc_files = [f for f in os.listdir(doc_path) if isfile(join(doc_path, f))]
for txt_file in txt_files:
    filename = os.path.splitext(txt_file)[0]
    # look for the .doc file with the same base name
    # (stored in "matches" so the doc_files list above is not overwritten)
    matches = get_all_files(doc_path, '{0}.doc'.format(filename))
    if len(matches) == 1:
        doc_file = matches[0]
        # do something with txt_file and doc_file
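
If moving the files is not an option, the pairing can probably also be done in a single os.walk pass over the original tree. The sketch below is not the method used above. It assumes that a pair always lives in the same folder, as in the example structure in the question, and the name find_pairs and its parameters are made up for illustration.

```python
import os


def find_pairs(root_dir, ext_a=".txt", ext_b=".doc"):
    """Return (path_a, path_b) tuples for files in the same folder
    that share a base name and differ only in extension.

    Hypothetical helper; assumes both files of a pair sit in one folder.
    """
    pairs = []
    for folder, _dirs, files in os.walk(root_dir):
        names = set(files)  # set membership keeps the partner lookup cheap
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext == ext_a and stem + ext_b in names:
                pairs.append((os.path.join(folder, name),
                              os.path.join(folder, stem + ext_b)))
    return pairs
```

Something like find_pairs("a") would then give the list of tuples the question asks for, with each tuple holding the full paths of the .txt file and its matching .doc file. A .txt file without a partner is simply left out.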