Recursively traversing directories in Python and filtering files

Time: 2018-08-30 16:16:49

Tags: python-3.x lambda subdirectory os.path

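I want to recursively search the "Projects" directory for "Feedback Report" folders. When such a folder has no further subdirectories, I want to process its files in a specific way.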
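The "matching folder with no subdirectories" test maps naturally onto `os.walk`, which yields `(dirpath, dirnames, filenames)` for every directory: a folder is a leaf exactly when `dirnames` is empty. A minimal sketch, assuming folder names contain "feedback" followed by "report" in any letter case (that pattern and the helper name `find_leaf_report_dirs` are hypothetical):

```python
import os
import re

# Assumed naming convention: "feedback" followed somewhere by "report", any case.
REPORT_DIR = re.compile(r"feedback.*report", re.IGNORECASE)

def find_leaf_report_dirs(root):
    """Yield every directory under root whose name matches REPORT_DIR
    and which contains no subdirectories."""
    for dirpath, dirnames, filenames in os.walk(root):
        name = os.path.basename(dirpath)
        # dirnames is empty only for leaf directories
        if REPORT_DIR.search(name) and not dirnames:
            yield dirpath
```

Because it is a generator, results stream out as the tree is walked instead of being collected into an intermediate hierarchy first. Note that `os.walk` silently skips directories it cannot list unless an `onerror` callback is passed.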

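Once I reach a target directory, I want to find the latest feedback report .xlsx file in it (the directory will also contain many previous versions).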
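Picking the latest version only requires the file with the largest modification time, so `max()` with a key is enough; there is no need to sort and then reverse. A sketch using `pathlib`, where the helper name `newest_report` and the default wildcard `*eedback*port*.xlsx` are assumptions for illustration:

```python
from pathlib import Path

def newest_report(directory, pattern="*eedback*port*.xlsx"):
    """Return the most recently modified file in `directory` whose name
    matches the glob `pattern`, or None if nothing matches."""
    candidates = [p for p in Path(directory).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # max() by modification time picks the newest without sorting the list
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

`Path.glob` matches case-insensitively on Windows but case-sensitively on POSIX systems, which is one reason the pattern leaves out the leading "F".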

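The data is very large and the directory structure is inconsistent. I believe the algorithm below should get me close to the behavior I want, but I'm not sure. I have already tried several sloppy scripts that convert the tree into a JSON path hierarchy and then parse that, but the inconsistencies made the code bulky and unreadable.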

The file paths are important.

The algorithm I want to implement is:

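import os
import re
from datetime import datetime

dictionary_of_files_paths = {}
# the wildcard *eedback*port* written as a regular expression
REPORT_PATTERN = re.compile(r"eedback.*port")
# assumed cutoff: only files modified after 1 Jan 2016
CUTOFF = datetime(2016, 1, 1).timestamp()

def recursive_traverse(path):
    # base case: plain files are not traversed further
    if not os.path.isdir(path):
        return
    contents = os.listdir(path)
    has_subdir = any(os.path.isdir(os.path.join(path, c)) for c in contents)
    # a matching folder with no subdirectories is a target: process and stop
    if REPORT_PATTERN.search(os.path.basename(path)) and not has_subdir:
        process(path, contents)
        return
    for name in contents:
        recursive_traverse(os.path.join(path, name))

def process(path, files):
    full_paths = [os.path.join(path, f) for f in files]
    # keep only .xlsx files that have *eedback*port* in the name
    full_paths = [f for f in full_paths
                  if f.lower().endswith(".xlsx")
                  and REPORT_PATTERN.search(os.path.basename(f))]
    # keep only files modified after the cutoff
    full_paths = [f for f in full_paths if os.path.getmtime(f) > CUTOFF]
    if not full_paths:
        return
    # newest first
    full_paths.sort(key=os.path.getmtime, reverse=True)
    dictionary_of_files_paths[path] = full_paths[0]

recursive_traverse("T:\\Something\\Something\\Projects")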
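The pattern `*eedback*port*` is a shell-style wildcard, not a regular expression: passing it to `re.match` raises `re.error` because a leading `*` has nothing to repeat. The standard library's `fnmatch` module understands wildcard syntax directly. A short comparison of the two options, where the names `matches_wildcard` and `matches_regex` are hypothetical:

```python
import fnmatch
import re

WILDCARD = "*eedback*port*"

def matches_wildcard(name):
    # fnmatch.fnmatch applies os.path.normcase to both arguments,
    # so it is case-insensitive on Windows and case-sensitive on POSIX
    return fnmatch.fnmatch(name, WILDCARD)

# Hand-written regex equivalent; re.IGNORECASE also accepts "FEEDBACK_REPORT"
WILDCARD_RE = re.compile(r".*eedback.*port.*", re.IGNORECASE)

def matches_regex(name):
    return WILDCARD_RE.fullmatch(name) is not None
```

`fnmatch.translate(WILDCARD)` returns the regex that `fnmatch` builds internally, which is a convenient way to check a hand-written equivalent.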

Before actually implementing this, I'd like some guidance and confirmation that the approach is correct.

I also got another snippet for walking the path hierarchy from Stack Overflow:

import errno
import os

def recursive_traverse(path):
    try:
        for contents in os.listdir(path):
            recursive_traverse(os.path.join(path, contents))
    except OSError as e:
        if e.errno != errno.ENOTDIR:
            raise
        # path is a file, not a directory: process it here
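Catching `ENOTDIR` works, but it raises and handles an exception for every file in the tree. `os.scandir` returns `DirEntry` objects whose `is_dir()` and `is_file()` methods can usually answer from the directory listing itself, avoiding an extra system call per entry, which helps on a large share. A sketch of the same recursion under that approach (`walk_files` is a hypothetical name):

```python
import os

def walk_files(path):
    """Recursively yield the path of every regular file under `path`."""
    with os.scandir(path) as it:
        for entry in it:
            # follow_symlinks=False avoids looping through symlinked directories
            if entry.is_dir(follow_symlinks=False):
                yield from walk_files(entry.path)
            elif entry.is_file():
                yield entry.path
```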

1 Answer

Answer 0 (score: 0)

Use pathlib and glob:

Test directory structure:

.
├── Untitled.ipynb
├── bar
│   └── foo
│       └── file2.txt
└── foo
    ├── bar
    │   └── file3.txt
    ├── foo
    │   └── file1.txt
    └── test4.txt

Code:

from pathlib import Path
here = Path('.')
for subpath in here.glob('**/foo/'):
    if any(child.is_dir() for child in subpath.iterdir()):
        continue # Skip the current path if it has child directories
    for file in subpath.iterdir():
        print(file.name)
        # process your files here according to whatever logic you need

Output:

file1.txt
file2.txt
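The answer's loop can be extended toward the question's goal: match report folders by name, skip any folder that has subdirectories, and record the newest `.xlsx` per folder in a dictionary keyed by folder path. A sketch, assuming folder names contain both "feedback" and "report" in any case; `latest_reports` is a hypothetical helper name:

```python
from pathlib import Path

def latest_reports(root):
    """Map each leaf 'feedback report' folder under root to its newest .xlsx."""
    result = {}
    for folder in Path(root).rglob("*"):
        if not folder.is_dir():
            continue
        name = folder.name.lower()
        if "feedback" not in name or "report" not in name:
            continue
        children = list(folder.iterdir())
        # skip folders with child directories, as in the answer's loop
        if any(child.is_dir() for child in children):
            continue
        xlsx = [c for c in children if c.suffix.lower() == ".xlsx"]
        if xlsx:
            # newest file by modification time
            result[folder] = max(xlsx, key=lambda p: p.stat().st_mtime)
    return result
```

`rglob("*")` visits every file and directory under the root, so on a very large share it may be noticeably slower than a walk that prunes directories early.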