Question

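I want to recursively search a "Projects" directory for "Feedback Report" folders, and when such a folder has no further subdirectories, I want to process its files in a specific way.

Once I reach a target directory, I want to find the latest feedback report.xlsx in it (the directory will contain many earlier versions).

The data is really large, and the directory structure is inconsistent. I believe the following algorithm should get me close to the desired behavior, but I am still not sure. I have tried several sloppy scripts that convert the tree into a JSON path hierarchy and then parse that, but the inconsistencies made the code huge and unreadable.

The path of each file is important.

The algorithm I want to implement is: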

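import fnmatch
import os
from datetime import datetime

dictionary_of_files_paths = {}
PATTERN = "*eedback*port*"

def recursive_traverse(path):
    # Base case: a plain file has nothing to descend into
    if not os.path.isdir(path):
        return
    entries = [os.path.join(path, name) for name in os.listdir(path)]
    subdirs = [e for e in entries if os.path.isdir(e)]
    # Target: a directory named like *eedback*port* with no subdirectories
    if fnmatch.fnmatch(os.path.basename(path), PATTERN) and not subdirs:
        process(path, entries)
        return
    for subdir in subdirs:
        recursive_traverse(subdir)

def process(path, files):
    # Keep only .xlsx files named like *eedback*port*, modified after 2016
    files = [f for f in files
             if f.lower().endswith(".xlsx")
             and fnmatch.fnmatch(os.path.basename(f), PATTERN)
             and datetime.fromtimestamp(os.path.getmtime(f)).year > 2016]
    # Newest first, so files[0] is the latest report
    files.sort(key=os.path.getmtime, reverse=True)
    if files:
        dictionary_of_files_paths[path] = files[0]

recursive_traverse("T:\\Something\\Something\\Projects")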

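Before actually implementing this, I need some guidance and want to verify that it is correct.

I also found another snippet on Stack Overflow for walking the path hierarchy: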

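import errno
import os

def recursive_traverse(path):
    try:
        for contents in os.listdir(path):
            recursive_traverse(os.path.join(path, contents))
    except OSError as e:
        # ENOTDIR means path is not a directory; re-raise anything else
        if e.errno != errno.ENOTDIR:
            raise
        # path is a file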

Answer 1

Use pathlib and glob.

Test directory structure:

.
├── Untitled.ipynb
├── bar
│   └── foo
│       └── file2.txt
└── foo
    ├── bar
    │   └── file3.txt
    ├── foo
    │   └── file1.txt
    └── test4.txt

Code:

from pathlib import Path
here = Path('.')
for subpath in here.glob('**/foo/'):
    if any(child.is_dir() for child in subpath.iterdir()):
        continue # Skip the current path if it has child directories
    for file in subpath.iterdir():
        print(file.name)
        # process your files here according to whatever logic you need

Output:

file1.txt
file2.txt
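
The same pattern can be adapted to the feedback report case from the question. What follows is only a minimal sketch under some assumptions: the function name `latest_reports` and its parameters are hypothetical, "latest" is taken to mean the most recent modification time, and the question's "> 2016" filter is read as "modified in 2017 or later". Adjust these to your data.

```python
from datetime import datetime
from pathlib import Path


def latest_reports(root, dir_pattern="*eedback*port*",
                   file_pattern="*eedback*port*.xlsx", min_year=2017):
    """Map each leaf folder matching dir_pattern to its newest matching file.

    Hypothetical helper: the name, patterns and min_year are assumptions.
    """
    latest = {}
    for folder in Path(root).rglob(dir_pattern):
        if not folder.is_dir():
            continue  # rglob also yields files whose names match the pattern
        if any(child.is_dir() for child in folder.iterdir()):
            continue  # skip folders that still have subdirectories
        candidates = [
            f for f in folder.glob(file_pattern)
            if f.is_file()
            and datetime.fromtimestamp(f.stat().st_mtime).year >= min_year
        ]
        if candidates:
            # the most recently modified file is treated as the latest version
            latest[folder] = max(candidates, key=lambda f: f.stat().st_mtime)
    return latest
```

Calling `latest_reports("T:\\Something\\Something\\Projects")` would return a dictionary keyed by folder path, so the location of each report is preserved. Folders with no matching file are simply left out of the result.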
