Question

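I am trying to find the file testing.txt.

The first file exists at: sub/hbc_cube/college/

The second file exists at: sub/hbc/college

However, when searching for where the file exists, I can't assume the string "hbc", because the name can differ from user to user. So I am trying to find a way to

pass (if the path is sub/*_cube/college/)

fail (if the path is sub/*/college)

But I can't use the glob character (*), because (*) would also count _cube as a fail. I am trying to work out a regular expression that matches only a plain name and not one containing an underscore (for example, hbc_cube).

I tried a Python regex reference but could not work out the right regular expression to use.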

file_list = lookupfiles(['testing.txt'], dirlist=['sub/'])
for file in file_list:
    if str(file).find('_cube/college/'):  # hbc_cube/college
        print("pass")
    if str(file).find('*/college/'):      # hbc/college
        print("fail")

If the file exists in both locations, I want only "fail" to be printed. The problem is that the * character also counts hbc_cube.

Answer 1

The glob module is your friend. You don't even need to match the directories yourself; glob does it for you:

from glob import glob

testfiles = glob("sub/*/college/testing.txt")

if len(testfiles) > 0 and all("_cube/" in path for path in testfiles):
    print("Pass")
else:
    print("Fail")

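In case it isn't obvious, the test all("_cube/" in path for path in testfiles) takes care of this requirement:

If the file exists in both locations, I want only "fail" to be printed. The problem is that the * character also counts hbc_cube.

If any matched path does not contain _cube, the test fails. Because you want to know about the files that make the test fail, you can't search only in paths matching *_cube; you have to retrieve both the good and the bad paths and check them as shown.

Of course, you can shorten the code above, or generalize it by building the glob patterns from a list of folders, a list of file names and so on, depending on your situation.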
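As a rough sketch of that generalization, the function below builds one glob pattern per combination of base folder, subfolder and file name. The name check_locations and its parameters are made up for illustration, and it assumes the user-specific folder sits directly under each base folder, as in the question.

```python
import os
from glob import glob


def check_locations(base_dirs, sub_dirs, filenames, marker="_cube"):
    """Return "Pass" if at least one file is found and every match lies
    in a folder whose name ends with `marker`; otherwise return "Fail"."""
    matches = []
    for base in base_dirs:
        for sub in sub_dirs:
            for name in filenames:
                # "*" stands for the user-specific folder (hbc, hbc_cube, ...)
                matches.extend(glob(os.path.join(base, "*", sub, name)))
    # Same substring test as above, written with os.sep so it also works on Windows
    only_cube = all((marker + os.sep) in path for path in matches)
    return "Pass" if matches and only_cube else "Fail"


print(check_locations(["sub"], ["college"], ["testing.txt"]))
```

Note that the substring test looks at the whole path, so a base folder whose own name ends with the marker would also satisfy it.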

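Note that the re module provides full regular expressions, while the glob module uses the simpler glob patterns. Don't confuse the two when you read the documentation.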
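To make the difference concrete, here is a small comparison. The two regular expressions are only a guess at what the question was after (a folder name ending in _cube versus a folder name with no underscore at all), and classify is a hypothetical helper; fnmatchcase is used to apply a glob-style pattern to a plain string.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical patterns for the layout described in the question
CUBE = re.compile(r"^sub/[^/]*_cube/college/")  # folder name ends with _cube
PLAIN = re.compile(r"^sub/[^/_]+/college/")     # folder name has no underscore


def classify(path):
    """Return "pass" for a *_cube folder, "fail" for a plain folder, else None."""
    if CUBE.match(path):
        return "pass"
    if PLAIN.match(path):
        return "fail"
    return None


for p in ("sub/hbc_cube/college/testing.txt", "sub/hbc/college/testing.txt"):
    # The glob-style pattern accepts both paths; the regexes tell them apart
    print(p, fnmatchcase(p, "sub/*/college/*"), classify(p))
```

A glob * cannot express "any name without an underscore", which is why the regex, or a check on the glob results as shown earlier, is needed.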

Answer 2

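The os module is well suited to this:

import os

# This assumes your current working directory has sub in it
for root, dirs, files in os.walk('sub'):
    for file in files:
        if file == 'testing.txt':
            # print the full path: the directory being visited plus the file name
            print(os.path.join(root, file))

On each iteration os.walk yields a three-element tuple: the directory currently being visited (root), the subdirectories in it, and the files in it. To print a file's full path, join root and the file name with os.path.join.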

For example, on my machine:

for root, dirs, files in os.walk(os.getcwd()):
    for file in files:
        if file.endswith('ipynb'):
            print(os.path.join(root, file))


# prints
/Users/mm92400/Salesforce_Repos/DataExploration/ClustersAndTime.ipynb
/Users/mm92400/Salesforce_Repos/DataExploration/Untitled1.ipynb
/Users/mm92400/Salesforce_Repos/DataExploration/Exploratory.ipynb
/Users/mm92400/Salesforce_Repos/DataExploration/Untitled3.ipynb
/Users/mm92400/Salesforce_Repos/DataExploration/Untitled.ipynb
/Users/mm92400/Salesforce_Repos/DataExploration/Untitled4.ipynb
/Users/mm92400/Salesforce_Repos/DataExploration/Untitled2.ipynb
/Users/mm92400/Salesforce_Repos/DataExploration/ClusterAnalysis.ipynb
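To get from the walk to the pass/fail result the question asks for, one possible sketch is below. The helper names find_files and verdict are invented here, and the code assumes the user-specific folder (hbc, hbc_cube, ...) is the first level under the folder being walked.

```python
import os


def find_files(top, filename="testing.txt"):
    """Yield the full path of every file called `filename` below `top`."""
    for root, dirs, files in os.walk(top):
        if filename in files:
            yield os.path.join(root, filename)


def verdict(top, filename="testing.txt", marker="_cube"):
    """Return "Pass" only if files were found and all sit under *marker folders."""
    found = list(find_files(top, filename))
    # First component of the path relative to `top`, e.g. "hbc" or "hbc_cube"
    folders = [os.path.relpath(path, top).split(os.sep)[0] for path in found]
    ok = bool(found) and all(name.endswith(marker) for name in folders)
    return "Pass" if ok else "Fail"


print(verdict("sub"))
```

Unlike the glob approach, this finds testing.txt at any depth, so it may pick up more files than you intend.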

Answer 3

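Use pathlib to parse the path: take the parent of the path object, which drops the /college part, then check whether the resulting path string ends with _cube.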

from pathlib import Path

file_list = lookupfiles(['testing.txt'], dirlist=['sub/'])
for file in file_list:
    path = Path(file)
    if str(path.parent).endswith('_cube'):
        print('pass')
    else:
        print('fail')

Edit:

If the file variable in the for loop includes the file name (sub/_cube/college/testing.txt), just take the parent twice: path.parent.parent
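A minimal sketch of the two-parent variant, assuming each entry is a full file path in the sub/<folder>/college/testing.txt layout. The function name in_cube_folder and the sample list are made up for illustration, and PurePosixPath is used only so the forward-slash examples behave the same on every platform.

```python
from pathlib import PurePosixPath


def in_cube_folder(file_path, marker="_cube"):
    """True if the folder two levels above the file has a name ending in `marker`."""
    path = PurePosixPath(file_path)
    # First .parent drops "testing.txt", second .parent drops "college"
    return path.parent.parent.name.endswith(marker)


paths = ["sub/hbc_cube/college/testing.txt", "sub/hbc/college/testing.txt"]
# "fail" as soon as one file sits outside a *_cube folder
print("pass" if paths and all(in_cube_folder(p) for p in paths) else "fail")
```

Checking .name rather than the whole path string keeps the test limited to that one folder.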

Another approach is to filter the files inside lookupfiles(), provided you have access to that function and can edit it.

Regex to find a specific file path
