Question

I have a directory with a bunch of files in it: eee2314, asd3442 ... and eph.

I want to exclude all files that start with eph using the glob function.

How can I do it?

Answer 1

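The pattern rules for glob are not regular expressions. Instead, they follow standard Unix path expansion rules. There are only a few special characters: two different wildcards (* and ?), plus character ranges in square brackets (per the glob documentation).

So you can exclude some files with a pattern. For example, to exclude manifest files (files starting with _) with glob, you can use: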

import glob

files = glob.glob('files_path/[!_]*')
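As a quick check of the character-class pattern above, here is a minimal sketch that creates a few empty files in a temporary directory and globs them with [!_]*. The file names and the helper name list_without_underscore_prefix are made up for illustration.

```python
import glob
import os
import tempfile


def list_without_underscore_prefix(directory):
    """Return the sorted base names in `directory` that do not start with '_'."""
    # glob.escape protects any special characters in the directory name itself.
    pattern = os.path.join(glob.escape(directory), "[!_]*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files, created empty just for the demonstration.
    for name in ("_manifest", "eee2314", "asd3442"):
        open(os.path.join(tmp, name), "w").close()
    visible = list_without_underscore_prefix(tmp)

print(visible)  # ['asd3442', 'eee2314']
```

Note that this only works because the exclusion is on a single leading character.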

Answer 2

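You can't exclude patterns with the glob function; globs only allow inclusion patterns. Globbing syntax is very limited (even a [!..] character class must match a character, so it is an inclusion pattern for every character that is not in the class).

You'll have to do your own filtering; a list comprehension usually works nicely here: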

import os
from glob import glob

files = [fn for fn in glob('somepath/*.txt')
         if not os.path.basename(fn).startswith('eph')]
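To make the limitation concrete, the sketch below compares a [!e]* glob with glob-then-filter on the file names from the question. The helper basenames and the extra file eph_old are assumptions added for the demonstration.

```python
import glob
import os
import tempfile


def basenames(directory, pattern):
    """Glob `pattern` inside `directory` and return the sorted base names."""
    full_pattern = os.path.join(glob.escape(directory), pattern)
    return sorted(os.path.basename(p) for p in glob.glob(full_pattern))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("eee2314", "asd3442", "eph", "eph_old"):
        open(os.path.join(tmp, name), "w").close()

    # '[!e]*' rejects every name whose first character is 'e',
    # so 'eee2314' is lost along with the 'eph' files.
    by_class = basenames(tmp, "[!e]*")

    # Globbing everything and filtering afterwards drops only the 'eph' names.
    by_filter = [n for n in basenames(tmp, "*") if not n.startswith("eph")]

print(by_class)   # ['asd3442']
print(by_filter)  # ['asd3442', 'eee2314']
```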

Answer 3

You can subtract sets:

from glob import glob

set(glob("*")) - set(glob("eph*"))
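One thing to keep in mind is that a set has no order, so you may want to sort the difference before using it. A minimal sketch, assuming a hypothetical helper glob_excluding and made-up file names:

```python
import glob
import os
import tempfile


def glob_excluding(directory, include="*", exclude="eph*"):
    """Return sorted paths matching `include` but not `exclude` in `directory`."""
    base = glob.escape(directory)
    included = set(glob.glob(os.path.join(base, include)))
    excluded = set(glob.glob(os.path.join(base, exclude)))
    # Set difference removes the excluded paths; sorted() restores a stable order.
    return sorted(included - excluded)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("eee2314", "asd3442", "eph", "eph_old"):
        open(os.path.join(tmp, name), "w").close()
    remaining_names = [os.path.basename(p) for p in glob_excluding(tmp)]

print(remaining_names)  # ['asd3442', 'eee2314']
```

Both globs must be run against the same directory prefix, otherwise the path strings will not compare equal and nothing gets subtracted.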

Answer 4

Late to the game, but you could alternatively just apply filter to the result of a glob:

files = glob.iglob('your_path_here')
files_i_care_about = filter(lambda x: not x.startswith("eph"), files)

or replace the lambda with an appropriate regex search, etc...

EDIT: I just realized that if you're using full paths, startswith won't work, so you'd need a regex:

In [10]: a
Out[10]: ['/some/path/foo', 'some/path/bar', 'some/path/eph_thing']

In [11]: filter(lambda x: not re.search('/eph', x), a)
Out[11]: ['/some/path/foo', 'some/path/bar']
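An alternative to the regex for full paths is to test only the last path component with os.path.basename. The sketch below compares the two; the extra path /ephemeral/data/foo is a made-up example showing that searching for '/eph' also drops files sitting under a directory whose name starts with eph.

```python
import os
import re

paths = [
    "/some/path/foo",
    "some/path/bar",
    "some/path/eph_thing",
    "/ephemeral/data/foo",  # hypothetical: directory name starts with 'eph'
]

# The regex looks for '/eph' anywhere in the path, directories included.
by_regex = [p for p in paths if not re.search("/eph", p)]

# basename() checks only the file name itself.
by_basename = [p for p in paths if not os.path.basename(p).startswith("eph")]

print(by_regex)     # ['/some/path/foo', 'some/path/bar']
print(by_basename)  # ['/some/path/foo', 'some/path/bar', '/ephemeral/data/foo']
```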

Answer 5

More generally, to exclude files that don't match some shell-style pattern, you can use the fnmatch module:

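import fnmatch
from glob import glob

# Keep only the files that match the pattern. Building a new list avoids
# skipping entries, which happens when popping from a list while iterating it.
file_list = [ii for ii in glob('somepath')
             if fnmatch.fnmatch(ii, 'bash_regexp_with_exclude')]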

The above first generates a list from the given path and then drops the files that do not satisfy the pattern with the desired constraint.
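If you would rather state the exclusion directly instead of folding it into one inclusion pattern, fnmatch can also be used the other way round. This is only a sketch: exclude_matching and the sample paths are hypothetical, and it matches against the base name so that directory parts do not interfere.

```python
import fnmatch
import os


def exclude_matching(paths, patterns):
    """Drop every path whose base name matches any of the shell-style patterns."""
    return [
        p for p in paths
        if not any(fnmatch.fnmatch(os.path.basename(p), pat) for pat in patterns)
    ]


sample = ["data/eee2314", "data/asd3442", "data/eph", "data/eph_old"]
remaining = exclude_matching(sample, ["eph*"])

print(remaining)  # ['data/eee2314', 'data/asd3442']
```

fnmatch.fnmatch follows the operating system's case rules; fnmatch.fnmatchcase is the variant to use when the comparison must always be case-sensitive.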

Answer 6

Compared with glob, I recommend pathlib; filtering on one pattern is very simple:

from pathlib import Path
p = Path(YOUR_PATH)
filtered = [x for x in p.glob('**/*') if not x.name.startswith('eph')]

If you want to filter on a more complex pattern, you can define a function to do it, like this:

def not_in_pattern(x):
    return (not x.name.startswith('eph')) and not x.name.startswith('epi')

filtered = [x for x in p.glob('**/*') if not_in_pattern(x)]

With that code, you can filter out all files whose names start with eph or with epi.
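Several prefix checks can also be collapsed into a single call, since str.startswith accepts a tuple of prefixes. A minimal sketch under that approach; EXCLUDED_PREFIXES, files_without_prefixes and the sample tree are names invented for the example, and the is_file() check is an added assumption that directories should be skipped.

```python
import tempfile
from pathlib import Path

EXCLUDED_PREFIXES = ("eph", "epi")  # hypothetical prefixes to exclude


def files_without_prefixes(root, prefixes=EXCLUDED_PREFIXES):
    """Recursively list file names under `root` that start with none of `prefixes`."""
    return sorted(
        p.name
        for p in Path(root).glob("**/*")
        if p.is_file() and not p.name.startswith(prefixes)
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for rel in ("eee2314", "eph", "sub/epilog", "sub/asd3442"):
        (root / rel).touch()
    result = files_without_prefixes(root)

print(result)  # ['asd3442', 'eee2314']
```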

Answer 7

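As mentioned in the accepted answer, you can't exclude patterns with glob, so the following is a way to filter your glob result.

The accepted answer is probably the best pythonic way to do things, but if you think list comprehensions look a bit ugly and want to make your code maximally numpythonic anyway (like I did), then you can do this (but note that it is probably less efficient than the list comprehension method):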

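import glob
import numpy as np

data_files = glob.glob("path_to_files/*.fits")

# The exclusion globs use the same directory prefix as data_files;
# otherwise the path strings would not match and nothing would be removed.
light_files = np.setdiff1d( data_files, glob.glob("path_to_files/*BIAS*"))
light_files = np.setdiff1d(light_files, glob.glob("path_to_files/*FLAT*"))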

(In my case, I had image frames, bias frames, and flat frames all in one directory, and I only wanted the image frames.)

Answer 8

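How about skipping the particular files while iterating through all the files in the folder? The code below skips all Excel files that start with 'eph':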

import glob
import re
for file in glob.glob('*.xlsx'):
    # Skip Excel files whose names start with 'eph'
    if re.match(r'eph.*\.xlsx', file):
        continue
    else:
        # do your stuff here
        print(file)

This way you can use more complex regex patterns to include or exclude a particular set of files in a folder.
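One caveat: re.match anchors at the start of the string, so the loop above relies on glob returning bare file names. If your glob pattern includes a directory, a sketch like the following, which matches against the base name with a compiled pattern, may be closer to what you want. The names EXCLUDE and keep, and the sample paths, are made up for the example.

```python
import os
import re

# Hypothetical exclusion rule: Excel files whose names start with 'eph'.
EXCLUDE = re.compile(r"eph.*\.xlsx")


def keep(path):
    """Return True when the file name does NOT match the exclusion pattern."""
    return EXCLUDE.fullmatch(os.path.basename(path)) is None


candidates = ["report.xlsx", "eph_2020.xlsx", "data/eph.xlsx", "data/sheet.xlsx"]
kept = [p for p in candidates if keep(p)]

print(kept)  # ['report.xlsx', 'data/sheet.xlsx']
```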

Answer 9

You can use the method below:

import glob

# Get all the files
allFiles = glob.glob("*")
# Files starting with eph
ephFiles = glob.glob("eph*")
# Files which don't start with eph
noephFiles = []
for file in allFiles:
    if file not in ephFiles:
        noephFiles.append(file)
# noephFiles now has all the files which don't start with eph.

Thank you.

Answer 10

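If the position of the character is not important, for example to exclude manifest files (wherever the _ is found), then with glob and re (regular expression operations) you can use:

import glob
import re
for file in glob.glob('*.txt'):
    if re.match(r'.*\_.*', file):
        continue
    else:
        print(file)

Or, in a more elegant way, with a list comprehension.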
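A list-comprehension version of the same underscore filter might look like the sketch below. It runs on a made-up list of names instead of a real directory so that it is self-contained; in practice you would pass it glob.glob('*.txt'). The names UNDERSCORE and without_underscore are assumptions for the example.

```python
import re

# Matches any name that contains an underscore anywhere.
UNDERSCORE = re.compile(r".*_.*")


def without_underscore(names):
    """Keep only the names that contain no underscore."""
    return [n for n in names if not UNDERSCORE.match(n)]


sample = ["notes.txt", "_manifest.txt", "part_1.txt", "readme.txt"]
filtered = without_underscore(sample)

print(filtered)  # ['notes.txt', 'readme.txt']
```

For a test this simple, `"_" not in n` would do the same job without a regex.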

glob exclude pattern
