Recursive dircmp (compare two directories to ensure they have the same files and subdirectories)

Asked: 2010-11-15 18:25:30

Tags: python recursion

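From what I observe, filecmp.dircmp is recursive, but it is inadequate for my needs, at least in py2. I want to compare two directories and all the files they contain. Does this already exist, or do I need to build it myself (using os.walk, for example)? I would prefer something pre-built, where someone else has already done the unit testing :)

The actual "comparison" can be sloppy (ignoring permissions, for example), if that helps.

I would like a boolean result, whereas report_full_closure prints a report. It also only descends into common subdirectories. As far as I'm concerned, if anything exists only on the left or only on the right, the directories are different. I ended up building this with os.walk instead.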

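13 answers:

Answer 0 (score: 21)

Here is an alternative implementation of the comparison function using the filecmp module. It uses recursion instead of os.walk, so it is a little simpler. However, it does not recurse simply by using the common_dirs and subdirs attributes, because in that case we would implicitly be using the default "shallow" file comparison, which is probably not what you want. In the implementation below, files with the same name are always compared by their contents only.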

import filecmp
import os.path

def are_dir_trees_equal(dir1, dir2):
    """
    Compare two directories recursively. Files in each directory are
    assumed to be equal if their names and contents are equal.

    @param dir1: First directory path
    @param dir2: Second directory path

    @return: True if the directory trees are the same and 
        there were no errors while accessing the directories or files, 
        False otherwise.
   """

    dirs_cmp = filecmp.dircmp(dir1, dir2)
    if len(dirs_cmp.left_only)>0 or len(dirs_cmp.right_only)>0 or \
        len(dirs_cmp.funny_files)>0:
        return False
    (_, mismatch, errors) =  filecmp.cmpfiles(
        dir1, dir2, dirs_cmp.common_files, shallow=False)
    if len(mismatch)>0 or len(errors)>0:
        return False
    for common_dir in dirs_cmp.common_dirs:
        new_dir1 = os.path.join(dir1, common_dir)
        new_dir2 = os.path.join(dir2, common_dir)
        if not are_dir_trees_equal(new_dir1, new_dir2):
            return False
    return True
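
To illustrate the "shallow" default this answer works around, here is a minimal sketch. The file names and the fixed timestamp are made up for the demonstration. It assumes that filecmp.cmp in shallow mode treats two regular files as equal whenever their type, size, and modification time match, which is how the standard library describes it.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.txt")  # hypothetical file names
    path_b = os.path.join(tmp, "b.txt")
    with open(path_a, "w") as f:
        f.write("spam")
    with open(path_b, "w") as f:
        f.write("eggs")  # same length as "spam", different content

    # Force identical timestamps so both os.stat signatures match.
    os.utime(path_a, (1000000000, 1000000000))
    os.utime(path_b, (1000000000, 1000000000))

    # shallow=True is the default: only the os.stat signatures are compared.
    shallow_equal = filecmp.cmp(path_a, path_b)
    # shallow=False reads both files and compares their bytes.
    deep_equal = filecmp.cmp(path_a, path_b, shallow=False)

print("shallow:", shallow_equal, "deep:", deep_equal)
```

With matching size and timestamp, the shallow comparison reports the files as equal even though their contents differ, which is why the function above passes shallow=False to cmpfiles.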

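Answer 1 (score: 14)

filecmp.dircmp is the way to go. However, it does not compare the contents of files found at the same path in the two directories; it only looks at file attributes. Since dircmp is a class, you can fix that by subclassing it and overriding its phase3 method, which compares the common files, so that contents are compared instead of only the os.stat attributes.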

import filecmp

class dircmp(filecmp.dircmp):
    """
    Compare the content of dir1 and dir2. In contrast with filecmp.dircmp, this
    subclass compares the content of files with the same path.
    """
    def phase3(self):
        """
        Find out differences between common files.
        Ensure we are using content comparison with shallow=False.
        """
        fcomp = filecmp.cmpfiles(self.left, self.right, self.common_files,
                                 shallow=False)
        self.same_files, self.diff_files, self.funny_files = fcomp

You can then use it to return a boolean:

import os.path

def is_same(dir1, dir2):
    """
    Compare two directory trees content.
    Return False if they differ, True if they are the same.
    """
    compared = dircmp(dir1, dir2)
    if (compared.left_only or compared.right_only or compared.diff_files 
        or compared.funny_files):
        return False
    for subdir in compared.common_dirs:
        if not is_same(os.path.join(dir1, subdir), os.path.join(dir2, subdir)):
            return False
    return True

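If you want to reuse this code snippet, it is hereby dedicated to the Public Domain or to Creative Commons CC0, at your choice (in addition to the default CC-BY-SA license provided by SO).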

Answer 2 (score: 5)

The report_full_closure() method is recursive:

import filecmp

comparison = filecmp.dircmp('/directory1', '/directory2')
comparison.report_full_closure()

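Edit: After the OP's edit, I would say it is best to just use the other functions in filecmp. I think os.walk is unnecessary; it is better to simply recurse through the lists produced by common_dirs and so on, although for very deep directory trees a poorly implemented recursion might hit a "maximum recursion depth exceeded" error.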
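
One way to sidestep the recursion-depth concern is to walk the dircmp objects with an explicit stack instead of recursive calls. The following is only a sketch under that assumption: trees_match and _write are hypothetical names, and it relies on dircmp's default (shallow) file comparison for diff_files.

```python
import filecmp
import os
import tempfile

def trees_match(dir1, dir2):
    """Iteratively compare two trees using an explicit stack of dircmp objects."""
    stack = [filecmp.dircmp(dir1, dir2)]
    while stack:
        dcmp = stack.pop()
        if dcmp.left_only or dcmp.right_only or dcmp.diff_files or dcmp.funny_files:
            return False
        # subdirs maps each common subdirectory name to its own dircmp object.
        stack.extend(dcmp.subdirs.values())
    return True

def _write(path, text):
    """Hypothetical helper: create parent directories and write a small file."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    _write(os.path.join(left, "a", "b", "f.txt"), "same")
    _write(os.path.join(right, "a", "b", "f.txt"), "same")
    match_before = trees_match(left, right)

    # Add a file that exists only on the right, two levels down.
    _write(os.path.join(right, "a", "b", "only_right.txt"), "extra")
    match_after = trees_match(left, right)

print(match_before, match_after)
```

Because the pending comparisons live in a list rather than on the call stack, the depth of the directory tree no longer affects Python's recursion limit.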

Answer 3 (score: 3)

Here is a simple solution with a recursive function:

import filecmp

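def same_folders(dcmp):
    # Any unmatched or differing entry at this level means the trees differ.
    if dcmp.left_only or dcmp.right_only or dcmp.diff_files:
        return False
    # Check every common subdirectory, not only the first one.
    for sub_dcmp in dcmp.subdirs.values():
        if not same_folders(sub_dcmp):
            return False
    return True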

same_folders(filecmp.dircmp('/tmp/archive1', '/tmp/archive2'))

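Answer 4 (score: 2)

dircmp can be recursive: see report_full_closure.

As far as I know, dircmp does not offer a whole-directory comparison function. It would be very easy to write your own, though: use the left_only and right_only attributes of dircmp to check that the directories contain the same entries, and then recurse on the subdirs attribute.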
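
The approach described in this answer might look like the sketch below. same_layout is a hypothetical name, and the function only checks that both trees contain the same entry names; it does not compare file contents.

```python
import filecmp
import os
import tempfile

def same_layout(dcmp):
    """Return True if neither side has entries missing from the other, recursively."""
    if dcmp.left_only or dcmp.right_only:
        return False
    # Recurse into every common subdirectory via the subdirs attribute.
    return all(same_layout(sub) for sub in dcmp.subdirs.values())

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    os.makedirs(os.path.join(left, "sub"))
    os.makedirs(os.path.join(right, "sub"))
    open(os.path.join(left, "sub", "a.txt"), "w").close()
    open(os.path.join(right, "sub", "a.txt"), "w").close()
    layout_equal_before = same_layout(filecmp.dircmp(left, right))

    # An extra file on one side should make the layouts differ.
    open(os.path.join(right, "sub", "extra.txt"), "w").close()
    # Build a fresh dircmp: its attributes are computed lazily and then cached.
    layout_equal_after = same_layout(filecmp.dircmp(left, right))

print(layout_equal_before, layout_equal_after)
```

To also require identical file contents, the check could be extended with diff_files and funny_files, keeping in mind that dircmp compares files shallowly by default.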

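Answer 5 (score: 2)

Another solution, which compares the layout of dir1 and dir2 and ignores file contents.

See the gist here: https://gist.github.com/4164344

Edit: here is the code, in case the gist gets lost for some reason: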

import os

def compare_dir_layout(dir1, dir2):
    def _compare_dir_layout(dir1, dir2):
        for (dirpath, dirnames, filenames) in os.walk(dir1):
            # Path of the current directory relative to the root being walked.
            relative_path = os.path.relpath(dirpath, dir1)
            for filename in filenames:
                # os.path.join keeps this portable across path separators.
                if not os.path.exists(os.path.join(dir2, relative_path, filename)):
                    print(os.path.normpath(os.path.join(relative_path, filename)))

    print('files in "' + dir1 + '" but not in "' + dir2 + '"')
    _compare_dir_layout(dir1, dir2)
    print('files in "' + dir2 + '" but not in "' + dir1 + '"')
    _compare_dir_layout(dir2, dir1)


compare_dir_layout('xxx', 'yyy')

Answer 6 (score: 0)

Here is my solution (gist):

import filecmp
import os

def dirs_same_enough(dir1,dir2,report=False):
    ''' use os.walk and filecmp.cmpfiles to
    determine if two dirs are 'same enough'.

    Args:
        dir1, dir2:  two directory paths
        report:  if True, print the filecmp.dircmp(dir1,dir2).report_full_closure()
                 before returning

    Returns:
        bool

    '''
    # os walk:  root, list(dirs), list(files)
    # those lists won't have consistent ordering,
    # os.walk also has no guaranteed ordering, so have to sort.
    walk1 = sorted(list(os.walk(dir1)))
    walk2 = sorted(list(os.walk(dir2)))

    def report_and_exit(report,bool_):
        # Optionally print the full dircmp report, then pass the result through.
        if report:
            filecmp.dircmp(dir1,dir2).report_full_closure()
        return bool_

    if len(walk1) != len(walk2):
        return report_and_exit(report,False)

    for (p1,d1,fl1),(p2,d2,fl2) in zip(walk1,walk2):
        d1,fl1, d2, fl2 = set(d1),set(fl1),set(d2),set(fl2)
        if d1 != d2 or fl1 != fl2:
            return report_and_exit(report,False)
        # One cmpfiles call covers every file in this pair of directories.
        same,diff,weird = filecmp.cmpfiles(p1,p2,fl1,shallow=False)
        if diff or weird:
            return report_and_exit(report,False)

    return report_and_exit(report,True)

Answer 7 (score: 0)

import filecmp
import os

def same(dir1, dir2):
    """Returns True if recursively identical, False otherwise

    """
    c = filecmp.dircmp(dir1, dir2)
    if c.left_only or c.right_only or c.diff_files or c.funny_files:
        return False
    else:
        same_so_far = True
        for i in c.common_dirs:
            same_so_far = same_so_far and same(os.path.join(dir1, i), os.path.join(dir2, i))
            if not same_so_far:
                break
        return same_so_far

Answer 8 (score: 0)

Based on Python issue 12932 and the filecmp documentation, you can use the following example:

import os
import filecmp

# force content compare instead of os.stat attributes only comparison
filecmp.cmpfiles.__defaults__ = (False,)

def _is_same_helper(dircmp):
    # Files that could not be compared (funny_files) count as a difference.
    if dircmp.left_only or dircmp.right_only or dircmp.diff_files or dircmp.funny_files:
        return False
    for sub_dircmp in dircmp.subdirs.values():
        if not _is_same_helper(sub_dircmp):
            return False
    return True

def is_same(dir1, dir2):
    """
    Recursively compare two directories
    :param dir1: path to first directory 
    :param dir2: path to second directory
    :return: True in case directories are the same, False otherwise
    """
    if not os.path.isdir(dir1) or not os.path.isdir(dir2):
        return False
    dircmp = filecmp.dircmp(dir1, dir2)
    return _is_same_helper(dircmp)

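Answer 9 (score: 0)

This checks that the files are in the same locations and that their contents are the same. It does not correctly validate empty subfolders.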

import filecmp
import glob
import os

path_1 = '.'
path_2 = '.'

def folders_equal(f1, f2):
    def relative_files(root):
        # Sorted relative paths of every regular file under root.
        return sorted(
            os.path.relpath(x, root)
            for x in glob.iglob(os.path.join(root, '**'), recursive=True)
            if os.path.isfile(x)
        )

    files_1 = relative_files(f1)
    files_2 = relative_files(f2)
    # Compare the full sorted lists so extra or misplaced files are detected.
    if files_1 != files_2:
        return False
    # Compare contents byte by byte rather than by os.stat signature.
    return all(
        filecmp.cmp(os.path.join(f1, rel), os.path.join(f2, rel), shallow=False)
        for rel in files_1
    )

folders_equal(path_1, path_2)
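
The empty-subfolder limitation mentioned above could be addressed by also comparing the set of directory names. This is a sketch only: relative_dirs is a hypothetical helper, not part of the answer's code, and it uses os.walk rather than glob.

```python
import os
import tempfile

def relative_dirs(root):
    """Collect the relative path of every subdirectory under root, empty or not."""
    result = set()
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            result.add(os.path.relpath(os.path.join(dirpath, name), root))
    return result

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    os.makedirs(os.path.join(left, "empty"))  # an empty subfolder only on the left
    os.makedirs(right)
    dirs_left = relative_dirs(left)
    dirs_right = relative_dirs(right)

print(dirs_left == dirs_right)
```

Requiring relative_dirs(f1) == relative_dirs(f2) in addition to the file checks would make two trees that differ only by an empty folder compare as unequal.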

Answer 10 (score: 0)

Since only a True or False result is needed, you can do this if you have diff installed:

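import subprocess

def are_dir_trees_equal(dir1, dir2):
    # Discard diff's output rather than piping it: an unread PIPE can fill up and block the wait.
    exit_code = subprocess.call(["diff", "-r", dir1, dir2], stdout=subprocess.DEVNULL)
    return not exit_code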

Answer 11 (score: 0)

This recursive function seems to work for me:

def has_differences(dcmp):
    differences = dcmp.left_only + dcmp.right_only + dcmp.diff_files
    if differences:
        return True
    return any([has_differences(subdcmp) for subdcmp in dcmp.subdirs.values()])

Assuming I have not overlooked anything, you can simply negate the result if you want to know whether the directories are the same:

from filecmp import dircmp

comparison = dircmp("dir1", "dir2")
same = not has_differences(comparison)

Answer 12 (score: 0)

For anyone looking for a simple library:

https://github.com/mitar/python-deep-dircmp

DeepDirCmp is basically a subclass of filecmp.dircmp and produces the same output as diff -qr dir1 dir2.

Usage:

from deep_dircmp import DeepDirCmp

cmp = DeepDirCmp(dir1, dir2)
if len(cmp.get_diff_files_recursive()) == 0:
    print("Dirs match")
else:
    print("Dirs don't match")