Calculating the difference in keys contained in two Python dictionaries

Posted: 2009-07-22 13:43:13

Tags: python dictionary

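Suppose I have two Python dictionaries, dictA and dictB. I need to find out if there are any keys which are present in dictB but not in dictA. What is the fastest way to go about it?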

Should I convert the dictionary keys into a set and then go about it?

Interested in knowing your thoughts...


Thanks for your responses.

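Apologies for not stating my question properly. My scenario is like this: I have a dictA which can be the same as dictB, or may have some keys missing compared to dictB, or else the value of some keys might be different, which must be set to that of dictA key's value.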

The problem is that the dictionary has no fixed schema and can contain values which can themselves be a dict of dicts.

dictA={'key1':a, 'key2':b, 'key3':{'key11':cc, 'key12':dd}, 'key4':{'key111':{....}}}
dictB={'key1':a, 'key2':newb, 'key3':{'key11':cc, 'key12':newdd, 'key13':ee}.......

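So the 'key2' value has to be reset to the new value and 'key13' has to be added inside the dict. The key value does not have a fixed format: it can be a simple value, a dict, or a dict of dicts.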

21 answers:

Answer 0 (score: 229)

You can use set operations on the keys:

diff = set(dictb.keys()) - set(dicta.keys())
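A minimal Python 3 sketch of the same idea. Iterating a dict yields its keys, so set(dictB) is equivalent to set(dictB.keys()). The flat dicts and their values below are made-up stand-ins for the question's data:

```python
# Hypothetical flat stand-ins for the question's dictA / dictB
dictA = {'key1': 1, 'key2': 2, 'key3': 3}
dictB = {'key1': 1, 'key2': 20, 'key3': 3, 'key13': 13}

# Keys present in dictB but missing from dictA
missing_in_a = set(dictB) - set(dictA)
print(missing_in_a)  # {'key13'}

# The opposite direction: keys present in dictA but missing from dictB
missing_in_b = set(dictA) - set(dictB)
print(missing_in_b)  # set()
```

Note that set difference is not symmetric, so the operand order decides which "missing" keys you get.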

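Here is a class that finds all the possibilities: what was added, what was removed, which key-value pairs are the same, and which key-value pairs have changed.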

class DictDiffer(object):
    """
    Calculate the difference between two dictionaries as:
    (1) items added
    (2) items removed
    (3) keys same in both but changed values
    (4) keys same in both and unchanged values
    """
    def __init__(self, current_dict, past_dict):
        self.current_dict, self.past_dict = current_dict, past_dict
        self.set_current, self.set_past = set(current_dict.keys()), set(past_dict.keys())
        self.intersect = self.set_current.intersection(self.set_past)
    def added(self):
        return self.set_current - self.intersect 
    def removed(self):
        return self.set_past - self.intersect 
    def changed(self):
        return set(o for o in self.intersect if self.past_dict[o] != self.current_dict[o])
    def unchanged(self):
        return set(o for o in self.intersect if self.past_dict[o] == self.current_dict[o])

Here is some sample output:

>>> a = {'a': 1, 'b': 1, 'c': 0}
>>> b = {'a': 1, 'b': 2, 'd': 0}
>>> d = DictDiffer(b, a)
>>> print "Added:", d.added()
Added: set(['d'])
>>> print "Removed:", d.removed()
Removed: set(['c'])
>>> print "Changed:", d.changed()
Changed: set(['b'])
>>> print "Unchanged:", d.unchanged()
Unchanged: set(['a'])

Available as a GitHub repo: https://github.com/hughdbrown/dictdiffer

Answer 1 (score: 53)

If you want to get the difference recursively, I have written a package for Python: https://github.com/seperman/deepdiff

Installation

Install from PyPi:

pip install deepdiff

Example usage

Importing

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

Same object returns empty

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

Type of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

Value of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

Item added and/or removed

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

String difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

String difference 2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}

>>> 
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

Type change

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

List difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

List difference 2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

List difference ignoring order or duplicates (with the same dictionaries as above):

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

List that contains a dictionary:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

Sets:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

Named tuples:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

Custom objects:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
... 
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>> 
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Object attribute added:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Answer 2 (score: 18)

Not sure whether it's "fast" or not, but normally one can do this:

dicta = {"a":1,"b":2,"c":3,"d":4}
dictb = {"a":1,"d":2}
for key in dicta.keys():
    if key not in dictb:
        print(key)

Answer 3 (score: 14)

As Alex Martelli wrote, if you simply want to check whether any key in B is not in A, any(True for k in dictB if k not in dictA) would be the way to go.

To find the keys that are missing:

diff = set(dictB)-set(dictA) #sets

C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA = dict(zip(range(1000),range(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=set(dictB)-set(dictA)"
10000 loops, best of 3: 107 usec per loop

diff = [ k for k in dictB if k not in dictA ] #lc

C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA = dict(zip(range(1000),range(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=[ k for k in dictB if k not in dictA ]"
10000 loops, best of 3: 95.9 usec per loop

So both solutions run at pretty much the same speed.
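The same comparison can be reproduced from inside Python with the standard timeit module. This is a sketch: with_sets and with_listcomp are names introduced here for illustration, and the timings it prints depend on the machine and Python version, so no particular numbers are implied.

```python
import timeit

# Same setup as the command-line benchmark: 1000 keys each, half of them overlapping
dictA = dict(zip(range(1000), range(1000)))
dictB = dict(zip(range(0, 2000, 2), range(1000)))

def with_sets():
    return set(dictB) - set(dictA)

def with_listcomp():
    return [k for k in dictB if k not in dictA]

# Both approaches must find the same missing keys (the list just keeps dictB's order)
assert with_sets() == set(with_listcomp())

for fn in (with_sets, with_listcomp):
    # best of 3 repeats, 1000 calls each, reported per call in microseconds
    best = min(timeit.repeat(fn, number=1000, repeat=3))
    print(f"{fn.__name__}: {best / 1000 * 1e6:.1f} usec per call")
```

The set version returns a set, while the list comprehension returns a list in dictB's iteration order; pick whichever result type the caller needs.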

Answer 4 (score: 12)

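If you truly mean exactly what you say (that you only need to find out IF "there are any keys" in B and not in A, not WHICH ONES those might be, if any), the fastest way should be: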

if any(True for k in dictB if k not in dictA): ...

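If you actually need to find out WHICH KEYS, if any, are in B and not in A, and not just "IF" there are such keys, then the existing answers are quite appropriate (but I do suggest more precision in future questions if that's indeed what you mean ;-).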
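A small Python 3 sketch of the "is there any such key?" check. any() stops at the first key of B that is missing from A, and yielding True (rather than the key itself) means a falsy key such as 0 is still counted. The subset test on keys views is an alternative spelling that is not from the answer above; has_extra_keys and has_extra_keys_view are hypothetical helper names:

```python
def has_extra_keys(dictB, dictA):
    # any() short-circuits on the first key of dictB that is missing from dictA
    return any(True for k in dictB if k not in dictA)

def has_extra_keys_view(dictB, dictA):
    # Python 3 alternative: keys views support set comparisons,
    # so "every key of B is also in A" is a subset test
    return not dictB.keys() <= dictA.keys()

a = {'x': 1, 'y': 2}
b = {'x': 1, 'y': 2, 'z': 3}
print(has_extra_keys(b, a), has_extra_keys_view(b, a))  # True True
print(has_extra_keys(a, b), has_extra_keys_view(a, b))  # False False

# The key 0 is falsy, but it is still detected as missing
print(has_extra_keys({0: 'zero'}, {}))  # True
```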

Answer 5 (score: 7)

Use set()

set(dictA.keys()).intersection(dictB.keys())
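Note that intersection() returns the keys present in both dicts; for the question's "in dictB but not in dictA", the matching set method is difference(). A minimal sketch with made-up data showing both:

```python
dictA = {'a': 1, 'b': 2, 'c': 3}
dictB = {'b': 20, 'c': 3, 'd': 4}

# Keys present in both dicts
common = set(dictA.keys()).intersection(dictB.keys())
# Keys in dictB that are missing from dictA (what the question asks for)
only_in_b = set(dictB.keys()).difference(dictA.keys())

print(sorted(common))     # ['b', 'c']
print(sorted(only_in_b))  # ['d']
```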

Answer 6 (score: 5)

The top answer by hughdbrown suggests using set difference, which is definitely the best approach:

diff = set(dictb.keys()) - set(dicta.keys())

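The problem with that code is that (in Python 2, where keys() returns a list) it builds two lists just to create two sets, so it's wasting 4N time and 2N space. It's also a bit more complicated than it needs to be.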

Usually this is not a big deal, but if it is:

diff = dictb.keys() - dicta

Python 2

In Python 2, keys() returns a list of the keys, not a KeysView. So you have to ask for viewkeys() directly:

diff = dictb.viewkeys() - dicta

For dual-version 2.7/3.x code, you'd want to use six or something similar, so you can use six.viewkeys(dictb):

diff = six.viewkeys(dictb) - dicta

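In 2.4-2.6, there is no KeysView. But you can at least cut the cost from 4N to N by building your left set directly out of an iterator, instead of building a list first: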

diff = set(dictb).difference(dicta)
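A quick Python 3 check of these forms. A keys view accepts any iterable on the right-hand side of "-", and so does the method set.difference(); the "-" operator between a plain set and a dict, however, raises TypeError, because the set operators require both operands to be sets. The sample dicts are made up:

```python
dicta = {'a': 1, 'b': 2}
dictb = {'b': 2, 'c': 3}

# Python 3: a keys view accepts any iterable (here a dict) on the right-hand side
print(dictb.keys() - dicta)          # {'c'}

# set.difference() also accepts any iterable, including a dict
print(set(dictb).difference(dicta))  # {'c'}

# ...but the "-" operator between a plain set and a dict is rejected
try:
    set(dictb) - dicta
except TypeError as exc:
    print("TypeError:", exc)
```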

Items

  

I have a dictA which can be the same as dictB, or may have some keys missing compared to dictB, or else the value of some keys might be different

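So you don't really need to compare the keys, but the items. Set operations on an ItemsView only work if the values are hashable, like strings. If they are, it's easy: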

diff = dictb.items() - dicta.items()
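A minimal Python 3 sketch of the items difference, using flat string values loosely modelled on the question's example (the values are placeholders). The second half shows why hashability matters: once a value is itself a dict, the set operation raises TypeError.

```python
dicta = {'key1': 'a', 'key2': 'b'}
dictb = {'key1': 'a', 'key2': 'newb', 'key13': 'ee'}

# (key, value) pairs in dictb that are new or changed relative to dicta
print(sorted(dictb.items() - dicta.items()))
# [('key13', 'ee'), ('key2', 'newb')]

# With an unhashable value (a nested dict) the set operation fails
nested_a = {'key3': {'key11': 'cc'}}
nested_b = {'key3': {'key11': 'cc', 'key13': 'ee'}}
try:
    nested_b.items() - nested_a.items()
except TypeError as exc:
    print("TypeError:", exc)
```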

Recursive diffs

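Although the question isn't directly asking for a recursive diff, some of the example values are dicts, and it appears the expected output does diff them recursively. There are already multiple answers here showing how to do that.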
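For reference, here is one minimal way to recurse, sketched in Python 3. The function name recursive_diff, the _MISSING sentinel and the ('added' / 'removed' / 'changed') output format are all hypothetical choices made for this example, not taken from any answer:

```python
_MISSING = object()  # sentinel so that None can still be a real value

def recursive_diff(old, new):
    """Return a nested dict describing how `new` differs from `old`.

    Hypothetical output format:
      key -> ('added', value)        key only in new
      key -> ('removed', value)      key only in old
      key -> ('changed', old, new)   non-dict value differs
      key -> {...}                   both values are dicts: recurse
    Keys whose values are equal are omitted.
    """
    diff = {}
    for key in old.keys() | new.keys():
        a = old.get(key, _MISSING)
        b = new.get(key, _MISSING)
        if a is _MISSING:
            diff[key] = ('added', b)
        elif b is _MISSING:
            diff[key] = ('removed', a)
        elif isinstance(a, dict) and isinstance(b, dict):
            sub = recursive_diff(a, b)
            if sub:  # only report nested dicts that actually changed
                diff[key] = sub
        elif a != b:
            diff[key] = ('changed', a, b)
    return diff

# Placeholder strings stand in for the question's a, b, cc, ... values
dictA = {'key1': 'a', 'key2': 'b', 'key3': {'key11': 'cc', 'key12': 'dd'}}
dictB = {'key1': 'a', 'key2': 'newb',
         'key3': {'key11': 'cc', 'key12': 'newdd', 'key13': 'ee'}}
print(recursive_diff(dictA, dictB))
```

Because keys are collected with a set union, the printed key order is not guaranteed; compare the result as a dict rather than as text.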

Answer 7 (score: 5)

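There is another question on Stack Overflow about this topic, and I have to admit that a simple solution is explained there: Python's datadiff library helps print the differences between two dictionaries.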

Answer 8 (score: 3)

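Here is a way that works, allows for keys that evaluate to False, and still uses a generator expression to exit early if possible. It isn't exceptionally pretty though.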

any(map(lambda x: True, (k for k in b if k not in a)))

EDIT

THC4k posted a reply to my comment on another answer. Here's a better, prettier way to do the above:

any(True for k in b if k not in a)

Not sure how that never crossed my mind...

Answer 9 (score: 3)

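This is an old question and asks a little less than what I needed, so this answer actually solves more than this question asks. The answers in this question helped me solve the following:

  1. (asked) Record differences between two dictionaries
  2. Merge the differences from #1 into a base dictionary
  3. (asked) Merge differences between two dictionaries (treat dictionary #2 as if it were a diff dictionary)
  4. Try to detect item movements as well as changes
  5. (asked) All of this recursively
  6. All of this combined with JSON allows for pretty powerful configuration storage support.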

    Solution (also on github):

    from collections import OrderedDict
    from pprint import pprint
    
    
    class izipDestinationMatching(object):
        __slots__ = ("attr", "value", "index")
    
        def __init__(self, attr, value, index):
            self.attr, self.value, self.index = attr, value, index
    
        def __repr__(self):
            return "izip_destination_matching: found match by '%s' = '%s' @ %d" % (self.attr, self.value, self.index)
    
    
    def izip_destination(a, b, attrs, addMarker=True):
        """
        Returns zipped lists, but final size is equal to b with (if shorter) a padded with nulls
        Additionally also tries to find item reallocations by searching child dicts (if they are dicts) for attribute, listed in attrs)
        When addMarker == False (patching), final size will be the longer of a, b
        """
        for idx, item in enumerate(b):
            try:
                attr = next((x for x in attrs if x in item), None)  # See if the item has any of the ID attributes
                match, matchIdx = next(((orgItm, idx) for idx, orgItm in enumerate(a) if attr in orgItm and orgItm[attr] == item[attr]), (None, None)) if attr else (None, None)
                if match and matchIdx != idx and addMarker: item[izipDestinationMatching] = izipDestinationMatching(attr, item[attr], matchIdx)
            except:
                match = None
            yield (match if match else a[idx] if len(a) > idx else None), item
        if not addMarker and len(a) > len(b):
            for item in a[len(b) - len(a):]:
                yield item, item
    
    
    def dictdiff(a, b, searchAttrs=[]):
        """
        returns a dictionary which represents difference from a to b
        the return dict is as short as possible:
          equal items are removed
          added / changed items are listed
          removed items are listed with value=None
        Also processes list values where the resulting list size will match that of b.
        It can also search said list items (that are dicts) for identity values to detect changed positions.
          In case such identity value is found, it is kept so that it can be re-found during the merge phase
        @param a: original dict
        @param b: new dict
        @param searchAttrs: list of strings (keys to search for in sub-dicts)
        @return: dict / list / whatever input is
        """
        if not (isinstance(a, dict) and isinstance(b, dict)):
            if isinstance(a, list) and isinstance(b, list):
                return [dictdiff(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs)]
            return b
        res = OrderedDict()
        if izipDestinationMatching in b:
            keepKey = b[izipDestinationMatching].attr
            del b[izipDestinationMatching]
        else:
            keepKey = izipDestinationMatching
        for key in sorted(set(a) | set(b)):  # works on Python 2 and 3 (keys must be mutually sortable)
            v1 = a.get(key, None)
            v2 = b.get(key, None)
            if keepKey == key or v1 != v2: res[key] = dictdiff(v1, v2, searchAttrs)
        if len(res) <= 1: res = dict(res)  # This is only here for pretty print (OrderedDict doesn't pprint nicely)
        return res
    
    
    def dictmerge(a, b, searchAttrs=[]):
        """
        Returns a dictionary which merges differences recorded in b to base dictionary a
        Also processes list values where the resulting list size will match that of a
        It can also search said list items (that are dicts) for identity values to detect changed positions
        @param a: original dict
        @param b: diff dict to patch into a
        @param searchAttrs: list of strings (keys to search for in sub-dicts)
        @return: dict / list / whatever input is
        """
        if not (isinstance(a, dict) and isinstance(b, dict)):
            if isinstance(a, list) and isinstance(b, list):
                return [dictmerge(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs, False)]
            return b
        res = OrderedDict()
        for key in sorted(set(a) | set(b)):  # works on Python 2 and 3 (keys must be mutually sortable)
            v1 = a.get(key, None)
            v2 = b.get(key, None)
            #print "processing", key, v1, v2, key not in b, dictmerge(v1, v2)
            if v2 is not None: res[key] = dictmerge(v1, v2, searchAttrs)
            elif key not in b: res[key] = v1
        if len(res) <= 1: res = dict(res)  # This is only here for pretty print (OrderedDict doesn't pprint nicely)
        return res
    

Answer 10 (score: 2)

If on Python ≥ 2.7:

# update different values in dictB
# I would assume only dictA should be updated,
# but the question specifies otherwise

for k in dictA.viewkeys() & dictB.viewkeys():
    if dictA[k] != dictB[k]:
        dictB[k]= dictA[k]

# add missing keys to dictA

dictA.update( (k,dictB[k]) for k in dictB.viewkeys() - dictA.viewkeys() )
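On Python 3, viewkeys() no longer exists and keys() already returns a view, so the same two steps can be sketched as below. The sample dicts are made up, and the update direction simply follows the answer above:

```python
dictA = {'k1': 1, 'k2': 2}
dictB = {'k1': 1, 'k2': 20, 'k3': 3}

# Keys in both dicts whose values differ: copy dictA's value into dictB.
# The & builds a new set, so modifying dictB inside the loop is safe.
for k in dictA.keys() & dictB.keys():
    if dictA[k] != dictB[k]:
        dictB[k] = dictA[k]

# Keys only in dictB: add them to dictA. The set difference is evaluated
# once, before update() starts modifying dictA.
dictA.update((k, dictB[k]) for k in dictB.keys() - dictA.keys())

print(dictA)  # {'k1': 1, 'k2': 2, 'k3': 3}
print(dictB)  # {'k1': 1, 'k2': 2, 'k3': 3}
```

This only handles the top level; nested dict values are compared and copied as whole objects.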

Answer 11 (score: 2)

What about the standard library (comparing the FULL object)?

PyDev->New PyDev Module->Module: unittest

import unittest


class Test(unittest.TestCase):


    def testName(self):
        obj1 = {1:1, 2:2}
        obj2 = {1:1, 2:2}
        self.maxDiff = None # sometimes useful: show the full diff
        self.assertDictEqual(obj1, obj2)

if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']

    unittest.main()

Answer 12 (score: 1)

Here is a solution that deep-compares the keys of 2 dictionaries:

def compareDictKeys(dict1, dict2):
  if type(dict1) != dict or type(dict2) != dict:
      return False

  keys1, keys2 = dict1.keys(), dict2.keys()
  diff = set(keys1) - set(keys2) or set(keys2) - set(keys1)

  if not diff:
      for key in keys1:
          if (type(dict1[key]) == dict or type(dict2[key]) == dict) and not compareDictKeys(dict1[key], dict2[key]):
              diff = True
              break

  return not diff

Answer 13 (score: 1)

Here is a solution that can compare more than two dicts:

def diff_dict(dicts, default=None):
    diff_dict = {}
    # set().union(*dicts) collects every key of every dict (works on Python 2 and 3)
    for k in set().union(*dicts):
        # we can just use "values = [d.get(k, default) ..." below if 
        # we don't care that d1[k]=default and d2[k]=missing will
        # be treated as equal
        if any(k not in d for d in dicts):
            diff_dict[k] = [d.get(k, default) for d in dicts]
        else:
            values = [d[k] for d in dicts]
            if any(v != values[0] for v in values):
                diff_dict[k] = values
    return diff_dict

Usage example:

import matplotlib.pyplot as plt
diff_dict([plt.rcParams, plt.rcParamsDefault, plt.matplotlib.rcParamsOrig])

Answer 14 (score: 1)

My recipe for the symmetric difference between two dictionaries:

param   dict1   dict2
1       a       b
2       b       a
5       e       N/A
6       N/A     f

Result:

{{1}}
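The recipe's code is not reproduced here. As a hedged illustration only, this Python 3 sketch builds a side-by-side symmetric difference in the same shape as the table, assuming absent keys are shown as 'N/A'. The function name symmetric_dict_diff and the input dicts (reconstructed from the table, plus an unchanged key 3) are hypothetical:

```python
def symmetric_dict_diff(dict1, dict2, missing='N/A'):
    """Map each key whose value differs, or that exists in only one dict,
    to a (dict1 value, dict2 value) pair; `missing` marks an absent key."""
    result = {}
    for key in dict1.keys() | dict2.keys():
        v1 = dict1.get(key, missing)
        v2 = dict2.get(key, missing)
        if key not in dict1 or key not in dict2 or v1 != v2:
            result[key] = (v1, v2)
    return result

dict1 = {1: 'a', 2: 'b', 3: 'c', 5: 'e'}
dict2 = {1: 'b', 2: 'a', 3: 'c', 6: 'f'}

# Print in the same layout as the table (key 3 is equal, so it is omitted)
print('param   dict1   dict2')
for key, (v1, v2) in sorted(symmetric_dict_diff(dict1, dict2).items()):
    print(f'{key:<8}{v1:<8}{v2}')
```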

Answer 15 (score: 1)

As mentioned in other answers, unittest produces some nice output for comparing dicts, but in this example we don't want to have to build a whole test first.

Scraping the unittest source, it looks like you can get a fair solution with just this:

import difflib
import pprint

def diff_dicts(a, b):
    if a == b:
        return ''
    return '\n'.join(
        difflib.ndiff(pprint.pformat(a, width=30).splitlines(),
                      pprint.pformat(b, width=30).splitlines())
    )

so

dictA = dict(zip(range(7), map(ord, 'python')))
dictB = {0: 112, 1: 'spam', 2: [1,2,3], 3: 104, 4: 111}
print(diff_dicts(dictA, dictB))

Results in:

  {0: 112,
-  1: 121,
-  2: 116,
+  1: 'spam',
+  2: [1, 2, 3],
   3: 104,
-  4: 111,
?        ^

+  4: 111}
?        ^

-  5: 110}

Where:

  • '-' indicates key/values in the first but not the second dict
  • '+' indicates key/values in the second but not the first dict

As with unittest, the only caveat is that the final mapping can show up as a difference because of the trailing comma/bracket.

Answer 16 (score: 1)

@Maxx has an excellent answer that uses the unittest tools provided by Python:

import unittest


class Test(unittest.TestCase):
    def runTest(self):
        pass

    def testDict(self, d1, d2, maxDiff=None):
        self.maxDiff = maxDiff
        self.assertDictEqual(d1, d2)

Then, anywhere in your code you can call:

try:
    Test().testDict(dict1, dict2)
except Exception as e:
    print(e)

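The resulting output looks like the output of diff, pretty-printing the dictionaries with + or - prepended to each line that is different.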

Answer 17 (score: 0)

If you want a built-in solution for a full comparison of arbitrary dict structures, @Maxx's answer is a good start.

import unittest

test = unittest.TestCase()
test.assertEqual(dictA, dictB)

Answer 18 (score: 0)

Based on ghostdog74's answer,

dicta = {"a":1,"d":2}
dictb = {"a":5,"d":2}

for value in dicta.values():
    if value not in dictb.values():
        print(value)

will print the values of dicta that differ

Answer 19 (score: 0)

Try this to find the intersection, i.e. the keys that are in both dictionaries. If you want the keys that are not found in the second dictionary, just use not in...

intersect = filter(lambda x, dictB=dictB.keys(): x in dictB, dictA.keys())
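On Python 3, filter() returns a lazy iterator rather than a list, so wrap it in list() if you need the result more than once. Testing membership against the dict itself also avoids the Python 2 case where dictB.keys() is a list and every `in` check is a linear scan. A small sketch of both variants, with made-up data:

```python
dictA = {'a': 1, 'b': 2, 'c': 3}
dictB = {'b': 20, 'c': 3, 'd': 4}

# Keys of dictA that are also in dictB (dict membership is a hash lookup)
intersect = list(filter(lambda x: x in dictB, dictA))
# Flip the test to get the keys of dictA that are not in dictB
not_in_b = list(filter(lambda x: x not in dictB, dictA))

print(intersect)  # ['b', 'c']
print(not_in_b)   # ['a']
```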

Answer 20 (score: 0)

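Not sure if it is still relevant, but I came across this problem. In my situation I just needed to return a dictionary of the changes for all nested dictionaries etc. I couldn't find a good solution out there, but I did end up with {{3}}. Hope this helps,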