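Merging two complex JSON objects

Asked: 2019-04-05 14:58:54

Tags: python json merge

I want to merge two JSON objects into one new object. I tried using jsonmerge with a full JSON schema, but I can't figure out how to set up the merge strategies correctly. I'm fairly sure it can be done.

Code: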

import json
from jsonmerge import Merger
from jsonschema import validate
full_build = {
         "captures": [
            {
               "compiler": "gnu",
               "executable": "gcc",
               "cmd": ["gcc", "options", "file1.cpp"],
               "cwd": ".",
               "env": ["A=1", "B=2"],
            },
            {
               "compiler": "gnu",
               "executable": "gcc",
               "cmd": ["gcc", "options", "file2.cpp"],
               "cwd": ".",
               "env": ["A=1", "B=2"],
            }
         ]
}
incremental_build = {
         "captures": [
            {
               "compiler": "gnu",
               "executable": "gcc",
               "cmd": ["gcc", "new options", "file2.cpp"],
               "cwd": ".",
               "env": ["A=1", "NEW=2"],
            },
            {
               "compiler": "gnu",
               "executable": "gcc",
               "cmd": ["gcc", "options", "file3.cpp"],
               "cwd": ".",
               "env": ["A=1", "B=2"],
            }
         ]
}
schema = {
   "type" : "object",
   "properties" : {
      "captures": {
         "type" : "array",
         "items" : {
            "type" : "object",
            "properties" : {
               "cmd" : {
                  "type" : "array",
                  "items" : {"type" : "string"},
               },
               "compiler" : {"type" : "string"},
               "cwd" : {"type" : "string"},
               "env" : {
                  "type" : "array",
                  "items" : {"type" : "string"},
               },
               "executable" : {"type" : "string"},
            }
         }
      }
   }
}
validate(instance=full_build, schema=schema)

mergeSchema = schema
merger = Merger(mergeSchema)
result = merger.merge(full_build, incremental_build)
print(json.dumps(result, indent=3))

Result:

{
   "captures": [
      {
         "compiler": "gnu",
         "executable": "gcc",
         "cmd": [
            "gcc",
            "options",
            "file3.cpp"
         ],
         "cwd": ".",
         "env": [
            "A=1",
            "B=2"
         ]
      }
   ]
}

Expected result:

{
   "captures": [
      {
         "compiler": "gnu",
         "executable": "gcc",
         "cmd": [
            "gcc",
            "options",
            "file1.cpp"
         ],
         "cwd": ".",
         "env": [
            "A=1",
            "B=2"
         ]
      },
      {
         "compiler": "gnu",
         "executable": "gcc",
         "cmd": [
            "gcc",
            "new options",
            "file2.cpp"
         ],
         "cwd": ".",
         "env": [
            "A=1",
            "NEW=2"
         ]
      },
      {
         "compiler": "gnu",
         "executable": "gcc",
         "cmd": [
            "gcc",
            "options",
            "file3.cpp"
         ],
         "cwd": ".",
         "env": [
            "A=1",
            "B=2"
         ]
      }
   ]
}

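There is more to consider (for example, more or fewer options/environment variables than before), but I think I'll manage that part myself. I really don't want to hard-code this.

No, I can't change the structure of the JSON :(.

Background: I want to merge SonarQube build wrapper outputs, because I don't want to run a full build just to get all files into the wrapper output.

2 answers:

Answer 0 (score: 2)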

It seems you don't really need any complex merge operation at all. You basically want to combine the ‘captures’ lists from both structures into a new structure which contains all of them. This can be achieved by making a copy and simply extending the list afterwards:

import copy

full_build = ...
incremental_build = ...
# Deep-copy so that full_build itself is left untouched.
combined = copy.deepcopy(full_build)
combined['captures'].extend(incremental_build['captures'])

If you want to ‘deduplicate’ based on some attribute, e.g. the file name, you can use something like this:

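import copy


def get_filename_from_capture(capture):
    # Assumes the source file is the last element of the command line.
    return capture["cmd"][-1]


# Captures from incremental_build come last, so they overwrite
# full_build captures that have the same file name.
all_captures = full_build["captures"] + incremental_build["captures"]
captures_by_filename = {
    get_filename_from_capture(capture): capture for capture in all_captures
}

combined = copy.deepcopy(full_build)
combined["captures"] = list(captures_by_filename.values())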
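
As a quick sanity check, the deduplication approach can be wrapped in a function and run on a trimmed-down version of the question's data. The name `merge_captures` and the shortened captures are illustrative assumptions, and the sketch assumes the source file is always the last element of `cmd`. It relies on dicts preserving insertion order (Python 3.7+), so an overwritten key keeps its original position.

```python
import copy


def merge_captures(full, incremental):
    """Return a new build dict; captures from `incremental` replace
    captures in `full` that compile the same file (last element of cmd)."""
    by_filename = {}
    for capture in full["captures"] + incremental["captures"]:
        # Later captures overwrite earlier ones with the same file name,
        # but the key keeps the position of its first insertion.
        by_filename[capture["cmd"][-1]] = capture
    combined = copy.deepcopy(full)
    combined["captures"] = copy.deepcopy(list(by_filename.values()))
    return combined


# Hypothetical, shortened versions of the captures from the question.
full = {"captures": [
    {"cmd": ["gcc", "options", "file1.cpp"], "env": ["A=1", "B=2"]},
    {"cmd": ["gcc", "options", "file2.cpp"], "env": ["A=1", "B=2"]},
]}
incremental = {"captures": [
    {"cmd": ["gcc", "new options", "file2.cpp"], "env": ["A=1", "NEW=2"]},
    {"cmd": ["gcc", "options", "file3.cpp"], "env": ["A=1", "B=2"]},
]}

merged = merge_captures(full, incremental)
merged_cmds = [capture["cmd"] for capture in merged["captures"]]
print(merged_cmds)
```

The merged list contains file1.cpp, the new file2.cpp capture, and file3.cpp, in that order, which matches the expected result in the question. Neither input is modified.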

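Answer 1 (score: 2)

You have two arrays of JSON objects and you want to construct a single array from them.

In your example it seems that sometimes you want objects in incremental_build to overwrite objects in full_build (there is only one object mentioning file2.cpp in the final array), but sometimes you don't (the object for file3.cpp does not overwrite the object for file1.cpp).

You didn't specify the exact rule, but my guess is that you want to match on the file name. I also guess that you want to treat the array elements themselves as immutable and don't want to merge them any further when the file names match.

To achieve this, you can use the following schema: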

schema = {
   "properties" : {
      "captures": {
         "mergeStrategy": "arrayMergeById",
         "mergeOptions": {
            "idRef": "/cmd/2"
         },
         "items": {
            "mergeStrategy": "overwrite"
         }
      }
   }
}

merger = Merger(schema)
result = merger.merge(full_build, incremental_build)

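You don't need the full schema unless you also want to validate the JSON. jsonmerge itself only cares about the merge strategy information.

The schema above specifies that the array under the captures property of the top-level object should be merged using the arrayMergeById strategy. This strategy merges array elements based on the value that the idRef reference points to. In your example, the file name is the third element of the cmd property (JSON pointers use zero-based indices).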
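
To make it concrete what a reference like "/cmd/2" selects, here is a minimal pointer resolver in plain Python. It is a simplified sketch, not the code jsonmerge uses: the helper name `resolve_pointer` is hypothetical, and the "~0" / "~1" escapes of full JSON Pointer syntax are ignored.

```python
def resolve_pointer(document, pointer):
    """Resolve a simple JSON pointer such as "/cmd/2" against `document`.

    Simplified sketch: escape sequences ("~0", "~1") are not handled.
    """
    # "/cmd/2".split("/") gives ["", "cmd", "2"]; drop the leading empty token.
    tokens = pointer.split("/")[1:] if pointer else []
    current = document
    for token in tokens:
        if isinstance(current, list):
            current = current[int(token)]  # array index, zero-based
        else:
            current = current[token]       # object property
    return current


capture = {
    "compiler": "gnu",
    "executable": "gcc",
    "cmd": ["gcc", "options", "file2.cpp"],
    "cwd": ".",
    "env": ["A=1", "B=2"],
}

capture_id = resolve_pointer(capture, "/cmd/2")
print(capture_id)  # file2.cpp
```

Because the pointer uses a fixed index, it only identifies the file name as long as every cmd array has the file in the third position. If the number of options varies between captures, a different way of deriving the id would be needed.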

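arrayMergeById merges matched array elements according to their own schema. By default, they would be merged using the objectMerge strategy. That would produce incorrect results when an element in incremental_build lacks a property that is present in the matching element of full_build. For this reason, the schema above also specifies the overwrite strategy for all items of the captures array.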
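
The difference between the two item strategies can be illustrated with plain dicts. This only emulates the behaviour described above and is not jsonmerge's implementation. The two captures are hypothetical, and the incremental one deliberately lacks the env property.

```python
# Matching element from full_build (the "base").
base_item = {
    "cmd": ["gcc", "options", "file2.cpp"],
    "cwd": ".",
    "env": ["A=1", "B=2"],
}
# Matching element from incremental_build (the "head"), without "env".
head_item = {
    "cmd": ["gcc", "new options", "file2.cpp"],
    "cwd": ".",
}

# Property-by-property merge, roughly what objectMerge does for flat values:
# properties missing from head_item survive from base_item.
object_merged = {**base_item, **head_item}

# Overwrite: the matched element is replaced wholesale by head_item.
overwritten = dict(head_item)

print("env" in object_merged)  # True: the stale env from the full build leaks in
print("env" in overwritten)    # False: the capture is exactly the incremental one
```

With the property-by-property merge, the result still carries the old env list even though the incremental build no longer set it, which is the incorrect result the overwrite strategy avoids.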