Question

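I want to merge two JSON objects into a new object. I tried using jsonmerge with the full JSON schema, but I can't figure out how to set up the merge strategy correctly. I'm fairly sure it can be done.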

Code:

import json
from jsonmerge import Merger
from jsonschema import validate
full_build = {
         "captures": [
            {
               "compiler": "gnu",
               "executable": "gcc",
               "cmd": ["gcc", "options", "file1.cpp"],
               "cwd": ".",
               "env": ["A=1", "B=2"],
            },
            {
               "compiler": "gnu",
               "executable": "gcc",
               "cmd": ["gcc", "options", "file2.cpp"],
               "cwd": ".",
               "env": ["A=1", "B=2"],
            }
         ]
}
incremental_build = {
         "captures": [
            {
               "compiler": "gnu",
               "executable": "gcc",
               "cmd": ["gcc", "new options", "file2.cpp"],
               "cwd": ".",
               "env": ["A=1", "NEW=2"],
            },
            {
               "compiler": "gnu",
               "executable": "gcc",
               "cmd": ["gcc", "options", "file3.cpp"],
               "cwd": ".",
               "env": ["A=1", "B=2"],
            }
         ]
}
schema = {
   "type" : "object",
   "properties" : {
      "captures": {
         "type" : "array",
         "items" : {
            "type" : "object",
            "properties" : {
               "cmd" : {
                  "type" : "array",
                  "items" : {"type" : "string"},
               },
               "compiler" : {"type" : "string"},
               "cwd" : {"type" : "string"},
               "env" : {
                  "type" : "array",
                  "items" : {"type" : "string"},
               },
               "executable" : {"type" : "string"},
            }
         }
      }
   }
}
validate(instance=full_build, schema=schema)

mergeSchema = schema
merger = Merger(mergeSchema)
result = merger.merge(full_build, incremental_build)
print(json.dumps(result, indent=3))

Result:

{
   "captures": [
      {
         "compiler": "gnu",
         "executable": "gcc",
         "cmd": [
            "gcc",
            "options",
            "file3.cpp"
         ],
         "cwd": ".",
         "env": [
            "A=1",
            "B=2"
         ]
      }
   ]
}

Expected result:

{
   "captures": [
      {
         "compiler": "gnu",
         "executable": "gcc",
         "cmd": [
            "gcc",
            "options",
            "file1.cpp"
         ],
         "cwd": ".",
         "env": [
            "A=1",
            "B=2"
         ]
      },
      {
         "compiler": "gnu",
         "executable": "gcc",
         "cmd": [
            "gcc",
            "new options",
            "file2.cpp"
         ],
         "cwd": ".",
         "env": [
            "A=1",
            "NEW=2"
         ]
      },
      {
         "compiler": "gnu",
         "executable": "gcc",
         "cmd": [
            "gcc",
            "options",
            "file3.cpp"
         ],
         "cwd": ".",
         "env": [
            "A=1",
            "B=2"
         ]
      }
   ]
}

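There are more things to consider (for example, more or fewer options/environment variables than before), but I think I can manage those myself. I really don't want to hardcode anything.

No, I can't change the structure of the JSON :(.

Background: I want to merge SonarQube build-wrapper output, because I don't want to run a full build just to get every file into the wrapper output.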

Answer 1

It seems you don't need a complex merge operation at all. You basically want to combine the captures lists from both structures into a new structure that contains all of them. You can do this by making a deep copy and then extending the list:

import copy

full_build = ...
incremental_build = ...
# Deep-copy so that extending the list leaves full_build itself untouched.
combined = copy.deepcopy(full_build)
combined['captures'].extend(incremental_build['captures'])
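To see what plain concatenation produces, here is a self-contained run on trimmed copies of the question's data (only the cmd field is kept, purely for brevity). Note that file2.cpp shows up twice, which is why the deduplicating variant matters whenever the incremental build recompiles a file that the full build already captured.

```python
import copy

# Trimmed-down sample data: only "cmd" is kept; the real captures also
# carry compiler, executable, cwd and env.
full_build = {"captures": [
    {"cmd": ["gcc", "options", "file1.cpp"]},
    {"cmd": ["gcc", "options", "file2.cpp"]},
]}
incremental_build = {"captures": [
    {"cmd": ["gcc", "new options", "file2.cpp"]},
    {"cmd": ["gcc", "options", "file3.cpp"]},
]}

# Deep-copy so that extending the list does not mutate full_build.
combined = copy.deepcopy(full_build)
combined["captures"].extend(incremental_build["captures"])

files = [capture["cmd"][-1] for capture in combined["captures"]]
print(files)                        # ['file1.cpp', 'file2.cpp', 'file2.cpp', 'file3.cpp']
print(len(full_build["captures"]))  # 2: the original is unchanged
```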

If you want to deduplicate based on some attribute, such as the file name, you can use something like this:

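import copy


def get_filename_from_capture(capture):
    # Assumes the source file is the last element of the compiler command.
    return capture["cmd"][-1]


all_captures = full_build["captures"] + incremental_build["captures"]
# A later capture with the same file name replaces the earlier one,
# so entries from incremental_build win.
captures_by_filename = {
    get_filename_from_capture(capture): capture for capture in all_captures
}

combined = copy.deepcopy(full_build)
combined["captures"] = list(captures_by_filename.values())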
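Running the deduplicating version on trimmed copies of the question's data (only cmd and env are kept, for brevity) reproduces the expected result from the question: three captures, with the file2.cpp entry taken from incremental_build. Since Python 3.7, dicts preserve insertion order, and overwriting a key's value keeps the key at the position where it first appeared, so file2.cpp stays between file1.cpp and file3.cpp.

```python
import copy


def get_filename_from_capture(capture):
    # Assumes the source file is the last element of the compiler command.
    return capture["cmd"][-1]


full_build = {"captures": [
    {"cmd": ["gcc", "options", "file1.cpp"], "env": ["A=1", "B=2"]},
    {"cmd": ["gcc", "options", "file2.cpp"], "env": ["A=1", "B=2"]},
]}
incremental_build = {"captures": [
    {"cmd": ["gcc", "new options", "file2.cpp"], "env": ["A=1", "NEW=2"]},
    {"cmd": ["gcc", "options", "file3.cpp"], "env": ["A=1", "B=2"]},
]}

all_captures = full_build["captures"] + incremental_build["captures"]
# Later entries win for a repeated key; the key keeps its first position.
captures_by_filename = {
    get_filename_from_capture(capture): capture for capture in all_captures
}

combined = copy.deepcopy(full_build)
combined["captures"] = list(captures_by_filename.values())

for capture in combined["captures"]:
    print(capture["cmd"], capture["env"])
# ['gcc', 'options', 'file1.cpp'] ['A=1', 'B=2']
# ['gcc', 'new options', 'file2.cpp'] ['A=1', 'NEW=2']
# ['gcc', 'options', 'file3.cpp'] ['A=1', 'B=2']
```

Only the outer structure is deep-copied here: the capture dicts inside combined are the same objects as in the two inputs. If you plan to modify them afterwards, deep-copy the values as well.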

Answer 2

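You have two arrays of JSON objects and you want to build a single array from them.

In your example, it seems that sometimes you want an object from incremental_build to override the one in full_build (only one object mentioning file2.cpp ends up in the final array), but sometimes you don't (the object for file3.cpp does not override the object for file1.cpp).

You didn't state the exact rules, but I'm guessing you want to match on the file name. I'm also guessing that you want to treat the array elements themselves as immutable, and don't want them merged any further when their file names match.

To achieve this, you can use the following schema: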
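One way to make this observation concrete is to group the captures by file name and compare the two builds as sets. The sketch below uses trimmed copies of the question's data; the helper filenames is hypothetical and, like Answer 1, assumes the file name is the last element of cmd.

```python
def filenames(build):
    """Hypothetical helper: the set of source files mentioned in a build."""
    return {capture["cmd"][-1] for capture in build["captures"]}


full_build = {"captures": [
    {"cmd": ["gcc", "options", "file1.cpp"]},
    {"cmd": ["gcc", "options", "file2.cpp"]},
]}
incremental_build = {"captures": [
    {"cmd": ["gcc", "new options", "file2.cpp"]},
    {"cmd": ["gcc", "options", "file3.cpp"]},
]}

full_files = filenames(full_build)
incr_files = filenames(incremental_build)

rebuilt = sorted(full_files & incr_files)    # in both builds: incremental entry replaces
untouched = sorted(full_files - incr_files)  # only in the full build: keep as-is
added = sorted(incr_files - full_files)      # only in the incremental build: append
print(rebuilt, untouched, added)  # ['file2.cpp'] ['file1.cpp'] ['file3.cpp']
```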

from jsonmerge import Merger

schema = {
   "properties" : {
      "captures": {
         "mergeStrategy": "arrayMergeById",
         "mergeOptions": {
            "idRef": "/cmd/2"
         },
         "items": {
            "mergeStrategy": "overwrite"
         }
      }
   }
}

merger = Merger(schema)
result = merger.merge(full_build, incremental_build)

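You don't need the full schema unless you also want to validate the JSON. jsonmerge itself only cares about the merge strategy information.

The schema above specifies that the array under the captures property of the top-level object should be merged with the arrayMergeById strategy. This strategy matches array elements by the value that the idRef reference points to. In your example, the file name is the third element of the cmd property, hence /cmd/2 (JSON pointers use zero-based indexing).

arrayMergeById merges matching array elements according to their own schema. By default, they are merged with the objectMerge strategy. That gives wrong results when an element in incremental_build lacks a property that is present in the matching full_build element: the merged element would still carry the stale property. This is why the schema above also specifies the overwrite strategy for all items of the captures array.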
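To show how a pointer such as /cmd/2 is evaluated against each array element, here is a minimal hand-written JSON Pointer (RFC 6901) lookup. It is an illustration only, not jsonmerge's own code, and it does no error handling beyond what Python raises. One caveat worth knowing: if the number of compiler options varies (as the question anticipates), the file name will not always sit at index 2, and JSON Pointer has no token for "the last element" when reading (the special "-" token refers to the position after the last element).

```python
def resolve_pointer(document, pointer):
    """Minimal RFC 6901 JSON Pointer lookup (illustrative, not jsonmerge's)."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON pointer must start with '/'")
    node = document
    for token in pointer[1:].split("/"):
        # Unescape in the order the RFC requires: ~1 -> '/', then ~0 -> '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]  # array indices are zero-based
        else:
            node = node[token]
    return node


capture = {"compiler": "gnu", "cmd": ["gcc", "new options", "file2.cpp"]}
print(resolve_pointer(capture, "/cmd/2"))  # file2.cpp
print(resolve_pointer(capture, "/cmd/0"))  # gcc
```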
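To make the difference between overwrite and objectMerge concrete without depending on jsonmerge, here is a rough plain-Python emulation of "match by id, then replace or merge". It is an assumption-laden sketch, not jsonmerge's implementation: the shallow dict merge only stands in for objectMerge (which merges recursively), and the sonar_extra key is made up to represent a property that the incremental capture no longer has.

```python
import copy


def merge_by_id(base, head, key, overwrite=True):
    """Rough emulation of arrayMergeById: head elements whose id matches a
    base element replace (or shallow-merge into) it; new ids are appended."""
    result = copy.deepcopy(base)
    position = {key(item): i for i, item in enumerate(result)}
    for item in head:
        k = key(item)
        if k not in position:
            position[k] = len(result)
            result.append(copy.deepcopy(item))
        elif overwrite:
            result[position[k]] = copy.deepcopy(item)  # like "overwrite"
        else:
            # Shallow stand-in for objectMerge: base-only keys survive.
            result[position[k]] = {**result[position[k]], **copy.deepcopy(item)}
    return result


def file_name(capture):
    return capture["cmd"][2]  # the same element idRef "/cmd/2" points to


base = [
    {"cmd": ["gcc", "options", "file1.cpp"], "env": ["A=1", "B=2"]},
    {"cmd": ["gcc", "options", "file2.cpp"], "env": ["A=1", "B=2"],
     "sonar_extra": "old"},  # hypothetical property absent from head
]
head = [
    {"cmd": ["gcc", "new options", "file2.cpp"], "env": ["A=1", "NEW=2"]},
    {"cmd": ["gcc", "options", "file3.cpp"], "env": ["A=1", "B=2"]},
]

replaced = merge_by_id(base, head, file_name)
merged = merge_by_id(base, head, file_name, overwrite=False)
print([file_name(c) for c in replaced])  # ['file1.cpp', 'file2.cpp', 'file3.cpp']
print("sonar_extra" in replaced[1])      # False
print("sonar_extra" in merged[1])        # True: the stale property survives
```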

Merging two complex JSON objects
