Splitting an entity into multiple entities

Time: 2018-03-09 11:34:58

Tags: sesam

How can I safely split an entity into multiple entities? For example, I have a document that looks like this:

{
  "_id": "Britney Spears",
  "hits": [
    {
      "title": "Crazy",
      "rating": 2
    },
    {
      "title": "Oops! I Did It Again",
      "rating": 3
    }
  ]
}

and I want to split it into two entities that look like this:

[
    {
      "_id": "Britney Spears - Crazy",
      "artist": "Britney Spears",
      "title": "Crazy",
      "rating": 2
    },
    {
      "_id": "Britney Spears - Oops! I Did It Again",
      "artist": "Britney Spears",
      "title": "Oops! I Did It Again",
      "rating": 3
    }
]

1 answer:

Answer 0 (score: 2)

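To handle a flow like this safely, with deletion tracking, you need to create two pipes. In the first pipe, you use the create-child function to build a list of child entities (note that each child needs an _id). You then have to store the output in an intermediate dataset, and remember to set track_children to true on the sink that writes this dataset: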

{
  "_id": "artists",
  "type": "pipe",
  "source": {
    "type": "embedded",
    "entities": [{
      "_id": "Britney Spears",
      "hits": [{
        "rating": 2,
        "title": "Crazy"
      }, {
        "rating": 3,
        "title": "Oops! I Did It Again"
      }]
    }]
  },
  "sink": {
    "type": "dataset",
    "dataset": "artists-with-hits",
    "track_children": true
  },
  "transform": {
    "type": "dtl",
    "rules": {
      "default": [
        ["copy", "_id"],
        ["create-child",
          ["apply", "song", "_S.hits"]
        ]
      ],
      "song": [
        ["add", "_id",
          ["concat", "_P._S._id", " - ", "_S.title"]
        ],
        ["add", "artist", "_P._S._id"],
        ["copy", "*"]
      ]
    }
  }
}

In the next pipe, you can then split this entity into its children:

{
  "_id": "hits",
  "type": "pipe",
  "source": {
    "type": "dataset",
    "dataset": "artists-with-hits"
  },
  "transform": {
    "type": "emit_children"
  }
}
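
To make the data flow concrete, here is a rough plain-Python sketch of what the two pipes compute. It is not Sesam code: build_children and emit_children are hypothetical helper names, and the "$children" property name is my assumption about where create-child stores the children on the intermediate entity.

```python
def build_children(artist):
    """Roughly mimic the "default" and "song" rules of the first pipe."""
    children = []
    for hit in artist["hits"]:
        child = dict(hit)  # ["copy", "*"]: keep title and rating
        child["_id"] = f'{artist["_id"]} - {hit["title"]}'  # ["concat", ...]
        child["artist"] = artist["_id"]
        children.append(child)
    # Assumption: the children end up under "$children" on the parent entity.
    return {"_id": artist["_id"], "$children": children}


def emit_children(entity):
    """Roughly mimic the emit_children transform: one output entity per child."""
    return list(entity.get("$children", []))


artist = {
    "_id": "Britney Spears",
    "hits": [
        {"title": "Crazy", "rating": 2},
        {"title": "Oops! I Did It Again", "rating": 3},
    ],
}

intermediate = build_children(artist)  # what "artists-with-hits" would hold
hits = emit_children(intermediate)     # what the "hits" pipe would emit
```

The point of the intermediate step is that the parent entity, with its full list of children, is stored before anything is split out.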

If you try to do this in a single pipe with multiple transforms, deletion tracking will not work.
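
As a conceptual illustration of why tracking the children matters, consider what should happen when a hit is later removed from the artist: the child that disappeared has to be flagged as deleted downstream. The sketch below is plain Python and only shows the idea, not how Sesam implements it internally; child_changes is a hypothetical helper, and the shape of the deletion marker ("_deleted": True) is an assumption for illustration.

```python
def child_changes(previous_children, current_children):
    """Return the current children plus deletion markers for vanished ones."""
    current_ids = {child["_id"] for child in current_children}
    changes = list(current_children)
    for old in previous_children:
        if old["_id"] not in current_ids:
            # The child existed in the previous version of the parent only,
            # so emit a marker telling downstream consumers it is gone.
            changes.append({"_id": old["_id"], "_deleted": True})
    return changes


before = [
    {"_id": "Britney Spears - Crazy"},
    {"_id": "Britney Spears - Oops! I Did It Again"},
]
after = [{"_id": "Britney Spears - Crazy"}]  # one hit was removed

changes = child_changes(before, after)
```

This comparison needs the previous version of the parent with all of its children, which is what the intermediate dataset provides.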

This gives you the desired output in the hits dataset.