Joining two streams on a common field in a Storm bolt

Date: 2016-04-18 07:35:13

Tags: apache-kafka apache-storm

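Problem statement: I want to join two streams coming from two different Kafka spouts (say S1 and S2), matching the tuples of each stream on some fields they have in common. Suppose "S1" receives the JSON below as a tuple: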

{"l7ProtocolID":"dhcp",
"packets_out":1,
"bytes_out":400,
"start_time":1454281199898,
"flow_sample":0,
"duration":102,
"path":["base","ip","udp","dhcp"],
"bytes_in":1200,
"l4":[{"client":"68","server":"67","level":0}],
"l2":[{"client":"52:54:00:50:04:B2","server":"FF:FF:FF:FF:FF:FF","level":0}],
"l3":[{"client":"::ffff:0.0.0.0","server":"::ffff:255.255.255.255","level":0}],
"flow_id":"81454281200000731489",
"applicationID":"dhcp",
"packets_in":1}

and "S2" receives the JSON below as a tuple:

{"portGroupName":"dhcp",
"hypervisorName":1,
"bytes_out":400,
"monitoredIP":1454281199898,
"monitoredInstance":0,
"duration":102,
"bytes_in":1200,
"flow_id":"81454281200000731489",
"tenant":1}

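I want to join the two on a common field, here "flow_id". Please suggest an example or an approach. I am confused about .fieldsGrouping: is it the right solution for my use case?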

1 answer:

Answer 0: (score: 0)

You can use the Trident API to do the join:

TridentTopology topology = new TridentTopology();
// do some stuff here
topology.join(stream1, new Fields("key"), stream2, new Fields("x"), new Fields("key", "a", "b", "c"));

For details, see the documentation: https://storm.apache.org/releases/1.0.0/Trident-API-Overview.html

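If you want to use the low-level API, fieldsGrouping is the right approach. It routes all tuples with the same "flow_id" to the same task of the joining bolt. Of course, you then have to take care of "windowing" yourself, that is, of how long a tuple is buffered while it waits for its match from the other stream.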

Something like this:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout1",...);
builder.setSpout("spout2",...);

// The join is done in a bolt (not a spout) that subscribes to both spouts,
// each grouped on the shared field.
builder.setBolt("join",...)
       .fieldsGrouping("spout1", new Fields("flow_id"))
       .fieldsGrouping("spout2", new Fields("flow_id"));
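
The "windowing" the join bolt has to do itself could look roughly like the sketch below. It is plain Java with no Storm dependency, and the names FlowJoinBuffer, offer, pendingCount and windowMs are hypothetical, not part of any Storm API. The assumptions are that each tuple has already been parsed into a Map, that timestamps passed in never decrease, and that on a field-name clash (for example "bytes_out") the later-arriving tuple's value wins. Because fieldsGrouping sends a given "flow_id" to a single task, a per-task in-memory map like this is enough.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: holds a tuple until its partner with the same flow_id arrives. */
class FlowJoinBuffer {
    /** One buffered tuple, the stream it came from, and when it arrived. */
    private static final class Pending {
        final String source;
        final Map<String, Object> fields;
        final long arrivedAtMs;

        Pending(String source, Map<String, Object> fields, long arrivedAtMs) {
            this.source = source;
            this.fields = fields;
            this.arrivedAtMs = arrivedAtMs;
        }
    }

    private final long windowMs;
    // Insertion-ordered, so the oldest pending tuple is always iterated first.
    private final Map<String, Pending> pending = new LinkedHashMap<>();

    FlowJoinBuffer(long windowMs) {
        this.windowMs = windowMs;
    }

    /**
     * Offers a tuple from stream "S1" or "S2". Returns the merged record when the
     * other stream's tuple for this flowId is already buffered, otherwise null.
     */
    Map<String, Object> offer(String source, String flowId,
                              Map<String, Object> fields, long nowMs) {
        expire(nowMs);
        Pending other = pending.remove(flowId);
        if (other == null || other.source.equals(source)) {
            // No partner yet (or a repeat from the same stream): keep the newest tuple.
            pending.put(flowId, new Pending(source, fields, nowMs));
            return null;
        }
        // Partner found: merge both sides; the later-arriving values win on clashes.
        Map<String, Object> joined = new HashMap<>(other.fields);
        joined.putAll(fields);
        return joined;
    }

    /** Drops tuples that have waited longer than the window without a match. */
    private void expire(long nowMs) {
        Iterator<Pending> it = pending.values().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().arrivedAtMs <= windowMs) {
                break; // everything after this entry arrived later
            }
            it.remove();
        }
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        FlowJoinBuffer buffer = new FlowJoinBuffer(60_000);
        Map<String, Object> s1 = new HashMap<>();
        s1.put("flow_id", "81454281200000731489");
        s1.put("applicationID", "dhcp");
        Map<String, Object> s2 = new HashMap<>();
        s2.put("flow_id", "81454281200000731489");
        s2.put("portGroupName", "dhcp");

        buffer.offer("S1", "81454281200000731489", s1, 0L);
        System.out.println(buffer.offer("S2", "81454281200000731489", s2, 10L));
    }
}
```

In a real topology, the join bolt's execute method would call something like offer with the tuple's source component ID and "flow_id", emit the merged record when the result is non-null, and decide separately how to ack or fail tuples that expire unmatched.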