Apache Beam Combine function does not do anything

Time: 2017-04-03 12:58:50

Tags: google-cloud-dataflow apache-beam

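I am trying, for the first time, to apply a 10-second fixed window with a simple Combine function. For now I only print some logging inside the transforms to see whether anything actually happens, but the transform after ExtractStreamingMeasures() never seems to be called. I am running on the DirectRunner.

Am I missing something?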

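PipelineOptions options = PipelineOptionsFactory.create();
// PubsubOptions extends StreamingOptions, so setStreaming() is available on it.
PubsubOptions dataflowOptions = options.as(PubsubOptions.class);
dataflowOptions.setStreaming(true);

Pipeline p = Pipeline.create(options);

p
            // (the unbounded source that produces the PCollection<Txn> is not shown)
            .apply(Window.<Txn>into(FixedWindows.of(Duration.standardSeconds(10))))
            .apply(ParDo.of(new ExtractStreamingMeasures()))
            // Count is a Combine: it emits its per-key results only when a window's trigger fires.
            .apply(Count.<String>perElement())
            .apply(ParDo.of(new DoSomething()));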

The transforms:

static class ExtractStreamingMeasures extends DoFn<Txn, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        System.out.println(c.element().getLocationId()); // <= this prints
        c.output(c.element().getLocationId());
    }
}

static class DoSomething extends DoFn<KV<String, Long>, KV<String, Long>> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        System.out.println(c.element()); // <= this doesn't print
        c.output(c.element());
    }
}
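Why the print in DoSomething may never appear can be illustrated with a small plain-Java model. This is not the Beam API: it assumes fixed windows aligned to timestamp 0 and Beam's default trigger, which emits a window's aggregate once the watermark reaches the end of that window. The class DefaultTriggerModel and its methods are hypothetical names used only for illustration. Until the watermark gets there, the per-key counts stay buffered and nothing reaches the next ParDo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simplified, Beam-free model (NOT the Beam API) of why Count.perElement()
 * emits nothing until a window fires. Assumptions: fixed windows of size
 * windowMillis aligned to timestamp 0, non-negative timestamps, and the
 * default trigger firing a window when the watermark reaches its end.
 */
public class DefaultTriggerModel {

    /** Start of the fixed window that contains the given event timestamp. */
    static long windowStart(long timestampMillis, long windowMillis) {
        return timestampMillis - (timestampMillis % windowMillis);
    }

    /** Hypothetical helper: per-key counts for every window whose end is <= the watermark. */
    static List<String> firedPanes(long[] timestamps, String[] keys, long windowMillis, long watermarkMillis) {
        // window start -> (key -> count); TreeMap keeps windows and keys sorted
        Map<Long, Map<String, Long>> buffered = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            long start = windowStart(timestamps[i], windowMillis);
            buffered.computeIfAbsent(start, s -> new TreeMap<>()).merge(keys[i], 1L, Long::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, Map<String, Long>> w : buffered.entrySet()) {
            long end = w.getKey() + windowMillis;
            if (watermarkMillis >= end) {
                // Watermark has reached the end of the window: the default trigger fires.
                for (Map.Entry<String, Long> kv : w.getValue().entrySet()) {
                    out.add("[" + w.getKey() + "," + end + ") " + kv.getKey() + "=" + kv.getValue());
                }
            }
            // Otherwise the counts stay buffered and nothing is emitted downstream.
        }
        return out;
    }

    public static void main(String[] args) {
        long[] ts = {1_000, 4_000, 9_500, 12_000};
        String[] keys = {"loc1", "loc2", "loc1", "loc1"};
        long window = 10_000;
        System.out.println(firedPanes(ts, keys, window, 9_999));  // prints []
        System.out.println(firedPanes(ts, keys, window, 10_000)); // prints [[0,10000) loc1=2, [0,10000) loc2=1]
    }
}
```

In this model, if the watermark of the unbounded source never advances past the end of a window, that window's counts are never emitted, which is one possible explanation for the symptom described in the question.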

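1 answer:

Answer 0 (score: 1)

A different trigger has to be provided for the window to fire as expected. The following code uses 10-minute windows and emits output every 10 seconds, measured in processing time from the first element that arrives in each pane.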

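p.apply("AssignToWindow", Window.<Txn>into(FixedWindows.of(Duration.standardMinutes(10)))
                // Fire a pane 10 s (processing time) after its first element, and keep doing so.
                .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(10))))
                // Each fired pane contains all elements seen so far in the window.
                .accumulatingFiredPanes()
                // Keep window state for data arriving up to 1 day after the end of the window.
                .withAllowedLateness(Duration.standardDays(1)));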
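The effect of the trigger in this answer can be sketched with another Beam-free model. It assumes all elements land in the same 10-minute window, treats arrival times as processing times in milliseconds, puts an element that arrives exactly at a pending deadline into the next pane, and ignores the final pane Beam produces when the window expires. ProcessingTimeTriggerModel and panes() are hypothetical names, not part of Beam.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified, Beam-free model (NOT the Beam API) of
 * Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(d))
 * combined with accumulatingFiredPanes(), for elements in a single window.
 */
public class ProcessingTimeTriggerModel {

    /** Returns "firingTime:cumulativeCount" for every pane that fires. */
    static List<String> panes(long[] arrivalMillis, long delayMillis) {
        long[] arrivals = arrivalMillis.clone();
        Arrays.sort(arrivals);
        List<String> fired = new ArrayList<>();
        long seen = 0;   // accumulating mode: each pane reports everything seen so far
        long timer = -1; // -1 means no pending timer (current pane is empty)
        for (long a : arrivals) {
            if (timer >= 0 && a >= timer) {
                fired.add(timer + ":" + seen); // the delay expired before this element arrived
                timer = -1;                    // Repeatedly.forever re-arms the trigger
            }
            if (timer < 0) {
                timer = a + delayMillis;       // first element of a new pane starts the delay
            }
            seen++;
        }
        if (timer >= 0) {
            fired.add(timer + ":" + seen);     // the last pending timer eventually fires
        }
        return fired;
    }

    public static void main(String[] args) {
        long[] arrivals = {0, 3_000, 12_000, 15_000, 40_000};
        System.out.println(panes(arrivals, 10_000)); // prints [10000:2, 22000:4, 50000:5]
    }
}
```

With accumulatingFiredPanes() the three panes report 2, 4 and 5 elements; with discardingFiredPanes() they would instead carry 2, 2 and 1. Because each pane's delay is counted in processing time from its first element, output appears without waiting for the watermark to pass the end of the window.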