How do I create user-defined counters in Dataflow?

Posted: 2015-01-23 22:47:39

Tags: google-cloud-dataflow

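How can I create my own counters in my DoFns?

In my DoFn, I want to increment a counter every time a condition is met while processing a record. I'd like this counter to sum the values across all records.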

1 answer:

Answer 0 (score: 2)

You can use Aggregators. The total value of each aggregator, combined across all workers, is shown in the Dataflow monitoring UI.
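Why not just use a plain field on the DoFn? Each worker runs its own deserialized copy of the DoFn, so a field only ever sees the records that worker processed. The following is a plain-Java simulation of that situation, not the Dataflow API: the hypothetical `SumAggregator` class plays the role of an Aggregator backed by a sum combiner, each simulated worker adds matching values to its own partial sum, and the hypothetical `combine` method stands in for the service merging those partials into the total. The condition `value > threshold` is an arbitrary example of the per-record test from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of a summing aggregator. SumAggregator, combine() and
// run() are hypothetical names for illustration, not Dataflow SDK classes.
class AggregatorSketch {

    // Stands in for an Aggregator<Long> created with a sum combiner.
    static final class SumAggregator {
        private long partial = 0L; // this worker's local partial sum

        void addValue(long value) {
            partial += value;
        }

        long partial() {
            return partial;
        }
    }

    // Stands in for the service merging every worker's partial sum into one total.
    static long combine(List<SumAggregator> partials) {
        long total = 0L;
        for (SumAggregator a : partials) {
            total += a.partial();
        }
        return total;
    }

    // Each inner array holds the records that one simulated worker processes.
    static long run(long[][] recordsPerWorker, long threshold) {
        List<SumAggregator> partials = new ArrayList<>();
        for (long[] records : recordsPerWorker) {
            SumAggregator agg = new SumAggregator(); // one copy per worker
            for (long value : records) {
                if (value > threshold) { // the per-record condition
                    agg.addValue(value);
                }
            }
            partials.add(agg);
        }
        return combine(partials);
    }

    public static void main(String[] args) {
        long[][] recordsPerWorker = { {5, 12, 7}, {20, 3}, {15} };
        System.out.println("Total: " + run(recordsPerWorker, 10)); // prints "Total: 47"
    }
}
```

No single worker's partial sum (12, 20 or 15 here) equals the answer; only the merged value does, which is the number you want reported.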

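Here is an example where I was experimenting with Aggregators in a pipeline that just sleeps for sleepSecs seconds on numOutputShards workers. (The GenFakeInput PTransform at the beginning just returns a flattened PCollection<String> of size numOutputShards.)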

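import com.google.cloud.dataflow.sdk.transforms.Aggregator;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.transforms.Sum;
import com.google.cloud.dataflow.sdk.values.PCollection;
import java.util.concurrent.TimeUnit;

// Assumed to be defined elsewhere (not shown in the answer): the Pipeline p,
// options with getNumOutputShards(), the GenFakeInput transform, an SLF4J
// Logger named LOG, and a long field sleepSecs.
// Uses the Dataflow SDK API of early 2015 (Context.createAggregator called in
// startBundle); later SDK versions may differ.
PCollection<String> output = p
    .apply(new GenFakeInput(options.getNumOutputShards()))
    .apply(ParDo.named("Sleep").of(new DoFn<String, String>() {
         // One aggregator per metric; the service combines values from all workers.
         private Aggregator<Long> tSleepSecs;
         private Aggregator<Integer> tWorkers;
         private Aggregator<Long> tExecTime;
         private long startTimeMillis;

         @Override
         public void startBundle(Context c) {
           // Create the aggregators and record when this bundle started.
           tSleepSecs = c.createAggregator("Total Slept (sec)", new Sum.SumLongFn());
           tWorkers = c.createAggregator("Num Workers", new Sum.SumIntegerFn());
           tExecTime = c.createAggregator("Total Wallclock (sec)", new Sum.SumLongFn());
           startTimeMillis = System.currentTimeMillis();
         }

         @Override
         public void finishBundle(Context c) {
           // Add this bundle's elapsed wallclock time, in whole seconds.
           tExecTime.addValue((System.currentTimeMillis() - startTimeMillis) / 1000);
         }

         @Override
         public void processElement(ProcessContext c) {
           try {
             LOG.info("Sleeping for {} seconds.", sleepSecs);
             tSleepSecs.addValue(sleepSecs);
             // Adds 1 per element, so this equals the worker count only when
             // each worker processes exactly one element.
             tWorkers.addValue(1);
             TimeUnit.SECONDS.sleep(sleepSecs);
           } catch (InterruptedException e) {
             LOG.info("Ignoring caught InterruptedException during sleep.");
             Thread.currentThread().interrupt(); // preserve the interrupt status
           }
           c.output(c.element());
         }
       }));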
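The example above updates its three aggregators at different granularities: "Total Slept (sec)" and "Num Workers" once per element in processElement, and "Total Wallclock (sec)" once per bundle in finishBundle. The following plain-Java simulation (not Dataflow code) walks through that lifecycle. The class and its `runBundle` method are hypothetical, and a simulated per-worker clock replaces the real sleep so the numbers are deterministic. The hypothetical `longestBundleSecs` field is added only to contrast the summed wallclock with the time a job would take if all bundles ran in parallel on separate workers.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of the DoFn lifecycle in the answer; not Dataflow SDK code.
class BundleLifecycleSketch {

    // Hypothetical stand-ins for the three aggregators' combined totals.
    long totalSleptSecs = 0;
    int numWorkers = 0;          // incremented once per element, as in the answer
    long totalWallclockSecs = 0;
    long longestBundleSecs = 0;  // approximate job time if every bundle runs in parallel

    // One call = startBundle, processElement for each element, then finishBundle.
    void runBundle(List<String> elements, long sleepSecs) {
        long workerClockMillis = 0;               // simulated clock of the worker running this bundle
        long startTimeMillis = workerClockMillis; // startBundle: remember the start time
        for (String element : elements) {         // processElement: once per element
            totalSleptSecs += sleepSecs;
            numWorkers += 1;
            workerClockMillis += sleepSecs * 1000; // simulated TimeUnit.SECONDS.sleep(sleepSecs)
        }
        long bundleSecs = (workerClockMillis - startTimeMillis) / 1000;
        totalWallclockSecs += bundleSecs;         // finishBundle: add this bundle's elapsed time
        longestBundleSecs = Math.max(longestBundleSecs, bundleSecs);
    }

    public static void main(String[] args) {
        BundleLifecycleSketch sim = new BundleLifecycleSketch();
        sim.runBundle(Arrays.asList("a", "b"), 3); // bundle with two elements
        sim.runBundle(Arrays.asList("c"), 3);      // bundle with one element
        // prints "slept=9 numWorkers=3 wallclock=9 longestBundle=6"
        System.out.println("slept=" + sim.totalSleptSecs
                + " numWorkers=" + sim.numWorkers
                + " wallclock=" + sim.totalWallclockSecs
                + " longestBundle=" + sim.longestBundleSecs);
    }
}
```

With two bundles of two and one elements, "Num Workers" reports 3 (the element count), and the summed wallclock (9 s) exceeds the roughly 6 s the job would take if both bundles ran side by side.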