DirectPipelineRunner - does it support standard glob patterns?

Date: 2015-02-18 05:33:49

Tags: google-cloud-storage google-cloud-dataflow

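Our pipeline runs fine when executed in the cloud. But when it runs with the DirectPipelineRunner (i.e. locally), it fails and complains about the supplied file pattern, which uses a glob.

Is this the expected behaviour when running locally?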

[..]
TextIO.Read.from("gs://cdf-testing/NetworkClicks_123456_2015010[1-2]*")
[..]

Feb 18, 2015 4:19:09 PM com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner run
INFO: Executing pipeline using the DirectPipelineRunner.
Feb 18, 2015 4:19:10 PM com.google.cloud.dataflow.sdk.util.GcsUtil expand
INFO: matching files in bucket cdf-testing, prefix NetworkClicks_123456_2015010[1-2] against pattern NetworkClicks_123456_2015010[1-2][^/]*
Exception in thread "main" java.lang.RuntimeException: Failed to read from source: com.google.cloud.dataflow.sdk.runners.worker.TextReader@55dbc59b
    at com.google.cloud.dataflow.sdk.util.ReaderUtils.readElemsFromReader(ReaderUtils.java:40)
    at com.google.cloud.dataflow.sdk.io.TextIO.evaluateReadHelper(TextIO.java:702)
    at com.google.cloud.dataflow.sdk.io.TextIO.access$000(TextIO.java:98)
    at com.google.cloud.dataflow.sdk.io.TextIO$Read$Bound$1.evaluate(TextIO.java:310)
    at com.google.cloud.dataflow.sdk.io.TextIO$Read$Bound$1.evaluate(TextIO.java:306)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:611)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:200)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:196)
    at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:109)
    at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:204)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:584)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:328)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:70)
    at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:145)
    at com.shinetech.tpc.engine.CDFEngine.loadClicks(CDFEngine.java:88)
    at com.shinetech.tpc.engine.CDFEngine.doMagic(CDFEngine.java:75)
    at com.shinetech.tpc.Main.main(Main.java:15)
Caused by: java.io.IOException: No match for file pattern 'gs://cdf-testing/NetworkClicks_123456_2015010[1-2]*'
    at com.google.cloud.dataflow.sdk.runners.worker.FileBasedReader.iterator(FileBasedReader.java:101)
    at com.google.cloud.dataflow.sdk.util.ReaderUtils.readElemsFromReader(ReaderUtils.java:35)
    ... 16 more
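The `GcsUtil expand` log line hints at what may be going wrong: the listing prefix is everything before the `*`, so it still contains the literal text `[1-2]`, and no object name in the bucket starts with that literal string. The sketch below is a hypothetical helper (`GlobPrefix` is not part of the Dataflow SDK) showing how a glob could instead be split at the first wildcard character (`*`, `?` or `[`) into a safe listing prefix plus a regex, mirroring the `[^/]*` translation seen in the log. It is an illustration of the idea under those assumptions, not the SDK's actual code.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of the Dataflow SDK): splits a GCS-style glob
 * into a literal listing prefix and a regex used to filter the listed names.
 */
public final class GlobPrefix {

  private GlobPrefix() {}

  /** Returns the part of the glob before the first wildcard character (*, ?, or [). */
  public static String listingPrefix(String glob) {
    for (int i = 0; i < glob.length(); i++) {
      char c = glob.charAt(i);
      if (c == '*' || c == '?' || c == '[') {
        return glob.substring(0, i);
      }
    }
    return glob;
  }

  /**
   * Converts a simple glob to a regex. '*' and '?' do not cross '/' boundaries.
   * Character classes such as [1-2] are copied through as-is (negation like [!a]
   * is not handled in this sketch).
   */
  public static String globToRegex(String glob) {
    StringBuilder regex = new StringBuilder();
    int i = 0;
    while (i < glob.length()) {
      char c = glob.charAt(i);
      if (c == '*') {
        regex.append("[^/]*");
        i++;
      } else if (c == '?') {
        regex.append("[^/]");
        i++;
      } else if (c == '[') {
        int close = glob.indexOf(']', i + 1);
        if (close < 0) {
          // Unterminated class: treat '[' as a literal character.
          regex.append(Pattern.quote("["));
          i++;
        } else {
          // Copy the character class, e.g. [1-2], unchanged.
          regex.append(glob, i, close + 1);
          i = close + 1;
        }
      } else {
        // Quote every other character so '.', '$' etc. match literally.
        regex.append(Pattern.quote(String.valueOf(c)));
        i++;
      }
    }
    return regex.toString();
  }

  public static void main(String[] args) {
    String glob = "NetworkClicks_123456_2015010[1-2]*";
    Pattern pattern = Pattern.compile(globToRegex(glob));
    // prints "prefix: NetworkClicks_123456_2015010"
    System.out.println("prefix: " + listingPrefix(glob));
    // prints "true"
    System.out.println(pattern.matcher("NetworkClicks_123456_20150101_part0").matches());
    // prints "false"
    System.out.println(pattern.matcher("NetworkClicks_123456_20150103_part0").matches());
  }
}
```

Listing with the shorter prefix `NetworkClicks_123456_2015010` returns a superset of the wanted objects, and the regex then filters that set down, so the bracket class is never sent to GCS as literal text.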

1 Answer:

Answer 0 (score: 2)

No, both runners should behave the same. This sounds like a bug in the DirectPipelineRunner. Thanks for the report; I'll post back here once the fix is in.
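Until a fix lands, one possible workaround is to avoid the bracket class entirely: expand `[1-2]` into separate patterns that use only a trailing `*`, read each one separately, and merge the results (for example with `Flatten`). This assumes a plain trailing `*` is expanded correctly by the DirectPipelineRunner, which the log suggests but the thread does not confirm. `BracketExpander` below is a hypothetical stdlib-only helper that handles just the simple single-character `[a-b]` form.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expands the first "[a-b]" range in a glob into literal globs. */
public final class BracketExpander {

  private BracketExpander() {}

  public static List<String> expandFirstRange(String glob) {
    List<String> result = new ArrayList<>();
    int open = glob.indexOf('[');
    int close = open < 0 ? -1 : glob.indexOf(']', open + 1);
    // Only the exact form "[x-y]" (five characters) is handled; anything else is returned unchanged.
    if (open < 0 || close != open + 4 || glob.charAt(open + 2) != '-') {
      result.add(glob);
      return result;
    }
    char from = glob.charAt(open + 1);
    char to = glob.charAt(open + 3);
    String head = glob.substring(0, open);
    String tail = glob.substring(close + 1);
    // One pattern per character in the range, e.g. '1' and '2' for [1-2].
    for (char c = from; c <= to; c++) {
      result.add(head + c + tail);
    }
    return result;
  }

  public static void main(String[] args) {
    // prints [gs://cdf-testing/NetworkClicks_123456_20150101*, gs://cdf-testing/NetworkClicks_123456_20150102*]
    System.out.println(expandFirstRange("gs://cdf-testing/NetworkClicks_123456_2015010[1-2]*"));
  }
}
```

Each resulting string could then be passed to its own `TextIO.Read.from(...)`, so no bracket syntax ever reaches the local file-pattern expansion.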
