Dataflow can't read from a BigQuery dataset in region "asia-northeast1"

Asked: 2018-05-05 13:07:54

Tags: google-bigquery google-cloud-dataflow

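I have a BigQuery dataset located in the new "asia-northeast1" region. I'm trying to run a templated Dataflow pipeline (running in the Australia region) to read a table from it. It throws the following error, even though the dataset/table does exist: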

Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Not found: Dataset grey-sort-challenge:Konnichiwa_Tokyo",
    "reason" : "notFound"
  } ],
  "message" : "Not found: Dataset grey-sort-challenge:Konnichiwa_Tokyo"
}

Am I doing something wrong here?

/**
 * BigQuery -> ParDo -> GCS (one file)
 */
public class BigQueryTableToOneFile {
    public static void main(String[] args) throws Exception {
        PipelineOptionsFactory.register(TemplateOptions.class);
        TemplateOptions options = PipelineOptionsFactory
                .fromArgs(args)
                .withValidation()
                .as(TemplateOptions.class);
        options.setAutoscalingAlgorithm(THROUGHPUT_BASED);
        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply(BigQueryIO.read().from(options.getBigQueryTableName()).withoutValidation())
                .apply(ParDo.of(new DoFn<TableRow, String>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) throws Exception {
                        String commaSep = c.element().values()
                                .stream()
                                .map(cell -> cell.toString().trim())
                                .collect(Collectors.joining("\",\""));
                        c.output(commaSep);
                    }
                }))
                .apply(TextIO.write().to(options.getOutputFile())
                        .withoutSharding()
                        .withWritableByteChannelFactory(GZIP)
                );
        pipeline.run();
    }

    public interface TemplateOptions extends DataflowPipelineOptions {
        @Description("The BigQuery table to read from in the format project:dataset.table")
        @Default.String("bigquery-samples:wikipedia_benchmark.Wiki1k")
        ValueProvider<String> getBigQueryTableName();

        void setBigQueryTableName(ValueProvider<String> value);

        @Description("The name of the output file to produce in the format gs://bucket_name/filename.csv")
        @Default.String("gs://bigquery-table-to-one-file/output/bar.csv.gz")
        ValueProvider<String> getOutputFile();

        void setOutputFile(ValueProvider<String> value);
    }
}

Arguments:

--project=grey-sort-challenge
--runner=DataflowRunner
--jobName=bigquery-table-to-one-file
--maxNumWorkers=1
--zone=australia-southeast1-a
--stagingLocation=gs://bigquery-table-to-one-file/jars
--tempLocation=gs://bigquery-table-to-one-file/tmp
--templateLocation=gs://bigquery-table-to-one-file/template

Job ID: 2018-05-05_05_37_08-8260293482986343692

1 Answer:

Answer 0 (score: 0)

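Sorry about this issue. It will be fixed in the upcoming Beam SDK 2.5.0 release (in the meantime, you can try the current head snapshot from the Beam repo).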