运行wordcount示例

时间:2017-01-13 20:56:44

标签: google-cloud-dataflow apache-beam apache-beam-io

我想我已经按照文档的步骤,但我仍然遇到了这个例外。 (唯一不同的是我从Eclipse J2EE运行它,但我不希望这真的很重要,不是吗?)

代码:(我没有写过这个,它正好来自光束项目示例)。我认为您必须指定一个Google云平台项目并提供正确的凭据才能访问它。但是,我在这个设置项目的项目中找不到任何地方。

  public static void main(String[] args) {
// Create a PipelineOptions object. This object lets us set various execution
// options for our pipeline, such as the runner you wish to use. This example
// will run with the DirectRunner by default, based on the class path configured
// in its dependencies.
PipelineOptions options = PipelineOptionsFactory.create();

// Create the Pipeline object with the options we defined above.
Pipeline p = Pipeline.create(options);

// Apply the pipeline's transforms.

// Concept #1: Apply a root transform to the pipeline; in this case, TextIO.Read to read a set
// of input text files. TextIO.Read returns a PCollection where each element is one line from
// the input text (a set of Shakespeare's texts).

// This example reads a public data set consisting of the complete works of Shakespeare.
p.apply(TextIO.Read.from("gs://apache-beam-samples/shakespeare/*"))
.....
)

例外:

Exception in thread "main" java.lang.IllegalStateException: Failed to validate gs://apache-beam-samples/shakespeare/*
at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:309)
at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:205)
at org.apache.beam.sdk.runners.PipelineRunner.apply(PipelineRunner.java:76)
at org.apache.beam.runners.direct.DirectRunner.apply(DirectRunner.java:296)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:388)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:302)
at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:47)
at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:152)
at google.dataflow.beam.example.MinimalWordCount.main(MinimalWordCount.java:77)
Caused by: java.io.IOException: Unable to match files in bucket apache-beam-samples, prefix shakespeare/ against pattern shakespeare/[^/]*
at org.apache.beam.sdk.util.GcsUtil.expand(GcsUtil.java:234)
at org.apache.beam.sdk.util.GcsIOChannelFactory.match(GcsIOChannelFactory.java:53)
at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:304)
... 8 more
Caused by: com.google.api.client.http.HttpResponseException: 400 Bad Request
{


"error" : "invalid_grant"
}
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1070)
    at com.google.auth.oauth2.UserCredentials.refreshAccessToken(UserCredentials.java:207)
    at com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:149)
    at com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:135)
    at com.google.auth.http.HttpCredentialsAdapter.initialize(HttpCredentialsAdapter.java:96)
    at com.google.cloud.hadoop.util.ChainingHttpRequestInitializer.initialize(ChainingHttpRequestInitializer.java:52)
    at com.google.api.client.http.HttpRequestFactory.buildRequest(HttpRequestFactory.java:93)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:300)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
    at com.google.cloud.hadoop.util.ResilientOperation$AbstractGoogleClientRequestExecutor.call(ResilientOperation.java:166)
    at com.google.cloud.hadoop.util.ResilientOperation.retry(ResilientOperation.java:66)
    at com.google.cloud.hadoop.util.ResilientOperation.retry(ResilientOperation.java:103)
    at org.apache.beam.sdk.util.GcsUtil.expand(GcsUtil.java:227)
    ... 10 more

1 个答案:

答案 0 :(得分:1)

尝试运行它从命令提示如果使用Windows。 转到包含pom.xml文件的文件夹,然后在那里打开cmd。 然后用相应的参数给出命令。

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args=" --output=counts" -Pdirect-runner

如果您想使用输入文件运行。然后创建一个带有任何名称的txt文件并将其放在包含pom的文件夹中。然后按照命令开火。

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=YOURFILENAME.txt --output=counts" -Pdirect-runner**

希望这样做。我正在调查你的问题