JsonNode可以用AvroKeyValueSinkWriter编写吗?

时间:2018-01-30 11:05:17

标签: apache-flink

我尝试使用AvroKeyValueSinkWriter来保存json数据。

我正在创建这样的作家:

   protected static AvroKeyValueSinkWriter<String, JsonNode> getWriter() {

        Map<String, String> properties = new HashMap<>();
        Schema keySchema = Schema.create(Schema.Type.STRING);

        String valueSchema = "\n" +
                "        {\"namespace\": \"transaction.avro\",\n" +
                "         \"type\": \"record\",\n" +
                "         \"name\": \"Transaction\",\n" +
                "         \"fields\": [\n" +
                "             {\"name\": \"InvoiceNo\",     \"type\": \"int\"    },\n" +
                "             {\"name\": \"StockCode\",     \"type\": \"string\" },\n" +

                              // ... some lines hidden

                "             {\"name\": \"StoreID\",       \"type\": \"int\"    },\n" +
                "             {\"name\": \"TransactionID\", \"type\": \"string\" }\n" +
                "         ]\n" +
                "        }";


        properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_KEY_SCHEMA, keySchema.toString());
        properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_VALUE_SCHEMA, valueSchema);
        properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS, String.valueOf(true));
        properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS_CODEC, DataFileConstants.SNAPPY_CODEC);

        return new AvroKeyValueSinkWriter<String, JsonNode>(properties);
    }

然后从我的测试用例开始:

    @Test
    public void testWriter() throws Exception {

        AvroKeyValueSinkWriter<String, JsonNode> writer = getWriter();

        String key = "5639281840180123";

        String jsonString = "{\n" +
                "  \"InvoiceNo\": 5370812,\n" +
                "  \"StockCode\": 22409,\n" +

                // ... some lines hidden

                "  \"StoreID\": 0,\n" +
                "  \"TransactionID\": \"537081210180130\"\n" +
                "}";

        ObjectMapper mapper = new ObjectMapper();
        JsonNode jsonObj = mapper.readTree(jsonString);

        writer.setSyncOnFlush(true);
        writer.open(fs, path);
        writer.write(new Tuple2<String, JsonNode>(key, jsonObj));
        writer.flush();

        //assertEquals();
    }
}

然而,

org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: org.codehaus.jackson.node.ObjectNode cannot be cast to org.apache.avro.generic.IndexedRecord

    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308)
    at org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter$AvroKeyValueWriter.write(AvroKeyValueSinkWriter.java:247)
    at org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter.write(AvroKeyValueSinkWriter.java:178)
    at com.ibm.cloud.flink.StreamingJobTest.testWriter(StreamingJobTest.java:88)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.lang.ClassCastException: org.codehaus.jackson.node.ObjectNode cannot be cast to org.apache.avro.generic.IndexedRecord

可以用AvroKeyValueSinkWriter编写JsonNode吗?如果没有,我可以传递什么呢?

1 个答案:

答案 0 :(得分:0)

解决方案是从我的avro架构生成java源代码并使用生成的类:

    DataStream<Tuple2<String, Object>> avroStream = objStream
            .map(new MapFunction<ObjectNode, Tuple2<String, Object>>() {

                @Override
                public Tuple2<String, Object> map(ObjectNode value) throws Exception {

                    JsonNode result = value.get("value");

                    Transaction tx = convertToAvro(result);

                    return new Tuple2(value.get("key").toString(), tx);
                }
            });

....

public static Transaction convertToAvro(JsonNode result) {
    return Transaction.newBuilder()
                                .setInvoiceNo(result.get("InvoiceNo").intValue())
                                .setStockCode(result.get("StockCode").intValue())
                                .setDescription(result.get("Description").toString())
                                .setQuantity(result.get("Quantity").intValue())
                                .setInvoiceDate(result.get("InvoiceDate").longValue())
                                .setUnitPrice(result.get("UnitPrice").floatValue())
                                .setCustomerID(result.get("CustomerID").intValue())
                                .setCountry(result.get("Country").toString())
                                .setLineNo(result.get("LineNo").intValue())
                                .setInvoiceTime(result.get("InvoiceTime").toString())
                                .setStoreID(result.get("StoreID").intValue())
                                .setTransactionID(result.get("TransactionID").toString())
                                .build();
}