KSQL从带句点(点号)的JSON字段创建流。

时间:2019-07-19 08:25:56

标签: apache-kafka ksql

我有一个像{"destination.port":"443","network.packets":"4464","event.end":"2019-07-19T07:47:22.000Z","source.address":"1.2.2.3","message":"OK","server.address":"ip-1-2-2-3.ec2.internal","event.action":"ACCEPT","event.module":"S3bucket","source.port":"56448","network.protocol":"6","cloud.account.id":"512889038796","event.type":"AWS_VPC_log","organization.id":"DeloitteFusion","destination.address":"1.2.2.3","network.bytes":"178584","event.start":"2019-07-19T07:46:22.000Z","event.kind":"2","host.id":"eni-0c5e3a6282a912997","timestamp":"2019-07-19T07:51:52.584Z","srckey_val":"167772160_184549375","srckey_rev":"15072019_1541"}

的json

将KSQL流创建为create stream vpc_log ("destination.port" integer, "network.packets" integer, "event.end" varchar, "source.address" varchar, message varchar, "server.address" varchar, "event.action" varchar, "event.module" varchar, "source.port" integer, "network.protocol" integer, "cloud.account.id" bigint, "event.type" varchar, "organization.id" varchar, "destination.address" varchar, "network.bytes" integer, "event.start" varchar, "event.kind" integer, "host.id" varchar, timestamp varchar, srckey_val varchar, srckey_rev varchar) WITH (KAFKA_TOPIC='client_data_parsed', VALUE_FORMAT='JSON');

在运行select * from vpc_log;时抛出以下错误Caused by: Cannot create field because of field name duplication address

因此已将流查询修改为create stream vpc_log ("destination.port2" integer, "network.packets" integer, "event.end" varchar, "source.addres" varchar, message varchar, "server.adress" varchar, "event.action" varchar, "event.module" varchar, "source.port1" integer, "network.protocol" integer, "cloud.account.id2" bigint, "event.type" varchar, "organization.id" varchar, "destination.adres" varchar, "network.bytes" integer, "event.start" varchar, "event.kind" integer, "host.id1" varchar, timestamp varchar, srckey_val varchar, srckey_rev varchar) WITH (KAFKA_TOPIC='client_data_parsed', VALUE_FORMAT='JSON');

选择*的输出为1563522879847 | null | null | null | null | null | OK | null | null | null | null | null | null | null | null | null | null | null | null | null | 2019-07-19T07:51:52.584Z | 167772160_184549375 | 15072019_1541

所有点分(。)字段/键的值为null。我指的是link,但未提供解决方案。帮我明白

我可以快速进行查询,但键中没有点。但是,想知道KSQL有点是什么吗?

1 个答案:

答案 0 :(得分:1)

我在klack开发人员的slack社区中得到了这个答案。可能会帮助某人。 KSQL对此没有官方支持,但一种解决方法是逃避这段时期。 从here安装了ksql v5.3.0。

create stream vpc_log ("DESTINATION\.PORT" INTEGER,"NETWORK\.PACKETS" INTEGER,"EVENT\.END" VARCHAR,"SOURCE\.ADDRESS" VARCHAR,MESSAGE VARCHAR,"SERVER\.ADDRESS" VARCHAR,"EVENT\.ACTION" VARCHAR,"EVENT\.MODULE" VARCHAR,"SOURCE\.PORT" INTEGER,"NETWORK\.PROTOCOL" INTEGER,"CLOUD\.ACCOUNT\.ID" BIGINT,"EVENT\.TYPE" VARCHAR,"ORGANIZATION\.ID" VARCHAR,"DESTINATION\.ADDRESS" VARCHAR,"NETWORK\.BYTES" INTEGER,"EVENT\.START" VARCHAR,"EVENT\.KIND" INTEGER,"HOST\.ID" VARCHAR,TIMESTAMP VARCHAR,SRCKEY_VAL VARCHAR,SRCKEY_REV VARCHAR) WITH (KAFKA_TOPIC='client_data_parsed',

VALUE_FORMAT ='JSON');

感谢罗宾·莫法特。