Hive: extracting keys from a dictionary-like string with regexp_extract?

Time: 2019-01-26 06:38:27

Tags: regex hive

I want to extract the keys from a column of a Hive table whose values look like this:

{"agya":3,"gentong":1,"tronton":0,"tasikmalaya":4,"tanja":2}
{"afifah":3,"sctv":10,"samuel zylgwyn":2,"naysila mirdad":0,"shared":8}
{"aferia":1,"jatimtimes":3,"apbdes":2,"siltap":4,"mudjito":0}
{"aerox":0,"flasher":1,"lampu hazard":2,"aftermarket":4,"dcs":5}
{"administratif":6,"fakta":7,"prabowo":5,"cek":4,"admistratif":0}
{"adeg":2,"tiru":1,"film film":3,"romantis":0,"nggak":5}

For the first row, I want to get "agya", "gentong", "tronton", and so on. Later I can explode them into multiple rows. How can I do this with regexp_extract?

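2 answers:

Answer 0 (score: 0)

regexp_extract() returns a single string. To get an array, use the split() function, which also takes a regular expression as its delimiter pattern. So you can split on ':\\d+,'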

split(
     regexp_replace(col, '^\\{|\\}$',''), --remove outer curly braces {}
     ':\\d+,' --array elements delimiter pattern
     ) --this will give array "agya", "gentong", etc

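After exploding the array, you can remove the quotes with regexp_replace(col_exploded,'\\"','')

Update

The last key:value pair is not followed by a comma, so the pattern needs to be changed to use ,|$ (comma or end of string). With that pattern the string also ends with a delimiter, so the last array element will be empty and needs to be filtered out.

Test: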

hive> select regexp_replace(key,'\\"','') key
    > from
    > (
    > select explode(
    > split(
    >      regexp_replace('{"agya":3,"gentong":1,"tronton":0,"tasikmalaya":4,"tanja":2}', '^\\{|\\}$',''), --remove outer curly braces {}
    >      ':\\d+(,|$)' --array elements delimiter pattern
    >      )
    > ) as key
    > )s
    > where key!=''
    > ;
OK
agya
gentong
tronton
tasikmalaya
tanja
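
The test above runs on a string literal. Applied to a real table, the same idea could look like the sketch below. The table name my_table and the column name col are assumptions made for illustration; they do not come from the question.

```sql
-- Sketch only: my_table and col are hypothetical names.
SELECT regexp_replace(k, '"', '') AS key       -- strip the double quotes around each key
FROM my_table t
LATERAL VIEW explode(
    split(
        regexp_replace(t.col, '^\\{|\\}$', ''), -- remove the outer curly braces {}
        ':\\d+(,|$)'                            -- delimiter: ":<digits>" followed by a comma or end of string
    )
) e AS k
WHERE k != '';                                  -- drop the empty element left after the last delimiter
```

LATERAL VIEW keeps the other columns of my_table available next to each exploded key, which the subquery form does not. Note that the delimiter pattern assumes every value is a non-negative integer, as in the sample rows.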

Answer 1 (score: 0)

You can try the following solution:

select map_keys(str_to_map(regexp_replace(mycol,'[{}"]','')));

Here:

1. regexp_replace removes every '{', '}' and '"' character.
2. str_to_map converts the resulting string into a map, using its default delimiters (',' between pairs and ':' between key and value).
3. map_keys extracts the keys from the map and returns them as an array.
4. You can then explode this array as needed.
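
Put together with the explode step, the four steps might be sketched as follows. The table name my_table is an assumption for illustration; mycol is the column name used in the query above.

```sql
-- Sketch only: my_table is a hypothetical table name.
SELECT k AS key
FROM my_table t
LATERAL VIEW explode(
    map_keys(                                      -- step 3: array of the map's keys
        str_to_map(                                -- step 2: default delimiters ',' and ':'
            regexp_replace(t.mycol, '[{}"]', '')   -- step 1: strip braces and double quotes
        )
    )
) e AS k;                                          -- step 4: one row per key
```

This relies on no key containing a comma or a colon, since those characters are what str_to_map splits on. Because the intermediate result is a map, a key that appears twice in the same string would come out only once.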

Thanks