Extracting data with regexp_extract in Hive

Date: 2016-01-20 09:14:54

Tags: regex hadoop hive

{'offercode': u'5100003454', 'offer': 'Book 14 days in Advance and Get 15% Off\r\n', 'original_baseprice': [[3700.0], [3700.0]], 'taxbreakup': {'taxinfo': {'othertaxon_display': u'Sell Rate', 'servtax': 0.0, 'luxtaxon_display': u'Sell Rate', 'servtaxon_display': u'Sell Rate', 'nettservicetaxflag': True, 'servtaxon': u'sellrate', 'othertax': 0.0, 'taxonextrabedflag': True, 'luxtaxon': u'sellrate', 'taxincluded': False, 'taxcode': u'7500003113', 'luxtax': 18.66, 'othertaxon': u'sellrate'}, 'LT': 1174, 'OT': 0, 'ST': 0}, 'baseprice': [[3145.0], [3145.0]], 'success': True, 'extraguest': [[0], [0]], 'extraguest_nett': [[0], [0]], 'original_nettbreakup': [[2775.0], [2775.0]], 'original': [[3700.0], [3700.0]]}    

I am unable to extract the values of 'LT': 1174, 'OT': 0, 'ST': 0 from the record above in Hive.

I have tried this:

regexp_extract(string, "\'LT\':(.*?)", 1) as LT, regexp_extract(string, "\'OT\':(.*?)", 1) as OT, regexp_extract(string, "\'ST\':(.*?)", 1)

1 Answer:

Answer 0 (score: 0)

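SELECT
-- The pattern includes the space after the colon, and the lazy group (.*?) stops at the next space,
-- so each capture still carries a trailing ',' (or '},' for ST); regexp_replace strips it.
regexp_replace(regexp_extract(string,"\'LT\': (.*?) ", 1), ',', '') as LT,
regexp_replace(regexp_extract(string,"\'OT\': (.*?) ", 1), ',', '') as OT,
regexp_replace(regexp_extract(string,"\'ST\': (.*?) ", 1), '},', '') as ST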
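
A capture group that matches only digits would make the regexp_replace cleanup step unnecessary. The sketch below checks that idea outside Hive, using Python against a shortened copy of the sample record. The names sample and extract_tax are illustrative and do not come from the question, and the code assumes the three values are always non-negative integers. In a Hive string literal the same pattern would need a doubled backslash (for example "'LT': (\\d+)"), because Hive unescapes the literal before the regex engine sees it.

```python
import re

# Shortened copy of the tail of the sample record from the question (assumption:
# the surrounding fields do not affect the match).
sample = (
    "'luxtax': 18.66, 'othertaxon': u'sellrate'}, "
    "'LT': 1174, 'OT': 0, 'ST': 0}, 'baseprice': [[3145.0], [3145.0]]"
)


def extract_tax(record, key):
    """Return the integer that follows 'key': in record, or None if the key is absent."""
    # \s* tolerates zero or more spaces after the colon; (\d+) captures digits only,
    # so no trailing ',' or '}' ends up in the result.
    match = re.search(r"'" + re.escape(key) + r"':\s*(\d+)", record)
    return int(match.group(1)) if match else None


taxes = {key: extract_tax(sample, key) for key in ("LT", "OT", "ST")}
print(taxes)  # {'LT': 1174, 'OT': 0, 'ST': 0}
```

If a value could be a decimal or negative number, the digit-only group would truncate or miss it, so the pattern would need to be widened for such data.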