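I have a table in BigQuery with millions of rows, and I want to split the adx_catg_id column into multiple new columns. Note that adx_catg_id contains an arbitrary number of words separated by spaces.
The example query below can split adx_catg_id into multiple columns when the string contains fewer than five words. I could extend it to support more words, but I need this to be automated.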
SELECT
TS, str0, str2, str4, str6, str7
from
(select TS, str0, str2, str4, str6, REGEXP_EXTRACT(str5, r'^(.*) .*') as str7
from
(select TS, str0, str2, str4, str5, SUBSTR(str5, LENGTH(REGEXP_EXTRACT(str5, r'^(.*) .*')) + 2, LENGTH(str5)) as str6
from
(select TS, str0, str2, str4, REGEXP_EXTRACT(str3, r'^(.*) .*') as str5
from
(select TS, str0, str2, str3, SUBSTR(str3, LENGTH(REGEXP_EXTRACT(str3, r'^(.*) .*')) + 2, LENGTH(str3)) as str4
from
(select TS, str0, str2, REGEXP_EXTRACT(str1, r'^(.*) .*') as str3
from
(select TS, str0, str1, SUBSTR(str1, LENGTH(REGEXP_EXTRACT(str1, r'^(.*) .*')) + 2, LENGTH(str1)) as str2
from
(select TS, str0, REGEXP_EXTRACT(TS, r'^(.*) .*') as str1
from
(select TS, SUBSTR(TS, LENGTH(REGEXP_EXTRACT(TS, r'^(.*) .*')) + 2, LENGTH(TS)) as str0
from
(select adx_catg_id as TS from [mydataset.conversions])
))))))))
How can I loop the query above based on the string length, so that every word ends up in its own new column?
Answer 0 (score: 3)
Check this out:
SELECT
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){0}([^\s]*)\s?') as Word0,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){1}([^\s]*)\s?') as Word1,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){2}([^\s]*)\s?') as Word2,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){3}([^\s]*)\s?') as Word3,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){4}([^\s]*)\s?') as Word4,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){5}([^\s]*)\s?') as Word5,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){6}([^\s]*)\s?') as Word6,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){7}([^\s]*)\s?') as Word7,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){8}([^\s]*)\s?') as Word8,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){9}([^\s]*)\s?') as Word9,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){10}([^\s]*)\s?') as Word10,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){11}([^\s]*)\s?') as Word11,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){12}([^\s]*)\s?') as Word12
FROM
(SELECT 'arbitrary number of words separated by space.' as StringToParse)
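To see what each Word<k> pattern returns before running it over millions of rows, the regex can be tried locally. The sketch below uses Python's re module as a stand-in for BigQuery's RE2 engine; that both engines agree on these patterns is an assumption (the patterns use no backreferences or lookarounds), and the helper name word_at is hypothetical.

```python
import re
from typing import Optional


# Hypothetical helper: applies the forward-order Word<k> pattern from the query above.
# Python's re is only a local stand-in for BigQuery's RE2 engine.
def word_at(s: str, k: int) -> Optional[str]:
    """Return the k-th (0-based) space-separated word, or None if there is none."""
    # (?:[^\s]*\s){k} skips k "word + space" pairs from the start of the string;
    # the capture group then takes the next run of non-space characters.
    m = re.search(r'^(?:[^\s]*\s){%d}([^\s]*)\s?' % k, s)
    # No match plays the role of REGEXP_EXTRACT returning NULL.
    return m.group(1) if m else None


sample = 'arbitrary number of words separated by space.'
words = [word_at(sample, k) for k in range(8)]
# words == ['arbitrary', 'number', 'of', 'words', 'separated', 'by', 'space.', None]
```

Because {k} needs k spaces to match, any Word<k> past the last word comes back empty (None here, NULL from REGEXP_EXTRACT), so rows with fewer words simply leave their trailing columns unset.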
Or, if you want the words in reverse order:
SELECT
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){1}$') as Word1,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){2}$') as Word2,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){3}$') as Word3,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){4}$') as Word4,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){5}$') as Word5,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){6}$') as Word6,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){7}$') as Word7
FROM
(SELECT 'arbitrary number of words separated by space.' as StringToParse)
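The reverse-order patterns are less obvious, because the repeated group (?:[^\s]*\s?) can also match an empty string. A local check in the same style (Python's re standing in for RE2, helper name word_from_end hypothetical) shows what they return:

```python
import re
from typing import Optional


# Hypothetical helper: applies the reverse-order Word<k> pattern from the query above.
# Python's re is only a local stand-in for BigQuery's RE2 engine.
def word_from_end(s: str, k: int) -> Optional[str]:
    """Return the k-th word counted from the end (k = 1 is the last word)."""
    # The match starts at the leftmost position from which one word plus
    # k trailing (?:[^\s]*\s?) groups can still reach $, i.e. the k-th word from the end.
    m = re.search(r'\s?([^\s]*)(?:[^\s]*\s?){%d}$' % k, s)
    return m.group(1) if m else None


sample = 'arbitrary number of words separated by space.'
from_end = [word_from_end(sample, k) for k in range(1, 9)]
```

Unlike the forward patterns, asking for more words than the string has does not yield None here: the extra repetitions match empty strings, so the first word comes back again (word_from_end(sample, 8) returns 'arbitrary'). RE2 presumably behaves the same way, which would mean surplus reverse-order columns repeat the first word instead of being NULL; that is worth confirming in BigQuery.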
It is still a fixed number of fields, but the code is simpler and easier to read.
Hope this helps.
Answer 1 (score: 0)
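Unfortunately, BigQuery has no simple SPLIT() today, but it would make a good feature request.
I like the answer you developed, and I'll experiment with it more. For an alternative approach, you could also try https://stackoverflow.com/a/18711812/132438.
In the meantime, the best way to automate this is probably to generate the query automatically outside BigQuery.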
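Following that suggestion, a small script can write the fixed-column query for however many words the longest value has. The sketch below is one way to do it in Python; build_split_query is a hypothetical name, it reuses the forward-order Word<k> pattern from the first answer, and max_words is an input you would have to determine yourself.

```python
def build_split_query(max_words: int,
                      column: str = "adx_catg_id",
                      table: str = "[mydataset.conversions]") -> str:
    """Return a SELECT with one Word<k> column per word position, k = 0..max_words-1.

    Hypothetical helper: the column and table defaults mirror the question, and
    the regex is the forward-order pattern from the first answer.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    # Raw string so that \s reaches BigQuery unchanged; %s/%d fill in column and index.
    template = r"  REGEXP_EXTRACT(%s, r'^(?:[^\s]*\s){%d}([^\s]*)\s?') AS Word%d"
    # Join with commas so the last column has no trailing comma before FROM.
    select_list = ",\n".join(template % (column, k, k) for k in range(max_words))
    return "SELECT\n" + select_list + "\nFROM " + table


if __name__ == "__main__":
    print(build_split_query(3))
```

Assuming words are separated by single spaces, max_words could come from a preliminary query that counts spaces, something like SELECT MAX(LENGTH(REGEXP_REPLACE(adx_catg_id, r'[^ ]', ''))) + 1 FROM [mydataset.conversions]; the generated text can then be run like any hand-written query.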