BigQuery: breaking SPLIT() output rows into multiple columns

Time: 2016-05-12 18:35:30

Tags: google-bigquery

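I have a long string in a single column that I need to break into multiple rows and then split into multiple columns. The data looks like this: ((a:10,b:20,c:test1)(a:40,b:50,c:test2)(a:60,b:70,c:test3)). When I apply SPLIT and REGEXP_REPLACE, I get a result like this: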

SELECT SPLIT(REGEXP_REPLACE(REGEXP_REPLACE(message, r'\)\)', ''), r'\(\(', ''), ')(') AS msg FROM [mydataset.mytable]

Output:

msg
a:10,b:20,c:test1
a:40,b:50,c:test2
a:60,b:70,c:test3

What I am looking for is:
a b c
10 20 test1
40 50 test2
60 70 test3

I used SPLIT again to split each row on the comma (,), but it gives me only one row instead of 3. Any help would be greatly appreciated.

2 answers:

Answer 0: (score: 2)

Try the following example (legacy SQL):

SELECT
  MIN(CASE WHEN name = 'a' THEN value END) AS a,
  MIN(CASE WHEN name = 'b' THEN value END) AS b,
  MIN(CASE WHEN name = 'c' THEN value END) AS c
FROM (
  SELECT
    message, msg, 
    REGEXP_EXTRACT(pair, r'(\w*):') AS name, 
    REGEXP_EXTRACT(pair, r':(\w*)') AS value
  FROM (
    SELECT message, msg, 
      SPLIT(msg) AS pair
    FROM (
      SELECT message, 
        SPLIT(REPLACE(REPLACE(message, '))',''), '((','') ,')(') AS msg
      FROM 
        (SELECT '((a:10,b:20,c:test1)(a:40,b:50,c:test2)(a:60,b:70,c:test3))' AS message),
        (SELECT '((a:12,b:22,c:test4)(a:42,b:52,c:test5)(a:62,b:72,c:test6))' AS message)
    )
  )
) 
GROUP BY message, msg

Answer 1: (score: 2)

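Here is an alternative solution using standard SQL (uncheck the "Use Legacy SQL" box under "Show Options"). It is still relatively verbose, but it requires less text manipulation: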

WITH MyTable AS (
  SELECT messages
  FROM UNNEST(['((a:10,b:20,c:test1)(a:40,b:50,c:test2)(a:60,b:70,c:test3))',
               '((a:12,b:22,c:test4)(a:42,b:52,c:test5)(a:62,b:72,c:test6))'])
    AS messages)
SELECT
  (SELECT value FROM UNNEST(message_parts) WHERE name = 'a') AS a,
  (SELECT value FROM UNNEST(message_parts) WHERE name = 'b') AS b,
  (SELECT value FROM UNNEST(message_parts) WHERE name = 'c') AS c
FROM (
  SELECT ARRAY(SELECT AS STRUCT
                 SPLIT(part, ':')[OFFSET(0)] AS name,
                 SPLIT(part, ':')[OFFSET(1)] AS value
               FROM UNNEST(SPLIT(message, ',')) AS part) AS message_parts
  FROM (SELECT message FROM MyTable,
          UNNEST(REGEXP_EXTRACT_ALL(messages, r'\(([^\(\)]+)\)')) AS message)
);
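
If the key names are known in advance (a, b and c here), a shorter variant of the same idea may be enough. The following is only a sketch in standard SQL, not part of either answer: it reuses the MyTable sample data from above, and it assumes every key appears at most once per group and that values never contain a comma or a parenthesis. Each parenthesised group becomes one row via REGEXP_EXTRACT_ALL, and each column is then pulled out of that group with REGEXP_EXTRACT.

WITH MyTable AS (
  SELECT messages
  FROM UNNEST(['((a:10,b:20,c:test1)(a:40,b:50,c:test2)(a:60,b:70,c:test3))',
               '((a:12,b:22,c:test4)(a:42,b:52,c:test5)(a:62,b:72,c:test6))'])
    AS messages)
SELECT
  -- (?:^|,) anchors the key at the start of the group or right after a comma,
  -- so a key such as 'a' cannot match the tail of a longer key name.
  REGEXP_EXTRACT(part, r'(?:^|,)a:([^,]*)') AS a,
  REGEXP_EXTRACT(part, r'(?:^|,)b:([^,]*)') AS b,
  REGEXP_EXTRACT(part, r'(?:^|,)c:([^,]*)') AS c
FROM MyTable,
  -- One output row per innermost "(...)" group, e.g. 'a:10,b:20,c:test1'.
  UNNEST(REGEXP_EXTRACT_ALL(messages, r'\(([^()]+)\)')) AS part;

A key that is missing from a group yields NULL in that column. If the set of keys is not fixed, the array-of-structs approach above is the more general choice.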