CASE statement in a row aggregation

Time: 2018-12-14 16:33:41

Tags: sql google-bigquery

I am working in Google BigQuery using standard SQL.

I have pageview data. The only relevant columns are:

| user_id | entity_id | url |

The URLs have the form /entities/entity_id/show or /entities/entity_id/replies/new.

A user may appear with either type of URL or with both, and rows may be repeated.

My goal is a table that looks like this:

| user_id | entity_id | view_type |

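where view_type is either 'show' or 'new'.

There should be exactly one row per user/entity pair. The view_type should be 'new' if that user_id / entity_id pair ever appears alongside a URL of the form /entities/entity_id/replies/new, and 'show' if the pair has no 'new' URL. If the original table contains no rows for a user_id / entity_id pair, that pair should not appear in the final table.

I will include a WITH statement in the examples for reproducibility: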

WITH data AS (
    select 1 as user_id, 23 as entity_id, '/entities/23/replies/new' as url

    UNION ALL

    select 1 as user_id, 23 as entity_id, '/entities/23/show' as url

    UNION ALL

    select 2 as user_id, 30 as entity_id, '/entities/30/show' as url
)
SELECT * from data

That sets up a table like this:

| user_id | entity_id |            url             |
----------------------------------------------------
|       1 |        23 | '/entities/23/replies/new' |
|       1 |        23 |        '/entities/23/show' |
|       2 |        30 |        '/entities/30/show' |

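I can achieve my goal by using two WITH subqueries that each do a SELECT DISTINCT on one type of URL, then joining them back to the data and using a CASE statement that depends on whether each join found a match for the given user/entity pair.

Here is what I mean: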

WITH data AS (
    select 1 as user_id, 23 as entity_id, '/entities/23/replies/new' as url

    UNION ALL

    select 1 as user_id, 23 as entity_id, '/entities/23/show' as url

    UNION ALL

    select 2 as user_id, 30 as entity_id, '/entities/30/show' as url
), news AS (
    SELECT DISTINCT user_id, entity_id, 1 as found
    FROM data 
    WHERE url like '%new'
), shows AS (
    SELECT DISTINCT user_id, entity_id, 1 as found 
    FROM data
    WHERE url like '%show'
)
SELECT DISTINCT d.user_id, 
    d.entity_id,
    CASE WHEN n.found = 1 then 'new'
        WHEN s.found = 1 then 'show' end as view_type
FROM data d
LEFT JOIN news n on n.user_id = d.user_id and n.entity_id = d.entity_id
LEFT JOIN shows s on s.user_id = d.user_id and s.entity_id = d.entity_id

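Obviously the sample data makes this look more daunting than it really is, but it is still an unwieldy, hard-to-read query, and it would be hard to extend if I added another view_type that I wanted to account for.

I figure there must be a better way!

It occurs to me that I could try stuffing all the URLs for a user_id / entity_id pair into an array and then operating on the array with a CASE statement, for example: if any element of the array matches 'new', then 'new', and so on. But I am not sure how to do "element-wise regex matching", or whether it is even possible.

I would appreciate any insight anyone can offer!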

1 Answer:

Answer 0: (score: 1)

One method is aggregation:

SELECT user_id, entity_id, 
       (CASE WHEN COUNTIF(url like '%new') > 0 THEN 'new' ELSE 'show'
        END) as view_type
FROM data 
GROUP BY user_id, entity_id
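
For completeness, here is a self-contained sketch of the same aggregation idea that includes the sample data from the question. The tighter patterns ('%/replies/new' and '%/show') and the explicit second branch are my own assumptions, not part of the query above; they are only meant to show how further view types could be added in priority order.

```sql
WITH data AS (
    -- Sample rows copied from the question.
    SELECT 1 AS user_id, 23 AS entity_id, '/entities/23/replies/new' AS url
    UNION ALL
    SELECT 1, 23, '/entities/23/show'
    UNION ALL
    SELECT 2, 30, '/entities/30/show'
)
SELECT user_id,
       entity_id,
       CASE
           -- Branches are evaluated top to bottom, so 'new' wins over 'show'.
           WHEN COUNTIF(url LIKE '%/replies/new') > 0 THEN 'new'
           WHEN COUNTIF(url LIKE '%/show') > 0 THEN 'show'
           -- Assumption: another view type would be one more
           -- WHEN COUNTIF(...) > 0 THEN '...' branch, placed by priority.
       END AS view_type
FROM data
GROUP BY user_id, entity_id
```

With the sample rows this should return (1, 23, 'new') and (2, 30, 'show'). Unlike the ELSE 'show' version, a pair whose URLs match none of the patterns gets a NULL view_type here, which may or may not be what you want.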