Find the most frequently occurring value in each group

Time: 2019-06-12 20:37:15

Tags: clickhouse

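I want to find the value that occurs most often in each group.

I tried using top(k)(column), but got the following error: Column class is not under aggregate function and not in GROUP BY.

For example, if my table test_date has the columns (pid, value):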

pid, value
----------
1,a
1,b
1,a
1,c
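
To reproduce the example, the table could be set up roughly as follows. This is only a sketch: the column types (UInt32, String) and the Memory engine are assumptions, since the question does not state them.

    -- Assumed schema: the question gives only the column names pid and value.
    CREATE TABLE test_date
    (
        pid UInt32,
        value String
    )
    ENGINE = Memory;

    -- The four sample rows from the question.
    INSERT INTO test_date VALUES (1, 'a'), (1, 'b'), (1, 'a'), (1, 'c');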

I want this result:

pid, value
----------
1,a

I tried SELECT pid,top(1)(value) top_value FROM test_data group by pid

I get the error: 

Column value  is not under aggregate function and not in GROUP BY

I also tried anyHeavy(), but it only works for values that occur in more than half of the cases.

2 answers:

Answer 0 (score: 2)

This query should help:

    SELECT
        pid,
        /*
        Decompose the query in parts:
        1. groupArray((value, count)): convert the group of rows with the same 'pid' to the array of tuples (value, count)
        2. arrayReverseSort: make reverse sorting by 'count' ('x.2' is 'count')
        3. [1].1: take the 'value' from the first item of the sorted array
        */
        arrayReverseSort(x -> x.2, groupArray((value, count)))[1].1 AS value
    FROM
    (
        SELECT
            pid,
            value,
            count() AS count
        FROM test_date
        GROUP BY
            pid,
            value
    )
    GROUP BY pid
    ORDER BY pid ASC
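
The same two-step idea (count each (pid, value) pair, then keep the pair with the highest count per pid) can also be written with argMax. That is not what the query above uses, but argMax is a standard ClickHouse aggregate function. A minimal sketch, assuming the same test_date table; the alias cnt is only illustrative:

    SELECT
        pid,
        -- argMax(value, cnt): the 'value' from the row with the largest 'cnt'
        argMax(value, cnt) AS value
    FROM
    (
        -- Step 1: one row per (pid, value) with its number of occurrences
        SELECT
            pid,
            value,
            count() AS cnt
        FROM test_date
        GROUP BY
            pid,
            value
    )
    -- Step 2: collapse to one row per pid
    GROUP BY pid
    ORDER BY pid ASC

If two values tie for the highest count within a pid, which of them is returned is not deterministic.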

Answer 1 (score: 0)

    SELECT
        pid,
        topK(1)(value) AS top_value
    FROM test_data
    GROUP BY pid
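
Note that topK(N)(column) returns an array rather than a single value, and the ClickHouse documentation describes its result as approximate. If a scalar is wanted, as in the expected output in the question, taking the first element of the array is one option. A sketch, assuming the same test_data table:

    SELECT
        pid,
        -- topK(1)(value) yields a one-element array; [1] extracts that element
        topK(1)(value)[1] AS top_value
    FROM test_data
    GROUP BY pid

When an exact answer matters, counting per (pid, value) in a subquery and then selecting the maximum is the safer route.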