Finding statistics on data in a PostgreSQL table: unique count and highest frequency for each column

Posted: 2018-10-01 20:41:47

Tags: sql postgresql

I need to find out a few things about every column of a table, and I would like to do it in a single query.

Suppose we have a table with columns A, B and C:

A     B      C
--------------------
Red   Red    Red
Red   Blue   Red
Blue  Green  Red
Blue  Green  Red

I want output that tells me how many unique values each of A, B and C has, as separate columns. So it would give:

2, 3, 1
  • 2 unique values for A (Red and Blue)
  • 3 unique values for B (Red, Blue and Green)
  • 1 unique value for C (Red)

Is there any way to get this in a single call?

In addition, I would like to get the frequency of the most common value in each column:

2, 2, 4
  • 2, because there are 2 Reds (or 2 Blues; the count is the same),
  • 2, because there are 2 Greens,
  • 4, because there are 4 Reds

This can be in the same query or in a separate one.

I don't want to run a separate query for each column, because in theory there could be many columns.

Is there an efficient way to do this?

1 answer:

Answer 0 (score: 5)

Number of unique values per column, using aggregate functions with DISTINCT:

select
  count(distinct a) as cnt_a,
  count(distinct b) as cnt_b,
  count(distinct c) as cnt_c
from yourtable

Returns:

2,3,1
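Since the question mentions that there could be many columns, the select list above does not have to be typed by hand. A minimal sketch, assuming the table is public.yourtable (the schema and table name are placeholders), builds the text of the count(distinct ...) query from information_schema.columns:

```sql
-- Sketch: generate "select count(distinct col) as cnt_col, ... from public.yourtable"
-- for every column of the table, in column order.
-- Assumes the table lives in schema 'public' and is named 'yourtable'.
select 'select '
       || string_agg(
            -- %I quotes identifiers safely (mixed case, spaces, keywords)
            format('count(distinct %I) as %I', column_name, 'cnt_' || column_name),
            ', ' order by ordinal_position)
       || ' from public.yourtable'
from information_schema.columns
where table_schema = 'public'
  and table_name   = 'yourtable';
```

The result is a single text value containing the query. In psql, ending the statement with \gexec instead of a semicolon executes the generated query directly; from other clients, the text can be run as a second statement.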

Frequency of the most common value, using window functions together with aggregate functions:

select 
  max(cnt_a) as fr_a,
  max(cnt_b) as fr_b,
  max(cnt_c) as fr_c
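from (
  -- For each row, count how many rows share its value in each column;
  -- the outer max() then picks the largest such count per column.
  select
    count(*) over (partition by a) as cnt_a,
    count(*) over (partition by b) as cnt_b,
    count(*) over (partition by c) as cnt_c
  from yourtable
) t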

Returns:

2,2,4

Both combined into one result with UNION ALL:

select
  'unique values' as description,
  count(distinct a) as cnt_a,
  count(distinct b) as cnt_b,
  count(distinct c) as cnt_c
from yourtable
union all
select
  'freq of most common value',
  max(cnt_a),
  max(cnt_b),
  max(cnt_c)
from (
  select
    count(*) over (partition by a) as cnt_a,
    count(*) over (partition by b) as cnt_b,
    count(*) over (partition by c) as cnt_c
  from yourtable
) t

Returns:

        description        | cnt_a | cnt_b | cnt_c
---------------------------+-------+-------+-------
 unique values             |     2 |     3 |     1
 freq of most common value |     2 |     2 |     4
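If listing the columns at all is impractical, one alternative (not part of the answer above) is to unpivot each row into (column name, value) pairs with to_jsonb and jsonb_each_text, and then compute both statistics with a plain GROUP BY. This is a sketch assuming yourtable is the table from the question; it returns one row per column instead of one column per column, and all values are compared as text:

```sql
-- Sketch: unique count and frequency of the most common value for every column,
-- without naming the columns. Each row becomes a jsonb object, and
-- jsonb_each_text() turns it into (key, value) pairs, with values as text.
with cells as (
  select kv.key as column_name, kv.value
  from yourtable t
  cross join lateral jsonb_each_text(to_jsonb(t)) as kv
),
value_counts as (
  -- one row per (column, distinct value); NULLs form their own group
  select column_name, value, count(*) as freq
  from cells
  group by column_name, value
)
select
  column_name,
  count(value) as unique_values,       -- ignores the NULL group, like count(distinct col)
  max(freq)    as freq_of_most_common  -- like max(count(*) over (partition by col))
from value_counts
group by column_name
order by column_name;
```

For the sample data this should give one row each for a, b and c, with 2/2, 3/2 and 1/4 respectively. Building a jsonb object for every row likely makes this slower than the explicit query on large tables, so it is mainly useful when the column list is long or not known in advance.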