PostgreSQL 在同一列上找到对的常见组合

时间:2021-03-25 07:34:30

标签: postgresql group-by

我需要什么

查找平台 2 提供的更多平台对 平台 1 提供的系列的 50% 以上。返回平台对(重命名为平台 1, platform2) 以及它们共有的系列的比例(numcommon)

我的尝试:这里我有多个系列在不同平台上播出。我只是添加了其中的几个。例如:

完整数据集:

|seriesid|platform       |
|--------|---------------|
|4       |Netflix        |
|6       |Netflix        |
|8       |Netflix        |
|13      |Netflix        |
|13      |TNT            |
|13      |Free Services  |
|16      |Hulu           |
|16      |Netflix        |
|16      |Free Services  |
|20      |Netflix        |

预期输出:

通过使用@Julius

的查询

当前查询

select 
  sp1.platform as platform1, sp2.platform as platform2, count(*) as numcommon 
from 
  seriesonplatform AS sp1
  JOIN seriesonplatform AS sp2 ON (sp1.seriesid = sp2.seriesid) AND (sp1.platform <>  sp2.platform)
group by
  sp1.platform,
  sp2.platform
order by sp1.platform, sp2.platform,numcommon

电流输出

    |platform1|platform2      |numcommon|
|---------|---------------|---------|
|A&E      |Free Services  |1        |
|A&E      |Prime Video    |1        |
|AcornTV  |CBS All Access |1        |
|AcornTV  |Free Services  |1        |
|AcornTV  |Hoopla         |2        |
|AcornTV  |Hulu           |1        |
|AcornTV  |Netflix        |6        |
|AcornTV  |Prime Video    |2        |
|Adult Swim|Free Services  |1        |
|Adult Swim|HBO MAX        |1        |
|BET      |BET+           |1        |
|BET      |Free Services  |1        |
|BET+     |BET            |1        |
|BET+     |Free Services  |1        |
|Bravo    |Free Services  |1        |
|Bravo    |Netflix        |2        |
|BritBox  |Hoopla         |2        |
|CBS All Access|AcornTV        |1        |
|CBS All Access|Free Services  |6        |
|CBS All Access|Hulu           |1        |
|CBS All Access|Netflix        |7        |
|CBS All Access|fuboTV         |2        |
|Cartoon Network|Free Services  |2        |
|Cartoon Network|HBO MAX        |2        |
|Cartoon Network|Hulu           |1        |
|Cartoon Network|Netflix        |1        |
|Comedy Central|Hoopla         |1        |
|Comedy Central|Netflix        |1        |
|Crunchyroll|Free Services  |3        |

预期产出

    platform1    |   platform2   | numcommon 
-----------------+---------------+-----------
 AcornTV         | Netflix       |      0.60
 Adult Swim      | Free Services |      1.00
 Adult Swim      | HBO MAX       |      1.00
 BET             | BET+          |      1.00
 BET             | Free Services |      1.00
 BET+            | BET           |      1.00
 BET+            | Free Services |      1.00
 Bravo           | Netflix       |      1.00
 Cartoon Network | Free Services |      0.67
 Cartoon Network | HBO MAX       |      0.67
 CBS All Access  | Netflix       |      0.58
 Crunchyroll     | Free Services |      1.00
 Disney          | Disney+       |      1.00
 Disney          | Free Services |      1.00
 fuboTV          | Netflix       |      0.60
 Funimation      | Free Services |      0.67
 HBO             | HBO MAX       |      1.00
 History         | Hulu          |      0.67
 History         | Netflix       |      0.67
 IFC             | fuboTV        |      0.67
 IFC             | Netflix       |      0.67
 IndieFlix       | Free Services |      1.00
 IndieFlix       | Hoopla        |      1.00
 Lifetime        | Netflix       |      0.67
 Showtime        | fuboTV        |      1.00
 Showtime        | Netflix       |      0.67
 Sundance        | Free Services |      1.00
 Sundance        | Prime Video   |      1.00

1 个答案:

答案 0 :(得分:1)

我猜你需要一个连接来查找所有平台对:

select 
  sp1.platform as platform1, sp2.platform as platform2, count(*) as numcommon 
from 
  seriesonplatform AS sp1
  JOIN seriesonplatform AS sp2 ON (sp1.seriesid = sp2.seriesid) AND (sp1.platform <>  sp2.platform)
where 
  sp1.platform in ('BET','BET+')
  AND sp2.platform in ('BET','BET+')
group by
  sp1.platform,
  sp2.platform

编辑 2

使用上面的查询,您可以找到常见的序列号。剩下的就是找到百分比:

SELECT 
  platform1.platform, common.platform2, numcommon * 1.0 / platform1.seriescount AS show_on_platform2
FROM (
  SELECT 
    platform, count(*) AS seriescount
  FROM 
    seriesonplatform
  GROUP BY
    platform 
) platform1
LEFT JOIN (
  SELECT 
    sp1.platform as platform1, sp2.platform as platform2, count(*) as numcommon 
  FROM 
    seriesonplatform AS sp1
    JOIN seriesonplatform AS sp2 ON (sp1.seriesid = sp2.seriesid) AND (sp1.platform <>  sp2.platform)
  GROUP BY
    sp1.platform,
    sp2.platform
) AS common ON platform1.platform = common.platform1
WHERE 
  numcommon * 1.0 / platform1.seriescount > 0.5
ORDER BY
  platform1.platform, platform2
相关问题