How to find the best-matching group combinations in a PostgreSQL query

Time: 2014-09-07 04:22:47

Tags: postgresql postgresql-9.3

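I would like to see whether my problem can be solved with a more elegant query than the one I have. Maybe there is a window (analytic) function or something else I don't know about.

My three tables, tablea, tableb and relation, have the following structure: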

tablea               tableb                 relation
------------------   -------------------    ----------------
ida integer PK,      idb integer PK,        id_b integer FK
namea varchar(10)    nameb varchar(10)      id_a integer FK

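The idea is that tableb holds groups that contain people, and the relation table is just the m-n relationship between tablea and tableb.

So I have the following data (based on the n-m relation table):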

group1 ( namea1, namea2, namea3 );
group2 ( namea1, namea5, namea6 );
group3 ( namea1, namea6 );
group4 ( namea6, namea7 );
group5 ( namea7, namea8, namea9 );
group6 ( namea1, namea2, namea5, namea6 );
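
The structure and data described above can be reproduced with a short setup script. This is only a sketch: the numeric ids are assumed (they are not given in the question), and namea4 is assumed to exist in tablea without belonging to any group, since the user may only pick names that exist in tablea.

```sql
-- Hypothetical setup: the id values are assumptions, only the names come from the question.
CREATE TABLE tablea (
    ida   integer PRIMARY KEY,
    namea varchar(10)
);

CREATE TABLE tableb (
    idb   integer PRIMARY KEY,
    nameb varchar(10)
);

CREATE TABLE relation (
    id_b integer REFERENCES tableb (idb),
    id_a integer REFERENCES tablea (ida)
);

-- namea1 .. namea9 (namea4 is in tablea but in no group)
INSERT INTO tablea (ida, namea)
SELECT i, 'namea' || i::text FROM generate_series(1, 9) AS i;

-- group1 .. group6
INSERT INTO tableb (idb, nameb)
SELECT i, 'group' || i::text FROM generate_series(1, 6) AS i;

-- One row per (group, name) membership listed above.
INSERT INTO relation (id_b, id_a) VALUES
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 5), (2, 6),
    (3, 1), (3, 6),
    (4, 6), (4, 7),
    (5, 7), (5, 8), (5, 9),
    (6, 1), (6, 2), (6, 5), (6, 6);
```

With these rows loaded, the queries in this thread can be run as posted.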

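The user wants to create a new group and add names to it (he can only add names that exist in tablea). My goal is to show that user the most closely related groups, i.e. those that already contain the names he wants to add to the new group.

So, let's say the user is creating group7 and wants to add 'namea1', 'namea6' and 'namea4' to it. I would then show him the groups that already have this combination of names, or at least some of them. I came up with this query: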

select b.nameb, al.names, array_agg(a.namea), count(a.namea) qty
from
   tablea a 
      INNER JOIN relation g on a.ida = g.id_a
      INNER JOIN tableb b on b.idb = g.id_b
      INNER JOIN 
       (select b.idb, b.nameb, array_agg(a.namea) as names
          from tablea a 
               INNER JOIN relation g on a.ida = g.id_a
               INNER JOIN tableb b on b.idb = g.id_b
         group by b.idb, b.nameb) al ON g.id_b = al.idb
where a.namea in ( 'namea1', 'namea6', 'namea4' )
group by b.nameb, al.names
order by b.nameb;

Which gives me:

Group     All names in it                  Names he wants to add        Match count
-------------------------------------------------------------------------------------
group1    (namea1,namea2,namea3)           namea1                       1
group2    (namea1,namea5,namea6)           namea1,namea6                2
group3    (namea1,namea6)                  namea1,namea6                2
group4    (namea6,namea7)                  namea6                       1
group6    (namea1,namea2,namea5,namea6)    namea1,namea6                2

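Showing him this information helps him decide whether he really needs to create the new group, or whether he can just change an existing one by adding the names it does not have yet (in my example, group 2, 3 or 6).

I created a fiddle with all this data and my query. It is here: http://sqlfiddle.com/#!15/8a63b/3

My query does solve the problem. What I don't like about it is that I had to repeat the same query as a subquery in order to show all the names of each group alongside the names the user provided.

Thanks in advance.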

2 Answers:

Answer 0: (score: 1)

SQL Fiddle

select *, array_length("want to add", 1) as "occurrences"
from (
    select
        nameb,
        users_in_group,
        (
            select array_agg(namea)
            from unnest(users_in_group) a(namea)
            where namea = any (array['namea1', 'namea6', 'namea4']::varchar(10)[])

        ) as "want to add"
    from (
        select
            b.nameb, array_agg(a.namea) as users_in_group
        from
            tablea a 
            inner join
            relation g on a.ida = g.id_a
            inner join
            tableb b on b.idb = g.id_b
        group by b.nameb
    ) s
    where
        array['namea1', 'namea6', 'namea4']::varchar(10)[] && users_in_group
) s
order by nameb
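
The query above relies on two array features: the `&&` overlap operator, which is true when two arrays share at least one element, and `unnest` combined with `= any (...)` to keep only the matching members. A minimal standalone sketch, using illustrative literals rather than the real tables, shows each piece on its own:

```sql
-- Illustrative literals only; they stand in for users_in_group and the requested names.
SELECT
    ARRAY['namea1', 'namea6', 'namea4'] && ARRAY['namea6', 'namea7'] AS shares_a_name,   -- namea6 is in both
    ARRAY['namea1', 'namea6', 'namea4'] && ARRAY['namea7', 'namea8'] AS shares_nothing;  -- no common element

-- Keep only the members that are also in the requested list.
SELECT array_agg(member ORDER BY member) AS want_to_add
FROM unnest(ARRAY['namea1', 'namea5', 'namea6']) AS t(member)
WHERE member = ANY (ARRAY['namea1', 'namea6', 'namea4']);
```

The first statement returns true and false; the second returns {namea1,namea6}. The `ORDER BY` inside `array_agg` is an addition here to make the element order deterministic.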

Answer 1: (score: 1)

This should be fairly simple, SQLFIDDLE

SELECT
  tableb.nameb AS group_name,
  string_agg( tablea.namea, ', ') as users_in_group,
  string_agg( tablea2.namea, ', ') as want_to_add,
  count(tablea2.ida) as count
FROM
  relation
  JOIN tablea on ( ida = id_a )
  JOIN tableb ON ( idb = id_b )
  LEFT JOIN tablea as tablea2 ON ( tablea2.namea in ('namea1', 'namea6', 'namea4') 
                                  AND tablea2.ida = tablea.ida )
GROUP BY 
  tableb.idb,
  tableb.nameb
HAVING 
   count(tablea2.ida) > 0
ORDER BY 
   tableb.nameb

This should give you:

| GROUP_NAME |                 USERS_IN_GROUP |    WANT_TO_ADD | COUNT |
|------------|--------------------------------|----------------|-------|
|     group1 |         namea1, namea2, namea3 |         namea1 |     1 |
|     group2 |         namea1, namea5, namea6 | namea1, namea6 |     2 |
|     group3 |                 namea1, namea6 | namea1, namea6 |     2 |
|     group4 |                 namea6, namea7 |         namea6 |     1 |
|     group6 | namea1, namea2, namea5, namea6 | namea1, namea6 |     2 |
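
As a further variation (not taken from either answer), the matching names can also be collected in a single aggregation pass with a `CASE` expression, avoiding both the repeated subquery and the extra join. The sketch below is self-contained: the `membership` temp table is a hypothetical flattened stand-in for the three-table join, and `closest_groups` is an illustrative view name. It uses `array_remove`, which should be available in the 9.3 release the question is tagged with, and sorts by match count so the closest groups come first.

```sql
-- Hypothetical flattened (group, name) rows standing in for tablea/relation/tableb joined.
CREATE TEMP TABLE membership (nameb text, namea text);

INSERT INTO membership (nameb, namea) VALUES
    ('group1', 'namea1'), ('group1', 'namea2'), ('group1', 'namea3'),
    ('group2', 'namea1'), ('group2', 'namea5'), ('group2', 'namea6'),
    ('group3', 'namea1'), ('group3', 'namea6'),
    ('group4', 'namea6'), ('group4', 'namea7'),
    ('group5', 'namea7'), ('group5', 'namea8'), ('group5', 'namea9'),
    ('group6', 'namea1'), ('group6', 'namea2'), ('group6', 'namea5'), ('group6', 'namea6');

-- One pass over the rows: aggregate all names, and separately only the requested ones.
CREATE TEMP VIEW closest_groups AS
SELECT
    nameb,
    array_agg(namea ORDER BY namea) AS users_in_group,
    -- Non-matching rows yield NULL, which array_remove strips out afterwards.
    array_remove(
        array_agg(CASE WHEN namea IN ('namea1', 'namea6', 'namea4') THEN namea END
                  ORDER BY namea),
        NULL
    ) AS want_to_add,
    -- count() ignores NULLs, so only matching rows are counted.
    count(CASE WHEN namea IN ('namea1', 'namea6', 'namea4') THEN 1 END) AS qty
FROM membership
GROUP BY nameb
HAVING count(CASE WHEN namea IN ('namea1', 'namea6', 'namea4') THEN 1 END) > 0;

-- Best matches first.
SELECT * FROM closest_groups ORDER BY qty DESC, nameb;
```

On the sample data this lists group2, group3 and group6 (two matches each) ahead of group1 and group4 (one match each), and leaves out group5. Against the real schema, the `FROM membership` part would be replaced by the join of tablea, relation and tableb.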