Greatest n per group within a second group

Time: 2014-02-12 21:47:59

Tags: mysql greatest-n-per-group

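I have a database in which each entry is an edge with a source tag, a relationship, and a weight. Given a source tag, I want to run a query that returns the top n edges by weight for each relationship that uses that source tag.

For example, given the entries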

Id   Source   Relationship   End      Weight
-----------------------------------------
1    cat       isA           feline   56
2    cat       isA           animal   12
3    cat       isA           pet      37
4    cat       desires       food     5
5    cat       desires       play     88
6    dog       isA           canine   72

If I query with "cat" as the source and n = 2, the result should be

Id   Source   Relationship   End      Weight
-----------------------------------------
1    cat       isA           feline   56
3    cat       isA           pet      37
4    cat       desires       food     5
5    cat       desires       play     88

I have tried several different approaches based on other questions.

The most successful so far is based on "How to SELECT the newest four items per category?":

SELECT *
FROM tablename t1
JOIN tablename t2 ON (t1.relationship = t2.relationship)
LEFT OUTER JOIN tablename t3
  ON (t1.relationship = t3.relationship AND t2.weight < t3.weight)
WHERE t1.source = "cat"
  AND t3.relationship IS NULL
ORDER BY t2.weight DESC;

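However, this returns all edges with source = "cat" in sorted order. If I try adding a LIMIT, I get the edges with the top weights overall rather than the top edges per group.

Another thing I tried is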

SELECT *
FROM tablename t1
WHERE t1.source="cat"
AND (
     SELECT COUNT(*) 
     FROM tablename t2
     WHERE t1.relationship = t2.relationship 
     AND t1.weight <= t2.weight           
) <= 2;

which returns

Id   Source   Relationship   End      Weight
-----------------------------------------
1    cat       isA           feline   56
4    cat       desires       food     5
5    cat       desires       play     88

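Edge 3 is missing because edge 6 has a higher weight for the isA relationship and is counted by the subquery, even though edge 6 itself is excluded from the result because its source = "dog".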
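The counting behaviour described above can be reproduced outside MySQL. The following is a minimal sketch, assuming Python's bundled `sqlite3` module as a stand-in for MySQL and a table built from the sample rows (the quoted `"end"` column name and the extra `t2.source = t1.source` condition are assumptions for illustration, not part of the original query):

```python
import sqlite3

# Rebuild the sample table from the question in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tablename (id INTEGER PRIMARY KEY, source TEXT, '
    'relationship TEXT, "end" TEXT, weight INTEGER)'
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?, ?)",
    [
        (1, "cat", "isA", "feline", 56),
        (2, "cat", "isA", "animal", 12),
        (3, "cat", "isA", "pet", 37),
        (4, "cat", "desires", "food", 5),
        (5, "cat", "desires", "play", 88),
        (6, "dog", "isA", "canine", 72),
    ],
)

# The query as posted: the subquery counts rows from every source,
# so the dog edge (weight 72) pushes edge 3 down to third place.
posted = """
SELECT t1.id FROM tablename t1
WHERE t1.source = 'cat'
  AND (SELECT COUNT(*) FROM tablename t2
       WHERE t1.relationship = t2.relationship
         AND t1.weight <= t2.weight) <= 2
ORDER BY t1.id
"""

# Assumed variation: only count rows that share the outer row's source.
same_source = """
SELECT t1.id FROM tablename t1
WHERE t1.source = 'cat'
  AND (SELECT COUNT(*) FROM tablename t2
       WHERE t2.source = t1.source
         AND t1.relationship = t2.relationship
         AND t1.weight <= t2.weight) <= 2
ORDER BY t1.id
"""

posted_ids = [row[0] for row in conn.execute(posted)]
same_source_ids = [row[0] for row in conn.execute(same_source)]
print(posted_ids)       # [1, 4, 5]
print(same_source_ids)  # [1, 3, 4, 5]
```

With the source restricted inside the subquery, the count only ranks an edge against other "cat" edges, which matches the expected result for this sample.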

I am quite new to databases, so if I need to take a completely different approach, please let me know. I am not afraid to start over.

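2 Answers:

Answer 0: (score: 2)

Doing this with a correlated subquery is indeed inefficient, because MySQL has to run the subquery for every row of the outer query just to determine whether that row satisfies the condition. That is a lot of overhead.

Here is an approach that does not use a subquery: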

SELECT t1.*
FROM tablename t1
JOIN tablename t2 ON t1.source = t2.source and t1.relationship = t2.relationship
  AND t1.weight <= t2.weight
WHERE t1.source = 'cat' 
GROUP BY t1.id
HAVING COUNT(*) <= 2;
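
As a quick check of the self-join above, here is a minimal sketch that runs it against the question's sample rows. It assumes Python's bundled `sqlite3` module as a stand-in for MySQL (SQLite also accepts non-aggregated columns alongside `GROUP BY t1.id`), and the quoted `"end"` column name is an assumption:

```python
import sqlite3

# Sample rows from the question, loaded into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tablename (id INTEGER PRIMARY KEY, source TEXT, '
    'relationship TEXT, "end" TEXT, weight INTEGER)'
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?, ?)",
    [
        (1, "cat", "isA", "feline", 56),
        (2, "cat", "isA", "animal", 12),
        (3, "cat", "isA", "pet", 37),
        (4, "cat", "desires", "food", 5),
        (5, "cat", "desires", "play", 88),
        (6, "dog", "isA", "canine", 72),
    ],
)

# Each t1 row is joined to every row in the same source and relationship
# whose weight is at least as large, itself included. COUNT(*) per t1.id
# is therefore the row's rank within its group.
join_query = """
SELECT t1.id, t1.relationship, t1."end", t1.weight
FROM tablename t1
JOIN tablename t2
  ON t1.source = t2.source
 AND t1.relationship = t2.relationship
 AND t1.weight <= t2.weight
WHERE t1.source = 'cat'
GROUP BY t1.id
HAVING COUNT(*) <= 2
ORDER BY t1.id
"""

join_rows = conn.execute(join_query).fetchall()
for row in join_rows:
    print(row)
```

On the sample data this keeps edges 1, 3, 4 and 5, which is the result the question asks for.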

Here is an approach that uses neither a correlated subquery nor a join/GROUP BY, relying on MySQL user variables instead:

SELECT *
FROM (
    SELECT tablename.*, IF(@r = relationship, @n:=@n+1, @n:=1) AS _n, 
        @r:=relationship AS _r
    FROM (SELECT @r:=null, @n:=1) _init, tablename
    WHERE source = 'cat'
    ORDER BY relationship, weight DESC
) AS _t
WHERE _n <= 2;

If several rows share the same top weight, these solutions also need some kind of tie-breaker. But that is true of every solution.
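
To illustrate the tie problem, here is a minimal sketch using hypothetical data in which two "cat" isA edges share the top weight, with n = 1. It assumes Python's bundled `sqlite3` module as a stand-in for MySQL, and breaking ties by the lower `id` is just one possible rule, not something the answer prescribes:

```python
import sqlite3

# Hypothetical rows: edges 1 and 2 tie at weight 56 within cat/isA.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tablename (id INTEGER PRIMARY KEY, source TEXT, '
    'relationship TEXT, "end" TEXT, weight INTEGER)'
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?, ?)",
    [
        (1, "cat", "isA", "feline", 56),
        (2, "cat", "isA", "pet", 56),
        (3, "cat", "isA", "animal", 12),
        (4, "cat", "desires", "play", 88),
    ],
)

# Without a tie-breaker, each tied row counts the other one as well,
# so both get rank 2 and the isA group returns nothing for n = 1.
no_tiebreak = """
SELECT t1.id FROM tablename t1
JOIN tablename t2
  ON t1.source = t2.source
 AND t1.relationship = t2.relationship
 AND t1.weight <= t2.weight
WHERE t1.source = 'cat'
GROUP BY t1.id
HAVING COUNT(*) <= 1
ORDER BY t1.id
"""

# Assumed tie-breaker: among equal weights, the lower id ranks first.
with_tiebreak = """
SELECT t1.id FROM tablename t1
JOIN tablename t2
  ON t1.source = t2.source
 AND t1.relationship = t2.relationship
 AND (t2.weight > t1.weight
      OR (t2.weight = t1.weight AND t2.id <= t1.id))
WHERE t1.source = 'cat'
GROUP BY t1.id
HAVING COUNT(*) <= 1
ORDER BY t1.id
"""

no_tiebreak_ids = [row[0] for row in conn.execute(no_tiebreak)]
with_tiebreak_ids = [row[0] for row in conn.execute(with_tiebreak)]
print(no_tiebreak_ids)    # [4]
print(with_tiebreak_ids)  # [1, 4]
```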

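A simpler solution that needs no special gymnastics or tie-breakers would use SQL window functions such as ROW_NUMBER() OVER (PARTITION BY relationship), but MySQL does not support these (true when this answer was written; window functions were added later, in MySQL 8.0).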
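
For readers on an engine that does have window functions (MySQL 8.0 and later accept `ROW_NUMBER()`), the idea can be sketched as follows. This assumes Python's bundled `sqlite3` module linked against SQLite 3.25 or newer as a stand-in for MySQL; the `ranked` alias, the `rn` column, and the `id` tie-break in the `ORDER BY` are assumptions for illustration:

```python
import sqlite3

# Sample rows from the question, loaded into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tablename (id INTEGER PRIMARY KEY, source TEXT, '
    'relationship TEXT, "end" TEXT, weight INTEGER)'
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?, ?)",
    [
        (1, "cat", "isA", "feline", 56),
        (2, "cat", "isA", "animal", 12),
        (3, "cat", "isA", "pet", 37),
        (4, "cat", "desires", "food", 5),
        (5, "cat", "desires", "play", 88),
        (6, "dog", "isA", "canine", 72),
    ],
)

# Number the 'cat' rows within each relationship, heaviest first,
# then keep the first two of every partition.
window_query = """
SELECT id, relationship, "end", weight
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY relationship
               ORDER BY weight DESC, id
           ) AS rn
    FROM tablename t
    WHERE source = 'cat'
) AS ranked
WHERE rn <= 2
ORDER BY id
"""

window_rows = conn.execute(window_query).fetchall()
window_ids = [row[0] for row in window_rows]
print(window_ids)  # [1, 3, 4, 5]
```

Because `ROW_NUMBER()` always assigns distinct numbers within a partition, ties never drop or duplicate rows; the secondary `id` ordering only makes the choice among tied rows deterministic.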

Answer 1: (score: 0)

It won't be very efficient, but MySQL lets you do something like this:

SELECT t1.*
FROM
  tablename t1 INNER JOIN (
    SELECT SUBSTRING_INDEX(
             GROUP_CONCAT(Id ORDER BY Weight DESC),
             ',',
             2) top_2
    FROM tablename
    WHERE Source='cat'
    GROUP BY Relationship) t2
  ON FIND_IN_SET(t1.id, t2.top_2);

See the fiddle here.