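Filtering rows by date in a FULL OUTER JOIN query -> some results are missing

Time: 2011-10-27 09:00:31

Tags: mysql where-clause full-outer-join

Background

I have two tables in MySQL that contain different types of feedback items. I have built a query that combines these tables with a FULL OUTER JOIN (which in MySQL is actually written as two joins and a union) and calculates some average grades. This query seems to work perfectly: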

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  LEFT JOIN feedback_service AS s USING(name)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  RIGHT JOIN feedback_service AS s USING(name)
  WHERE l.id IS NULL
  GROUP BY name)
ORDER BY name;

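(This is somewhat simplified for readability, but that makes no difference here.)

Problem

Next, I tried to add filtering by date, i.e. to consider only feedback items created after a certain date. With my SQL skills and the research I did, I was able to come up with this: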

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  LEFT JOIN feedback_service AS s USING(name)
  WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  RIGHT JOIN feedback_service AS s USING(name)
  WHERE l.id IS NULL
    AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
  GROUP BY name)
ORDER BY name;

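This almost works: the results I do get are correct. However, some feedback items are missing. For example, with the date set to one month ago, I counted feedback for 21 different people in the database, but this query returns only 19 people. Worst of all, I cannot find anything the missing items have in common.

Am I doing something wrong in this query? I think the WHERE clause does the date filtering after the JOIN, and ideally I would probably do it before. Then again, I don't know whether that is what causes my problem, nor do I know how to write this query differently.

2 answers:

Answer 0: (score: 2)

I accepted Johan's answer because he explained these things to me well, and the answer is also useful in a more general sense. However, I thought I would also post the first solution I arrived at. It uses subqueries: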

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM (
    SELECT * FROM feedback_linguistic WHERE createdTime >= '" & date & "'
  ) AS l
  LEFT JOIN (
    SELECT * FROM feedback_service WHERE createdTime >= '" & date & "'
  ) AS s USING(name)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM (
    SELECT * FROM feedback_linguistic WHERE createdTime >= '" & date & "'
  ) AS l
  RIGHT JOIN (
    SELECT * FROM feedback_service WHERE createdTime >= '" & date & "'
  ) AS s USING(name)
  WHERE l.id IS NULL
  GROUP BY name)
ORDER BY name;

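The results of this query are correct. However, the solution does not really look optimal, since in my experience subqueries are sometimes slow. Then again, I have not done any performance analysis, so maybe the subqueries are not a bottleneck here. In any case, it runs fast enough in my application.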
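One way to check whether the subqueries actually cost anything is to ask MySQL for the execution plan. A minimal sketch for one branch of the union, assuming a fixed example date in place of the interpolated one (the literal '2011-09-27' is only a placeholder):

```sql
-- Show the execution plan for the first branch of the union.
-- The date literal is a placeholder for the interpolated value.
EXPLAIN
SELECT name, AVG(l.overallQuality) AS avgLingQual,
  AVG(s.overallSatisfaction) AS avgSvcQual
FROM (
  SELECT * FROM feedback_linguistic WHERE createdTime >= '2011-09-27'
) AS l
LEFT JOIN (
  SELECT * FROM feedback_service WHERE createdTime >= '2011-09-27'
) AS s USING(name)
GROUP BY name;
```

Rows with select_type DERIVED in the output typically mean the subquery is handled as a separate derived table, which is the case worth measuring on larger tables.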

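Answer 1: (score: 1)

A full outer join is a combination of 3 joins:

1- an inner join between A and B
2- a left exclusion join between A and B
3- a right exclusion join between A and B

Note that the combination of an inner join and a left exclusion join is a left outer join, so you would normally rewrite the query as left outer join + right exclusion join. For debugging purposes, however, it is useful to union all 3 joins and add a marker showing which join produced each row: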

  /*inner join*/
  (SELECT
     'inner' as join_type 
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  INNER JOIN feedback_service s ON (l.name = s.name) 
  WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) 
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) 
  GROUP BY l.name) 
UNION ALL
  (SELECT
     'left exclusion' as join_type 
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  LEFT JOIN feedback_service s ON (l.name = s.name) 
  WHERE s.id IS NULL
    /*AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) */
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) 
  GROUP BY l.name) 
UNION ALL
  (SELECT 
     'right exclusion' as join_type
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual 
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  RIGHT JOIN feedback_service s ON (s.name = l.name) 
  WHERE l.id IS NULL
    AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) 
    /*AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) */
  GROUP BY s.name) 
ORDER BY listname; 
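The most likely reason people go missing can be reproduced with a small, made-up data set. This is only a sketch: the rows, the names 'alice' and 'bob', and the @since variable are hypothetical, and the tables contain just the columns the queries use.

```sql
-- Hypothetical sample data; only the columns used by the queries are included.
CREATE TABLE feedback_linguistic (
  id INT PRIMARY KEY, name VARCHAR(50),
  overallQuality INT, createdTime DATETIME);
CREATE TABLE feedback_service (
  id INT PRIMARY KEY, name VARCHAR(50),
  overallSatisfaction INT, createdTime DATETIME);

INSERT INTO feedback_linguistic VALUES
  (1, 'alice', 4, '2011-10-20 00:00:00'),  -- recent
  (2, 'bob',   5, '2011-10-21 00:00:00');  -- recent
INSERT INTO feedback_service VALUES
  (1, 'alice', 3, '2011-10-22 00:00:00'),  -- recent
  (2, 'bob',   2, '2011-08-01 00:00:00');  -- older than the cutoff

SET @since = '2011-10-01 00:00:00';  -- hypothetical cutoff date

-- Date filter in WHERE: bob's only joined row carries
-- s.createdTime = 2011-08-01, which is neither >= @since nor NULL,
-- so the whole row is discarded, including his recent linguistic feedback.
SELECT l.name, AVG(l.overallQuality) AS avgLingQual,
       AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
LEFT JOIN feedback_service AS s ON (l.name = s.name)
WHERE (s.createdTime >= @since OR s.createdTime IS NULL)
  AND l.createdTime >= @since
GROUP BY l.name;
-- returns a row for alice only
```

The right exclusion branch does not bring bob back either, because he does have a row in feedback_linguistic and so fails the l.id IS NULL test. The OR ... IS NULL condition only helps people who have no row at all in the other table, not people whose rows there are all too old.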
  

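"I think the WHERE clause does the date filtering after the JOIN, and ideally I would probably do it before."

If you want the filtering to happen first, put the condition in the join clause (the ON condition) instead of in WHERE.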
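A sketch of what that could look like for the query in the question. Note the asymmetry: in an outer join, a condition in ON only restricts the optional side, so the filter for the preserved table still belongs in WHERE. The variable @since is a hypothetical stand-in for the interpolated date.

```sql
SET @since = '2011-10-01 00:00:00';  -- hypothetical cutoff date

  /* everyone with recent linguistic feedback,
     plus whatever recent service feedback they have */
  (SELECT l.name AS listname
     , AVG(l.overallQuality) AS avgLingQual
     , AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic l
  LEFT JOIN feedback_service s
    ON (s.name = l.name
        AND s.createdTime >= @since)   /* filters s before joining */
  WHERE l.createdTime >= @since        /* filters the preserved side */
  GROUP BY l.name)
UNION ALL
  /* people with recent service feedback
     but no recent linguistic feedback */
  (SELECT s.name AS listname
     , AVG(l.overallQuality) AS avgLingQual
     , AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic l
  RIGHT JOIN feedback_service s
    ON (s.name = l.name
        AND l.createdTime >= @since)   /* filters l before joining */
  WHERE l.id IS NULL
    AND s.createdTime >= @since        /* filters the preserved side */
  GROUP BY s.name)
ORDER BY listname;
```

With the conditions placed this way, a person whose rows in one table are all older than the cutoff is still returned, with NULL as the average for that table, which should match the result of the subquery version in the other answer.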