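Specify conditions from outer query on a materialized subquery

Asked: 2016-07-13 04:12:32

Tags: mysql subquery views query-optimization

I have the query below, which references two views, 'goldedRunQueries' and 'currentGoldMarkings'. My problem seems to come from the view referenced in the subquery, currentGoldMarkings. During execution, MySQL first materializes this subquery and only then applies the WHERE conditions on 'queryCode' and 'runId'. Because the view refers to tables with millions of rows, execution takes more than an hour. My question is: how do I enforce those two conditions on the subquery before it is materialized?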

SELECT  goldedRunQueries.queryCode, goldedRunQueries.runId
    FROM  goldedRunQueries
    LEFT OUTER JOIN  
      ( SELECT  measuredRunId, queryCode, COUNT(resultId) as c
            FROM  currentGoldMarkings
            GROUP BY  measuredRunId, queryCode
      ) AS accuracy  ON accuracy.measuredRunId = goldedRunQueries.runId
      AND  accuracy.queryCode = goldedRunQueries.queryCode
    WHERE  goldedRunQueries.queryCode IN ('CH001', 'CH002', 'CH003')
      and  goldedRunQueries.runid = 5000
    ORDER BY  goldedRunQueries.runId DESC, goldedRunQueries.queryCode;

Here are the two views. Both are also used on their own, so it is not possible to build any of these clauses into them.

CREATE VIEW currentGoldMarkings
AS
SELECT  result.resultId, result.runId AS measuredRunId, result.documentId,
        result.queryCode, result.queryValue AS measuredValue,
        gold.queryValue AS goldValue,
        CASE result.queryValue WHEN gold.queryValue THEN 1 ELSE 0 END AS correct
    FROM  results AS result
    INNER JOIN  gold  ON gold.documentId = result.documentId
      AND  gold.queryCode = result.queryCode
    WHERE  gold.isCurrent = 1;

CREATE VIEW goldedRunQueries
AS
SELECT  runId, queryCode
    FROM  runQueries
    WHERE  EXISTS 
      ( SELECT  1 AS Expr1
            FROM  runs
            WHERE  (runId = runQueries.runId)
              AND  (isManual = 0)
      )
      AND  EXISTS 
      ( SELECT  1 AS Expr1
            FROM  results
            WHERE  (runId = runQueries.runId)
              AND  (queryCode = runQueries.queryCode)
              AND  EXISTS 
              ( SELECT  1 AS Expr1
                    FROM  gold
                    WHERE  (documentId = results.documentId)
                      AND  (queryCode = results.queryCode)
              )
      );

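Note: The query above is only part of my actual query. There are 3 more left outer joins similar in nature to the subquery above, which makes the problem far worse.

EDIT: As suggested, here are the table structures and some sample data.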

CREATE TABLE `results`(
`resultId` int auto_increment NOT NULL,
`runId` int NOT NULL,
`documentId` int NOT NULL,
`queryCode` char(5) NOT NULL,
`queryValue` char(1) NOT NULL,
`comment` varchar(255) NULL,
 CONSTRAINT `PK_results` PRIMARY KEY 
(
`resultId`
)
);


-- resultId is auto_increment, so it is left out of the column list
INSERT INTO results (runId, documentId, queryCode, queryValue, comment) VALUES
  (100, 242300, 'AC001', 'I', NULL),
  (100, 242300, 'AC001', 'S', NULL),
  (150, 242301, 'AC005', 'I', 'abc'),
  (100, 242300, 'AC001', 'I', NULL),
  (109, 242301, 'PQ001', 'S', 'zzz'),
  (400, 242400, 'DD006', 'I', NULL);



CREATE TABLE `gold`(
`goldId` int auto_increment NOT NULL,
`runDate` datetime NOT NULL,
`documentId` int NOT NULL,
`queryCode` char(5) NOT NULL,
`queryValue` char(1) NOT NULL,
`comment` varchar(255) NULL,
`isCurrent` tinyint(1) NOT NULL DEFAULT 0,
CONSTRAINT `PK_gold` PRIMARY KEY 
(
`goldId`
)
);



-- goldId is auto_increment, so it is left out of the column list
INSERT INTO gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) VALUES
  ('2015-02-20 00:00:00', 138904, 'CH001', 'N', NULL, 1),
  ('2015-05-20 00:00:00', 138904, 'CH001', 'N', 'aaa', 1),
  ('2016-02-20 00:00:00', 138905, 'CH002', 'N', NULL, 0),
  ('2015-12-12 00:00:00', 138804, 'CH001', 'N', 'zzzz', 1);



CREATE TABLE `runQueries`(
`runId` int NOT NULL,
`queryCode` char(5) NOT NULL,
CONSTRAINT `PK_runQueries` PRIMARY KEY 
(
`runId`,
`queryCode`
)
);


INSERT INTO runQueries (runId, queryCode) VALUES
  (100, 'AC001'),
  (109, 'PQ001'),
  (400, 'DD006');



CREATE TABLE `runs`(
`runId` int auto_increment NOT NULL,
`runName` varchar(63) NOT NULL,
`isManual` tinyint(1) NOT NULL,
`runDate` datetime NOT NULL,
`comment` varchar(1023) NULL,
`folderName` varchar(63) NULL,
`documentSetId` int NOT NULL,
`pipelineVersion` varchar(50) NULL,
`isArchived` tinyint(1) NOT NULL DEFAULT 0,
`pipeline` varchar(50) NULL,
CONSTRAINT `PK_runs` PRIMARY KEY 
(
`runId`
)
);


-- runId is auto_increment, so it is left out of the column list
INSERT INTO runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) VALUES
  ('test1', 0, '2015-08-04 06:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL),
  ('test2', 1, '2015-12-04 12:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL),
  ('test3', 1, '2015-06-24 10:56:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL),
  ('test4', 1, '2016-05-04 11:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);

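1 Answer:

Answer 0: (score: 1)

First, let's try to improve performance with indexes:

results:  INDEX(runId, queryCode)  -- in either order
gold:  INDEX(documentId, queryCode, isCurrent)  -- in that order

After that, update the CREATE TABLEs in the question and add the output of: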
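As a minimal sketch, the two suggested indexes could be created with statements like the following. The index names idx_results_run_query and idx_gold_doc_query_current are hypothetical and chosen only for illustration; the column names are taken from the CREATE TABLE statements in the question.

```sql
-- Hypothetical index names; columns as defined in the question's tables.
-- For results, the column order may be swapped.
ALTER TABLE results
  ADD INDEX idx_results_run_query (runId, queryCode);

-- For gold, keep the columns in this order.
ALTER TABLE gold
  ADD INDEX idx_gold_doc_query_current (documentId, queryCode, isCurrent);
```

On tables with millions of rows, building these indexes can take a while, so it may be worth trying them on a copy of the data first.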

EXPLAIN EXTENDED SELECT ...;
SHOW WARNINGS;

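What version are you running? You effectively have FROM ( SELECT ... ) JOIN ( SELECT ... ). Before 5.6, neither subquery would have an index; with 5.6, an index is generated on the fly.

It is a shame that the query is built this way, since you know which run to use: and goldedRunQueries.runid = 5000

Bottom line: add the indexes; upgrade to 5.6 or 5.7; if that is not enough, then rethink the use of VIEWs.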
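To answer the version question, the server can be asked directly. The second statement is an optional extra, assuming a version recent enough to expose optimizer_switch; it lists the optimizer flags that are currently enabled.

```sql
-- Report the server version, e.g. to see whether it is 5.6 or later.
SELECT VERSION();

-- Optional: list the optimizer flags currently in effect.
SHOW VARIABLES LIKE 'optimizer_switch';
```

On 5.6 and later, an index generated on the fly for a derived table typically appears in the EXPLAIN output with a key name such as <auto_key0>.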
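One way to act on this, sketched under the assumption that the filter values are known when the query is written, is to repeat the runId and queryCode conditions inside the derived table so that they apply before the GROUP BY result is materialized. The views themselves stay unchanged. Whether MySQL then pushes the conditions down into the base tables of currentGoldMarkings depends on the version and on how the view is processed, so the plan should be checked with EXPLAIN.

```sql
-- Sketch: the outer filters are duplicated inside the derived table.
SELECT  goldedRunQueries.queryCode, goldedRunQueries.runId
    FROM  goldedRunQueries
    LEFT OUTER JOIN
      ( SELECT  measuredRunId, queryCode, COUNT(resultId) AS c
            FROM  currentGoldMarkings
            WHERE  measuredRunId = 5000
              AND  queryCode IN ('CH001', 'CH002', 'CH003')
            GROUP BY  measuredRunId, queryCode
      ) AS accuracy  ON accuracy.measuredRunId = goldedRunQueries.runId
      AND  accuracy.queryCode = goldedRunQueries.queryCode
    WHERE  goldedRunQueries.queryCode IN ('CH001', 'CH002', 'CH003')
      AND  goldedRunQueries.runId = 5000
    ORDER BY  goldedRunQueries.runId DESC, goldedRunQueries.queryCode;
```

The result should match the original query: the join is on both runId and queryCode, so rows of the derived table outside the filtered values could never match an outer row anyway.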