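SQLite query runs 10 times slower than the MS Access query

Time: 2010-01-06 23:25:04

Tags: sqlite query-optimization

I have an 800 MB MS Access database that I migrated to SQLite. The structure of the database is as follows (the SQLite database is about 330 MB after migration):

The Occurrence table has 1,600,000 records. The table looks like: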

CREATE TABLE Occurrence 
(
SimulationID  INTEGER,    SimRunID   INTEGER,    OccurrenceID   INTEGER,
OccurrenceTypeID    INTEGER,    Period    INTEGER,    HasSucceeded    BOOL, 
PRIMARY KEY (SimulationID,  SimRunID,   OccurrenceID)
)

It has the following indexes:

CREATE INDEX "Occurrence_HasSucceeded_idx" ON "Occurrence" ("HasSucceeded" ASC)

CREATE INDEX "Occurrence_OccurrenceID_idx" ON "Occurrence" ("OccurrenceID" ASC)

CREATE INDEX "Occurrence_SimRunID_idx" ON "Occurrence" ("SimRunID" ASC)

CREATE INDEX "Occurrence_SimulationID_idx" ON "Occurrence" ("SimulationID" ASC)

The OccurrenceParticipant table has 3,400,000 records. The table looks like:

CREATE TABLE OccurrenceParticipant 
(
SimulationID    INTEGER,     SimRunID    INTEGER,    OccurrenceID     INTEGER,
RoleTypeID     INTEGER,     ParticipantID    INTEGER
)

It has the following indexes:

CREATE INDEX "OccurrenceParticipant_OccurrenceID_idx" ON "OccurrenceParticipant" ("OccurrenceID" ASC)

CREATE INDEX "OccurrenceParticipant_ParticipantID_idx" ON "OccurrenceParticipant" ("ParticipantID" ASC)

CREATE INDEX "OccurrenceParticipant_RoleType_idx" ON "OccurrenceParticipant" ("RoleTypeID" ASC)

CREATE INDEX "OccurrenceParticipant_SimRunID_idx" ON "OccurrenceParticipant" ("SimRunID" ASC)

CREATE INDEX "OccurrenceParticipant_SimulationID_idx" ON "OccurrenceParticipant" ("SimulationID" ASC)

The InitialParticipant table has 130 records. The structure of the table is:

CREATE TABLE InitialParticipant 
(
ParticipantID    INTEGER  PRIMARY KEY,     ParticipantTypeID    INTEGER,
ParticipantGroupID     INTEGER
)

The table has the following indexes:

CREATE INDEX "initialpart_participantTypeID_idx" ON "InitialParticipant" ("ParticipantGroupID" ASC)

CREATE INDEX "initialpart_ParticipantID_idx" ON "InitialParticipant" ("ParticipantID" ASC)

The ParticipantGroup table has 22 records. It looks like:

CREATE TABLE ParticipantGroup   (
ParticipantGroupID    INTEGER,    ParticipantGroupTypeID     INTEGER,
Description    varchar (50),      PRIMARY KEY(  ParticipantGroupID  )
)

The table has the following index:

CREATE INDEX "ParticipantGroup_ParticipantGroupID_idx" ON "ParticipantGroup" ("ParticipantGroupID" ASC)

The tmpSimArgs table has 18 records. It has the following structure:

CREATE TABLE tmpSimArgs (SimulationID varchar, SimRunID int(10))

And the following indexes:

CREATE INDEX tmpSimArgs_SimRunID_idx ON tmpSimArgs(SimRunID ASC)

CREATE INDEX tmpSimArgs_SimulationID_idx ON tmpSimArgs(SimulationID ASC)

The tmpPartArgs table has 80 records. It has the following structure:

CREATE TABLE tmpPartArgs(participantID INT)

And the following index:

CREATE INDEX tmpPartArgs_participantID_idx ON tmpPartArgs(participantID ASC)

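I have a query involving multiple INNER JOINs. The problem I am facing is that the Access version of the query takes about one second, whereas the SQLite version of the same query takes 10 seconds (about 10 times slower!). Migrating back to Access is not possible for me; SQLite is my only option.

I am new to writing database queries, so these queries may look stupid. Please point out anything you see that is faulty or naive.

The query in Access is (the entire query takes 1 second to execute):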

SELECT ParticipantGroup.Description, Occurrence.SimulationID, Occurrence.SimRunID, Occurrence.Period, Count(OccurrenceParticipant.ParticipantID) AS CountOfParticipantID FROM 
( 
   ParticipantGroup INNER JOIN InitialParticipant ON ParticipantGroup.ParticipantGroupID =  InitialParticipant.ParticipantGroupID
) INNER JOIN 
(
tmpPartArgs INNER JOIN 
  (
     (
        tmpSimArgs INNER JOIN Occurrence ON (tmpSimArgs.SimRunID = Occurrence.SimRunID)   AND (tmpSimArgs.SimulationID = Occurrence.SimulationID)
     ) INNER JOIN OccurrenceParticipant ON (Occurrence.OccurrenceID =    OccurrenceParticipant.OccurrenceID) AND (Occurrence.SimRunID = OccurrenceParticipant.SimRunID) AND (Occurrence.SimulationID = OccurrenceParticipant.SimulationID)
  ) ON tmpPartArgs.participantID = OccurrenceParticipant.ParticipantID
) ON InitialParticipant.ParticipantID = OccurrenceParticipant.ParticipantID WHERE (((OccurrenceParticipant.RoleTypeID)=52 Or (OccurrenceParticipant.RoleTypeID)=49)) AND Occurrence.HasSucceeded = True GROUP BY ParticipantGroup.Description, Occurrence.SimulationID, Occurrence.SimRunID, Occurrence.Period;

The SQLite query is as follows (this query takes around 10 seconds):

SELECT ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period, Count(ij2.occpParticipantID) AS CountOfParticipantID FROM 
(
   SELECT ip.ParticipantGroupID AS ipParticipantGroupID, ip.ParticipantID AS ipParticipantID, ip.ParticipantTypeID, pg.ParticipantGroupID AS pgParticipantGroupID, pg.ParticipantGroupTypeID, pg.Description FROM ParticipantGroup as pg INNER JOIN InitialParticipant AS ip ON pg.ParticipantGroupID = ip.ParticipantGroupID
) AS ij1 INNER JOIN 
(
   SELECT tpa.participantID AS tpaParticipantID, ij3.* FROM tmpPartArgs AS tpa INNER JOIN 
     (
       SELECT ij4.*, occp.SimulationID as occpSimulationID, occp.SimRunID AS occpSimRunID, occp.OccurrenceID AS occpOccurrenceID, occp.ParticipantID AS occpParticipantID, occp.RoleTypeID FROM 
          (
              SELECT tsa.SimulationID AS tsaSimulationID, tsa.SimRunID AS tsaSimRunID, occ.SimulationID AS occSimulationID, occ.SimRunID AS occSimRunID, occ.OccurrenceID AS occOccurrenceID, occ.OccurrenceTypeID, occ.Period, occ.HasSucceeded FROM tmpSimArgs AS tsa INNER JOIN Occurrence AS occ ON (tsa.SimRunID = occ.SimRunID) AND (tsa.SimulationID = occ.SimulationID)
          ) AS ij4 INNER JOIN OccurrenceParticipant AS occp ON (occOccurrenceID =      occpOccurrenceID) AND (occSimRunID = occpSimRunID) AND (occSimulationID = occpSimulationID)
    ) AS ij3 ON tpa.participantID = ij3.occpParticipantID
) AS ij2 ON ij1.ipParticipantID = ij2.occpParticipantID WHERE (((ij2.RoleTypeID)=52 Or (ij2.RoleTypeID)=49)) AND ij2.HasSucceeded = 1 GROUP BY ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period;   

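I don't know what I am doing wrong here. I have all the indexes, but I suspect I am missing some key index that would do the trick for me. The interesting thing is that before migrating, my "research" on SQLite suggested that SQLite is faster, smaller, and better than Access in every respect. But in terms of querying, I cannot seem to get SQLite to run faster than Access. I reiterate that I am new to SQLite and obviously do not have much experience with it, so if any learned soul can help me out with this, I would much appreciate it.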
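One way to test the suspicion about a missing key index is to ask SQLite how it intends to execute a lookup, using EXPLAIN QUERY PLAN. The sketch below uses Python's built-in sqlite3 module on an empty in-memory copy of the OccurrenceParticipant table. The composite index OccurrenceParticipant_key_idx is a hypothetical addition, not part of the schema above; it is there only to show what to look for in the plan output when the three join columns are matched together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OccurrenceParticipant (
    SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
    RoleTypeID INTEGER, ParticipantID INTEGER
);
-- Hypothetical composite index (an assumption, not in the original schema)
-- over the three columns that the query joins on together.
CREATE INDEX OccurrenceParticipant_key_idx
    ON OccurrenceParticipant (SimulationID, SimRunID, OccurrenceID);
""")

# EXPLAIN QUERY PLAN returns one row per plan step; the last column of
# each row is a human-readable description of that step.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT ParticipantID
    FROM OccurrenceParticipant
    WHERE SimulationID = ? AND SimRunID = ? AND OccurrenceID = ?
""", (1, 1, 1)).fetchall()

plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)
```

Running the same statement with the full 10-second query in place of the small SELECT shows, step by step, which tables are scanned and which are searched through an index. The exact wording of the plan text varies between SQLite versions.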

3 Answers:

Answer 0: (score: 2)

I have reformatted your code (using my home-brew SQL formatter) to hopefully make it easier for others to read.

Reformatted query:

SELECT
    ij1.Description,
    ij2.occSimulationID,
    ij2.occSimRunID,
    ij2.Period,
    Count(ij2.occpParticipantID) AS CountOfParticipantID

FROM (

    SELECT
        ip.ParticipantGroupID AS ipParticipantGroupID,
        ip.ParticipantID AS ipParticipantID,
        ip.ParticipantTypeID,
        pg.ParticipantGroupID AS pgParticipantGroupID,
        pg.ParticipantGroupTypeID,
        pg.Description

    FROM ParticipantGroup AS pg

    INNER JOIN InitialParticipant AS ip
            ON pg.ParticipantGroupID = ip.ParticipantGroupID

) AS ij1

INNER JOIN (

    SELECT
        tpa.participantID AS tpaParticipantID,
        ij3.*

    FROM tmpPartArgs AS tpa

    INNER JOIN (

        SELECT
            ij4.*,
            occp.SimulationID AS occpSimulationID,
            occp.SimRunID AS occpSimRunID,
            occp.OccurrenceID AS occpOccurrenceID,
            occp.ParticipantID AS occpParticipantID,
            occp.RoleTypeID

        FROM (

            SELECT
                tsa.SimulationID AS tsaSimulationID,
                tsa.SimRunID AS tsaSimRunID,
                occ.SimulationID AS occSimulationID,
                occ.SimRunID AS occSimRunID,
                occ.OccurrenceID AS occOccurrenceID,
                occ.OccurrenceTypeID,
                occ.Period,
                occ.HasSucceeded

            FROM tmpSimArgs AS tsa

            INNER JOIN Occurrence AS occ
                    ON (tsa.SimRunID = occ.SimRunID)
                   AND (tsa.SimulationID = occ.SimulationID)

        ) AS ij4

        INNER JOIN OccurrenceParticipant AS occp
                ON (occOccurrenceID = occpOccurrenceID)
               AND (occSimRunID = occpSimRunID)
               AND (occSimulationID = occpSimulationID)

    ) AS ij3
      ON tpa.participantID = ij3.occpParticipantID

) AS ij2
  ON ij1.ipParticipantID = ij2.occpParticipantID

WHERE (

    (

        (ij2.RoleTypeID) = 52
        OR
        (ij2.RoleTypeID) = 49

    )

)
  AND ij2.HasSucceeded = 1

GROUP BY
    ij1.Description,
    ij2.occSimulationID,
    ij2.occSimRunID,
    ij2.Period;

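As per JohnFx (above), I was confused by the derived views. I don't think they are actually needed, especially since they are all inner joins. So, below I have attempted to reduce the complexity. Please review and test for performance. I had to do a cross join with tmpSimArgs since it is only joined to Occurrence; I assume this is the desired behavior.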

SELECT
    pg.Description,
    occ.SimulationID,
    occ.SimRunID,
    occ.Period,
    COUNT(occp.ParticipantID) AS CountOfParticipantID

FROM ParticipantGroup AS pg

INNER JOIN InitialParticipant AS ip
        ON pg.ParticipantGroupID = ip.ParticipantGroupID

CROSS JOIN tmpSimArgs AS tsa

INNER JOIN Occurrence AS occ
        ON tsa.SimRunID = occ.SimRunID
       AND tsa.SimulationID = occ.SimulationID

INNER JOIN OccurrenceParticipant AS occp
        ON occ.OccurrenceID = occp.OccurrenceID
       AND occ.SimRunID = occp.SimRunID
       AND occ.SimulationID = occp.SimulationID
       AND occp.ParticipantID = ip.ParticipantID  -- same link as the ij1/ij2 join in the nested query

INNER JOIN tmpPartArgs AS tpa
        ON tpa.participantID = occp.ParticipantID

WHERE occ.HasSucceeded = 1
  AND (occp.RoleTypeID = 52 OR occp.RoleTypeID = 49 )

GROUP BY
    pg.Description,
    occ.SimulationID,
    occ.SimRunID,
    occ.Period;
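A quick way to check that a flattened query of this shape still returns what the nested one does is to run both against a few made-up rows. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The sample rows are invented, the nested query is condensed to the columns it actually needs, and the flattened query is assumed to also join occp.ParticipantID to ip.ParticipantID, as the ij1/ij2 join does in the nested form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Occurrence (SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
    OccurrenceTypeID INTEGER, Period INTEGER, HasSucceeded BOOL,
    PRIMARY KEY (SimulationID, SimRunID, OccurrenceID));
CREATE TABLE OccurrenceParticipant (SimulationID INTEGER, SimRunID INTEGER,
    OccurrenceID INTEGER, RoleTypeID INTEGER, ParticipantID INTEGER);
CREATE TABLE InitialParticipant (ParticipantID INTEGER PRIMARY KEY,
    ParticipantTypeID INTEGER, ParticipantGroupID INTEGER);
CREATE TABLE ParticipantGroup (ParticipantGroupID INTEGER,
    ParticipantGroupTypeID INTEGER, Description varchar(50),
    PRIMARY KEY (ParticipantGroupID));
CREATE TABLE tmpSimArgs (SimulationID varchar, SimRunID int(10));
CREATE TABLE tmpPartArgs (participantID INT);

-- Invented sample rows, only to exercise the joins and filters.
INSERT INTO ParticipantGroup VALUES (1, 1, 'Agents'), (2, 1, 'Items');
INSERT INTO InitialParticipant VALUES (10, 1, 1), (11, 1, 1), (20, 2, 2);
INSERT INTO tmpSimArgs VALUES (1, 1), (1, 2);
INSERT INTO tmpPartArgs VALUES (10), (11), (20);
INSERT INTO Occurrence VALUES (1, 1, 100, 1, 5, 1), (1, 1, 101, 1, 5, 0),
                              (1, 2, 100, 1, 7, 1), (1, 3, 100, 1, 9, 1);
INSERT INTO OccurrenceParticipant VALUES
    (1, 1, 100, 52, 10), (1, 1, 100, 49, 11), (1, 1, 100, 7, 20),
    (1, 1, 101, 52, 10), (1, 2, 100, 49, 20), (1, 3, 100, 52, 10);
""")

# Condensed form of the nested query: same joins, fewer pass-through columns.
nested = """
SELECT ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period,
       COUNT(ij2.occpParticipantID) AS CountOfParticipantID
FROM (
    SELECT ip.ParticipantID AS ipParticipantID, pg.Description
    FROM ParticipantGroup AS pg
    INNER JOIN InitialParticipant AS ip
            ON pg.ParticipantGroupID = ip.ParticipantGroupID
) AS ij1
INNER JOIN (
    SELECT ij3.*
    FROM tmpPartArgs AS tpa
    INNER JOIN (
        SELECT ij4.*, occp.ParticipantID AS occpParticipantID, occp.RoleTypeID
        FROM (
            SELECT occ.SimulationID AS occSimulationID, occ.SimRunID AS occSimRunID,
                   occ.OccurrenceID AS occOccurrenceID, occ.Period, occ.HasSucceeded
            FROM tmpSimArgs AS tsa
            INNER JOIN Occurrence AS occ
                    ON tsa.SimRunID = occ.SimRunID
                   AND tsa.SimulationID = occ.SimulationID
        ) AS ij4
        INNER JOIN OccurrenceParticipant AS occp
                ON ij4.occOccurrenceID = occp.OccurrenceID
               AND ij4.occSimRunID = occp.SimRunID
               AND ij4.occSimulationID = occp.SimulationID
    ) AS ij3 ON tpa.participantID = ij3.occpParticipantID
) AS ij2 ON ij1.ipParticipantID = ij2.occpParticipantID
WHERE (ij2.RoleTypeID = 52 OR ij2.RoleTypeID = 49) AND ij2.HasSucceeded = 1
GROUP BY ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period
"""

# Flattened form; the last ON condition is the assumed ip/occp link.
flat = """
SELECT pg.Description, occ.SimulationID, occ.SimRunID, occ.Period,
       COUNT(occp.ParticipantID) AS CountOfParticipantID
FROM ParticipantGroup AS pg
INNER JOIN InitialParticipant AS ip
        ON pg.ParticipantGroupID = ip.ParticipantGroupID
CROSS JOIN tmpSimArgs AS tsa
INNER JOIN Occurrence AS occ
        ON tsa.SimRunID = occ.SimRunID
       AND tsa.SimulationID = occ.SimulationID
INNER JOIN OccurrenceParticipant AS occp
        ON occ.OccurrenceID = occp.OccurrenceID
       AND occ.SimRunID = occp.SimRunID
       AND occ.SimulationID = occp.SimulationID
       AND occp.ParticipantID = ip.ParticipantID
INNER JOIN tmpPartArgs AS tpa
        ON tpa.participantID = occp.ParticipantID
WHERE occ.HasSucceeded = 1
  AND (occp.RoleTypeID = 52 OR occp.RoleTypeID = 49)
GROUP BY pg.Description, occ.SimulationID, occ.SimRunID, occ.Period
"""

# GROUP BY does not promise an output order, so sort before comparing.
nested_rows = sorted(conn.execute(nested).fetchall())
flat_rows = sorted(conn.execute(flat).fetchall())
print(nested_rows == flat_rows, flat_rows)
```

Matching results on toy data say nothing about speed. SQLite does not reorder tables across a CROSS JOIN, so that spelling also constrains the join order the planner may choose; timing both forms on the real database is the only reliable comparison.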

Answer 1: (score: 0)

I have provided a smaller, scaled-down version of my query. Hopefully this is clearer and more legible than my earlier one.

SELECT5 * FROM 
(
SELECT4 * FROM ParticipantGroup as pg INNER JOIN InitialParticipant AS ip ON pg.ParticipantGroupID = ip.ParticipantGroupID
) AS ij1 INNER JOIN 
(
   SELECT3 * FROM tmpPartArgs AS tpa INNER JOIN 
      (
          SELECT2 * FROM 
              (
                  SELECT1 * FROM tmpSimArgs AS tsa INNER JOIN Occurrence AS occ ON (tsa.SimRunID = occ.SimRunID) AND (tsa.SimulationID = occ.SimulationID)
              ) AS ij4 INNER JOIN OccurrenceParticipant AS occp ON (occOccurrenceID =      occpOccurrenceID) AND (occSimRunID = occpSimRunID) AND (occSimulationID = occpSimulationID)
      ) AS ij3 ON tpa.participantID = ij3.occpParticipantID
) AS ij2 ON ij1.ipParticipantID = ij2.occpParticipantID WHERE (((ij2.RoleTypeID)=52 Or (ij2.RoleTypeID)=49)) AND ij2.HasSucceeded = 1

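The application I am working on is a simulation application, and to understand the context of the above query I think a brief explanation of the application is necessary. Let us assume there is a planet with some initial resources and living agents. The planet is allowed to exist for 1000 years, and the actions performed by the agents are monitored and stored in the database. After 1000 years the planet is destroyed and re-created with the same set of initial resources and living agents as the first time. This creation and destruction is repeated 18 times, and all the actions the agents perform during those 1000 years are stored in the database.

Our entire experiment thus consists of 18 re-creations and is termed a 'Simulation'. Each of the 18 times the planet is re-created is termed a run, and each of the 1000 years of a run is termed a period. So a 'Simulation' consists of 18 runs, and each run consists of 1000 periods. At the start of each run, we assign the 'Simulation' an initial set of knowledge items and dynamic agents that interact with each other and with the items. Knowledge items are stored by agents inside a knowledge store. The knowledge store is also considered a participating entity in our simulation, but this concept (regarding the knowledge store) is not important. I have tried to detail each SELECT statement and the tables involved.

SELECT1: I think this query could be replaced by just the table 'Occurrence', as it does nothing much. The Occurrence table stores the different actions taken by agents in each period of each simulation run of a particular 'Simulation'. Normally each 'Simulation' consists of 18 runs, and each run consists of 1000 periods. Agents are allowed to take actions in every period of every run of a 'Simulation'. However, the Occurrence table does not store any details about the agents that perform the actions. The Occurrence table may store data related to multiple 'Simulations'.

SELECT2: This query simply returns the details of the actions performed in each period of each run of a 'Simulation', along with the details of all the participants of the 'Simulation', such as their respective ParticipantID. The OccurrenceParticipant table stores a record for every participating entity of a Simulation, including agents, knowledge stores, knowledge items, etc.

SELECT3: This query returns only those records of the pseudo-table ij3 that are due to agents and knowledge items. All records in ij3 concerning knowledge items will be filtered out.

SELECT4: This query attaches the 'Description' field to every record of 'InitialParticipant'. Note that the 'Description' column is an output column of the entire query. The InitialParticipant table contains a record for every agent and every knowledge item initially assigned to the 'Simulation'.

SELECT5: This final query returns all records from the pseudo-table ij2 for which the RoleType of the participating entity (which may be an agent or a knowledge item) is either 49 or 52.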

Answer 2: (score: 0)

I would suggest moving the ij2.RoleTypeID filter from the outermost query into ij3, using IN instead of OR, and moving the HasSucceeded filter into ij4.
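As a rough illustration of that suggestion, the sketch below places the HasSucceeded test inside ij4 and the RoleTypeID test inside ij3, written with IN. It runs under Python's built-in sqlite3 module against an in-memory database; the sample rows are invented and the subqueries are condensed to the columns they need, so it shows the shape of the rewrite rather than its effect on timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Occurrence (SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
    OccurrenceTypeID INTEGER, Period INTEGER, HasSucceeded BOOL,
    PRIMARY KEY (SimulationID, SimRunID, OccurrenceID));
CREATE TABLE OccurrenceParticipant (SimulationID INTEGER, SimRunID INTEGER,
    OccurrenceID INTEGER, RoleTypeID INTEGER, ParticipantID INTEGER);
CREATE TABLE InitialParticipant (ParticipantID INTEGER PRIMARY KEY,
    ParticipantTypeID INTEGER, ParticipantGroupID INTEGER);
CREATE TABLE ParticipantGroup (ParticipantGroupID INTEGER,
    ParticipantGroupTypeID INTEGER, Description varchar(50),
    PRIMARY KEY (ParticipantGroupID));
CREATE TABLE tmpSimArgs (SimulationID varchar, SimRunID int(10));
CREATE TABLE tmpPartArgs (participantID INT);

-- Invented sample rows, only to exercise the joins and filters.
INSERT INTO ParticipantGroup VALUES (1, 1, 'Agents'), (2, 1, 'Items');
INSERT INTO InitialParticipant VALUES (10, 1, 1), (11, 1, 1), (20, 2, 2);
INSERT INTO tmpSimArgs VALUES (1, 1), (1, 2);
INSERT INTO tmpPartArgs VALUES (10), (11), (20);
INSERT INTO Occurrence VALUES (1, 1, 100, 1, 5, 1), (1, 1, 101, 1, 5, 0),
                              (1, 2, 100, 1, 7, 1), (1, 3, 100, 1, 9, 1);
INSERT INTO OccurrenceParticipant VALUES
    (1, 1, 100, 52, 10), (1, 1, 100, 49, 11), (1, 1, 100, 7, 20),
    (1, 1, 101, 52, 10), (1, 2, 100, 49, 20), (1, 3, 100, 52, 10);
""")

pushed_down = """
SELECT ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period,
       COUNT(ij2.occpParticipantID) AS CountOfParticipantID
FROM (
    SELECT ip.ParticipantID AS ipParticipantID, pg.Description
    FROM ParticipantGroup AS pg
    INNER JOIN InitialParticipant AS ip
            ON pg.ParticipantGroupID = ip.ParticipantGroupID
) AS ij1
INNER JOIN (
    SELECT ij3.*
    FROM tmpPartArgs AS tpa
    INNER JOIN (
        SELECT ij4.*, occp.ParticipantID AS occpParticipantID
        FROM (
            SELECT occ.SimulationID AS occSimulationID, occ.SimRunID AS occSimRunID,
                   occ.OccurrenceID AS occOccurrenceID, occ.Period
            FROM tmpSimArgs AS tsa
            INNER JOIN Occurrence AS occ
                    ON tsa.SimRunID = occ.SimRunID
                   AND tsa.SimulationID = occ.SimulationID
            WHERE occ.HasSucceeded = 1           -- filter moved into ij4
        ) AS ij4
        INNER JOIN OccurrenceParticipant AS occp
                ON ij4.occOccurrenceID = occp.OccurrenceID
               AND ij4.occSimRunID = occp.SimRunID
               AND ij4.occSimulationID = occp.SimulationID
        WHERE occp.RoleTypeID IN (52, 49)        -- filter moved into ij3, IN instead of OR
    ) AS ij3 ON tpa.participantID = ij3.occpParticipantID
) AS ij2 ON ij1.ipParticipantID = ij2.occpParticipantID
GROUP BY ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period
"""

# GROUP BY does not promise an output order, so sort for a stable result.
rows = sorted(conn.execute(pushed_down).fetchall())
print(rows)
```

Whether this helps depends on the query planner, which may already apply such filters early on its own, so the rewritten query should be timed on the real data before drawing conclusions.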