Help optimizing this query that uses a huge IN clause

Asked: 2009-11-20 14:52:45

Tags: sql-server optimization

I have an INSERT whose condition uses a NOT IN check. The NOT IN subquery returns about 230k rows.

INSERT INTO Validate.ItemError (ItemId, ErrorId, DateCreated) 
(
    SELECT ItemId, 10, GetUTCDate() 
    FROM Validate.Item 
    INNER JOIN Refresh.Company 
    ON Validate.Item.IMCompanyId = Refresh.Company.IMCompanyId 
    WHERE Refresh.Company.CompanyId = 14 
    AND 
    (
        IMAccountId IS NULL OR NOT IMAccountId IN
        (
            SELECT RA.IMAccountId 
            FROM Refresh.Account RA 
            INNER JOIN Refresh.BalancePool BP 
            ON RA.BalancePoolId = BP.BalancePoolId 
            WHERE BP.CompanyId = 14
        )
    )
)

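When I run it as is, it takes upwards of 30 minutes (yikes!). The number of rows in the Validate.Item table can be anywhere from 150 to over 200k, so you can see how this could be a pain.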

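All the relevant fields in these tables are indexed, and none of the indexes are excessive.

My first thought was to split it into chunks and put it in a WHILE loop: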

DECLARE @StartId int, @EndId int, @MaxId int

SELECT @MaxId = MAX(AccountId) FROM Refresh.Account
SET @StartId = 1
SET @EndId = 1000

WHILE (@StartId < @MaxId)
BEGIN
    INSERT INTO Validate.ItemError (ItemId, ErrorId, DateCreated) 
    (
        SELECT ItemId, 10, GetUTCDate() 
        FROM Validate.Item 
        INNER JOIN Refresh.Company 
        ON Validate.Item.IMCompanyId = Refresh.Company.IMCompanyId 
        WHERE Refresh.Company.CompanyId = 14 
        AND 
        (
            IMAccountId IS NULL  
            OR NOT IMAccountId IN  
            (
                SELECT RA.IMAccountId 
                FROM Refresh.Account RA 
                INNER JOIN Refresh.BalancePool BP 
                ON RA.BalancePoolId = BP.BalancePoolId 
                WHERE BP.CompanyId = 14 
                AND RA.AccountId BETWEEN @StartId AND @EndId
            )
        )
    )
    SET @StartId = @StartId + 1000
    SET @EndId = @EndId + 1000
END

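Doing it this way takes about one minute per loop; multiply that by 230 iterations and we have an even more ridiculous number.

Please tell me you have a better idea for optimizing this. Without this one query the whole process takes only 8 seconds; it is just the sheer size of the Refresh.Account table that throws everything out of whack.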

TIA!

武神

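4 Answers:

Answer 0 (score: 2)

Get rid of the OR condition.

It adds a full scan and prevents the optimizer from using the ANTI JOIN it would otherwise use.

This query returns the same rows: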

SELECT  ItemId, 10, GetUTCDate() 
FROM    Validate.Item 
INNER JOIN
        Refresh.Company 
ON      Validate.Item.IMCompanyId = Refresh.Company.IMCompanyId 
WHERE   Refresh.Company.CompanyId = 14 
        AND NOT EXISTS
        (
        SELECT  RA.IMAccountId 
        FROM    Refresh.Account RA 
        INNER JOIN
                Refresh.BalancePool BP 
        ON      RA.BalancePoolId = BP.BalancePoolId 
        WHERE   BP.CompanyId = 14
                AND RA.IMAccountId = Validate.Item.IMAccountId
        )
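
To see why the separate IS NULL branch can go, here is a minimal sketch on made-up table variables (@Item and @Account are hypothetical stand-ins, not the real schema). It assumes the account list itself holds no NULLs; a NULL in that list would make NOT IN reject every non-NULL item.

```sql
-- Hypothetical stand-ins for Validate.Item and the filtered Refresh.Account rows.
-- Multi-row VALUES needs SQL Server 2008 or later.
DECLARE @Item TABLE (ItemId int NOT NULL, IMAccountId int NULL);
DECLARE @Account TABLE (IMAccountId int NOT NULL);

INSERT INTO @Item (ItemId, IMAccountId) VALUES (1, 100), (2, 999), (3, NULL);
INSERT INTO @Account (IMAccountId) VALUES (100), (200);

-- Original shape: explicit NULL check plus NOT IN.
SELECT I.ItemId
FROM @Item I
WHERE I.IMAccountId IS NULL
   OR I.IMAccountId NOT IN (SELECT A.IMAccountId FROM @Account A);

-- Rewritten shape: a NULL IMAccountId never equals anything,
-- so the correlated subquery is empty and NOT EXISTS is already true.
SELECT I.ItemId
FROM @Item I
WHERE NOT EXISTS
      (
      SELECT 1
      FROM @Account A
      WHERE A.IMAccountId = I.IMAccountId
      );
```

Both statements return items 2 and 3: item 2 has an unknown account, and item 3 has no account at all.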

Answer 1 (score: 1)

Use NOT EXISTS instead:

...OR  NOT EXISTS (SELECT 1 FROM 
Refresh.Account RA INNER JOIN Refresh.BalancePool BP 
ON RA.BalancePoolId = BP.BalancePoolId WHERE BP.CompanyId = 14 AND RA.IMAccountId = xxx.IMAccountId)))

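The subquery after EXISTS only needs to find the first record that satisfies the condition. (Remember to replace xxx with the correct table alias.)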
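
Because EXISTS only asks whether at least one row comes back, the select list inside it does not change the answer. A small sketch, using a hypothetical @Account table variable rather than the real tables:

```sql
-- Hypothetical sample data; multi-row VALUES needs SQL Server 2008 or later.
DECLARE @Account TABLE (IMAccountId int NOT NULL);
INSERT INTO @Account (IMAccountId) VALUES (100), (200);

-- All three columns come back as 1: EXISTS tests for the presence of a row,
-- so SELECT 1, SELECT * and SELECT TOP 1 behave the same here.
SELECT
    CASE WHEN EXISTS (SELECT 1 FROM @Account WHERE IMAccountId = 100)
         THEN 1 ELSE 0 END AS WithConstant,
    CASE WHEN EXISTS (SELECT * FROM @Account WHERE IMAccountId = 100)
         THEN 1 ELSE 0 END AS WithStar,
    CASE WHEN EXISTS (SELECT TOP 1 IMAccountId FROM @Account WHERE IMAccountId = 100)
         THEN 1 ELSE 0 END AS WithTop1;
```

So writing SELECT 1 is just a convention, and adding TOP 1 inside an EXISTS subquery is harmless but does not change the result.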

Answer 2 (score: 1)

Couldn't you just LEFT JOIN to the related tables and check for a NULL key instead of using NOT IN? Not sure the query is 100% correct:

INSERT INTO Validate.ItemError (ItemId, ErrorId, DateCreated) 
SELECT ItemId, 10, GetUTCDate() 
FROM Validate.Item 
INNER JOIN Refresh.Company ON Validate.Item.IMCompanyId = Refresh.Company.IMCompanyId 
-- Nested join: only accounts in a company-14 balance pool count as a match
LEFT JOIN Refresh.Account RA
  INNER JOIN Refresh.BalancePool BP ON BP.BalancePoolId = RA.BalancePoolId AND BP.CompanyId = 14
ON RA.IMAccountId = Validate.Item.IMAccountId 
WHERE Refresh.Company.CompanyId = 14 
-- Parentheses keep the OR from bypassing the CompanyId filter
AND (Validate.Item.IMAccountId IS NULL OR RA.IMAccountId IS NULL)
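
One detail matters when writing the anti-join this way: a filter on the joined table belongs in the ON clause, not in WHERE, or the unmatched rows get filtered out again. A minimal sketch with hypothetical table variables (@Item and @Account are made up for illustration, and CompanyId is folded into @Account to keep it short):

```sql
-- Hypothetical sample data; multi-row VALUES needs SQL Server 2008 or later.
DECLARE @Item TABLE (ItemId int NOT NULL, IMAccountId int NULL);
DECLARE @Account TABLE (IMAccountId int NOT NULL, CompanyId int NOT NULL);

INSERT INTO @Item (ItemId, IMAccountId)
VALUES (1, 100), (2, 999), (3, NULL), (4, 300);
INSERT INTO @Account (IMAccountId, CompanyId)
VALUES (100, 14), (300, 15);

-- Anti-join: keep only the items that found no company-14 account.
SELECT I.ItemId
FROM @Item I
LEFT JOIN @Account A
       ON A.IMAccountId = I.IMAccountId
      AND A.CompanyId = 14          -- filter in ON, so non-matches survive
WHERE A.IMAccountId IS NULL;
```

This returns items 2, 3 and 4. Item 4 is included because its account belongs to company 15, and item 3 because a NULL IMAccountId never matches, so no separate IS NULL check on the item is strictly needed.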

Answer 3 (score: 0)

Would using NOT EXISTS help here?

(SELECT ItemId, 10, GetUTCDate() 
FROM Validate.Item INNER JOIN Refresh.Company ON 
Validate.Item.IMCompanyId = Refresh.Company.IMCompanyId 
WHERE Refresh.Company.CompanyId = 14 
AND (IMAccountId IS NULL  OR  NOT EXISTS (SELECT TOP 1 RA.IMAccountId FROM 
Refresh.Account RA INNER JOIN Refresh.BalancePool BP 
ON RA.BalancePoolId = BP.BalancePoolId WHERE BP.CompanyId = 14 AND 
RA.IMAccountId = Validate.Item.IMAccountId)))

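I'm not sure whether the query is correct.

However, I am using NOT EXISTS with TOP 1 in the subquery. In addition, the subquery limits the records through the extra condition AND RA.IMAccountId = Validate.Item.IMAccountId.

EDIT: I hope you can see what I am trying to do. Instead of checking against every row in Refresh.Account, I restrict the rows and try to find at least one matching row with the same IMAccountId, which, according to your original query (using NOT IN ...), should not exist.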