SQL Update with a rolling self-join

Time: 2013-06-26 22:33:32

Tags: mysql sql-update self-join

I would like to add annual growth rates to a table of annual industry sales data, created as follows (essential fields only):

CREATE TABLE IF NOT EXISTS MarketSizes (
  marketSizeID INT PRIMARY KEY AUTO_INCREMENT,
  industry INT NOT NULL,
  year INT NOT NULL,
  countryID INT NOT NULL REFERENCES Countries (countryID),
  annualSales DEC(20,2) NULL,
  growthRate DEC(5,2) NULL
);

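With annual data covering 25 years, 100+ countries, and 5,000+ industries, what is the most efficient way to populate/update the growthRate column? And is (industry, year, countryID) the most efficient index? Thanks for your time!

2 Answers:

Answer 0: (score: 1)

Disclaimer: this is untested and came out of curiosity and a bit of playing around. Judge for yourself whether you want to use it rather than go the "safer" route. Comments are welcome, and if anyone wants to play with it some more, here is the sqlfiddle I used. The rest is off the top of my head, and it's late, so please don't downvote for any mistakes.

Okay, out of curiosity I found a (hacky) way to speed up an update. I haven't tested it beyond this small test: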

    create table foo(id int, newid int);
    insert into foo (id) values (1), (2), (3);

    update foo, (select @prev:=0) vars
    set foo.newid = @prev,
    foo.id = if(@prev := id, id, id);

    select * from foo

    | ID | NEWID |
    --------------
    |  1 |     0 |
    |  2 |     1 |
    |  3 |     2 |

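I have had good experiences with SELECT statements that use information from the previous row. With user variables there is no need to self-join the table (in a SELECT). In an UPDATE, however, you cannot update the table you are reading from at the same time, so you would normally need a dummy table. That is the reason I came up with this answer. So here goes:

Your update statement would be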

SET @prev = 1; /* placeholder "previous sales" value, used for a row that has no previous year (or whose countryID or industry differs from the previous row) */
SET @prevCountry = (SELECT countryID FROM MarketSizes ORDER BY countryID, industry, `year`, marketSizeID LIMIT 1);
SET @prevIndustry = (SELECT industry FROM MarketSizes ORDER BY countryID, industry, `year`, marketSizeID LIMIT 1);

/* It is important to initialize the variables beforehand, not on the fly as in the example above. Otherwise MySQL complains about a syntax error, because it does not support an ORDER BY clause in a multi-table UPDATE statement. The ORDER BY matters here: rows must arrive grouped by country and industry and, within each group, in year order, so that the "previous row" really is the previous year of the same series. */

UPDATE MarketSizes
SET
/* This plays the role of a "where" clause: if the country or industry changed, reset @prev to the placeholder before the growth rate is computed. */
marketSizeID = IF(@prevCountry != countryID OR @prevIndustry != industry, IF(@prev := 1, marketSizeID, marketSizeID), marketSizeID),

growthRate = (annualSales - @prev) / @prev, /* here @prev holds the annualSales of the previous row, or the placeholder */

marketSizeID = IF(@prev := annualSales, marketSizeID, marketSizeID), /* here the value of the current row is assigned to @prev */

/* Why the update on marketSizeID, and why IF(this, then, else)? That is the trick. Every other way I tried to assign a new value to @prev resulted in a syntax error. Both branches return marketSizeID, so the column keeps its value; the IF() exists only for the assignment inside it. I chose the primary key simply because it is there. It does not matter which column is used, and picking a column without an index (the primary key of course has one) might give another performance boost. */

marketSizeID = IF(@prevCountry := countryID, marketSizeID, marketSizeID),
marketSizeID = IF(@prevIndustry := industry, marketSizeID, marketSizeID)

ORDER BY countryID, industry, `year`, marketSizeID;
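
The idea behind the statement above, a single ordered pass that carries the previous row's values along, may be easier to check outside the database first. The following is only a minimal Python sketch of that logic, not the answer's SQL: the function name `growth_rates` and the sample tuples are made up for illustration, rows without a usable previous year get `None` rather than a placeholder value, and the check that the previous row is exactly one year earlier is an extra assumption that the SQL above does not make.

```python
def growth_rates(rows):
    """Compute year-over-year growth in one ordered pass.

    rows: iterable of (countryID, industry, year, annualSales) tuples.
    Returns a dict {(countryID, industry, year): growth rate or None}.
    """
    result = {}
    prev_key = None     # (countryID, industry) of the previous row
    prev_year = None    # year of the previous row
    prev_sales = None   # annualSales of the previous row

    # Sort so that each country/industry series is contiguous and in year order.
    for country, industry, year, sales in sorted(rows, key=lambda r: (r[0], r[1], r[2])):
        key = (country, industry)
        # Assumption: only the immediately preceding year counts as "previous".
        same_series = key == prev_key and prev_year == year - 1
        if same_series and sales is not None and prev_sales not in (None, 0):
            result[(country, industry, year)] = (sales - prev_sales) / prev_sales
        else:
            # First year of a series, a gap in the years, or unusable sales data.
            result[(country, industry, year)] = None
        # Carry the current row forward, like the user variables do.
        prev_key, prev_year, prev_sales = key, year, sales
    return result


# Hypothetical sample data, deliberately out of order.
sample = [
    (1, 10, 2011, 110.0),
    (1, 10, 2010, 100.0),
    (1, 10, 2012, 99.0),
    (2, 10, 2010, 50.0),
    (2, 10, 2012, 75.0),  # 2011 is missing, so no growth rate for 2012
]
rates = growth_rates(sample)
```

Comparing a handful of rows computed this way against the table after running the UPDATE is a cheap sanity check, which seems worthwhile given that the statement relies on MySQL evaluating the SET assignments from left to right.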

Answer 1: (score: 1)

Consider putting growthRate in a VIEW instead:

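CREATE VIEW growthRate AS
SELECT
/* columns are listed explicitly: m1.* would clash with the computed growthRate if the table still has a growthRate column */
m1.marketSizeID,
m1.industry,
m1.year,
m1.countryID,
m1.annualSales,
(m1.annualSales - m2.annualSales) / m2.annualSales AS growthRate
FROM
MarketSizes m1
LEFT JOIN MarketSizes m2 ON m1.industry = m2.industry
                         AND m1.countryID = m2.countryID
                         AND m2.year = m1.year - 1;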

Create indexes on (industry, countryID) and on year, and it should perform well enough.
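
To try the view approach without a MySQL server, here is a small sketch using Python's built-in `sqlite3` module. Several things are assumptions made for illustration: SQLite stands in for MySQL, `annualSales` is declared `REAL` (instead of `DEC(20,2)`) so that SQLite does not fall back to integer division, the index name `idx_series` is made up, a single composite index on (industry, countryID, year) is one possible reading of the indexing advice, and the sample rows are invented. The view only needs the columns shown, so the stored growthRate column is left out of the table here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MarketSizes (
  marketSizeID INTEGER PRIMARY KEY AUTOINCREMENT,
  industry INT NOT NULL,
  year INT NOT NULL,
  countryID INT NOT NULL,
  annualSales REAL NULL  -- DEC(20,2) in the MySQL schema
);

-- Hypothetical composite index covering the self-join columns.
CREATE INDEX idx_series ON MarketSizes (industry, countryID, year);

CREATE VIEW growthRate AS
SELECT m1.marketSizeID, m1.industry, m1.year, m1.countryID, m1.annualSales,
       (m1.annualSales - m2.annualSales) / m2.annualSales AS growthRate
FROM MarketSizes m1
LEFT JOIN MarketSizes m2
       ON m1.industry = m2.industry
      AND m1.countryID = m2.countryID
      AND m2.year = m1.year - 1;
""")

# Invented sample data: (industry, year, countryID, annualSales).
conn.executemany(
    "INSERT INTO MarketSizes (industry, year, countryID, annualSales) VALUES (?, ?, ?, ?)",
    [
        (10, 2010, 1, 100.0),
        (10, 2011, 1, 110.0),
        (10, 2012, 1, 99.0),
        (10, 2010, 2, 50.0),
        (10, 2012, 2, 75.0),  # no 2011 row, so the LEFT JOIN finds no match
    ],
)

view_rows = conn.execute(
    "SELECT countryID, industry, year, growthRate FROM growthRate "
    "ORDER BY countryID, industry, year"
).fetchall()
```

Rows with no matching previous year come back with a NULL (`None`) growth rate because of the LEFT JOIN, which is probably what you want for the first year of each series.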