Question

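I tested the following query on two databases with exactly the same structure. On the first, which has 4M entries, it returns results in 33 seconds. The second table has 29M rows; it has been 16 hours since I started the query and I still have no result.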

SELECT sbvpip*4 as smallbvpip,btnvpip*4 as buttonvpip, sum(amt_won)*400/count(*) AS winrate, count(*) as count

FROM holdem_hand_player_statistics

    JOIN (

    SELECT id_player AS pid2, id_hand AS hid, sbvpip
    FROM holdem_hand_player_statistics

        JOIN (
        SELECT id_player AS pid, ROUND(avg(flg_vpip::int)*25) AS sbvpip
        FROM holdem_hand_player_statistics
        WHERE position = 8 AND cnt_players = 6
        GROUP BY id_player
        ) AS auxtable
        ON pid = id_player

    WHERE position = 8 AND cnt_players = 6
    ) AS auxtable2
    ON hid = id_hand


    JOIN (

    SELECT id_player AS pid4, id_hand AS hid2, btnvpip
    FROM holdem_hand_player_statistics

        JOIN (
        SELECT id_player AS pid3, ROUND(avg(flg_vpip::int)*25) AS btnvpip
        FROM holdem_hand_player_statistics
        WHERE position = 0 AND cnt_players = 6
        GROUP BY id_player
        ) AS auxtable3
        ON pid3 = id_player

    WHERE position = 0 AND cnt_players = 6
    ) AS auxtable4
    ON hid2 = id_hand


WHERE POSITION = 0 and cnt_players = 6



GROUP BY sbvpip,btnvpip
ORDER BY 1,2;

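How can I make this query run faster?

Could the table be corrupted or something similar? One table is only 7 to 8 times larger than the other, yet the processing time is more than 15,000 times longer. Is that normal?

Any other comments are welcome!

If my English is unclear, please tell me and I will try to express myself differently.

Thank you very much for your help.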

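Additional information:

Of the columns I use, three are indexed: id_hand, id_player, and position. The primary key is (id_hand, id_player). The table has 129 columns and 6 indexes in total.

I also ran EXPLAIN on both tables and got different results. Both results are in a gdocs spreadsheet: https://spreadsheets.google.com/ccc?key=tGxqxVNzHYznb1VVjtKyAuw&authkey=CJ-BiYkN&authkey=CJ-BiYkN#gid=0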

Answer 1

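I suspect that the indexing on one of the servers is either missing or incorrect.

Something could also be blocking the query from completing, especially if there is an uncommitted transaction sitting there.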
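Both suspicions can be checked from psql. The following is a minimal sketch, assuming the table is visible in the current search path; `pg_indexes` and `pg_locks` are standard PostgreSQL system views.

```sql
-- List the indexes defined on the table; run this on both servers
-- and compare the output.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'holdem_hand_player_statistics';

-- Show lock requests that are still waiting. A row here means some
-- session is blocked behind another (possibly uncommitted) transaction.
SELECT locktype, relation::regclass AS relation, mode, pid
FROM pg_locks
WHERE NOT granted;
```

If the second query returns nothing while the long query is running, locking is probably not the cause.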

Answer 2

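With more rows you will probably need more sort memory: what is your work_mem setting? The same goes for the buffer cache: since you scan the same table several times, being able to keep its rows in cache may be crucial.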
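The current settings can be inspected, and work_mem raised for a single session, roughly as follows. The value shown is purely illustrative and should be sized to the memory actually available on the server.

```sql
-- Inspect the current memory settings.
SHOW work_mem;
SHOW shared_buffers;

-- Raise sort/hash memory for this session only (illustrative value).
SET work_mem = '256MB';
```

`SET` affects only the current session, so it is a low-risk way to test whether more sort memory changes the plan before touching postgresql.conf.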

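You should also re-examine the query and try to find a way to avoid joining the statistics table back to itself so many times. It is hard to make suggestions without at least a small set of test data and the expected output. Which version of PostgreSQL are you using? With 8.4, you could at least get auxtable and auxtable3 from a single CTE...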
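One possible shape for such a rewrite is sketched below. It assumes PostgreSQL 8.4 or later (for `WITH`) and that each hand has exactly one row at position 0 and one at position 8, which makes the extra self-join for the button redundant. The names `player_vpip`, `btn`, `sb`, `btn_v`, and `sb_v` are made up for illustration, and the result should be compared against the original query on the smaller database before relying on it.

```sql
-- Compute the per-player VPIP averages for both positions in one pass.
WITH player_vpip AS (
    SELECT s.id_player, s.position,
           ROUND(avg(s.flg_vpip::int)*25) AS vpip
    FROM holdem_hand_player_statistics AS s
    WHERE s.cnt_players = 6 AND s.position IN (0, 8)
    GROUP BY s.id_player, s.position
)
SELECT sb_v.vpip*4 AS smallbvpip, btn_v.vpip*4 AS buttonvpip,
       sum(btn.amt_won)*400/count(*) AS winrate, count(*) AS count
FROM holdem_hand_player_statistics AS btn
    -- the small blind row of the same hand
    JOIN holdem_hand_player_statistics AS sb
        ON sb.id_hand = btn.id_hand
       AND sb.position = 8 AND sb.cnt_players = 6
    -- averages for the small blind player and the button player
    JOIN player_vpip AS sb_v
        ON sb_v.id_player = sb.id_player AND sb_v.position = 8
    JOIN player_vpip AS btn_v
        ON btn_v.id_player = btn.id_player AND btn_v.position = 0
WHERE btn.position = 0 AND btn.cnt_players = 6
GROUP BY sb_v.vpip, btn_v.vpip
ORDER BY 1, 2;
```

This reads the big table three times instead of five, and the aggregation happens once rather than twice.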

Answer 3

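The query looks fine. To improve performance, try indexing as @HLGEM suggested. Also try running each subquery on its own to see which one performs poorly.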
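For example, the innermost aggregate from the question can be timed in isolation. Note that `EXPLAIN ANALYZE` actually executes the statement, so on the 29M-row table it may itself take a while; plain `EXPLAIN` shows only the estimated plan.

```sql
-- Time the innermost subquery (small blind averages) on its own.
EXPLAIN ANALYZE
SELECT id_player AS pid, ROUND(avg(flg_vpip::int)*25) AS sbvpip
FROM holdem_hand_player_statistics
WHERE position = 8 AND cnt_players = 6
GROUP BY id_player;
```

Repeating this for each nesting level, from the inside out, shows at which step the run time on the large table stops scaling with the row count.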

Answer 4

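I can easily believe that the query takes that much longer. You have a 29M-row table on which you perform several GROUP BYs and which you join back to itself several times on different columns. If the whole table does not fit in memory, that can cause a lot of paging that a table one seventh the size would not need. Working inward, you are:

Starting from the 29M-row table with position = 0 and cnt_players = 6
Joining back to the 29M-row table twice on the id_hand column
Filtering the 29M-row table twice, for cnt_players = 6 and positions 0 and 8, and computing the average flg_vpip per player
Joining the grouped results back on id_hand across millions of rows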

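Can you split the table into separate tables? What exactly do your fields mean, and what does a sample hand look like?

At a minimum you need indexes on id_player, id_hand, position, and cnt_players.

It might also help to include all the needed fields in an index. I am not sure about PostgreSQL, but SQL Server can skip loading the actual table data pages when all the data a query needs is in the index. So if you had an index on position, cnt_players, id_player, and flg_vpip, your innermost selects might be much faster.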
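Such an index could be created as sketched here; the index name is made up for illustration. PostgreSQL supports index-only scans from version 9.2 onward (and only for pages that VACUUM has marked all-visible), so on older versions the index would still narrow the rows read but would not avoid the table itself.

```sql
-- Composite index covering every column the innermost selects touch.
-- Filter columns come first, then the grouping column, then the
-- aggregated column.
CREATE INDEX idx_hhps_pos_cnt_player_vpip
    ON holdem_hand_player_statistics (position, cnt_players, id_player, flg_vpip);

-- Refresh planner statistics so the new index is costed sensibly.
ANALYZE holdem_hand_player_statistics;
```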

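If you are not going to run the query often, I think a better approach is to compute those inner selects ahead of time into one or two tables.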

-- holdem is presumably shorthand for holdem_hand_player_statistics
select id_player, position, cnt_players,
    ROUND(avg(flg_vpip::int)*25) AS avg_vpip
into auxtable
from holdem
group by id_player, position, cnt_players;

-- CLUSTERED is SQL Server syntax; PostgreSQL takes a plain primary key
alter table auxtable add constraint PK_auxtable
    primary key (id_player, position, cnt_players);

Then use it like this:

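SELECT sbvpip*4 as smallbvpip,btnvpip*4 as buttonvpip, sum(amt_won)*400/count(*) AS winrate, count(*) as count
FROM holdem
    JOIN (
        -- qualify the columns: id_player exists in both holdem and auxtable
        SELECT holdem.id_player AS pid2, holdem.id_hand AS hid, auxtable.avg_vpip AS sbvpip
        FROM holdem
            JOIN auxtable ON auxtable.id_player = holdem.id_player
                and auxtable.position = holdem.position
                and auxtable.cnt_players = holdem.cnt_players
        WHERE holdem.position = 8 AND holdem.cnt_players = 6
    ) AS auxtable2 ON hid = id_hand
    -- ... join the button (position = 0) the same way to get btnvpip,
    -- then filter, group, and order as in the original query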
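The fragment above stops after the small blind join. A fuller sketch of the same idea follows, assuming `holdem` stands for holdem_hand_player_statistics and that `auxtable` was built with an `avg_vpip` column as shown earlier. The aliases `h` and `a` are made up for illustration, and the button join simply mirrors auxtable4 from the question.

```sql
SELECT sbvpip*4 AS smallbvpip, btnvpip*4 AS buttonvpip,
       sum(holdem.amt_won)*400/count(*) AS winrate, count(*) AS count
FROM holdem
    JOIN (
        -- small blind (position 8): look up the precomputed average
        SELECT h.id_hand AS hid, a.avg_vpip AS sbvpip
        FROM holdem AS h
            JOIN auxtable AS a ON a.id_player = h.id_player
                AND a.position = h.position
                AND a.cnt_players = h.cnt_players
        WHERE h.position = 8 AND h.cnt_players = 6
    ) AS auxtable2 ON hid = holdem.id_hand
    JOIN (
        -- button (position 0): same lookup
        SELECT h.id_hand AS hid2, a.avg_vpip AS btnvpip
        FROM holdem AS h
            JOIN auxtable AS a ON a.id_player = h.id_player
                AND a.position = h.position
                AND a.cnt_players = h.cnt_players
        WHERE h.position = 0 AND h.cnt_players = 6
    ) AS auxtable4 ON hid2 = holdem.id_hand
WHERE holdem.position = 0 AND holdem.cnt_players = 6
GROUP BY sbvpip, btnvpip
ORDER BY 1, 2;
```

The two GROUP BY passes over the 29M-row table are replaced by primary-key lookups into the much smaller auxtable, at the cost of having to rebuild auxtable whenever new hands are imported.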
