Suggestions on how to make this query perform well

Posted: 2013-12-14 17:05:33

Tags: sql postgresql psql

I have a schema in Postgres 9.2.0 similar to the following:

CREATE TABLE emails
(
  id serial NOT NULL,
  subject text,
  body text,
  CONSTRAINT emails_pkey PRIMARY KEY (id)
);

CREATE TABLE email_participants
(
  id serial NOT NULL,
  kind text NOT NULL,
  email_id integer NOT NULL,
  CONSTRAINT email_participants_pkey PRIMARY KEY (id),
  CONSTRAINT email_participants_email_id_fkey FOREIGN KEY (email_id)
  REFERENCES emails (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE CASCADE
);

CREATE TABLE todos
(
  id serial NOT NULL,
  description text,
  reference_email_id integer,
  CONSTRAINT todos_pkey PRIMARY KEY (id),
  CONSTRAINT todos_reference_email_id_fkey FOREIGN KEY (reference_email_id)
  REFERENCES emails (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE CASCADE
);

CREATE INDEX todos_reference_email_id_index
  ON todos
  USING btree
  (reference_email_id);

CREATE TABLE calls
(
  id serial NOT NULL,
  description text,
  reference_email_id integer,
  CONSTRAINT calls_pkey PRIMARY KEY (id),
  CONSTRAINT calls_reference_email_id_fkey FOREIGN KEY (reference_email_id)
  REFERENCES emails (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE CASCADE
);

CREATE INDEX calls_reference_email_id_index
  ON calls
  USING btree
  (reference_email_id);

CREATE TABLE meetings
(
  id serial NOT NULL,
  description text,
  reference_email_id integer,
  CONSTRAINT meetings_pkey PRIMARY KEY (id),
  CONSTRAINT meetings_reference_email_id_fkey FOREIGN KEY (reference_email_id)
  REFERENCES emails (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE CASCADE
);

CREATE INDEX meetings_reference_email_id_index
  ON meetings
  USING btree
  (reference_email_id);

CREATE TABLE attachments
(
  id serial NOT NULL,
  description text,
  reference_email_id integer,
  CONSTRAINT attachments_pkey PRIMARY KEY (id),
  CONSTRAINT attachments_reference_email_id_fkey FOREIGN KEY (reference_email_id)
  REFERENCES emails (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE CASCADE
);

CREATE INDEX attachments_reference_email_id_index
  ON attachments
  USING btree
  (reference_email_id);

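All of the email ID columns above (email_id and reference_email_id) have foreign key constraints.

There are other tables that reference the emails table as well, but you get the general idea.

I need to select all emails along with the counts or IDs of any referencing rows in email_participants, todos, calls, meetings and attachments.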

So the first thing that comes to mind is an inner join on email_participants and left outer joins on the other tables:

SELECT * FROM emails e
INNER JOIN email_participants ep
  ON ep.email_id = e.id
LEFT JOIN todos t
  ON e.id = t.reference_email_id
LEFT JOIN calls c
  ON e.id = c.reference_email_id
LEFT JOIN meetings m
  ON e.id = m.reference_email_id
LEFT JOIN attachments at
  ON e.id = at.reference_email_id
WHERE ep.user_id = 1;

If I run EXPLAIN, I get the following query plan, which I'm afraid I don't really understand:

"Hash Right Join  (cost=51.11..68.16 rows=123 width=1047)"
"  Hash Cond: (t.reference_email_id = e.id)"
"  ->  Seq Scan on todos t  (cost=0.00..14.30 rows=430 width=157)"
"  ->  Hash  (cost=50.44..50.44 rows=53 width=890)"
"        ->  Nested Loop Left Join  (cost=23.06..50.44 rows=53 width=890)"
"              ->  Nested Loop Left Join  (cost=23.06..41.78 rows=15 width=797)"
"                    ->  Nested Loop Left Join  (cost=23.06..37.78 rows=7 width=645)"
"                          ->  Hash Join  (cost=23.06..35.58 rows=4 width=458)"
"                                Hash Cond: (e.id = ep.email_id)"
"                                ->  Seq Scan on emails e  (cost=0.00..11.80 rows=180 width=410)"
"                                ->  Hash  (cost=23.00..23.00 rows=5 width=48)"
"                                      ->  Seq Scan on email_participants ep  (cost=0.00..23.00 rows=5 width=48)"
"                                            Filter: (user_id = 1)"
"                          ->  Index Scan using meetings_reference_email_id_index on meetings m  (cost=0.00..0.53 rows=2 width=187)"
"                                Index Cond: (e.id = reference_email_id)"
"                    ->  Index Scan using attachments_reference_email_id_index on attachments at  (cost=0.00..0.55 rows=2 width=152)"
"                          Index Cond: (e.id = reference_email_id)"
"              ->  Index Scan using calls_reference_email_id_index on calls c  (cost=0.00..0.55 rows=3 width=93)"
"                    Index Cond: (e.id = reference_email_id)"

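This SQL needs to be as efficient as I can make it. Is there anything I can do to make it faster, or to avoid all of these left joins? There are a lot of these joined tables.

Would creating a view make this any better? If so, can anyone offer suggestions on how to create such a view?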

1 answer:

Answer 0 (score: 1)

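If you join a parent record to its child records in several different tables, you run into the problem that 10 child records in table A and 20 child records in table B produce 200 rows (10 × 20) for that one parent in the final result.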
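The row multiplication is easy to reproduce. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL (an assumption for illustration; LEFT JOIN behaves the same way in both), with only the columns it needs: one email with 10 todos and 20 calls.

```python
import sqlite3

# In-memory SQLite stand-in for a subset of the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emails (id INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE todos  (id INTEGER PRIMARY KEY, reference_email_id INTEGER REFERENCES emails(id));
    CREATE TABLE calls  (id INTEGER PRIMARY KEY, reference_email_id INTEGER REFERENCES emails(id));
""")
conn.execute("INSERT INTO emails (id, subject) VALUES (1, 'hello')")
# One parent email with 10 todos and 20 calls.
conn.executemany("INSERT INTO todos (reference_email_id) VALUES (?)", [(1,)] * 10)
conn.executemany("INSERT INTO calls (reference_email_id) VALUES (?)", [(1,)] * 20)

# Chained LEFT JOINs pair every todo with every call of the same email.
joined_rows = conn.execute("""
    SELECT e.id, t.id, c.id
    FROM emails e
    LEFT JOIN todos t ON t.reference_email_id = e.id
    LEFT JOIN calls c ON c.reference_email_id = e.id
""").fetchall()
print(len(joined_rows))  # 200

# Counting over the same join is inflated too: each count sees all 200 rows.
naive_counts = conn.execute("""
    SELECT COUNT(t.id), COUNT(c.id)
    FROM emails e
    LEFT JOIN todos t ON t.reference_email_id = e.id
    LEFT JOIN calls c ON c.reference_email_id = e.id
""").fetchone()
print(naive_counts)  # (200, 200)
```

COUNT(DISTINCT t.id) would give the right numbers, but the database still has to build the 200-row intermediate result first, and that cost grows with every additional child table in the join.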

You would probably do better with counts like this:

create view ... as
select ...,
       (select count(*) from  child_table_1
                        where foreign_key = parent_key) child_1_count,
       (select count(*) from  child_table_2
                        where foreign_key = parent_key) child_2_count,
       ...
from   parent_table
where  user_id = 1
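Here is the count-per-child-table idea made concrete, again as a minimal sketch on in-memory SQLite rather than PostgreSQL. The view name email_summary and the columns todo_count and call_count are hypothetical, and user_id is assumed to live on email_participants, as the Filter line in the question's query plan suggests. The user filter is applied outside the view with IN, so an email is not repeated if the user has several participant rows on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emails (id INTEGER PRIMARY KEY, subject TEXT);
    -- Assumption: user_id sits on email_participants (taken from the query plan).
    CREATE TABLE email_participants (id INTEGER PRIMARY KEY, email_id INTEGER, user_id INTEGER);
    CREATE TABLE todos (id INTEGER PRIMARY KEY, reference_email_id INTEGER);
    CREATE TABLE calls (id INTEGER PRIMARY KEY, reference_email_id INTEGER);

    -- Hypothetical view: one scalar COUNT subquery per child table,
    -- so rows from different child tables are never multiplied together.
    CREATE VIEW email_summary AS
    SELECT e.id,
           e.subject,
           (SELECT COUNT(*) FROM todos t WHERE t.reference_email_id = e.id) AS todo_count,
           (SELECT COUNT(*) FROM calls c WHERE c.reference_email_id = e.id) AS call_count
    FROM emails e;

    INSERT INTO emails (id, subject) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO email_participants (email_id, user_id) VALUES (1, 1), (2, 1), (3, 2);
""")
# Email 1: 10 todos and 20 calls. Email 2: no todos, 1 call.
conn.executemany("INSERT INTO todos (reference_email_id) VALUES (?)", [(1,)] * 10)
conn.executemany("INSERT INTO calls (reference_email_id) VALUES (?)", [(1,)] * 20 + [(2,)])

# Filter by user outside the view; IN keeps one row per email.
rows = conn.execute("""
    SELECT id, todo_count, call_count
    FROM email_summary
    WHERE id IN (SELECT email_id FROM email_participants WHERE user_id = ?)
    ORDER BY id
""", (1,)).fetchall()
print(rows)  # [(1, 10, 20), (2, 0, 1)]
```

In PostgreSQL, each of these subqueries can be answered from the existing *_reference_email_id_index indexes, so the per-email cost stays proportional to the number of child tables rather than to the product of their row counts.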

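Edit: a further benefit of this approach is that when you run a query against this view that omits the child-count columns, the optimizer avoids including that code path, so those subqueries are not evaluated at all.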

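Another edit: returning the IDs really calls for separate queries, but you could try aggregating them into an array and converting it to a string, so that the application receives a list of IDs. Otherwise you are better off with a UNION ALL across several queries (one per child table), or simply with one query per child table.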

-- aggregate the child rows' own ids (the foreign key would just repeat parent_key)
(select array_to_string(array_agg(id), ',')
   from child_table_2
  where foreign_key = parent_key) child_2_id_list,
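Both ways of returning the IDs can be sketched the same way, again on SQLite via Python's sqlite3 rather than PostgreSQL. SQLite has no array_agg, so group_concat(id, ',') stands in for array_to_string(array_agg(id), ','); PostgreSQL 9.0 and later also offer string_agg(id::text, ','). The rows are made-up sample data and parse_ids is a hypothetical helper.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emails (id INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE todos (id INTEGER PRIMARY KEY, reference_email_id INTEGER);
    CREATE TABLE calls (id INTEGER PRIMARY KEY, reference_email_id INTEGER);
    INSERT INTO emails (id, subject) VALUES (1, 'a'), (2, 'b');
    INSERT INTO todos (id, reference_email_id) VALUES (10, 1), (11, 1);
    INSERT INTO calls (id, reference_email_id) VALUES (20, 1), (21, 2), (22, 2);
""")

def parse_ids(csv):
    """Hypothetical helper: comma-separated ids -> sorted ints; NULL (no children) -> []."""
    # group_concat's ordering is unspecified, so sort after parsing.
    return sorted(int(x) for x in csv.split(",")) if csv else []

# Option 1: one correlated subquery per child table, aggregating the child ids.
list_rows = conn.execute("""
    SELECT e.id,
           (SELECT group_concat(t.id, ',') FROM todos t WHERE t.reference_email_id = e.id),
           (SELECT group_concat(c.id, ',') FROM calls c WHERE c.reference_email_id = e.id)
    FROM emails e
    ORDER BY e.id
""").fetchall()
id_lists = {eid: (parse_ids(todos), parse_ids(calls)) for eid, todos, calls in list_rows}
print(id_lists)  # {1: ([10, 11], [20]), 2: ([], [21, 22])}

# Option 2: UNION ALL, one branch per child table, each row tagged with its source.
union_rows = conn.execute("""
    SELECT reference_email_id, 'todo', id FROM todos
    UNION ALL
    SELECT reference_email_id, 'call', id FROM calls
""").fetchall()
children = defaultdict(lambda: defaultdict(list))
for email_id, kind, child_id in union_rows:
    children[email_id][kind].append(child_id)
# Plain dicts with sorted id lists, grouped by email and then by child table.
by_email = {eid: {kind: sorted(ids) for kind, ids in kinds.items()}
            for eid, kinds in children.items()}
```

The UNION ALL form returns one narrow row per child and leaves the grouping to the application, which avoids string parsing at the cost of doing that grouping step client-side.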