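Optimizing full-text search across multiple tables

Time: 2016-02-10 14:13:22

Tags: mysql sql full-text-search query-optimization full-text-indexing

I want to search the title and keywords of the content table for the requested term ($q), and also the actors, which are in another table and are linked through a table in between. In addition, I need to get the number of views from another table.

This is the query I have so far. The results are good, but it is too slow (0.6s on average when I run it in phpMyAdmin... and we have millions of visitors per month):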

SELECT DISTINCT SQL_CALC_FOUND_ROWS
    c.*,
    cv.views,
    (MATCH (c.title) AGAINST ('{$q}') * 3) Relevance1,
    MATCH (c.keywords) AGAINST ('{$q}') Relevance2,
    (MATCH (a.`name`) AGAINST ('{$q}') * 2) Relevance3
FROM
    content AS c
LEFT JOIN
    content_actors AS ca ON ca.content = c.record_num
LEFT JOIN
    actors AS a ON a.record_num = ca.actor
LEFT JOIN
    content_views AS cv ON cv.content = c.record_num
WHERE
    c.enabled = 1
GROUP BY c.title, c.length
HAVING (Relevance1 + Relevance2 + Relevance3) > 0
ORDER BY (Relevance1 + Relevance2 + Relevance3) DESC

The table schemas look like this:

content
record_num     title     keywords
1              Video1    Comedy, Action, Supercool
2              Video2    Comet

content_actors
content     actor
1           1
1           2
2           1

actors
record_num     name
1              Jennifer Lopez
2              Bruce Willis

content_views
content     views
1           160
2           312

Here are the indexes I found via SHOW INDEX FROM tablename:

Table              Column_Name     Seq_in_index     Key_name     Index_type
---------------------------------------------------------------------------
content            record_num      1                PRIMARY      BTREE
content            keywords        1                keywords     FULLTEXT
content            keywords        2                title        FULLTEXT
content            title           1                title        FULLTEXT
content            description     1                description  FULLTEXT
content            keywords        1                keywords_2   FULLTEXT

content_actors     content         1                content      BTREE
content_actors     actor           2                content      BTREE
content_actors     actor           1                actor        BTREE

actors             record_num      1                PRIMARY      BTREE
actors             name            1                name         BTREE
actors             name            1                name_2       FULLTEXT

content_views      content         1                PRIMARY      BTREE
content_views      views           1                views        BTREE

Here is the EXPLAIN for the query:

ID     SELECT_TYPE     TABLE     TYPE       POSSIBLE_KEYS          KEY         ROWS      EXTRA
1      SIMPLE          c         ref        enabled_2, enabled     enabled     29210     Using where; Using temporary; Using filesort
1      SIMPLE          ca        ref        content                content     1         Using index
1      SIMPLE          a         eq_ref     PRIMARY                PRIMARY     1
1      SIMPLE          cv        eq_ref     PRIMARY                PRIMARY     1

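I am using GROUP BY to avoid duplicate content, but the GROUP BY alone seems to double the time needed to process the query.

Edit: After playing with the query a bit, I realized that if I remove the GROUP BY I get duplicates, and if I leave the GROUP BY in, it does not pick up the proper Relevance3 value (without the GROUP BY, one of the duplicate rows returns a value for Relevance3 while the other does not...).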

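1 answer:

Answer 0: (score: 0)

Add the MATCHes (OR'd together) to the WHERE clause. This will significantly reduce the number of rows that have to be processed for SQL_CALC_FOUND_ROWS, and it eliminates the need for HAVING...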
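
As a rough sketch of that suggestion, the question's query could be rewritten as below. Everything is carried over from the question (including its GROUP BY, so the duplicate and Relevance3 problems described there are not addressed yet); the only changes are the OR'd MATCHes in WHERE and the removal of HAVING. It also assumes the actor join is meant to use the ca alias.

```sql
-- Sketch only: the question's query with the relevance filter moved from HAVING into WHERE.
-- '{$q}' is the question's own placeholder for the search term.
SELECT DISTINCT SQL_CALC_FOUND_ROWS
    c.*,
    cv.views,
    (MATCH (c.title) AGAINST ('{$q}') * 3) AS Relevance1,
    MATCH (c.keywords) AGAINST ('{$q}') AS Relevance2,
    (MATCH (a.`name`) AGAINST ('{$q}') * 2) AS Relevance3
FROM content AS c
LEFT JOIN content_actors AS ca ON ca.content = c.record_num
LEFT JOIN actors AS a ON a.record_num = ca.actor
LEFT JOIN content_views AS cv ON cv.content = c.record_num
WHERE c.enabled = 1
  -- Rows that match nowhere are discarded here instead of in HAVING.
  AND (   MATCH (c.title) AGAINST ('{$q}')
       OR MATCH (c.keywords) AGAINST ('{$q}')
       OR MATCH (a.`name`) AGAINST ('{$q}') )
GROUP BY c.title, c.length
ORDER BY (Relevance1 + Relevance2 + Relevance3) DESC;
```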

Instead of

cv.views,
...
LEFT JOIN  content_views AS cv ON cv.content = c.record_num

do

( SELECT views FROM content_views WHERE content = c.record_num ) AS views,

Edit

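The LEFT and the GROUP BY are needed because actors are optional and there may be more than one actor. Since you do not need the actor names at all, you can get rid of them with

WHERE ... AND EXISTS ( SELECT *
                    FROM content_actors AS ca
                    JOIN actors AS a ON ...
                    WHERE MATCH (a.`name`) AGAINST ('{$q}')
                      AND ca...
              )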

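But this does not let you include the relevance in the ORDER BY.

So you need to build a subquery with a UNION (UNION ALL rather than UNION DISTINCT, so that the two relevance values can be summed per id). There will be 2 SELECTs.

SELECT #1: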

SELECT c.record_num AS id,
       3 * MATCH(c.title) AGAINST ('{$q}')
       +   MATCH(c.keywords) AGAINST ('{$q}')  AS relevance
    FROM content AS c
    WHERE MATCH(c.title, c.keywords) AGAINST ('{$q}')

(and have FULLTEXT(title, keywords)). This will efficiently fetch the ids of the useful content rows.

SELECT #2:

SELECT c.record_num AS id,
       2 * MAX(MATCH(a.name) AGAINST ('{$q}')) AS relevance
    FROM content AS c
    JOIN content_actors ca  ON ca.content = c.record_num
    JOIN actors a  ON a.record_num = ca.actor
    WHERE MATCH(a.name) AGAINST ('{$q}')
    GROUP BY c.record_num;

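Be sure to have content_actors: INDEX(actor) and content: INDEX(record_num). This SELECT will efficiently start with actors, then work back to content. Note that it behaves differently from your code when two actors MATCH; hopefully my MAX is the better solution.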
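
For reference, the indexes these two SELECTs rely on could be created roughly as sketched below. The index names are hypothetical, and the SHOW INDEX output in the question suggests that some of them may already exist, so only the missing ones would need adding. In natural-language mode, MATCH() needs a FULLTEXT index whose column list is exactly the one named in the call, which is why title, keywords, and (title, keywords) each get their own index.

```sql
-- Sketch only; index names are made up. Skip any index that SHOW INDEX already lists.
-- Each FULLTEXT index is added in its own statement, since InnoDB builds
-- only one FULLTEXT index per ALTER TABLE.

-- For the WHERE clause of SELECT #1: MATCH(c.title, c.keywords)
ALTER TABLE content ADD FULLTEXT INDEX ft_title_keywords (title, keywords);

-- For the two separately weighted MATCHes in the select list of SELECT #1
ALTER TABLE content ADD FULLTEXT INDEX ft_title (title);
ALTER TABLE content ADD FULLTEXT INDEX ft_keywords (keywords);

-- For SELECT #2: MATCH(a.name)
ALTER TABLE actors ADD FULLTEXT INDEX ft_name (name);

-- Lets SELECT #2 go from a matching actor to its content rows
ALTER TABLE content_actors ADD INDEX idx_actor (actor);

-- content.record_num is the PRIMARY KEY in the question, so it is already indexed.
```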

Now, let's put things together...

SELECT #3:

SELECT id, SUM(relevance) AS relevance
    FROM ( ( ... select #1 ... )
           UNION ALL
           ( ... select #2 ... )
         ) AS x
    GROUP BY id

But that is not all...

SELECT #4:

SELECT c.*,
       ( ... views ... ) AS views
    FROM ( ... select #3 ... ) AS u
    JOIN content c  ON c.record_num = u.id

I suggest running these steps manually to verify them, gradually putting all the pieces together. Yes, it is complex, but it should be very fast.
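
Put together, the four steps might look something like the sketch below. It assumes the question's schema (record_num as the key of content, name as the actor column). The c.enabled = 1 filter and the final ORDER BY are carried over from the question's query rather than from the steps above, and the aliases c1, c2, x, and u are only illustrative.

```sql
-- Sketch only: SELECT #1 through #4 assembled into one statement.
-- '{$q}' is the question's own placeholder for the search term.
SELECT c.*,
       ( SELECT cv.views
           FROM content_views AS cv
          WHERE cv.content = c.record_num ) AS views,
       u.relevance
    FROM (
        -- SELECT #3: add up the relevance from both sources per content row
        SELECT x.id, SUM(x.relevance) AS relevance
            FROM (
                -- SELECT #1: matches on title and keywords
                SELECT c1.record_num AS id,
                       3 * MATCH(c1.title) AGAINST ('{$q}')
                       +   MATCH(c1.keywords) AGAINST ('{$q}') AS relevance
                    FROM content AS c1
                    WHERE MATCH(c1.title, c1.keywords) AGAINST ('{$q}')
                UNION ALL
                -- SELECT #2: matches on actor names, best-matching actor per content row
                SELECT c2.record_num AS id,
                       2 * MAX(MATCH(a.name) AGAINST ('{$q}')) AS relevance
                    FROM content AS c2
                    JOIN content_actors AS ca ON ca.content = c2.record_num
                    JOIN actors AS a ON a.record_num = ca.actor
                    WHERE MATCH(a.name) AGAINST ('{$q}')
                    GROUP BY c2.record_num
            ) AS x
            GROUP BY x.id
    ) AS u
    -- SELECT #4: fetch the full content rows for the ids that matched
    JOIN content AS c ON c.record_num = u.id
    WHERE c.enabled = 1            -- assumption: same filter as in the question
    ORDER BY u.relevance DESC;     -- assumption: same ordering goal as in the question
```

Because each content row appears at most once in u, the outer query needs neither DISTINCT nor GROUP BY, and the views lookup runs only for rows that actually matched.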