Consider the following table:
_____________________
|   sentence_word   |
|---------|---------|
| sent_id | word_id |
|---------|---------|
|       1 |       1 |
|       1 |       2 |
|     ... |     ... |
|       2 |       4 |
|       2 |       1 |
|     ... |     ... |
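I use this table structure to store the words of sentences. Now I want to find out which words occur together with a specific word in the same sentence. The result should look like this: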
_____________________
| word_id | counted |
|---------|---------|
|       5 |    1000 |
|       7 |     800 |
|       3 |     600 |
|       1 |     400 |
|       2 |     100 |
|     ... |     ... |
The query looks like this:
SELECT
    word_id,
    COUNT(*) AS counted
FROM sentence_word
WHERE sentence_word.sent_id IN (
        SELECT sent_id
        FROM sentence_word
        WHERE word_id = [desired word]
    )
    AND word_id != [desired word]
GROUP BY word_id
ORDER BY counted DESC;
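The query works, but it always scans the whole table. I have created an index on sent_id and word_id. Do you have any ideas for optimizing it so that it does not need to scan the whole table every time?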
Answer 0 (score: 1)
You could try a self-join like this:
SELECT sw1.word_id,
       COUNT(*) AS counted
FROM sentence_word sw1
JOIN sentence_word sw2 ON (
    sw1.sent_id = sw2.sent_id
    AND sw2.word_id = [your word id]
)
WHERE sw1.word_id != [your word id]
GROUP BY sw1.word_id
ORDER BY counted DESC;
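To check that a self-join of this kind returns the same per-word counts as the IN subquery from the question, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module (the question does not name a database system). The sample rows and the helper name co_occurrences are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (sent_id, word_id) pairs, invented for this sketch.
ROWS = [
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 3), (2, 4),
    (3, 2), (3, 4),
]

# The IN-subquery form from the question, with :w as the desired word.
IN_QUERY = """
    SELECT word_id, COUNT(*) AS counted
    FROM sentence_word
    WHERE sent_id IN (SELECT sent_id FROM sentence_word WHERE word_id = :w)
      AND word_id != :w
    GROUP BY word_id
    ORDER BY counted DESC, word_id
"""

# A self-join form that groups by the co-occurring word.
JOIN_QUERY = """
    SELECT sw1.word_id, COUNT(*) AS counted
    FROM sentence_word sw1
    JOIN sentence_word sw2
      ON sw1.sent_id = sw2.sent_id
     AND sw2.word_id = :w
    WHERE sw1.word_id != :w
    GROUP BY sw1.word_id
    ORDER BY counted DESC, sw1.word_id
"""


def co_occurrences(query, word_id):
    """Run one of the queries against a fresh in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sentence_word (sent_id INTEGER, word_id INTEGER)")
    conn.executemany("INSERT INTO sentence_word VALUES (?, ?)", ROWS)
    rows = conn.execute(query, {"w": word_id}).fetchall()
    conn.close()
    return rows


print(co_occurrences(IN_QUERY, 1))    # [(3, 2), (2, 1), (4, 1)]
print(co_occurrences(JOIN_QUERY, 1))  # same rows as above
```

The two forms agree only while each (sent_id, word_id) pair is stored once. If the desired word can appear twice in one sentence, the join counts the other words of that sentence twice, whereas the IN form does not.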
Or, possibly even better, with the inequality moved into the join condition:
SELECT sw1.word_id,
       COUNT(*) AS counted
FROM sentence_word sw1
JOIN sentence_word sw2 ON (
    sw1.sent_id = sw2.sent_id
    AND sw2.word_id = [your word id]
    AND sw2.word_id != sw1.word_id
)
GROUP BY sw1.word_id
ORDER BY counted DESC;
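Whether a rewrite avoids the full scan depends on the available indexes, so it may be worth looking at the query plan. The sketch below again assumes SQLite via Python's sqlite3. The two composite indexes and their names (idx_word_sent, idx_sent_word) are my own assumption and do not come from the question, which only says that sent_id and word_id are indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sentence_word (sent_id INTEGER, word_id INTEGER)")
conn.executemany(
    "INSERT INTO sentence_word VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 4)],  # made-up sample rows
)

# Assumed composite indexes (hypothetical names):
# one to find the sentences of a word, one to list the words of a sentence.
conn.execute("CREATE INDEX idx_word_sent ON sentence_word (word_id, sent_id)")
conn.execute("CREATE INDEX idx_sent_word ON sentence_word (sent_id, word_id)")

QUERY = """
    SELECT sw1.word_id, COUNT(*) AS counted
    FROM sentence_word sw1
    JOIN sentence_word sw2
      ON sw1.sent_id = sw2.sent_id
     AND sw2.word_id = :w
    WHERE sw1.word_id != :w
    GROUP BY sw1.word_id
    ORDER BY counted DESC
"""

# The last column of each EXPLAIN QUERY PLAN row is the readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, {"w": 1}).fetchall()
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)
```

The exact plan text varies between SQLite versions, so no output is shown here. Other database systems offer a comparable EXPLAIN statement with a different output format.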