How to search by word similarity in PostgreSQL?

Date: 2019-06-24 09:09:12

Tags: sql postgresql

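Let's say I have a table called questions in a PostgreSQL database. As you can see in the table below, some records look the same to a human but are different strings to the database. Is it possible to get all records that are at least 90% similar to the questions in a given list?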

| QUESTION_ID | QUESTION_TEXT                                    |
|-------------|--------------------------------------------------|
| 1           | What is your favorite movie, cartoon and series? |
| 2           | What is your favorite movie cartoon and series   |
| 3           | what is your favorite Movie, Cartoon and Series  |
| 4           | Do you like apple?                               |
| 5           | do you like Apple                                |

Right now I use the following query, which returns only 2 records:

select
    *
from
    questions
where
    question_text in (
        'What is your favorite movie, cartoon and series?',
        'Do you like apple?'
    )

I know PostgreSQL has the pg_trgm module, which supports similarity search through the word_similarity function. How do I correctly add this to my query?

1 Answer:

Answer 0 (score: 1)

You would do it like this:

CREATE EXTENSION pg_trgm;
CREATE INDEX ON questions USING gin (question_text gin_trgm_ops);

Then you can search efficiently like this:

SELECT question_id
FROM questions
WHERE question_text % 'What is your favorite movie, cartoon and series?';
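
To check which rows the query above would match, the underlying scores can be inspected directly. This is a minimal sketch, assuming the questions table from the question and an installed pg_trgm extension; similarity and word_similarity are pg_trgm functions, while the score and word_score aliases are only illustrative names.

```sql
-- Show the trigram scores of every row against one search string.
-- pg_trgm lowercases its input and ignores non-alphanumeric characters,
-- so rows 1-3 from the question should score 1 or very close to it.
SELECT question_id,
       question_text,
       -- whole-string comparison, the measure used by the % operator
       similarity(question_text,
                  'What is your favorite movie, cartoon and series?') AS score,
       -- best match of the search string against a continuous part of the row
       word_similarity('What is your favorite movie, cartoon and series?',
                       question_text) AS word_score
FROM questions
ORDER BY score DESC;
```

The word_similarity function mentioned in the question has its own operator, <%, and its own threshold parameter, pg_trgm.word_similarity_threshold. It is mainly useful when the search string is expected to appear inside a longer text, which does not seem to be the case here.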

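% is the "similarity operator"; the threshold that decides whether two strings count as similar is set with the parameter pg_trgm.similarity_threshold (0.3 by default).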
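
For the 90% requirement and the list of questions from the original post, one possible approach is to raise the threshold for the session and join the table against the list. This is a sketch under those assumptions, not a tuned solution; the aliases q, n, needle and score are hypothetical names chosen for the example.

```sql
-- Raise the similarity threshold for the current session only.
SET pg_trgm.similarity_threshold = 0.9;

-- Compare every row with each search string in the list and keep
-- the pairs that pass the threshold.
SELECT q.question_id,
       q.question_text,
       n.needle,
       similarity(q.question_text, n.needle) AS score
FROM questions AS q
JOIN (VALUES
        ('What is your favorite movie, cartoon and series?'),
        ('Do you like apple?')
     ) AS n(needle)
  ON q.question_text % n.needle
ORDER BY n.needle, score DESC;
```

With the sample data from the question, this should return all five rows, because the variants differ only in letter case and punctuation, both of which pg_trgm ignores.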

For more information, see the pg_trgm documentation.
