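Let's say I have a table named questions in a PostgreSQL database. As you can see in the table below, some records look the same to a human but not to the database. Is it possible to get all records that are at least 90% similar to a list of questions?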
| QUESTION_ID | QUESTION_TEXT |
|-------------|--------------------------------------------------|
| 1 | What is your favorite movie, cartoon and series? |
| 2 | What is your favorite movie cartoon and series |
| 3 | what is your favorite Movie, Cartoon and Series |
| 4 | Do you like apple? |
| 5 | do you like Apple |
Right now I use the following query, which returns only 2 records:
select
    *
from
    questions
where
    question_text in (
        'What is your favorite movie, cartoon and series?',
        'Do you like apple?'
    )
I know PostgreSQL has the pg_trgm module, which helps with similarity search through the word_similarity function. How do I correctly add this function to my query?
Answer 0 (score: 1)
You would do this:
CREATE EXTENSION pg_trgm;
CREATE INDEX ON questions USING gin (question_text gin_trgm_ops);
Then you can search efficiently like this:
SELECT question_id
FROM questions
WHERE question_text % 'What is your favorite movie, cartoon and series?';
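% is the "similarity operator". The threshold above which two strings count as similar is set with the parameter pg_trgm.similarity_threshold (0.3 by default).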
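To connect this back to the question, here is a minimal sketch that raises the threshold and matches against a list of questions rather than a single string. The value 0.9 is an assumption taken from the "90% similar" wording in the question, and the alias wanted and its column name are made up for illustration. Since pg_trgm lowercases text and ignores non-alphanumeric characters when extracting trigrams, rows that differ only in case or punctuation should score very high.

```sql
-- Assumption: 0.9 mirrors the "90% similar" requirement in the question.
-- SET only affects the current session.
SET pg_trgm.similarity_threshold = 0.9;

-- Match every row against each entry of the list.
-- "wanted" is a hypothetical alias for the inline list of questions.
SELECT DISTINCT q.question_id, q.question_text
FROM questions AS q
JOIN (VALUES
    ('What is your favorite movie, cartoon and series?'),
    ('Do you like apple?')
) AS wanted(question_text)
  ON q.question_text % wanted.question_text;

-- Inspect the raw scores to help choose a sensible threshold.
SELECT question_id,
       similarity(question_text, 'Do you like apple?') AS score
FROM questions
ORDER BY score DESC;
```

The % operator compares whole strings in the same way as similarity(). If you want to match a phrase inside a longer text instead, word_similarity() and its operator <% are the closer fit, with their own threshold parameter pg_trgm.word_similarity_threshold.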
For more information, see the documentation.