Find similar Column Names using difference SYNTAX

Time: 2018-11-13 14:21:47

Tags: php mysql sql

I am working with a database of approximately 100k entries and want to find all similar names in it. The names are combined into a single column. I am currently using soundex, but the results are far too fuzzy. Filtering those fuzzy results in PHP is very slow with so many soundex classes and entries, so I hope there is a way to get better matches than soundex gives.

My Query:

SELECT soundex(full_name) AS soundex, 
    full_name AS customer_name
FROM (SELECT CONCAT(cu.first_name,' ', cu.last_name) AS full_name
    FROM `customers` AS cu  
    WHERE cu.`status` = 1) a
ORDER BY soundex(full_name)

So I compare all the names that I put into one column and show them all ordered by soundex. Is there a way to use DIFFERENCE(soundex, soundex) in a performant way, other than cross joining the whole table and comparing every name with every other name? Or is there a good way to filter out names that are not very similar?

1 Answer:

Answer 0: (score: -1)

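Soundex on the full name is probably not the best approach for fuzzy matching. Have you looked at implementations of the Levenshtein function? With it you can get the distance between two strings and use that distance to rank the best matches.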

See the following example: Levenshtein distance in T-SQL
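
As a rough illustration of the idea (this is not the linked T-SQL implementation), here is a minimal Python sketch. It assumes the rows come back from the query in the question as (soundex, customer_name) pairs, and it only compares names that share a soundex code, so the full cross join is avoided. The function names `levenshtein` and `similar_pairs` and the `max_distance` threshold are made up for this example and would need tuning on real data.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b; insert, delete and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similar_pairs(rows, max_distance=2):
    """Return (distance, name_a, name_b) tuples, closest matches first.

    rows: iterable of (soundex_code, customer_name), assumed to be the
    result set of the query in the question.
    max_distance: hypothetical cut-off; pairs further apart are dropped.
    """
    # Bucket names by soundex code so only likely candidates are compared.
    buckets = defaultdict(list)
    for code, name in rows:
        buckets[code].append(name)

    pairs = []
    for names in buckets.values():
        for left, right in combinations(names, 2):
            distance = levenshtein(left.lower(), right.lower())
            if distance <= max_distance:
                pairs.append((distance, left, right))
    return sorted(pairs)
```

Using soundex only as a coarse bucket and the edit distance as the fine filter keeps the number of comparisons proportional to the bucket sizes rather than to the square of the table size. The same approach should port to PHP, which ships built-in `soundex()` and `levenshtein()` functions.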