Searching multiple words with levenshtein

Date: 2013-02-03 13:31:34

Tags: php search replace levenshtein-distance

Is it possible for a levenshtein search to check every word of a search query against an array?

The code is as follows:

    $input = $query;

    // array of words to check against
    $words  = $somearray;

    // no shortest distance found, yet
    $shortest = -1;

    // loop through words to find the closest
    foreach ($words as $word) {

        // calculate the distance between the input word,
        // and the current word
        $lev = levenshtein($input, $word);

        // check for an exact match
        if ($lev == 0) {

            // closest word is this one (exact match)
            $closest = $word;
            $shortest = 0;

            // break out of the loop; we've found an exact match
            break;
        }

        // if this distance is less than the next found shortest
        // distance, OR if a next shortest word has not yet been found
        if ($lev <= $shortest || $shortest < 0) {
            // set the closest match, and shortest distance
            $closest  = $word;
            $shortest = $lev;
        }
    }

    if ($shortest == 0) {
        echo "Exact match found: $closest\n";
    } else {
        echo "Did you mean: $closest?\n";
    }

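As written, this probably treats either just the first word or the whole sentence as the single string to match against the array. How can I get a result for every word and display the whole sentence with the corrected words?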

1 Answer:

Answer 0: (score: 0)

From what I understand of your question, you first need to split the sentence into words, for example as described in: How can I convert a sentence to an array of words?
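As a minimal sketch of that splitting step (the helper name split_into_words is made up for illustration and is not from the linked question), preg_split() on runs of whitespace is one way to do it:

```php
<?php
// Hypothetical helper (name chosen for illustration): split a sentence
// into words on any run of whitespace.
function split_into_words(string $sentence): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings that leading or
    // trailing whitespace would otherwise produce.
    return preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);
}

$words = split_into_words('  helo   wrld ');
print_r($words); // two elements: "helo" and "wrld"
```

This does not strip punctuation, so a query like "helo, wrld" would keep the comma attached to the first word; whether that matters depends on your input.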

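After that, you can compare each word against your dictionary with two nested loops: the outer one over the words of the query, the inner one over the dictionary, for example: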

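// $words: the words of the query; $dictionary: the known, correctly spelled words
foreach ($words as $word)
{
    $min_distance = strlen($word); // use mb_strlen() for non-Latin
    $suggestion = null;            // reset, so the previous word's match does not leak through
    foreach ($dictionary as $new_word)
    {
        $dist = levenshtein($word, $new_word);
        // before PHP 8, levenshtein() returns -1 if an argument is longer than 255 characters
        if (($dist < $min_distance) and ($dist > -1))
        {
            $min_distance = $dist;
            $suggestion = $new_word;
        }
    }
    // at this point $suggestion is the closest dictionary word for $word, or null if none was close enough
}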

Then, for each word, if the distance is greater than 0, suggest $suggestion.
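To get the whole sentence back with the corrected words, which is what the question asks for, something along these lines might work. It is only a sketch: closest_word and correct_sentence are hypothetical helper names, the dictionary is a toy example, and levenshtein() is case-sensitive, so you may want to lowercase both sides first.

```php
<?php
// Hypothetical helper: return the dictionary word closest to $word,
// or $word itself if the dictionary is empty.
function closest_word(string $word, array $dictionary): string
{
    $best = $word;
    $bestDistance = PHP_INT_MAX;
    foreach ($dictionary as $candidate) {
        $dist = levenshtein($word, $candidate);
        // $dist >= 0 skips the -1 that PHP < 8 returns for over-long strings.
        if ($dist >= 0 && $dist < $bestDistance) {
            $bestDistance = $dist;
            $best = $candidate;
        }
    }
    return $best;
}

// Hypothetical helper: correct every word and glue the sentence back together.
function correct_sentence(string $sentence, array $dictionary): string
{
    $words = preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);
    $corrected = array_map(
        function ($w) use ($dictionary) {
            return closest_word($w, $dictionary);
        },
        $words
    );
    return implode(' ', $corrected);
}

$dictionary = ['hello', 'world', 'search']; // toy dictionary, for illustration only
echo "Did you mean: " . correct_sentence('helo wrld', $dictionary) . "?\n";
// prints "Did you mean: hello world?"
```

Note that this always replaces a word with its nearest dictionary entry, however far away that is. In practice you would probably want a maximum distance so that unknown words are left alone.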

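Note that this is actually very inefficient! Even if you assume levenshtein() runs in O(1), the whole thing runs in Θ(n * m) for n query words and m dictionary entries, because you loop through the entire dictionary for every word. You may want to read up on how these things are designed in real life from a conceptual point of view, or at the very least only offer suggestions for longer words and only loop through the more relevant part of the dictionary.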
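One possible way to loop over only a "more relevant part" of the dictionary is to group it by word length up front. The Levenshtein distance between two strings is never smaller than the difference of their lengths, so entries whose length differs by more than the maximum distance you accept can be skipped without calling levenshtein() at all. The sketch below assumes that approach; index_by_length, suggest, and the default thresholds are illustrative choices, not tuned values.

```php
<?php
// Hypothetical helper: group dictionary words by byte length so that
// only buckets of similar length need to be scanned.
function index_by_length(array $dictionary): array
{
    $index = [];
    foreach ($dictionary as $entry) {
        $index[strlen($entry)][] = $entry;
    }
    return $index;
}

// Return the closest word within $maxDistance edits, or null if there is none.
// $maxDistance and $minLength are illustrative defaults (assumptions).
function suggest(string $word, array $index, int $maxDistance = 2, int $minLength = 4): ?string
{
    $len = strlen($word);
    if ($len < $minLength) {
        return null; // too short: skip suggestions entirely
    }
    $best = null;
    $bestDistance = $maxDistance + 1;
    // Only lengths within $maxDistance of $len can possibly be close enough.
    for ($l = max(1, $len - $maxDistance); $l <= $len + $maxDistance; $l++) {
        foreach ($index[$l] ?? [] as $candidate) {
            $dist = levenshtein($word, $candidate);
            if ($dist >= 0 && $dist < $bestDistance) {
                $bestDistance = $dist;
                $best = $candidate;
            }
        }
    }
    return $best;
}

$index = index_by_length(['hello', 'world', 'search', 'levenshtein']);
var_dump(suggest('serch', $index)); // string(6) "search"
var_dump(suggest('cat', $index));   // NULL (shorter than $minLength)
```

The index is built once and reused for every query word, so the per-word cost depends on the size of the nearby length buckets rather than on the whole dictionary. The worst case is still proportional to the dictionary size if most entries have similar lengths.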
