How to extract a partially/approximately matching substring from a text file

Time: 2017-09-22 13:20:19

Tags: c# string substring lcs

For example:

string1 = 
  "Sherlock Holmes is a fictional private detective created by British author Sir Arthur Conan Doyle. Known as a consulting detective in the stories, Holmes is known for his proficiency with observation."

string2 = 
  "fictional detective created by British author Conan Doyle. Also, known as a consulting detective"

I want to extract from string1 the substring that approximately matches string2. The result should be:

"fictional private detective created by British author Sir Arthur Conan Doyle. Known as a consulting detective"

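Things I have tried:

  1. Splitting the sentence into an array of words and extracting the string between the first and last word. However, this solution fails if the first or last word itself does not exist in string1.
  2. Using LCS and Levenshtein distance. However, I could only extract part of the string.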
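
A minimal sketch of the first approach, to make its failure mode concrete. The class and method names (`AnchorExtractor`, `TryExtractBetweenAnchors`) are hypothetical, and the sketch assumes the query is split on spaces, the anchors are matched case-sensitively, and the last anchor is located by its last occurrence in the text:

```csharp
using System;

public static class AnchorExtractor
{
    // Hypothetical helper: finds the span of `text` that starts at the first
    // occurrence of the query's first word and ends at the last occurrence of
    // the query's last word. Returns false when either anchor word is missing
    // or when the last anchor only occurs before the first one.
    // Note: anchors are matched as plain substrings, not as whole words.
    public static bool TryExtractBetweenAnchors(string text, string query, out string result)
    {
        result = string.Empty;

        string[] words = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0)
        {
            return false;
        }

        string first = words[0];
        string last = words[words.Length - 1];

        int start = text.IndexOf(first, StringComparison.Ordinal);
        if (start < 0)
        {
            return false; // first anchor word not present in the text
        }

        int lastStart = text.LastIndexOf(last, StringComparison.Ordinal);
        if (lastStart < start)
        {
            return false; // last anchor missing (-1) or located before the first
        }

        int end = lastStart + last.Length;
        result = text.Substring(start, end - start);
        return true;
    }
}
```

Calling `AnchorExtractor.TryExtractBetweenAnchors(string1, string2, out var span)` works for the example only because both `fictional` and `detective` occur verbatim in string1. If string2 began with a word that string1 does not contain, the method would return `false`, which is exactly the limitation described in item 1.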

0 answers:

No answers yet