Question

I am trying to accomplish the following. Let's say we have a table with these fields (ID, content):

1 | apple

2 | pineapple

3 | application

4 | nation

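Now, I'm looking for a function that will tell me all possible common matches. For example, if the argument is "3", the function should return every string of 3 characters that appears in more than one record.

In this case, I get "app", "ppl", "ple", "ati", "tio", "ion".

If the argument is "4", I get: "appl", "pple", "atio", "tion".

If the argument is "5", I get: "apple", "ation".

If the argument is "6", nothing is returned.

So far, I have not found a function that accomplishes this.

Thanks!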

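Some extra information: I am using this in a PHP script with a MySQL database. I really just want to pass the number of characters as an argument, and of course the table to search in.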

Answer 1

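Well, this is kind of ugly, but it works. It uses only plain SQL, so it should carry over to most databases. Simply generate one substring SELECT per start position, for more positions than the maximum length of the field you are reading. Change the number 50 in the function to a number that exceeds your field length. It may produce a really long query, but like I said, it will work fine. Here is an example in Python: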

import sqlite3

c = sqlite3.connect('test.db')

# Sample table from the question.
c.execute('create table myTable (id integer, content varchar(50))')
for id, content in ((1,'apple'),(2,'pineapple'),(3,'application'),(4,'nation')):
    c.execute('insert into myTable values (?,?)', [id,content])

c.commit()

def GenerateSQL(substrSize, maxLen=50):
    # One subquery per start position. SQL substr() positions are 1-based, and an
    # engine that treats position 0 as 1 would count the leading substring twice,
    # so start at 1.
    subqueries = ["select substr(content,%i,%i) AS substr, count(*) AS myCount from myTable where length(substr(content,%i,%i))=%i group by substr(content,%i,%i) " % (i,substrSize,i,substrSize,substrSize,i,substrSize)  for i in range(1, maxLen + 1)]
    # The derived table gets an alias (t) because MySQL requires one.
    sql = 'select substr FROM \n\t(' + '\n\tunion all '.join(subqueries) + ') AS t \nGROUP BY substr HAVING sum(myCount) > 1 ORDER BY substr'
    return sql

print(GenerateSQL(3))

print(c.execute(GenerateSQL(3)).fetchall())

The generated query looks like this:

select substr FROM 
    (select substr(content,1,3) AS substr, count(*) AS myCount from myTable where length(substr(content,1,3))=3 group by substr(content,1,3) 
    union all select substr(content,2,3) AS substr, count(*) AS myCount from myTable where length(substr(content,2,3))=3 group by substr(content,2,3) 
    union all select substr(content,3,3) AS substr, count(*) AS myCount from myTable where length(substr(content,3,3))=3 group by substr(content,3,3) 
    union all select substr(content,4,3) AS substr, count(*) AS myCount from myTable where length(substr(content,4,3))=3 group by substr(content,4,3) 
    union all select substr(content,5,3) AS substr, count(*) AS myCount from myTable where length(substr(content,5,3))=3 group by substr(content,5,3) 
    ... ) AS t 
GROUP BY substr HAVING sum(myCount) > 1 ORDER BY substr

It produces this result:

[('app',), ('ati',), ('ion',), ('ple',), ('ppl',), ('tio',)]
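As a possible variation (not part of the answer above), the list of start positions can come from a recursive CTE instead of one generated subquery per position. This is only a sketch: it assumes SQLite 3.8.3 or later for `WITH RECURSIVE`, reuses the `myTable` layout from the example, and the name `common_substrings_sql` is made up for illustration. It also counts distinct ids, so a substring repeated inside a single row is not reported on its own.

```python
import sqlite3

# In-memory copy of the sample table, so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (id integer, content varchar(50))")
conn.executemany(
    "insert into myTable values (?, ?)",
    [(1, "apple"), (2, "pineapple"), (3, "application"), (4, "nation")],
)

def common_substrings_sql(conn, n, max_len=50):
    """Substrings of length n that occur in more than one row of myTable."""
    sql = """
        WITH RECURSIVE pos(i) AS (          -- start positions 1 to max_len
            SELECT 1
            UNION ALL
            SELECT i + 1 FROM pos WHERE i < :max_len
        )
        SELECT substr(content, i, :n) AS part
        FROM myTable
        JOIN pos ON i <= length(content) - :n + 1
        GROUP BY part
        HAVING count(DISTINCT id) > 1       -- distinct rows, not occurrences
        ORDER BY part
    """
    rows = conn.execute(sql, {"n": n, "max_len": max_len})
    return [row[0] for row in rows]

print(common_substrings_sql(conn, 3))  # ['app', 'ati', 'ion', 'ple', 'ppl', 'tio']
print(common_substrings_sql(conn, 6))  # []
```

The join condition keeps only positions where a full substring of length n still fits, which replaces the `length(substr(...))` filter used in the generated query.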

Answer 2

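I'm sorry, I haven't played with PHP in a while and don't have a proper test environment for it, but I quickly devised a way of doing this in C# 3.5.

Pseudocode: build a table of the strings of the specified length, with the number of occurrences next to each. Select those where count > 1: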

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            string[] data = { "apple", "pineapple", "application", "nation" };
            string[] result = my_func(3, data);

            foreach (string str in result)
            {
                Console.WriteLine(str);
            }
            Console.ReadKey();
        }

        // Returns every substring of length l that occurs more than once
        // across all strings in data, most frequent first.
        private static string[] my_func(int l, string[] data)
        {
            // substring -> number of occurrences
            Dictionary<string, int> dict = new Dictionary<string, int>();
            foreach (string str in data)
            {
                // Slide a window of length l over the string.
                for (int i = 0; i < str.Length - l + 1; i++)
                {
                    string part = str.Substring(i, l);
                    if (dict.ContainsKey(part))
                    {
                        dict[part]++;
                    }
                    else
                    {
                        dict.Add(part, 1);
                    }
                }
            }
            var result = from k in dict.Keys
                         where dict[k] > 1
                         orderby dict[k] descending
                         select k;

            return result.ToArray();
        }
    }
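For comparison, here is a rough Python sketch of the same counting idea. It differs from the C# version in one assumed detail: each record contributes a given substring at most once, so a substring repeated inside a single record (such as "ana" in "banana") is only reported if another record contains it too. The function name `common_substrings` is made up for illustration.

```python
from collections import Counter

def common_substrings(records, n):
    """Return substrings of length n found in more than one record."""
    counts = Counter()
    for text in records:
        # A set keeps each substring once per record, so repeats inside
        # a single record do not inflate the count.
        parts = {text[i:i + n] for i in range(len(text) - n + 1)}
        counts.update(parts)
    # Most widely shared first, ties broken alphabetically.
    return sorted((p for p, c in counts.items() if c > 1),
                  key=lambda p: (-counts[p], p))

data = ["apple", "pineapple", "application", "nation"]
print(common_substrings(data, 3))  # ['app', 'ppl', 'ati', 'ion', 'ple', 'tio']
print(common_substrings(data, 5))  # ['apple', 'ation']
print(common_substrings(["banana", "cherry"], 3))  # []
```

Counting records rather than raw occurrences is closer to the question's "appears in more than one record"; dropping the set and updating the counter with every window would reproduce the occurrence-based behavior instead.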

Answer 3

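An obvious option would be to use REGEX. I have no prior experience with it, but this might help you: http://dev.mysql.com/doc/refman/5.1/en/regexp.html

You will need to find a suitable expression to match what you are looking for.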
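One caveat worth illustrating: a regular expression can test whether a record contains a given string, but it does not by itself enumerate the substrings that records share, so the candidates would still have to come from one of the approaches above. A minimal Python sketch of the checking half, with a hypothetical helper `records_matching` and the sample data from the question (in MySQL the analogous check would be something like `WHERE content REGEXP 'app'`):

```python
import re

# Sample records from the question, keyed by id.
records = {1: "apple", 2: "pineapple", 3: "application", 4: "nation"}

def records_matching(candidate, records):
    """Return the ids of records whose content contains candidate."""
    # re.escape treats the candidate as literal text, not as a pattern.
    pattern = re.compile(re.escape(candidate))
    return [rid for rid, text in records.items() if pattern.search(text)]

print(records_matching("app", records))   # [1, 2, 3]
print(records_matching("tion", records))  # [3, 4]
```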

MySQL, select records that match on at least X characters
