C#: extracting a word from a string

Time: 2016-05-14 14:44:38

Tags: c# string split nlp

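I should start by saying: I'm not good at programming, but it's extremely fun! I'm developing a Siri-like program and I'm trying to implement a Wikipedia feature. To do that, I ask a question, for example: "Tell me about Superman".

I need to extract "Superman", or whatever other word someone might ask about, from the string. That isn't hard, but the real problem starts when someone asks: "Can you tell me about Superman". I still want to extract the word "Superman".

Here is an example of what I tried earlier: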

if ((c.Contains("tell me about")) || (c.Contains("Tell me about")))
{
    string query = c;
    var part = query.Split('t').Last(); // can't search for words containing the letter t, like "artificial intelligence"

    string url = ("http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString=" + part + "&MaxHits=1");

    XmlReader reader = XmlReader.Create(url);
    while (reader.Read())
        switch (reader.Name.ToString())
        {
            case "Description":
                sp(reader.ReadString());
                break;

        }
}

I was almost able to solve the problem; this solution seems to work about 80% of the time. Still, it's a step in the right direction.

if ((c.Contains("tell me about")) || (c.Contains("Tell me about")))
{
    string query = c;
    // Split on "about "; the subject is expected in the part after it.
    string[] lines = Regex.Split(query, "about ");
    foreach (string line in lines)
    {
        string url = ("http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString=" + line + "&MaxHits=1");

        XmlReader reader = XmlReader.Create(url);
        while (reader.Read())
            switch (reader.Name.ToString())
            {
                case "Description":
                    sp(reader.ReadString());
                    break;
            }
    }
}

Is there a better/simpler way?
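For reference, the "take everything after about" idea can be sketched without splitting on single letters. This is only a rough sketch, not a full solution: `SubjectExtractor`, `AfterAbout` and `BuildUrl` are hypothetical names introduced here, and it assumes the question contains the word "about" followed by a space. It matches the marker case-insensitively and escapes the subject before putting it into the lookup URL used above.

```csharp
using System;

static class SubjectExtractor
{
    // Hypothetical helper: returns the text after the first "about ",
    // matched case-insensitively, or null when the marker is absent.
    public static string AfterAbout(string input)
    {
        const string marker = "about ";
        int index = input.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
        if (index < 0)
        {
            return null;
        }

        // Strip surrounding spaces and sentence punctuation from both ends.
        return input.Substring(index + marker.Length).Trim(' ', '?', '.', '!');
    }

    // Hypothetical helper: builds the lookup URL, escaping the subject so
    // spaces and reserved characters are safe inside the query string.
    public static string BuildUrl(string subject)
    {
        return "http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString="
            + Uri.EscapeDataString(subject) + "&MaxHits=1";
    }
}
```

With this, `SubjectExtractor.AfterAbout("Can you tell me about Superman?")` yields `Superman`, and a subject such as "artificial intelligence" is no longer cut at the letter t. It still fails for phrasings that do not contain "about", such as "who is Superman".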

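2 answers:

Answer 0: (score: 2)

As suggested in the comments, for any kind of production application your best option is to use an existing library.

Doing it yourself is still an interesting exercise.

I'd say there are many more ways to ask about Superman.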

"what do you know about Superman"
"let's talk about Superman"
"who is Superman"

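And many more.

All of these questions consist of some auxiliary words ("what", "who", "a", "about") and the actual word that describes the subject of the question: "Superman". A simplified approach is to eliminate all the auxiliaries and take whatever remains.

To build a simple list of question words and question phrases quickly, I used an English grammar site. I took the phrases and removed the subjects of the questions. That gave me a list of 50-60 auxiliary words.

Now all I do is take the sentence and remove every word that appears in the auxiliary list. The code looks like this: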

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // All the words collected from the sample question phrases.
    private static string auxStr = @"Who is the Who are Who is that there Where is the Where do you Where are my 
        When do the When is his When are we Why do we Why are they always Why does he What is What is her What is the Which 
        drink did you Which Which is How do you How does he know the answer How can I learn many much often far tell say 
        explain answer for from with about on me he his him her hers your yours they their theirs";

    private static List<string> aux = new List<string>();

    static void Main(string[] args)
    {
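        // Build a list of auxiliary words. The verbatim string spans several
        // lines, so split on line breaks too and drop the empty entries.
        aux = auxStr.ToLower().Split(new[] { ' ', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).Distinct().ToList();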

        // Test the method to get a subject.
        var subject = GetSubject("Do you know where is Poland", aux);

        foreach(var s in subject)
        {
            Console.WriteLine(s);
        }

        Console.ReadLine();
    }

    private static List<string> GetSubject(string question, List<string> auxiliaries)
    {
        // Convert the question to a list of strings
        var listQuestion = question.ToLower().Split(' ').Distinct().ToList();

        // Remove from the question all the words 
        // that are in the list of auxiliary phrases
        var notAux = listQuestion.Where(w => !auxiliaries.Contains(w)).ToList();

        return notAux;
    }
}

It's quite simple, but with very little effort it narrows down the list of potential subjects of the question.
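A possible variation on the same filtering idea, as a sketch only: the `SubjectFilter` class and its `Separators` array are names assumed here, not part of the code above. It keeps the subject's original casing (useful when the result goes into a search query), ignores common punctuation, and uses a case-insensitive `HashSet<string>` instead of lowercasing everything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SubjectFilter
{
    // Assumed separator set: whitespace plus common sentence punctuation.
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n', ',', '.', '?', '!' };

    // Returns the words of the question that are not in the auxiliary set,
    // in their original order and casing.
    public static List<string> GetSubject(string question, IEnumerable<string> auxiliaries)
    {
        // Case-insensitive set, so neither side needs ToLower().
        var aux = new HashSet<string>(auxiliaries, StringComparer.OrdinalIgnoreCase);

        return question
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !aux.Contains(word))
            .ToList();
    }
}
```

Unlike the version above, this one does not call `Distinct()`, so a subject such as "New New York" would keep both occurrences of the repeated word.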

Answer 1: (score: 0)

I finally found the answer:

dexOptions {
   javaMaxHeapSize "2g"
}

It now works 100% of the time! If anyone knows a better way, I'd be very happy to hear it.
