Question

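I am trying to use SpeechRecognizer with a custom grammar to handle the following pattern:

"Can you open {item}?" where {item} uses DictationGrammar.

I'm using the speech engine built into Vista and .NET 4.0.

I would like to be able to get the confidence of the returned SemanticValues. See the examples below.

If I simply use "recognizer.LoadGrammar(new DictationGrammar())", I can browse through e.Result.Alternates and see the confidence value of each alternate. That works when the DictationGrammar is at the top level.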

Dictation example:

Can you open Firefox? 0.95
Can you open Fairfax? 0.93
Can you open file fax? 0.72
Can you pen Firefox? 0.85
Can you pin Fairfax? 0.63

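But if I build a grammar that looks for "Can you open {semanticValue Key='item' GrammarBuilder = new DictationGrammar()}?", then I get:

Can you open Firefox? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you open Fairfax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you open file fax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you pen Firefox? .85 - Semantics = null
Can you pin Fairfax? .63 - Semantics = null

The .91 tells me how well the phrase matched the "Can you open {item}?" pattern, but it doesn't distinguish between the items any further.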

However, if I then look at e.Result.Alternates.Semantics.Where(s => s.Key == "item") and check their confidence, I get:

Firefox 1.0
Fairfax 1.0
file fax 1.0

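That doesn't help me much.

What I really want when I look at the confidence of the matching SemanticValues is something like this:

Firefox .95
Fairfax .93
file fax .85

It seems like it should work that way...

Am I doing something wrong? Is there even a way to do this within the Speech framework?

I'm hoping there is some built-in mechanism so that I can do this the "right" way.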

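As for another approach that would probably work...

1. Use the SemanticValue approach to match the pattern.
2. For anything that matches the pattern, extract the raw audio for {item} (using RecognitionResult.Words and RecognitionResult.GetAudioForWordRange).
3. Run the raw audio for {item} through a SpeechRecognizer with a DictationGrammar to get the confidence.

...but that's more work than I really want to do.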
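Step 2 first needs to know which recognized words make up {item}. Below is a minimal sketch of that part only. FindItemRange is a hypothetical helper, not part of System.Speech. It assumes two things: the input is the display text of each entry in result.Words, and {item} runs from the end of a fixed preamble to the last word.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of System.Speech): locate the {item} words
// that follow a fixed preamble such as "can you open".
public static class ItemWordRange
{
    // Returns { firstIndex, lastIndex } of the {item} words, or null when the
    // words do not start with the preamble or nothing follows it.
    public static int[] FindItemRange(IList<string> words, string preamble)
    {
        string[] preambleWords = preamble.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Count <= preambleWords.Length)
            return null; // no words left over for {item}

        for (int i = 0; i < preambleWords.Length; i++)
        {
            // Case-insensitive: the recognizer's display text may be capitalized.
            if (!string.Equals(words[i], preambleWords[i], StringComparison.OrdinalIgnoreCase))
                return null;
        }

        // Assumption: {item} extends to the last recognized word.
        return new[] { preambleWords.Length, words.Count - 1 };
    }
}
```

With a real result, you would pass the text of result.Words and then call result.GetAudioForWordRange(result.Words[range[0]], result.Words[range[1]]). You would write the returned RecognizedAudio to a MemoryStream with WriteToWaveStream, rewind the stream, and feed it to a second SpeechRecognitionEngine through SetInputToWaveStream with a DictationGrammar loaded. The alternates of that second result should then carry confidences for the item alone.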

Answer 1

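I think a dictation grammar only does transcription. It converts speech to text without extracting semantics. By definition, a dictation grammar supports all words and has no clue about any specific semantic mapping. You need a custom grammar to extract semantics. If you supply an SRGS grammar, or build one with the Speech Server tools, you can specify semantic mappings for particular words and phrases. The recognizer can then extract semantics and give you a semantic confidence.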

You should be able to get a confidence value from the recognizer for each recognition; try System.Speech.Recognition.RecognitionResult.Confidence.
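The sketch below shows one rough way to use those per-phrase confidences. It swaps in a different technique from the semantic one. You load only a top-level DictationGrammar, which the question says already gives distinct confidences per alternate. Then you match the "can you open" preamble in code over the alternates. The Alternate class and the RankItems method are hypothetical stand-ins, and the sketch assumes the dictation text looks like the question's examples (possibly ending in "?").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one entry of RecognitionResult.Alternates:
// the phrase text and its RecognizedPhrase.Confidence.
public class Alternate
{
    public string Text;
    public float Confidence;
    public Alternate(string text, float confidence) { Text = text; Confidence = confidence; }
}

public static class DictationItemMatcher
{
    // Keeps only alternates that start with the preamble, strips the preamble
    // and a trailing '?', and returns each distinct {item} with the best phrase
    // confidence seen for it, highest confidence first.
    public static List<KeyValuePair<string, float>> RankItems(IEnumerable<Alternate> alternates, string preamble)
    {
        var best = new Dictionary<string, float>(StringComparer.OrdinalIgnoreCase);
        foreach (Alternate alt in alternates)
        {
            string text = alt.Text.Trim().TrimEnd('?').Trim();
            if (!text.StartsWith(preamble + " ", StringComparison.OrdinalIgnoreCase))
                continue; // alternate does not match the "preamble {item}" pattern

            string item = text.Substring(preamble.Length).Trim();
            float current;
            if (!best.TryGetValue(item, out current) || alt.Confidence > current)
                best[item] = alt.Confidence;
        }
        return best.OrderByDescending(kv => kv.Value).ToList();
    }
}
```

With a real result, you would build the input as e.Result.Alternates.Select(a => new Alternate(a.Text, a.Confidence)). Applied to the question's dictation example, this keeps Firefox, Fairfax and file fax with 0.95, 0.93 and 0.72 and drops the alternates that do not start with the preamble.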

The help file that ships with the Microsoft Server Speech Platform 10.2 SDK has more details. (This is the Microsoft.Speech API for server applications, which is very similar to the System.Speech API for client applications.) See http://www.microsoft.com/downloads/en/details.aspx?FamilyID=1b1604d3-4f66-4241-9a21-90a294a5c9a4, or the Microsoft.Speech documentation at http://msdn.microsoft.com/en-us/library/microsoft.speech.recognition.semanticvalue(v=office.13).aspx

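For the SemanticValue class, it says:

All Speech Platform-based recognition engines provide valid instances of SemanticValue for all recognized output, even for phrases with no explicit semantic structure.

The SemanticValue instance for a phrase is obtained using the Semantics property on the RecognizedPhrase object (or on objects that inherit from it, such as RecognitionResult).

SemanticValue objects obtained for recognized phrases without semantic structure are characterized by:

- Having no children (Count is 0)
- The Value property is null.
- An artificial confidence level of 1.0 (returned by Confidence)

Typically, applications create instances of SemanticValue indirectly, adding them to Grammar objects by using SemanticResultValue and SemanticResultKey instances in conjunction with Choices and GrammarBuilder objects.

Direct construction of a SemanticValue is useful during the creation of strongly typed grammars.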

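When you use the SemanticValue feature in a grammar, you are usually trying to map different phrases to a single meaning. In your case, the phrases "IE" and "Internet Explorer" should both map to the same semantic value. You set up choices in your grammar for each phrase that can map to a particular meaning. Here is a simple WinForms example: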

using System;
using System.Speech.Recognition;

// WriteTextOuput is a form helper (not shown) that displays a message.
private void btnTest_Click(object sender, EventArgs e)
{
    // Dispose the in-process recognizer when we are done with it.
    using (SpeechRecognitionEngine myRecognizer = new SpeechRecognitionEngine())
    {
        Grammar testGrammar = CreateTestGrammar();
        myRecognizer.LoadGrammar(testGrammar);

        // use microphone
        try
        {
            myRecognizer.SetInputToDefaultAudioDevice();
            WriteTextOuput("");

            // Recognize() blocks until a phrase is recognized; it returns null if nothing was recognized.
            RecognitionResult result = myRecognizer.Recognize();
            if (result == null)
            {
                WriteTextOuput("Nothing was recognized.");
                return;
            }

            // "item" is optional in the grammar, so check for it before reading it.
            if (result.Semantics.ContainsKey("item"))
            {
                string item = result.Semantics["item"].Value.ToString();
                float confidence = result.Semantics["item"].Confidence;
                WriteTextOuput(String.Format("Item is '{0}' with confidence {1}.", item, confidence));
            }
        }
        catch (InvalidOperationException exception)
        {
            WriteTextOuput(String.Format("Could not recognize input from default audio device. Is a microphone or sound card available?\r\n{0} - {1}.", exception.Source, exception.Message));
            myRecognizer.UnloadAllGrammars();
        }
    }
}

private Grammar CreateTestGrammar()
{
    // item: several spoken phrases map onto one semantic value each
    Choices item = new Choices();
    SemanticResultValue itemSRV;
    itemSRV = new SemanticResultValue("I E", "explorer");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("explorer", "explorer");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("firefox", "firefox");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("mozilla", "firefox");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("chrome", "chrome");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("google chrome", "chrome");
    item.Add(itemSRV);
    SemanticResultKey itemSemKey = new SemanticResultKey("item", item);

    // wrap the keyed choices in a GrammarBuilder
    GrammarBuilder gb = new GrammarBuilder();
    gb.Append(itemSemKey);

    // now build the complete pattern...
    GrammarBuilder itemRequest = new GrammarBuilder();
    // preamble: "Can you open", "Open" or "Please open"
    itemRequest.Append(new Choices("Can you open", "Open", "Please open"));

    // the item may appear zero or one time
    itemRequest.Append(gb, 0, 1);

    Grammar testGrammar = new Grammar(itemRequest);
    return testGrammar;
}

Why is the Microsoft Speech Recognition SemanticValue.Confidence value always 1?
