How to send audio to Nexmo Voice through a WebSocket

Time: 2019-02-27 19:28:36

Tags: c# websocket speech-recognition text-to-speech nexmo

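I'm trying to implement Nexmo's Voice API with WebSockets in a .NET Core 2 Web API.

This API needs to:
  • Receive the audio of a phone call through Nexmo
  • Transcribe it with the Microsoft Cognitive Speech to text API
  • Send the text to a bot
  • Use Microsoft Cognitive text to speech on the bot's reply
  • Send the speech back to Nexmo through the Voice API WebSocket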
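The steps listed above can be wired together roughly as follows. This is only a sketch: VoicePipeline and its three delegates are hypothetical names standing in for the asker's SpeechToText and TextToSpeech helpers and for whatever bot client is eventually used. Injecting each stage as a delegate keeps the WebSocket handler independent of the concrete services, which also makes it easy to bypass the bot while testing.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical pipeline: each stage is injected as a delegate so the
// WebSocket handler does not depend on a specific STT, bot or TTS client.
public sealed class VoicePipeline
{
    private readonly Func<byte[], Task<string>> _speechToText; // e.g. Microsoft Cognitive Speech to text
    private readonly Func<string, Task<string>> _bot;          // the bot (bypassed for now in the question)
    private readonly Func<string, Task<byte[]>> _textToSpeech; // e.g. Microsoft Cognitive text to speech

    public VoicePipeline(Func<byte[], Task<string>> speechToText,
                         Func<string, Task<string>> bot,
                         Func<string, Task<byte[]>> textToSpeech)
    {
        _speechToText = speechToText;
        _bot = bot;
        _textToSpeech = textToSpeech;
    }

    // Turns caller audio into the audio to send back to Nexmo.
    // Returns null when nothing was recognized, so the caller can skip sending.
    public async Task<byte[]> ProcessAsync(byte[] callerAudio)
    {
        var text = await _speechToText(callerAudio);
        if (string.IsNullOrWhiteSpace(text))
            return null;

        var reply = await _bot(text);
        return await _textToSpeech(reply);
    }
}
```

To bypass the bot as described in the question, pass an identity delegate such as `t => Task.FromResult(t)` for the bot stage.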

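For now, since I'm trying to get the WebSocket connection working first, I'm bypassing the bot step. When I use an echo method (sending the received audio straight back to the WebSocket), it works fine. But when I try to send the speech produced by Microsoft text to speech, the call ends.

I haven't found any documentation that implements anything beyond an echo.

The TextToSpeech and SpeechToText methods work as expected when used outside the WebSocket.

Here is the WebSocket handler with speech to text: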

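public static async Task Echo(HttpContext context, WebSocket webSocket)
    {
        var buffer = new byte[1024 * 4];
        WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
        while (!result.CloseStatus.HasValue)
        {
            // Collect every fragment of the message; one ReceiveAsync call may return only part of it.
            using (var message = new MemoryStream()) // MemoryStream is in System.IO
            {
                message.Write(buffer, 0, result.Count);
                while (!result.EndOfMessage)
                {
                    result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
                    message.Write(buffer, 0, result.Count);
                }
                // Skip text messages (e.g. JSON metadata); only binary messages carry audio.
                if (result.MessageType == WebSocketMessageType.Binary)
                {
                    var text = await SpeechToText.RecognizeSpeechFromBytesAsync(message.ToArray());
                    Console.WriteLine(text);
                }
            }
            // Wait for the next message; without this the outer loop never advances.
            result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
        }
        await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
    }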
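Not every message Nexmo sends is audio. According to Nexmo's WebSocket guide, the first message on the connection is a text message carrying JSON metadata, including a content type such as `audio/l16;rate=16000`; it is worth checking the guide for the exact fields. The sketch below assumes you have already pulled that content-type string out of the JSON with whatever JSON library you use, and reads the sample rate from it so you know whether each 20 ms frame is 320 or 640 bytes. `NexmoContentType.ParseSampleRate` is a hypothetical helper, not part of any Nexmo SDK.

```csharp
using System;
using System.Globalization;

public static class NexmoContentType
{
    // Extracts the sample rate from a content type such as "audio/l16;rate=16000".
    // Returns null if there is no numeric "rate" parameter.
    public static int? ParseSampleRate(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
            return null;

        foreach (var part in contentType.Split(';'))
        {
            // Split "rate=16000" into name and value (at most two pieces).
            var kv = part.Split(new[] { '=' }, 2);
            if (kv.Length == 2 &&
                kv[0].Trim().Equals("rate", StringComparison.OrdinalIgnoreCase) &&
                int.TryParse(kv[1].Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture, out var rate))
            {
                return rate;
            }
        }
        return null;
    }
}
```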

Here is the WebSocket handler with text to speech:

public static async Task Echo(HttpContext context, WebSocket webSocket)
    {
        var buffer = new byte[1024 * 4];
        WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
        while (!result.CloseStatus.HasValue)
        {
            var ttsAudio = await TextToSpeech.TransformTextToSpeechAsync("Hello, this is a test", "en-US");
            await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, 0, ttsAudio.Length), WebSocketMessageType.Binary, true, CancellationToken.None);

            result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
        }
        await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
    }
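Besides message size, the audio format matters. Nexmo expects raw 16-bit linear PCM (`audio/l16`) at the call's sample rate. The question does not show what format `TransformTextToSpeechAsync` returns; if it asks Microsoft text to speech for a `riff-*` output format (via the `X-Microsoft-OutputFormat` header of the REST API), the bytes start with a WAV header rather than samples. Requesting a raw format such as `raw-16khz-16bit-mono-pcm` avoids this; alternatively, a sketch like the one below skips the header. `WavUtil.ExtractPcm` is a hypothetical helper, and it assumes the WAV already holds 16-bit mono PCM at the right rate (it does not resample).

```csharp
using System;
using System.Text;

public static class WavUtil
{
    // Returns the payload of the "data" chunk of a RIFF/WAVE byte array.
    // Input without a RIFF/WAVE header is returned unchanged (assumed raw PCM).
    public static byte[] ExtractPcm(byte[] audio)
    {
        if (audio.Length < 12 ||
            Encoding.ASCII.GetString(audio, 0, 4) != "RIFF" ||
            Encoding.ASCII.GetString(audio, 8, 4) != "WAVE")
        {
            return audio;
        }

        var pos = 12; // first sub-chunk after "RIFF" <size> "WAVE"
        while (pos + 8 <= audio.Length)
        {
            var id = Encoding.ASCII.GetString(audio, pos, 4);
            // Chunk sizes are little-endian; decode byte by byte so host endianness does not matter.
            var size = audio[pos + 4] | (audio[pos + 5] << 8) | (audio[pos + 6] << 16) | (audio[pos + 7] << 24);
            if (size < 0)
                break; // corrupt header
            var body = pos + 8;
            if (id == "data")
            {
                var length = Math.Min(size, audio.Length - body);
                var pcm = new byte[length];
                Array.Copy(audio, body, pcm, 0, length);
                return pcm;
            }
            pos = body + size + (size & 1); // chunks are padded to an even size
        }
        throw new ArgumentException("WAV data chunk not found", nameof(audio));
    }
}
```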

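Update, March 1, 2019

Following Sam Machin's comment, I tried splitting the array into chunks of 640 bytes each (I'm using a 16000 Hz sample rate), but Nexmo still hangs up the call and I still can't hear anything.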

public static async Task NexmoTextToSpeech(HttpContext context, WebSocket webSocket)
    {
        var ttsAudio = await TextToSpeech.TransformTextToSpeechAsync("This is a test", "en-US");
        var buffer = new byte[1024 * 4];
        WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);

        while (!result.CloseStatus.HasValue)
        {
            await SendSpeech(context, webSocket, ttsAudio);
            result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
        }
        await webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Closing Socket", CancellationToken.None);
    }

    private static async Task SendSpeech(HttpContext context, WebSocket webSocket, byte[] ttsAudio)
    {
        const int chunkSize = 640;
        var chunkCount = 1;
        var offset = 0;

        var lastFullChunck = ttsAudio.Length < (offset + chunkSize);
        try
        {
            while(!lastFullChunck)
            {
                await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, offset, chunkSize), WebSocketMessageType.Binary, false, CancellationToken.None);
                offset = chunkSize * chunkCount;
                lastFullChunck = ttsAudio.Length < (offset + chunkSize);
                chunkCount++;
            }

            var lastMessageSize = ttsAudio.Length - offset;
            await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, offset, lastMessageSize), WebSocketMessageType.Binary, true, CancellationToken.None);
        }
        catch (Exception ex)
        {
        }
    }
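A likely reason the update still fails: `SendSpeech` sends every chunk except the last with `endOfMessage: false`, so the WebSocket treats all of them as fragments of one message, and Nexmo still receives the whole clip as a single oversized message. In addition, `NexmoTextToSpeech` calls `SendSpeech` after every received message, and Nexmo sends one every 20 ms, so the clip would be resent over and over; the empty `catch` also hides whatever error occurs. The sketch below sends each frame as its own complete message. The class and method names are hypothetical, it expects raw PCM (no WAV header), and both the zero-padding of the last frame and the 20 ms pause between frames are precautions assumed here, not requirements confirmed by the question or the answer.

```csharp
using System;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

public static class NexmoAudioSender
{
    // Splits raw PCM into frames of exactly frameBytes bytes;
    // the last frame is zero-padded (silence) if the clip is not a multiple.
    public static byte[][] SplitIntoFrames(byte[] pcm, int frameBytes)
    {
        var count = (pcm.Length + frameBytes - 1) / frameBytes;
        var frames = new byte[count][];
        for (var i = 0; i < count; i++)
        {
            frames[i] = new byte[frameBytes];
            var offset = i * frameBytes;
            Array.Copy(pcm, offset, frames[i], 0, Math.Min(frameBytes, pcm.Length - offset));
        }
        return frames;
    }

    // Sends each frame as a separate, complete WebSocket message
    // (endOfMessage: true), pausing ~20 ms between frames.
    public static async Task SendFramesAsync(WebSocket webSocket, byte[] pcm, int frameBytes, CancellationToken ct)
    {
        foreach (var frame in SplitIntoFrames(pcm, frameBytes))
        {
            if (webSocket.State != WebSocketState.Open)
                return; // the other side closed the connection
            await webSocket.SendAsync(new ArraySegment<byte>(frame), WebSocketMessageType.Binary, true, ct);
            await Task.Delay(20, ct);
        }
    }
}
```

For a 16 kHz call this would be called once, e.g. `await NexmoAudioSender.SendFramesAsync(webSocket, pcm, 640, CancellationToken.None);`, rather than inside the receive loop.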

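Here is the exception that sometimes shows up in the logs:

System.Net.WebSockets.WebSocketException (0x80004005): The remote party closed the WebSocket connection without completing the close handshake.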

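1 answer:

Answer 0 (score: 2)

It looks like you're writing the whole audio clip to the WebSocket at once. The Nexmo interface requires audio to be sent as 20 ms frames, one frame per message, which means you need to split your clip into 320- or 640-byte chunks (320 for 8 kHz, 640 for 16 kHz) and write each one to the socket separately. If you try to write too large a message to the socket, Nexmo closes it, as you're seeing.

For details, see https://developer.nexmo.com/voice/voice-api/guides/websockets#writing-audio-to-the-websocket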
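The 320 and 640 figures follow from the audio format: `audio/l16` is 16-bit (2-byte) linear PCM, mono, so one 20 ms frame holds sampleRate × 2 × 20 / 1000 bytes. A tiny helper makes the arithmetic explicit; `NexmoFrames` is a hypothetical name.

```csharp
public static class NexmoFrames
{
    public const int FrameMilliseconds = 20; // one message per 20 ms of audio
    public const int BytesPerSample = 2;     // 16-bit linear PCM, mono

    // 8000 Hz: 8000 * 2 * 20 / 1000 = 320 bytes; 16000 Hz: 640 bytes.
    public static int FrameBytes(int sampleRate) =>
        sampleRate * BytesPerSample * FrameMilliseconds / 1000;
}
```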