Question

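Using RabbitMQ, I have a service that consumes messages from a queue, downloads pages based on those messages, and then publishes another message to a second queue, which is eventually consumed and stored in a database. However, I'm running into a problem with one particular source that is slow and brings everything crashing down.

The way the consumer works is that it takes several messages off the queue (the prefetch count is currently 50) and fires off threads to do the downloading. So I have a function that looks like this (note that it may be running on several threads at once):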

private void ReceivedHandler(object model, BasicDeliverEventArgs ea)
{
    var message = CrawledJobInfo.Deserialize(ea.Body);
    try
    {
        _downloaders.Download(message);   // this is the part that can take some time
        if (queueIsDead)
        {
            return;
        }
        var body = message.Serialize();
        var properties = channel.CreateBasicProperties();
        properties.Persistent = true;

        channel.BasicPublish(exchange: directExchange,
                             routingKey: queueName,
                             basicProperties: properties,
                             body: body);

        channel.BasicAck(ea.DeliveryTag, false);
    }
    catch (Exception ex)
    {
    }
}

Most of the time this works fine, but if one of the download sources takes a long time, I get an exception like this:

System.IO.IOException: Unable to write data to the transport connection: A connection 
attempt failed because the connected party did not properly respond after a period of time, 
or established connection failed because connected host has failed to respond. 
---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.Send(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags) 
at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size) --- End of inner exception stack trace --- 
at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size) at System.IO.BufferedStream.Flush() 
at System.IO.BinaryWriter.Flush() at RabbitMQ.Client.Impl.SocketFrameHandler.WriteFrameSet(IList`1 frames) 
at RabbitMQ.Client.Impl.Command.TransmitAsFrameSet(Int32 channelNumber, Connection connection) at RabbitMQ.Client.Impl.Command.Transmit(Int32 channelNumber, Connection connection) 
at RabbitMQ.Client.Impl.SessionBase.Transmit(Command cmd) 
at RabbitMQ.Client.Impl.ModelBase.ModelSend(MethodBase method, ContentHeaderBase header, Byte[] body) 
at RabbitMQ.Client.Framing.Impl.Model._Private_BasicPublish(String exchange, String routingKey, Boolean mandatory, IBasicProperties basicProperties, Byte[] body) 
at RabbitMQ.Client.Impl.ModelBase.BasicPublish(String exchange, String routingKey, Boolean mandatory, IBasicProperties basicProperties, Byte[] body) 
at RabbitMQ.Client.Impl.AutorecoveringModel.BasicPublish(String exchange, String routingKey, IBasicProperties basicProperties, Byte[] body) 
at JobDownloadLibrary.JobDownloadQueue.ReceivedHandler(Object model, BasicDeliverEventArgs ea) in C:\Users\matt.burland\Documents\Visual Studio 2015\Projects\Downloader\DownloadLibrary\DownloadQueue.cs:line 154

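This then seems to cause all the other jobs in progress to die with RabbitMQ.Client.Exceptions.AlreadyClosedException.

So my question is: how can I avoid this? Or at least, how can I clean up and restart gracefully so that my service keeps running? I thought it might be a heartbeat problem, so I tried shortening the heartbeat to 20 seconds, but that didn't seem to improve the situation. I also tried catching AlreadyClosedExceptions and using that to detach my received handler, mark the queue as dead, and have the client code create a new queue instance, but that still seems to be a mess.

Looking at RabbitMQ's own logs, I do occasionally see memory alarms, but they seem to clear almost immediately: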

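=WARNING REPORT==== 15-Jun-2016::10:04:27 ===
memory resource limit alarm set on node rabbit@CLUST01.

*Publishers will be blocked until this alarm clears*

=WARNING REPORT==== 15-Jun-2016::10:04:28 ===
memory resource limit alarm cleared on node rabbit@CLUST01

=WARNING REPORT==== 15-Jun-2016::10:04:28 ===
memory resource limit alarms cleared across the cluster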

Answer 1

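A bit late to the party. What you are hitting is a closed connection or session. I'm not sure whether automatic recovery applies to channels, but as far as I know, connection problems should be handled if AutomaticRecoveryEnabled is set to true. You can subscribe to the ModelShutdown event, which fires when a channel is destroyed, and try to recreate the channel there. I also subscribe to the ConnectionShutdown event, which fires when the connection is destroyed. If they are not recreated automatically, you can try recreating the channel, the connection, or both.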
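
A rough sketch of that idea, assuming a RabbitMQ .NET client version in which `IModel.ModelShutdown` and `IConnection.ConnectionShutdown` both deliver a `ShutdownEventArgs` (newer major versions rename these members). The class name `RecoveringChannel`, the host name parameter, and the 10-second recovery interval are illustrative assumptions, not taken from the question. The handlers here only log; the channel is rebuilt lazily from the calling thread, because opening a channel from inside a shutdown handler may block the client's I/O thread.

```csharp
using System;
using RabbitMQ.Client;

// Hypothetical helper: owns one connection and one channel, logs shutdowns,
// and reopens the channel on demand once it has been closed.
public sealed class RecoveringChannel : IDisposable
{
    private readonly object _sync = new object();
    private readonly IConnection _connection;
    private IModel _channel;

    public RecoveringChannel(string hostName)
    {
        var factory = new ConnectionFactory
        {
            HostName = hostName,
            AutomaticRecoveryEnabled = true,                    // let the client re-open the connection
            NetworkRecoveryInterval = TimeSpan.FromSeconds(10)  // assumed retry delay
        };
        _connection = factory.CreateConnection();
        _connection.ConnectionShutdown += OnConnectionShutdown;
    }

    // Returns an open channel, creating a new one if the previous one was closed.
    // Throws if the connection itself is still down; callers should catch and retry.
    public IModel GetChannel()
    {
        lock (_sync)
        {
            if (_channel == null || !_channel.IsOpen)
            {
                _channel = _connection.CreateModel();
                _channel.ModelShutdown += OnModelShutdown;
            }
            return _channel;
        }
    }

    private static void OnModelShutdown(object sender, ShutdownEventArgs e)
    {
        Console.WriteLine("Channel shut down: {0} {1} (initiator: {2})",
                          e.ReplyCode, e.ReplyText, e.Initiator);
    }

    private static void OnConnectionShutdown(object sender, ShutdownEventArgs e)
    {
        Console.WriteLine("Connection shut down: {0} {1} (initiator: {2})",
                          e.ReplyCode, e.ReplyText, e.Initiator);
    }

    public void Dispose()
    {
        lock (_sync)
        {
            if (_channel != null)
            {
                _channel.Dispose();
            }
        }
        _connection.Dispose();
    }
}
```

A worker would call `GetChannel()` right before publishing rather than caching the result, so that a channel closed during a long download is replaced instead of throwing `AlreadyClosedException` on every later call.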

Also make sure you are not sharing channels between threads or tasks. The recommendation is to use one channel per thread.
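
One way to get a channel per thread, sketched with `System.Threading.ThreadLocal<T>`. The class name `PerThreadChannels` is hypothetical, and the sketch assumes the download threads are long-lived (for example, thread-pool threads), so that each thread opens only one channel.

```csharp
using System;
using System.Threading;
using RabbitMQ.Client;

// Hypothetical helper: lazily opens one publishing channel per calling thread.
public sealed class PerThreadChannels : IDisposable
{
    private readonly ThreadLocal<IModel> _channels;

    public PerThreadChannels(IConnection connection)
    {
        // trackAllValues: true lets Dispose() enumerate every channel that was created.
        _channels = new ThreadLocal<IModel>(() => connection.CreateModel(),
                                            trackAllValues: true);
    }

    // The channel that belongs to the calling thread.
    public IModel Current
    {
        get { return _channels.Value; }
    }

    public void Dispose()
    {
        foreach (var channel in _channels.Values)
        {
            channel.Dispose();
        }
        _channels.Dispose();
    }
}
```

In a handler like the one in the question, the publish could go through `Current`, while `BasicAck` still has to be sent on the channel that delivered the message, because delivery tags are scoped to the channel they arrived on. If several threads acknowledge on that one consuming channel, those calls would need to be serialized, for example with a `lock`.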

Timeout causes publisher to crash
