Timeout causes publisher to crash

Date: 2016-06-15 13:17:48

Tags: c# rabbitmq

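Using RabbitMQ, I have a service that consumes messages from a queue, downloads pages based on those messages, and then publishes another message to a second queue, which is eventually consumed and stored in a database. But I've run into a problem with one particular source that is slow, and it causes everything to crash.

The way the consumer works is that it takes several messages off the queue (the prefetch count is currently 50) and fires off threads to do the downloads. So I have a function that looks like this (note that it may be running on several threads at once):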

private void ReceivedHandler(object model, BasicDeliverEventArgs ea)
{
    var message = CrawledJobInfo.Deserialize(ea.Body);
    try
    {
        _downloaders.Download(message);   // this is the part that can take some time
        if (queueIsDead)
        {
            return;
        }
        var body = message.Serialize();
        var properties = channel.CreateBasicProperties();
        properties.Persistent = true;

        channel.BasicPublish(exchange: directExchange,
                             routingKey: queueName,
                             basicProperties: properties,
                             body: body);

        channel.BasicAck(ea.DeliveryTag, false);
    }
    catch (Exception ex)
    {
    }
}

This works fine most of the time, but if one download source takes a long time, I get an exception like this:

System.IO.IOException: Unable to write data to the transport connection: A connection 
attempt failed because the connected party did not properly respond after a period of time, 
or established connection failed because connected host has failed to respond. 
---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.Send(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags) 
at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size) --- End of inner exception stack trace --- 
at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size) at System.IO.BufferedStream.Flush() 
at System.IO.BinaryWriter.Flush() at RabbitMQ.Client.Impl.SocketFrameHandler.WriteFrameSet(IList`1 frames) 
at RabbitMQ.Client.Impl.Command.TransmitAsFrameSet(Int32 channelNumber, Connection connection) at RabbitMQ.Client.Impl.Command.Transmit(Int32 channelNumber, Connection connection) 
at RabbitMQ.Client.Impl.SessionBase.Transmit(Command cmd) 
at RabbitMQ.Client.Impl.ModelBase.ModelSend(MethodBase method, ContentHeaderBase header, Byte[] body) 
at RabbitMQ.Client.Framing.Impl.Model._Private_BasicPublish(String exchange, String routingKey, Boolean mandatory, IBasicProperties basicProperties, Byte[] body) 
at RabbitMQ.Client.Impl.ModelBase.BasicPublish(String exchange, String routingKey, Boolean mandatory, IBasicProperties basicProperties, Byte[] body) 
at RabbitMQ.Client.Impl.AutorecoveringModel.BasicPublish(String exchange, String routingKey, IBasicProperties basicProperties, Byte[] body) 
at JobDownloadLibrary.JobDownloadQueue.ReceivedHandler(Object model, BasicDeliverEventArgs ea) in C:\Users\matt.burland\Documents\Visual Studio 2015\Projects\Downloader\DownloadLibrary\DownloadQueue.cs:line 154

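This then seems to cause all the other jobs in progress to die with a RabbitMQ.Client.Exceptions.AlreadyClosedException.

So my question is, how do I avoid this? Or at least, how can I clean up gracefully and restart so that my service keeps running? I thought it might be a heartbeat problem, so I tried shortening the heartbeat to 20 seconds, but that didn't seem to improve things. I also tried catching the AlreadyClosedException and using it to detach my received handler, mark the queue as dead, and have the client code create a new queue instance, but that still seems to be a mess.

Looking at RabbitMQ's own logs, I do occasionally see memory alarms, but they seem to clear almost immediately: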

  

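=WARNING REPORT==== 15-Jun-2016::10:04:27 ===
memory resource limit alarm set on node rabbit@CLUST01.

*** Publishers will be blocked until this alarm clears ***

=WARNING REPORT==== 15-Jun-2016::10:04:28 ===
memory resource limit alarm cleared on node rabbit@CLUST01

=WARNING REPORT==== 15-Jun-2016::10:04:28 ===
memory resource limit alarm cleared across the cluster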

1 Answer:

Answer 0 (score: 0)

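A bit late to the party. It's a closed connection or session. I'm not sure whether automatic recovery applies to channels, but as far as I know, connection problems should be handled if AutomaticRecoveryEnabled is true. You can subscribe to the ModelShutdown event, which fires when a channel is destroyed, and try to recreate the channel. I also subscribe to the ConnectionShutdown event, which fires when the connection is destroyed. If they aren't recreated automatically, you can try recreating the channel, the connection, or both.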
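
A minimal sketch of that setup, assuming a RabbitMQ.Client version that still exposes IModel (roughly the 3.x to 6.x line). The class name RecoveringConnectionSketch, the hostName parameter, and the 10-second recovery interval are placeholders of mine, not anything from the question. The handlers only log; your own recreate logic would go where the comments indicate.

```csharp
using System;
using RabbitMQ.Client;

// Hypothetical helper, for illustration only.
public static class RecoveringConnectionSketch
{
    public static IModel Open(string hostName)
    {
        var factory = new ConnectionFactory
        {
            HostName = hostName,
            // Ask the client to re-establish the connection after a network failure.
            AutomaticRecoveryEnabled = true,
            // How long to wait between recovery attempts (assumed value).
            NetworkRecoveryInterval = TimeSpan.FromSeconds(10)
        };

        IConnection connection = factory.CreateConnection();

        // Fires when the connection is shut down, for any reason.
        connection.ConnectionShutdown += (sender, args) =>
        {
            Console.WriteLine("Connection shut down: {0} {1}",
                              args.ReplyCode, args.ReplyText);
            // If it does not come back on its own, recreate the connection here.
        };

        IModel channel = connection.CreateModel();

        // Fires when this channel is shut down.
        channel.ModelShutdown += (sender, args) =>
        {
            Console.WriteLine("Channel shut down: {0} {1}",
                              args.ReplyCode, args.ReplyText);
            // If it does not come back on its own, recreate the channel here.
        };

        return channel;
    }
}
```

Logging ReplyCode and ReplyText at least tells you whether the broker or the client closed things, which should help narrow down why the publish in your handler ends up on a dead socket.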

Also make sure you are not sharing channels between threads/tasks. One channel per thread is recommended.
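
One way to get a channel per thread for the publishing side is a ThreadLocal<IModel>. This is only a sketch under the same assumption as above (a client version with IModel); PerThreadPublisher is a made-up name, and the connection is assumed to be created elsewhere.

```csharp
using System;
using System.Threading;
using RabbitMQ.Client;

// Hypothetical wrapper: each thread that publishes lazily gets its own channel.
public sealed class PerThreadPublisher : IDisposable
{
    private readonly ThreadLocal<IModel> _channel;

    public PerThreadPublisher(IConnection connection)
    {
        // trackAllValues lets Dispose() find every channel that was created.
        _channel = new ThreadLocal<IModel>(() => connection.CreateModel(),
                                           trackAllValues: true);
    }

    public void Publish(string exchange, string routingKey, byte[] body)
    {
        IModel channel = _channel.Value;   // this thread's own channel
        IBasicProperties properties = channel.CreateBasicProperties();
        properties.Persistent = true;

        channel.BasicPublish(exchange: exchange,
                             routingKey: routingKey,
                             basicProperties: properties,
                             body: body);
    }

    public void Dispose()
    {
        foreach (IModel channel in _channel.Values)
        {
            channel.Dispose();
        }
        _channel.Dispose();
    }
}
```

In your ReceivedHandler you would call something like publisher.Publish(directExchange, queueName, body) instead of using the shared channel. Note that the BasicAck still has to go out on the channel the message was delivered on, since delivery tags are scoped to a channel, so if several threads ack on that one consumer channel you would want to serialize those calls, for example with a lock. Also keep in mind that with thread-pool threads this creates one channel per pool thread that ever publishes, so a small pool of dedicated worker threads may be a better fit.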