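In production, our delayed_job processes are dying for some reason. I'm not sure whether they are crashing or being killed by the OS. I don't see any errors in the delayed_job.log file.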
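What can I do to troubleshoot this? I was thinking of installing monit to monitor it, but that would only tell me when it dies, not why it dies.
Is there a way to make it log more verbosely, so I can tell why it's dying?
Any other suggestions?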
Answer 0 (score: 12)
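I've come across two causes of delayed_job failing silently. The first is genuine segfaults when people use libxml in forked processes (this came up on the mailing list a while back).
The second is a problem with version 1.1.0 of daemons, which delayed_job depends on (https://github.com/collectiveidea/delayed_job/issues#issue/81). It is easily worked around by using 1.0.10, which is what my own Gemfile has.
delayed_job does have logging, so if a worker dies without printing an error, it is usually because no exception was raised (e.g. a segfault) or because something external is killing the process.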
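Pinning daemons to that version in a Gemfile would look roughly like this (a sketch; the exact line from the author's Gemfile isn't shown):

```ruby
# Gemfile: pin daemons to 1.0.10 to avoid the 1.1.0 problem mentioned above
gem 'daemons', '1.0.10'
```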
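If a worker runs in the foreground under a parent process (for example delayed_job's `rake jobs:work` task started from a small wrapper script), the parent can tell a signal kill from an ordinary exit using Ruby's `Process::Status`. This is a minimal stdlib-only sketch; `describe_exit` and `run_and_report` are hypothetical helper names, and a daemonized worker has no waiting parent, so it only applies to that foreground setup.

```ruby
require "rbconfig"

# Hypothetical helper: describe how a child process ended from its Process::Status.
def describe_exit(status)
  if status.signaled?
    # Terminated by a signal: SIGKILL, SIGTERM, SIGSEGV, SIGABRT, ...
    "killed by signal SIG#{Signal.signame(status.termsig)} (#{status.termsig})"
  elsif status.exited?
    # Ended normally via exit (including an uncaught Ruby exception)
    "exited with status #{status.exitstatus}"
  else
    "ended in an unexpected way: #{status.inspect}"
  end
end

# Hypothetical helper: run a command as a child process (no shell),
# wait for it, and report how it ended.
def run_and_report(*cmd)
  pid = Process.spawn(*cmd)
  _, status = Process.wait2(pid)
  describe_exit(status)
end

if __FILE__ == $0
  ruby = RbConfig.ruby
  puts run_and_report(ruby, "-e", "exit 3")
  puts run_and_report(ruby, "-e", "Process.kill(:KILL, Process.pid)")
end
```

A SIGKILL usually points at something external, such as the Linux OOM killer, while a crash in native code typically ends in a signal like SIGSEGV or SIGABRT rather than in a logged exception.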
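I use bluepill to monitor my delayed_job instances, and so far it has been very successful at keeping the jobs running. The steps to get bluepill running for an application are quite simple.
Add the bluepill gem to the Gemfile: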
# Monitoring
gem 'i18n' # Not sure why but it complained I didn't have it
gem 'bluepill'
I created a bluepill config file:
# config/delayed_job.bluepill
app_home = "/home/mi/production"
workers = 5
Bluepill.application("mi_delayed_job", :log_file => "#{app_home}/shared/log/bluepill.log") do |app|
  # One monitored process per delayed_job worker, identified by -i
  (0...workers).each do |i|
    app.process("delayed_job.#{i}") do |process|
      process.working_dir = "#{app_home}/current"
      process.start_grace_time = 10.seconds
      process.stop_grace_time = 10.seconds
      process.restart_grace_time = 10.seconds
      process.start_command = "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job start -i #{i}"
      process.stop_command = "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job stop -i #{i}"
      process.pid_file = "#{app_home}/shared/pids/delayed_job.#{i}.pid"
      process.uid = "mi"
      process.gid = "mi"
    end
  end
end
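Conceptually, a monitor such as bluepill keeps the workers running by periodically checking whether the PID in each `pid_file` still belongs to a live process, and running `start_command` when it does not. The sketch below shows only that liveness check using Ruby's stdlib; it is not bluepill's actual implementation, and `process_alive?` and `dead_workers` are hypothetical names. The pid file naming mirrors the config above.

```ruby
require "tmpdir"

# Hypothetical helper: true if the PID stored in pid_file is a live process.
def process_alive?(pid_file)
  return false unless File.exist?(pid_file)
  pid = Integer(File.read(pid_file).strip)
  return false unless pid.positive? # 0 or negative would address process groups
  Process.kill(0, pid) # signal 0: existence/permission check only, nothing is delivered
  true
rescue ArgumentError, Errno::ESRCH
  false # unparsable pid file, or no such process
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Hypothetical helper: indices of workers whose delayed_job.<i>.pid file
# is missing, unreadable as a PID, or points at a process that is gone.
def dead_workers(pid_dir, workers)
  (0...workers).reject do |i|
    process_alive?(File.join(pid_dir, "delayed_job.#{i}.pid"))
  end
end

if __FILE__ == $0
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "delayed_job.0.pid"), Process.pid.to_s) # this process: alive
    File.write(File.join(dir, "delayed_job.1.pid"), "garbage")        # unparsable
    # delayed_job.2.pid is missing entirely
    p dead_workers(dir, 3) # => [1, 2]
  end
end
```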
Then in my Capistrano deploy file I just added:
# Bluepill related tasks
after "deploy:update", "bluepill:quit", "bluepill:start"
namespace :bluepill do
  desc "Stop processes that bluepill is monitoring and quit bluepill"
  task :quit, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged stop"
    run "cd #{current_path} && bundle exec bluepill --no-privileged quit"
  end
  desc "Load bluepill configuration and start it"
  task :start, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged load /home/mi/production/current/config/delayed_job.bluepill"
  end
  desc "Print the statuses of the processes bluepill is monitoring"
  task :status, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged status"
  end
end
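The tasks above repeat the same `cd ... && bundle exec bluepill --no-privileged` prefix. One option is a small helper in the deploy file that builds that line and shell-escapes its parts with Ruby's `Shellwords`. `bluepill_cmd` is a hypothetical name; this is a sketch, not something Capistrano or bluepill provides.

```ruby
require "shellwords"

# Hypothetical helper: build the remote bluepill command used by the tasks,
# escaping the path and arguments so, e.g., a path with spaces stays one word.
def bluepill_cmd(current_path, *args)
  parts = ["bundle", "exec", "bluepill", "--no-privileged", *args]
  "cd #{Shellwords.escape(current_path)} && #{Shellwords.join(parts)}"
end
```

Inside a task it would be called as `run bluepill_cmd(current_path, "stop")`.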
Hope this helps a bit.
Answer 1 (score: 2)
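The most common cause of this I've run into is database problems (MySQL connection errors and the like), and by default nothing gets logged.
So I suggest using god to control your delayed_job (that way you get a log file you can look at!).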
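If it is the job code itself that fails (for example on a MySQL connection error), one way to make sure the reason reaches a log is to wrap the work in a helper that logs the exception before re-raising it. A minimal sketch using Ruby's stdlib `Logger`; `with_error_logging` is a hypothetical name, not part of delayed_job or god.

```ruby
require "logger"
require "stringio"

# Hypothetical helper: run the block; if it raises, log the exception class,
# message and first backtrace lines, then re-raise so behavior is unchanged.
def with_error_logging(logger, context = "worker")
  yield
rescue Exception => e
  logger.error("#{context} died: #{e.class}: #{e.message}")
  Array(e.backtrace).first(10).each { |line| logger.error("  #{line}") }
  raise
end

if __FILE__ == $0
  log = StringIO.new
  begin
    with_error_logging(Logger.new(log), "ExampleJob") { raise IOError, "lost connection" }
  rescue IOError
    # the worker would normally die here; the log now records why
  end
  puts log.string
end
```

Rescuing `Exception` rather than `StandardError` means `SystemExit` and `SignalException` are logged as well; a SIGKILL or a hard crash still leaves no trace, because no rescue clause runs in that case.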
Assuming you are using delayed_job with Rails 4, you should:
1. Install the god gem: $ gem install god
2. Create this script file:
# filename: cache_cleaner.god
RAILS_ROOT = '/sg552/workspace/m-api-cache-cleaner'
God.watch do |w|
  w.name = 'cache_cleaner'
  w.dir = RAILS_ROOT
  w.start = "cd #{RAILS_ROOT} && RAILS_ENV=production bundle exec bin/delayed_job -n 5 start"
  w.stop = "cd #{RAILS_ROOT} && RAILS_ENV=production bundle exec bin/delayed_job stop"
  w.restart = "cd #{RAILS_ROOT} && RAILS_ENV=production bundle exec bin/delayed_job -n 5 restart"
  # stdout of the started command goes here, so errors become visible
  w.log = "#{RAILS_ROOT}/log/cache_cleaner_stdout.log"
  w.pid_file = File.join(RAILS_ROOT, "log/delayed_job.total.pid")
  # you should NEVER use this config setting:
  # w.keepalive (always keep it commented out!)
end
3. To start/stop/restart delayed_job, change the command from:
$ bundle exec bin/delayed_job -n 3 start
to:
$ god -c cache_cleaner.god -D
$ god start/stop/restart cache_cleaner
See my personal blog: http://siwei.me/blog/posts/using-delayed-job-with-god