
Sidekiq Pro Zero-Downtime Deployment Strategy

This project uses Sidekiq Pro with a zero-downtime deployment strategy that prevents Sidekiq::Shutdown errors during deployments, provided running jobs finish before the restart's shutdown timeout expires.

┌──────────────────────────────────────────────────────────────────┐
│ BEFORE DEPLOYMENT STARTS                                         │
│ ↓                                                                │
│ 1. Send TSTP signal (sidekiq:quiet)                              │
│    - Stops fetching NEW jobs immediately                         │
│    - Running jobs continue on OLD code                           │
│    - No interruptions, no Sidekiq::Shutdown errors               │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ DURING DEPLOYMENT                                                │
│ ↓                                                                │
│ 2. Deploy new code                                               │
│    - Upload assets                                               │
│    - Run migrations                                              │
│    - Publish new release                                         │
│    - Existing Sidekiq jobs finish on old code (no interruption)  │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ AFTER DEPLOYMENT SUCCEEDS                                        │
│ ↓                                                                │
│ 3. Restart Sidekiq (sidekiq:restart)                             │
│    - Stop old processes (graceful, 60s timeout)                  │
│    - Start new processes with new code                           │
│    - Resume fetching jobs                                        │
└──────────────────────────────────────────────────────────────────┘

Example timeline:

T+0s     → Deployment starts
T+0s     → Send TSTP signal to all Sidekiq processes
T+0s     → Sidekiq stops fetching new jobs (new jobs wait in Redis)
T+0-45s  → Deploy code (upload, migrate, publish)
T+10s    → Running EDI job continues uninterrupted ✅
T+35s    → EDI job completes successfully ✅
T+45s    → Deployment finished, trigger sidekiq:restart
T+45s    → Old Sidekiq processes shut down gracefully
T+46s    → New Sidekiq processes start with new code
T+46s    → Queue processing resumes ✅
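
To make the timeline concrete, here is a toy, in-memory model of the quiet, deploy, restart sequence. `ToyWorker`, `Job`, and all their methods are hypothetical illustrations, not Sidekiq's API. The sketch assumes one simulated second per `tick`, a 45s EDI job that is 10s in when the deployment starts, and the 60s shutdown timeout.

```ruby
# Toy model of quiet -> deploy -> restart. Hypothetical; NOT Sidekiq's API.
Job = Struct.new(:name, :duration)

class ToyWorker
  attr_reader :completed

  def initialize(queue, concurrency: 5)
    @queue = queue             # shared Array standing in for the Redis queue
    @concurrency = concurrency
    @running = []              # [[job, seconds_left], ...]
    @completed = []
    @quiet = false
  end

  # Like TSTP / sidekiq:quiet: stop fetching, let running jobs continue.
  def quiet!
    @quiet = true
  end

  # Advance the simulation by one second.
  def tick
    fetch unless @quiet
    @running.each { |entry| entry[1] -= 1 }
    finished, @running = @running.partition { |(_, left)| left <= 0 }
    @completed.concat(finished.map { |(job, _)| job.name })
  end

  # Like TERM: wait up to `timeout` seconds, then push unfinished jobs back
  # onto the queue (real Sidekiq raises Sidekiq::Shutdown in them first)
  # and return their names.
  def stop(timeout:)
    @quiet = true
    timeout.times do
      break if @running.empty?
      tick
    end
    interrupted = @running.map(&:first)
    @queue.unshift(*interrupted)
    @running = []
    interrupted.map(&:name)
  end

  private

  def fetch
    while @running.size < @concurrency && (job = @queue.shift)
      @running << [job, job.duration]
    end
  end
end

queue = [Job.new(:edi_sync, 45)]
old_worker = ToyWorker.new(queue)
10.times { old_worker.tick }               # EDI job is 10s in when the deploy starts
old_worker.quiet!                          # T+0s: sidekiq:quiet (TSTP)
queue << Job.new(:new_order, 5)            # enqueued mid-deploy; waits in the queue
45.times { old_worker.tick }               # T+0-45s: upload, migrate, publish
interrupted = old_worker.stop(timeout: 60) # T+45s: sidekiq:restart
new_worker = ToyWorker.new(queue)          # T+46s: new processes on new code
10.times { new_worker.tick }

p old_worker.completed # => [:edi_sync]
p interrupted          # => []
p new_worker.completed # => [:new_order]
```

The same model also shows the limit of the strategy. A job that needs longer than the deployment plus the 60s timeout is still pushed back by `stop`.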

File: config/deploy.rb

# Sidekiq Pro zero-downtime deployment strategy:
before :starting, 'sidekiq:quiet' # Quiet before deployment starts
after :finished, 'sidekiq:restart' # Restart after deployment succeeds

File: config/initializers/sidekiq.rb

config[:timeout] = 60
  • Gives jobs 60 seconds to complete during graceful shutdown
  • Prevents force-kill of jobs that are almost done
  • Applies to the restart phase (after deployment)

File: config/deploy/templates/sidekiq.service.capistrano.erb

TimeoutStopSec=90
  • Gives systemd 90 seconds to wait for Sidekiq shutdown
  • Must be longer than the Sidekiq timeout (60s); 90s = 60s + 30s buffer
  • Prevents systemd from sending SIGKILL prematurely
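
The two timeouts live in different files, so they can drift apart. Below is a minimal sketch of a consistency check (for example, for CI). It assumes the settings appear in exactly the `config[:timeout] = N` and `TimeoutStopSec=N` forms shown above. The method names are hypothetical, and the 30s buffer comes from the split described here.

```ruby
# Hypothetical consistency check: systemd must wait longer than Sidekiq does.
def sidekiq_timeout(initializer_text)
  initializer_text[/config\[:timeout\]\s*=\s*(\d+)/, 1]&.to_i
end

def systemd_stop_timeout(unit_text)
  unit_text[/^TimeoutStopSec=(\d+)/, 1]&.to_i
end

# Keep a buffer so Sidekiq can requeue unfinished jobs and exit before
# systemd escalates to SIGKILL.
def timeouts_consistent?(sidekiq_seconds, systemd_seconds, buffer: 30)
  systemd_seconds >= sidekiq_seconds + buffer
end

initializer = "Sidekiq.configure_server do |config|\n  config[:timeout] = 60\nend\n"
unit = "[Service]\nTimeoutStopSec=90\n"

sk = sidekiq_timeout(initializer)
sd = systemd_stop_timeout(unit)
puts "sidekiq=#{sk}s systemd=#{sd}s consistent=#{timeouts_consistent?(sk, sd)}"
# prints: sidekiq=60s systemd=90s consistent=true
```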

File: config/deploy.rb

set :sidekiq_roles, :worker
set :sidekiq_default_hooks, false # We control hooks manually
set :sidekiq_timeout, 60 # Matches Sidekiq initializer timeout

With this configuration:

  • Before deployment: Sidekiq stops fetching new jobs (they wait in Redis) while running jobs finish
  • During deployment: No jobs are interrupted (they run on old code)
  • After deployment: New jobs run on new code

The old approach (after :finished, 'sidekiq:restart_noblock') would:

  • Let jobs continue during deployment
  • Interrupt them when restarting after deployment
  • Cause Sidekiq::Shutdown exceptions

The new approach:

  • Stops fetching new jobs before the deployment starts
  • Lets running jobs finish before Sidekiq restarts on the new code
  • No interruptions = no errors
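
The following simplified, hypothetical model shows why the order of the hooks matters. It computes the shortest time an already-running job can have before Sidekiq::Shutdown under each strategy. It assumes the 45s deployment and 60s timeout from the example timeline, and it ignores details such as exactly how `restart_noblock` waits.

```ruby
# Simplified, hypothetical model (not Capistrano or Sidekiq code): the shortest
# time a job that is already running gets before Sidekiq::Shutdown.
def min_budget(strategy, deploy_seconds:, timeout:)
  case strategy
  when :restart_only       # old: jobs keep starting until the restart
    timeout
  when :quiet_then_restart # new: nothing starts after sidekiq:quiet
    deploy_seconds + timeout
  else
    raise ArgumentError, "unknown strategy: #{strategy.inspect}"
  end
end

# Defaults follow the example timeline: 45s deployment, 60s shutdown timeout.
def at_risk?(job_seconds, strategy, deploy_seconds: 45, timeout: 60)
  job_seconds > min_budget(strategy, deploy_seconds: deploy_seconds, timeout: timeout)
end

[30, 90, 120].each do |job_seconds|
  old_result = at_risk?(job_seconds, :restart_only) ? "at risk" : "safe"
  new_result = at_risk?(job_seconds, :quiet_then_restart) ? "at risk" : "safe"
  puts "#{job_seconds}s job: old=#{old_result}, new=#{new_result}"
end
# 30s job: old=safe, new=safe
# 90s job: old=at risk, new=safe
# 120s job: old=at risk, new=at risk
```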

The quiet signal (TSTP) is specifically designed for deployments:

  • Instant: Stops fetching new jobs immediately
  • Safe: Doesn’t interrupt running jobs
  • Recoverable: A quieted process cannot be un-quieted, but if the deployment fails, cap production sidekiq:restart brings processing back

It also keeps code versions separate:

  • Old jobs always run on old code (no mid-flight code changes)
  • New jobs always run on new code
  • Clear boundary between old and new
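
The quiet contract can be illustrated with a toy Ruby process. This is not how Sidekiq is implemented internally. The TSTP handler only sets a flag, the loop stops taking new work, and the job in progress finishes. The sketch assumes a POSIX platform where TSTP exists.

```ruby
# Toy process (not Sidekiq's internals): TSTP means "stop fetching new jobs".
queue = %w[job1 job2 job3 job4]
processed = []
quiet = false

Signal.trap("TSTP") { quiet = true } # the handler only flips a flag

until quiet || queue.empty?
  job = queue.shift
  processed << job # "run" the job to completion
  # Simulate `cap production sidekiq:quiet` arriving after job2 finishes.
  Process.kill("TSTP", Process.pid) if job == "job2"
  sleep 0.1 # give Ruby a chance to run the trap handler before the next fetch
end

puts "processed: #{processed.join(', ')}" # processed: job1, job2
puts "still queued: #{queue.join(', ')}"  # still queued: job3, job4
```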
# View all Sidekiq tasks
cap -T sidekiq
# Common tasks
cap production sidekiq:quiet # Stop fetching new jobs (TSTP signal)
cap production sidekiq:restart # Graceful restart (stop + start)
cap production sidekiq:stop # Graceful stop (60s timeout)
cap production sidekiq:start # Start Sidekiq processes
cap production sidekiq:install # Install systemd service
cap production sidekiq:status # Check Sidekiq status
Signal   Command           Effect                                           Use Case
TSTP     sidekiq:quiet     Stop fetching new jobs; running jobs continue    Deployments (before code change)
TERM     sidekiq:stop      Graceful shutdown (60s timeout)                  Normal shutdown
INT      Same as TERM      Graceful shutdown                                Ctrl+C / manual stop
TTIN     N/A               Print thread backtraces to log                   Debugging hung jobs
KILL     Force kill        Immediate termination (no cleanup)               Emergency only
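
For ad-hoc use outside Capistrano, the table maps directly onto Ruby's `Process.kill`. The helper below is hypothetical. It assumes you already know the Sidekiq PID (for example, from `systemctl show -p MainPID <unit>`). KILL is omitted on purpose because it bypasses graceful shutdown.

```ruby
# Hypothetical helper mirroring the signal table.
SIDEKIQ_SIGNALS = {
  quiet: "TSTP", # stop fetching new jobs, keep running ones
  stop:  "TERM", # graceful shutdown, bounded by :timeout
  dump:  "TTIN", # log thread backtraces
}.freeze

def signal_for(action)
  SIDEKIQ_SIGNALS.fetch(action) { raise ArgumentError, "unknown action: #{action.inspect}" }
end

# With dry_run: true, return the equivalent shell command instead of signalling.
def signal_sidekiq(action, pid, dry_run: false)
  sig = signal_for(action)
  raise ArgumentError, "#{sig} is not available on this platform" unless Signal.list.key?(sig)
  return "kill -#{sig} #{pid}" if dry_run

  Process.kill(sig, pid)
end

puts signal_sidekiq(:quiet, 12_345, dry_run: true) # prints "kill -TSTP 12345"
```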

What happens to jobs while Sidekiq is quiet:

  • Running jobs: continue uninterrupted until they complete; a job still running at restart gets up to 60s more before Sidekiq::Shutdown
  • ⏸️ Waiting jobs: remain queued and are processed after the new Sidekiq processes start

If you have truly critical jobs that must process immediately:

Option 1: Schedule around deployments

# Deploy during low-traffic periods
# Avoid deploying during critical job windows

Option 2: Run separate “critical” Sidekiq process

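File: config/sidekiq_critical.yml

:concurrency: 2
:queues:
- [critical, 2] # Only critical jobs

File: config/deploy.rb

# Don't quiet the critical process during deployments:
# only the main config is managed by Capistrano
set :sidekiq_config_files, ['sidekiq.yml'] # Exclude sidekiq_critical.yml

Note: a process excluded this way is also not restarted by Capistrano, so it keeps running the old code until you restart it separately.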
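
A separate process only helps if the critical queue is actually served by the process Capistrano leaves alone. The check below is hypothetical and assumes both YAML files use the symbol-key format above. The `default` and `edi` queue names are made up for the example.

```ruby
require "yaml"

# Hypothetical check: which queues stop flowing while Capistrano quiets the
# processes listed in :sidekiq_config_files?
MAIN_CONFIG = <<~YML
  :concurrency: 10
  :queues:
    - [default, 2]
    - [edi, 1]
YML

CRITICAL_CONFIG = <<~YML
  :concurrency: 2
  :queues:
    - [critical, 2]
YML

def queue_names(yaml_text)
  config = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  # Entries may be "name" or [name, weight].
  config.fetch(:queues).map { |entry| Array(entry).first.to_s }
end

def stalled_queues(configs, quieted:)
  names = ->(files) { files.flat_map { |file| queue_names(configs.fetch(file)) }.uniq }
  names.call(configs.keys) - names.call(configs.keys - quieted)
end

configs = { "sidekiq.yml" => MAIN_CONFIG, "sidekiq_critical.yml" => CRITICAL_CONFIG }
p stalled_queues(configs, quieted: ["sidekiq.yml"]) # => ["default", "edi"]
```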

Option 3: Use scheduled jobs instead of immediate

# Instead of perform_async (immediate)
MyWorker.perform_in(5.minutes, args) # Delayed

The quiet phase lasts as long as your deployment takes:

Deployment duration ≈ Upload assets (~10-30s)
                    + Run migrations (~5-60s)
                    + Publish release (~5s)
                    + Other hooks (~10s)
                    ≈ 30-105 seconds (typical)
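
The same arithmetic as a short script, so the estimate can be re-run with measured timings. The phase names and ranges are the ones listed above.

```ruby
# Phase estimates as [min, max] seconds; the quiet phase lasts roughly their sum.
PHASES = {
  upload_assets:   [10, 30],
  run_migrations:  [5, 60],
  publish_release: [5, 5],
  other_hooks:     [10, 10],
}.freeze

min_seconds = PHASES.values.sum(&:first)
max_seconds = PHASES.values.sum(&:last)
puts "Quiet phase: #{min_seconds}-#{max_seconds}s" # prints "Quiet phase: 30-105s"
```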

During this time:

  • ⏸️ New jobs queue up in Redis (not lost)
  • ✅ Running jobs complete
  • 📊 Monitor queue depth in Sidekiq Web UI
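
To judge whether the backlog is a problem, multiply the enqueue rate by the quiet time. Then divide by the spare throughput after restart. This is a rough sketch under a steady-rate assumption. The 2 jobs/s arrival rate and 12 jobs/s throughput are made-up inputs, and 105s is the worst case from the estimate above.

```ruby
# Hypothetical sizing under a steady-rate assumption.
def backlog(arrival_per_sec, quiet_seconds)
  (arrival_per_sec * quiet_seconds).ceil
end

# Time for restarted processes to work off the backlog while new jobs keep arriving.
def drain_seconds(backlog, throughput_per_sec, arrival_per_sec)
  spare = throughput_per_sec - arrival_per_sec
  raise ArgumentError, "throughput must exceed arrival rate" unless spare.positive?
  (backlog / spare.to_f).ceil
end

jobs = backlog(2.0, 105)            # 2 jobs/s arriving, 105s quiet phase
puts jobs                           # prints 210
puts drain_seconds(jobs, 12.0, 2.0) # prints 21
```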

Most jobs can tolerate a delay of this length, but if queues grow too large:

Solution 1: Faster deployments

  • Optimize asset compilation (already done with local builds)
  • Use zero-downtime migrations (already common practice)
  • Parallelize upload tasks

Solution 2: Multiple worker servers

# Deploy to servers one at a time (rolling deployment)
# Some workers always available

Solution 3: Pre-quiet strategy

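# Quiet Sidekiq, then wait 30s so in-flight jobs can finish before the deploy
# proceeds (the queue itself grows while quiet; it does not drain)
namespace :sidekiq do
  task :custom_quiet_and_wait do
    invoke 'sidekiq:quiet'
    puts 'Waiting 30s for running jobs to finish...'
    sleep 30
  end
end

before 'deploy:starting', 'sidekiq:custom_quiet_and_wait'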
# SSH to production server
ssh deploy@chi-vultr-heatwave-util1
# Check all Sidekiq services are running
systemctl status 'sidekiq*.service' --no-pager
# Check processes are using new code
ps aux | grep sidekiq
# Look for new PID and recent start time
# Check logs for clean restart
journalctl -u sidekiq-heatwave-production-sidekiq -n 50
# Monitor queue in Sidekiq Web UI
# https://crm.warmlyyours.me:3000/sidekiq
# Check for:
# - Queue depth (should drain after restart)
# - No Sidekiq::Shutdown errors in dead jobs
# - Processed jobs resuming

Before this change:

❌ Frequent Sidekiq::Shutdown exceptions
❌ Jobs interrupted during API calls
❌ Incomplete data synchronization

After this change:

✅ No Sidekiq::Shutdown during deployments
✅ Jobs complete or wait in queue
✅ Clean shutdowns only

Symptom: Jobs stuck in queue, not processing

Check:

# Are Sidekiq processes running?
systemctl status 'sidekiq*.service'
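# If not running, start them
cap production sidekiq:start
# If running but quiet (e.g. after a failed deployment), restart them;
# a quieted process never resumes fetching on its own
cap production sidekiq:restart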
# Check logs
journalctl -u sidekiq-heatwave-production-sidekiq -f
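
The checks above reduce to a small decision, sketched here as a hypothetical helper. It assumes you have already seen whether the process is running (`systemctl status`) and whether it is quiet (the Busy page of the Sidekiq Web UI labels quiet processes).

```ruby
# Hypothetical triage helper for "jobs stuck in queue".
def next_step(process_running:, process_quiet:)
  return "cap production sidekiq:start" unless process_running
  # A quieted process never resumes fetching on its own (for example after a
  # deployment that failed before sidekiq:restart ran), so restart it.
  return "cap production sidekiq:restart" if process_quiet

  "journalctl -u sidekiq-heatwave-production-sidekiq -f"
end

puts next_step(process_running: true, process_quiet: true)
# prints "cap production sidekiq:restart"
```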

Symptom: Still seeing Sidekiq::Shutdown in Rollbar

Possible causes:

  1. Jobs exceed 60s timeout

    • Solution: Increase the timeout (and raise TimeoutStopSec to match) or break into smaller jobs
    • See doc/SIDEKIQ_GRACEFUL_SHUTDOWN.md for details
  2. Manual restarts during deployment

    • Check: Are you running cap sidekiq:restart manually?
    • Solution: Let Capistrano handle restarts automatically
  3. Systemd watchdog killing jobs

    • Check: journalctl for “Watchdog timeout”
    • Solution: Increase WatchdogSec in service file
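
For the first cause, one way to size the smaller jobs is to cap each batch's estimated run time at a fraction of the shutdown timeout. The names below are hypothetical. The 0.25s per-record cost and the 50% safety margin are assumptions, and each resulting batch would be enqueued as its own job.

```ruby
# Hypothetical sizing: keep each batch well inside the shutdown timeout.
def batch_size(timeout:, seconds_per_record:, safety: 0.5)
  size = (timeout * safety / seconds_per_record).floor
  raise ArgumentError, "a single record exceeds the budget" if size < 1
  size
end

# Split record IDs into batches; each batch would become one job.
def batches(record_ids, **opts)
  record_ids.each_slice(batch_size(**opts)).to_a
end

ids = (1..1_000).to_a
puts batch_size(timeout: 60, seconds_per_record: 0.25)                  # prints 120
puts batches(ids, timeout: 60, seconds_per_record: 0.25).length         # prints 9
```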

Deployment Hangs at “Quieting Sidekiq”


Symptom: Deployment stuck at sidekiq:quiet task

Check:

# Are Sidekiq processes responding?
ssh deploy@server "systemctl is-active 'sidekiq*.service'"
# Can you manually quiet?
ssh deploy@server 'systemctl kill -s TSTP sidekiq-heatwave-production-sidekiq.service'

Solution:

  • Increase SSH timeout
  • Check network connectivity
  • Verify systemd is responsive

If a deployment fails or needs rollback:

# Roll back to the previous release
cap production deploy:rollback
# Sidekiq will restart with previous code version
# Jobs in queue will process with rolled-back code
  1. Always use quiet before deployment (configured automatically)
  2. Let Capistrano manage the Sidekiq lifecycle (don’t restart manually)
  3. Keep jobs under 60 seconds when possible
  4. Make jobs idempotent (safe to retry)
  5. Monitor queue depth during deployments
  6. Deploy during low-traffic periods for critical systems
  7. Test deployments in staging with realistic job load
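
Idempotency (practice 4) usually means recording what a job has already done and checking that record first. Below is a minimal in-memory sketch. `SyncOrderJob` and its collaborators are hypothetical, and a real job would keep the completed-ID record somewhere durable, such as the database.

```ruby
require "set"

# Hypothetical idempotent job: a retry after Sidekiq::Shutdown skips work
# that already succeeded.
class SyncOrderJob
  def initialize(done_ids, api_calls)
    @done_ids = done_ids   # Set of order IDs already synced
    @api_calls = api_calls # Array recording calls, standing in for the API
  end

  def perform(order_id)
    return :skipped if @done_ids.include?(order_id)

    @api_calls << order_id # the side effect we must not repeat
    @done_ids << order_id  # record completion only after it succeeds
    :synced                # (a crash between these two lines can still cause a repeat)
  end
end

done = Set.new
calls = []
job = SyncOrderJob.new(done, calls)
p job.perform(42) # => :synced
p job.perform(42) # => :skipped (e.g. a retry)
p calls           # => [42]
```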

Last Updated: October 10, 2025
Configuration Version: Sidekiq Pro 7.3.x, Rails 7.0.8.7, Capistrano 3.19.2