Sidekiq Pro Zero-Downtime Deployment Strategy
Overview
This project uses Sidekiq Pro with a zero-downtime deployment strategy that eliminates Sidekiq::Shutdown errors during deployments.
How It Works
Deployment Flow
```
┌─────────────────────────────────────────────────────────────────┐
│ BEFORE DEPLOYMENT STARTS                                        │
│   ↓                                                             │
│ 1. Send TSTP signal (sidekiq:quiet)                             │
│    - Stops accepting NEW jobs immediately                       │
│    - Running jobs continue on OLD code                          │
│    - No interruptions, no Sidekiq::Shutdown errors              │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│ DURING DEPLOYMENT                                               │
│   ↓                                                             │
│ 2. Deploy new code                                              │
│    - Upload assets                                              │
│    - Run migrations                                             │
│    - Publish new release                                        │
│    - Existing Sidekiq jobs finish on old code (no interruption) │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│ AFTER DEPLOYMENT SUCCEEDS                                       │
│   ↓                                                             │
│ 3. Restart Sidekiq (sidekiq:restart)                            │
│    - Stop old processes (graceful, 60s timeout)                 │
│    - Start new processes with new code                          │
│    - Begin accepting jobs again                                 │
└─────────────────────────────────────────────────────────────────┘
```
Timeline Example
```
T+0s     → Deployment starts
T+0s     → Send TSTP signal to all Sidekiq processes
T+0s     → Sidekiq stops fetching new jobs (queue paused)
T+0-45s  → Deploy code (upload, migrate, publish)
T+10s    → Running EDI job continues uninterrupted ✅
T+35s    → EDI job completes successfully ✅
T+45s    → Deployment finished, trigger sidekiq:restart
T+45s    → Old Sidekiq processes shut down gracefully
T+46s    → New Sidekiq processes start with new code
T+46s    → Queue processing resumes ✅
```
Configuration
Capistrano Deploy Configuration
File: config/deploy.rb
```ruby
# Sidekiq Pro zero-downtime deployment strategy:
before :starting, 'sidekiq:quiet'    # Quiet before deployment starts
after :finished, 'sidekiq:restart'   # Restart after deployment succeeds
```
Sidekiq Timeout Configuration
File: config/initializers/sidekiq.rb
```ruby
config[:timeout] = 60
```
- Gives jobs 60 seconds to complete during graceful shutdown
- Prevents force-kill of jobs that are almost done
- Applies to the restart phase (after deployment)
Systemd Service Configuration
File: config/deploy/templates/sidekiq.service.capistrano.erb
```ini
TimeoutStopSec=90
```
- Gives systemd 90 seconds to wait for Sidekiq shutdown
- Must be at least the Sidekiq timeout (60s) plus a buffer (30s)
- Prevents systemd from sending SIGKILL prematurely
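The relationship between the two timeouts is easy to break when only one of them is changed later. A minimal sketch of a sanity check follows; the helper name stop_budget_ok? is hypothetical (it is not part of Sidekiq, systemd, or Capistrano), and the 30-second default buffer is simply the value quoted above.

```ruby
# Hypothetical helper, not part of Sidekiq or Capistrano: returns true when
# systemd waits at least as long as Sidekiq's shutdown timeout plus a buffer.
def stop_budget_ok?(sidekiq_timeout:, systemd_stop_sec:, buffer: 30)
  systemd_stop_sec >= sidekiq_timeout + buffer
end

# Values from this page: 60s Sidekiq timeout, TimeoutStopSec=90.
puts stop_budget_ok?(sidekiq_timeout: 60, systemd_stop_sec: 90)

# Sidekiq timeout raised to 120s while the service file still says 90.
puts stop_budget_ok?(sidekiq_timeout: 120, systemd_stop_sec: 90)
```

A check like this could run in a deploy-time test so that raising the Sidekiq timeout without also raising TimeoutStopSec is caught before systemd starts sending SIGKILL early.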
Capistrano Sidekiq Settings
File: config/deploy.rb
```ruby
set :sidekiq_roles, :worker
set :sidekiq_default_hooks, false  # We control hooks manually
set :sidekiq_timeout, 60           # Matches Sidekiq initializer timeout
```
Benefits of This Approach
✅ Zero Job Interruptions
- Before deployment: Sidekiq stops fetching new jobs, but running jobs finish
- During deployment: No jobs are interrupted (they run on old code)
- After deployment: New jobs run on new code
✅ No Sidekiq::Shutdown Errors
The old approach (after :finished, 'sidekiq:restart_noblock') would:
- Let jobs continue during deployment
- Interrupt them when restarting after deployment
- Cause Sidekiq::Shutdown exceptions
The new approach:
- Stops fetching new jobs before deployment starts
- Lets running jobs finish on the old code while the deployment runs
- No interruptions = no errors (for jobs that finish within the 60s shutdown timeout)
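To make the difference concrete, here is a small simulation of a graceful stop with a deadline. It is a sketch built on plain Ruby threads, not Sidekiq's actual shutdown code: FakeShutdown is a stand-in for Sidekiq::Shutdown, stop_workers is a hypothetical helper, and the durations are scaled down from seconds to fractions of a second.

```ruby
# Stand-in for Sidekiq::Shutdown so the sketch runs without the sidekiq gem.
class FakeShutdown < StandardError; end

# Starts one thread per job, waits up to `timeout` seconds in total, then
# raises FakeShutdown inside every thread that is still busy.
def stop_workers(job_durations, timeout:)
  results = {}
  threads = job_durations.map do |name, seconds|
    Thread.new do
      begin
        sleep seconds                  # pretend to do work
        results[name] = :completed
      rescue FakeShutdown
        results[name] = :interrupted   # what a job over the timeout sees
      end
    end
  end

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  threads.each do |thread|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    thread.join([remaining, 0].max)
  end

  threads.select(&:alive?).each { |thread| thread.raise(FakeShutdown) }
  threads.each(&:join)
  results
end

# A short job finishes inside the window; a long one is interrupted.
p stop_workers({ short: 0.05, long: 2.0 }, timeout: 0.3)
```

Quieting first shrinks the set of jobs that are still running when the stop arrives, which is why only jobs longer than the timeout remain at risk.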
✅ Graceful Queue Pause
The quiet signal (TSTP) is specifically designed for deployments:
- Instant: Stops fetching new jobs immediately
- Safe: Doesn’t interrupt running jobs
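- Not reversible in place: a quieted process never fetches jobs again, so even after a failed deployment Sidekiq must be restarted to resume processing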
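The quiet behavior can be illustrated with a toy worker loop. This is a sketch only: run_until_quiet is a hypothetical function, the queue is an in-memory array rather than Redis, and Sidekiq's real signal handling is more involved. It assumes a Unix-like system where TSTP can be trapped.

```ruby
# Toy worker, not Sidekiq itself: once TSTP arrives it stops fetching,
# but the job that is already in flight is still completed.
def run_until_quiet(queue, deploy_during:)
  processed = []
  quiet = false
  Signal.trap("TSTP") { quiet = true }

  until quiet || queue.empty?
    job = queue.shift                       # fetch the next job
    if job == deploy_during
      Process.kill("TSTP", Process.pid)     # a deployment starts mid-job
      50.times { break if quiet; sleep 0.01 } # let the handler run
    end
    processed << job                        # the in-flight job still finishes
  end

  [processed, queue]
end

processed, still_queued = run_until_quiet(%w[job1 job2 job3 job4], deploy_during: "job2")
puts "processed:    #{processed.inspect}"
puts "still queued: #{still_queued.inspect}"
```

The jobs left in the array correspond to the jobs that wait in Redis until the new Sidekiq processes start.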
✅ Predictable Behavior
- Jobs already running finish on old code (no mid-flight code changes)
- Jobs fetched after the restart run on new code
- Clear boundary between old and new
Available Capistrano Tasks
```bash
# View all Sidekiq tasks
cap -T sidekiq

# Common tasks
cap production sidekiq:quiet    # Stop fetching new jobs (TSTP signal)
cap production sidekiq:restart  # Graceful restart (stop + start)
cap production sidekiq:stop     # Graceful stop (60s timeout)
cap production sidekiq:start    # Start Sidekiq processes
cap production sidekiq:install  # Install systemd service
cap production sidekiq:status   # Check Sidekiq status
```
Sidekiq Signals Reference
| Signal | Command | Effect | Use Case |
|---|---|---|---|
| TSTP | sidekiq:quiet | Stop accepting new jobs, continue running jobs | Deployments (before code change) |
| TERM | sidekiq:stop | Graceful shutdown (60s timeout) | Normal shutdown |
| INT | N/A | Same as TERM (graceful shutdown) | Ctrl+C / manual stop |
| TTIN | N/A | Print thread backtraces to log | Debugging hung jobs |
| KILL | N/A | Force kill: immediate termination (no cleanup) | Emergency only |
What Happens to Jobs During Quiet?
Jobs Already Running
✅ Continue uninterrupted while Sidekiq is quiet; the 60s timeout only applies once the restart sends TERM
Jobs in Redis Queue
⏸️ Remain queued - will be processed after new Sidekiq starts
New Jobs Enqueued During Deployment
⏸️ Remain queued - will be processed after new Sidekiq starts
Critical Jobs That Can’t Wait
If you have truly critical jobs that must process immediately:
Option 1: Schedule around deployments
```
# Deploy during low-traffic periods
# Avoid deploying during critical job windows
```
Option 2: Run separate “critical” Sidekiq process
```yaml
:concurrency: 2
:queues:
  - [critical, 2]  # Only critical jobs
```
```ruby
# Don't quiet this one during deployments
set :sidekiq_config_files, ['sidekiq.yml']  # Exclude critical
```
Option 3: Use scheduled jobs instead of immediate
```ruby
# Instead of perform_async (immediate)
MyWorker.perform_in(5.minutes, args)  # Delayed
```
Deployment Timing Considerations
How Long Does Quiet Phase Last?
The quiet phase lasts as long as your deployment takes:
```
Deployment Duration = Upload Assets (~10-30s)
                    + Run Migrations (~5-60s)
                    + Publish Release (~5s)
                    + Other Hooks (~10s)
                    ≈ 30-105 seconds typical
```
During this time:
- ⏸️ New jobs queue up in Redis (not lost)
- ✅ Running jobs complete
- 📊 Monitor queue depth in Sidekiq Web UI
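The 30-105 second estimate is just the sum of the per-phase bounds, and the same arithmetic gives a rough idea of how many jobs pile up in Redis. In the sketch below the phase figures are copied from the breakdown above, while quiet_window, backlog_range, and the enqueue rate of 2 jobs per second are hypothetical and only there for illustration.

```ruby
# Per-phase estimates in seconds, copied from the breakdown above.
PHASES = {
  upload_assets:   10..30,
  run_migrations:  5..60,
  publish_release: 5..5,
  other_hooks:     10..10
}.freeze

# Sums the lower bounds and the upper bounds separately.
def quiet_window(phases)
  phases.values.sum(&:min)..phases.values.sum(&:max)
end

# Rough backlog: jobs enqueued while nothing is being fetched.
def backlog_range(window, jobs_per_second)
  (window.min * jobs_per_second)..(window.max * jobs_per_second)
end

window = quiet_window(PHASES)
puts "quiet window: #{window} seconds"
# 2 jobs/s is a made-up rate; substitute your own measurement.
puts "backlog at 2 jobs/s: #{backlog_range(window, 2)} jobs"
```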
If Queue Builds Up
Most jobs can wait 30-60 seconds, but if queues grow too large:
Solution 1: Faster deployments
- Optimize asset compilation (already done with local builds)
- Use zero-downtime migrations (already common practice)
- Parallelize upload tasks
Solution 2: Multiple worker servers
```
# Deploy to servers one at a time (rolling deployment)
# Some workers always available
```
Solution 3: Pre-quiet strategy
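```ruby
# Quiet 30 seconds before deployment so in-flight jobs can finish
before :starting, 'sidekiq:custom_quiet_and_wait'

# Defined inside the sidekiq namespace so the hook name above resolves
namespace :sidekiq do
  task :custom_quiet_and_wait do
    invoke 'sidekiq:quiet'
    puts "Waiting 30s for running jobs to finish..."
    sleep 30
  end
end
```
Monitoring and Verification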
After Deployment
```bash
# SSH to production server
ssh deploy@chi-vultr-heatwave-util1

# Check all Sidekiq services are running
systemctl status 'sidekiq*.service' --no-pager

# Check processes are using new code
ps aux | grep sidekiq
# Look for new PID and recent start time

# Check logs for clean restart
journalctl -u sidekiq-heatwave-production-sidekiq -n 50

# Monitor queue in Sidekiq Web UI
# https://crm.warmlyyours.me:3000/sidekiq
# Check for:
# - Queue depth (should drain after restart)
# - No Sidekiq::Shutdown errors in dead jobs
# - Processed jobs resuming
```
In Rollbar
Before this change:
- ❌ Frequent Sidekiq::Shutdown exceptions
- ❌ Jobs interrupted during API calls
- ❌ Incomplete data synchronization

After this change:
- ✅ No Sidekiq::Shutdown during deployments
- ✅ Jobs complete or wait in queue
- ✅ Clean shutdowns only

Troubleshooting
Queue Not Processing After Deployment
Symptom: Jobs stuck in queue, not processing
Check:
```bash
# Are Sidekiq processes running?
systemctl status 'sidekiq*.service'

# If not running, start them
cap production sidekiq:start

# Check logs
journalctl -u sidekiq-heatwave-production-sidekiq -f
```
Jobs Still Being Interrupted
Symptom: Still seeing Sidekiq::Shutdown in Rollbar
Possible causes:
- Jobs exceed 60s timeout
  - Solution: Increase timeout or break into smaller jobs
  - See doc/SIDEKIQ_GRACEFUL_SHUTDOWN.md for details
- Manual restarts during deployment
  - Check: Are you running cap sidekiq:restart manually?
  - Solution: Let Capistrano handle restarts automatically
- Systemd watchdog killing jobs
  - Check: journalctl for “Watchdog timeout”
  - Solution: Increase WatchdogSec in service file
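For the first cause, breaking one long job into several short ones usually means slicing its input and enqueuing one job per slice. The sketch below shows only the slicing step; plan_batches, the record IDs, and the batch size are hypothetical, and the enqueue call is left as a comment because the real worker class is not shown in this document.

```ruby
# Hypothetical helper: splits a list of record IDs into fixed-size batches
# so that each batch can run as its own short job.
def plan_batches(record_ids, batch_size:)
  record_ids.each_slice(batch_size).to_a
end

batches = plan_batches((1..10).to_a, batch_size: 4)
batches.each do |ids|
  # In a real app this line would enqueue a job for the batch, for example
  # with a worker of your own: SomeBatchWorker.perform_async(ids)
  puts "would enqueue a job for #{ids.inspect}"
end
```

Pick the batch size so that a single batch comfortably finishes within the 60s shutdown timeout.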
Deployment Hangs at “Quieting Sidekiq”
Symptom: Deployment stuck at sidekiq:quiet task
Check:
```bash
# Are Sidekiq processes responding? (inner quotes keep the remote shell from expanding the glob)
ssh deploy@server "systemctl is-active 'sidekiq*.service'"

# Can you manually quiet?
ssh deploy@server 'systemctl kill -s TSTP sidekiq-heatwave-production-sidekiq.service'
```
Solution:
- Increase SSH timeout
- Check network connectivity
- Verify systemd is responsive
Rollback Strategy
If a deployment fails or needs rollback:
```bash
# Roll back to the previous release
cap production deploy:rollback

# Sidekiq will restart with previous code version
# Jobs in queue will process with rolled-back code
```
Best Practices Summary
- ✅ Always use quiet before deployment (configured automatically)
- ✅ Let Capistrano manage Sidekiq lifecycle (don’t restart manually)
- ✅ Keep jobs under 60 seconds when possible
- ✅ Make jobs idempotent (safe to retry)
- ✅ Monitor queue depth during deployments
- ✅ Deploy during low-traffic periods for critical systems
- ✅ Test deployments in staging with realistic job load
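Idempotency is what makes a retry after an interrupted run harmless. A minimal sketch, assuming a hypothetical InvoiceSync job whose in-memory Set stands in for a real database or remote system; it runs without Sidekiq and is not taken from this project's code.

```ruby
require "set"

# Hypothetical job: the Set and the pushed list stand in for real storage
# and a real external API.
class InvoiceSync
  attr_reader :pushed

  def initialize
    @synced = Set.new
    @pushed = []
  end

  # Safe to call again after an interrupted or retried run:
  # invoices that were already synced are skipped.
  def perform(invoice_ids)
    invoice_ids.each do |id|
      next if @synced.include?(id)
      @pushed << id      # the side effect that must not happen twice
      @synced << id
    end
  end
end

job = InvoiceSync.new
job.perform([1, 2])        # first attempt, interrupted after two invoices
job.perform([1, 2, 3])     # retry with the full list
puts job.pushed.inspect
```

In a real job the "already synced" check would typically be a database lookup or a unique constraint rather than process memory.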
Additional Resources
- Sidekiq Deployment Wiki
- Sidekiq Signals Documentation
- Capistrano-Sidekiq GitHub
- Sidekiq Pro Features
Last Updated: October 10, 2025
Configuration Version: Sidekiq Pro 7.3.x, Rails 7.0.8.7, Capistrano 3.19.2