Sidekiq Pro Zero-Downtime Deployment Strategy
Overview
This project uses Sidekiq Pro with a zero-downtime deployment strategy that eliminates Sidekiq::Shutdown errors during deployments.
How It Works
Deployment Flow
┌─────────────────────────────────────────────────────────────────┐
│ BEFORE DEPLOYMENT STARTS │
│ ↓ │
│ 1. Send TSTP signal (sidekiq:quiet) │
│ - Stops accepting NEW jobs immediately │
│ - Running jobs continue on OLD code │
│ - No interruptions, no Sidekiq::Shutdown errors │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ DURING DEPLOYMENT │
│ ↓ │
│ 2. Deploy new code │
│ - Upload assets │
│ - Run migrations │
│ - Publish new release │
│ - Existing Sidekiq jobs finish on old code (no interruption) │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ AFTER DEPLOYMENT SUCCEEDS │
│ ↓ │
│ 3. Restart Sidekiq (sidekiq:restart) │
│ - Stop old processes (graceful, 60s timeout) │
│ - Start new processes with new code │
│ - Begin accepting jobs again │
└─────────────────────────────────────────────────────────────────┘
Timeline Example
T+0s → Deployment starts
T+0s → Send TSTP signal to all Sidekiq processes
T+0s → Sidekiq stops fetching new jobs (queue paused)
T+0-45s → Deploy code (upload, migrate, publish)
T+10s → Running EDI job continues uninterrupted ✅
T+35s → EDI job completes successfully ✅
T+45s → Deployment finished, trigger sidekiq:restart
T+45s → Old Sidekiq processes shut down gracefully
T+46s → New Sidekiq processes start with new code
T+46s → Queue processing resumes ✅
Configuration
Capistrano Deploy Configuration
File: config/deploy.rb
# Sidekiq Pro zero-downtime deployment strategy:
before :starting, 'sidekiq:quiet' # Quiet before deployment starts
after :finished, 'sidekiq:restart' # Restart after deployment succeeds
Sidekiq Timeout Configuration
File: config/initializers/sidekiq.rb
config[:timeout] = 60
- Gives jobs 60 seconds to complete during graceful shutdown
- Prevents force-kill of jobs that are almost done
- Applies to the restart phase (after deployment)
Systemd Service Configuration
File: config/deploy/templates/sidekiq.service.capistrano.erb
TimeoutStopSec=90
- systemd waits up to 90 seconds for Sidekiq to shut down
- Equals the Sidekiq timeout (60s) plus a 30s buffer; it must always exceed the Sidekiq timeout
- Prevents systemd from sending SIGKILL prematurely
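The relationship between the two timeouts can be checked mechanically before a deploy. The following is a minimal sketch, assuming the rendered unit file is available as a string and that `TimeoutStopSec` is written as a plain number of seconds; `stop_timeout_ok?` and the default 30s buffer are illustrative names, not part of this project.

```ruby
# Hypothetical helper: verifies that systemd waits at least as long as
# Sidekiq's own shutdown timeout plus a buffer before escalating to SIGKILL.
def stop_timeout_ok?(unit_text, sidekiq_timeout:, buffer: 30)
  match = unit_text.match(/^\s*TimeoutStopSec\s*=\s*(\d+)\s*$/)
  return false unless match # directive missing or not a plain number of seconds

  match[1].to_i >= sidekiq_timeout + buffer
end

unit = <<~UNIT
  [Service]
  TimeoutStopSec=90
UNIT

puts stop_timeout_ok?(unit, sidekiq_timeout: 60) # 90 >= 60 + 30
```

A check like this could run as a pre-deploy step so that a later change to one timeout does not silently break the other.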
Capistrano Sidekiq Settings
File: config/deploy.rb
set :sidekiq_roles, :worker
set :sidekiq_default_hooks, false # We control hooks manually
set :sidekiq_timeout, 60 # Matches Sidekiq initializer timeout
Benefits of This Approach
✅ Zero Job Interruptions
- Before deployment: Sidekiq stops fetching new jobs, while running jobs continue
- During deployment: No jobs are interrupted (they run on old code)
- After deployment: New jobs run on new code
✅ No Sidekiq::Shutdown Errors
The old approach (after :finished, 'sidekiq:restart_noblock') would:
- Let jobs continue during deployment
- Interrupt them when restarting after deployment
- Cause Sidekiq::Shutdown exceptions
The new approach:
- Pauses queue before deployment starts
- Lets running jobs finish on the old code they started with
- No interruptions = no errors
✅ Graceful Queue Pause
The quiet signal (TSTP) is specifically designed for deployments:
- Instant: Stops fetching new jobs immediately
- Safe: Doesn't interrupt running jobs
- One-way: a quieted process cannot resume fetching, so if the deployment fails, restart Sidekiq to resume processing
✅ Predictable Behavior
- Old jobs always run on old code (no mid-flight code changes)
- New jobs always run on new code
- Clear boundary between old and new
Available Capistrano Tasks
# View all Sidekiq tasks
cap -T sidekiq
# Common tasks
cap production sidekiq:quiet # Stop accepting new jobs (TSTP signal)
cap production sidekiq:restart # Graceful restart (stop + start)
cap production sidekiq:stop # Graceful stop (60s timeout)
cap production sidekiq:start # Start Sidekiq processes
cap production sidekiq:install # Install systemd service
cap production sidekiq:status # Check Sidekiq status
Sidekiq Signals Reference
| Signal | Command | Effect | Use Case |
|---|---|---|---|
| TSTP | sidekiq:quiet | Stop accepting new jobs, continue running jobs | Deployments (before code change) |
| TERM | sidekiq:stop | Graceful shutdown (60s timeout) | Normal shutdown |
| INT | Same as TERM | Graceful shutdown | Ctrl+C / manual stop |
| TTIN | N/A | Print thread backtraces to log | Debugging hung jobs |
| KILL | Force kill | Immediate termination (no cleanup) | Emergency only |
What Happens to Jobs During Quiet?
Jobs Already Running
✅ Continue uninterrupted until completion; the 60s timeout only starts counting once the restart sends TERM
Jobs in Redis Queue
⏸️ Remain queued - will be processed after new Sidekiq starts
New Jobs Enqueued During Deployment
⏸️ Remain queued - will be processed after new Sidekiq starts
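The three cases above can be illustrated with a toy model. This is a sketch only: `ToyProcess` is a hypothetical stand-in written for this document and does not reflect Sidekiq's real internals; the array stands in for the Redis queue.

```ruby
# Toy model of one Sidekiq process; not Sidekiq's real implementation.
class ToyProcess
  attr_reader :running, :done

  def initialize(queue)
    @queue = queue # stand-in for the Redis queue
    @running = []  # jobs currently executing
    @done = []     # jobs that completed
    @quiet = false
  end

  # Models the effect of TSTP: stop fetching, keep working.
  def quiet!
    @quiet = true
  end

  # Pulls one job from the queue unless the process has been quieted.
  def fetch
    return if @quiet || @queue.empty?

    @running << @queue.shift
  end

  # In-flight jobs complete regardless of the quiet state.
  def finish_running
    @done.concat(@running)
    @running.clear
  end
end

queue = %w[job_a job_b job_c]
process = ToyProcess.new(queue)
process.fetch          # job_a starts running
process.quiet!         # deployment begins
queue << "job_d"       # enqueued during the deployment
process.fetch          # ignored: the process is quiet
process.finish_running # job_a completes on the old code

puts process.done.inspect
puts queue.inspect
```

In the model, only the job that was already running completes; everything else, including the job enqueued mid-deployment, stays in the queue for the next process.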
Critical Jobs That Can't Wait
If you have truly critical jobs that must process immediately:
Option 1: Schedule around deployments
# Deploy during low-traffic periods
# Avoid deploying during critical job windows
Option 2: Run separate "critical" Sidekiq process
# config/sidekiq_critical.yml
:concurrency: 2
:queues:
- [critical, 2] # Only critical jobs
# config/deploy.rb: don't quiet this one during deployments
set :sidekiq_config_files, ['sidekiq.yml'] # Exclude critical
Option 3: Use scheduled jobs instead of immediate
# Instead of perform_async (immediate)
MyWorker.perform_in(5.minutes, args) # Delayed
Deployment Timing Considerations
How Long Does Quiet Phase Last?
The quiet phase lasts as long as your deployment takes:
Deployment Duration =
Upload Assets (~10-30s) +
Run Migrations (~5-60s) +
Publish Release (~5s) +
Other Hooks (~10s)
≈ 30-105 seconds typical
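The estimate above is a sum of per-phase ranges. A small sketch of the arithmetic, using only the figures from the breakdown (the `PHASES` constant is an illustrative name):

```ruby
# Per-phase duration estimates in seconds, copied from the breakdown above.
PHASES = {
  upload_assets:   10..30,
  run_migrations:  5..60,
  publish_release: 5..5,
  other_hooks:     10..10
}.freeze

best  = PHASES.values.sum(&:min) # every phase at its fastest
worst = PHASES.values.sum(&:max) # every phase at its slowest
puts "Quiet window: #{best}-#{worst} seconds"
```

Replacing the ranges with timings measured from real deployments gives a more useful bound for how long jobs may sit in the queue.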
During this time:
- ⏸️ New jobs queue up in Redis (not lost)
- ✅ Running jobs complete
- 📊 Monitor queue depth in Sidekiq Web UI
If Queue Builds Up
Most jobs can wait 30-60 seconds, but if queues grow too large:
Solution 1: Faster deployments
- Optimize asset compilation (already done with local builds)
- Use zero-downtime migrations (already common practice)
- Parallelize upload tasks
Solution 2: Multiple worker servers
# Deploy to servers one at a time (rolling deployment)
# Some workers always available
Solution 3: Pre-quiet strategy
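# Quiet, then give in-flight jobs 30 seconds to finish before code changes
# (replaces the plain `before :starting, 'sidekiq:quiet'` hook)
before :starting, 'sidekiq:custom_quiet_and_wait'
namespace :sidekiq do
  task :custom_quiet_and_wait do
    invoke 'sidekiq:quiet'
    puts "Waiting 30s for running jobs to finish..."
    sleep 30
  end
end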
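A fixed `sleep 30` either waits longer than necessary or not long enough. One alternative is to poll until no job is running, with an upper bound. The following is a minimal sketch under stated assumptions: `wait_until_idle` is a hypothetical helper, and the probe is any callable that returns the current number of busy worker threads. How that number is obtained in a real task (for example from Sidekiq's process API) is not shown here and would need to be wired in.

```ruby
# Hypothetical helper: polls `probe` (any callable returning the number of
# busy worker threads) until it reports zero or `timeout` seconds elapse.
# Returns true if the workers went idle, false if the deadline passed first.
def wait_until_idle(probe, timeout: 30, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if probe.call.zero?
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep(interval)
  end
end

# Stand-in probe for demonstration: reports 2 busy threads, then 1, then 0.
readings = [2, 1, 0]
probe = -> { readings.shift || 0 }
puts wait_until_idle(probe, timeout: 5, interval: 0.01)
```

Returning `false` on timeout leaves the decision to the caller: the deploy can proceed anyway, since the restart phase still grants running jobs the Sidekiq timeout.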
Monitoring and Verification
After Deployment
# SSH to production server
ssh deploy@chi-vultr-heatwave-util1
# Check all Sidekiq services are running
systemctl status 'sidekiq*.service' --no-pager
# Check processes are using new code
ps aux | grep sidekiq
# Look for new PID and recent start time
# Check logs for clean restart
journalctl -u sidekiq-heatwave-production-sidekiq -n 50
# Monitor queue in Sidekiq Web UI
# https://crm.warmlyyours.me:3000/sidekiq
# Check for:
# - Queue depth (should drain after restart)
# - No Sidekiq::Shutdown errors in dead jobs
# - Processed jobs resuming
In Rollbar
Before this change:
❌ Frequent Sidekiq::Shutdown exceptions
❌ Jobs interrupted during API calls
❌ Incomplete data synchronization
After this change:
✅ No Sidekiq::Shutdown during deployments
✅ Jobs complete or wait in queue
✅ Clean shutdowns only
Troubleshooting
Queue Not Processing After Deployment
Symptom: Jobs stuck in queue, not processing
Check:
# Are Sidekiq processes running?
systemctl status 'sidekiq*.service'
# If not running, start them
cap production sidekiq:start
# Check logs
journalctl -u sidekiq-heatwave-production-sidekiq -f
Jobs Still Being Interrupted
Symptom: Still seeing Sidekiq::Shutdown in Rollbar
Possible causes:
- Jobs exceed 60s timeout
  - Solution: Increase timeout or break into smaller jobs
  - See doc/SIDEKIQ_GRACEFUL_SHUTDOWN.md for details
- Manual restarts during deployment
  - Check: Are you running cap sidekiq:restart manually?
  - Solution: Let Capistrano handle restarts automatically
- Systemd watchdog killing jobs
  - Check: journalctl for "Watchdog timeout"
  - Solution: Increase WatchdogSec in service file
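For the first cause, breaking a long job into smaller ones usually means fanning out over slices of the work. A minimal sketch in plain Ruby so that it runs without Redis: `BatchSyncJob`, `ENQUEUED`, and the batch size of 100 are hypothetical, and in a real worker the append to `ENQUEUED` would be a `perform_async` call.

```ruby
# Hypothetical stand-in for a Sidekiq queue so the sketch runs without Redis.
ENQUEUED = []

# Illustrative job: each enqueued unit covers one small slice of record IDs,
# so a single run is short enough to finish inside the shutdown timeout.
class BatchSyncJob
  BATCH_SIZE = 100 # assumed value; tune so one batch finishes well under 60s

  # Fan out: one enqueued job per slice instead of one long-running job.
  def self.enqueue_all(record_ids)
    record_ids.each_slice(BATCH_SIZE) { |slice| ENQUEUED << [name, slice] }
  end
end

BatchSyncJob.enqueue_all((1..250).to_a)
puts ENQUEUED.length         # 3 batches: 100 + 100 + 50
puts ENQUEUED.last[1].length # size of the final, partial batch
```

With this shape, a restart interrupts at most one small batch per thread rather than one job holding the whole workload.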
Deployment Hangs at "Quieting Sidekiq"
Symptom: Deployment stuck at sidekiq:quiet task
Check:
# Are Sidekiq processes responding?
ssh deploy@server "systemctl is-active 'sidekiq*.service'"
# Can you manually quiet?
ssh deploy@server 'systemctl kill -s TSTP sidekiq-heatwave-production-sidekiq.service'
Solution:
- Increase SSH timeout
- Check network connectivity
- Verify systemd is responsive
Rollback Strategy
If a deployment fails or needs rollback:
# Roll back to the previous release
cap production deploy:rollback
# Sidekiq will restart with previous code version
# Jobs in queue will process with rolled-back code
Best Practices Summary
- ✅ Always use quiet before deployment (configured automatically)
- ✅ Let Capistrano manage the Sidekiq lifecycle (don't restart manually)
- ✅ Keep jobs under 60 seconds when possible
- ✅ Make jobs idempotent (safe to retry)
- ✅ Monitor queue depth during deployments
- ✅ Deploy during low-traffic periods for critical systems
- ✅ Test deployments in staging with realistic job load
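Idempotency matters here because a job that is still running when the shutdown timeout expires is terminated and run again later. A minimal sketch of the idea in plain Ruby: `ExportInvoiceJob` and its in-memory stores are hypothetical stand-ins for a real worker, a database marker, and an external system.

```ruby
require "set"

# Hypothetical job: records which invoice IDs were already exported so that a
# retry after an interrupted run does not export the same invoice twice.
class ExportInvoiceJob
  def initialize(exported_ids, outbox)
    @exported_ids = exported_ids # stand-in for a DB column or unique index
    @outbox = outbox             # stand-in for the external system
  end

  def perform(invoice_id)
    return :skipped if @exported_ids.include?(invoice_id) # already done

    @outbox << invoice_id
    @exported_ids << invoice_id
    :exported
  end
end

job = ExportInvoiceJob.new(Set.new, [])
puts job.perform(42) # first run does the work
puts job.perform(42) # retry is a no-op
```

In a real job the check and the write would need to be atomic (for example through a unique database constraint), since an interruption between the two steps would otherwise still allow a duplicate.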
Additional Resources
- Sidekiq Deployment Wiki
- Sidekiq Signals Documentation
- Capistrano-Sidekiq GitHub
- Sidekiq Pro Features
Last Updated: October 10, 2025
Configuration Version: Sidekiq Pro 7.3.x, Rails 7.0.8.7, Capistrano 3.19.2