Sidekiq Graceful Shutdown Configuration

During deployments, Sidekiq workers were being forcibly terminated before jobs could complete, resulting in Sidekiq::Shutdown exceptions. This was particularly problematic for:

  • Long-running EDI API calls (Amazon, etc.)
  • HTTP requests with slow response times
  • Jobs that take longer than the default 25-second timeout

Example Error:

Sidekiq::Shutdown at HTTP::Connection#read_headers!

We’ve implemented a three-layer graceful shutdown strategy:

File: config/initializers/sidekiq.rb

config[:timeout] = 60
  • What it does: Tells Sidekiq to wait up to 60 seconds for running jobs to complete before forcing shutdown
  • Why 60 seconds: Accommodates typical EDI API call times, including retries and network delays
  • How it works: When Sidekiq receives a TERM signal (during deployment), it:
    1. Stops accepting new jobs immediately
    2. Waits for running jobs to complete (up to 60 seconds)
    3. Interrupts only the jobs still running at the timeout by raising Sidekiq::Shutdown in them, then pushes those jobs back onto their queue
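
The sequence above can be illustrated without Sidekiq itself. The following is a minimal sketch using plain Ruby threads, not Sidekiq's actual implementation: `FakeShutdown`, `run_job`, and `drain` are hypothetical names, and the timeout is shortened to a fraction of a second so the example runs quickly.

```ruby
# Stand-in for Sidekiq::Shutdown (hypothetical) so the sketch runs without the gem.
class FakeShutdown < Interrupt; end

# Starts a "job" that sleeps for `duration` seconds and records how it ended.
def run_job(name, duration, results)
  Thread.new do
    begin
      sleep(duration)
      results[name] = :completed
    rescue FakeShutdown
      results[name] = :interrupted
    end
  end
end

# Waits up to `timeout` seconds for all jobs, then interrupts any still running.
def drain(threads, timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  threads.each do |thread|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    thread.join([remaining, 0].max)
  end
  threads.select(&:alive?).each do |thread|
    thread.raise(FakeShutdown) # analogous to the forced interruption in step 3
    thread.join
  end
end

results = {}
threads = [run_job(:fast, 0.05, results), run_job(:slow, 5, results)]
drain(threads, 0.3)
```

In this sketch the fast job finishes inside the grace period and the slow one is interrupted, which mirrors what happens to real jobs on either side of the 60-second limit.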

File: config/deploy/templates/sidekiq.service.capistrano.erb

TimeoutStopSec=90
  • What it does: Tells systemd to wait 90 seconds for Sidekiq to gracefully shut down
  • Why 90 seconds: Must be longer than Sidekiq’s timeout (60s) + buffer for cleanup (30s)
  • How it works: If the Sidekiq process has not exited within 90 seconds of receiving SIGTERM, systemd sends SIGKILL
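
The relationship between the two timeouts is simple arithmetic, and it may be worth asserting in a test or deploy hook. A minimal sketch, assuming the values above; `stop_timeouts_consistent?` and its `buffer` parameter are illustrative names, not part of Sidekiq, systemd, or Capistrano.

```ruby
# Returns true when systemd waits at least as long as Sidekiq's own timeout
# plus a cleanup buffer, so SIGKILL should not arrive while Sidekiq is
# still shutting down on its own.
def stop_timeouts_consistent?(sidekiq_timeout:, systemd_timeout:, buffer: 30)
  systemd_timeout >= sidekiq_timeout + buffer
end

current   = stop_timeouts_consistent?(sidekiq_timeout: 60, systemd_timeout: 90)
too_tight = stop_timeouts_consistent?(sidekiq_timeout: 120, systemd_timeout: 90)
```

Here `current` reflects the configured 60s/90s pair, while `too_tight` shows what happens if the Sidekiq timeout is raised without also raising TimeoutStopSec.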

File: config/deploy.rb

set :sidekiq_timeout, 60
  • What it does: Ensures Capistrano waits for Sidekiq to shut down properly before proceeding with deployment
  • Why it matches: Should align with Sidekiq’s internal timeout for consistency

Deployment flow:

  1. Capistrano triggers Sidekiq restart via sidekiq:restart_noblock
  2. Systemd sends SIGTERM to Sidekiq process
  3. Sidekiq enters shutdown mode:
    • Stops fetching new jobs from Redis
    • Marks itself as “quiet” (won’t accept work)
    • Waits for currently executing jobs to complete
  4. Jobs have 60 seconds to finish:
    • Jobs that complete within 60s: ✅ Success, no errors
    • Jobs exceeding 60s: ⚠️ Receive Sidekiq::Shutdown exception
  5. Sidekiq exits cleanly after all jobs complete or timeout
  6. Systemd starts new Sidekiq process with updated code
  7. Capistrano continues deployment

Example timeline:

T+0s → Deployment starts, systemd sends SIGTERM
T+0s → Sidekiq stops accepting new jobs
T+0s → Running EDI job continues (waiting for API response)
T+45s → EDI job completes successfully ✅
T+46s → Sidekiq shuts down gracefully
T+47s → New Sidekiq process starts with updated code

Jobs that finish within 60 seconds of the TERM signal:

  • Complete normally
  • No exceptions raised
  • Results saved successfully

Jobs still running 60 seconds after the TERM signal:

  • Receive Sidekiq::Shutdown exception at 60-second mark
  • Can rescue this exception to clean up, as long as they re-raise it:
    rescue Sidekiq::Shutdown
      # Log the interruption
      # Save partial progress if possible
      # No manual re-enqueue needed: Sidekiq requeues the interrupted job
      raise # Re-raise so Sidekiq can push the job back onto its queue
    end

If the Sidekiq process is still alive 90 seconds after the TERM signal:

  • The process and any jobs in it are forcibly killed by systemd (SIGKILL)
  • No opportunity to handle gracefully
  • Solution: Break these into smaller jobs or use batch processing

Before this change:

  • ❌ Jobs killed immediately or within 25 seconds
  • ❌ Frequent Sidekiq::Shutdown exceptions in Rollbar
  • ❌ Incomplete EDI synchronizations
  • ❌ Lost API responses

After this change:

  • ✅ Jobs have 60 seconds to complete gracefully
  • ✅ Significantly fewer Sidekiq::Shutdown errors
  • ✅ API calls can complete before shutdown
  • ✅ Better data consistency

On the next deployment, Capistrano regenerates the systemd service file with the new TimeoutStopSec setting. No manual intervention is required.

After deployment, verify the configuration:

# SSH to production server
ssh deploy@chi-vultr-heatwave-util1
# Check systemd service timeout
systemctl cat sidekiq-heatwave-production-sidekiq.service | grep TimeoutStopSec
# Should show: TimeoutStopSec=90
# Check Sidekiq is running
systemctl status sidekiq-heatwave-production-sidekiq.service
# Monitor next deployment logs
tail -f /var/www/heatwave/shared/log/sidekiq.log
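
The TimeoutStopSec check could also be scripted. A minimal sketch in Ruby, assuming the unit file text has already been captured (for example from the `systemctl cat` command above); `timeout_stop_sec` is a hypothetical helper and the sample unit text is illustrative, not the real generated file. It only understands plain seconds such as `90` or `90s`, not compound systemd time spans.

```ruby
# Extracts TimeoutStopSec (in seconds) from unit file text, or nil if absent.
def timeout_stop_sec(unit_text)
  match = unit_text.match(/^TimeoutStopSec=(\d+)s?$/)
  match && match[1].to_i
end

# Illustrative unit text only; the real file is generated by Capistrano.
sample_unit = <<~UNIT
  [Service]
  Type=simple
  TimeoutStopSec=90
  Restart=on-failure
UNIT

seconds = timeout_stop_sec(sample_unit)
missing = timeout_stop_sec("[Service]\nType=simple\n")
```

A deploy check could then compare `seconds` against the expected value and fail loudly when the setting is missing.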

After deployment, check Rollbar for:

  • Expected: Significant reduction in Sidekiq::Shutdown errors
  • Monitor: Any jobs that still exceed 60 seconds (may need timeout adjustment)

Consider these approaches:

Option 1: Increase Timeouts

# config/initializers/sidekiq.rb
config[:timeout] = 120 # Increase to 2 minutes
# config/deploy/templates/sidekiq.service.capistrano.erb
# Must be longer than the Sidekiq timeout (systemd unit files do not allow trailing comments)
TimeoutStopSec=150
# config/deploy.rb
set :sidekiq_timeout, 120

When to use: Jobs legitimately need more time to complete

Option 2: Break Into Smaller Jobs (Better)

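# Instead of one long job:
def perform
  fetch_inventory    # 30s
  process_inventory  # 40s
  sync_to_database   # 30s
end

# Break into separate jobs. If the steps depend on each other, enqueue only
# the first and have each worker enqueue the next as its final step;
# enqueuing all three at once lets them run concurrently and out of order.
FetchInventoryWorker.perform_async
#   -> ProcessInventoryWorker.perform_async
#   -> SyncInventoryWorker.perform_async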

When to use: Jobs can be logically decomposed
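
If the steps must run in order, one common pattern is for each job to enqueue the next as its last action. The sketch below simulates that with an in-memory queue so it runs without Sidekiq or Redis; `FakeQueue` and the three step classes are hypothetical stand-ins for real workers that would call `perform_async`.

```ruby
# In-memory stand-in for a Sidekiq queue (hypothetical, for illustration only).
class FakeQueue
  def initialize
    @jobs = []
  end

  def push(job_class)
    @jobs << job_class
  end

  # Runs jobs one at a time until the queue is empty.
  def drain(log)
    while (job_class = @jobs.shift)
      job_class.new.perform(self, log)
    end
  end
end

class FetchInventoryStep
  def perform(queue, log)
    log << :fetched
    queue.push(ProcessInventoryStep) # enqueue the next step last
  end
end

class ProcessInventoryStep
  def perform(queue, log)
    log << :processed
    queue.push(SyncInventoryStep)
  end
end

class SyncInventoryStep
  def perform(_queue, log)
    log << :synced
  end
end

log = []
queue = FakeQueue.new
queue.push(FetchInventoryStep) # only the first step is enqueued up front
queue.drain(log)
```

Because each step is short, a shutdown interrupts at most one step, and the chain picks up from that step once it is rerun.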

Option 3: Handle Shutdown Gracefully (Best)

def perform
  long_running_operation
rescue Sidekiq::Shutdown
  # Save checkpoint/progress so the next run can resume
  store_partial_results
  # Re-raise so Sidekiq pushes the interrupted job back onto its queue.
  # Do not also enqueue a separate resume job here: the work would run twice.
  raise
end

When to use: Jobs can resume from a checkpoint
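
A checkpointed job might look like the following. This is a sketch under stated assumptions, not the application's actual worker: `CHECKPOINTS` is an in-memory stand-in for durable storage such as a database row, `ShutdownSignal` stands in for `Sidekiq::Shutdown` so the example runs without the gem, and the interruption is simulated with the hypothetical `interrupt_at` argument.

```ruby
# Stand-in for Sidekiq::Shutdown (hypothetical) so the sketch runs standalone.
class ShutdownSignal < Interrupt; end

CHECKPOINTS = {} # stand-in for durable storage (assumption)
PROCESSED = []   # records which items were handled, for demonstration

class ResumableJob
  # Processes `items` in order, resuming from the last checkpointed index.
  # `interrupt_at` simulates a shutdown arriving before that index is processed.
  def perform(job_key, items, interrupt_at: nil)
    index = CHECKPOINTS.fetch(job_key, 0)
    while index < items.length
      raise ShutdownSignal if index == interrupt_at
      PROCESSED << items[index]
      index += 1
    end
    CHECKPOINTS.delete(job_key) # finished: nothing left to resume
  rescue ShutdownSignal
    CHECKPOINTS[job_key] = index # remember where to resume
    raise                        # let the caller (Sidekiq) requeue the job
  end
end

items = %w[a b c d e]
job = ResumableJob.new

begin
  job.perform("inventory-sync", items, interrupt_at: 3)
rescue ShutdownSignal
  # In Sidekiq, the interrupted job would be pushed back onto its queue here.
end
saved = CHECKPOINTS["inventory-sync"]

job.perform("inventory-sync", items) # second run resumes from the checkpoint
```

The second run skips the items already handled. A real interruption can arrive in the middle of an item, which is why the per-item work should also be idempotent.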

Current timeout (60s) may be excessive if most jobs complete in < 10 seconds:

config[:timeout] = 30 # Faster restarts

Trade-off: Faster deployments vs. job completion safety

Setting             Value  Purpose
Sidekiq timeout     60s    Job completion grace period
Systemd timeout     90s    Service shutdown deadline
Capistrano timeout  60s    Deployment wait time

Sidekiq responds to these signals:

  • TERM (default): Graceful shutdown with timeout
  • INT: Same as TERM
  • TSTP: Quiet mode (stop accepting new jobs, continue running)
  • TTIN: Print thread backtraces to log (debugging)
  • KILL: Immediate termination (no cleanup)

Check systemd logs:

journalctl -u sidekiq-heatwave-production-sidekiq -n 100

Look for messages along these lines (exact wording depends on the systemd and Sidekiq versions):

  • “Timeout during operation” - Systemd killed it (increase TimeoutStopSec)
  • “SIGTERM received” - Check if jobs are honoring timeout
  • “Forcing shutdown” - Jobs exceeded Sidekiq timeout

If deployments hang waiting for Sidekiq:

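  1. Check for stuck jobs on the Busy page of the Sidekiq Web UI, or inspect Sidekiq::WorkSet.new from a Rails console (sidekiqctl was removed in Sidekiq 6, so it is not available on 7.3.9)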
  2. Consider reducing timeout if jobs normally complete quickly
  3. Verify no infinite loops in worker code

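Sidekiq::Shutdown exceptions can still appear in error tracking for jobs that exceed the timeout:

  • Expected behavior for jobs exceeding timeout: Sidekiq pushes the interrupted job back onto its queue, so it runs again after the restart rather than being lost
  • Solution: Review job duration, break into smaller jobs, or increase timeout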

Best practices:

  1. Keep jobs short: Target < 30 seconds when possible
  2. Make jobs idempotent: Safe to run again after an interruption without duplicating side effects
  3. Checkpoint progress: Save intermediate state for long jobs
  4. Handle interruptions:
    def perform
      work
    rescue Sidekiq::Shutdown
      cleanup_and_save_progress
      raise # Let Sidekiq requeue the job
    end

Monitoring:

  1. Track job duration: Alert on jobs approaching timeout
  2. Monitor shutdown errors: Rollbar Sidekiq::Shutdown count
  3. Review retry queue: Jobs repeatedly interrupted may need redesign
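
Duration tracking can be as simple as timing each job against the configured timeout. A minimal sketch, assuming a 60-second timeout and an 80% warning threshold (both illustrative); `near_timeout?` and `timed` are hypothetical helpers, and a real setup would more likely live in a Sidekiq server middleware and report to Rollbar or a metrics system.

```ruby
SIDEKIQ_TIMEOUT = 60.0 # seconds, matching config[:timeout]
WARN_RATIO = 0.8       # warn once a job uses 80% of the timeout (assumption)

# True when a job's duration is close enough to the timeout to be at risk.
def near_timeout?(elapsed_seconds, timeout: SIDEKIQ_TIMEOUT, ratio: WARN_RATIO)
  elapsed_seconds >= timeout * ratio
end

# Runs the block and returns [result, elapsed_seconds] using a monotonic clock.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

result, elapsed = timed { :done }
at_risk = [10, 47, 49, 75].map { |seconds| near_timeout?(seconds) }
```

Jobs flagged this way are the ones most likely to be interrupted by a deployment, so they are good candidates for splitting or checkpointing.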

Last Updated: October 9, 2025
Configuration Version: Sidekiq 7.3.9, Rails 7.0.8.7