Database Migrations Guide

This guide covers best practices for writing safe database migrations in the Heatwave project, including the use of the online_migrations gem for safer PostgreSQL operations.

Table of Contents


Overview

We use the online_migrations gem to catch unsafe PostgreSQL migrations in development and run them safely in production. This gem:

  • ✅ Detects potentially dangerous operations
  • ✅ Prevents them from running by default
  • ✅ Provides instructions and helpers for safer alternatives
  • ✅ Supports background migrations for large data changes

Configuration: config/initializers/online_migrations.rb


Online Migrations Gem

Installation

The gem is already installed. Tables were created via:

bin/rails generate online_migrations:install
bin/rails db:migrate

Key Tables

Table Purpose
background_migrations Tracks background data migration status
background_migration_jobs Individual batch jobs for data migrations
background_schema_migrations Tracks background schema changes (like index creation)

Safety Checks

The gem automatically checks for dangerous operations. When detected, migrations will fail with helpful error messages.

Common Dangerous Operations

1. Adding a Column with Default Value

❌ Bad - Blocks reads/writes while rewriting entire table:

add_column :users, :admin, :boolean, default: false, null: false

✅ Good - Use the helper:

class AddAdminToUsers < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def change
    add_column_with_default :users, :admin, :boolean, default: false, null: false
  end
end

2. Adding an Index

❌ Bad - Locks table:

add_index :items, :product_line_id

✅ Good - Create concurrently:

class AddIndexToItems < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def change
    add_index :items, :product_line_id, algorithm: :concurrently
  end
end

3. Removing a Column

❌ Bad - Can cause errors until app reboots:

remove_column :users, :legacy_field

✅ Good - Two-step process:

Step 1: Ignore the column in the model:

class User < ApplicationRecord
  self.ignored_columns += ["legacy_field"]
end

Step 2: Deploy, then remove in next migration:

class RemoveLegacyFieldFromUsers < ActiveRecord::Migration[7.0]
  def change
    safety_assured { remove_column :users, :legacy_field }
  end
end

4. Executing Raw SQL

⚠️ Warning - The gem cannot inspect execute calls:

# This will raise an error
execute "DROP INDEX some_index"

✅ Good - Wrap in safety_assured with comment:

# Safe: DROP only removes unused index, no data modified
safety_assured do
  execute "DROP INDEX some_index"
end

Disabling Checks

For operations you've verified are safe:

safety_assured do
  # Your migration code here
end

Always add a comment explaining why it's safe!


Background Data Migrations

Use background migrations for large data changes that would be too slow in a regular migration.

When to Use

  • Backfilling data on large tables (>100k rows)
  • Complex data transformations
  • Operations that could timeout in a regular migration

Creating a Background Migration

Step 1: Create the Migration Class

Create in lib/online_migrations/background_migrations/:

# lib/online_migrations/background_migrations/backfill_user_slugs.rb
module OnlineMigrations
  module BackgroundMigrations
    class BackfillUserSlugs < OnlineMigrations::BackgroundMigration
      def relation
        User.where(slug: nil)
      end

      def process_batch(users)
        users.each do |user|
          user.update_columns(slug: user.name.parameterize)
        end
      end

      # Optional: For progress tracking
      delegate :count, to: :relation
    end
  end
end

Step 2: Enqueue in a Migration

# db/migrate/20251205000001_enqueue_backfill_user_slugs.rb
class EnqueueBackfillUserSlugs < ActiveRecord::Migration[7.0]
  def up
    enqueue_background_data_migration(
      'BackfillUserSlugs',
      batch_size: 1000,        # Records per job
      sub_batch_size: 100,     # Records per sub-batch
      batch_pause: 1           # Seconds between batches
    )
  end

  def down
    remove_background_data_migration('BackfillUserSlugs')
  end
end

Configuration Options

Option Default Description
batch_size 1,000 Records per background job
sub_batch_size 100 Records per sub-batch within a job
batch_pause 0 Seconds to pause between batches
sub_batch_pause_ms 100 Milliseconds between sub-batches
batch_max_attempts 5 Max retries for failed batches

How It Runs

Background migrations are processed by OnlineMigrationsSchedulerWorker:

  1. Runs every 1 minute via Sidekiq scheduler
  2. Uses dedicated online_migrations queue (max 1 concurrent process)
  3. Processes one batch at a time
  4. Automatically pauses if database is under load (throttling)

Predefined Background Migrations

The gem includes helpers for common operations:

# Backfill a column with a value
backfill_column_in_background(:users, :status, 'active')

# Copy data between columns
copy_column_in_background(:users, :legacy_name, :name)

# Delete orphaned records
delete_orphaned_records_in_background(:comments, :post)

# Reset counter caches
reset_counters_in_background(:posts, :comments_count)

# Perform action on relation
perform_action_on_relation_in_background(
  'User',
  { verified: false },
  :send_verification_email
)

Depending on Migrated Data

If your code depends on the migration being complete:

# In a later migration
class RequireBackfilledData < ActiveRecord::Migration[7.0]
  def up
    ensure_background_data_migration_succeeded('BackfillUserSlugs')
    # Now safe to add NOT NULL constraint
    change_column_null :users, :slug, false
  end
end

Background Schema Migrations

For long-running schema changes like creating indexes on large tables.

Creating an Index in Background

class AddIndexToLargeTable < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def up
    add_index_in_background :items, :sku, algorithm: :concurrently
  end

  def down
    remove_index :items, :sku
  end
end

Configuration

In config/initializers/online_migrations.rb:

OnlineMigrations.configure do |config|
  config.background_schema_migrations.max_attempts = 5
  config.background_schema_migrations.statement_timeout = 6.hours

  config.background_schema_migrations.error_handler = ->(error, migration) do
    Rails.logger.error "[BackgroundSchemaMigration] #{migration.migration_name} failed"
    Sentry.capture_exception(error) if defined?(Sentry)
  end
end

Common Migration Patterns

Adding a Foreign Key

class AddForeignKey < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def change
    add_foreign_key :orders, :customers, validate: false
    validate_foreign_key :orders, :customers
  end
end

Changing Column Type

class ChangeColumnType < ActiveRecord::Migration[7.0]
  def up
    # Step 1: Add new column
    add_column :users, :age_new, :integer
    
    # Step 2: Copy data (use background migration for large tables)
    copy_column_in_background(:users, :age, :age_new)
  end
end

# Later migration after background job completes:
class FinalizeColumnTypeChange < ActiveRecord::Migration[7.0]
  def up
    ensure_background_data_migration_succeeded('CopyColumn', arguments: ['users', 'age', 'age_new'])
    
    safety_assured do
      remove_column :users, :age
      rename_column :users, :age_new, :age
    end
  end
end

Adding NOT NULL Constraint

class AddNotNullConstraint < ActiveRecord::Migration[7.0]
  def up
    # First ensure all rows have values (via background migration if large)
    add_not_null_constraint :users, :email, validate: false
    validate_not_null_constraint :users, :email
  end
end

Renaming a Table

⚠️ Avoid if possible. If necessary:

  1. Create a view with old name pointing to new table
  2. Update code to use new name
  3. Drop view in later migration

Testing Migrations

Local Testing

# Run pending migrations
bundle exec rails db:migrate

# Rollback last migration
bundle exec rails db:rollback

# Rollback specific migration
bundle exec rails db:migrate:down VERSION=20251205000001

# Check migration status
bundle exec rails db:migrate:status

Testing Background Migrations

# In development/test, background migrations run inline during db:migrate
# To test manually:
bundle exec rails runner "OnlineMigrations.run_background_data_migrations"

Monitoring & Troubleshooting

Check Background Migration Status

# Find migration
migration = OnlineMigrations::BackgroundMigrations::Migration.find_by(
  migration_name: 'BackfillUserSlugs'
)

# Check status
migration.status        # => "enqueued", "running", "succeeded", "failed"
migration.progress      # => 0.75 (75% complete)
migration.rows_count    # => 50000 (total rows)
migration.started_at
migration.finished_at

# Check individual jobs
migration.migration_jobs.where(status: 'failed')

Rake Tasks

# Check status of all background migrations
bundle exec rake online_migrations:status

# Run background migrations manually
bundle exec rake online_migrations:run_background_migrations

Common Issues

Migration Stuck in "enqueued"

The scheduler might not be running. Check:

  1. Sidekiq is running with online_migrations queue
  2. OnlineMigrationsSchedulerWorker is scheduled
# Check if scheduler is working
Sidekiq::ScheduledSet.new.select { |j| j.klass == 'OnlineMigrationsSchedulerWorker' }

Failed Jobs

# Find failed jobs
failed_jobs = migration.migration_jobs.where(status: 'failed')
failed_jobs.each do |job|
  puts "Job #{job.id}: #{job.error_class} - #{job.error_message}"
end

# Retry failed jobs
migration.retry_failed_jobs

Throttling Active

Check if database load is triggering throttling:

# In rails console
OnlineMigrations.config.throttler.call
# => true means throttling is active

Quick Reference

Migration Helpers

Helper Use Case
add_column_with_default Add column with default safely
add_index(..., algorithm: :concurrently) Create index without locking
add_foreign_key(..., validate: false) Add FK without validation
validate_foreign_key Validate FK separately
safety_assured { } Bypass safety check (use carefully!)
enqueue_background_data_migration Queue large data migration
ensure_background_data_migration_succeeded Verify migration completed

Sidekiq Configuration

  • Queue: online_migrations
  • Concurrency: 1 (max 1 process)
  • Scheduler: Runs every 1 minute
  • Worker: OnlineMigrationsSchedulerWorker

Files

File Purpose
config/initializers/online_migrations.rb Gem configuration
lib/online_migrations/background_migrations/ Custom background migrations
app/workers/online_migrations_scheduler_worker.rb Sidekiq scheduler
config/sidekiq*.yml Queue configuration

Pruning old migration files

Migration .rb files can be deleted once they are old and universally
applied — unlike Scenic db/views/*_vNN.sql files (see the migrations skill's
"keep all versions" note), which must be retained because migrations reference
exact versions. The two are easy to conflate; the difference:

  • Migration files have no cross-file dependencies (no require_relative,
    Rake::Task[…], or load between them) and every applied version is recorded
    in db/structure.sql's schema_migrations INSERT. A fresh db:schema:load
    reads structure.sql, so pruned versions stay marked applied; db:migrate only
    runs pending files. structure.sql routinely holds more version rows than
    there are files (file-less versions from prior prunes) — proof the project
    already runs this way.
  • Scenic view SQL files are read lazily by create_view … version: N /
    revert_to_version: N, so deleting one breaks fresh-DB replay and rollback.

Procedure (rolling cleanup)

  1. Pick a cutoff (everything dated ≤ N months ago). All such migrations must be
    applied in every live + dev DB — true for anything past one production-snapshot
    cycle.
  2. Confirm zero cross-file deps in the prune set:
    git grep -lE "require_relative|Rake::Task\[|load File" HEAD -- 'db/migrate/<range>*'
    (expect none). If a migration invokes a rake task or another file, retire them
    together (e.g. the 2024-10 import_nodes migration invoked
    amazon:import_amazon_browse_nodes).
  3. Confirm structure.sql records the boundary versions:
    grep "('<version>')" db/structure.sql.
  4. git rm the files (reversible; structure.sql is untouched).
  5. Re-point doc citations. Grep docs for the deleted paths
    (grep -rnoE "db/migrate/[0-9]{14}_[a-z0-9_]+\.rb" --include=*.md --include=*.md .)
    and update any that cite a deleted file as a canonical example to a surviving
    one — or inline the snippet so the next prune can't orphan it. Historical
    pointers in task/feature/legacy docs can stay (git history keeps the file).

First prune (2026-06-04): removed all 713 migrations dated ≤ 2025-12
(1293 → 580; db/migrate 5.9M → 2.7M, which also trims the deployed image since
migrations run from it). The migrations skill's dev-gate precedent was re-pointed
from the deleted 20250313210532_refresh_amazon_schemas… to the surviving
20260525050027_scrub_null_click_ids… per step 5.


References