Database Migrations Guide

This guide covers best practices for writing safe database migrations in the Heatwave project, including the use of the online_migrations gem for safer PostgreSQL operations.

Overview
Online Migrations Gem
Safety Checks
Background Data Migrations
Background Schema Migrations
Common Migration Patterns
Testing Migrations
Monitoring & Troubleshooting

Overview

We use the online_migrations gem to catch unsafe PostgreSQL migrations in development and run them safely in production. This gem:

✅ Detects potentially dangerous operations
✅ Prevents them from running by default
✅ Provides instructions and helpers for safer alternatives
✅ Supports background migrations for large data changes

Configuration: config/initializers/online_migrations.rb

Online Migrations Gem

Installation

The gem is already installed. Tables were created via:

bin/rails generate online_migrations:install
bin/rails db:migrate

Key Tables

Table	Purpose
`background_migrations`	Tracks background data migration status
`background_migration_jobs`	Individual batch jobs for data migrations
`background_schema_migrations`	Tracks background schema changes (like index creation)

Safety Checks

The gem automatically checks for dangerous operations. When detected, migrations will fail with helpful error messages.

Common Dangerous Operations

1. Adding a Column with Default Value

❌ Bad - Blocks reads/writes while rewriting entire table:

add_column :users, :admin, :boolean, default: false, null: false

✅ Good - Use the helper:

class AddAdminToUsers < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def change
    add_column_with_default :users, :admin, :boolean, default: false, null: false
  end
end

2. Adding an Index

❌ Bad - Locks table:

add_index :items, :product_line_id

✅ Good - Create concurrently:

class AddIndexToItems < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def change
    add_index :items, :product_line_id, algorithm: :concurrently
  end
end

3. Removing a Column

❌ Bad - Can cause errors until app reboots:

remove_column :users, :legacy_field

✅ Good - Two-step process:

Step 1: Ignore the column in the model:

class User < ApplicationRecord
  self.ignored_columns += ["legacy_field"]
end

Step 2: Deploy, then remove in next migration:

class RemoveLegacyFieldFromUsers < ActiveRecord::Migration[7.0]
  def change
    safety_assured { remove_column :users, :legacy_field }
  end
end

4. Executing Raw SQL

⚠️ Warning - The gem cannot inspect execute calls:

# This will raise an error
execute "DROP INDEX some_index"

✅ Good - Wrap in safety_assured with comment:

# Safe: DROP only removes unused index, no data modified
safety_assured do
  execute "DROP INDEX some_index"
end

Disabling Checks

For operations you’ve verified are safe:

safety_assured do
  # Your migration code here
end

Always add a comment explaining why it’s safe!

Background Data Migrations

Use background migrations for large data changes that would be too slow in a regular migration.

When to Use

Backfilling data on large tables (>100k rows)
Complex data transformations
Operations that could timeout in a regular migration

Creating a Background Migration

Step 1: Create the Migration Class

Create in lib/online_migrations/background_migrations/:

module OnlineMigrations
  module BackgroundMigrations
    class BackfillUserSlugs < OnlineMigrations::BackgroundMigration
      def relation
        User.where(slug: nil)
      end

      def process_batch(users)
        users.each do |user|
          user.update_columns(slug: user.name.parameterize)
        end
      end

      # Optional: For progress tracking
      delegate :count, to: :relation
    end
  end
end

Step 2: Enqueue in a Migration

class EnqueueBackfillUserSlugs < ActiveRecord::Migration[7.0]
  def up
    enqueue_background_data_migration(
      'BackfillUserSlugs',
      batch_size: 1000,        # Records per job
      sub_batch_size: 100,     # Records per sub-batch
      batch_pause: 1           # Seconds between batches
    )
  end

  def down
    remove_background_data_migration('BackfillUserSlugs')
  end
end

Configuration Options

Option	Default	Description
`batch_size`	1,000	Records per background job
`sub_batch_size`	100	Records per sub-batch within a job
`batch_pause`	0	Seconds to pause between batches
`sub_batch_pause_ms`	100	Milliseconds between sub-batches
`batch_max_attempts`	5	Max retries for failed batches

How It Runs

Background migrations are processed by OnlineMigrationsSchedulerWorker:

Runs every 1 minute via Sidekiq scheduler
Uses dedicated online_migrations queue (max 1 concurrent process)
Processes one batch at a time
Automatically pauses if database is under load (throttling)

Predefined Background Migrations

The gem includes helpers for common operations:

# Backfill a column with a value
backfill_column_in_background(:users, :status, 'active')

# Copy data between columns
copy_column_in_background(:users, :legacy_name, :name)

# Delete orphaned records
delete_orphaned_records_in_background(:comments, :post)

# Reset counter caches
reset_counters_in_background(:posts, :comments_count)

# Perform action on relation
perform_action_on_relation_in_background(
  'User',
  { verified: false },
  :send_verification_email
)

Depending on Migrated Data

If your code depends on the migration being complete:

# In a later migration
class RequireBackfilledData < ActiveRecord::Migration[7.0]
  def up
    ensure_background_data_migration_succeeded('BackfillUserSlugs')
    # Now safe to add NOT NULL constraint
    change_column_null :users, :slug, false
  end
end

Background Schema Migrations

For long-running schema changes like creating indexes on large tables.

Creating an Index in Background

class AddIndexToLargeTable < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def up
    add_index_in_background :items, :sku, algorithm: :concurrently
  end

  def down
    remove_index :items, :sku
  end
end

Configuration

In config/initializers/online_migrations.rb:

OnlineMigrations.configure do |config|
  config.background_schema_migrations.max_attempts = 5
  config.background_schema_migrations.statement_timeout = 6.hours

  config.background_schema_migrations.error_handler = ->(error, migration) do
    Rails.logger.error "[BackgroundSchemaMigration] #{migration.migration_name} failed"
    Sentry.capture_exception(error) if defined?(Sentry)
  end
end

Common Migration Patterns

Adding a Foreign Key

class AddForeignKey < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def change
    add_foreign_key :orders, :customers, validate: false
    validate_foreign_key :orders, :customers
  end
end

Changing Column Type

class ChangeColumnType < ActiveRecord::Migration[7.0]
  def up
    # Step 1: Add new column
    add_column :users, :age_new, :integer

    # Step 2: Copy data (use background migration for large tables)
    copy_column_in_background(:users, :age, :age_new)
  end
end

# Later migration after background job completes:
class FinalizeColumnTypeChange < ActiveRecord::Migration[7.0]
  def up
    ensure_background_data_migration_succeeded('CopyColumn', arguments: ['users', 'age', 'age_new'])

    safety_assured do
      remove_column :users, :age
      rename_column :users, :age_new, :age
    end
  end
end

Adding NOT NULL Constraint

class AddNotNullConstraint < ActiveRecord::Migration[7.0]
  def up
    # First ensure all rows have values (via background migration if large)
    add_not_null_constraint :users, :email, validate: false
    validate_not_null_constraint :users, :email
  end
end

Renaming a Table

⚠️ Avoid if possible. If necessary:

Create a view with old name pointing to new table
Update code to use new name
Drop view in later migration

Testing Migrations

Local Testing

# Run pending migrations
bundle exec rails db:migrate

# Rollback last migration
bundle exec rails db:rollback

# Rollback specific migration
bundle exec rails db:migrate:down VERSION=20251205000001

# Check migration status
bundle exec rails db:migrate:status

Testing Background Migrations

# In development/test, background migrations run inline during db:migrate
# To test manually:
bundle exec rails runner "OnlineMigrations.run_background_data_migrations"

Monitoring & Troubleshooting

Check Background Migration Status

# Find migration
migration = OnlineMigrations::BackgroundMigrations::Migration.find_by(
  migration_name: 'BackfillUserSlugs'
)

# Check status
migration.status        # => "enqueued", "running", "succeeded", "failed"
migration.progress      # => 0.75 (75% complete)
migration.rows_count    # => 50000 (total rows)
migration.started_at
migration.finished_at

# Check individual jobs
migration.migration_jobs.where(status: 'failed')

Rake Tasks

# Check status of all background migrations
bundle exec rake online_migrations:status

# Run background migrations manually
bundle exec rake online_migrations:run_background_migrations

Common Issues

Migration Stuck in “enqueued”

The scheduler might not be running. Check:

Sidekiq is running with online_migrations queue
OnlineMigrationsSchedulerWorker is scheduled

# Check if scheduler is working
Sidekiq::ScheduledSet.new.select { |j| j.klass == 'OnlineMigrationsSchedulerWorker' }

Failed Jobs

# Find failed jobs
failed_jobs = migration.migration_jobs.where(status: 'failed')
failed_jobs.each do |job|
  puts "Job #{job.id}: #{job.error_class} - #{job.error_message}"
end

# Retry failed jobs
migration.retry_failed_jobs

Throttling Active

Check if database load is triggering throttling:

# In rails console
OnlineMigrations.config.throttler.call
# => true means throttling is active

Quick Reference

Migration Helpers

Helper	Use Case
`add_column_with_default`	Add column with default safely
`add_index(..., algorithm: :concurrently)`	Create index without locking
`add_foreign_key(..., validate: false)`	Add FK without validation
`validate_foreign_key`	Validate FK separately
`safety_assured { }`	Bypass safety check (use carefully!)
`enqueue_background_data_migration`	Queue large data migration
`ensure_background_data_migration_succeeded`	Verify migration completed

Sidekiq Configuration

Queue: online_migrations
Concurrency: 1 (max 1 process)
Scheduler: Runs every 1 minute
Worker: OnlineMigrationsSchedulerWorker

Files

File	Purpose
`config/initializers/online_migrations.rb`	Gem configuration
`lib/online_migrations/background_migrations/`	Custom background migrations
`app/workers/online_migrations_scheduler_worker.rb`	Sidekiq scheduler
`config/sidekiq*.yml`	Queue configuration

Pruning old migration files

Migration .rb files can be deleted once they are old and universally applied — unlike Scenic db/views/*_vNN.sql files (see the migrations skill’s “keep all versions” note), which must be retained because migrations reference exact versions. The two are easy to conflate; the difference:

Migration files have no cross-file dependencies (no require_relative, Rake::Task[…], or load between them) and every applied version is recorded in db/structure.sql’s schema_migrations INSERT. A fresh db:schema:load reads structure.sql, so pruned versions stay marked applied; db:migrate only runs pending files. structure.sql routinely holds more version rows than there are files (file-less versions from prior prunes) — proof the project already runs this way.
Scenic view SQL files are read lazily by create_view … version: N / revert_to_version: N, so deleting one breaks fresh-DB replay and rollback.

Procedure (rolling cleanup)

Pick a cutoff (everything dated ≤ N months ago). All such migrations must be applied in every live + dev DB — true for anything past one production-snapshot cycle.
Confirm zero cross-file deps in the prune set: git grep -lE "require_relative|Rake::Task\[|load File" HEAD -- 'db/migrate/<range>*' (expect none). If a migration invokes a rake task or another file, retire them together (e.g. the 2024-10 import_nodes migration invoked amazon:import_amazon_browse_nodes).
Confirm structure.sql records the boundary versions: grep "('<version>')" db/structure.sql.
git rm the files (reversible; structure.sql is untouched).
Re-point doc citations. Grep docs for the deleted paths (grep -rnoE "db/migrate/[0-9]{14}_[a-z0-9_]+\.rb" --include=*.md --include=*.md .) and update any that cite a deleted file as a canonical example to a surviving one — or inline the snippet so the next prune can’t orphan it. Historical pointers in task/feature/legacy docs can stay (git history keeps the file).

First prune (2026-06-04): removed all 713 migrations dated ≤ 2025-12 (1293 → 580; db/migrate 5.9M → 2.7M, which also trims the deployed image since migrations run from it). The migrations skill’s dev-gate precedent was re-pointed from the deleted 20250313210532_refresh_amazon_schemas… to the surviving 20260525050027_scrub_null_click_ids… per step 5.