SEO Metrics Data Model

Overview

This document describes the time-series data model for SEO metrics, replacing the
previous "wide table" approach with a flexible fact table design.

Design Goals

  1. Temporal accuracy: Each metric knows exactly what time period it covers
  2. Flexibility: Add new metrics without schema changes
  3. Granularity: Support daily, weekly, monthly, or custom periods
  4. Context preservation: Keyword-specific metrics retain their keyword reference
  5. Query efficiency: Optimized for trend analysis and period comparisons

Schema

CREATE TYPE seo_metric_type AS ENUM (
  -- Google Search Console (28-day periods typically)
  'gsc_clicks',
  'gsc_impressions', 
  'gsc_ctr',
  'gsc_avg_position',
  
  -- Google Analytics 4 (flexible periods)
  'ga4_page_views',
  'ga4_sessions',
  'ga4_users',
  'ga4_bounce_rate',
  'ga4_engagement_rate',
  'ga4_avg_session_duration',
  
  -- Ahrefs (point-in-time snapshots)
  'ahrefs_traffic',
  'ahrefs_keywords_count',
  'ahrefs_traffic_value',
  'ahrefs_domain_rating',
  
  -- Keyword-specific (requires reference)
  'keyword_position',
  'keyword_search_volume',
  'keyword_traffic_share',
  
  -- Internal (first-party)
  'visit_count'
);

CREATE TABLE site_map_data_points (
  id              BIGSERIAL PRIMARY KEY,
  site_map_id     BIGINT NOT NULL REFERENCES site_maps(id) ON DELETE CASCADE,
  
  -- Temporal
  recorded_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),  -- When we captured this
  period          DATERANGE NOT NULL,                   -- Time range this covers
  
  -- Metric identification
  metric_type     seo_metric_type NOT NULL,
  reference       TEXT,                                 -- Context (e.g., keyword text)
  
  -- Value (decimal handles all numeric types)
  value           DECIMAL(15, 4) NOT NULL,
  
  -- Metadata
  source_batch_id UUID,                                 -- Group related data points
  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Primary query pattern: get metrics for a site_map in a date range
CREATE INDEX idx_data_points_lookup 
  ON site_map_data_points (site_map_id, metric_type, period)
  USING GIST (period);

-- Find overlapping periods
CREATE INDEX idx_data_points_period
  ON site_map_data_points USING GIST (period);

-- Query by source batch
CREATE INDEX idx_data_points_batch
  ON site_map_data_points (source_batch_id)
  WHERE source_batch_id IS NOT NULL;

Usage Patterns

Recording Metrics

# GSC sync (28-day window)
SiteMapDataPoint.create!(
  site_map: site_map,
  period: (28.days.ago.to_date..Date.yesterday),
  metric_type: :gsc_clicks,
  value: 1234,
  source_batch_id: batch_id
)

# GA4 sync (30-day window)  
SiteMapDataPoint.create!(
  site_map: site_map,
  period: (30.days.ago.to_date..Date.yesterday),
  metric_type: :ga4_page_views,
  value: 5678,
  source_batch_id: batch_id
)

# Keyword position (point-in-time, single day)
SiteMapDataPoint.create!(
  site_map: site_map,
  period: (Date.current..Date.current),
  metric_type: :keyword_position,
  reference: 'heated driveway cost',
  value: 3.2,
  source_batch_id: batch_id
)

Querying Trends

# Get GA4 page views trend over last 6 months
site_map.data_points
  .where(metric_type: :ga4_page_views)
  .where("period && daterange(?, ?)", 6.months.ago.to_date, Date.current)
  .order(:recorded_at)

# Compare two periods
current_period = (30.days.ago.to_date..Date.yesterday)
previous_period = (60.days.ago.to_date..31.days.ago.to_date)

current = site_map.data_points
  .where(metric_type: :ga4_page_views)
  .where("period @> ?::date", 15.days.ago.to_date)
  .order(recorded_at: :desc)
  .first&.value

previous = site_map.data_points
  .where(metric_type: :ga4_page_views)  
  .where("period @> ?::date", 45.days.ago.to_date)
  .order(recorded_at: :desc)
  .first&.value

# Track keyword position over time
site_map.data_points
  .where(metric_type: :keyword_position)
  .where(reference: 'heated driveway cost')
  .order(:recorded_at)
  .pluck(:recorded_at, :value)

PostgreSQL Range Operators

Operator Meaning
@> Contains element
<@ Contained by
&& Overlaps
-|- Adjacent to

Migration Strategy

Phase 1: Create New Table

  • Create site_map_data_points table
  • Update sync services to write to new table
  • Keep existing site_maps columns for backward compatibility

Phase 2: Backfill

  • Migrate existing seo_metrics_histories data
  • Convert wide rows to individual data points

Phase 3: Update UI

  • Update dashboards to read from new table
  • Add trend visualizations

Phase 4: Cleanup

  • Remove deprecated columns from site_maps
  • Drop seo_metrics_histories table

Advantages Over Previous Design

Aspect Old (Wide Table) New (Fact Table)
New metrics Migration required Just add enum value
Time periods Single snapshot_date Explicit daterange per metric
Keyword tracking Separate table Same table with reference
Query flexibility Fixed columns Dynamic pivoting
Storage efficiency Sparse NULLs Dense rows
Audit trail Overwritten Append-only history

Considerations

  1. Query complexity: Pivoting requires more SQL, but PostgreSQL handles this well
  2. Index strategy: GIST indexes on daterange are essential
  3. Volume management: Consider partitioning by month if data grows large
  4. Aggregation: May want materialized views for common aggregations