PostgreSQL 18 Production Tuning (updated 2026-06-15)
Date: November 26, 2025 (originally); re-baselined 2026-06-15 for the Latitude.sh fleet
Server: dal-latitude-heatwave-01 (Dallas primary, tailnet 100.123.47.52) + Chicago standby
Host: Latitude.sh f4.metal.medium (bare metal)
CPU: AMD EPYC 4564P, 16 cores / 32 threads
OS: Ubuntu 26.04 LTS
RAM: 192 GB (187 GiB usable)
PostgreSQL: 18.4
Storage: NVMe (ZFS)
Re-baselined 2026-06-15. Earlier revisions of this doc were written for the now-decommissioned Vultr box
chi-vultr-heatwave-db4(Ubuntu 22.04, ~32 GB RAM) and its memory-sizing math assumed ~32 GB. The live PostgreSQL 18 hosts are bare-metal Latitude.shf4.metal.mediumwith 192 GB RAM. Every numeric value below has been replaced with the value actually read from the Dallas primary (pg_settings) on 2026-06-15. Sections still tied to the old host (the apt/systemctl“Deployment” flow, the/etc/postgresql/...file path) are explicitly marked legacy.
Overview
Section titled “Overview”This document explains the reasoning behind each tuned setting, read live from the Dallas primary
(dal-latitude-heatwave-01) on 2026-06-15. PostgreSQL runs in the heatwave-postgres Kamal
accessory, so the tuning lives in postgresql.conf inside the data dir, not in a host conf.d/
include.
File location — container PGDATA is /var/lib/postgresql/data, bind-mounted from the host ZFS
path /data/prod-replica/data (config/deploy.yml: - /data/prod-replica/data:/var/lib/postgresql/data):
- inside the
heatwave-postgrescontainer:/var/lib/postgresql/data/postgresql.conf - on the Dallas host:
/data/prod-replica/data/postgresql.conf
The legacy host path /etc/postgresql/18/main/conf.d/custom.conf applied only to the decommissioned
Vultr box.
Connection Settings
Section titled “Connection Settings”| Setting | Value | Default | Reasoning |
|---|---|---|---|
listen_addresses | '*' | localhost | Accept connections from all interfaces (Tailscale, etc.) |
max_connections | 200 | 100 | Support Rails connection pool + Sidekiq workers |
idle_in_transaction_session_timeout | 5 min | disabled | Critical: Prevents abandoned transactions from blocking autovacuum and holding locks |
idle_session_timeout | 1 hour | disabled | Clean up completely idle connections |
statement_timeout | 10 min | disabled | Prevent runaway queries from consuming resources indefinitely |
tcp_keepalives_* | 60/10/6 | system | Detect dead connections faster (6 minutes vs system default ~2 hours) |
Why idle_in_transaction_session_timeout Matters
Section titled “Why idle_in_transaction_session_timeout Matters”Long-running transactions prevent autovacuum from cleaning up dead tuples. A single forgotten BEGIN without COMMIT can cause massive table bloat over time.
Memory Settings
Section titled “Memory Settings”Values below are the live settings read from the Dallas primary on 2026-06-15.
| Setting | Value | Default | Reasoning |
|---|---|---|---|
shared_buffers | 32 GB | 128 MB | ~17% of 192 GB - primary database cache. On a 192 GB box the marginal benefit past ~32 GB is small for this OLTP workload; the OS page cache (covered by effective_cache_size) carries the rest |
effective_cache_size | 120 GB | 4 GB | ~63% of RAM - tells the planner how much OS cache is realistically available for index/heap pages |
maintenance_work_mem | 4 GB | 64 MB | Speeds up VACUUM, CREATE INDEX, ALTER TABLE |
work_mem | 96 MB | 4 MB | Per-operation memory for sorts/hashes (careful: multiplied by connections × operations) |
autovacuum_work_mem | 1 GB | use maintenance | Dedicated memory for autovacuum - doesn’t compete with user queries |
huge_pages | try | try | Use huge pages if available (reduces TLB misses) |
hash_mem_multiplier | 2.0 | 2.0 | Hash operations can use 2x work_mem |
Staging runs tuned down on the shared Dallas box:
shared_buffers=8GB,effective_cache_size=24GB(seekamal/MANAGING.md). The values above are the production primary.
Memory Calculation
Section titled “Memory Calculation”Total RAM: 192 GB (187 GiB usable)shared_buffers: 32 GB (~17% of RAM)effective_cache_size: 120 GB (~63% of RAM — planner hint, not an allocation)Remaining for OS + per-connection work_mem + page cache headroom
Per-connection worst case: work_mem × active operations200 connections × 96 MB = ~18.75 GB (theoretical max, rarely hit;hash nodes can take up to hash_mem_multiplier × work_mem each)shared_buffers (32 GB) and effective_cache_size (120 GB) are an explicit allocation; the gap up
to 192 GB is left for the OS page cache plus the per-connection work_mem fan-out above. This is a
deliberate move away from the old ~32 GB-box “25% / 75% of RAM” split — at 192 GB a literal 25/75 split
(48 GB / 144 GB) would over-commit shared_buffers for an OLTP workload while leaving too little
headroom for page cache + connection memory, so the live box pins absolute sizes instead.
Query Planner (NVMe-Optimized)
Section titled “Query Planner (NVMe-Optimized)”Values below are the live settings read from the Dallas primary on 2026-06-15. The local NVMe (under
ZFS) makes the low-random_page_cost / high-effective_io_concurrency stance even more appropriate
than it was on the old SATA-SSD Vultr box.
| Setting | Value | Default | Reasoning |
|---|---|---|---|
random_page_cost | 1.1 | 4.0 | NVMe SSDs have nearly identical random vs sequential read cost |
effective_io_concurrency | 200 | 1 | NVMe can service many concurrent I/O requests |
default_statistics_target | 100 | 100 | Left at the default; raise per-column with ALTER TABLE … ALTER COLUMN … SET STATISTICS where a specific plan needs finer histograms |
jit | off | on | JIT compilation often hurts OLTP workloads - adds latency to simple queries |
Why Disable JIT?
Section titled “Why Disable JIT?”JIT (Just-In-Time compilation) compiles query expressions to native code. Benefits:
- Complex analytical queries with many expressions
- Queries running for seconds/minutes
Drawbacks:
- Compilation overhead (10-100ms) on every query
- Rails apps mostly run simple, fast queries
- Can cause 2-3x slowdown on typical CRUD operations
Recommendation: Keep jit = off for Rails/OLTP. Enable selectively per-session for analytics.
Parallelism
Section titled “Parallelism”Values below are the live settings read from the Dallas primary on 2026-06-15.
| Setting | Value | Default | Reasoning |
|---|---|---|---|
max_worker_processes | 16 | 8 | Total background workers available — sized to the 16-core / 32-thread EPYC 4564P |
max_parallel_workers | 16 | 8 | Workers available for parallel queries |
max_parallel_workers_per_gather | 4 | 2 | Max workers per parallel query |
max_parallel_maintenance_workers | 4 | 2 | Workers for CREATE INDEX, VACUUM |
min_parallel_table_scan_size | 4 MB | 8 MB | Use parallelism for smaller tables |
min_parallel_index_scan_size | 256 kB | 512 kB | Use parallelism for smaller indexes |
max_worker_processes / max_parallel_workers were lifted from 8 to 16 on the Latitude box
(32 hardware threads vs the old Vultr box’s lower core count), so multiple parallel queries can run
concurrently without starving the global worker pool. max_parallel_maintenance_workers,
min_parallel_table_scan_size, and min_parallel_index_scan_size were not re-read in the 2026-06-15
live snapshot — the values shown are the intended postgresql.conf settings; confirm against the
running server if a specific plan depends on them.
Parallelism Strategy
Section titled “Parallelism Strategy”With a large primary database and a separate ~100+ GB heatwave_versions audit DB, parallel queries
significantly speed up:
- Full table scans
- Large aggregations
- Index creation
- VACUUM operations
Write-Ahead Log (WAL)
Section titled “Write-Ahead Log (WAL)”Values below are the live settings read from the Dallas primary on 2026-06-15 (checkpoint_timeout
was not in that snapshot — see note).
| Setting | Value | Default | Reasoning |
|---|---|---|---|
wal_level | replica | replica | Required for streaming replication |
wal_log_hints | on | off | Required for pg_rewind (easy failover recovery) |
wal_buffers | 64 MB | ~4 MB | Larger buffer for write-heavy workloads |
wal_compression | lz4 | off | PG15+ feature - reduces WAL I/O by ~50% |
min_wal_size | 2 GB | 80 MB | Keep more WAL files to reduce recycling overhead |
max_wal_size | 8 GB | 1 GB | Allow more WAL before a forced checkpoint — fewer checkpoint-driven I/O spikes under write bursts |
checkpoint_timeout | 15 min | 5 min | Less frequent checkpoints = less I/O spikes (intended value; not re-read in the 2026-06-15 snapshot) |
checkpoint_completion_target | 0.9 | 0.9 | Spread checkpoint I/O over 90% of interval |
min_wal_size (2 GB) and max_wal_size (8 GB) were both raised vs the old box (1 GB / 4 GB), giving
the WAL more room before a checkpoint is forced — appropriate for the larger NVMe volume and the
write-heavy primary feeding the cross-DC standby.
WAL Compression Benefits
Section titled “WAL Compression Benefits”wal_compression = lz4 (PostgreSQL 15+):
- Compresses full-page writes in WAL
- Reduces WAL volume by 40-60%
- Faster replication to standby
- Less disk I/O
- Minimal CPU overhead (lz4 is very fast)
Autovacuum (Critical Section)
Section titled “Autovacuum (Critical Section)”Values below are the intended autovacuum settings in postgresql.conf. The 2026-06-15 live snapshot
read only the memory/planner/WAL/parallelism settings, so none of the autovacuum values in the
table below were re-verified that day — confirm any you depend on with SHOW <name>; on the live
box: autovacuum_max_workers, autovacuum_naptime, autovacuum_vacuum_cost_delay,
autovacuum_vacuum_cost_limit, autovacuum_vacuum_scale_factor, autovacuum_analyze_scale_factor,
vacuum_buffer_usage_limit.
| Setting | Value | Default | Reasoning |
|---|---|---|---|
autovacuum_max_workers | 5 | 3 | More workers for a large database with many tables |
autovacuum_naptime | 30s | 1 min | Check for vacuum-eligible tables more often |
autovacuum_vacuum_cost_delay | 0 | 2 ms | No throttling - let vacuum run at full speed |
autovacuum_vacuum_cost_limit | 2000 | 200 | 10x more work per cycle before sleeping |
autovacuum_vacuum_scale_factor | 0.05 | 0.2 | Vacuum when 5% dead tuples (not 20%) |
autovacuum_analyze_scale_factor | 0.025 | 0.1 | Analyze when 2.5% rows changed |
vacuum_buffer_usage_limit | 1 GB | 256 kB | Critical - 4000x more memory for vacuum! |
Why These Autovacuum Settings?
Section titled “Why These Autovacuum Settings?”Problem Discovered (historical — diagnosed on the old Vultr box, November 26, 2025): analysis revealed severe table bloat. The aggressive autovacuum settings below were the fix and remain in force on the Latitude box:
| Table | Dead Tuples | Live Tuples | Bloat % | Ever Vacuumed? |
|---|---|---|---|---|
| subscribers | 332,361 | 6,706 | 4,956% | Never |
| line_items | 490,518 | 127,216 | 386% | Never |
| communications | 277,040 | 173,090 | 160% | Never |
Root Causes:
vacuum_buffer_usage_limit = 256kB(default) - vacuum could only use 256 KB of memory!autovacuum_vacuum_cost_delay = 2ms- vacuum was heavily throttledautovacuum_vacuum_scale_factor = 0.2- large tables needed 20% dead tuples to trigger- Only 3 autovacuum workers for many tables
Solution: Aggressive autovacuum settings that:
- Remove all throttling (
cost_delay = 0) - Provide adequate memory (
vacuum_buffer_usage_limit = 1GB) - Trigger earlier (
scale_factor = 0.05) - Run more workers (
max_workers = 5)
Logging
Section titled “Logging”| Setting | Value | Default | Reasoning |
|---|---|---|---|
log_min_duration_statement | 1000 | disabled | Log queries taking > 1 second |
log_checkpoints | on | on | Track checkpoint frequency and duration |
log_lock_waits | on | off | Log when queries wait for locks > deadlock_timeout |
log_temp_files | 10 MB | disabled | Alert when queries spill to disk |
log_autovacuum_min_duration | 0 | 10 min | Log ALL autovacuum activity (critical for monitoring) |
log_connections | on | off | Track connection patterns |
log_disconnections | on | off | Track disconnection patterns |
Monitoring Autovacuum
Section titled “Monitoring Autovacuum”With log_autovacuum_min_duration = 0, every autovacuum run is logged:
LOG: automatic vacuum of table "heatwave.public.subscribers": ... tuples: 6706 removed, 6706 remain, 0 are dead but not yet removable buffer usage: 1234 hits, 567 misses, 89 dirtied avg read rate: 12.345 MB/s, avg write rate: 1.234 MB/sStatistics & Monitoring
Section titled “Statistics & Monitoring”| Setting | Value | Default | Reasoning |
|---|---|---|---|
track_io_timing | on | off | Track I/O time in EXPLAIN ANALYZE |
track_wal_io_timing | on | off | Track WAL write timing |
track_functions | pl | none | Track PL/pgSQL function execution |
track_activity_query_size | 4096 | 1024 | Capture longer queries in pg_stat_activity |
compute_query_id | on | auto | Required for pg_stat_statements query grouping |
Extensions
Section titled “Extensions”Required Extensions
Section titled “Required Extensions”The following extensions must be installed for full functionality:
-- Core monitoring (loaded via shared_preload_libraries)CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Hypothetical indexes for AI-powered index tuning (used by postgres-mcp)CREATE EXTENSION IF NOT EXISTS hypopg;
-- Table/index reorganization without locksCREATE EXTENSION IF NOT EXISTS pg_repack;| Extension | Purpose | Requires Restart? |
|---|---|---|
pg_stat_statements | Query performance tracking | Yes (shared_preload) |
auto_explain | Automatic query plan logging | Yes (shared_preload) |
hypopg | Test hypothetical indexes without creating them | No |
pg_repack | Online table/index defragmentation | No |
Installing Extensions
Section titled “Installing Extensions”On the live Latitude hosts PostgreSQL runs in the heatwave-postgres Kamal accessory, and the image
already ships hypopg and pg_repack alongside the in-core pg_stat_statements / auto_explain, so
there is nothing to apt install on the host — just CREATE EXTENSION as shown above.
The host-level package install below was the legacy Vultr (Ubuntu 22.04) method, kept for reference:
# pg_stat_statements and auto_explain are included with PostgreSQL# hypopg and pg_repack needed separate packages on the bare-host install:
apt updateapt install postgresql-18-hypopg postgresql-18-repackpg_stat_statements
Section titled “pg_stat_statements”Tracks query statistics for performance analysis:
SELECT query, calls, mean_exec_time, total_exec_timeFROM pg_stat_statementsORDER BY total_exec_time DESCLIMIT 20;| Setting | Value | Reasoning |
|---|---|---|
pg_stat_statements.max | 10000 | Track more unique queries |
pg_stat_statements.track | all | Track all statements including nested |
pg_stat_statements.track_utility | on | Track DDL, VACUUM, etc. |
pg_stat_statements.track_planning | on | Include planning time |
auto_explain
Section titled “auto_explain”Automatically logs execution plans for slow queries:
| Setting | Value | Reasoning |
|---|---|---|
auto_explain.log_min_duration | 1000 | Log plans for queries > 1 second |
auto_explain.log_format | json | Machine-parseable format |
auto_explain.log_analyze | on | Include actual row counts |
auto_explain.log_buffers | on | Include buffer usage |
auto_explain.log_timing | off | Reduces overhead (timing per-node is expensive) |
auto_explain.log_settings | on | Show non-default settings affecting query |
Scheduled Maintenance (Cron Jobs)
Section titled “Scheduled Maintenance (Cron Jobs)”In addition to autovacuum, scheduled maintenance jobs ensure comprehensive database health.
Crontab Configuration
Section titled “Crontab Configuration”Location: /var/spool/cron/crontabs/root or crontab -e as root
# ─────────────────────────────────────────────────────────────────────────────# heatwave database (main application - 65 GB)# ─────────────────────────────────────────────────────────────────────────────0 2 * * * su -p -s /bin/false -c "vacuumdb -e -v -z heatwave" postgres 2>&1 | mail -s "vacuum report: heatwave" dba@warmlyyours.com0 8 * * 6 su -p -s /bin/false -c "pg_repack -d heatwave -j 4" postgres 2>&1 | mail -s "pg_repack report: heatwave" dba@warmlyyours.com
# ─────────────────────────────────────────────────────────────────────────────# heatwave_versions database (audit trail - 162 GB)# ─────────────────────────────────────────────────────────────────────────────0 4 * * * su -p -s /bin/false -c "vacuumdb -e -v -z heatwave_versions" postgres 2>&1 | mail -s "vacuum report: heatwave_versions" dba@warmlyyours.com0 8 * * 0 su -p -s /bin/false -c "pg_repack -d heatwave_versions -j 4" postgres 2>&1 | mail -s "pg_repack report: heatwave_versions" dba@warmlyyours.comContainerized deployment (systemd timers)
Section titled “Containerized deployment (systemd timers)”On the Kamal box PostgreSQL runs in the heatwave-postgres accessory, so maintenance runs
inside the container (the image ships vacuumdb + pg_repack) driven by systemd timers
on the host — cron isn’t installed (systemd-first box), so this replaces the root crontab above.
Same schedule, same dba@ reports via the host postfix→SendGrid relay.
Runs on the Dallas primary only.
vacuumdb/pg_repackare write operations and cannot run on the Chicago read-only standby (it just replicates the primary’s cleaned-up state). The timers are provisioned on each box but do real work only on whichever node is currently primary.
- Script:
/usr/local/sbin/pg-maintenance.sh {vacuum-main|vacuum-versions|repack-main|repack-versions}→docker exec -u postgres heatwave-postgres vacuumdb|pg_repack … 2>&1 | mail -s "… [<host>]" dba@warmlyyours.com - Units:
pg-maintenance@.service(templated) +pg-maintenance-{vacuum-main,vacuum-versions,repack-main,repack-versions}.timer. - Schedule is in UTC (the box’s system TZ): vacuum 02:00 / 04:00 daily, pg_repack Sat / Sun 08:00.
- Inspect:
systemctl list-timers 'pg-maintenance-*'· run on demand:systemctl start pg-maintenance@repack-main.service.
Schedule Overview
Section titled “Schedule Overview”| Time | Daily | Saturday | Sunday |
|---|---|---|---|
| 2:00 AM | vacuum heatwave | vacuum heatwave | vacuum heatwave |
| 4:00 AM | vacuum heatwave_versions | vacuum heatwave_versions | vacuum heatwave_versions |
| 8:00 AM | — | pg_repack heatwave | pg_repack heatwave_versions |
Job Details
Section titled “Job Details”| Database | Job | Schedule | Tool | Purpose |
|---|---|---|---|---|
| heatwave | Daily vacuum | 2am daily | vacuumdb -e -v -z | Remove dead tuples, update statistics |
| heatwave | Weekly repack | Saturday 8am | pg_repack -j 4 | Reclaim disk space, defragment tables |
| heatwave_versions | Daily vacuum | 4am daily | vacuumdb -e -v -z | Remove dead tuples, update statistics |
| heatwave_versions | Weekly repack | Sunday 8am | pg_repack -j 4 | Reclaim disk space, defragment tables |
Command Options
Section titled “Command Options”vacuumdb options:
-e/--echo: Show commands being sent to server-v/--verbose: Detailed progress output-z/--analyze: Update statistics after vacuum
pg_repack options:
-a/--all: Repack all databases-j 4/--jobs=4: Use 4 parallel workers
Why Both Autovacuum AND Scheduled Jobs?
Section titled “Why Both Autovacuum AND Scheduled Jobs?”| Aspect | Autovacuum | Scheduled VACUUM | pg_repack |
|---|---|---|---|
| Trigger | Dead tuple threshold | Time-based | Time-based |
| Coverage | Only tables needing it | All tables | All tables |
| Disk space | Marks space reusable | Marks space reusable | Returns to OS |
| Locks table | No | No | No |
| Best for | Continuous maintenance | Catch-all, statistics | Heavy bloat |
Autovacuum handles day-to-day cleanup but may fall behind during high-write periods.
Scheduled VACUUM ensures every table gets analyzed for query planner statistics, even if autovacuum didn’t trigger.
pg_repack is essential because regular VACUUM only marks space as reusable within PostgreSQL—it doesn’t shrink files. Over time, tables can have significant “internal fragmentation.” pg_repack rebuilds tables online, returning space to the OS.
Monitoring
Section titled “Monitoring”Check vacuum logs (live Latitude box — PG runs in the heatwave-postgres container):
# View recent vacuum activity from the container's PG logdocker logs heatwave-postgres 2>&1 | grep -i vacuum | tail -50
# Check mail reports (host postfix → SendGrid relay)cat /var/mail/root | grep -A 100 "vacuum report"Check if the scheduled maintenance jobs are active (systemd timers, not cron):
systemctl list-timers 'pg-maintenance-*'Troubleshooting
Section titled “Troubleshooting”Jobs not running?
- Live box: check the systemd timers —
systemctl list-timers 'pg-maintenance-*'andsystemctl status pg-maintenance@vacuum-main.service - Check mail delivery:
cat /var/mail/root - (Legacy Vultr/cron host only) ensure jobs are NOT commented out and the cron daemon is up:
crontab -l | grep -E "(vacuum|repack)"·systemctl status cron
pg_repack fails?
- Ensure extension is installed:
CREATE EXTENSION IF NOT EXISTS pg_repack; - Check for exclusive locks: long transactions can block repack
- Verify disk space: repack needs ~2x table size temporarily
Deployment
Section titled “Deployment”Apply Changes
Section titled “Apply Changes”Superseded 2026-06: prod moved to Kamal/Latitude (containerized PG18) — tuning now lives in the postgres accessory’s
postgresql.confinside the data dir (/data/prod-replica/dataon Dallas), applied with a barekamal accessory reboot postgres(config/deploy.yml IS production). The apt/systemctlflow below is the legacy Vultr method (hostchi-vultr-heatwave-db4is decommissioned), kept for reference only. The live values documented in the sections above (re-read from the Dallas primary on 2026-06-15) supersede anything implied by this legacy flow.
# Legacy Vultr method (decommissioned host):scp custom.conf root@chi-vultr-heatwave-db4:/etc/postgresql/18/main/conf.d/
# Most settings - reload onlysudo systemctl reload postgresql@18-main
# Settings requiring restart:# - max_worker_processes# - autovacuum_max_workers# - wal_buffers# - shared_bufferssudo systemctl restart postgresql@18-mainVerify Settings
Section titled “Verify Settings”On the live box, run psql inside the accessory:
docker exec -u postgres heatwave-postgres psql -c "<query>".
-- Check current settingsSELECT name, setting, unit, sourceFROM pg_settingsWHERE source = 'configuration file'ORDER BY name;
-- Check autovacuum is catching upSELECT relname, last_autovacuum, autovacuum_count, n_dead_tupFROM pg_stat_user_tablesWHERE n_dead_tup > 10000ORDER BY n_dead_tup DESC;References
Section titled “References”- PostgreSQL 18 Documentation
- PGTune - Configuration calculator
- PostgreSQL Wiki - Tuning Your PostgreSQL Server
- Autovacuum Tuning Basics