Backup & Disaster Recovery
Don't lose your customers' data — pg_dump, WAL, snapshots, and cross-region.
Postgres
- ★ pg_dump / pg_dumpall — built-in; logical backups; the boring default. Cron + S3 / R2 + lifecycle rules.
- ★ WAL-G — continuous WAL archiving; PITR (point-in-time recovery) to within seconds. The default for production.
- pgBackRest — alternative WAL archiver; very feature-rich; great for big DBs.
- Barman — older WAL-archive solution.
- CloudNativePG — Kubernetes operator with backups built in.
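
The "cron + S3 / R2" pattern for pg_dump can be sketched roughly as below. This is a minimal illustration, not a hardened job: the key layout, the `/tmp` staging path, and the function names are assumptions, and it expects `pg_dump` and the AWS CLI to be installed and configured (for R2 the CLI would also need an `--endpoint-url` argument).

```python
import datetime as dt
import subprocess


def dump_command(database_url: str, out_path: str) -> list[str]:
    """Build a pg_dump call in custom format, which pg_restore can read."""
    return ["pg_dump", "--format=custom", f"--file={out_path}", database_url]


def object_key(db_name: str, now: dt.datetime) -> str:
    """Date-stamped key (hypothetical layout) so lifecycle rules can expire by prefix."""
    return f"pg_dump/{db_name}/{now:%Y/%m/%d}/{db_name}-{now:%H%M%S}.dump"


def run_backup(database_url: str, db_name: str, bucket: str) -> str:
    """Dump to a local file, then upload it. Raises if either step fails."""
    now = dt.datetime.now(dt.timezone.utc)
    out_path = f"/tmp/{db_name}.dump"  # assumed staging location
    subprocess.run(dump_command(database_url, out_path), check=True)
    key = object_key(db_name, now)
    subprocess.run(["aws", "s3", "cp", out_path, f"s3://{bucket}/{key}"], check=True)
    return key
```

Using `check=True` makes a failed dump or upload raise instead of passing silently, which is usually what a cron job should do so that monitoring can pick it up.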
Hosted Postgres backups
- ★ Neon — instant branches double as backups; PITR built in.
- ★ Supabase — daily backups on paid plans; PITR as a paid add-on. Check what your plan includes.
- AWS RDS / Aurora — automated backups + snapshots.
- GCP Cloud SQL, Azure DB for Postgres — same.
- Crunchy Bridge — first-class backup story.
- Railway / Render / Fly Postgres — varied; check the specific plan.
SQLite at scale
- ★ Litestream — continuous WAL replication of SQLite to S3 / R2; PITR. Default for self-host SQLite.
- LiteFS (Fly) — distributed SQLite + replication; backups via Litestream pattern.
- Turso — multi-region replicas double as backups; branching.
- Cloudflare D1 — Time Travel feature for PITR.
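
None of the tools above are required for a basic consistent copy of a SQLite file. As a small complement (not a replacement for Litestream's continuous replication), the sketch below uses the online backup API exposed by Python's standard `sqlite3` module to snapshot a live database and then checks the copy. The function names are illustrative.

```python
import sqlite3


def snapshot_sqlite(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Produces a consistent copy even if other connections are writing.
        src.backup(dest)
    finally:
        dest.close()
        src.close()


def integrity_ok(path: str) -> bool:
    """Run SQLite's built-in integrity check against a snapshot."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    finally:
        conn.close()
```

Copying the database file with `cp` while the app is writing can miss pages or the WAL file, which is why the backup API is used here.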
MySQL / MariaDB
- mysqldump / mariabackup — built-in.
- Percona XtraBackup — fast hot backups.
- PlanetScale (Vitess) — branches + restores.
Object storage / files
- ★ R2 / S3 lifecycle rules — versioning + lifecycle = a near-free time machine (you pay only for the extra stored versions). Default.
- AWS Backup — cross-service AWS backup orchestration.
- rclone — sync any cloud to any cloud; great for moving / verifying.
- s3cmd / aws s3 sync — simple scripts.
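
A lifecycle rule is just a small configuration document. The sketch below builds one in the shape the S3 API uses; the prefix, day counts, and function name are placeholders, and whether a given S3-compatible provider (R2 included) accepts every field is an assumption to verify against its documentation.

```python
def lifecycle_config(prefix: str, keep_days: int, noncurrent_days: int) -> dict:
    """Build an S3-style lifecycle configuration for one key prefix."""
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Expire current objects (e.g. date-stamped dumps) after keep_days.
                "Expiration": {"Days": keep_days},
                # With versioning on, drop superseded versions after noncurrent_days.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }
```

With boto3, a dictionary like this would be passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`. For a pure "time machine" on live files, the `Expiration` entry would normally be dropped so that only old versions age out.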
Filesystems / VPS
- ★ restic — encrypted, deduplicated backups to any object storage; the open-source default.
- borg — also encrypted + deduplicated; older; restic's predecessor in spirit.
- Kopia — modern restic competitor; nicer UI.
- rsync — file mirror; not a real backup but useful for staging.
- ZFS / btrfs snapshots — filesystem-level point-in-time.
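
For restic, a scheduled job usually amounts to a backup followed by a retention pass. The sketch below only assembles and runs those two commands. It assumes the repository already exists (`restic init`), that the password and storage credentials are supplied through environment variables such as `RESTIC_PASSWORD`, and that the retention counts shown are placeholders.

```python
import subprocess


def restic_backup_cmd(repo: str, paths: list[str]) -> list[str]:
    """Command to back up the given paths into a restic repository."""
    return ["restic", "-r", repo, "backup", *paths]


def restic_forget_cmd(repo: str, daily: int, weekly: int) -> list[str]:
    """Command to apply a retention policy and reclaim space."""
    return ["restic", "-r", repo, "forget",
            "--keep-daily", str(daily), "--keep-weekly", str(weekly), "--prune"]


def run_restic(repo: str, paths: list[str], daily: int = 7, weekly: int = 4) -> None:
    """Back up, then prune old snapshots. Raises if either command fails."""
    subprocess.run(restic_backup_cmd(repo, paths), check=True)
    subprocess.run(restic_forget_cmd(repo, daily, weekly), check=True)
```

A call such as `run_restic("s3:s3.amazonaws.com/my-bucket", ["/etc", "/srv"])` would back up two directories, with the bucket name being a placeholder.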
Application / cross-store
- Supabase backups script, Convex export, Firebase Firestore export — provider-specific.
- Cloudflare KV / D1 / R2 — Workers-based scheduled exports to R2.
- pgcopy / pg_dump | aws s3 cp — DIY cron jobs.
Verify your backups (this is the hard part)
- ★ Restore drill — once a quarter, restore your latest backup to a fresh DB. Test the app against it. A backup you've never restored is a hope, not a backup.
- Checksums — store hashes of source data; verify after restore.
- Recovery time objective (RTO) — measure and document. "How long until we're back up?"
- Recovery point objective (RPO) — "how much data are we OK losing?" Drives backup frequency.
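
The checksum step can be sketched for file-based backups as follows: hash every file under the source tree, hash the restored tree the same way, and list the paths that disagree. This is an illustration with made-up function names. For databases, the equivalent check would more likely compare row counts or per-table hashes than raw files.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hash."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(source: dict[str, str], restored: dict[str, str]) -> list[str]:
    """Return paths that are missing, extra, or different after a restore."""
    return sorted(
        path
        for path in source.keys() | restored.keys()
        if source.get(path) != restored.get(path)
    )
```

An empty list from `diff_manifests` means every file in the restored tree matches the source manifest byte for byte.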
Cross-region / availability
- Postgres replication (streaming or logical) — read replicas in another region; failover needs a coordinator such as Patroni.
- Multi-region object storage — R2's multi-region or S3 cross-region replication.
- DNS failover — Route 53 / NS1 / Cloudflare load balancer health checks.
Encryption / compliance
- At rest — most cloud DBs encrypt by default; verify.
- In transit — TLS to/from backup storage.
- Customer-managed keys (CMK) — for SOC 2 / HIPAA / FedRAMP customers.
- Retention policies — document and enforce. GDPR may require both retention and deletion in different cases.
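
Enforcing a retention policy comes down to deciding which backups fall outside the window. The sketch below is one simple way to express that, assuming backups are identified by name with a creation timestamp. The `keep_min` safeguard (never delete the newest copies, however old) is an added assumption, not something the policy itself requires.

```python
import datetime as dt


def expired_backups(
    backups: dict[str, dt.datetime],
    now: dt.datetime,
    retention_days: int,
    keep_min: int = 1,
) -> list[str]:
    """Names of backups older than the retention window, sparing the newest keep_min."""
    newest_first = sorted(backups, key=backups.get, reverse=True)
    protected = set(newest_first[:keep_min])
    cutoff = now - dt.timedelta(days=retention_days)
    return sorted(
        name
        for name in newest_first
        if name not in protected and backups[name] < cutoff
    )
```

Deletion requests (for example under GDPR) are a separate path: they may require removing specific data from backups that are still inside the retention window.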
Patterns to know
- 3-2-1 rule — 3 copies, on 2 different media, with 1 off-site. Still applies.
- Application-consistent backups — quiesce the app or use DB-native consistent snapshots; raw filesystem copies taken under load can capture an inconsistent state that fails to restore.
- Don't trust a single provider — your hosted DB has backups, but cross-cloud or cross-account copies survive provider-side disasters.
- Test restore latency before you need it — restoring 500 GB can take hours.
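
The "hours" figure follows from simple arithmetic: restore time is roughly data size divided by sustained throughput, multiplied by whatever overhead the restore adds (index rebuilds, WAL replay). The sketch below is a back-of-the-envelope estimate only. The 50 MB/s throughput and the overhead factor are assumed inputs, and a real drill is the only reliable measurement.

```python
def estimated_restore_hours(
    size_gb: float, throughput_mb_s: float, overhead_factor: float = 1.0
) -> float:
    """Rough restore time: transfer time scaled by an assumed overhead factor."""
    seconds = size_gb * 1000 / throughput_mb_s * overhead_factor
    return seconds / 3600


# 500 GB at an assumed 50 MB/s sustained: about 2.8 hours of transfer alone.
print(round(estimated_restore_hours(500, 50), 1))  # 2.8
```

Doubling the overhead factor to account for index rebuilds pushes the same restore past five and a half hours, which is the number that should be compared against the RTO.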
Pick this if…
- Default Postgres production: WAL-G + nightly pg_dump to R2 with lifecycle rules.
- Already on Neon / Supabase: lean on built-in PITR; export to your own R2 weekly as a paranoia layer.
- SQLite production: Litestream → R2.
- VPS / files: restic → R2 / B2.
- Full disaster scenario planning: document RTO + RPO; quarterly restore drills; off-cloud copy.
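
Documented RTO and RPO targets are most useful when something checks them. A minimal sketch, assuming the targets are written down as durations and that the time of the last successful backup and the duration of the last restore drill are recorded somewhere; the class and function names are illustrative.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTargets:
    rpo: dt.timedelta  # maximum tolerable data loss
    rto: dt.timedelta  # maximum tolerable downtime


def rpo_violated(
    last_backup_at: dt.datetime, now: dt.datetime, targets: RecoveryTargets
) -> bool:
    """True if a failure right now would lose more data than the RPO allows."""
    return now - last_backup_at > targets.rpo


def rto_violated(measured_restore: dt.timedelta, targets: RecoveryTargets) -> bool:
    """True if the last restore drill took longer than the RTO allows."""
    return measured_restore > targets.rto
```

Running `rpo_violated` from a monitoring job turns a silently failing backup cron into an alert, and feeding each quarterly drill's duration into `rto_violated` keeps the documented RTO honest.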