2026-05-18 14:30:49 +07:00


---
title: Backup & Restore
sort: 110
section-id: operations
keywords: backup, restore, snapshot, WAL archiving, PITR, point-in-time recovery
description: Backup and restore strategies for NeuralDB (snapshots, WAL archiving, and point-in-time recovery)
language: en
---

Backup & Restore

A comprehensive backup strategy for NeuralDB combines base snapshots with continuous WAL archiving, enabling point-in-time recovery (PITR) to any moment within your retention window.

Backup Strategies

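| Strategy | Recovery point objective | Recovery time | Storage |
|---|---|---|---|
| Snapshot only | Time of last snapshot | Fast | Medium |
| WAL archiving only | Continuous (any point) | Slow | High |
| Snapshot + WAL | Continuous (any point in the retention window) | Fast (replay only the WAL since the chosen snapshot) | High |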

Recommendation: Use snapshot + WAL archiving in production. Take daily base snapshots and archive WAL continuously.

Physical Snapshot (pg_basebackup)

pg_basebackup creates a consistent physical copy of the data directory:

# Full backup: local filesystem (plain format, written uncompressed;
# client-side compression such as --compress=lz4 requires --format=tar)
pg_basebackup \
  --host=localhost \
  --port=5432 \
  --username=backup_user \
  --pgdata=/backups/neuraldb/$(date +%Y%m%d) \
  --wal-method=stream \
  --checkpoint=fast \
  --progress \
  --verbose

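# Full backup: tar format streamed to S3 (compressed, easier to upload).
# Writing to stdout (--pgdata=-) is not possible with --wal-method=stream or
# with additional tablespaces, so WAL is fetched into the tar at the end instead.
pg_basebackup \
  --host=localhost \
  --username=backup_user \
  --pgdata=- \
  --format=tar \
  --wal-method=fetch \
  --compress=lz4 \
  | aws s3 cp - s3://my-backups/neuraldb/base-$(date +%Y%m%d).tar.lz4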
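The local-filesystem example writes one dated directory per run, so old snapshots accumulate until something removes them. A minimal retention sketch, assuming the directories are named `YYYYMMDD` directly under one root (as the `$(date +%Y%m%d)` path produces); `prune_old_backups` is a hypothetical helper, not part of pg_basebackup. If you also archive WAL, segments older than the oldest remaining snapshot are no longer needed for recovery and can be cleaned up separately (for example with pg_archivecleanup).

```shell
# Hypothetical retention helper: keep the newest KEEP dated snapshot
# directories under ROOT and delete the older ones.
prune_old_backups() {
  local root=$1 keep=$2
  local -a dirs=()
  local d i
  # Glob results come back sorted, and YYYYMMDD names sort chronologically.
  for d in "$root"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
    if [[ -d $d ]]; then
      dirs+=("$d")
    fi
  done
  # Everything before the last KEEP entries falls outside the retention window.
  for (( i = 0; i < ${#dirs[@]} - keep; i++ )); do
    echo "removing ${dirs[i]}"
    rm -rf -- "${dirs[i]}"
  done
}

# Usage: prune_old_backups /backups/neuraldb 7
```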

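Create a dedicated backup user. The REPLICATION attribute allows it to stream base backups; pg_hba.conf must also permit a replication connection for this user: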

CREATE USER backup_user WITH REPLICATION PASSWORD 'backup-password';
GRANT CONNECT ON DATABASE neuraldb TO backup_user;

WAL Archiving

WAL archiving copies each WAL segment to a secure location as it is completed. Combined with a base snapshot, this enables PITR.

Enable WAL archiving (changes to wal_level or archive_mode take effect only after a server restart):

# neuraldb.conf
wal_level = replica
archive_mode = on
archive_command = 'aws s3 cp %p s3://my-backups/neuraldb/wal/%f'
archive_timeout = 60   # force a segment switch after 60 s, but only if there has been activity since the last switch
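A segment closed early by archive_timeout is still archived as a full-size file, so a short timeout on a lightly loaded server can upload far more WAL than the workload generates. A back-of-the-envelope sketch, assuming the default 16 MiB segment size and at least some activity in every interval (the worst case); `max_forced_wal_mib_per_day` is a hypothetical helper:

```shell
# Worst case for archive_timeout: every interval sees some activity, so every
# interval forces a switch and uploads one full-size segment.
# Assumes 16 MiB segments unless a different size (in MiB) is passed.
max_forced_wal_mib_per_day() {
  local timeout_s=$1 seg_mib=${2:-16}
  echo $(( 86400 / timeout_s * seg_mib ))
}

max_forced_wal_mib_per_day 60   # prints 23040 (MiB, about 22.5 GiB per day)
```

Raising the timeout trades that storage for a longer worst-case gap between the last change and its archived segment.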

Verify archiving is working:

SELECT last_archived_wal, last_archived_time,
       last_failed_wal, last_failed_time,
       archived_count, failed_count
FROM pg_stat_archiver;
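pg_stat_archiver reports segment file names, while most recovery tooling talks in LSNs. The sketch below (a hypothetical `wal_file_name` helper, assuming the default 16 MiB wal_segment_size) shows how the 24-character name encodes the timeline and the LSN, which helps when looking for a particular segment in the archive bucket. For an LSN that sits exactly on a segment boundary, PostgreSQL's own pg_walfile_name() reports the previous segment, so prefer it whenever you can query the server.

```shell
# Map an LSN such as "16/B374D848" to the WAL segment file that contains it.
# Name layout: TIMELINE (8 hex) + high 32 bits of the LSN (8 hex)
#              + segment within that 4 GiB range (8 hex).
# Assumes 16 MiB segments, i.e. 256 segments per 4 GiB range.
wal_file_name() {
  local timeline=$1 lsn=$2
  local hi=${lsn%%/*} lo=${lsn##*/}
  # With 16 MiB segments the segment number is the top 8 bits of the low half.
  printf '%08X%08X%08X\n' "$timeline" "$(( 16#$hi ))" "$(( 16#$lo >> 24 ))"
}

wal_file_name 1 16/B374D848   # prints 0000000100000016000000B3
```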

S3 Archive Command

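#!/bin/bash
# /usr/local/bin/neuraldb-archive.sh
# Called by archive_command with two arguments:
#   $1 = %p, path of the WAL segment (relative to the data directory)
#   $2 = %f, file name of the segment
# ARCHIVE_S3_BUCKET must be set in the NeuralDB server's environment.

set -euo pipefail
SOURCE="$1"
DEST_FILE="$2"
S3_BUCKET="${ARCHIVE_S3_BUCKET:?ARCHIVE_S3_BUCKET is not set}"
S3_PREFIX="${ARCHIVE_S3_PREFIX:-neuraldb/wal/}"

# Any failure exits non-zero, which tells NeuralDB the segment was not
# archived; it keeps the segment and retries later.
aws s3 cp "$SOURCE" "s3://${S3_BUCKET}/${S3_PREFIX}${DEST_FILE}" \
  --storage-class STANDARD_IA \
  --sse aws:kms

Then point archive_command at the script in neuraldb.conf:

archive_command = '/usr/local/bin/neuraldb-archive.sh %p %f'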
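An archive command should never overwrite an existing archived segment with different content, and should exit non-zero on any failure; recent PostgreSQL documentation also recommends succeeding when an identical copy is already present, since a segment can be re-sent after a crash. The S3 script above does not enforce the overwrite rule. As a sketch of these rules against a local directory (the `archive_wal_local` function and the archive directory are hypothetical; this version also does not fsync, which a production archiver should):

```shell
# Hypothetical local-directory archiver illustrating archive_command rules.
# archive_wal_local SOURCE_PATH FILE_NAME ARCHIVE_DIR
archive_wal_local() {
  local src=$1 name=$2 dest_dir=$3
  local dest="$dest_dir/$name"
  if [[ -e $dest ]]; then
    # A retry may re-send a segment that already arrived: identical content
    # counts as success, anything else must fail loudly.
    cmp -s -- "$src" "$dest" && return 0
    echo "refusing to overwrite $dest" >&2
    return 1
  fi
  # Copy under a temporary name, then rename, so a partial copy never
  # appears under the final segment name.
  cp -- "$src" "$dest.tmp.$$" || return 1
  mv -- "$dest.tmp.$$" "$dest"
}

# Usage in a wrapper script called as: archive_command = '/path/to/wrapper %p %f'
#   archive_wal_local "$1" "$2" /mnt/wal-archive
```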

Automated Backups with pgBackRest

pgBackRest is the recommended tool for production NeuralDB backups:

# Install
sudo apt install pgbackrest

# Configure
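sudo tee /etc/pgbackrest/pgbackrest.conf <<'EOF'
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=7
repo1-retention-diff=14
repo1-type=s3
repo1-s3-bucket=my-neuraldb-backups
repo1-s3-endpoint=s3.amazonaws.com
repo1-s3-region=us-east-1
# Take S3 credentials from the instance's IAM role; otherwise set
# repo1-s3-key and repo1-s3-key-secret
repo1-s3-key-type=auto
compress-type=lz4
start-fast=y
# Enable backup-standby=y only after defining a standby (pg2-* options)
# backup-standby=y

[neuraldb]
pg1-path=/var/lib/neuraldb/data
pg1-port=5432
pg1-user=backup_user
EOF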

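# Initialise the stanza (creates the repository layout for this cluster)
sudo -u postgres pgbackrest --stanza=neuraldb stanza-create

# Send WAL through pgBackRest instead of the S3 script above by setting, in neuraldb.conf,
#   archive_command = 'pgbackrest --stanza=neuraldb archive-push %p'
# then reload NeuralDB and verify that archiving and the repository work
sudo -u postgres pgbackrest --stanza=neuraldb check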

# Full backup
sudo -u postgres pgbackrest --stanza=neuraldb backup --type=full

# Differential backup (only changes since last full)
sudo -u postgres pgbackrest --stanza=neuraldb backup --type=diff

# Incremental (only changes since last backup of any type)
sudo -u postgres pgbackrest --stanza=neuraldb backup --type=incr
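Differential and incremental backups are not self-contained: a differential needs the full backup it is based on, and an incremental needs the backup taken immediately before it. pgBackRest resolves these dependencies itself during restore; the sketch below only illustrates the rule, using a hypothetical `restore_chain` function over a list of backup types in the order they were taken.

```shell
# restore_chain TARGET TYPE...
# TYPE... is the backup history in order (index 0 = oldest), each one of
# full, diff or incr. Prints the indexes that must be present to restore
# backup TARGET, oldest first.
restore_chain() {
  local i=$1; shift
  local -a types=("$@") chain=()
  while true; do
    chain=("$i" "${chain[@]}")
    case ${types[i]} in
      full) break ;;
      # A differential depends on the most recent full backup before it.
      diff)
        i=$(( i - 1 ))
        while (( i >= 0 )) && [[ ${types[i]} != full ]]; do
          i=$(( i - 1 ))
        done ;;
      # An incremental depends on the immediately preceding backup.
      incr) i=$(( i - 1 )) ;;
      *) echo "unknown backup type: ${types[i]}" >&2; return 1 ;;
    esac
    if (( i < 0 )); then
      echo "no full backup found" >&2
      return 1
    fi
  done
  echo "${chain[*]}"
}

restore_chain 5 full diff incr incr diff incr   # prints 0 4 5
```

Losing any backup in the printed chain makes the target unrestorable, which is why expiring a full backup also expires the differentials and incrementals that depend on it.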

Schedule backups with cron:

# /etc/cron.d/neuraldb-backup
0 1 * * 0  postgres pgbackrest --stanza=neuraldb backup --type=full
0 1 * * 1-6 postgres pgbackrest --stanza=neuraldb backup --type=diff
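With this schedule and repo1-retention-full=7, the recovery window follows from simple arithmetic. A sketch, assuming WAL archive retention follows the full-backup retention (pgBackRest's default when repo1-retention-archive is not set); `pitr_window_days` is a hypothetical helper:

```shell
# Approximate PITR window in days: the oldest retained full backup was taken
# (retention_full - 1) full-backup intervals before the newest one.
pitr_window_days() {
  local retention_full=$1 full_interval_days=$2 days_since_last_full=$3
  echo $(( (retention_full - 1) * full_interval_days + days_since_last_full ))
}

pitr_window_days 7 7 0   # Sunday, right after the weekly full: prints 42
pitr_window_days 7 7 6   # Saturday: prints 48
```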

Point-in-Time Recovery (PITR)

To restore to a specific point in time:

# Stop NeuralDB
systemctl stop neuraldb

# Restore a base backup and configure recovery to stop at a point in time
sudo -u postgres pgbackrest --stanza=neuraldb restore \
  --type=time \
  --target="2026-05-15 14:30:00+00" \
  --target-action=promote \
  --delta

# Or stop at a named restore point (see pg_create_restore_point below)
sudo -u postgres pgbackrest --stanza=neuraldb restore \
  --type=name \
  --target="before_accidental_delete" \
  --target-action=promote \
  --delta

# Start NeuralDB: it replays WAL up to the target point, then promotes
systemctl start neuraldb
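Point-in-time recovery has to start from a base backup that finished before the recovery target; WAL replay then rolls it forward to the target. As a conceptual sketch (the `label=stop-time` list is hypothetical input, and timestamps are assumed to be UTC in one fixed ISO 8601 format so that string order matches time order), choosing that backup looks like this. In practice, read the stop times from `pgbackrest info` rather than a hand-maintained list.

```shell
# pick_base_backup TARGET LABEL=STOP_TIME...
# Prints the label of the latest backup that finished strictly before TARGET.
pick_base_backup() {
  local target=$1; shift
  local best="" best_stop="" entry label stop
  for entry in "$@"; do
    label=${entry%%=*}
    stop=${entry#*=}
    if [[ $stop < $target && ( -z $best_stop || $stop > $best_stop ) ]]; then
      best=$label
      best_stop=$stop
    fi
  done
  if [[ -z $best ]]; then
    echo "no backup finished before $target" >&2
    return 1
  fi
  echo "$best"
}

pick_base_backup 2026-05-15T14:30:00Z \
  F1=2026-05-10T01:20:00Z D1=2026-05-14T01:05:00Z D2=2026-05-15T01:04:00Z   # prints D2
```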

Create named restore points before risky operations:

-- Before running a migration
SELECT pg_create_restore_point('before_migration_20260515');
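Consistent names make restore points easy to find when you need them. A hypothetical naming helper, sketched below, lowercases a label, replaces anything outside a-z, 0-9 and underscore, appends a caller-supplied timestamp, and rejects names longer than 63 characters, which PostgreSQL refuses for restore points.

```shell
# restore_point_name LABEL TIMESTAMP
# Builds a name such as before_schema_migration_20260515T1430Z.
restore_point_name() {
  local label=${1,,} stamp=$2
  label=${label//[^a-z0-9_]/_}
  local name="before_${label}_${stamp}"
  if (( ${#name} > 63 )); then
    echo "restore point name too long: $name" >&2
    return 1
  fi
  echo "$name"
}

# Example, run before a migration:
# psql -d neuraldb -c "SELECT pg_create_restore_point('$(restore_point_name migration "$(date -u +%Y%m%dT%H%MZ)")');"
```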

Logical Backup (pg_dump)

For smaller databases or table-level backups, pg_dump provides a logical backup:

# Dump entire database
pg_dump -h localhost -U neuraldb mydb | \
  lz4 | \
  aws s3 cp - s3://my-backups/neuraldb/logical-$(date +%Y%m%d).sql.lz4

# Dump specific table
pg_dump -h localhost -U neuraldb -t documents mydb > documents-backup.sql

# Dump in custom format (compressed by default; allows selective and parallel restore with pg_restore)
pg_dump -Fc -h localhost -U neuraldb mydb > mydb-$(date +%Y%m%d).dump

Note: Logical backups do not include vector index data — only the raw vector column values. After restore, recreate indexes manually.

Restoring from pg_dump

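# Restore entire database (the target database must exist first)
createdb -h localhost -U neuraldb mydb_restore
lz4 -dc backup.sql.lz4 | psql -h localhost -U neuraldb -d mydb_restore

# Restore custom format into an existing, empty database with 8 parallel jobs
pg_restore -h localhost -U neuraldb -d mydb_restore --jobs=8 mydb.dump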

# Restore a single table, dropping the existing copy first so rows are not duplicated
pg_restore -h localhost -U neuraldb -d mydb -t documents --clean --if-exists mydb.dump
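pg_restore reads only pg_dump's archive formats; a plain SQL dump has to go through psql. Custom-format archives begin with the 5-byte magic string `PGDMP`, so a small hypothetical helper can pick the right client for an uncompressed dump file. It does not detect tar or directory archives, so treat it as a sketch.

```shell
# Hypothetical helper: print which client restores an uncompressed dump file.
restore_tool_for() {
  local file=$1
  if [[ ! -r $file ]]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # Custom-format archives start with the magic bytes "PGDMP".
  if [[ $(head -c 5 -- "$file") == PGDMP ]]; then
    echo pg_restore
  else
    echo psql
  fi
}

# Usage: restore_tool_for mydb.dump
```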

Testing Backups

Never trust backups you haven't tested. Automate monthly restore tests:

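#!/bin/bash
# Test backup restore in a separate data directory on port 5433.
# Run as the database service user (for example postgres).
set -euo pipefail
RESTORE_DIR=/tmp/restore-test
# Always stop the test instance and delete its data, even if a step fails
trap 'pg_ctl -D "$RESTORE_DIR" -m fast stop >/dev/null 2>&1 || true; rm -rf "$RESTORE_DIR"' EXIT
mkdir -p -m 700 "$RESTORE_DIR"
pgbackrest --stanza=neuraldb restore --pg1-path="$RESTORE_DIR" --delta
# archive_mode=off keeps the test copy from pushing WAL into the production archive;
# -k puts the Unix socket in the data directory so psql -h can find it
pg_ctl -D "$RESTORE_DIR" -w -o "-p 5433 -k $RESTORE_DIR -c archive_mode=off" start
psql -h "$RESTORE_DIR" -p 5433 -c "SELECT COUNT(*) FROM documents;" neuraldb
echo "Restore test passed: $(date)"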
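A restore that starts cleanly can still be stale if WAL archiving quietly stopped. One extra check, sketched here with a hypothetical `check_replay_lag` helper, compares the epoch time of the last transaction replayed during recovery with the time the test ran and fails if the gap exceeds a threshold you choose (the 900-second value is only an example). On a database with little write traffic the gap can be large for benign reasons, so tune the threshold to your workload.

```shell
# check_replay_lag LAST_REPLAY_EPOCH NOW_EPOCH MAX_LAG_SECONDS
# Succeeds if the restored data is no older than MAX_LAG_SECONDS.
check_replay_lag() {
  local last_replay=$1 now=$2 max_lag=$3
  local lag=$(( now - last_replay ))
  if (( lag > max_lag )); then
    echo "restored data is ${lag}s old (limit ${max_lag}s)" >&2
    return 1
  fi
  echo "replay lag ${lag}s within ${max_lag}s"
}

# Example (add host/port options for the test instance to psql):
# last=$(psql -Atc "SELECT extract(epoch FROM pg_last_xact_replay_timestamp())::bigint" neuraldb)
# check_replay_lag "$last" "$(date +%s)" 900
```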