mdcms/sample-sites/neuraldb-docs/pages/ops-backup.md
2026-05-18 14:30:49 +07:00


---
title: Backup & Restore
sort: 110
section-id: operations
keywords: backup, restore, snapshot, WAL archiving, PITR, point-in-time recovery
description: Backup and restore strategies for NeuralDB — snapshots, WAL archiving, and point-in-time recovery
language: en
---
# Backup & Restore
A comprehensive backup strategy for NeuralDB combines base snapshots with continuous WAL archiving, enabling point-in-time recovery (PITR) to any moment within your retention window.
## Backup Strategies
| Strategy | Recovery point objective | Recovery time | Storage |
|----------|--------------------------|---------------|---------|
| Snapshot only | Time of last snapshot | Fast | Medium |
| WAL archiving only | Continuous (any point) | Slow | High |
| Snapshot + WAL | Continuous (any point) | Fast (replays only WAL written since the last snapshot) | High |

**Recommendation:** Use snapshot + WAL archiving in production: take daily base snapshots and archive WAL continuously.
## Physical Snapshot (pg_basebackup)
`pg_basebackup` creates a consistent physical copy of the data directory:
```bash
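# Full backup to a local directory (plain format)
# Plain format cannot be compressed client-side; server-lz4 compresses in transit only
pg_basebackup \
--host=localhost \
--port=5432 \
--username=backup_user \
--pgdata=/backups/neuraldb/$(date +%Y%m%d) \
--wal-method=stream \
--checkpoint=fast \
--compress=server-lz4 \
--progress \
--verbose

# Full backup as one compressed tar stream, uploaded straight to S3
# Writing to stdout (--pgdata=-) requires --wal-method=fetch and no extra tablespaces
pg_basebackup \
--host=localhost \
--port=5432 \
--username=backup_user \
--pgdata=- \
--format=tar \
--wal-method=fetch \
--checkpoint=fast \
--compress=lz4 \
| aws s3 cp - s3://my-backups/neuraldb/base-$(date +%Y%m%d).tar.lz4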
```
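The first command writes each day's backup into a dated directory, so old copies have to be removed by something. The sketch below is one way to do that, assuming the `/backups/neuraldb/YYYYMMDD` layout above; `prune_backups` and the keep count are hypothetical, not part of NeuralDB.

```bash
#!/bin/bash
# prune_backups DIR KEEP: delete all but the newest KEEP dated backup directories.
# Hypothetical helper; assumes subdirectory names are YYYYMMDD, so name order = date order.
prune_backups() {
  local dir="$1" keep="$2"
  local -a backups=()
  local d i
  # Collect only directories named with exactly eight digits; other entries are left alone.
  for d in "$dir"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
    if [ -d "$d" ]; then
      backups+=("$d")
    fi
  done
  # Glob results are sorted, so the oldest backups come first.
  for (( i = 0; i < ${#backups[@]} - keep; i++ )); do
    rm -rf -- "${backups[$i]}"
  done
}
```

For example, `prune_backups /backups/neuraldb 7` run after a successful nightly backup keeps one week of local copies.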
Create a dedicated backup user:
```sql
CREATE USER backup_user WITH REPLICATION PASSWORD 'backup-password';
GRANT CONNECT ON DATABASE neuraldb TO backup_user;
```
## WAL Archiving
WAL archiving copies each WAL segment to a secure location as it is completed. Combined with a base snapshot, this enables PITR.
Enable WAL archiving:
```ini
# neuraldb.conf
wal_level = replica
archive_mode = on
archive_command = 'aws s3 cp %p s3://my-backups/neuraldb/wal/%f'
archive_timeout = 60 # force a segment switch 60 s after the last one, if any WAL was written since
```
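`archive_timeout` bounds how long committed WAL can sit unarchived, but every forced switch still ships a full-size segment file. Under a steady trickle of writes, that sets a floor on archive volume, which simple arithmetic can estimate. The sketch assumes NeuralDB keeps the PostgreSQL default of 16 MiB WAL segments; check your build before relying on the number.

```bash
#!/bin/bash
# Minimum daily archive volume (MiB) if some WAL is written in every archive_timeout interval.
# Assumption: 16 MiB WAL segments (the PostgreSQL default); pass a second argument if yours differ.
min_archive_mib_per_day() {
  local timeout_s="$1" segment_mib="${2:-16}"
  local switches_per_day=$(( 86400 / timeout_s ))
  echo $(( switches_per_day * segment_mib ))
}

min_archive_mib_per_day 60   # prints 23040 (1440 switches x 16 MiB, about 22.5 GiB)
```

Raising `archive_timeout` trades a larger potential data-loss window for less archive storage.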
Verify archiving is working:
```sql
SELECT last_archived_wal, last_archived_time,
last_failed_wal, last_failed_time,
archived_count, failed_count
FROM pg_stat_archiver;
```
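To turn that query into an alert, compare the timestamps: archiving is unhealthy when the most recent attempt failed, or when nothing has been archived for much longer than `archive_timeout`. The helper below is a hypothetical sketch that takes the relevant `pg_stat_archiver` columns as Unix epochs (for example via `extract(epoch FROM last_archived_time)`); the lag threshold is an assumption to tune.

```bash
#!/bin/bash
# archiver_status LAST_ARCHIVED LAST_FAILED NOW MAX_LAG_S
# All timestamps are Unix epochs; pass 0 when a column is NULL.
# Prints OK or a reason, and returns non-zero when archiving looks unhealthy.
archiver_status() {
  local last_archived="$1" last_failed="$2" now="$3" max_lag="$4"
  # A failure newer than the last success means the current segment is stuck.
  if (( last_failed > last_archived )); then
    echo "FAIL: last archive attempt failed"
    return 1
  fi
  # No successful archive for too long.
  if (( now - last_archived > max_lag )); then
    echo "FAIL: no segment archived for $(( now - last_archived ))s"
    return 1
  fi
  echo "OK"
}
```

On a fully idle database no segments are switched, so the lag check can fire without a real problem; pair it with a write-activity check or a generous threshold.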
### S3 Archive Command
```bash
#!/bin/bash
# /usr/local/bin/neuraldb-archive.sh
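# Invoked by archive_command as: neuraldb-archive.sh %p %f
#   %p = path of the WAL segment to archive, %f = its file name
# Exits non-zero on any failure so the segment is retried later
set -euo pipefail
SOURCE="$1"
DEST_FILE="$2"
S3_BUCKET="${ARCHIVE_S3_BUCKET:?ARCHIVE_S3_BUCKET must be set in the server environment}"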
S3_PREFIX="${ARCHIVE_S3_PREFIX:-neuraldb/wal/}"
aws s3 cp "$SOURCE" "s3://${S3_BUCKET}/${S3_PREFIX}${DEST_FILE}" \
--storage-class STANDARD_IA \
--sse aws:kms
```
```ini
archive_command = '/usr/local/bin/neuraldb-archive.sh %p %f'
```
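Whatever the destination, an archive command should exit 0 only once the segment is stored, and it should never silently overwrite a different file of the same name. The sketch below shows that contract against a local directory instead of S3. `archive_wal` and the directory layout are hypothetical. Treating an identical existing copy as success mirrors the PostgreSQL documentation's advice for segments re-archived after a crash. A production version should also flush the copy to disk before renaming.

```bash
#!/bin/bash
# archive_wal SOURCE_PATH FILE_NAME ARCHIVE_DIR  (e.g. archive_command = '... %p %f /archive')
# Hypothetical local-directory archiver illustrating the archive_command contract.
archive_wal() {
  local src="$1" name="$2" dir="$3"
  local dest="$dir/$name"
  if [ -e "$dest" ]; then
    # Re-archiving the same segment is fine if the content is identical.
    cmp -s "$src" "$dest" && return 0
    echo "refusing to overwrite $dest with different content" >&2
    return 1
  fi
  # Copy under a temporary name, then rename, so a partial file is never visible.
  cp "$src" "$dest.tmp" || return 1
  mv "$dest.tmp" "$dest"
}
```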
## Automated Backups with pgBackRest
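pgBackRest is the recommended tool for production NeuralDB backups. It also handles WAL archiving: set `archive_command = 'pgbackrest --stanza=neuraldb archive-push %p'` in `neuraldb.conf` so archived WAL lands in the same repository that the restore commands read from.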
```bash
# Install
sudo apt install pgbackrest
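# Configure (repo1-s3-key-type=auto uses the host's IAM role; set
# repo1-s3-key and repo1-s3-key-secret instead for static credentials.
# Add backup-standby=y only after a standby is defined with pg2-host/pg2-path.)
sudo tee /etc/pgbackrest/pgbackrest.conf <<'EOF'
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=7
repo1-retention-diff=14
repo1-type=s3
repo1-s3-bucket=my-neuraldb-backups
repo1-s3-endpoint=s3.amazonaws.com
repo1-s3-region=us-east-1
repo1-s3-key-type=auto
compress-type=lz4
start-fast=y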
[neuraldb]
pg1-path=/var/lib/neuraldb/data
pg1-port=5432
pg1-user=backup_user
EOF
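# Initialise the stanza, then confirm WAL archiving reaches the repository
sudo -u postgres pgbackrest --stanza=neuraldb stanza-create
sudo -u postgres pgbackrest --stanza=neuraldb check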
# Full backup
sudo -u postgres pgbackrest --stanza=neuraldb backup --type=full
# Differential backup (only changes since last full)
sudo -u postgres pgbackrest --stanza=neuraldb backup --type=diff
# Incremental (only changes since last backup of any type)
sudo -u postgres pgbackrest --stanza=neuraldb backup --type=incr
```
Schedule backups with cron:
```cron
# /etc/cron.d/neuraldb-backup
0 1 * * 0 postgres pgbackrest --stanza=neuraldb backup --type=full
0 1 * * 1-6 postgres pgbackrest --stanza=neuraldb backup --type=diff
```
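With weekly full backups and `repo1-retention-full=7`, how far back can you recover? By default pgBackRest expires archived WAL together with the full backups it belongs to, so the PITR window starts at the oldest retained full backup. A back-of-the-envelope helper (hypothetical, assuming fulls run on a fixed interval):

```bash
#!/bin/bash
# pitr_window_days RETENTION_FULL FULL_INTERVAL_DAYS
# Prints the minimum and maximum age, in whole days, of the oldest retained full backup,
# i.e. how far back point-in-time recovery can reach over one backup cycle.
pitr_window_days() {
  local keep="$1" interval="$2"
  local min=$(( (keep - 1) * interval ))   # right after a new full backup expires the oldest
  local max=$(( keep * interval - 1 ))     # on the last day before the next full backup
  echo "$min $max"
}

pitr_window_days 7 7   # prints "42 48": weekly fulls, seven retained
```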
## Point-in-Time Recovery (PITR)
To restore to a specific point in time:
```bash
# Stop NeuralDB
systemctl stop neuraldb
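# Restore the base backup and replay WAL up to a timestamp
# (--delta reuses files already in the data directory that match the backup)
sudo -u postgres pgbackrest --stanza=neuraldb restore \
--type=time \
--target="2026-05-15 14:30:00+00" \
--target-action=promote \
--delta
# Or restore to a named restore point created with pg_create_restore_point()
sudo -u postgres pgbackrest --stanza=neuraldb restore \
--type=name \
--target="before_accidental_delete" \
--target-action=promote \
--delta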
# Start NeuralDB — it will replay WAL up to the target point
systemctl start neuraldb
```
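Incident reports are often written in local time, while a recovery target is safest with an explicit UTC offset. The sketch below converts a local timestamp into the `+00` form used in the `--target` example. It assumes GNU `date` (BSD/macOS `date` takes different flags), and `to_pitr_target` is a hypothetical helper.

```bash
#!/bin/bash
# to_pitr_target "YYYY-MM-DD HH:MM:SS +HHMM" -> "YYYY-MM-DD HH:MM:SS+00"
# Requires GNU date for -d parsing; -u renders the result in UTC.
to_pitr_target() {
  date -u -d "$1" '+%Y-%m-%d %H:%M:%S+00'
}

to_pitr_target "2026-05-15 21:30:00 +0700"   # prints 2026-05-15 14:30:00+00
```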
Create named restore points before risky operations:
```sql
-- Before running a migration
SELECT pg_create_restore_point('before_migration_20260515');
```
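Wrapping the call in a helper keeps restore point names consistent and easy to pass to `--target` later. The sketch below uses a hypothetical `restore_point_sql` function that builds the statement from a label and a date. It rejects names longer than 63 characters, the limit PostgreSQL enforces for restore point names, which we assume NeuralDB inherits.

```bash
#!/bin/bash
# restore_point_sql LABEL YYYYMMDD -> SQL that creates restore point "before_<label>_<date>"
restore_point_sql() {
  local name="before_${1}_${2}"
  if (( ${#name} > 63 )); then
    echo "restore point name too long: $name" >&2
    return 1
  fi
  # Double any single quotes so the name is a valid SQL string literal.
  local q="'"
  local escaped=${name//$q/$q$q}
  printf "SELECT pg_create_restore_point('%s');\n" "$escaped"
}

restore_point_sql migration 20260515
# prints SELECT pg_create_restore_point('before_migration_20260515');
```

In practice the output would be passed to `psql`, e.g. `psql -d neuraldb -c "$(restore_point_sql migration "$(date +%Y%m%d)")"`.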
## Logical Backup (pg_dump)
For smaller databases or table-level backups, `pg_dump` provides a logical backup:
```bash
# Dump entire database
pg_dump -h localhost -U neuraldb mydb | \
lz4 | \
aws s3 cp - s3://my-backups/neuraldb/logical-$(date +%Y%m%d).sql.lz4
# Dump specific table
pg_dump -h localhost -U neuraldb -t documents mydb > documents-backup.sql
# Dump in custom format (compressed by default; supports selective and parallel restore)
pg_dump -Fc -h localhost -U neuraldb mydb > mydb-$(date +%Y%m%d).dump
```
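The first example pipes `pg_dump` through `lz4` into `aws s3 cp`. By default a shell pipeline reports only the exit status of its last command, so a failed dump can still look like a successful upload. The snippet below demonstrates the difference `set -o pipefail` makes, using stand-in commands: `false` plays the failing `pg_dump` and `cat` plays the later stages.

```bash
#!/bin/bash
# pipeline_status: run a stand-in dump pipeline and print its exit status.
pipeline_status() {
  local status=0
  false | cat || status=$?
  echo "$status"
}

set +o pipefail
echo "without pipefail: $(pipeline_status)"   # prints "without pipefail: 0"
set -o pipefail
echo "with pipefail: $(pipeline_status)"      # prints "with pipefail: 1"
```

Putting `set -o pipefail` at the top of any backup script that streams dumps makes cron or a monitoring wrapper see the real failure.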
**Note:** Logical backups do not include vector index data — only the raw vector column values. After restore, recreate indexes manually.
## Restoring from pg_dump
```bash
# Restore a plain SQL dump (the target database mydb_restore must already exist)
lz4 -dc backup.sql.lz4 | psql -h localhost -U neuraldb -d mydb_restore
# Restore custom format
pg_restore -h localhost -U neuraldb -d mydb_restore --jobs=8 mydb.dump
# Restore a single table
pg_restore -h localhost -U neuraldb -d mydb -t documents mydb.dump
```
## Testing Backups
Never trust backups you haven't tested. Automate monthly restore tests:
```bash
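#!/bin/bash
# Test a backup restore on a scratch data directory (run as the postgres user)
set -euo pipefail
RESTORE_DIR=/tmp/restore-test
TEST_PORT=5433 # avoid clashing with the production instance on 5432
# Always stop the scratch instance and delete its files, even on failure
trap 'pg_ctl -D "$RESTORE_DIR" stop -m fast >/dev/null 2>&1 || true; rm -rf "$RESTORE_DIR"' EXIT
mkdir -p -m 700 "$RESTORE_DIR"
# --archive-mode=off stops the restored copy from pushing WAL into the production repository
pgbackrest --stanza=neuraldb restore --pg1-path="$RESTORE_DIR" --archive-mode=off --delta
# -k /tmp puts the Unix socket in /tmp so psql can reach it with -h /tmp
pg_ctl -D "$RESTORE_DIR" -w -o "-p $TEST_PORT -k /tmp" start
psql -h /tmp -p "$TEST_PORT" -d neuraldb -c "SELECT COUNT(*) FROM documents;"
echo "Restore test passed: $(date)"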
```