mirror of
https://github.com/kbenestad/mdcms.git
synced 2026-06-18 15:24:32 +00:00
201 lines
5.6 KiB
Markdown
201 lines
5.6 KiB
Markdown
---
|
||
title: Scaling
|
||
sort: 120
|
||
section-id: operations
|
||
keywords: scaling, sharding, read replicas, horizontal scaling, capacity planning, performance
|
||
description: Scaling NeuralDB horizontally with sharding, read replicas, and capacity planning
|
||
language: en
|
||
---
|
||
|
||
# Scaling
|
||
|
||
NeuralDB is designed to scale horizontally. This page covers adding read replicas for query throughput, sharding for data volume, and capacity planning to avoid resource exhaustion.
|
||
|
||
## Vertical Scaling (Scale Up)
|
||
|
||
Before adding nodes, ensure you have maximised single-node performance:
|
||
|
||
### Memory
|
||
|
||
The biggest lever for NeuralDB performance is memory. Ensure:
|
||
- `vector_buffer` is large enough to hold all active HNSW graphs
|
||
- `shared_buffers` is set to 25% of RAM
|
||
- `work_mem` is appropriate for your query patterns
|
||
|
||
```sql
|
||
-- Check if vectors are being served from disk (slow) vs memory (fast)
|
||
SELECT index_name, hnsw_graph_size_bytes, hnsw_in_memory
|
||
FROM neuraldb_stat_vector_indexes
|
||
ORDER BY hnsw_graph_size_bytes DESC;
|
||
```
|
||
|
||
If `hnsw_in_memory = false`, increase `vector_buffer`.
|
||
|
||
### CPU
|
||
|
||
Vector ANN searches are CPU-bound. Enable parallel query:
|
||
|
||
```ini
|
||
max_parallel_workers_per_gather = 8
|
||
max_parallel_workers = 16
|
||
```
|
||
|
||
```sql
|
||
-- Allow parallel ANN queries for large tables
|
||
SET max_parallel_workers_per_gather = 8;
|
||
SELECT * FROM large_table ORDER BY embedding <=> $1 LIMIT 10;
|
||
```
|
||
|
||
### Storage I/O
|
||
|
||
Use NVMe SSDs with high IOPS. Configure the OS:
|
||
|
||
```bash
|
||
# Increase read-ahead for sequential I/O
|
||
sudo blockdev --setra 1024 /dev/nvme0n1
|
||
|
||
# Use deadline/mq-deadline I/O scheduler
|
||
echo "mq-deadline" | sudo tee /sys/block/nvme0n1/queue/scheduler
|
||
|
||
# Disable transparent huge pages (reduces latency variability)
|
||
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
|
||
```
|
||
|
||
## Read Replicas
|
||
|
||
Add read replicas to distribute query load.
|
||
|
||
### Setting Up Read Replicas
|
||
|
||
Follow the [Replication guide](config-replication.md) to add replicas. Each replica can independently serve `SELECT` queries, including vector similarity searches.
|
||
|
||
### Client-Side Read Splitting
|
||
|
||
Configure your application to route reads to replicas:
|
||
|
||
**Python:**
|
||
```python
|
||
from neuraldb import NeuralDB
|
||
|
||
primary = NeuralDB("postgresql://neuraldb:pass@primary:5432/mydb")
|
||
replica = NeuralDB("postgresql://neuraldb:pass@replica:5432/mydb")
|
||
|
||
def search(query_vector):
|
||
# Read goes to replica
|
||
return replica.query("SELECT * FROM docs ORDER BY embedding <=> %s LIMIT 10", [query_vector])
|
||
|
||
def insert(content, embedding):
|
||
# Write goes to primary
|
||
return primary.execute("INSERT INTO docs (content, embedding) VALUES (%s, %s)", [content, embedding])
|
||
```
|
||
|
||
**Connection string with `target_session_attrs`:**
|
||
```
|
||
postgresql://neuraldb:pass@primary:5432,replica:5432/mydb?target_session_attrs=prefer-standby
|
||
```
|
||
|
||
### Read Replica Scaling Targets
|
||
|
||
| Replicas | Approximate peak QPS (1536-dim, 10M vectors) |
|
||
|---------|----------------------------------------------|
|
||
| 1 primary | 8,000 |
|
||
| 1 primary + 2 replicas | 24,000 |
|
||
| 1 primary + 4 replicas | 48,000 |
|
||
| 1 primary + 8 replicas | 96,000 |
|
||
|
||
## Horizontal Sharding
|
||
|
||
For datasets exceeding single-node capacity (>50M vectors or >5 TB), shard across multiple primary nodes.
|
||
|
||
### Shard Configuration
|
||
|
||
```sql
|
||
-- Create a sharded cluster (requires NeuralDB Cluster Edition)
|
||
SELECT neuraldb_cluster.init_cluster(
|
||
shards => 8,
|
||
replication_factor => 2
|
||
);
|
||
|
||
-- Create a sharded table
|
||
CREATE TABLE documents (
|
||
id UUID NOT NULL DEFAULT gen_random_uuid(),
|
||
tenant_id UUID NOT NULL,
|
||
content TEXT,
|
||
embedding VECTOR(1536)
|
||
) SHARD BY tenant_id;
|
||
|
||
-- Each shard holds ~1/8 of the data
|
||
-- All rows with the same tenant_id are colocated on the same shard
|
||
```
|
||
|
||
### Cross-Shard Queries
|
||
|
||
Cross-shard queries (where the filter doesn't align with the shard key) are automatically parallelised across shards:
|
||
|
||
```sql
|
||
-- This query executes on all 8 shards in parallel
|
||
SELECT id, content, 1 - (embedding <=> $1) AS similarity
|
||
FROM documents
|
||
ORDER BY embedding <=> $1
|
||
LIMIT 10;
|
||
-- Results are merged and re-ranked by the coordinator
|
||
```
|
||
|
||
Performance with 8 shards: near-linear scaling. An 8-shard cluster serves ~8× the QPS of a single node for cross-shard searches, with ~20% overhead for coordination.
|
||
|
||
### Shard Rebalancing
|
||
|
||
When adding new shard nodes, rebalance data:
|
||
|
||
```sql
|
||
-- Rebalance shards (online, non-blocking)
|
||
SELECT neuraldb_cluster.rebalance_shards();
|
||
|
||
-- Monitor progress
|
||
SELECT * FROM neuraldb_cluster.rebalance_status;
|
||
```
|
||
|
||
## Capacity Planning
|
||
|
||
### Storage Capacity
|
||
|
||
Estimate required storage:
|
||
|
||
```
|
||
Row data ≈ avg_row_size_bytes × num_rows × 1.3 (index overhead)
|
||
Vector data ≈ dimensions × 4 bytes × num_vectors
|
||
HNSW graph ≈ dimensions × 4 bytes × num_vectors × 1.3
|
||
WAL ≈ daily_write_volume × wal_retention_days
|
||
|
||
Total ≈ row_data + vector_data + HNSW_graph + WAL + 20% buffer
|
||
```
|
||
|
||
Example: 100M rows, 1536 dimensions, 500 bytes average row size:
|
||
- Row data: 500B × 100M × 1.3 ≈ **65 GB**
|
||
- Vector data: 1536 × 4B × 100M ≈ **614 GB**
|
||
- HNSW graph: 614 GB × 1.3 ≈ **800 GB** (must fit in `vector_buffer`)
|
||
- WAL (7 days): 10 GB/day × 7 = **70 GB**
|
||
- **Total: ~1.6 TB storage, 800 GB RAM for HNSW**
|
||
|
||
### Connection Capacity
|
||
|
||
```
|
||
max_connections = max_app_connections + pgbouncer_pool_size + replication_slots + 3 (superuser)
|
||
```
|
||
|
||
For 500 app connections through PgBouncer with pool size 20:
|
||
```
|
||
max_connections = 20 + 10 (replicas) + 3 = 33
|
||
```
|
||
|
||
PgBouncer multiplexes 500 app connections → 20 database connections.
|
||
|
||
### Alert Thresholds
|
||
|
||
| Resource | Warning | Critical |
|
||
|---------|---------|---------|
|
||
| Connections | 80% of max | 95% of max |
|
||
| Storage | 70% full | 85% full |
|
||
| vector_buffer utilisation | 80% | 90% |
|
||
| Replication lag | 30s | 120s |
|
||
| Query p99 latency | 500ms | 2000ms |
|