mdcms/sample-sites/neuraldb-docs/pages/ops-scaling.md
2026-05-18 14:30:49 +07:00

201 lines
5.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
title: Scaling
sort: 120
section-id: operations
keywords: scaling, sharding, read replicas, horizontal scaling, capacity planning, performance
description: Scaling NeuralDB horizontally with sharding, read replicas, and capacity planning
language: en
---
# Scaling
NeuralDB is designed to scale horizontally. This page covers adding read replicas for query throughput, sharding for data volume, and capacity planning to avoid resource exhaustion.
## Vertical Scaling (Scale Up)
Before adding nodes, ensure you have maximised single-node performance:
### Memory
The biggest lever for NeuralDB performance is memory. Ensure:
- `vector_buffer` is large enough to hold all active HNSW graphs
- `shared_buffers` is set to 25% of RAM
- `work_mem` is appropriate for your query patterns
```sql
-- Check if vectors are being served from disk (slow) vs memory (fast)
SELECT index_name, hnsw_graph_size_bytes, hnsw_in_memory
FROM neuraldb_stat_vector_indexes
ORDER BY hnsw_graph_size_bytes DESC;
```
If `hnsw_in_memory = false`, increase `vector_buffer`.
### CPU
Vector ANN searches are CPU-bound. Enable parallel query:
```ini
max_parallel_workers_per_gather = 8
max_parallel_workers = 16
```
```sql
-- Allow parallel ANN queries for large tables
SET max_parallel_workers_per_gather = 8;
SELECT * FROM large_table ORDER BY embedding <=> $1 LIMIT 10;
```
### Storage I/O
Use NVMe SSDs with high IOPS. Configure the OS:
```bash
# Increase read-ahead for sequential I/O
sudo blockdev --setra 1024 /dev/nvme0n1
# Use deadline/mq-deadline I/O scheduler
echo "mq-deadline" | sudo tee /sys/block/nvme0n1/queue/scheduler
# Disable transparent huge pages (reduces latency variability)
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
```
## Read Replicas
Add read replicas to distribute query load.
### Setting Up Read Replicas
Follow the [Replication guide](config-replication.md) to add replicas. Each replica can independently serve `SELECT` queries, including vector similarity searches.
### Client-Side Read Splitting
Configure your application to route reads to replicas:
**Python:**
```python
from neuraldb import NeuralDB
primary = NeuralDB("postgresql://neuraldb:pass@primary:5432/mydb")
replica = NeuralDB("postgresql://neuraldb:pass@replica:5432/mydb")
def search(query_vector):
# Read goes to replica
return replica.query("SELECT * FROM docs ORDER BY embedding <=> %s LIMIT 10", [query_vector])
def insert(content, embedding):
# Write goes to primary
return primary.execute("INSERT INTO docs (content, embedding) VALUES (%s, %s)", [content, embedding])
```
**Connection string with `target_session_attrs`:**
```
postgresql://neuraldb:pass@primary:5432,replica:5432/mydb?target_session_attrs=prefer-standby
```
### Read Replica Scaling Targets
| Replicas | Approximate peak QPS (1536-dim, 10M vectors) |
|---------|----------------------------------------------|
| 1 primary | 8,000 |
| 1 primary + 2 replicas | 24,000 |
| 1 primary + 4 replicas | 48,000 |
| 1 primary + 8 replicas | 96,000 |
## Horizontal Sharding
For datasets exceeding single-node capacity (>50M vectors or >5 TB), shard across multiple primary nodes.
### Shard Configuration
```sql
-- Create a sharded cluster (requires NeuralDB Cluster Edition)
SELECT neuraldb_cluster.init_cluster(
shards => 8,
replication_factor => 2
);
-- Create a sharded table
CREATE TABLE documents (
id UUID NOT NULL DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
content TEXT,
embedding VECTOR(1536)
) SHARD BY tenant_id;
-- Each shard holds ~1/8 of the data
-- All rows with the same tenant_id are colocated on the same shard
```
### Cross-Shard Queries
Cross-shard queries (where the filter doesn't align with the shard key) are automatically parallelised across shards:
```sql
-- This query executes on all 8 shards in parallel
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;
-- Results are merged and re-ranked by the coordinator
```
Performance with 8 shards: near-linear scaling. An 8-shard cluster serves ~8× the QPS of a single node for cross-shard searches, with ~20% overhead for coordination.
### Shard Rebalancing
When adding new shard nodes, rebalance data:
```sql
-- Rebalance shards (online, non-blocking)
SELECT neuraldb_cluster.rebalance_shards();
-- Monitor progress
SELECT * FROM neuraldb_cluster.rebalance_status;
```
## Capacity Planning
### Storage Capacity
Estimate required storage:
```
Row data ≈ avg_row_size_bytes × num_rows × 1.3 (index overhead)
Vector data ≈ dimensions × 4 bytes × num_vectors
HNSW graph ≈ dimensions × 4 bytes × num_vectors × 1.3
WAL ≈ daily_write_volume × wal_retention_days
Total ≈ row_data + vector_data + HNSW_graph + WAL + 20% buffer
```
Example: 100M rows, 1536 dimensions, 500 bytes average row size:
- Row data: 500B × 100M × 1.3 ≈ **65 GB**
- Vector data: 1536 × 4B × 100M ≈ **614 GB**
- HNSW graph: 614 GB × 1.3 ≈ **800 GB** (must fit in `vector_buffer`)
- WAL (7 days): 10 GB/day × 7 = **70 GB**
- **Total: ~1.6 TB storage, 800 GB RAM for HNSW**
### Connection Capacity
```
max_connections = max_app_connections + pgbouncer_pool_size + replication_slots + 3 (superuser)
```
For 500 app connections through PgBouncer with pool size 20:
```
max_connections = 20 + 10 (replicas) + 3 = 33
```
PgBouncer multiplexes 500 app connections → 20 database connections.
### Alert Thresholds
| Resource | Warning | Critical |
|---------|---------|---------|
| Connections | 80% of max | 95% of max |
| Storage | 70% full | 85% full |
| vector_buffer utilisation | 80% | 90% |
| Replication lag | 30s | 120s |
| Query p99 latency | 500ms | 2000ms |