| title | sort | section-id | keywords | description | language |
| --- | --- | --- | --- | --- | --- |
| Scaling | 120 | operations | scaling, sharding, read replicas, horizontal scaling, capacity planning, performance | Scaling NeuralDB horizontally with sharding, read replicas, and capacity planning | en |
Scaling
Read Replicas
# One connection for writes (primary) and one for reads (replica).
primary = NeuralDB("postgresql://neuraldb:pass@primary:5432/mydb")
replica = NeuralDB("postgresql://neuraldb:pass@replica:5432/mydb")

def search(query_vector):
    # Similarity searches are read-only, so they go to the replica.
    return replica.query("SELECT * FROM docs ORDER BY embedding <=> %s LIMIT 10", [query_vector])

def insert(content, embedding):
    # All writes go to the primary.
    return primary.execute("INSERT INTO docs (content, embedding) VALUES (%s, %s)", [content, embedding])
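With more than one replica, reads need to be spread across them. The following is a minimal sketch of round-robin read routing, not part of the NeuralDB client itself. `ReplicaRouter` is a hypothetical helper, and it assumes only that each client object exposes `query(sql, params)` and `execute(sql, params)` as in the snippet above.

```python
import itertools


class ReplicaRouter:
    """Hypothetical helper: writes go to the primary, reads rotate over replicas.

    Assumes every client exposes query(sql, params) and execute(sql, params).
    """

    def __init__(self, primary, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self.primary = primary
        # cycle() yields the replicas in order, forever: r0, r1, ..., r0, r1, ...
        self._replicas = itertools.cycle(replicas)

    def search(self, query_vector):
        # Pick the next replica in rotation for this read.
        replica = next(self._replicas)
        return replica.query(
            "SELECT * FROM docs ORDER BY embedding <=> %s LIMIT 10",
            [query_vector],
        )

    def insert(self, content, embedding):
        # Writes always go to the primary.
        return self.primary.execute(
            "INSERT INTO docs (content, embedding) VALUES (%s, %s)",
            [content, embedding],
        )
```

Round-robin is the simplest policy; it ignores per-replica load and replication lag, so a production router would likely want health checks on top of it.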
| Replicas | Approx. peak QPS (1536-dim, 10M vectors) |
| --- | --- |
| 1 primary | 8,000 |
| 1 primary + 2 replicas | 24,000 |
| 1 primary + 4 replicas | 48,000 |
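As a rough sizing aid, the table can be turned into a lookup that returns the smallest listed configuration covering a target query rate. This sketch only uses the three published data points; `replicas_for` is a hypothetical helper, and it deliberately does not interpolate, since the table does not say how throughput behaves between the listed configurations or for other workloads.

```python
# Peak QPS figures copied from the table above (1536-dim, 10M vectors),
# keyed by the number of read replicas next to the single primary.
PEAK_QPS_BY_REPLICAS = {0: 8_000, 2: 24_000, 4: 48_000}


def replicas_for(target_qps):
    """Return the smallest listed replica count whose peak QPS covers target_qps.

    Returns None when the target exceeds every listed configuration.
    """
    for replicas in sorted(PEAK_QPS_BY_REPLICAS):
        if PEAK_QPS_BY_REPLICAS[replicas] >= target_qps:
            return replicas
    return None
```

For example, `replicas_for(20_000)` returns `2`, while a target above 48,000 QPS returns `None`, which suggests moving on to sharding.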
Horizontal Sharding
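-- Create a cluster with 8 shards and a replication factor of 2.
SELECT neuraldb_cluster.init_cluster(shards => 8, replication_factor => 2);

-- Rows are distributed across shards by tenant_id.
CREATE TABLE documents (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL,
    content TEXT,
    embedding VECTOR(1536)
) SHARD BY tenant_id;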
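To build intuition for what `SHARD BY tenant_id` implies, here is a sketch of generic hash-based placement in Python. This is an illustration of the general technique only: how NeuralDB actually assigns rows to shards is not described here, and `shard_for` is a hypothetical function. The point is that every row with the same `tenant_id` maps to the same shard, so single-tenant queries can be served by one shard.

```python
import hashlib
import uuid

NUM_SHARDS = 8  # matches shards => 8 in the init_cluster call above


def shard_for(tenant_id: uuid.UUID, num_shards: int = NUM_SHARDS) -> int:
    """Map a tenant to a shard index in [0, num_shards).

    Illustrative hash-mod placement only; NeuralDB's real placement
    function is an unknown and may differ.
    """
    # Hash the 16 raw UUID bytes so the result is stable across processes
    # (Python's built-in hash() is not suitable for persistent placement).
    digest = hashlib.sha256(tenant_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

One known drawback of plain hash-mod placement is that changing `num_shards` remaps most keys, which is why real systems often use consistent hashing or a shard map instead.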
Capacity Planning
Row data ≈ avg_row_bytes × num_rows × 1.3
Vector data ≈ dimensions × 4 bytes × num_vectors
HNSW graph ≈ vector_data × 1.3 (must fit in vector_buffer)
WAL ≈ daily_writes × retention_days
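The four formulas above can be sketched as a small calculator. The function and field names are hypothetical, and one assumption is made explicit: `daily_writes` is interpreted as bytes written per day, so the WAL estimate comes out in bytes.

```python
from dataclasses import dataclass

BYTES_PER_DIMENSION = 4  # from the vector-data formula
ROW_OVERHEAD = 1.3       # factor from the row-data formula
HNSW_OVERHEAD = 1.3      # factor from the HNSW formula


@dataclass
class Capacity:
    row_bytes: float
    vector_bytes: int
    hnsw_bytes: float  # this is the part that must fit in vector_buffer
    wal_bytes: int


def estimate(avg_row_bytes, num_rows, dimensions, num_vectors,
             daily_write_bytes, retention_days):
    """Apply the capacity formulas; all results are in bytes."""
    vector_bytes = dimensions * BYTES_PER_DIMENSION * num_vectors
    return Capacity(
        row_bytes=avg_row_bytes * num_rows * ROW_OVERHEAD,
        vector_bytes=vector_bytes,
        hnsw_bytes=vector_bytes * HNSW_OVERHEAD,
        # Assumption: daily_writes is measured in bytes per day.
        wal_bytes=daily_write_bytes * retention_days,
    )
```

For 10 million 1536-dimensional vectors, this gives about 61.4 GB of vector data (1536 × 4 × 10,000,000 bytes) and about 79.9 GB for the HNSW graph.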
| Resource | Warning | Critical |
| --- | --- | --- |
| Connections | 80% of max | 95% of max |
| Storage | 70% full | 85% full |
| vector_buffer | 80% | 90% |
| Replication lag | 30s | 120s |
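These thresholds translate directly into an alerting check. The sketch below encodes the table as data and classifies a single reading; `classify` and the resource keys are hypothetical names, and it assumes a reading that exactly equals a threshold already counts as having reached it.

```python
# (warning, critical) thresholds from the table above.
# The first three are percentages; replication lag is in seconds.
THRESHOLDS = {
    "connections": (80, 95),
    "storage": (70, 85),
    "vector_buffer": (80, 90),
    "replication_lag": (30, 120),
}


def classify(resource, value):
    """Return "ok", "warning", or "critical" for one metric reading."""
    warning, critical = THRESHOLDS[resource]
    # Check the critical level first, since it implies the warning level.
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"
```

For example, `classify("replication_lag", 45)` returns `"warning"`, and `classify("storage", 90)` returns `"critical"`.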