| title | sort | section-id | keywords | description | language |
| --- | --- | --- | --- | --- | --- |
| Scaling | 120 | operations | scaling, sharding, read replicas, horizontal scaling, capacity planning, performance | Scaling NeuralDB horizontally with sharding, read replicas, and capacity planning | en |

Scaling

Read Replicas

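```python
# Assumes the NeuralDB client class has already been imported.
# One connection for writes (primary) and one for reads (replica).
primary = NeuralDB("postgresql://neuraldb:pass@primary:5432/mydb")
replica = NeuralDB("postgresql://neuraldb:pass@replica:5432/mydb")

def search(query_vector):
    # Similarity searches are read-only, so they are sent to the replica.
    return replica.query("SELECT * FROM docs ORDER BY embedding <=> %s LIMIT 10", [query_vector])

def insert(content, embedding):
    # Writes always go to the primary.
    return primary.execute("INSERT INTO docs (content, embedding) VALUES (%s, %s)", [content, embedding])
```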

| Configuration | Approx. peak QPS (1536-dim, 10M vectors) |
| --- | --- |
| 1 primary | 8,000 |
| 1 primary + 2 replicas | 24,000 |
| 1 primary + 4 replicas | 48,000 |
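
With more than one replica, as in the larger configurations above, reads have to be spread across them somehow. The following is a minimal round-robin sketch, not a NeuralDB feature: `ReplicaRouter` and `FakeConnection` are hypothetical names, and the router only assumes connection objects that expose `query` and `execute` as in the example above.

```python
import itertools


class ReplicaRouter:
    """Hypothetical helper: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self.primary = primary
        # itertools.cycle yields the replicas in order, forever.
        self._replicas = itertools.cycle(replicas)

    def search(self, query_vector):
        replica = next(self._replicas)
        return replica.query(
            "SELECT * FROM docs ORDER BY embedding <=> %s LIMIT 10", [query_vector]
        )

    def insert(self, content, embedding):
        return self.primary.execute(
            "INSERT INTO docs (content, embedding) VALUES (%s, %s)",
            [content, embedding],
        )


class FakeConnection:
    """Stand-in for a real connection so the sketch runs without a database."""

    def __init__(self, name):
        self.name = name
        self.calls = 0

    def query(self, sql, params):
        self.calls += 1
        return self.name

    def execute(self, sql, params):
        self.calls += 1
        return self.name


router = ReplicaRouter(
    FakeConnection("primary"),
    [FakeConnection("replica-1"), FakeConnection("replica-2")],
)
served_by = [router.search([0.1, 0.2]) for _ in range(4)]
written_to = router.insert("hello", [0.1, 0.2])
```

Round-robin is only one possible policy; it ignores replica load and replication lag, so a production router would likely need health checks on top of it.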

Horizontal Sharding

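```sql
-- Initialize the cluster with 8 shards and a replication factor of 2.
SELECT neuraldb_cluster.init_cluster(shards => 8, replication_factor => 2);

-- Distribute the table across shards, using tenant_id as the shard key.
CREATE TABLE documents (
  id UUID NOT NULL DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL,
  content TEXT,
  embedding VECTOR(1536)
) SHARD BY tenant_id;
```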
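
To illustrate what sharding by `tenant_id` means in practice, the sketch below maps a tenant to one of 8 shards with a stable hash. This is an illustration of hash-based shard routing in general; the hash function and the `shard_for` helper are assumptions, and NeuralDB's actual placement logic may differ.

```python
import hashlib
import uuid

NUM_SHARDS = 8  # matches shards => 8 in the init_cluster call above


def shard_for(tenant_id: uuid.UUID, num_shards: int = NUM_SHARDS) -> int:
    """Map a tenant to a shard index in [0, num_shards) using a stable hash.

    Assumption: SHA-256 is used here only because it is deterministic across
    processes (unlike Python's built-in hash() for some types).
    """
    digest = hashlib.sha256(tenant_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


tenant = uuid.uuid4()
# Every row for the same tenant resolves to the same shard.
first = shard_for(tenant)
second = shard_for(tenant)
print(f"tenant {tenant} -> shard {first}")
```

Because the shard depends only on `tenant_id`, a query filtered by a single tenant can in principle be answered by one shard, while a query without that filter has to touch all of them.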

Capacity Planning

```text
Row data    ≈ avg_row_bytes × num_rows × 1.3
Vector data ≈ dimensions × 4 bytes × num_vectors
HNSW graph  ≈ vector_data × 1.3  (must fit in vector_buffer)
WAL         ≈ daily_writes × retention_days
```
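
As a worked example, the formulas above can be applied to a 10M-vector, 1536-dimension dataset. The function name `estimate_storage_bytes` is hypothetical, `daily_writes` is interpreted as bytes written per day, and the row size and write volume below are illustrative assumptions, not measurements.

```python
def estimate_storage_bytes(avg_row_bytes, num_rows, dimensions, num_vectors,
                           daily_write_bytes, retention_days):
    """Apply the capacity-planning formulas; all results are in bytes."""
    row_data = avg_row_bytes * num_rows * 1.3
    vector_data = dimensions * 4 * num_vectors  # 4 bytes per dimension
    hnsw_graph = vector_data * 1.3              # must fit in vector_buffer
    wal = daily_write_bytes * retention_days
    return {
        "row_data": row_data,
        "vector_data": vector_data,
        "hnsw_graph": hnsw_graph,
        "wal": wal,
    }


estimate = estimate_storage_bytes(
    avg_row_bytes=1_000,            # assumed average row size
    num_rows=10_000_000,
    dimensions=1536,
    num_vectors=10_000_000,
    daily_write_bytes=5 * 10**9,    # assumed 5 GB of writes per day
    retention_days=7,               # assumed WAL retention
)
for name, size in estimate.items():
    print(f"{name:12s} {size / 1e9:8.2f} GB")
```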

| Resource | Warning | Critical |
| --- | --- | --- |
| Connections | 80% of max | 95% of max |
| Storage | 70% full | 85% full |
| vector_buffer | 80% | 90% |
| Replication lag | 30 s | 120 s |
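
The thresholds above could be encoded in a monitoring script along these lines. This is a minimal sketch: the `THRESHOLDS` table and `classify` function are hypothetical names, usage values are expressed as fractions of the maximum, and treating a value exactly at a threshold as having crossed it is an assumption.

```python
# (warning, critical) per resource, taken from the table above.
# Usage values are fractions of the maximum; replication lag is in seconds.
THRESHOLDS = {
    "connections": (0.80, 0.95),
    "storage": (0.70, 0.85),
    "vector_buffer": (0.80, 0.90),
    "replication_lag_s": (30, 120),
}


def classify(resource, value):
    """Return "ok", "warning", or "critical" for a measured value."""
    warning, critical = THRESHOLDS[resource]
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


# Example readings (made up for illustration).
readings = {
    "connections": 0.50,
    "storage": 0.72,
    "vector_buffer": 0.91,
    "replication_lag_s": 45,
}
for resource, value in readings.items():
    print(f"{resource}: {classify(resource, value)}")
```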