---
title: Scaling
sort: 120
section-id: operations
keywords: scaling, sharding, read replicas, horizontal scaling, capacity planning, performance
description: Scaling NeuralDB horizontally with sharding, read replicas, and capacity planning
language: en
---

# Scaling

## Read Replicas

```python
# Assumes the NeuralDB client class has already been imported.
# Writes go to the primary; similarity searches go to a read replica.
primary = NeuralDB("postgresql://neuraldb:pass@primary:5432/mydb")
replica = NeuralDB("postgresql://neuraldb:pass@replica:5432/mydb")

def search(query_vector):
    # Top-10 nearest-neighbor read, served by the replica
    return replica.query("SELECT * FROM docs ORDER BY embedding <=> %s LIMIT 10", [query_vector])

def insert(content, embedding):
    # Insert through the primary
    return primary.execute("INSERT INTO docs (content, embedding) VALUES (%s, %s)", [content, embedding])
```

| Topology | Approx. peak QPS (1536-dim, 10M vectors) |
|----------|------------------------------------------|
| 1 primary | 8,000 |
| 1 primary + 2 replicas | 24,000 |
| 1 primary + 4 replicas | 48,000 |

## Horizontal Sharding

```sql
SELECT neuraldb_cluster.init_cluster(shards => 8, replication_factor => 2);

CREATE TABLE documents (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL,
    content TEXT,
    embedding VECTOR(1536)
) SHARD BY tenant_id;
```

## Capacity Planning

```
Row data    ≈ avg_row_bytes × num_rows × 1.3
Vector data ≈ dimensions × 4 bytes × num_vectors
HNSW graph  ≈ vector_data × 1.3   (must fit in vector_buffer)
WAL         ≈ daily_writes × retention_days
```

| Resource | Warning | Critical |
|----------|---------|----------|
| Connections | 80% of max | 95% of max |
| Storage | 70% full | 85% full |
| vector_buffer | 80% | 90% |
| Replication lag | 30s | 120s |
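
The sizing formulas and alert thresholds above can be turned into a small calculator. The following is a minimal sketch, not part of NeuralDB: `estimate_capacity`, `alert_level`, and the `THRESHOLDS` table are hypothetical names introduced for illustration. It assumes `daily_writes` is measured in bytes of WAL per day, that there is one vector per row unless stated otherwise, and that a value exactly at a threshold already triggers that level.

```python
from dataclasses import dataclass

BYTES_PER_DIMENSION = 4  # from "dimensions × 4 bytes"
ROW_OVERHEAD = 1.3       # multiplier from the row data formula
HNSW_FACTOR = 1.3        # multiplier from the HNSW graph formula


@dataclass
class CapacityEstimate:
    row_bytes: float
    vector_bytes: float
    hnsw_bytes: float  # this part must fit in vector_buffer
    wal_bytes: float

    @property
    def total_bytes(self) -> float:
        return self.row_bytes + self.vector_bytes + self.hnsw_bytes + self.wal_bytes


def estimate_capacity(num_rows, avg_row_bytes, dimensions,
                      daily_write_bytes, retention_days, num_vectors=None):
    """Apply the four sizing formulas. Assumption: one vector per row by default."""
    if num_vectors is None:
        num_vectors = num_rows
    vector_bytes = dimensions * BYTES_PER_DIMENSION * num_vectors
    return CapacityEstimate(
        row_bytes=avg_row_bytes * num_rows * ROW_OVERHEAD,
        vector_bytes=vector_bytes,
        hnsw_bytes=vector_bytes * HNSW_FACTOR,
        wal_bytes=daily_write_bytes * retention_days,  # assumes bytes/day
    )


# (warning, critical) pairs copied from the thresholds table.
# Ratios are fractions of the maximum; replication lag is in seconds.
THRESHOLDS = {
    "connections": (0.80, 0.95),
    "storage": (0.70, 0.85),
    "vector_buffer": (0.80, 0.90),
    "replication_lag_s": (30, 120),
}


def alert_level(resource, value):
    """Return 'ok', 'warning', or 'critical' (inclusive bounds are an assumption)."""
    warning, critical = THRESHOLDS[resource]
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Illustrative inputs only: 10M rows of 1536-dim vectors, 500-byte rows,
    # 2 GB of WAL per day kept for 7 days.
    est = estimate_capacity(10_000_000, 500, 1536, 2_000_000_000, 7)
    print(f"HNSW graph: {est.hnsw_bytes / 1e9:.1f} GB")
    print(f"Total:      {est.total_bytes / 1e9:.1f} GB")
    print("Storage at 72%:", alert_level("storage", 0.72))
```

The HNSW figure is the one to compare against `vector_buffer`, since the formula notes that the graph must fit there; the other terms only affect disk sizing.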