---
title: Scaling
sort: 120
section-id: operations
keywords: scaling, sharding, read replicas, horizontal scaling, capacity planning, performance
description: Scaling NeuralDB horizontally with sharding, read replicas, and capacity planning
language: en
---

# Scaling

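## Read Replicas

Send writes to the primary and serve similarity searches from read replicas. Replicas can lag behind the primary (see the replication-lag thresholds under Capacity Planning), so a search issued right after an insert may not return the new row yet.
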
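```python
from neuraldb import NeuralDB  # import path assumed; adjust to your client package

# Writes go to the primary; reads go to a replica
primary = NeuralDB("postgresql://neuraldb:pass@primary:5432/mydb")
replica = NeuralDB("postgresql://neuraldb:pass@replica:5432/mydb")

def search(query_vector):
    # Top-10 nearest neighbours by vector distance (<=>), served by the replica
    return replica.query("SELECT * FROM docs ORDER BY embedding <=> %s LIMIT 10", [query_vector])

def insert(content, embedding):
    # Replicas do not accept writes, so inserts always target the primary
    return primary.execute("INSERT INTO docs (content, embedding) VALUES (%s, %s)", [content, embedding])
```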
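The example above uses a single replica. To spread reads over several replicas, as in the throughput table below, a client-side router can rotate through them. The following is a minimal sketch under stated assumptions: `ReplicaRouter` is a hypothetical helper, not part of the NeuralDB client, and it assumes each connection object exposes `query(sql, params)` and `execute(sql, params)` like the `primary` and `replica` objects above.

```python
import itertools
from typing import Any, Sequence


class ReplicaRouter:
    """Hypothetical helper: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary: Any, replicas: Sequence[Any]) -> None:
        self.primary = primary
        # Fall back to the primary for reads if no replicas are configured
        self._reads = itertools.cycle(list(replicas) or [primary])

    def query(self, sql: str, params: Sequence[Any] = ()) -> Any:
        # Each read goes to the next connection in round-robin order
        return next(self._reads).query(sql, params)

    def execute(self, sql: str, params: Sequence[Any] = ()) -> Any:
        # Writes always go to the primary
        return self.primary.execute(sql, params)
```

Usage would look like `router = ReplicaRouter(primary, [replica_1, replica_2])`, then `router.query(...)` for searches and `router.execute(...)` for inserts. Plain round-robin ignores replication lag; a production router would likely also skip replicas whose lag exceeds the thresholds listed under Capacity Planning.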

Approximate peak read throughput as replicas are added (1536-dimensional vectors, 10M vectors):

| Topology | Approx. peak read QPS |
|----------|-----------------------|
| 1 primary | 8,000 |
| 1 primary + 2 replicas | 24,000 |
| 1 primary + 4 replicas | 48,000 |

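## Horizontal Sharding

Sharding spreads a table's rows across nodes according to a shard key. The example below creates an 8-shard cluster that keeps two copies of each shard, then shards `documents` by `tenant_id`.
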
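```sql
-- Initialize the cluster: 8 shards, two copies of each (replication_factor => 2)
SELECT neuraldb_cluster.init_cluster(shards => 8, replication_factor => 2);

-- Rows are placed on shards by the value of tenant_id
CREATE TABLE documents (
  id UUID NOT NULL DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL,
  content TEXT,
  embedding VECTOR(1536)  -- 1536-dimensional embedding
) SHARD BY tenant_id;
```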
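Because placement depends only on `tenant_id`, all of one tenant's documents share a shard, and a query filtered to one tenant can be answered by that shard alone. The hash function NeuralDB actually uses is not documented here; the sketch below uses SHA-256 of the UUID bytes modulo the shard count purely to illustrate the idea, and `shard_for` is a hypothetical helper.

```python
import hashlib
import uuid


def shard_for(tenant_id: uuid.UUID, num_shards: int = 8) -> int:
    """Illustrative only: map a tenant to a shard with a stable hash.

    NeuralDB's real placement function may differ; this only shows that
    the shard depends on tenant_id alone, never on the row's id.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(tenant_id.bytes).digest()
    # Read the first 8 bytes as an unsigned integer, then reduce modulo the shard count
    return int.from_bytes(digest[:8], "big") % num_shards
```

`hashlib` is used instead of the built-in `hash()` because `hash()` of `str` and `bytes` values is randomized per process, so it would not give every client the same answer. Note that with plain modulo placement, changing the shard count remaps most tenants.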

## Capacity Planning

Estimate disk and memory per table with these rules of thumb (all sizes in bytes):

```text
Row data    ≈ avg_row_bytes × num_rows × 1.3
Vector data ≈ dimensions × 4 bytes (float32) × num_vectors
HNSW graph  ≈ vector_data × 1.3   (must fit in vector_buffer)
WAL         ≈ daily_write_bytes × retention_days
```
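The formulas translate directly into a few lines of Python. This is a minimal sketch: `estimate_capacity` is a hypothetical helper, and the 512-byte rows and 5 GB of daily writes in the example are made-up inputs chosen only to illustrate the arithmetic. The vector count and dimensionality match the throughput table above, and GB here means 10^9 bytes.

```python
from dataclasses import dataclass


@dataclass
class CapacityEstimate:
    row_bytes: float
    vector_bytes: float
    hnsw_bytes: float
    wal_bytes: float


def estimate_capacity(avg_row_bytes: int, num_rows: int, dimensions: int,
                      num_vectors: int, daily_write_bytes: int,
                      retention_days: int) -> CapacityEstimate:
    """Apply the capacity-planning rules of thumb (results in bytes)."""
    row = avg_row_bytes * num_rows * 1.3      # row data plus ~30% overhead
    vectors = dimensions * 4 * num_vectors    # 4 bytes per float32 component
    hnsw = vectors * 1.3                      # HNSW graph; must fit in vector_buffer
    wal = daily_write_bytes * retention_days  # retained write-ahead log
    return CapacityEstimate(row, vectors, hnsw, wal)


GB = 10**9
est = estimate_capacity(avg_row_bytes=512, num_rows=10_000_000, dimensions=1536,
                        num_vectors=10_000_000, daily_write_bytes=5 * GB,
                        retention_days=7)
print(f"vectors: {est.vector_bytes / GB:.2f} GB, HNSW: {est.hnsw_bytes / GB:.2f} GB")
# → vectors: 61.44 GB, HNSW: 79.87 GB
```

For 10M 1536-dimensional vectors, the HNSW estimate of roughly 80 GB is the amount of vector_buffer the index would need under the 1.3 rule of thumb.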

Alert thresholds:

| Resource | Warning | Critical |
|----------|---------|----------|
| Connections | 80% of max | 95% of max |
| Storage | 70% full | 85% full |
| vector_buffer | 80% used | 90% used |
| Replication lag | 30 s | 120 s |
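The alert table maps onto a small classifier. A sketch, assuming utilization is reported as a fraction of the limit and replication lag in seconds, and treating a value at or above a threshold as crossing it; the resource keys, `Level`, and `classify` are hypothetical names for illustration.

```python
from enum import Enum


class Level(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


# (warning, critical) thresholds from the alert table.
# Utilization values are fractions (0.0-1.0); replication lag is in seconds.
THRESHOLDS = {
    "connections": (0.80, 0.95),
    "storage": (0.70, 0.85),
    "vector_buffer": (0.80, 0.90),
    "replication_lag_s": (30, 120),
}


def classify(resource: str, value: float) -> Level:
    """Return the alert level for one measured value (>= threshold triggers)."""
    warning, critical = THRESHOLDS[resource]
    if value >= critical:
        return Level.CRITICAL
    if value >= warning:
        return Level.WARNING
    return Level.OK
```

For example, `classify("storage", 0.72)` returns `Level.WARNING`, and `classify("replication_lag_s", 150)` returns `Level.CRITICAL`.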