mdcms/sample-sites/neuraldb-docs/pages/ops-scaling.md
2026-05-18 14:30:49 +07:00


title: Scaling
sort: 120
section-id: operations
keywords: scaling, sharding, read replicas, horizontal scaling, capacity planning, performance
description: Scaling NeuralDB horizontally with sharding, read replicas, and capacity planning
language: en

Scaling

NeuralDB is designed to scale horizontally. This page covers adding read replicas for query throughput, sharding for data volume, and capacity planning to avoid resource exhaustion.

Vertical Scaling (Scale Up)

Before adding nodes, ensure you have maximised single-node performance:

Memory

The biggest lever for NeuralDB performance is memory. Ensure:

  • vector_buffer is large enough to hold all active HNSW graphs
  • shared_buffers is set to 25% of RAM
  • work_mem is appropriate for your query patterns
-- Check if vectors are being served from disk (slow) vs memory (fast)
SELECT index_name, hnsw_graph_size_bytes, hnsw_in_memory
FROM neuraldb_stat_vector_indexes
ORDER BY hnsw_graph_size_bytes DESC;

If hnsw_in_memory = false, increase vector_buffer.
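
To turn the output of that query into a sizing decision, the rows can be summarised in application code. The following is a minimal sketch, assuming the rows have already been fetched as dictionaries keyed by the column names above; the function name and the 20% headroom are illustrative assumptions, not NeuralDB recommendations.

```python
def vector_buffer_report(index_stats, headroom_percent=20):
    """Summarise HNSW memory needs from neuraldb_stat_vector_indexes rows.

    index_stats: iterable of dicts with the keys index_name,
    hnsw_graph_size_bytes and hnsw_in_memory.
    headroom_percent: assumed safety margin (hypothetical default).
    """
    rows = list(index_stats)
    # Indexes currently served from disk are the ones that need attention
    on_disk = [row["index_name"] for row in rows if not row["hnsw_in_memory"]]
    total_bytes = sum(row["hnsw_graph_size_bytes"] for row in rows)
    # Integer arithmetic avoids floating-point rounding on byte counts
    suggested = total_bytes * (100 + headroom_percent) // 100
    return {
        "on_disk": on_disk,
        "total_graph_bytes": total_bytes,
        "suggested_vector_buffer_bytes": suggested,
    }
```

If `on_disk` is non-empty, compare `suggested_vector_buffer_bytes` with the current vector_buffer setting before raising it.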

CPU

Vector ANN searches are CPU-bound. Enable parallel query:

max_parallel_workers_per_gather = 8
max_parallel_workers = 16
-- Allow parallel ANN queries for large tables
SET max_parallel_workers_per_gather = 8;
SELECT * FROM large_table ORDER BY embedding <=> $1 LIMIT 10;

Storage I/O

Use NVMe SSDs with high IOPS. Configure the OS:

# Increase read-ahead for sequential I/O
sudo blockdev --setra 1024 /dev/nvme0n1

# Use deadline/mq-deadline I/O scheduler
echo "mq-deadline" | sudo tee /sys/block/nvme0n1/queue/scheduler

# Disable transparent huge pages (reduces latency variability)
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

Read Replicas

Add read replicas to distribute query load.

Setting Up Read Replicas

Follow the Replication guide to add replicas. Each replica can independently serve SELECT queries, including vector similarity searches.

Client-Side Read Splitting

Configure your application to route reads to replicas:

Python:

from neuraldb import NeuralDB

primary = NeuralDB("postgresql://neuraldb:pass@primary:5432/mydb")
replica = NeuralDB("postgresql://neuraldb:pass@replica:5432/mydb")

def search(query_vector):
    # Read goes to replica
    return replica.query("SELECT * FROM docs ORDER BY embedding <=> %s LIMIT 10", [query_vector])

def insert(content, embedding):
    # Write goes to primary
    return primary.execute("INSERT INTO docs (content, embedding) VALUES (%s, %s)", [content, embedding])
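
With more than one replica, reads also need to be spread across them. The sketch below shows one simple way to do that with round-robin rotation; `ReadWriteRouter` is a hypothetical helper, not part of the NeuralDB client, and it accepts any connection objects.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and rotate reads across replicas."""

    def __init__(self, primary, replicas):
        self._primary = primary
        # Fall back to the primary when no replicas are configured
        self._readers = itertools.cycle(list(replicas) or [primary])

    def reader(self):
        # Next connection in round-robin order
        return next(self._readers)

    def writer(self):
        return self._primary
```

A search function would then call `router.reader().query(...)` and an insert would call `router.writer().execute(...)`. Round-robin ignores replica health and lag, so a production router would likely need both checks.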

Connection string with target_session_attrs:

postgresql://neuraldb:pass@primary:5432,replica:5432/mydb?target_session_attrs=prefer-standby

Read Replica Scaling Targets

| Topology | Approximate peak QPS (1536-dim, 10M vectors) |
|---|---|
| 1 primary | 8,000 |
| 1 primary + 2 replicas | 24,000 |
| 1 primary + 4 replicas | 48,000 |
| 1 primary + 8 replicas | 96,000 |

Horizontal Sharding

For datasets exceeding single-node capacity (>50M vectors or >5 TB), shard across multiple primary nodes.

Shard Configuration

-- Create a sharded cluster (requires NeuralDB Cluster Edition)
SELECT neuraldb_cluster.init_cluster(
  shards => 8,
  replication_factor => 2
);

-- Create a sharded table
CREATE TABLE documents (
  id UUID NOT NULL DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL,
  content TEXT,
  embedding VECTOR(1536)
) SHARD BY tenant_id;

-- Each shard holds ~1/8 of the data
-- All rows with the same tenant_id are colocated on the same shard
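
Colocation follows from the shard being a deterministic function of the shard key. The sketch below illustrates the idea with a hash-modulo mapping; this is an assumption for illustration only, since the placement function NeuralDB actually uses is not described here.

```python
import hashlib
import uuid


def shard_for(tenant_id: uuid.UUID, num_shards: int = 8) -> int:
    """Map a tenant to a shard number in [0, num_shards).

    Hypothetical hash-modulo placement; the real cluster may use a
    different scheme (for example, range or consistent hashing).
    """
    digest = hashlib.sha256(tenant_id.bytes).digest()
    # The first 8 bytes of the digest are enough for an even spread
    return int.from_bytes(digest[:8], "big") % num_shards
```

Because the result depends only on `tenant_id`, every row for a tenant lands on the same shard, and a query filtered by `tenant_id` can be sent to a single node.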

Cross-Shard Queries

Cross-shard queries (where the filter doesn't align with the shard key) are automatically parallelised across shards:

-- This query executes on all 8 shards in parallel
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;
-- Results are merged and re-ranked by the coordinator
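
The merge step on the coordinator can be pictured as a k-way merge of per-shard top-k lists. This is a minimal sketch of that pattern, assuming each shard returns `(distance, row_id)` tuples already sorted by ascending distance; it is not the coordinator's actual implementation.

```python
import heapq
import itertools


def merge_top_k(shard_results, k=10):
    """Merge sorted per-shard results and keep the k nearest overall.

    shard_results: list of lists of (distance, row_id) tuples, each
    list sorted by ascending distance (an assumed wire format).
    """
    # heapq.merge streams the inputs lazily in sorted order
    merged = heapq.merge(*shard_results)
    return list(itertools.islice(merged, k))
```

Each shard only needs to return its own top `k` rows, because no row outside a shard's local top `k` can appear in the global top `k`.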

Performance with 8 shards: near-linear scaling. For cross-shard searches, an 8-shard cluster approaches 8× the QPS of a single node before coordination overhead; with ~20% overhead for merging and re-ranking, expect roughly 6.4× in practice.

Shard Rebalancing

When adding new shard nodes, rebalance data:

-- Rebalance shards (online, non-blocking)
SELECT neuraldb_cluster.rebalance_shards();

-- Monitor progress
SELECT * FROM neuraldb_cluster.rebalance_status;

Capacity Planning

Storage Capacity

Estimate required storage:

Row data ≈ avg_row_size_bytes × num_rows × 1.3 (index overhead)
Vector data ≈ dimensions × 4 bytes × num_vectors
HNSW graph ≈ dimensions × 4 bytes × num_vectors × 1.3
WAL ≈ daily_write_volume × wal_retention_days

Total ≈ row_data + vector_data + HNSW_graph + WAL + 20% buffer

Example: 100M rows, 1536 dimensions, 500 bytes average row size:

  • Row data: 500B × 100M × 1.3 ≈ 65 GB
  • Vector data: 1536 × 4B × 100M ≈ 614 GB
  • HNSW graph: 614 GB × 1.3 ≈ 800 GB (must fit in vector_buffer)
  • WAL (7 days): 10 GB/day × 7 = 70 GB
  • Total: 65 + 614 + 800 + 70 ≈ 1,550 GB before the buffer, or ~1.9 TB of storage with the 20% buffer; 800 GB RAM for HNSW
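
The estimate above can be scripted so it is easy to rerun with different inputs. The sketch below applies the same formulas, assuming one vector per row; the function and key names are illustrative.

```python
def estimate_storage_bytes(num_rows, dimensions, avg_row_size_bytes,
                           daily_write_bytes, wal_retention_days):
    """Apply the storage formulas from this section (one vector per row)."""
    row_data = avg_row_size_bytes * num_rows * 1.3   # 1.3 = index overhead
    vector_data = dimensions * 4 * num_rows          # 4 bytes per float32
    hnsw_graph = vector_data * 1.3
    wal = daily_write_bytes * wal_retention_days
    subtotal = row_data + vector_data + hnsw_graph + wal
    return {
        "row_data": row_data,
        "vector_data": vector_data,
        "hnsw_graph": hnsw_graph,      # also the RAM needed in vector_buffer
        "wal": wal,
        "subtotal": subtotal,
        "total_with_buffer": subtotal * 1.2,  # 20% buffer
    }


GB = 10**9
example = estimate_storage_bytes(100_000_000, 1536, 500, 10 * GB, 7)
```

For the example inputs, `subtotal` comes to about 1.55 TB and `total_with_buffer` to about 1.86 TB.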

Connection Capacity

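max_connections = direct_app_connections + pgbouncer_pool_size + replication_slots + 3 (superuser)

Count only the application connections that reach the database directly. Clients that connect through PgBouncer are already covered by pgbouncer_pool_size.

For 500 app connections, all routed through PgBouncer with pool size 20:

max_connections = 0 (direct) + 20 (pool) + 10 (replicas) + 3 = 33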

PgBouncer multiplexes 500 app connections → 20 database connections.
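
The same arithmetic as a small helper, counting only connections that actually reach the database. This is a sketch; the parameter names are illustrative, and `direct_app_connections` stands for clients that bypass PgBouncer.

```python
def required_max_connections(pgbouncer_pool_size, replication_slots,
                             direct_app_connections=0, superuser_reserved=3):
    """Database-side connections needed, per the sizing rule in this section.

    Clients behind PgBouncer are represented by the pool size, not by
    their own count.
    """
    return (direct_app_connections + pgbouncer_pool_size
            + replication_slots + superuser_reserved)


# 500 app connections behind a pool of 20, plus 10 for replicas
example_max_connections = required_max_connections(20, 10)
```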

Alert Thresholds

| Resource | Warning | Critical |
|---|---|---|
| Connections | 80% of max | 95% of max |
| Storage | 70% full | 85% full |
| vector_buffer utilisation | 80% | 90% |
| Replication lag | 30s | 120s |
| Query p99 latency | 500ms | 2000ms |
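
These thresholds can be encoded directly in a monitoring script. The following is a minimal sketch: the metric key names are hypothetical, and treating a value equal to a threshold as crossing it is an assumption.

```python
# (warning, critical) pairs taken from the alert thresholds table
THRESHOLDS = {
    "connections_pct": (80, 95),
    "storage_pct": (70, 85),
    "vector_buffer_pct": (80, 90),
    "replication_lag_s": (30, 120),
    "query_p99_ms": (500, 2000),
}


def classify(metric, value):
    """Return 'ok', 'warning' or 'critical' for a metric reading."""
    warning, critical = THRESHOLDS[metric]
    # Check the critical threshold first because it implies the warning one
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"
```

For example, `classify("storage_pct", 72)` falls between the two storage thresholds and returns `"warning"`.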