mdcms/sample-sites/neuraldb-docs/pages/architecture.md
2026-05-18 14:30:49 +07:00


title: Architecture
sort: 120
section-id: overview
keywords: architecture, storage engine, query planner, replication, WAL, HNSW
description: NeuralDB internal architecture (storage engine, query planner, and replication)
language: en

NeuralDB Architecture

NeuralDB is built on a custom storage engine that co-locates relational and vector data, with a query planner that understands both SQL predicates and vector similarity operations natively.

High-Level Architecture

Client (psql / SDK / REST)
         │
         ▼
┌─────────────────────────────────────────┐
│            Connection Layer             │
│  (PostgreSQL Wire Protocol compatible)  │
└───────────────────┬─────────────────────┘
                    │
         ┌──────────▼──────────┐
         │    Query Parser     │
         │  (SQL + NQL ext.)   │
         └──────────┬──────────┘
                    │
         ┌──────────▼──────────┐
         │   Semantic Planner  │◄── Statistics + Index Metadata
         │ (hybrid cost model) │
         └──────────┬──────────┘
                    │
         ┌──────────▼──────────┐
         │   Execution Engine  │
         │  ┌────────────────┐ │
         │  │ SQL Executor   │ │
         │  └────────────────┘ │
         │  ┌────────────────┐ │
         │  │ ANN Executor   │ │
         │  └────────────────┘ │
         └──────────┬──────────┘
                    │
         ┌──────────▼──────────┐
         │   Storage Engine    │
         │  ┌────────────────┐ │
         │  │ Row Store      │ │◄── SST Files (columnar)
         │  └────────────────┘ │
         │  ┌────────────────┐ │
         │  │ Vector Store   │ │◄── HNSW Graph Files
         │  └────────────────┘ │
         │  ┌────────────────┐ │
         │  │ WAL            │ │◄── Write-Ahead Log
         │  └────────────────┘ │
         └─────────────────────┘

Storage Engine

Row Store

NeuralDB's row store uses a Log-Structured Merge-tree (LSM) architecture inspired by RocksDB. Data is written to an in-memory write buffer (MemTable), which is periodically flushed to sorted string tables (SSTables) on disk. Background compaction merges SSTables and reclaims space.

Key properties:

  • Write-optimised: writes are sequential, not random — excellent NVMe utilisation
  • Columnar format: SSTables store data column-by-column for fast analytical scans
  • Compression: LZ4 by default, Zstd for archival storage; typically a 3–6× compression ratio
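
The write path described above (MemTable, flush to SSTables, background compaction) can be illustrated with a toy model. This is a minimal sketch of the general LSM technique, not NeuralDB's implementation: the `TinyLSM` class, its `memtable_limit` parameter, and the in-memory "SSTables" are all hypothetical, and the columnar layout and compression are left out.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory write buffer flushed to immutable sorted runs."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entries)
        self.memtable = {}                    # in-memory write buffer
        self.sstables = []                    # sorted [(key, value), ...] runs, newest last

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the buffer out as one sorted, immutable run.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self):
        # Merge all runs into one; newer values overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(key, value)
db.compact()
```

After compaction the two flushed runs collapse into a single run in which the stale value for `a` has been dropped, which is the space reclamation the section refers to.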

Vector Store

Vectors are stored separately from rows in a Vector Store. The Vector Store maintains:

  1. Raw vector data — the float32 arrays, stored in compressed pages
  2. HNSW graph — the in-memory navigation graph for ANN search

The HNSW graph is loaded into memory on startup and kept warm. Memory required ≈ num_vectors × dimensions × 4 bytes × 1.3 (1.3× overhead for the graph structure).

For a 10M-row table with 1536-dimensional embeddings: 10M × 1536 × 4 × 1.3 ≈ 80 GB. Plan memory accordingly.
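
The sizing rule above is easy to script when planning capacity. The helper below is a small sketch that just evaluates the formula from this section; the function name and its parameters are illustrative, not part of any NeuralDB tooling.

```python
def hnsw_memory_bytes(num_vectors, dimensions, bytes_per_value=4, graph_overhead=1.3):
    """Estimate HNSW memory: num_vectors x dimensions x 4 bytes x 1.3 overhead."""
    return num_vectors * dimensions * bytes_per_value * graph_overhead


estimate = hnsw_memory_bytes(10_000_000, 1536)
print(f"{estimate / 1e9:.1f} GB")     # 79.9 GB (decimal gigabytes)
print(f"{estimate / 2**30:.1f} GiB")  # 74.4 GiB (binary gibibytes)
```

The two printed figures differ only in the unit: the ≈ 80 GB quoted above is in decimal gigabytes.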

Write-Ahead Log (WAL)

All writes (row and vector) are written to the WAL before they are applied to the storage engine. The WAL provides:

  • Durability: committed transactions survive crashes
  • Replication: replicas apply the WAL stream from the primary
  • Point-in-time recovery (PITR): archive the WAL to recover to any point in time

WAL segments are 128 MB by default and are archived to the configured storage backend (local disk, S3, GCS) upon rotation.
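
The log-before-apply rule, and how it yields both crash recovery and replication, can be sketched in a few lines. This is an illustrative model only: `WalStore` is a hypothetical class, a Python list stands in for the durable log, and segment rotation and archiving are not modelled.

```python
import json


class WalStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self, log=None):
        self.log = log if log is not None else []  # stands in for the durable WAL
        self.data = {}
        for line in self.log:                      # recovery: replay the log in order
            self._apply(json.loads(line))

    def _apply(self, record):
        if record["op"] == "put":
            self.data[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.data.pop(record["key"], None)

    def put(self, key, value):
        record = {"op": "put", "key": key, "value": value}
        self.log.append(json.dumps(record))  # 1. append to the WAL first
        self._apply(record)                  # 2. only then change the state

    def delete(self, key):
        record = {"op": "delete", "key": key}
        self.log.append(json.dumps(record))
        self._apply(record)


primary = WalStore()
primary.put("a", 1)
primary.put("b", 2)
primary.delete("a")

# A crashed node, or a replica, rebuilds the same state from the log alone.
replica = WalStore(log=list(primary.log))
```

Replaying only a prefix of the log gives the state at an earlier moment, which is the idea behind point-in-time recovery.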

Query Planner

The Semantic Planner extends a PostgreSQL-compatible query planner with understanding of vector operations.

Hybrid Cost Model

For hybrid queries (vector + relational), the planner considers two physical plans:

Plan A: Pre-filter

Filter(price < 100) → ANN(embedding, k=10)

Cost: selectivity × full_scan_cost + ANN_cost(filtered_set)

Plan B: Post-filter

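ANN(embedding, k=10×estimated_filter_ratio) → Filter(price < 100)

Cost: ANN_cost(full_index) + filter_cost

Here estimated_filter_ratio acts as an over-fetch factor (roughly 1 / selectivity), so that about 10 candidates remain after the filter is applied.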

The planner uses column statistics (histogram, null fraction, distinct values) and vector index parameters to estimate costs. It picks the plan with the lower estimated cost.
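
To make the trade-off concrete, the sketch below compares the two plans under a deliberately simple cost model. Everything in it is an assumption for illustration: the function name, the unit costs, a brute-force distance scan over the filtered set for Plan A, and a logarithmic HNSW search that over-fetches k / selectivity candidates for Plan B (one reading of the k multiplier above). The real planner's formulas are not specified in this document.

```python
import math


def choose_plan(num_rows, selectivity, k=10,
                row_scan_cost=0.01, distance_cost=1.0, filter_cost_per_row=0.1):
    """Return (plan_name, pre_filter_cost, post_filter_cost) under a toy cost model."""
    if not 0 < selectivity <= 1:
        raise ValueError("selectivity must be in (0, 1]")

    # Plan A: selectivity x full_scan_cost + ANN_cost(filtered_set).
    # Assumption: the filtered set is searched by brute force.
    full_scan_cost = num_rows * row_scan_cost
    filtered_rows = selectivity * num_rows
    pre_filter = selectivity * full_scan_cost + filtered_rows * distance_cost

    # Plan B: ANN_cost(full_index) + filter_cost.
    # Assumption: search cost grows with log2(num_rows) per fetched candidate.
    overfetch_k = math.ceil(k / selectivity)
    post_filter = (math.log2(num_rows) * overfetch_k * distance_cost
                   + overfetch_k * filter_cost_per_row)

    name = "pre-filter" if pre_filter <= post_filter else "post-filter"
    return name, pre_filter, post_filter


print(choose_plan(1_000_000, selectivity=0.5)[0])      # weak filter
print(choose_plan(1_000_000, selectivity=0.00001)[0])  # very selective filter
```

Under these assumed costs, a filter that keeps half the table favours post-filtering, while a filter that keeps only a handful of rows favours pre-filtering, which matches the intuition behind the two plans.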

Index Types

NeuralDB supports the following index types:

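| Index    | Data              | Purpose                                                           |
|----------|-------------------|-------------------------------------------------------------------|
| B-tree   | Scalar columns    | Equality, range queries                                           |
| Hash     | Scalar columns    | Equality only (faster than B-tree)                                |
| GIN      | JSON, arrays      | Containment queries                                               |
| HNSW     | VECTOR columns    | Approximate nearest neighbour                                     |
| IVF-Flat | VECTOR columns    | High-recall approximate search (exact distances in probed lists)  |
| BRIN     | Timestamp columns | Range scans on append-only data                                   |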

Replication

NeuralDB uses streaming replication. The primary continuously ships WAL segments to replicas, which apply them in order.

Synchronous vs Asynchronous Replication

-- Set replication mode per-transaction
SET synchronous_commit = 'on';    -- wait for WAL to reach all sync replicas (safest)
SET synchronous_commit = 'local'; -- wait for local WAL flush only (faster)
SET synchronous_commit = 'off';   -- don't wait (fastest, small durability window)

Read Replicas

Replicas accept SELECT queries. Direct read-heavy workloads to replicas:

primary:   write queries + critical reads
replica-1: analytical queries, reporting
replica-2: search API traffic

The client SDK supports automatic read/write splitting:

client = NeuralDB(
    primary="primary.example.com:5432",
    replicas=["replica1.example.com:5432", "replica2.example.com:5432"],
    read_from="replicas",
    replica_selection="round-robin",
)
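
What round-robin read/write splitting amounts to can be sketched independently of the SDK. The `Router` class below is hypothetical and is not the SDK's internals; it classifies a statement by its first keyword only, which a production splitter would need to refine (for example `SELECT ... FOR UPDATE` or reads inside a write transaction should stay on the primary).

```python
import itertools


class Router:
    """Toy read/write splitter: SELECTs rotate over replicas, the rest go to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() yields replicas in round-robin order indefinitely.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split(None, 1)
        first_keyword = words[0].upper() if words else ""
        if first_keyword == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = Router(
    "primary.example.com:5432",
    ["replica1.example.com:5432", "replica2.example.com:5432"],
)
targets = [
    router.route("SELECT * FROM items"),
    router.route("SELECT count(*) FROM items"),
    router.route("INSERT INTO items VALUES (1)"),
]
```

With the hosts from the example above, the two SELECT statements land on `replica1` and `replica2` in turn, and the INSERT goes to the primary.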

Memory Architecture

NeuralDB divides available memory into three pools:

| Pool           | Purpose                         | Default    |
|----------------|---------------------------------|------------|
| shared_buffers | Row store page cache            | 25% of RAM |
| vector_buffer  | HNSW graph warm cache           | 40% of RAM |
| work_mem       | Per-query sort and hash buffers | 64 MB      |

Tune these in neuraldb.conf:

shared_buffers = 8GB
vector_buffer = 16GB
work_mem = 128MB
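
As a starting point for tuning, the default percentages from the table can be turned into absolute sizes for a given machine. The helper below is a small sketch that only applies those stated defaults; the function name and the returned keys are illustrative.

```python
def default_pools(total_ram_gb):
    """Apply the documented defaults: 25% shared_buffers, 40% vector_buffer, 64 MB work_mem."""
    return {
        "shared_buffers_gb": total_ram_gb * 0.25,
        "vector_buffer_gb": total_ram_gb * 0.40,
        "work_mem_mb": 64,
    }


pools = default_pools(32)
print(f"shared_buffers = {pools['shared_buffers_gb']:.1f}GB")
print(f"vector_buffer = {pools['vector_buffer_gb']:.1f}GB")
print(f"work_mem = {pools['work_mem_mb']}MB")
```

Comparing `vector_buffer_gb` with the HNSW memory estimate for your tables is a quick way to see whether the default split leaves enough room for the graph.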

Consistency Model

NeuralDB provides strong consistency for primary reads and eventual consistency for replica reads (with a configurable replication lag threshold).

Reads on the primary always see the latest committed data. Reads on replicas may lag behind the primary by up to the max_replication_lag setting (default: 1 second). To make a replica wait until it has replayed the WAL up to a given position:

SELECT pg_wait_for_replica_replay('0/1234ABCD');
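
The argument in the call above looks like a PostgreSQL-style log sequence number (LSN): two hexadecimal fields separated by a slash, holding the high and low 32 bits of a 64-bit WAL position. Assuming NeuralDB follows that convention, a client can compare positions numerically to check whether a replica has caught up. The helper names below are hypothetical.

```python
def parse_lsn(lsn):
    """Convert a 'HIGH/LOW' hexadecimal LSN string into a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replica_caught_up(replay_lsn, target_lsn):
    """True if the replica has replayed at least up to target_lsn."""
    return parse_lsn(replay_lsn) >= parse_lsn(target_lsn)


caught_up = replica_caught_up("0/1234ABCE", "0/1234ABCD")
rolled_over = replica_caught_up("0/FFFFFFFF", "1/0")
```

Here `caught_up` is true and `rolled_over` is false. Comparing the raw strings would get the second case wrong, because "1/0" sorts after "0/FFFFFFFF" only once both are read as numbers.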