mirror of
https://github.com/kbenestad/mdcms.git
synced 2026-06-18 15:24:32 +00:00
239 lines
5.4 KiB
Markdown
239 lines
5.4 KiB
Markdown
---
|
|
title: NQL Basics
|
|
sort: 100
|
|
section-id: query-language
|
|
keywords: NQL, NeuralDB Query Language, SQL, syntax, basics, queries
|
|
description: Introduction to NeuralDB Query Language (NQL) — syntax, data types, and basic operations
|
|
language: en
|
|
---
|
|
|
|
# NQL Basics
|
|
|
|
NQL (NeuralDB Query Language) is a superset of standard SQL. Every valid SQL statement is also valid NQL. NQL adds extensions for vector operations, embedding generation, and semantic search primitives.
|
|
|
|
If you know SQL, you already know most of NQL. This page covers the NQL-specific additions and the data types introduced for AI workloads.
|
|
|
|
## Connecting
|
|
|
|
NeuralDB speaks the PostgreSQL wire protocol. Connect with any PostgreSQL client:
|
|
|
|
```bash
|
|
# psql
|
|
psql -h localhost -p 5432 -U neuraldb -d mydb
|
|
|
|
# neuraldb-cli (enhanced interactive shell)
|
|
neuraldb-cli -h localhost
|
|
```
|
|
|
|
Connection string format:
|
|
|
|
```
|
|
postgresql://[user[:password]@][host][:port][/dbname][?param=value...]
|
|
```
|
|
|
|
## Data Types
|
|
|
|
### VECTOR(n)
|
|
|
|
The core NQL extension. Stores a fixed-length array of 32-bit floats representing a vector embedding:
|
|
|
|
```sql
|
|
-- Declare a vector column with 1536 dimensions (OpenAI ada-002 output)
|
|
CREATE TABLE documents (
|
|
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
content TEXT NOT NULL,
|
|
embedding VECTOR(1536)
|
|
);
|
|
```
|
|
|
|
Insert a vector by providing a bracketed float array:
|
|
|
|
```sql
|
|
INSERT INTO documents (content, embedding)
|
|
VALUES ('Hello, world', '[0.1, 0.2, 0.3, ...]'); -- 1536 values
|
|
```
|
|
|
|
### HALFVEC(n)
|
|
|
|
A 16-bit float variant of VECTOR. Half the memory, slight precision loss. Useful when vector_buffer is a constraint:
|
|
|
|
```sql
|
|
embedding HALFVEC(1536)
|
|
```
|
|
|
|
### SPARSEVEC(n)
|
|
|
|
Sparse vector representation — stores only non-zero elements. Efficient for high-dimensional but sparse vectors (e.g., BM25 term-frequency vectors):
|
|
|
|
```sql
|
|
bm25_vector SPARSEVEC(30000)
|
|
```
|
|
|
|
## Basic CRUD
|
|
|
|
### Creating Tables
|
|
|
|
```sql
|
|
CREATE TABLE products (
|
|
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
name TEXT NOT NULL,
|
|
description TEXT,
|
|
category TEXT,
|
|
price DECIMAL(10, 2),
|
|
stock INTEGER DEFAULT 0,
|
|
embedding VECTOR(1536),
|
|
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
|
|
);
|
|
```
|
|
|
|
### Inserting Data
|
|
|
|
```sql
|
|
INSERT INTO products (name, description, category, price, stock, embedding)
|
|
VALUES (
|
|
'Wireless Headphones',
|
|
'Premium noise-cancelling wireless headphones with 30-hour battery',
|
|
'electronics',
|
|
299.99,
|
|
150,
|
|
'[0.023, -0.187, 0.412, ...]' -- 1536 floats from your embedding model
|
|
);
|
|
```
|
|
|
|
### Reading Data
|
|
|
|
Standard SQL SELECT works as expected:
|
|
|
|
```sql
|
|
SELECT id, name, price FROM products WHERE category = 'electronics';
|
|
SELECT * FROM products WHERE price BETWEEN 50 AND 300 AND stock > 0;
|
|
```
|
|
|
|
### Updating Data
|
|
|
|
```sql
|
|
UPDATE products SET price = 279.99, embedding = '[...]' WHERE id = $1;
|
|
```
|
|
|
|
When updating the `embedding` column, the HNSW index is updated atomically.
|
|
|
|
### Deleting Data
|
|
|
|
```sql
|
|
DELETE FROM products WHERE id = $1;
|
|
```
|
|
|
|
## Creating Vector Indexes
|
|
|
|
Without an index, vector similarity queries perform exact linear scans (O(n)). Create an HNSW index for sub-linear performance:
|
|
|
|
```sql
|
|
-- Cosine similarity (most common for text embeddings)
|
|
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
|
|
|
|
-- Euclidean distance
|
|
CREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);
|
|
|
|
-- Dot product (for recommendation systems)
|
|
CREATE INDEX ON documents USING hnsw (embedding vector_ip_ops);
|
|
```
|
|
|
|
Build an index on an existing large table in parallel:
|
|
|
|
```sql
|
|
SET max_parallel_maintenance_workers = 8;
|
|
CREATE INDEX CONCURRENTLY ON documents USING hnsw (embedding vector_cosine_ops);
|
|
```
|
|
|
|
## Basic Vector Queries
|
|
|
|
### Find Similar Documents
|
|
|
|
```sql
|
|
SELECT id, content, 1 - (embedding <=> $1) AS similarity
|
|
FROM documents
|
|
ORDER BY embedding <=> $1
|
|
LIMIT 10;
|
|
```
|
|
|
|
The `<=>` operator is cosine distance (lower = more similar). Subtract from 1 for a similarity score (higher = more similar).
|
|
|
|
### Distance Operators
|
|
|
|
| Operator | Distance metric | Index ops |
|
|
|----------|----------------|-----------|
|
|
| `<=>` | Cosine distance | `vector_cosine_ops` |
|
|
| `<->` | Euclidean (L2) distance | `vector_l2_ops` |
|
|
| `<#>` | Negative dot product | `vector_ip_ops` |
|
|
|
|
### Distance Threshold
|
|
|
|
```sql
|
|
-- Only return results with cosine similarity > 0.8
|
|
SELECT id, content, 1 - (embedding <=> $1) AS similarity
|
|
FROM documents
|
|
WHERE 1 - (embedding <=> $1) > 0.8
|
|
ORDER BY embedding <=> $1
|
|
LIMIT 20;
|
|
```
|
|
|
|
**Note:** The `WHERE 1 - (embedding <=> $1) > 0.8` condition is evaluated after the ANN search, not before. Use `LIMIT` generously enough to capture all relevant results before the threshold filter.
|
|
|
|
## NQL Functions
|
|
|
|
### `to_vector(text)`
|
|
|
|
Convert a string literal to a VECTOR:
|
|
|
|
```sql
|
|
SELECT to_vector('[0.1, 0.2, 0.3]')::VECTOR(3);
|
|
```
|
|
|
|
### `vector_dims(v)`
|
|
|
|
Return the number of dimensions:
|
|
|
|
```sql
|
|
SELECT vector_dims(embedding) FROM documents LIMIT 1;
|
|
-- Returns: 1536
|
|
```
|
|
|
|
### `vector_norm(v)`
|
|
|
|
Return the L2 norm of a vector:
|
|
|
|
```sql
|
|
SELECT vector_norm(embedding) FROM documents LIMIT 5;
|
|
```
|
|
|
|
### `cosine_similarity(a, b)`, `l2_distance(a, b)`, `dot_product(a, b)`
|
|
|
|
Named function alternatives to the operators:
|
|
|
|
```sql
|
|
SELECT cosine_similarity(embedding, $1) AS similarity
|
|
FROM documents
|
|
ORDER BY similarity DESC
|
|
LIMIT 10;
|
|
```
|
|
|
|
## Transactions
|
|
|
|
NQL supports full ACID transactions:
|
|
|
|
```sql
|
|
BEGIN;
|
|
|
|
INSERT INTO documents (content, embedding) VALUES ($1, $2);
|
|
UPDATE document_stats SET total_count = total_count + 1;
|
|
|
|
COMMIT;
|
|
```
|
|
|
|
On error, roll back:
|
|
|
|
```sql
|
|
BEGIN;
|
|
-- ... operations ...
|
|
ROLLBACK;
|
|
```
|