mdcms/neuraldb-docs/pages/nql-aggregations.md

2 KiB

title sort section-id keywords description language
Aggregations 130 query-language aggregations, GROUP BY, COUNT, SUM, vectors, AVG, centroid, analytics Aggregating data in NQL including GROUP BY, COUNT, SUM, and vector-specific aggregation functions en

Aggregations

NQL supports the full SQL aggregation toolkit, extended with vector-specific aggregate functions.

Standard Aggregations

SELECT category, COUNT(*) AS doc_count
FROM documents
GROUP BY category
ORDER BY doc_count DESC;

SELECT category, AVG(price), MIN(price), MAX(price)
FROM products
WHERE available = true
GROUP BY category;

Vector Aggregations

AVG(embedding) — Centroid

SELECT AVG(embedding) AS centroid
FROM documents
WHERE category = 'technology';

Find documents closest to the centroid:

WITH centroid AS (
  SELECT AVG(embedding) AS c FROM documents WHERE category = 'technology'
)
SELECT id, title, 1 - (embedding <=> centroid.c) AS similarity
FROM documents, centroid
WHERE category = 'technology'
ORDER BY embedding <=> centroid.c
LIMIT 10;

vector_centroid(embedding, weight)

SELECT vector_centroid(embedding, rating) AS weighted_centroid
FROM products WHERE category = 'electronics';
SELECT DISTINCT ON (category)
  id, category, title, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY category, embedding <=> $1;

Window Functions

SELECT id, title, category,
  1 - (embedding <=> $1) AS similarity,
  RANK() OVER (PARTITION BY category ORDER BY embedding <=> $1) AS rank_in_category
FROM documents
WHERE 1 - (embedding <=> $1) > 0.5
ORDER BY category, rank_in_category;

Time-Series Semantic Analytics

WITH weekly_centroids AS (
  SELECT date_trunc('week', created_at) AS week, AVG(embedding) AS centroid
  FROM documents GROUP BY week
)
SELECT w1.week, 1 - (w1.centroid <=> w2.centroid) AS similarity_to_prev_week
FROM weekly_centroids w1
LEFT JOIN weekly_centroids w2 ON w2.week = w1.week - INTERVAL '1 week'
ORDER BY w1.week;