---
title: Aggregations
sort: 130
section-id: query-language
keywords: aggregations, GROUP BY, COUNT, SUM, vectors, AVG, centroid, analytics
description: Aggregating data in NQL including GROUP BY, COUNT, SUM, and vector-specific aggregation functions
language: en
---

# Aggregations

NQL supports standard SQL aggregation (aggregate functions, `GROUP BY`, `HAVING`, window functions, and `ROLLUP`/`CUBE`) and extends it with vector-specific aggregate functions for centroid computation, clustering, and semantic analytics.

## Standard Aggregations

Standard SQL aggregate functions such as `COUNT`, `SUM`, `AVG`, `MIN`, and `MAX` behave as they do in SQL:

```sql
-- Count documents by category
SELECT category, COUNT(*) AS doc_count
FROM documents
GROUP BY category
ORDER BY doc_count DESC;

-- Price statistics and inventory value per category (available products only)
SELECT category,
       COUNT(*) AS products,
       AVG(price) AS avg_price,
       MIN(price) AS min_price,
       MAX(price) AS max_price,
       SUM(stock * price) AS inventory_value
FROM products
WHERE available = true
GROUP BY category
ORDER BY inventory_value DESC;
```

## Vector Aggregations

### `AVG(embedding)` — Centroid Computation

Applied to a vector column, `AVG` returns the element-wise mean of the group's vectors, i.e. its centroid:

```sql
-- Centroid of all "technology" documents
SELECT AVG(embedding) AS centroid
FROM documents
WHERE category = 'technology';
```

Rank a group's documents by similarity to its centroid to find the most representative ones:

```sql
WITH centroid AS (
  SELECT AVG(embedding) AS c FROM documents WHERE category = 'technology'
)
SELECT id, title, 1 - (embedding <=> centroid.c) AS similarity_to_centroid
FROM documents, centroid
WHERE category = 'technology'
ORDER BY embedding <=> centroid.c
LIMIT 10;
```
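To make the semantics concrete, here is a plain-Python sketch of what the two queries above compute: `AVG(embedding)` as an element-wise mean, and `1 - (embedding <=> c)` as cosine similarity. Treating `<=>` as cosine distance is an assumption based on how this page uses it, and the helper names are hypothetical.

```python
import math

def centroid(vectors):
    """Element-wise mean of equal-length vectors (what AVG(embedding) returns)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_distance(a, b):
    """1 - cosine similarity; assumed to match NQL's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def closest_to_centroid(docs, limit):
    """docs: list of (id, embedding). Returns up to `limit` (id, similarity)
    pairs, most similar to the group's centroid first."""
    c = centroid([emb for _, emb in docs])
    scored = [(doc_id, 1.0 - cosine_distance(emb, c)) for doc_id, emb in docs]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]

docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print(closest_to_centroid(docs, 2))
```

The mean of unit vectors is generally shorter than unit length, but cosine distance ignores magnitude, so the ranking is unaffected as long as the centroid is not the zero vector.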

### `vector_centroid(embedding, weight)` — Weighted Centroid

Compute a weighted centroid, using a numeric column as each row's weight:

```sql
-- Weighted centroid by rating (higher-rated items pull more)
SELECT vector_centroid(embedding, rating) AS weighted_centroid
FROM products
WHERE category = 'electronics';
```
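This page does not spell out how `vector_centroid` normalizes the weights. A common definition, assumed in this plain-Python sketch, is the weighted mean sum(w_i * v_i) / sum(w_i); the function name is hypothetical.

```python
def weighted_centroid(vectors, weights):
    """Weighted mean sum(w_i * v_i) / sum(w_i) (assumed definition of vector_centroid).

    Assumes non-negative weights with a positive total; how NQL treats NULL
    or negative ratings is not modeled here.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    acc = [0.0] * len(vectors[0])
    for vec, w in zip(vectors, weights):
        for i, x in enumerate(vec):
            acc[i] += w * x
    return [x / total for x in acc]

# A rating-3 item pulls the centroid three times as hard as a rating-1 item.
print(weighted_centroid([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0]))  # [0.75, 0.25]
```

With equal weights this reduces to the plain centroid that `AVG(embedding)` returns.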

### `vector_agg_concat(embedding)` — Vector Array

Collect vectors into an array for downstream processing:

```sql
SELECT category, vector_agg_concat(embedding) AS all_embeddings
FROM documents
GROUP BY category;
```

## GROUP BY with Vector Search

Find the document most similar to a query vector `$1` in each category. `DISTINCT ON (category)` keeps the first row of each category in `ORDER BY` order:

```sql
SELECT DISTINCT ON (category)
  id, category, title, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE embedding IS NOT NULL
ORDER BY category, embedding <=> $1;
```

A `LATERAL` join returns the same rows and extends to the top k per category by raising `LIMIT 1`:

```sql
SELECT cat.category, top_doc.id, top_doc.title, top_doc.similarity
FROM (SELECT DISTINCT category FROM documents) cat,
LATERAL (
  SELECT id, title, 1 - (embedding <=> $1) AS similarity
  FROM documents
  WHERE category = cat.category
  ORDER BY embedding <=> $1
  LIMIT 1
) top_doc;
```
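Both queries select the nearest rows per category. The plain-Python sketch below mirrors that selection, assuming `<=>` is cosine distance; `k=1` corresponds to the `DISTINCT ON` form and larger `k` to the `LATERAL ... LIMIT k` form. Names are hypothetical.

```python
import heapq
import math
from collections import defaultdict

def cosine_distance(a, b):
    """1 - cosine similarity; assumed to match NQL's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def top_k_per_category(rows, query, k=1):
    """rows: iterable of (id, category, embedding or None).

    Returns {category: [(id, similarity), ...]} with the k rows nearest to
    `query` in each category, nearest first.
    """
    groups = defaultdict(list)
    for doc_id, category, emb in rows:
        if emb is None:
            continue  # WHERE embedding IS NOT NULL
        groups[category].append((cosine_distance(emb, query), doc_id))
    return {
        category: [(doc_id, 1.0 - dist) for dist, doc_id in heapq.nsmallest(k, items)]
        for category, items in groups.items()
    }

rows = [
    ("d1", "news", [1.0, 0.0]),
    ("d2", "news", [0.0, 1.0]),
    ("d3", "tech", [0.0, 1.0]),
    ("d4", "tech", None),
]
print(top_k_per_category(rows, [1.0, 0.0]))
```

`heapq.nsmallest` keeps only k candidates per category instead of sorting each whole category.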

## Window Functions

Use window functions to rank results within partitions:

```sql
-- Rank documents by similarity within each category
SELECT
  id, title, category,
  1 - (embedding <=> $1) AS similarity,
  RANK() OVER (
    PARTITION BY category
    ORDER BY embedding <=> $1
  ) AS rank_in_category
FROM documents
WHERE 1 - (embedding <=> $1) > 0.5
ORDER BY category, rank_in_category;
```
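`RANK()` gives tied rows the same rank and then skips ahead, so two documents tied at rank 1 are followed by rank 3. A plain-Python sketch of that rule, applied to rows that have already passed the `WHERE` filter (names hypothetical):

```python
from collections import defaultdict

def rank_within_partition(rows):
    """rows: iterable of (id, category, distance).

    Returns {id: rank} using SQL RANK() semantics: ascending distance within
    each category, ties share a rank, and the next rank skips the tied slots.
    """
    by_category = defaultdict(list)
    for doc_id, category, dist in rows:
        by_category[category].append((dist, doc_id))
    ranks = {}
    for items in by_category.values():
        items.sort()
        prev_dist = None
        rank = 0
        for position, (dist, doc_id) in enumerate(items, start=1):
            if dist != prev_dist:
                rank = position  # new distance value: rank jumps to the row position
                prev_dist = dist
            ranks[doc_id] = rank
    return ranks

print(rank_within_partition([("a", "x", 0.1), ("b", "x", 0.1), ("c", "x", 0.3), ("d", "y", 0.2)]))
```

`DENSE_RANK()` would give row `c` rank 2 instead, and `ROW_NUMBER()` numbers tied rows uniquely in an unspecified order.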

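Rolling seven-day average of the daily mean similarity to `$1`. The window frame counts rows rather than calendar days, so it spans exactly seven days only when every day has at least one document: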

```sql
SELECT
  date_trunc('day', created_at) AS day,
  AVG(1 - (embedding <=> $1)) AS avg_daily_similarity,
  AVG(AVG(1 - (embedding <=> $1))) OVER (
    ORDER BY date_trunc('day', created_at)
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7d_avg
FROM documents
GROUP BY day
ORDER BY day;
```
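The query groups first and then applies the window to the grouped rows, which is why the window argument is `AVG(AVG(...))`. The plain-Python sketch below reproduces that two-stage shape with hypothetical helper names. Because the frame counts rows, a day with no documents stretches it across more calendar days.

```python
from collections import defaultdict

def daily_means(samples):
    """Group (day, similarity) samples by day and average them, like GROUP BY day."""
    buckets = defaultdict(list)
    for day, similarity in samples:
        buckets[day].append(similarity)
    return [(day, sum(v) / len(v)) for day, v in sorted(buckets.items())]

def rolling_mean(values, window=7):
    """Mean over the current row and up to window - 1 preceding rows,
    i.e. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW when window=7."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1): i + 1]
        out.append(sum(frame) / len(frame))
    return out

means = daily_means([("2024-01-01", 0.25), ("2024-01-01", 0.75), ("2024-01-02", 1.0)])
print(rolling_mean([m for _, m in means]))
```

The outer average is unweighted: a day with one document counts as much as a day with a thousand.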

## Clustering with GROUP BY

Run the assignment step of k-means-style clustering by assigning each document to its nearest centroid:

```sql
-- Given pre-computed centroids in a centroids table:
SELECT d.id, d.content,
       c.cluster_id,
       (d.embedding <=> c.centroid) AS distance_to_centroid
FROM documents d
CROSS JOIN LATERAL (
  SELECT cluster_id, centroid
  FROM centroids
  ORDER BY d.embedding <=> centroid
  LIMIT 1
) c;
```
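The query above performs only the assignment step. A full k-means-style loop alternates it with an update step that recomputes each centroid as the mean of its members, which in NQL would be `AVG(embedding)` grouped by `cluster_id`. A plain-Python sketch of one iteration, using cosine distance for assignment as the query's `<=>` is assumed to be (names hypothetical):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumed to match NQL's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def assign(docs, centroids):
    """docs: {id: embedding}; centroids: {cluster_id: vector}.
    Returns {id: cluster_id} for the nearest centroid (the query above)."""
    return {
        doc_id: min(centroids, key=lambda cid: cosine_distance(emb, centroids[cid]))
        for doc_id, emb in docs.items()
    }

def update(docs, assignment):
    """Recompute each cluster's centroid as the mean of its members.
    Clusters that received no documents are dropped."""
    members = {}
    for doc_id, cid in assignment.items():
        members.setdefault(cid, []).append(docs[doc_id])
    return {cid: [sum(col) / len(vecs) for col in zip(*vecs)] for cid, vecs in members.items()}

docs = {"a": [1.0, 0.5], "b": [3.0, 0.5], "c": [0.0, 2.0]}
centroids = {0: [1.0, 0.0], 1: [0.0, 1.0]}
assignment = assign(docs, centroids)
print(assignment, update(docs, assignment))
```

Repeating assign and update until the assignment stops changing gives the usual iterative scheme.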

## HAVING with Vector Conditions

```sql
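-- Categories whose documents sit close to their own centroid (tight clusters).
-- Computing each centroid once in a CTE avoids re-running a subquery per row.
WITH category_centroids AS (
  SELECT category, AVG(embedding) AS centroid
  FROM documents
  GROUP BY category
)
SELECT d.category,
       COUNT(*) AS doc_count,
       1 - AVG(d.embedding <=> c.centroid) AS cohesion
FROM documents d
JOIN category_centroids c ON c.category = d.category
GROUP BY d.category
HAVING COUNT(*) > 10
   AND 1 - AVG(d.embedding <=> c.centroid) > 0.5
ORDER BY cohesion DESC;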
```
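Cohesion here is 1 minus the mean cosine distance from each document to its own category centroid. A plain-Python sketch of that computation, including the `HAVING COUNT(*) > 10` cut-off, assuming `<=>` is cosine distance (names hypothetical):

```python
import math
from collections import defaultdict

def cosine_distance(a, b):
    """1 - cosine similarity; assumed to match NQL's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def cohesion_by_category(rows, min_count=10):
    """rows: iterable of (category, embedding).

    Returns {category: (doc_count, cohesion)} for categories with more than
    min_count documents, where cohesion = 1 - mean distance to the centroid.
    """
    groups = defaultdict(list)
    for category, emb in rows:
        groups[category].append(emb)
    result = {}
    for category, vecs in groups.items():
        if len(vecs) <= min_count:
            continue  # HAVING COUNT(*) > min_count
        c = [sum(col) / len(vecs) for col in zip(*vecs)]
        mean_dist = sum(cosine_distance(v, c) for v in vecs) / len(vecs)
        result[category] = (len(vecs), 1.0 - mean_dist)
    return result

rows = ([("tight", [1.0, 0.0])] * 11
        + [("loose", [1.0, 0.0])] * 6 + [("loose", [0.0, 1.0])] * 6
        + [("small", [1.0, 0.0])] * 3)
print(cohesion_by_category(rows))
```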

## Time-Series Analytics

Analyse how semantic content shifts over time:

```sql
-- Weekly semantic drift: how similar is each week's centroid to the previous week's?
WITH weekly_centroids AS (
  SELECT
    date_trunc('week', created_at) AS week,
    AVG(embedding) AS centroid
  FROM documents
  GROUP BY week
)
SELECT
  w1.week,
  1 - (w1.centroid <=> w2.centroid) AS similarity_to_prev_week
FROM weekly_centroids w1
LEFT JOIN weekly_centroids w2
  ON w2.week = w1.week - INTERVAL '1 week'
ORDER BY w1.week;
```
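A plain-Python sketch of the drift query, assuming weeks start on Monday as PostgreSQL's `date_trunc('week', ...)` does and treating `<=>` as cosine distance. As with the `LEFT JOIN`, a week whose previous week has no documents gets `None` rather than being compared with an older week. Names are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date, timedelta

def cosine_distance(a, b):
    """1 - cosine similarity; assumed to match NQL's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def week_start(d):
    """Monday of the week containing d."""
    return d - timedelta(days=d.weekday())

def weekly_drift(rows):
    """rows: iterable of (date, embedding).
    Returns [(week_start, similarity_to_prev_week or None)] sorted by week."""
    buckets = defaultdict(list)
    for d, emb in rows:
        buckets[week_start(d)].append(emb)
    # Per-week centroid, like AVG(embedding) ... GROUP BY week
    centroids = {w: [sum(col) / len(vecs) for col in zip(*vecs)] for w, vecs in buckets.items()}
    result = []
    for w in sorted(centroids):
        prev = centroids.get(w - timedelta(weeks=1))  # LEFT JOIN on week - 1 week
        sim = None if prev is None else 1.0 - cosine_distance(centroids[w], prev)
        result.append((w, sim))
    return result

rows = [
    (date(2024, 1, 1), [1.0, 0.0]),
    (date(2024, 1, 3), [1.0, 0.0]),
    (date(2024, 1, 9), [0.0, 1.0]),
    (date(2024, 1, 23), [1.0, 0.0]),
]
print(weekly_drift(rows))
```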

## JSON Aggregation with Vectors

Combine per-category statistics with a JSON array of the three products most similar to `$1` in each category:

```sql
SELECT
  category,
  COUNT(*) AS total,
  AVG(price) AS avg_price,
  JSON_AGG(
    JSON_BUILD_OBJECT('id', id, 'name', name, 'similarity', similarity)
    ORDER BY similarity DESC
  ) FILTER (WHERE rn <= 3) AS top_3_per_category
FROM (
  -- Rank within each category first: window functions are not
  -- allowed inside an aggregate's FILTER clause
  SELECT *,
         1 - (embedding <=> $1) AS similarity,
         ROW_NUMBER() OVER (PARTITION BY category ORDER BY embedding <=> $1) AS rn
  FROM products
  WHERE available = true
) ranked
GROUP BY category;
```

## ROLLUP and CUBE

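`ROLLUP` and `CUBE` behave as in standard SQL. `ROLLUP(region, category)` adds a subtotal row per region and a grand-total row, with `NULL` in the rolled-up columns; `CUBE` adds subtotals for every combination of the listed columns: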

```sql
SELECT
  region,
  category,
  COUNT(*) AS count,
  AVG(price) AS avg_price
FROM products
GROUP BY ROLLUP(region, category)
ORDER BY region NULLS LAST, category NULLS LAST;
```