mirror of https://github.com/kbenestad/mdcms.git synced 2026-06-18 15:24:32 +00:00

kbenestad 59efc20dde Updated sample-sites.

2026-05-18 14:30:49 +07:00

5.2 KiB

Raw Blame History

title	sort	section-id	keywords	description	language
Hybrid Queries	120	query-language	hybrid queries, vector, relational, filters, combined, semantic search, metadata	Combining vector similarity and relational filters in NQL hybrid queries	en

Hybrid Queries

Hybrid queries combine vector similarity search with relational filter predicates in a single SQL statement. The NeuralDB query planner handles the execution strategy — you write normal SQL with vector operators.

Basic Hybrid Query

Find the 10 most semantically similar products that are in the "electronics" category and in stock:

SELECT id, name, price, 1 - (embedding <=> $1) AS similarity
FROM products
WHERE category = 'electronics'
  AND stock > 0
  AND price < 500
ORDER BY embedding <=> $1
LIMIT 10;

NeuralDB automatically determines whether to:

Pre-filter: apply relational conditions first, then search the filtered set
Post-filter: run ANN search, then apply conditions to the top-k results

The decision is based on the selectivity of the relational predicates.

Query Planner Hints

Override the planner's strategy:

-- Force pre-filter (good when relational filter is very selective)
SELECT /*+ PREFILTER */ id, name, score
FROM (
  SELECT id, name, 1 - (embedding <=> $1) AS score
  FROM products
  WHERE category = 'electronics'  -- very selective: 2% of rows
) sub
ORDER BY score DESC
LIMIT 10;

-- Force post-filter (good when relational filter is weakly selective)
SELECT /*+ POSTFILTER */ id, name, 1 - (embedding <=> $1) AS score
FROM products
WHERE price < 500   -- weakly selective: 80% of rows
ORDER BY embedding <=> $1
LIMIT 10;

Filtering by Multiple Conditions

SELECT id, name, description,
       1 - (embedding <=> $1) AS similarity,
       price,
       rating
FROM products
WHERE category = ANY($2)          -- multi-category filter
  AND price BETWEEN $3 AND $4
  AND rating >= 4.0
  AND discontinued = false
  AND created_at > NOW() - INTERVAL '1 year'
ORDER BY embedding <=> $1
LIMIT 20;
-- $1 = query embedding
-- $2 = ['electronics', 'computers']
-- $3 = 50, $4 = 1000

Hybrid Full-Text + Vector (BM25)

Combine traditional full-text search with vector similarity using Reciprocal Rank Fusion (RRF):

WITH vector_search AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank
  FROM documents
  ORDER BY embedding <=> $1
  LIMIT 100
),
fts_search AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank_cd(tsv, query) DESC) AS rank
  FROM documents, to_tsquery('english', $2) query
  WHERE tsv @@ query
  ORDER BY ts_rank_cd(tsv, query) DESC
  LIMIT 100
),
rrf AS (
  SELECT
    COALESCE(v.id, f.id) AS id,
    (COALESCE(1.0 / (60 + v.rank), 0) + COALESCE(1.0 / (60 + f.rank), 0)) AS rrf_score
  FROM vector_search v
  FULL OUTER JOIN fts_search f ON v.id = f.id
)
SELECT d.id, d.content, rrf.rrf_score
FROM rrf
JOIN documents d ON d.id = rrf.id
ORDER BY rrf_score DESC
LIMIT 10;

NQL also provides a built-in HYBRID_SEARCH function:

SELECT id, content, score
FROM HYBRID_SEARCH(
  table     := 'documents',
  vector_column := 'embedding',
  tsv_column := 'tsv',
  query_vector := $1,
  query_text := $2,
  top_k := 10,
  rrf_k := 60,
  vector_weight := 0.6,
  text_weight := 0.4
);

Joining Vector Results with Other Tables

SELECT
  p.id,
  p.name,
  p.price,
  c.name AS category_name,
  u.display_name AS seller,
  1 - (p.embedding <=> $1) AS similarity
FROM products p
JOIN categories c ON c.id = p.category_id
JOIN users u ON u.id = p.seller_id
WHERE p.available = true
  AND c.slug = ANY($2)
  AND u.verified = true
ORDER BY p.embedding <=> $1
LIMIT 15;

Subquery Vectors

Use a subquery to dynamically compute a query vector from existing data:

-- Find products similar to product #123
SELECT id, name, 1 - (embedding <=> ref.embedding) AS similarity
FROM products,
     (SELECT embedding FROM products WHERE id = $1) ref
WHERE id != $1
ORDER BY embedding <=> ref.embedding
LIMIT 10;

Tenant-Scoped Search

For multi-tenant applications, always include tenant filters:

SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE tenant_id = $2       -- partition pruning if SHARD BY tenant_id
  AND embedding IS NOT NULL
ORDER BY embedding <=> $1
LIMIT 10;

If the table is sharded by tenant_id, this query runs entirely on the correct shard without cross-shard coordination.

Composite Scoring

Combine vector similarity with relational signals:

SELECT
  id,
  name,
  price,
  rating,
  -- Weighted composite score: 70% semantic, 20% rating, 10% recency
  (0.7 * (1 - (embedding <=> $1))
   + 0.2 * (rating / 5.0)
   + 0.1 * (1 - EXTRACT(DAYS FROM NOW() - created_at) / 365.0)
  ) AS composite_score
FROM products
WHERE available = true
  AND price < $2
ORDER BY composite_score DESC
LIMIT 20;

Pagination

Cursor-based pagination for vector results:

-- Page 1
SELECT id, name, (embedding <=> $1) AS dist
FROM products
WHERE available = true
ORDER BY dist, id    -- secondary sort by id for stable pagination
LIMIT 20;

-- Page 2 (cursor: last dist and id from page 1)
SELECT id, name, (embedding <=> $1) AS dist
FROM products
WHERE available = true
  AND ((embedding <=> $1) > $2 OR ((embedding <=> $1) = $2 AND id > $3))
ORDER BY dist, id
LIMIT 20;

5.2 KiB Raw Blame History