mirror of
https://github.com/kbenestad/mdcms.git
synced 2026-06-18 15:24:32 +00:00
65 lines
1.9 KiB
Markdown
---
title: Monitoring
sort: 100
section-id: operations
keywords: monitoring, Prometheus, Grafana, metrics, alerts, observability, dashboards
description: Monitoring NeuralDB with Prometheus metrics, Grafana dashboards, and alert configuration
language: en
---

# Monitoring

## Prometheus Metrics

Enable the metrics exporter:

```ini
metrics.enabled = true
metrics.port = 9187
metrics.path = /metrics
```
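Once the exporter is enabled, the endpoint should serve samples in the Prometheus text exposition format. The following is a minimal sketch of reading that output from Python. It assumes the exporter is reachable on `localhost` at the port and path configured above; `parse_metrics`, `fetch_metrics`, and the sample payload are illustrative names and data, not part of NeuralDB.

```python
import re
from urllib.request import urlopen

# Assumed address: localhost plus the port and path from the config above.
METRICS_URL = "http://localhost:9187/metrics"

# Matches lines such as: name{label="value"} 12.5   (the label set is optional)
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def parse_metrics(text):
    """Return a list of (name, labels_string, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match:
            name, labels, value = match.group(1), match.group(2) or "", match.group(3)
            samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Fetch and parse the exporter output (needs a running server)."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical payload, for illustration only.
SAMPLE_TEXT = """\
# TYPE neuraldb_connections_total gauge
neuraldb_connections_total{state="active"} 42
neuraldb_database_size_bytes 1.5e+09
"""

for name, labels, value in parse_metrics(SAMPLE_TEXT):
    print(name, labels, value)
```

Calling `fetch_metrics()` against a live server is a quick way to confirm the exporter is up before pointing Prometheus at it.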

Key metrics:

| Metric | Type | Description |
|--------|------|-------------|
| `neuraldb_connections_total` | Gauge | Current connections by state |
| `neuraldb_query_duration_seconds` | Histogram | Query duration distribution, used to compute percentiles |
| `neuraldb_vector_queries_total` | Counter | Vector similarity queries by index |
| `neuraldb_hnsw_index_size_bytes` | Gauge | In-memory size of HNSW graphs |
| `neuraldb_replication_lag_seconds` | Gauge | Time lag per replica |
| `neuraldb_database_size_bytes` | Gauge | Total database size |

## Grafana Dashboard

Import the official dashboard, ID **18921**, from Grafana.com.

## Alerting Rules

```yaml
groups:
  - name: neuraldb
    rules:
      - alert: NeuralDBConnectionsHigh
        # ignoring(state) assumes neuraldb_connections_max carries no state label
        expr: neuraldb_connections_total{state="active"} / ignoring(state) neuraldb_connections_max > 0.85
        for: 2m
        labels: { severity: warning }
      - alert: NeuralDBReplicationLagHigh
        expr: neuraldb_replication_lag_seconds > 30
        for: 1m
        labels: { severity: warning }
      - alert: NeuralDBVectorBufferExhausted
        expr: neuraldb_hnsw_index_size_bytes > (neuraldb_vector_buffer_size_bytes * 0.90)
        for: 5m
        labels: { severity: warning }
```
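Each rule combines a threshold expression with a `for` duration: the alert stays pending until the expression has been true on every evaluation for that long, and any false evaluation resets the timer. The sketch below is a simplified Python model of that behavior, applied to the replication lag rule (threshold 30, `for: 1m`). The function name and the sample series are hypothetical, and real Prometheus evaluation has details this model leaves out.

```python
def alert_state(samples, threshold, for_seconds):
    """Return "inactive", "pending", or "firing" as of the last sample.

    `samples` is a list of (timestamp_seconds, value) pairs, oldest first,
    one per rule evaluation.
    """
    active_since = None
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts  # condition just became true
        else:
            active_since = None  # a false evaluation resets the timer
    if active_since is None:
        return "inactive"
    last_ts = samples[-1][0]
    return "firing" if last_ts - active_since >= for_seconds else "pending"


# Hypothetical replication lag readings, evaluated every 15 seconds.
LAG_SAMPLES = [(0, 10), (15, 35), (30, 40), (45, 38), (60, 33), (75, 31)]

print(alert_state(LAG_SAMPLES, threshold=30, for_seconds=60))      # firing
print(alert_state(LAG_SAMPLES[:-1], threshold=30, for_seconds=60))  # pending
```

A short `for` value reacts faster but is more likely to page on brief spikes, which is why the connection and buffer rules above wait longer than the replication lag rule.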

## Built-In Query Statistics

```sql
-- Ten slowest statements by mean execution time (milliseconds)
SELECT query, calls, round(mean_exec_time::numeric, 2) AS avg_ms
FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;

-- Buffer cache hit ratio (%); NULLIF avoids division by zero when no blocks were read
SELECT sum(blks_hit) * 100.0 / NULLIF(sum(blks_hit + blks_read), 0) AS cache_hit_ratio
FROM pg_stat_database WHERE datname != 'template0';
```
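The cache hit ratio is the share of block requests served from memory: hits divided by hits plus disk reads, as a percentage. As a rough sketch of the same arithmetic outside the database, the Python function below takes rows shaped like `pg_stat_database` output and returns `None` when no blocks have been accessed rather than dividing by zero. The function name and the sample rows are made up for illustration.

```python
def cache_hit_ratio(rows):
    """Return the cache hit percentage, or None when no blocks were accessed.

    `rows` is an iterable of (datname, blks_hit, blks_read) tuples.
    """
    hit = read = 0
    for datname, blks_hit, blks_read in rows:
        if datname == "template0":  # same filter as the SQL query
            continue
        hit += blks_hit
        read += blks_read
    total = hit + read
    return None if total == 0 else hit * 100.0 / total


# Hypothetical counters, for illustration only.
ROWS = [("app", 9900, 100), ("template0", 0, 50)]

print(cache_hit_ratio(ROWS))
```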