---
title: Monitoring
sort: 100
section-id: operations
keywords: monitoring, Prometheus, Grafana, metrics, alerts, observability, dashboards
description: Monitoring NeuralDB with Prometheus metrics, Grafana dashboards, and alert configuration
language: en
---
# Monitoring
## Prometheus Metrics
Enable the metrics exporter:
```ini
metrics.enabled = true
metrics.port = 9187
metrics.path = /metrics
```
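
With the exporter enabled, Prometheus still needs a scrape job pointing at that port and path. A minimal sketch for `prometheus.yml`, assuming a single instance reachable at the hypothetical host `neuraldb-host`; the job name and the 15-second interval are illustrative choices, not NeuralDB requirements:

```yaml
scrape_configs:
  - job_name: neuraldb            # illustrative job name
    metrics_path: /metrics        # matches metrics.path above
    scrape_interval: 15s          # illustrative; tune to your retention and load
    static_configs:
      - targets: ["neuraldb-host:9187"]   # hypothetical host; port matches metrics.port
```

If the scrape is working, the target should appear as `UP` on the Prometheus **Status > Targets** page.
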
Key metrics:
| Metric | Type | Description |
|--------|------|-------------|
| `neuraldb_connections_total` | Gauge | Current connections by state |
| `neuraldb_query_duration_seconds` | Histogram | Query duration in seconds, bucketed; percentiles are derived at query time |
| `neuraldb_vector_queries_total` | Counter | Vector similarity queries by index |
| `neuraldb_hnsw_index_size_bytes` | Gauge | In-memory size of HNSW graphs |
| `neuraldb_replication_lag_seconds` | Gauge | Time lag per replica |
| `neuraldb_database_size_bytes` | Gauge | Total database size |
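
Because `neuraldb_query_duration_seconds` is a histogram, percentiles are computed from its buckets at query time rather than exported directly. The following recording rule is a sketch for the 95th percentile, assuming the exporter follows the standard Prometheus convention of exposing `_bucket` series with an `le` label. The group name, rule name, and 5-minute window are illustrative:

```yaml
groups:
  - name: neuraldb-latency        # illustrative group name
    rules:
      # p95 query latency across all series, over a 5-minute rate window.
      # Assumes standard histogram series: neuraldb_query_duration_seconds_bucket{le="..."}
      - record: neuraldb:query_duration_seconds:p95
        expr: histogram_quantile(0.95, sum by (le) (rate(neuraldb_query_duration_seconds_bucket[5m])))
```

The resulting series can be graphed or alerted on like any other gauge. The estimate is only as precise as the bucket boundaries the exporter defines.
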
## Grafana Dashboard
Import the official dashboard, ID **18921**, from Grafana.com.
## Alerting Rules
```yaml
groups:
  - name: neuraldb
    rules:
      - alert: NeuralDBConnectionsHigh
        # ignoring(state) assumes neuraldb_connections_max carries no `state` label;
        # without it the two sides would not match on labels.
        expr: neuraldb_connections_total{state="active"} / ignoring(state) neuraldb_connections_max > 0.85
        for: 2m
        labels: { severity: warning }
      - alert: NeuralDBReplicationLagHigh
        expr: neuraldb_replication_lag_seconds > 30
        for: 1m
        labels: { severity: warning }
      - alert: NeuralDBVectorBufferExhausted
        expr: neuraldb_hnsw_index_size_bytes > (neuraldb_vector_buffer_size_bytes * 0.90)
        for: 5m
        labels: { severity: warning }
```
## Built-In Query Statistics
```sql
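-- Ten slowest statements by mean execution time (milliseconds)
SELECT query, calls, round(mean_exec_time::numeric, 2) AS avg_ms
FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;

-- Buffer cache hit ratio (%); NULLIF avoids division by zero when no blocks have been read
SELECT sum(blks_hit) * 100.0 / NULLIF(sum(blks_hit + blks_read), 0) AS cache_hit_ratio
FROM pg_stat_database WHERE datname != 'template0';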
```