| title | sort | section-id | keywords | description | language |
|---|---|---|---|---|---|
| Monitoring | 100 | operations | monitoring, Prometheus, Grafana, metrics, alerts, observability, dashboards | Monitoring NeuralDB with Prometheus metrics, Grafana dashboards, and alert configuration | en |
Monitoring
Observability is critical for database operations. NeuralDB exposes Prometheus-compatible metrics and provides an official Grafana dashboard for real-time monitoring.
Prometheus Metrics
The bundled exporter serves NeuralDB metrics at http://localhost:9187/metrics.
Enable the metrics exporter:
# neuraldb.conf
metrics.enabled = true
metrics.port = 9187
metrics.path = /metrics
Or run the standalone exporter:
neuraldb_exporter \
  --web.listen-address=:9187 \
  --db.uri="postgresql://monitor:password@localhost:5432/neuraldb?sslmode=disable"
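To confirm the exporter is serving what you expect, it can help to parse a scrape by hand. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format and that `neuraldb_connections_total` carries a `state` label (as the alert rules in this page imply). The sample payload and its values are hypothetical, not real exporter output.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical scrape body, for illustration only.
SAMPLE_SCRAPE = """\
# HELP neuraldb_connections_total Current connections by state
# TYPE neuraldb_connections_total gauge
neuraldb_connections_total{state="active"} 42
neuraldb_connections_total{state="idle"} 8
neuraldb_connections_max 100
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a sample line
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def connection_utilization(samples):
    """Active connections divided by the configured maximum."""
    active = sum(
        value
        for name, labels, value in samples
        if name == "neuraldb_connections_total" and labels.get("state") == "active"
    )
    maximum = next(
        value for name, _, value in samples if name == "neuraldb_connections_max"
    )
    return active / maximum


if __name__ == "__main__":
    print(connection_utilization(parse_metrics(SAMPLE_SCRAPE)))
```

In practice you would feed `parse_metrics` the body fetched from the metrics endpoint rather than the inline sample.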
Key Metrics
Connection Metrics
| Metric | Type | Description |
|---|---|---|
| neuraldb_connections_total | Gauge | Current connections by state |
| neuraldb_connections_max | Gauge | max_connections setting |
| neuraldb_connection_pool_waiting | Gauge | Queries waiting for a connection |
Query Metrics
| Metric | Type | Description |
|---|---|---|
| neuraldb_queries_total | Counter | Total queries by database and status |
| neuraldb_query_duration_seconds | Histogram | Query duration (p50, p95, p99) |
| neuraldb_slow_queries_total | Counter | Queries exceeding log_min_duration_statement |
| neuraldb_deadlocks_total | Counter | Deadlocks detected |
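A histogram metric does not store percentiles directly. It exports cumulative bucket counts, and p50/p95/p99 are estimated from them at query time. The sketch below approximates that estimate with linear interpolation inside the matching bucket, similar in spirit to PromQL's `histogram_quantile`. The bucket bounds and counts are hypothetical, and the function is an illustration, not Prometheus's exact implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper bound,
    ending with the +Inf bucket and containing at least one finite bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # assume the first bucket starts at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest
                # finite bound, since no upper limit is known.
                return buckets[-2][0]
            if count == prev_count:
                return prev_bound
            # Linear interpolation within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical cumulative counts for neuraldb_query_duration_seconds_bucket.
EXAMPLE_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]

if __name__ == "__main__":
    for q in (0.5, 0.95, 0.99):
        print(q, histogram_quantile(q, EXAMPLE_BUCKETS))
```

Because the estimate interpolates within a bucket, its accuracy depends on how finely the bucket boundaries are spaced around the latencies you care about.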
Vector Metrics
| Metric | Type | Description |
|---|---|---|
| neuraldb_vector_queries_total | Counter | Vector similarity queries by index |
| neuraldb_vector_query_duration_seconds | Histogram | ANN query latency |
| neuraldb_hnsw_index_size_bytes | Gauge | In-memory size of HNSW graphs |
| neuraldb_hnsw_build_duration_seconds | Histogram | Time to build HNSW indexes |
| neuraldb_vector_recall_ratio | Gauge | Estimated recall for ANN queries |
Replication Metrics
| Metric | Type | Description |
|---|---|---|
| neuraldb_replication_lag_bytes | Gauge | WAL lag per replica |
| neuraldb_replication_lag_seconds | Gauge | Time lag per replica |
| neuraldb_wal_size_bytes | Gauge | Current WAL on-disk size |
Storage Metrics
| Metric | Type | Description |
|---|---|---|
| neuraldb_database_size_bytes | Gauge | Total database size |
| neuraldb_table_size_bytes | Gauge | Size per table |
| neuraldb_bloat_ratio | Gauge | Estimated dead row ratio |
| neuraldb_checkpoint_duration_seconds | Histogram | Checkpoint write time |
Prometheus Configuration
# prometheus.yml
scrape_configs:
  - job_name: 'neuraldb'
    static_configs:
      - targets: ['localhost:9187']
    scrape_interval: 15s
    metrics_path: /metrics
Grafana Dashboard
Import the official NeuralDB dashboard from Grafana.com (Dashboard ID: 18921):
# Import via Grafana API
curl -X POST \
  http://admin:password@localhost:3000/api/dashboards/import \
  -H "Content-Type: application/json" \
  -d '{ "gnetId": 18921, "overwrite": true, "inputs": [{"name": "DS_PROMETHEUS", "type": "datasource", "pluginId": "prometheus", "value": "Prometheus"}] }'
The dashboard includes panels for:
- Query rate and error rate
- Query latency percentiles (p50, p95, p99)
- Active connections vs max connections
- Vector index memory usage
- Replication lag
- Database and table sizes
- Cache hit ratio
- Checkpoint frequency
Alerting Rules
Create Prometheus alerting rules for critical conditions:
# neuraldb-alerts.yml
groups:
  - name: neuraldb
    rules:
      - alert: NeuralDBConnectionsHigh
        # ignoring(state) lets the per-state gauge match neuraldb_connections_max,
        # assuming that metric carries no state label
        expr: neuraldb_connections_total{state="active"} / ignoring(state) neuraldb_connections_max > 0.85
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "NeuralDB connections above 85%"
          description: "{{ $value | humanizePercentage }} of max connections in use"
      - alert: NeuralDBConnectionsExhausted
        expr: neuraldb_connections_total{state="active"} / ignoring(state) neuraldb_connections_max > 0.98
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "NeuralDB connections nearly exhausted"
      - alert: NeuralDBHighQueryLatency
        expr: histogram_quantile(0.99, rate(neuraldb_query_duration_seconds_bucket[5m])) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 query latency above 1 second"
      - alert: NeuralDBReplicationLagHigh
        expr: neuraldb_replication_lag_seconds > 30
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Replication lag above 30 seconds"
      - alert: NeuralDBDiskSpaceHigh
        expr: (neuraldb_database_size_bytes / disk_total_bytes) > 0.80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Database storage above 80% capacity"
      - alert: NeuralDBVectorBufferExhausted
        expr: neuraldb_hnsw_index_size_bytes > (neuraldb_vector_buffer_size_bytes * 0.90)
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HNSW indexes using >90% of vector_buffer"
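The `for:` clause in each rule means the expression must stay true across consecutive evaluations for that long before the alert fires, so short spikes are ignored. The following simplified simulation illustrates the idea. The sample series and the 15-second evaluation spacing are hypothetical, and real Prometheus behaviour has more states and details than this sketch models.

```python
def first_firing_time(samples, threshold, for_seconds):
    """Return the timestamp at which an alert would first fire, or None.

    samples: list of (timestamp_seconds, value) in evaluation order.
    The condition is value > threshold; it must hold on every evaluation
    for at least for_seconds before the alert fires.
    """
    pending_since = None
    for timestamp, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = timestamp  # condition just became true
            if timestamp - pending_since >= for_seconds:
                return timestamp
        else:
            pending_since = None  # condition broke, so the timer resets
    return None


# Hypothetical connection-utilization ratios, evaluated every 15 seconds.
# A brief spike at t=15..30 clears at t=45; a sustained rise starts at t=60.
EXAMPLE_SERIES = [(0, 0.80), (15, 0.90), (30, 0.91), (45, 0.70)] + [
    (t, 0.90) for t in range(60, 210, 15)
]

if __name__ == "__main__":
    # Mirrors NeuralDBConnectionsHigh: threshold 0.85, for: 2m
    print(first_firing_time(EXAMPLE_SERIES, threshold=0.85, for_seconds=120))
```

Shorter `for:` values, such as the 30s on NeuralDBConnectionsExhausted, trade noise tolerance for faster notification.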
Built-In Query Statistics
-- Top 10 slowest queries
SELECT query,
       calls,
       round(mean_exec_time::numeric, 2) AS avg_ms,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(stddev_exec_time::numeric, 2) AS stddev_ms
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Cache hit ratio (should be >99%); NULLIF avoids division by zero on an idle instance
SELECT
  sum(blks_hit) * 100.0 / nullif(sum(blks_hit + blks_read), 0) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname != 'template0';
-- Lock waits
SELECT pid, query, state, wait_event_type, wait_event, query_start
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY query_start;
Log-Based Alerting
Forward slow query logs to your SIEM or log aggregation system:
# neuraldb.conf
log_destination = 'jsonlog'
log_min_duration_statement = 500 # log queries slower than 500ms
log_line_prefix = '%t [%p] %u@%d '
Parse JSON logs in Loki or Elasticsearch and alert when the rate of slow queries exceeds a threshold.
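As an illustration of the rate check such a pipeline performs, the sketch below counts slow-query log entries per minute and flags minutes above a threshold. It assumes each JSON log line has `timestamp` and `message` fields and that slow statements are logged with a `duration: <n> ms` prefix, as in PostgreSQL's jsonlog output. NeuralDB's exact field names are not confirmed here, and the sample lines are hypothetical.

```python
import json
import re
from collections import Counter

# Matches the duration reported for statements logged as slow.
DURATION_RE = re.compile(r"duration: ([0-9.]+) ms")

# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    '{"timestamp": "2026-06-18 15:24:05.120 UTC", "message": "duration: 812.4 ms  statement: SELECT 1"}',
    '{"timestamp": "2026-06-18 15:24:40.002 UTC", "message": "duration: 1530.0 ms  statement: SELECT 2"}',
    '{"timestamp": "2026-06-18 15:25:10.500 UTC", "message": "duration: 650.9 ms  statement: SELECT 3"}',
    '{"timestamp": "2026-06-18 15:25:11.000 UTC", "message": "connection authorized: user=monitor"}',
    "not json",
]


def slow_queries_per_minute(lines, min_ms=500.0):
    """Count log entries with duration >= min_ms, keyed by minute."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(record, dict):
            continue
        match = DURATION_RE.search(record.get("message", ""))
        if match is None or float(match.group(1)) < min_ms:
            continue
        # "YYYY-MM-DD HH:MM" is the first 16 characters of the timestamp.
        counts[record.get("timestamp", "")[:16]] += 1
    return counts


def minutes_over_threshold(counts, max_per_minute):
    """Return the minutes whose slow-query count exceeds the threshold."""
    return sorted(minute for minute, n in counts.items() if n > max_per_minute)


if __name__ == "__main__":
    per_minute = slow_queries_per_minute(SAMPLE_LOG)
    print(minutes_over_threshold(per_minute, max_per_minute=1))
```

In Loki or Elasticsearch the same logic would normally be expressed as a query with an alert rule; the Python version only makes the counting explicit.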
