title: Monitoring
sort: 100
section-id: operations
keywords: monitoring, Prometheus, Grafana, metrics, alerts, observability, dashboards
description: Monitoring NeuralDB with Prometheus metrics, Grafana dashboards, and alert configuration
language: en

Monitoring

Prometheus Metrics

Enable the metrics exporter:

metrics.enabled = true
metrics.port = 9187
metrics.path = /metrics
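
Once the exporter is enabled, a quick way to confirm it is serving data is to scrape the endpoint and inspect a few samples. The following is a minimal Python sketch, not an official client: the host `localhost`, the helper names `parse_metrics` and `scrape`, and the `SAMPLE` payload are assumptions for illustration. The parser is deliberately simplified and does not handle timestamps or escaped characters in label values.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition lines into (name, labels, value) tuples.

    Simplified: skips comments and assumes no timestamps or escaped label values.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name, labels = series, {}
        if "{" in series:
            name, _, rest = series.partition("{")
            for pair in rest.rstrip("}").split(","):
                if pair:
                    key, _, val = pair.partition("=")
                    labels[key.strip()] = val.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


def scrape(url="http://localhost:9187/metrics", timeout=5):
    """Fetch and parse the exporter endpoint (host is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical payload, for illustration only.
SAMPLE = """\
# TYPE neuraldb_connections_total gauge
neuraldb_connections_total{state="active"} 42
neuraldb_connections_total{state="idle"} 8
neuraldb_database_size_bytes 1048576
"""
```

Calling `parse_metrics(SAMPLE)` exercises the parser offline; `scrape()` performs the same parsing against a running exporter.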

Key metrics:

| Metric | Type | Description |
| --- | --- | --- |
| neuraldb_connections_total | Gauge | Current connections by state |
| neuraldb_query_duration_seconds | Histogram | Query duration distribution (buckets from which percentiles are estimated) |
| neuraldb_vector_queries_total | Counter | Vector similarity queries by index |
| neuraldb_hnsw_index_size_bytes | Gauge | In-memory size of HNSW graphs |
| neuraldb_replication_lag_seconds | Gauge | Time lag per replica |
| neuraldb_database_size_bytes | Gauge | Total database size |
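
A Prometheus histogram such as `neuraldb_query_duration_seconds` exposes cumulative bucket counts rather than percentiles, so percentiles are estimated by interpolating within buckets. The sketch below illustrates that estimation in Python, in the spirit of PromQL's `histogram_quantile`. The bucket bounds and counts are hypothetical, and this is a simplified model rather than Prometheus' exact implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    Interpolates linearly inside the bucket that contains the target rank.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative buckets: (upper bound in seconds, observations <= bound).
BUCKETS = [(0.01, 50), (0.05, 80), (0.1, 95), (0.5, 99), (math.inf, 100)]
p95 = histogram_quantile(0.95, BUCKETS)
```

The estimate is only as precise as the bucket layout: a quantile that lands in a wide bucket is interpolated across the whole width of that bucket.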

Grafana Dashboard

Import the official dashboard (ID 18921) from Grafana.com.

Alerting Rules

groups:
  - name: neuraldb
    rules:
      - alert: NeuralDBConnectionsHigh
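        # Assumes neuraldb_connections_max has no "state" label, so that label is ignored when matching.
        expr: neuraldb_connections_total{state="active"} / ignoring(state) neuraldb_connections_max > 0.85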
        for: 2m
        labels: { severity: warning }
      - alert: NeuralDBReplicationLagHigh
        expr: neuraldb_replication_lag_seconds > 30
        for: 1m
        labels: { severity: warning }
      - alert: NeuralDBVectorBufferExhausted
        expr: neuraldb_hnsw_index_size_bytes > (neuraldb_vector_buffer_size_bytes * 0.90)
        for: 5m
        labels: { severity: warning }
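
Each rule above pairs a threshold expression with a `for:` duration, so an alert fires only after the condition has held continuously for that long; until then it is pending. The Python sketch below models that behaviour for a simple "value above threshold" rule. The function name, the 30-second evaluation interval, and the lag values are hypothetical, and this is a simplified model rather than Prometheus' rule evaluator.

```python
def alert_states(samples, threshold, for_seconds):
    """Return the alert state at each evaluation.

    samples: list of (timestamp_seconds, value) in time order.
    States: 'inactive', 'pending' (condition true, but not yet for
    `for_seconds`), or 'firing'.
    """
    states = []
    active_since = None  # timestamp when the condition first became true
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts
            held_for = ts - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
    return states


# Hypothetical replication lag readings, evaluated every 30 seconds,
# checked against the NeuralDBReplicationLagHigh rule (> 30 for 1m).
LAG = [(0, 10), (30, 35), (60, 40), (90, 45), (120, 20), (150, 50)]
STATES = alert_states(LAG, threshold=30, for_seconds=60)
```

In this trace the lag first exceeds 30 at t=30, the alert fires at t=90, and the dip at t=120 resets it, which is why short spikes do not page anyone.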

Built-In Query Statistics

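-- Ten slowest statements by mean execution time (milliseconds).
SELECT query, calls, round(mean_exec_time::numeric, 2) AS avg_ms
FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;

-- Buffer cache hit ratio (percent) across all databases except template0.
-- NULLIF returns NULL instead of raising a division-by-zero error on an idle instance.
SELECT sum(blks_hit) * 100.0 / NULLIF(sum(blks_hit + blks_read), 0) AS cache_hit_ratio
FROM pg_stat_database WHERE datname != 'template0';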
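
The cache hit ratio is the share of block requests served from memory, summed over all databases. As a worked example of the same arithmetic, here is a small Python sketch; the function name and the per-database counts are hypothetical.

```python
def cache_hit_ratio(rows):
    """Compute the cache hit percentage from (blks_hit, blks_read) rows.

    Returns None when no blocks have been accessed, since the ratio
    is undefined in that case.
    """
    hits = sum(hit for hit, _ in rows)
    total = sum(hit + read for hit, read in rows)
    if total == 0:
        return None
    return hits * 100.0 / total


# Hypothetical counters for two databases: (blks_hit, blks_read).
ROWS = [(900, 5), (90, 5)]
ratio = cache_hit_ratio(ROWS)
```

Here 990 of 1,000 block requests were cache hits, giving a ratio of 99 percent.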