feat(alerts): metric-threshold alerting (backend + API)
Operators can define metric-threshold alert rules (metrics: cpu_percent, memory_percent, memory_bytes; comparators: gt/lt), scoped to a single workload or global, via /api/metric-alert-rules.

A periodic evaluator (internal/metricalert, 30s tick) compares the freshest container stats sample for each container against the enabled rules. On a breach, subject to a per-rule, per-workload cooldown, it emits an event into the existing event_log + bus pipeline (source "metric_alert", workload_id set). Alerts therefore surface on the global events page, the per-app activity timeline, and any configured event-trigger webhook; no new notification plumbing is needed.

Mirrors the log_scan_rules store/API/route patterns and the stats.Collector lifecycle. Rule CRUD reads require authentication; mutations are AdminOnly. The frontend rule-configuration UI is a follow-up phase.

Reviewed: go APPROVE (0 CRITICAL/HIGH).
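The evaluation pass described above can be sketched as follows. This is a minimal, hypothetical Go sketch, not the code in internal/metricalert: the Sample, Rule, Alert and Evaluator types, their field names, and the in-memory cooldown map are all assumptions. It keeps only the freshest sample per container, checks each enabled rule that is either global or scoped to the sample's workload, and suppresses repeat alerts for the same rule and workload until the cooldown has elapsed.

```go
package main

import (
	"fmt"
	"time"
)

// Sample is a hypothetical container stats sample; field names are assumptions.
type Sample struct {
	ContainerID, WorkloadID string
	At                      time.Time
	CPUPercent              float64
	MemoryPercent           float64
	MemoryBytes             float64
}

// Rule is a hypothetical threshold rule; WorkloadID == "" means a global rule.
type Rule struct {
	ID         int64
	WorkloadID string
	Metric     string // "cpu_percent", "memory_percent" or "memory_bytes"
	Op         string // "gt" or "lt"
	Threshold  float64
	Enabled    bool
}

// Alert stands in for the event the real evaluator writes to event_log + bus.
type Alert struct {
	RuleID     int64
	WorkloadID string
	Value      float64
}

type cooldownKey struct {
	ruleID     int64
	workloadID string
}

// Evaluator remembers when each (rule, workload) pair last fired.
type Evaluator struct {
	cooldown  time.Duration
	lastFired map[cooldownKey]time.Time
}

func NewEvaluator(cooldown time.Duration) *Evaluator {
	return &Evaluator{cooldown: cooldown, lastFired: map[cooldownKey]time.Time{}}
}

// value extracts the rule's metric from a sample; ok is false for unknown metrics.
func value(s Sample, metric string) (v float64, ok bool) {
	switch metric {
	case "cpu_percent":
		return s.CPUPercent, true
	case "memory_percent":
		return s.MemoryPercent, true
	case "memory_bytes":
		return s.MemoryBytes, true
	}
	return 0, false
}

// Tick runs one evaluation pass at time now over the freshest sample per container.
func (e *Evaluator) Tick(now time.Time, rules []Rule, samples []Sample) []Alert {
	latest := map[string]Sample{}
	for _, s := range samples {
		if cur, seen := latest[s.ContainerID]; !seen || s.At.After(cur.At) {
			latest[s.ContainerID] = s
		}
	}
	var alerts []Alert
	for _, s := range latest {
		for _, r := range rules {
			if !r.Enabled || (r.WorkloadID != "" && r.WorkloadID != s.WorkloadID) {
				continue // disabled, or scoped to a different workload
			}
			v, ok := value(s, r.Metric)
			if !ok || !((r.Op == "gt" && v > r.Threshold) || (r.Op == "lt" && v < r.Threshold)) {
				continue // unknown metric, or no breach
			}
			key := cooldownKey{r.ID, s.WorkloadID}
			if last, fired := e.lastFired[key]; fired && now.Sub(last) < e.cooldown {
				continue // still cooling down for this rule+workload pair
			}
			e.lastFired[key] = now
			alerts = append(alerts, Alert{RuleID: r.ID, WorkloadID: s.WorkloadID, Value: v})
		}
	}
	return alerts
}

func main() {
	ev := NewEvaluator(5 * time.Minute) // cooldown length is an assumption
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	rules := []Rule{{ID: 1, WorkloadID: "web", Metric: "cpu_percent", Op: "gt", Threshold: 80, Enabled: true}}
	samples := []Sample{
		{ContainerID: "c1", WorkloadID: "web", At: t0, CPUPercent: 50},
		{ContainerID: "c1", WorkloadID: "web", At: t0.Add(10 * time.Second), CPUPercent: 95},
	}
	fmt.Println(len(ev.Tick(t0.Add(30*time.Second), rules, samples))) // 1: freshest sample (95) > 80
	fmt.Println(len(ev.Tick(t0.Add(60*time.Second), rules, samples))) // 0: within the cooldown
}
```

In the service, Tick would presumably be driven by a 30s time.Ticker and its alerts written to event_log and published on the bus; here it is called directly with explicit timestamps so the cooldown behavior is deterministic. Keying the cooldown on (rule, workload) rather than (rule, container) means several containers of one workload breaching together produce a single alert per cooldown window.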
@@ -431,6 +431,16 @@ func (s *Server) Router() chi.Router {
 		r.Post("/log-scan-rules/{id}/test", s.testLogScanRule)
 	})
 
+	// Metric-alert rules.
+	r.Get("/metric-alert-rules", s.listMetricAlertRules)
+	r.Get("/metric-alert-rules/{id}", s.getMetricAlertRule)
+	r.Group(func(r chi.Router) {
+		r.Use(auth.AdminOnly)
+		r.Post("/metric-alert-rules", s.createMetricAlertRule)
+		r.Patch("/metric-alert-rules/{id}", s.updateMetricAlertRule)
+		r.Delete("/metric-alert-rules/{id}", s.deleteMetricAlertRule)
+	})
+
 	// System resources (read-only).
 	r.Get("/system/stats", s.getSystemStats)
 	r.Get("/system/stats/history", s.getSystemStatsHistory)
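The create and update handlers registered above presumably have to reject rules the evaluator cannot check. A minimal sketch of that validation, assuming a hypothetical MetricAlertRule request body: the type, its JSON field names, the non-negative threshold check, and decodeRule are illustrations, not the project's actual handler code. It accepts only the metrics and comparators named in the commit message.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"math"
	"strings"
)

// MetricAlertRule is a hypothetical request body for POST/PATCH
// /metric-alert-rules; field names and JSON tags are assumptions.
type MetricAlertRule struct {
	WorkloadID *int64  `json:"workload_id"` // nil means a global rule
	Metric     string  `json:"metric"`
	Op         string  `json:"op"`
	Threshold  float64 `json:"threshold"`
	Enabled    bool    `json:"enabled"`
}

var (
	validMetrics = map[string]bool{"cpu_percent": true, "memory_percent": true, "memory_bytes": true}
	validOps     = map[string]bool{"gt": true, "lt": true}
)

// Validate rejects rules the evaluator could not check meaningfully.
func (r MetricAlertRule) Validate() error {
	if !validMetrics[r.Metric] {
		return fmt.Errorf("unknown metric %q", r.Metric)
	}
	if !validOps[r.Op] {
		return fmt.Errorf("unknown op %q (want gt or lt)", r.Op)
	}
	// All three metrics are non-negative, so a negative threshold is almost certainly a mistake.
	if math.IsNaN(r.Threshold) || math.IsInf(r.Threshold, 0) || r.Threshold < 0 {
		return errors.New("threshold must be a finite, non-negative number")
	}
	return nil
}

// decodeRule parses a JSON body strictly and validates it before it reaches the store.
func decodeRule(body io.Reader) (MetricAlertRule, error) {
	var r MetricAlertRule
	dec := json.NewDecoder(body)
	dec.DisallowUnknownFields() // reject misspelled keys instead of silently ignoring them
	if err := dec.Decode(&r); err != nil {
		return MetricAlertRule{}, fmt.Errorf("decode rule: %w", err)
	}
	if err := r.Validate(); err != nil {
		return MetricAlertRule{}, err
	}
	return r, nil
}

func main() {
	r, err := decodeRule(strings.NewReader(`{"metric":"cpu_percent","op":"gt","threshold":90,"enabled":true}`))
	fmt.Println(r.Metric, err) // cpu_percent <nil>
	_, err = decodeRule(strings.NewReader(`{"metric":"disk_percent","op":"gt","threshold":90}`))
	fmt.Println(err) // unknown metric "disk_percent"
}
```

For the PATCH route, one plausible approach is to merge the partial body into the stored rule and run the same Validate on the result, so a rule can never be updated into a state the evaluator would silently skip.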
||||