feat(alerts): metric-threshold alerting (backend + API)

Operators can define metric-threshold alert rules via /api/metric-alert-rules. Each rule targets one metric (cpu_percent, memory_percent, or memory_bytes) with a gt or lt comparison and applies either to a single workload or globally.

A periodic evaluator (internal/metricalert, 30s tick) checks the freshest stats sample for each container against the enabled rules. On a breach, subject to a per-rule, per-workload cooldown, it emits into the existing event_log + bus pipeline with source "metric_alert" and workload_id set. Alerts therefore surface on the global events page, the per-app activity timeline, and any configured event-trigger webhook, with no new notification plumbing.

The implementation mirrors the log_scan_rules store/API/route patterns and the stats.Collector lifecycle. Rule reads require authentication; mutations are AdminOnly. The frontend rule-config UI is a follow-up phase.

Reviewed: go APPROVE (0 CRITICAL/HIGH).
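
To illustrate the breach check and the per-rule, per-workload cooldown described in the commit message, here is a minimal sketch. It is not the code in internal/metricalert: the type names (Rule, Sample, Evaluator), their fields, the demoClock helper, and the 5-minute cooldown are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Rule is a hypothetical stand-in for a stored metric-alert rule.
type Rule struct {
	ID         int64
	Metric     string // "cpu_percent", "memory_percent", or "memory_bytes"
	Op         string // "gt" or "lt"
	Threshold  float64
	WorkloadID string // empty means the rule is global
	Enabled    bool
}

// Sample is a hypothetical freshest stats sample for one container.
type Sample struct {
	WorkloadID    string
	CPUPercent    float64
	MemoryPercent float64
	MemoryBytes   float64
}

// breached reports whether the sample violates the rule's threshold.
// Unknown metrics or operators never count as a breach.
func breached(r Rule, s Sample) bool {
	var v float64
	switch r.Metric {
	case "cpu_percent":
		v = s.CPUPercent
	case "memory_percent":
		v = s.MemoryPercent
	case "memory_bytes":
		v = s.MemoryBytes
	default:
		return false
	}
	switch r.Op {
	case "gt":
		return v > r.Threshold
	case "lt":
		return v < r.Threshold
	}
	return false
}

// cooldownKey scopes the cooldown to one rule on one workload.
type cooldownKey struct {
	RuleID     int64
	WorkloadID string
}

// Evaluator remembers when each (rule, workload) pair last fired.
type Evaluator struct {
	Cooldown time.Duration
	lastFire map[cooldownKey]time.Time
}

// NewEvaluator returns an Evaluator with the given cooldown window.
func NewEvaluator(cooldown time.Duration) *Evaluator {
	return &Evaluator{Cooldown: cooldown, lastFire: make(map[cooldownKey]time.Time)}
}

// Check returns true when an alert should be emitted for (r, s) at now.
func (e *Evaluator) Check(r Rule, s Sample, now time.Time) bool {
	if !r.Enabled || (r.WorkloadID != "" && r.WorkloadID != s.WorkloadID) {
		return false
	}
	if !breached(r, s) {
		return false
	}
	k := cooldownKey{r.ID, s.WorkloadID}
	if last, seen := e.lastFire[k]; seen && now.Sub(last) < e.Cooldown {
		return false // still inside the cooldown window
	}
	e.lastFire[k] = now
	return true
}

// demoClock returns a fixed base time plus an offset in seconds (demo helper).
func demoClock(offsetSeconds int) time.Time {
	return time.Unix(1_700_000_000, 0).Add(time.Duration(offsetSeconds) * time.Second)
}

func main() {
	e := NewEvaluator(5 * time.Minute) // assumed cooldown; the commit does not state one
	rule := Rule{ID: 1, Metric: "cpu_percent", Op: "gt", Threshold: 90, Enabled: true}
	hot := Sample{WorkloadID: "web", CPUPercent: 97}
	fmt.Println(e.Check(rule, hot, demoClock(0)))   // first breach fires
	fmt.Println(e.Check(rule, hot, demoClock(30)))  // suppressed by cooldown
	fmt.Println(e.Check(rule, hot, demoClock(301))) // cooldown elapsed, fires again
}
```

Keying the cooldown on both rule and workload means a global rule that fires for one workload does not silence the same rule for another.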
@@ -28,6 +28,7 @@ import (
 	"github.com/alexei/tinyforge/internal/health"
 	"github.com/alexei/tinyforge/internal/logging"
 	"github.com/alexei/tinyforge/internal/logscanner"
+	"github.com/alexei/tinyforge/internal/metricalert"
 	"github.com/alexei/tinyforge/internal/notify"
 	"github.com/alexei/tinyforge/internal/npm"
 	"github.com/alexei/tinyforge/internal/proxy"
@@ -390,6 +391,14 @@ func main() {
 	}
 	defer logScanMgr.Stop()
 
+	// Metric-alert manager: evaluates threshold rules against recent
+	// container stats samples and emits event_log entries on breach.
+	// The store satisfies RuleSource/SampleSource/EventSink; the event
+	// bus is the Publisher.
+	metricAlertMgr := metricalert.New(db, db, db, eventBus)
+	metricAlertMgr.Start()
+	defer metricAlertMgr.Stop()
+
 	// Build API server.
 	apiServer := api.NewServer(db, dockerClient, npmClient, proxyProvider, dep, notifier, webhookHandler, eventBus, encKey)
 	apiServer.SetStaleScanner(staleScanner)
@@ -451,6 +460,7 @@ func main() {
 	eventBus.Unsubscribe(notifySub)
 	staleScanner.Stop()
 	statsCollector.Stop()
+	metricAlertMgr.Stop()
 
 	// Drain in-progress deploys and notifications.
 	dep.Drain()
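
The diff stops the metric-alert manager from two places: a deferred Stop right after Start, and an explicit Stop in the shutdown path. The sketch below shows one way a ticker-driven manager can tolerate that. It is an assumption-laden illustration, not the actual internal/metricalert implementation: Manager, NewManager, and runDemo are hypothetical names, and whether the real Stop is idempotent is not shown in this commit.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Manager is a hypothetical periodic evaluator with a Start/Stop lifecycle.
type Manager struct {
	interval time.Duration
	evaluate func()
	stop     chan struct{}
	done     chan struct{}
	stopOnce sync.Once
}

// NewManager builds a Manager that calls evaluate once per interval.
func NewManager(interval time.Duration, evaluate func()) *Manager {
	return &Manager{
		interval: interval,
		evaluate: evaluate,
		stop:     make(chan struct{}),
		done:     make(chan struct{}),
	}
}

// Start launches the evaluation loop. This sketch assumes it is called once.
func (m *Manager) Start() {
	go func() {
		defer close(m.done)
		t := time.NewTicker(m.interval)
		defer t.Stop()
		for {
			select {
			case <-t.C:
				m.evaluate()
			case <-m.stop:
				return
			}
		}
	}()
}

// Stop signals the loop to exit and waits for it to finish.
// It is safe to call more than once, but only after Start.
func (m *Manager) Stop() {
	m.stopOnce.Do(func() { close(m.stop) })
	<-m.done
}

// runDemo runs a fast-ticking manager, stops it twice, and returns the tick count.
func runDemo() int64 {
	var ticks atomic.Int64
	m := NewManager(10*time.Millisecond, func() { ticks.Add(1) })
	m.Start()
	defer m.Stop() // mirrors the deferred Stop in the diff; a no-op by then
	time.Sleep(100 * time.Millisecond)
	m.Stop() // mirrors the explicit shutdown-path Stop
	return ticks.Load()
}

func main() {
	fmt.Println("evaluated at least once:", runDemo() > 0)
}
```

Using sync.Once for the close and waiting on a separate done channel keeps the second Stop from panicking on a double close while still guaranteeing that the loop has exited before shutdown continues.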