feat(alerts): metric-threshold alerting (backend + API)

Operators can define metric-threshold alert rules (cpu_percent,
memory_percent, memory_bytes; gt/lt), either per-workload or global, via
/api/metric-alert-rules. A periodic evaluator (internal/metricalert,
30s tick) checks the freshest container stats sample for each container
against the enabled rules. On a breach, subject to a per-rule,
per-workload cooldown, it emits into the existing event_log + bus
pipeline (source "metric_alert", workload_id set). Alerts therefore
surface on the global events page, the per-app activity timeline, and
any configured event-trigger webhook, with no new notification
plumbing.

Mirrors the log_scan_rules store/API/route patterns and the
stats.Collector lifecycle. Rule reads require authentication; mutations
are AdminOnly. The frontend rule-config UI is a follow-up phase.

Reviewed: go APPROVE (0 CRITICAL/HIGH).
2026-05-29 14:06:23 +03:00
parent 5c17885197
commit cdb9fd57d1
11 changed files with 1299 additions and 0 deletions
@@ -408,6 +408,24 @@ func (s *Store) runMigrations() error {
)`,
`CREATE INDEX IF NOT EXISTS idx_log_scan_rules_workload ON log_scan_rules(workload_id)`,
`CREATE INDEX IF NOT EXISTS idx_log_scan_rules_overrides ON log_scan_rules(overrides_id)`,
// metric_alert_rules: threshold rules the metric-alert manager
// evaluates against recent container stats samples. workload_id is
// NOT NULL; the empty string "" marks a global rule that applies to
// every workload, while a non-empty value scopes it to one workload.
`CREATE TABLE IF NOT EXISTS metric_alert_rules (
id INTEGER PRIMARY KEY AUTOINCREMENT,
workload_id TEXT NOT NULL DEFAULT '',
name TEXT NOT NULL DEFAULT '',
metric TEXT NOT NULL,
comparator TEXT NOT NULL,
threshold REAL NOT NULL DEFAULT 0,
severity TEXT NOT NULL DEFAULT 'warn',
cooldown_seconds INTEGER NOT NULL DEFAULT 300,
enabled INTEGER NOT NULL DEFAULT 1,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
)`,
`CREATE INDEX IF NOT EXISTS idx_metric_alert_rules_workload ON metric_alert_rules(workload_id)`,
}
for _, t := range observabilityTables {
if _, err := s.db.Exec(t); err != nil {