feat(alerts): metric-threshold alerting (backend + API)

Operators can define metric-threshold alert rules (cpu_percent, memory_percent, memory_bytes; gt/lt), per-workload or global, via /api/metric-alert-rules.

A periodic evaluator (internal/metricalert, 30s tick) checks the freshest container stats sample per container against enabled rules. On breach, subject to a per-rule-per-workload cooldown, it emits into the existing event_log + bus pipeline (source "metric_alert", workload_id set). Alerts therefore surface on the global events page, the per-app activity timeline, and any configured event-trigger webhook, with no new notification plumbing.

Mirrors the log_scan_rules store/API/route patterns and the stats.Collector lifecycle. Rule CRUD reads are authed, mutations AdminOnly. Frontend rule-config UI is a follow-up phase.

Reviewed: go APPROVE (0 CRITICAL/HIGH).
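The breach-plus-cooldown behaviour described above could look roughly like the following. This is a minimal sketch, not the code in internal/metricalert: the `Rule`, `Sample`, `Evaluator`, and `ShouldFire` names are hypothetical, the sample fields are assumed, and the real evaluator also reads from the stats store and writes to event_log, which is omitted here.

```go
package main

import (
	"fmt"
	"time"
)

// Rule carries only the MetricAlertRule fields this check needs.
type Rule struct {
	ID              int64
	WorkloadID      string // "" = applies to all workloads
	Metric          string // cpu_percent | memory_percent | memory_bytes
	Comparator      string // gt | lt
	Threshold       float64
	CooldownSeconds int
	Enabled         bool
}

// Sample is a hypothetical stand-in for the freshest stats sample of one container.
type Sample struct {
	WorkloadID    string
	CPUPercent    float64
	MemoryPercent float64
	MemoryBytes   float64
}

// value returns the sample field named by metric; ok is false for unknown metrics.
func (s Sample) value(metric string) (v float64, ok bool) {
	switch metric {
	case "cpu_percent":
		return s.CPUPercent, true
	case "memory_percent":
		return s.MemoryPercent, true
	case "memory_bytes":
		return s.MemoryBytes, true
	}
	return 0, false
}

// breached reports whether v is strictly above (gt) or below (lt) the threshold.
func breached(cmp string, v, threshold float64) bool {
	switch cmp {
	case "gt":
		return v > threshold
	case "lt":
		return v < threshold
	}
	return false
}

// fireKey identifies one cooldown bucket: a (rule, workload) pair.
type fireKey struct {
	ruleID     int64
	workloadID string
}

// Evaluator remembers when each (rule, workload) pair last fired.
type Evaluator struct {
	lastFired map[fireKey]time.Time
}

func NewEvaluator() *Evaluator {
	return &Evaluator{lastFired: map[fireKey]time.Time{}}
}

// ShouldFire reports whether the rule applies to the sample, is breached, and
// is outside its cooldown window. A true result records now as the last fire.
func (e *Evaluator) ShouldFire(r Rule, s Sample, now time.Time) bool {
	if !r.Enabled || (r.WorkloadID != "" && r.WorkloadID != s.WorkloadID) {
		return false
	}
	v, ok := s.value(r.Metric)
	if !ok || !breached(r.Comparator, v, r.Threshold) {
		return false
	}
	k := fireKey{ruleID: r.ID, workloadID: s.WorkloadID}
	cooldown := time.Duration(r.CooldownSeconds) * time.Second
	if last, seen := e.lastFired[k]; seen && now.Sub(last) < cooldown {
		return false
	}
	e.lastFired[k] = now
	return true
}

func main() {
	e := NewEvaluator()
	r := Rule{ID: 1, Metric: "cpu_percent", Comparator: "gt", Threshold: 90, CooldownSeconds: 300, Enabled: true}
	s := Sample{WorkloadID: "web", CPUPercent: 97.5}
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	// Three evaluations of the same breach: at t0, 30s later, and 5m later.
	for _, d := range []time.Duration{0, 30 * time.Second, 5 * time.Minute} {
		fmt.Println(d, e.ShouldFire(r, s, t0.Add(d)))
	}
}
```

In this sketch a global rule (empty WorkloadID) still gets a separate cooldown bucket per workload, because the key uses the sample's workload rather than the rule's. The second evaluation in `main` is suppressed by the 300s cooldown; the third falls exactly on the boundary and fires again.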
@@ -277,6 +277,39 @@ const (
	LogScanSeverityError = "error"
)

// MetricAlertRule fires an event when a container metric breaches a
// threshold. Mirrors LogScanRule but evaluated against stats_samples
// instead of log lines.
type MetricAlertRule struct {
	ID              int64   `json:"id"`
	WorkloadID      string  `json:"workload_id"` // "" = applies to all workloads
	Name            string  `json:"name"`
	Metric          string  `json:"metric"`     // cpu_percent | memory_percent | memory_bytes
	Comparator      string  `json:"comparator"` // gt | lt
	Threshold       float64 `json:"threshold"`
	Severity        string  `json:"severity"`         // info | warn | error
	CooldownSeconds int     `json:"cooldown_seconds"` // min seconds between fires per (rule,workload)
	Enabled         bool    `json:"enabled"`
	CreatedAt       string  `json:"created_at"`
	UpdatedAt       string  `json:"updated_at"`
}

// Metric-alert metric identifiers. cpu_percent + memory_percent are
// 0–100 ratios; memory_bytes is an absolute usage figure. Validated in
// the store on create/update.
const (
	MetricCPUPercent    = "cpu_percent"
	MetricMemoryPercent = "memory_percent"
	MetricMemoryBytes   = "memory_bytes"
)

// Metric-alert comparators. gt fires when the value exceeds the
// threshold; lt when it falls below.
const (
	MetricComparatorGT = "gt"
	MetricComparatorLT = "lt"
)

// WorkloadKind enumerates the legacy discriminator values written into
// containers.workload_kind and workloads.kind. After the hard cutover the
// backing project / stack / static_site tables are gone — these constants
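
The comments in the hunk above say the metric and comparator values are validated in the store on create/update, but the store code is not part of this hunk. The following is a hedged sketch of what such a check might look like; `validateRule` is a hypothetical name, the JSON payload is an invented example using the struct's tags, and the non-negative cooldown check is an assumption rather than something the commit states.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// MetricAlertRule repeats the user-settable fields from the diff, with the same JSON tags.
type MetricAlertRule struct {
	WorkloadID      string  `json:"workload_id"`
	Name            string  `json:"name"`
	Metric          string  `json:"metric"`
	Comparator      string  `json:"comparator"`
	Threshold       float64 `json:"threshold"`
	Severity        string  `json:"severity"`
	CooldownSeconds int     `json:"cooldown_seconds"`
	Enabled         bool    `json:"enabled"`
}

// validateRule rejects values outside the enumerations documented in the diff.
func validateRule(r MetricAlertRule) error {
	switch r.Metric {
	case "cpu_percent", "memory_percent", "memory_bytes":
	default:
		return fmt.Errorf("unknown metric %q", r.Metric)
	}
	switch r.Comparator {
	case "gt", "lt":
	default:
		return fmt.Errorf("unknown comparator %q", r.Comparator)
	}
	switch r.Severity {
	case "info", "warn", "error":
	default:
		return fmt.Errorf("unknown severity %q", r.Severity)
	}
	// Assumption: a negative cooldown is treated as invalid.
	if r.CooldownSeconds < 0 {
		return errors.New("cooldown_seconds must not be negative")
	}
	return nil
}

func main() {
	// Hypothetical request body for creating a global rule (no workload_id).
	payload := []byte(`{"name":"high cpu","metric":"cpu_percent","comparator":"gt",` +
		`"threshold":90,"severity":"warn","cooldown_seconds":300,"enabled":true}`)

	var r MetricAlertRule
	if err := json.Unmarshal(payload, &r); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("valid rule:", validateRule(r))

	r.Comparator = "gte" // not one of gt | lt
	fmt.Println("bad comparator:", validateRule(r))
}
```

Checking against the enumerated strings at write time means the evaluator never has to handle an unknown metric or comparator at tick time.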