feat(observability): event triggers + log scanner backend

Two paired backends sharing the events.Bus seam:

Event triggers (consumer-side):
- internal/store/event_triggers.go — CRUD with action_secret
  redaction on read (placeholder echo treated as "no change" on
  PATCH so secrets aren't accidentally wiped).
- internal/events/dispatcher.go — bus subscriber, AND-composed
  filters (severity CSV, source CSV, message regex with memoized
  compile cache). Structural loop-prevention: never writes to
  event_log. Sends via notifier.SendPayload.
- internal/notify: SendPayload + SendSyncForTestPayload methods,
  TierEventTrigger constant, doSendRaw shared with the legacy
  Event-shaped path.
- internal/api/event_triggers.go — admin-gated CRUD + /test
  sending the real TriggerWebhookPayload shape. SSRF guard
  rejects loopback / link-local / unspecified targets. PATCH
  uses pointer-typed DTO for partial updates.
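
The SSRF guard described above can be sketched as follows. This is a minimal illustration, not the code in internal/api/event_triggers.go: the name validateWebhookTarget and the http/https scheme restriction are assumptions, and the sketch only inspects literal IP hosts (a hostname that resolves to a blocked address would additionally need a check at dial time).

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// validateWebhookTarget is a hypothetical helper. It rejects targets whose
// host is a literal loopback, link-local, or unspecified IP address.
func validateWebhookTarget(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse target: %w", err)
	}
	// Assumption: only http and https webhooks are accepted.
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("target must use http or https")
	}
	host := u.Hostname()
	if host == "" {
		return errors.New("target has no host")
	}
	ip := net.ParseIP(host)
	if ip == nil {
		// Not an IP literal. This sketch does not resolve hostnames.
		return nil
	}
	if ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() || ip.IsUnspecified() {
		return errors.New("target address is not allowed")
	}
	return nil
}

func main() {
	for _, target := range []string{
		"https://example.com/hook",
		"http://127.0.0.1:9000/hook",
		"http://169.254.169.254/latest",
		"http://0.0.0.0/",
	} {
		fmt.Println(target, "->", validateWebhookTarget(target))
	}
}
```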

Log scanner (producer-side):
- internal/logscanner/ — engine (per-rule cooldown +
  per-container token bucket, atomic drop counters), tail
  (multiplexed docker frame demuxer with TTY fallback + 16 MiB
  payload cap + 1 MiB reassembly cap + RFC3339Nano-validated
  timestamp strip + UTF-8-safe message truncation), manager
  (5s container polling, atomic.Pointer[Snapshot] hot-reload,
  HitEmitter writes event_log + publishes EventLog so the
  trigger dispatcher picks them up immediately).
- internal/docker/container.go — ContainerLogsOpts exposes
  stream selection for stderr-only / stdout-only rules.
- internal/store: log_scan_rules table + CRUD with
  EffectiveLogScanRules resolver (globals minus per-workload
  overrides plus workload-only additions). Transactional
  cascade-delete of overrides when a global rule is removed.
- internal/api/log_scan_rules.go — admin-gated CRUD + /test
  (sample_line → matched/captures) + /stats (drop counters +
  active tail count + last-snapshot compile errors) +
  GET /api/workloads/{id}/effective-rules.
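
As an illustration of the UTF-8-safe message truncation mentioned for the tail component, here is a minimal sketch. The function name truncateUTF8 and the byte-based limit are assumptions; the actual helper in internal/logscanner/ may differ.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 (hypothetical name) cuts s to at most maxBytes bytes
// without splitting a multi-byte rune.
func truncateUTF8(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	// If the byte at the cut position is a continuation byte, the cut
	// falls inside a rune; back up to the start of that rune.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	fmt.Printf("%q\n", truncateUTF8("héllo", 2))
	fmt.Printf("%q\n", truncateUTF8("héllo", 3))
}
```

Cutting at a rune boundary keeps the stored message valid UTF-8, at the cost of returning up to three bytes fewer than the limit.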

cmd/server/main.go wires both subsystems next to the existing
RegisterPersistentLogger. Tests cover engine cooldown and bucket
counters, snapshot effective-set semantics, manager compile-error
capture, dispatcher matching, store validation and cascade-delete,
and the API URL validator and secret redaction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 22:18:11 +03:00
parent 82d32181ba
commit 7a9ff7ad54
23 changed files with 3974 additions and 19 deletions
+126 -4
@@ -197,6 +197,34 @@ type StageEnv struct {
UpdatedAt string `json:"updated_at"`
}
// WorkloadVolume is the plugin-shape equivalent of legacy Volume: a
// per-workload mount declaration. The Scope enum matches the existing
// VolumeScope contract so the legacy resolver can be reused once its
// project_id assumption is loosened.
type WorkloadVolume struct {
ID string `json:"id"`
WorkloadID string `json:"workload_id"`
Source string `json:"source"`
Target string `json:"target"`
Scope string `json:"scope"`
Name string `json:"name"`
CreatedAt string `json:"created_at"`
UpdatedAt string `json:"updated_at"`
}
// WorkloadEnv is the plugin-shape equivalent of StageEnv: per-workload
// environment variable overrides, optionally encrypted at rest. Read by
// the Source plugin at deploy time, merged on top of source_config.env.
type WorkloadEnv struct {
ID string `json:"id"`
WorkloadID string `json:"workload_id"`
Key string `json:"key"`
Value string `json:"value"`
Encrypted bool `json:"encrypted"`
CreatedAt string `json:"created_at"`
UpdatedAt string `json:"updated_at"`
}
// VolumeScope defines the sharing scope for a volume mount.
// Valid scopes: instance, stage, project, project_named, named, ephemeral.
type VolumeScope string
@@ -333,6 +361,82 @@ type EventLog struct {
CreatedAt string `json:"created_at"`
}
// EventTrigger is a filter+action rule evaluated against EventLog
// entries published on the bus. When all non-empty filters match, the
// trigger fires its configured action (webhook today, additional action
// types extensible via the ActionType enum).
//
// Filter fields use a comma-separated list shape for multi-value
// filters (severity, source) to keep the schema flat — empty string
// means "no filter on this dimension." FilterMessageRegex is a single
// regex evaluated against EventLog.Message.
//
// Loop-prevention: deliveries are recorded in webhook_deliveries (the
// existing audit trail). The dispatcher MUST NOT write to event_log
// or it will recurse.
type EventTrigger struct {
ID int64 `json:"id"`
Name string `json:"name"`
FilterSeverity string `json:"filter_severity"` // comma list: "warn,error"; "" = any
FilterSource string `json:"filter_source"` // comma list: "logscan,deploy"; "" = any
FilterMessageRegex string `json:"filter_message_regex"` // "" = any
ActionType string `json:"action_type"` // "webhook" today
ActionTarget string `json:"action_target"` // URL for webhook
ActionSecret string `json:"action_secret"` // optional HMAC secret for signed delivery
Enabled bool `json:"enabled"`
CreatedAt string `json:"created_at"`
UpdatedAt string `json:"updated_at"`
}
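
The AND-composed filter semantics described in the EventTrigger comment can be sketched as below. This is a hedged illustration, not the code in internal/events/dispatcher.go: the trigger and event types are trimmed-down stand-ins, and the names regexCache, inCSV, and matches are hypothetical. Trimming whitespace around CSV items and treating an uncompilable regex as "never matches" are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"sync"
)

// Stand-ins for the filter fields of EventTrigger and EventLog.
type trigger struct {
	FilterSeverity, FilterSource, FilterMessageRegex string
}

type event struct{ Severity, Source, Message string }

// regexCache memoizes compiled patterns. A nil entry records a pattern
// that failed to compile so it is not recompiled on every event.
type regexCache struct {
	mu sync.Mutex
	m  map[string]*regexp.Regexp
}

func (c *regexCache) get(pattern string) *regexp.Regexp {
	c.mu.Lock()
	defer c.mu.Unlock()
	if re, ok := c.m[pattern]; ok {
		return re
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		re = nil
	}
	if c.m == nil {
		c.m = make(map[string]*regexp.Regexp)
	}
	c.m[pattern] = re
	return re
}

// inCSV reports whether v is listed in csv. An empty list means
// "no filter on this dimension" and matches everything.
func inCSV(csv, v string) bool {
	if strings.TrimSpace(csv) == "" {
		return true
	}
	for _, item := range strings.Split(csv, ",") {
		if strings.TrimSpace(item) == v {
			return true
		}
	}
	return false
}

// matches returns true only when every non-empty filter matches.
func matches(c *regexCache, tr trigger, ev event) bool {
	if !inCSV(tr.FilterSeverity, ev.Severity) || !inCSV(tr.FilterSource, ev.Source) {
		return false
	}
	if tr.FilterMessageRegex == "" {
		return true
	}
	re := c.get(tr.FilterMessageRegex)
	return re != nil && re.MatchString(ev.Message)
}

func main() {
	cache := &regexCache{}
	tr := trigger{FilterSeverity: "warn,error", FilterSource: "logscan", FilterMessageRegex: "(?i)out of memory"}
	fmt.Println(matches(cache, tr, event{"error", "logscan", "Out of memory: killed process"}))
	fmt.Println(matches(cache, tr, event{"info", "logscan", "Out of memory: killed process"}))
}
```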
// Supported values for EventTrigger.ActionType. Adding a new action
// is additive: old triggers keep working, and the dispatcher just
// learns a new branch.
const (
EventTriggerActionWebhook = "webhook"
)
// LogScanRule is one regex-based pattern the log scanner evaluates
// against container log lines. The (workload_id, overrides_id) pair
// implements the "global rule with optional per-workload override"
// pattern documented in docs/LOGSCAN_AND_TRIGGERS_TODO.md:
//
// - WorkloadID == "" && OverridesID == 0 → global rule, applies to
// every workload unless overridden.
// - WorkloadID != "" && OverridesID == 0 → workload-only addition.
// - WorkloadID != "" && OverridesID != 0 → override of the global
// rule with ID OverridesID for one workload (set Enabled=false to
// disable that global rule for this workload only).
type LogScanRule struct {
ID int64 `json:"id"`
WorkloadID string `json:"workload_id"` // "" = global
OverridesID int64 `json:"overrides_id"` // 0 = not an override
Name string `json:"name"`
Pattern string `json:"pattern"` // regex, compiled at load
Severity string `json:"severity"` // info|warn|error
Streams string `json:"streams"` // all|stdout|stderr
CooldownSeconds int `json:"cooldown_seconds"`
Enabled bool `json:"enabled"`
CreatedAt string `json:"created_at"`
UpdatedAt string `json:"updated_at"`
}
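
The three (WorkloadID, OverridesID) cases above determine the effective rule set for a workload. The following is a minimal sketch of that resolution, not the store's EffectiveLogScanRules implementation: the rule type is a trimmed-down stand-in, effectiveRules is a hypothetical name, and it assumes that an override row replaces its global rule and that disabled rows are dropped from the result.

```go
package main

import "fmt"

// rule is a stand-in carrying only the fields needed for resolution.
type rule struct {
	ID          int64
	WorkloadID  string
	OverridesID int64
	Name        string
	Enabled     bool
}

// effectiveRules (hypothetical) returns globals, with this workload's
// overrides substituted, plus the workload-only additions.
func effectiveRules(all []rule, workloadID string) []rule {
	// Index this workload's overrides by the global rule ID they replace.
	overrides := map[int64]rule{}
	for _, r := range all {
		if r.WorkloadID == workloadID && r.OverridesID != 0 {
			overrides[r.OverridesID] = r
		}
	}
	var out []rule
	for _, r := range all {
		switch {
		case r.WorkloadID == "" && r.OverridesID == 0:
			// Global rule: substitute the override if one exists.
			if o, ok := overrides[r.ID]; ok {
				r = o
			}
		case r.WorkloadID == workloadID && r.OverridesID == 0:
			// Workload-only addition: keep as-is.
		default:
			// Rows for other workloads, and override rows already applied.
			continue
		}
		if r.Enabled {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	all := []rule{
		{1, "", 0, "oom", true},
		{2, "", 0, "panic", true},
		{3, "w1", 1, "oom-disabled-for-w1", false},
		{4, "w1", 0, "w1-custom", true},
	}
	for _, r := range effectiveRules(all, "w1") {
		fmt.Println(r.ID, r.Name)
	}
}
```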
// Log scan stream filter values. "all" reads both streams; "stdout"
// or "stderr" filter to one. Used both for store validation and at
// docker-side log read time.
const (
LogScanStreamAll = "all"
LogScanStreamStdout = "stdout"
LogScanStreamStderr = "stderr"
)
// Log scan severity values mirror the event_log enum so a matched
// rule lands as an event_log row with the rule's severity verbatim.
const (
LogScanSeverityInfo = "info"
LogScanSeverityWarn = "warn"
LogScanSeverityError = "error"
)
// WorkloadKind enumerates the kinds of things that own containers.
// Each kind has a corresponding row in projects/stacks/static_sites referenced via Workload.RefID.
type WorkloadKind string
@@ -346,12 +450,24 @@ const (
// Workload is the unifying primitive that abstracts Project, Stack, and StaticSite.
// Each row is paired with exactly one project/stack/site via (Kind, RefID).
// Notification + webhook config moves here so it lives in one place across kinds.
//
// SourceKind / SourceConfig / TriggerKind / TriggerConfig / PublicFaces /
// ParentWorkloadID populate the unified plugin model from the Workload-first
// refactor. Existing rows keep these empty until they are explicitly migrated
// or replaced — the legacy Kind/RefID columns continue to point at
// project/stack/site rows in parallel during the cutover.
type Workload struct {
ID string `json:"id"`
Kind string `json:"kind"` // project | stack | site (legacy discriminator)
RefID string `json:"ref_id"`
Name string `json:"name"`
AppID string `json:"app_id"` // nullable; "" = unassigned (a.k.a. GroupID after rename)
SourceKind string `json:"source_kind"` // "" until plugin-mode populated
SourceConfig string `json:"source_config"` // JSON-encoded, decoded by the matching Source
TriggerKind string `json:"trigger_kind"`
TriggerConfig string `json:"trigger_config"` // JSON-encoded, decoded by the matching Trigger
PublicFaces string `json:"public_faces"` // JSON-encoded []PublicFace
ParentWorkloadID string `json:"parent_workload_id"` // "" = root; non-empty = stage chain
NotificationURL string `json:"notification_url"`
NotificationSecret string `json:"-"` // never serialized
WebhookSecret string `json:"-"` // URL-identifier secret; never serialized
@@ -384,8 +500,14 @@ type Container struct {
ProxyRouteID string `json:"proxy_route_id"`
NpmProxyID int `json:"npm_proxy_id"`
LastSeenAt string `json:"last_seen_at"`
// ExtraJSON carries source-specific metadata that isn't promoted to a
// first-class column — currently per-face proxy route IDs for
// multi-face image deploys. Stored as a JSON object; '{}' on empty
// rows. Sources own the shape; consumers should tolerate unknown
// keys.
ExtraJSON string `json:"extra_json"`
CreatedAt string `json:"created_at"`
UpdatedAt string `json:"updated_at"`
}
// App is an optional grouping of workloads (e.g., "my-saas" = web project + worker stack + redis stack).