docs: workload refactor + observability progress

Two design + handoff docs:

- docs/WORKLOAD_REFACTOR_TODO.md: status-at-a-glance table showing what's done (volume scopes, kind-aware editors, vendor webhook parsing, chain-panel CSS, Log Rules panel) and what's still pending (static source inline port + the hard legacy cutover gated on it; codemap entries; /apps page-level i18n; Priority 4 integration tests).
- docs/LOGSCAN_AND_TRIGGERS_TODO.md: companion design + status doc for the two Observability features. Records the loop-prevention invariant (event_log = system observing itself, webhook_deliveries = system talking to outside) so the next contributor doesn't accidentally break it by adding a new EventLog subscriber that re-publishes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 22:18:51 +03:00
parent 4707db1c3b
commit 30133bc1eb
2 changed files with 719 additions and 0 deletions
@@ -0,0 +1,385 @@
+# Log Scanner + Event Triggers — Design Handoff
+
+Two related features. They can ship independently, but were designed together
+because they share the event_log seam.
+
+- **A. Log scanner** — tail container logs, match against rules, emit event_log
+  entries. Producer of events.
+- **B. Event triggers** — turn event_log entries into webhook / notification
+  dispatches. Consumer of events. Generalizes the existing
+  `RegisterPersistentLogger` pattern.
+
+Either half is useful alone:
+- A without B = errors get surfaced in the events UI, no external delivery.
+- B without A = manual + reconciler + deploy events can drive notifications.
+
+Recommended ship order: B first (smaller, self-contained generalization), then
+A (more moving parts, depends on container-lifecycle hooks).
+
+---
+
+## A. Log scanner — BACKEND LANDED
+
+Status:
+
+- **Schema + store CRUD** — `internal/store/log_scan_rules.go` +
+  `log_scan_rules` table added to the `observabilityTables` block.
+  Includes the `EffectiveLogScanRules(workloadID)` helper that
+  resolves global rules minus per-workload overrides plus workload-
+  only additions in one Go-side pass.
+- **Stream-selectable docker reads** — `internal/docker/container.go`
+  `ContainerLogsOpts` accepts a `ContainerLogOptions{ShowStdout,
+  ShowStderr, Follow, Tail}` so the scanner can subscribe to one
+  stream when a rule scopes itself to stdout or stderr. The legacy
+  `ContainerLogs` is preserved as a thin wrapper for back-compat.
+- **Engine** — `internal/logscanner/engine.go`: per-rule cooldown
+  (keyed on container+rule), per-container token bucket (default 10
+  events / 60s, override-able), regex match per line, hits returned
+  for the manager to persist. Pure logic, fully unit-tested.
+- **Tail goroutine** — `internal/logscanner/tail.go`: per-container
+  loop reading docker's multiplexed log frames (with TTY fallback),
+  strips the prepended RFC3339 timestamp, runs every line through the
+  engine + snapshot. Exits on container stop or context cancel.
+- **Manager** — `internal/logscanner/manager.go`: 5s polling diff
+  against `ListContainers(state=running)`, atomic.Pointer[Snapshot]
+  hot-reload, structural HitEmitter that writes event_log rows AND
+  publishes `EventLog` on the bus (so event-trigger dispatchers can
+  pick them up immediately).
+- **API** — `internal/api/log_scan_rules.go`: full CRUD,
+  `/test` endpoint accepting `{"sample_line": "..."}` and returning
+  matched/captures, plus
+  `GET /api/workloads/{id}/effective-rules` for the workload detail
+  page's future Log Rules tab. Admin-gated mutations.
+- **Wired in main.go** before the API server is constructed so the
+  reload callback is plugged via `apiServer.SetLogScanReloader`.
+- **Loop-prevention** — Same boundary as feature B: scanner publishes
+  EventLog events, dispatcher consumes them, neither writes to
+  event_log on the consume side.
+- **Tests** — `internal/logscanner/{engine,rules}_test.go` cover
+  cooldown isolation, token bucket refill, stream filtering,
+  override-replaces-global, disabled-override-suppresses-global,
+  compile-error reporting. `internal/store/log_scan_rules_test.go`
+  covers validation + cascade delete.
+
+**Frontend still pending** — `/log-scan-rules` pages, regex test box
+component, Log Rules tab on `/apps/[id]`, i18n keys. Not touched this
+turn.
+
+### Where it plugs in
+
+[internal/docker/container.go:362](../internal/docker/container.go#L362) already
+exposes `ContainerLogs(ctx, id, follow=true, tail)`. The existing SSE handler at
+[internal/api/workloads.go:43](../internal/api/workloads.go#L43)
+(`streamWorkloadContainerLogs`) is per-viewer and dies on browser disconnect —
+**do not hook the scanner there**. The scanner is a separate long-lived
+subsystem owned by the server process.
+
+Minor required change to `ContainerLogs`: expose `ShowStdout` / `ShowStderr` as
+caller-controlled. Currently hardcoded to `true`/`true`. Single existing caller
+passes "both" → no friction. Add an options struct or two booleans.
+
+### New package: `internal/logscanner/`
+
+```
+internal/logscanner/
+  manager.go    — Manager: map[containerID]*tail, lifecycle hooks
+  tail.go       — per-container goroutine; reads logs, fans to engine
+  engine.go     — rule evaluation + cooldown + rate limit
+  rules.go      — Rule struct, regex compile cache, effective-set resolver
+```
+
+**Manager lifecycle.** Subscribes to container start/stop signals. Options for
+the signal source:
+1. Add a `ContainerStarted` / `ContainerStopped` event type to the bus and
+   publish from the reconciler + deployer. Cleanest, but adds two event types.
+2. Manager polls `docker.ListContainers` every N seconds and diffs. Lazier,
+   robust to missed signals, slightly higher idle CPU. Probably fine.
+
+Pick (1) if you want zero-latency start, (2) if you want fewer moving parts.
+Defaulting to **(2) with 5s poll** — Docker container starts already take
+seconds; sub-second matching is not a requirement.
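
The polling option (2) boils down to a set difference between the containers that already have a tail and the containers currently reported as running. A minimal sketch, assuming a hypothetical `diffRunning` helper and plain string container IDs (neither is taken from the actual manager):

```go
package main

import (
	"fmt"
	"sort"
)

// diffRunning compares the container IDs that already have a tail with the
// IDs reported as running, and returns which tails to start and which to stop.
// The name and the map-based bookkeeping are illustrative assumptions.
func diffRunning(tailed map[string]bool, running []string) (start, stop []string) {
	seen := make(map[string]bool, len(running))
	for _, id := range running {
		if seen[id] {
			continue // ignore duplicate IDs in the listing
		}
		seen[id] = true
		if !tailed[id] {
			start = append(start, id)
		}
	}
	for id, active := range tailed {
		if active && !seen[id] {
			stop = append(stop, id)
		}
	}
	// Sort so the result does not depend on map iteration order.
	sort.Strings(start)
	sort.Strings(stop)
	return start, stop
}

func main() {
	tailed := map[string]bool{"a": true, "b": true}
	start, stop := diffRunning(tailed, []string{"b", "c"})
	fmt.Println("start:", start, "stop:", stop)
}
```

Running this on every poll tick makes the manager self-healing: a missed start or stop is simply picked up on the next diff.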
+
+**Tail goroutine.** On container start: open `ContainerLogs(follow=true,
+tail="0")` with stdout/stderr filters per rules in scope. Read line-by-line via
+`bufio.Scanner`. For each line: run through engine. On container stop or ctx
+cancel: drain and exit.
+
+**Engine.** Holds compiled regexes per rule. For each line:
+- Walk effective ruleset for this workload (see schema below).
+- For each matching rule: check its cooldown (`map[ruleID]time.Time`, mutex
+  guarded; the landed engine keys this on container+rule). If the cooldown
+  has elapsed, insert an event_log row, publish, and update the timestamp.
+- Per-container token bucket (default: 10 events/min/container) to prevent
+  catastrophic event_log floods if a regex is too greedy.
+
+### Schema
+
+Single table, global + override pattern. No separate "overrides" table.
+
+```sql
+CREATE TABLE log_scan_rules (
+  id               INTEGER PRIMARY KEY AUTOINCREMENT,
+  workload_id      TEXT,                  -- NULL = global rule
+  overrides_id     INTEGER,               -- if set, this row overrides a global rule for one workload
+  name             TEXT NOT NULL,
+  pattern          TEXT NOT NULL,         -- regex, compiled at load
+  severity         TEXT NOT NULL,         -- info|warn|error
+  streams          TEXT NOT NULL DEFAULT 'all',  -- all|stdout|stderr
+  cooldown_seconds INTEGER NOT NULL DEFAULT 60,
+  enabled          INTEGER NOT NULL DEFAULT 1,
+  created_at       TEXT NOT NULL,
+  FOREIGN KEY (workload_id) REFERENCES workloads(id) ON DELETE CASCADE,
+  FOREIGN KEY (overrides_id) REFERENCES log_scan_rules(id) ON DELETE CASCADE
+);
+CREATE INDEX idx_log_scan_rules_workload ON log_scan_rules(workload_id);
+CREATE INDEX idx_log_scan_rules_overrides ON log_scan_rules(overrides_id);
+```
+
+**Effective ruleset for workload X:**
+1. All rows where `workload_id IS NULL AND overrides_id IS NULL` (pure globals),
+   *minus* any global that has a row with `workload_id = X AND overrides_id = global.id`.
+2. Plus all rows where `workload_id = X AND overrides_id IS NULL` (workload-only additions).
+3. Plus all override rows where `workload_id = X AND overrides_id IS NOT NULL`
+   (substitute for the global; their fields win, including `enabled=false` to
+   disable the global for this workload).
+
+A pure SQL implementation is doable with a `LEFT JOIN ... WHERE override.id IS
+NULL` for step 1 plus a `UNION ALL` for steps 2 and 3. Or compute in Go after
+two simpler queries — fine since rule counts will be small.
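
The Go-side variant of the three steps above might look like the sketch below. The `Rule` struct is a trimmed, hypothetical stand-in for the store model, and the input is assumed to be every row where `workload_id` is NULL or equals the target workload.

```go
package main

import "fmt"

// Rule is a reduced stand-in for a log_scan_rules row (assumed shape).
type Rule struct {
	ID          int64
	WorkloadID  string // "" means global (workload_id IS NULL)
	OverridesID int64  // 0 means not an override (overrides_id IS NULL)
	Name        string
	Enabled     bool
}

// effectiveRules resolves the rule set for one workload: globals that are not
// overridden for it, plus its workload-only rules, plus its override rows.
// workloadID is assumed to be non-empty.
func effectiveRules(rows []Rule, workloadID string) []Rule {
	// Pass 1: collect the IDs of globals this workload overrides.
	overridden := make(map[int64]bool)
	for _, r := range rows {
		if r.WorkloadID == workloadID && r.OverridesID != 0 {
			overridden[r.OverridesID] = true
		}
	}
	// Pass 2: keep surviving globals and everything owned by the workload.
	var out []Rule
	for _, r := range rows {
		switch {
		case r.WorkloadID == "" && r.OverridesID == 0:
			if !overridden[r.ID] {
				out = append(out, r)
			}
		case r.WorkloadID == workloadID:
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []Rule{
		{ID: 1, Name: "oom", Enabled: true},
		{ID: 2, Name: "panic", Enabled: true},
		{ID: 3, WorkloadID: "w1", OverridesID: 2, Name: "panic (muted for w1)", Enabled: false},
		{ID: 4, WorkloadID: "w1", Name: "w1-only", Enabled: true},
	}
	for _, r := range effectiveRules(rows, "w1") {
		fmt.Println(r.ID, r.Name, r.Enabled)
	}
}
```

In this sketch disabled rows stay in the result so a UI could still list them; a scanner consuming the set would skip rows with `Enabled == false`.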
+
+### Output
+
+Scanner calls `store.InsertEvent` with:
+- `Source = "logscan"`
+- `Severity` from the matched rule
+- `Message` = raw matched line (truncated to ~500 chars)
+- `Metadata` JSON = `{"workload_id": ..., "container_id": ..., "rule_id": ..., "rule_name": ..., "captures": {...}}`
+
+Then `bus.Publish(EventLog, payload)`. This reuses exactly the path
+[internal/events/bus.go:158](../internal/events/bus.go#L158)
+(`RegisterPersistentLogger`) already established. SSE clients see it live, and
+the dispatcher from feature B picks it up.
+
+### Hot-reload
+
+When a rule is created/updated/deleted via the API, the manager must rebuild
+the effective ruleset for affected containers. Cheapest path: a single
+`*atomic.Pointer[ruleSnapshot]` shared across tails, replaced wholesale on any
+rule change. Each tail dereferences the snapshot per line — no locking on the
+hot path.
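
A minimal sketch of the wholesale-swap idea, using the standard library's generic `atomic.Pointer` (Go 1.19+). The `ruleSnapshot` contents, the package-level `current` variable, and the `reload` / `matchLine` helpers are illustrative assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"sync/atomic"
)

type compiledRule struct {
	ID int64
	Re *regexp.Regexp
}

// ruleSnapshot is immutable once published; readers never see a partial update.
type ruleSnapshot struct {
	Rules []compiledRule
}

// current is shared by all tails and replaced wholesale on rule changes.
var current atomic.Pointer[ruleSnapshot]

// reload compiles every pattern into a fresh snapshot and publishes it.
// If any pattern fails to compile, the previous snapshot stays in place.
func reload(patterns map[int64]string) error {
	snap := &ruleSnapshot{}
	for id, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return fmt.Errorf("rule %d: %w", id, err)
		}
		snap.Rules = append(snap.Rules, compiledRule{ID: id, Re: re})
	}
	current.Store(snap)
	return nil
}

// matchLine loads the snapshot once and returns the IDs of matching rules.
func matchLine(line string) []int64 {
	snap := current.Load()
	if snap == nil {
		return nil // no rules published yet
	}
	var hits []int64
	for _, r := range snap.Rules {
		if r.Re.MatchString(line) {
			hits = append(hits, r.ID)
		}
	}
	return hits
}

func main() {
	if err := reload(map[int64]string{1: `panic:`}); err != nil {
		fmt.Println("reload failed:", err)
		return
	}
	fmt.Println(matchLine("panic: boom"))
}
```

Because a tail loads the pointer once per line, a line is always evaluated against one consistent rule set, even if a reload lands mid-scan.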
+
+---
+
+## B. Event triggers — BACKEND LANDED
+
+Status:
+
+- **Schema + store CRUD** — `internal/store/event_triggers.go` + table
+  creation in `internal/store/store.go` `observabilityTables`. Model:
+  `EventTrigger` in `internal/store/models.go`.
+- **Dispatcher** — `internal/events/dispatcher.go`
+  `RegisterEventTriggerDispatcher(bus, triggerSource, notifier)`.
+  Filter eval is AND-composed across severity (CSV), source (CSV), and
+  optional message regex. Compiled regexes are memoized.
+- **Webhook delivery** — extended `notify.Notifier` with
+  `SendPayload(url, secret, eventType, payload)` which reuses the
+  existing HMAC + headers infra (`X-Hub-Signature-256`, etc.). New
+  `TierEventTrigger` tier is recorded for telemetry / audit.
+- **Loop-prevention** — dispatcher does **not** call `InsertEvent`.
+  Delivery outcomes go through the notifier's existing logging only.
+- **API** — `internal/api/event_triggers.go` with admin-gated mutations:
+
+```http
+GET    /api/event-triggers
+POST   /api/event-triggers
+GET    /api/event-triggers/{id}
+PATCH  /api/event-triggers/{id}
+DELETE /api/event-triggers/{id}
+POST   /api/event-triggers/{id}/test     — synthetic event_log → notifier.SendSyncForTest
+```
+
+- **Wired in main.go** next to `RegisterPersistentLogger`.
+- **Tests** — `internal/events/dispatcher_test.go`: 10 cases covering
+  filter eval, regex caching, dispatcher fan-out, unsupported
+  action_type, trigger-source errors. CSV filter helper has dedicated
+  table-driven coverage.
+
+**Frontend still pending** — `/event-triggers` list + detail + new
+pages, the Send-test UX, i18n keys. Not touched this turn.
+
+### Where it plugs in
+
+Mirrors the `RegisterPersistentLogger` shape at
+[internal/events/bus.go:158](../internal/events/bus.go#L158):
+
+```go
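+// RegisterEventTriggerDispatcher subscribes to EventLog events and fans each
+// one out to every enabled trigger whose filters match. The returned function
+// unsubscribes the dispatcher.
+func RegisterEventTriggerDispatcher(b *Bus, triggers TriggerSource, notifier Notifier) func() {
+    sub := b.Subscribe(func(evt Event) bool { return evt.Type == EventLog })
+    go func() {
+        for evt := range sub {
+            payload, ok := evt.Payload.(EventLogPayload)
+            if !ok { continue } // skip payloads of an unexpected type
+            for _, t := range triggers.Enabled() {
+                if t.matches(payload) {
+                    // Deliver only; never write to event_log here (loop-prevention).
+                    notifier.Send(t.ActionTarget, buildBody(t, payload))
+                }
+            }
+        }
+    }()
+    return func() { b.Unsubscribe(sub) }
+}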
+```
+
+Reuses the existing notifier at
+[internal/notify/notifier.go](../internal/notify/notifier.go) — including the
+signed-delivery and `webhook_deliveries` audit trail.
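
For readers unfamiliar with the signed-delivery convention, the sketch below shows the GitHub-style `X-Hub-Signature-256` scheme: `sha256=` followed by the hex HMAC-SHA256 of the request body. Whether the existing notifier formats its header exactly this way is an assumption; `sign` and `verify` are hypothetical helper names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the header value for body under secret, in the
// "sha256=<hex digest>" form used by X-Hub-Signature-256.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the signature and compares it in constant time.
func verify(secret, body []byte, header string) bool {
	return hmac.Equal([]byte(sign(secret, body)), []byte(header))
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"severity":"error","source":"logscan"}`)
	header := sign(secret, body)
	fmt.Println(header)
	fmt.Println("valid:", verify(secret, body, header))
}
```

A receiver should verify against the raw request bytes, before any JSON decoding, since re-encoding can change the byte sequence.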
+
+### Schema
+
+```sql
+CREATE TABLE event_triggers (
+  id                    INTEGER PRIMARY KEY AUTOINCREMENT,
+  name                  TEXT NOT NULL,
+  filter_severity       TEXT,            -- nullable; comma-list like 'warn,error'
+  filter_source         TEXT,            -- nullable; comma-list like 'logscan,deploy'
+  filter_message_regex  TEXT,            -- nullable; matched against message
+  action_type           TEXT NOT NULL,   -- 'webhook' | 'notification_channel'
+  action_target         TEXT NOT NULL,   -- URL or channel ID
+  enabled               INTEGER NOT NULL DEFAULT 1,
+  created_at            TEXT NOT NULL
+);
+```
+
+Filters AND together. Empty filters match all.
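
The AND-composition with "empty matches all" semantics can be sketched as below. The struct shapes are reduced stand-ins, and unlike the landed dispatcher (which memoizes compiled regexes) this illustration compiles the pattern on every call for brevity.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// trigger holds the three optional filters (assumed field names).
type trigger struct {
	FilterSeverity     string // comma-list such as "warn,error"; empty = any
	FilterSource       string // comma-list such as "logscan,deploy"; empty = any
	FilterMessageRegex string // empty = any
}

type event struct {
	Severity, Source, Message string
}

// csvContains reports whether v appears in the comma-separated list.
// An empty list matches everything.
func csvContains(list, v string) bool {
	if strings.TrimSpace(list) == "" {
		return true
	}
	for _, item := range strings.Split(list, ",") {
		if strings.TrimSpace(item) == v {
			return true
		}
	}
	return false
}

// matches returns true only if every non-empty filter accepts the event.
func matches(t trigger, e event) (bool, error) {
	if !csvContains(t.FilterSeverity, e.Severity) || !csvContains(t.FilterSource, e.Source) {
		return false, nil
	}
	if t.FilterMessageRegex == "" {
		return true, nil
	}
	re, err := regexp.Compile(t.FilterMessageRegex)
	if err != nil {
		return false, err
	}
	return re.MatchString(e.Message), nil
}

func main() {
	t := trigger{FilterSeverity: "warn, error", FilterSource: "logscan"}
	ok, err := matches(t, event{Severity: "error", Source: "logscan", Message: "OOM killed"})
	fmt.Println(ok, err)
}
```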
+
+### Loop-prevention
+
+**Critical constraint: the dispatcher must not write to event_log.** All
+delivery successes / failures land in `webhook_deliveries` (existing table) so
+the audit trail is preserved without risking trigger recursion. Keeps the
+boundary crisp:
+
+- `event_log` = system observing itself
+- `webhook_deliveries` = system talking to the outside
+
+If a user-visible "trigger fired" entry is desired in the events UI, add a
+*read-only join* from `webhook_deliveries` into the events page rather than
+writing event_log rows.
+
+---
+
+## What to defer
+
+| Item | Why | Add when |
+|---|---|---|
+| Multi-line stack trace coalescing | Real rabbit hole (which lines belong together?). | Real user pain. |
+| Capture-group templating in messages (`{{.captures.code}}`) | v1 stores captures in metadata, displays raw line. | Once real rules exist and patterns emerge. |
+| Backfilling history search | This is Loki/Grafana scope-creep. | Never (push to Loki instead if it comes up). |
+| Per-rule alert routing | v1 fans out by `(severity, source)` filter on trigger side. | When users want one rule → one channel. |
+| YAML config-as-code | Tinyforge is UI-driven everywhere else. | Probably never. |
+| Retry / backoff on trigger delivery failure | Notifier already handles delivery; whether *triggers* retry is a separate question. | If trigger reliability becomes an SLO. |
+
+---
+
+## UI footprint
+
+All boolean inputs use `ToggleSwitch` per project CLAUDE.md. All destructive
+actions use `ConfirmDialog` per memory note (no inline Yes/No strips).
+
+### New pages
+
+- **`/log-scan-rules`** — list with severity / workload filter, "+ New rule" button.
+  - Detail page: name, pattern (regex with live test box that takes a sample log line), severity, streams, cooldown, enabled toggle, scope picker (global / workload).
+- **`/event-triggers`** — list, "+ New trigger" button.
+  - Detail page: name, filters (severity multiselect, source multiselect, optional message regex), action type, action target, enabled toggle.
+
+### Augmentations
+
+- **Workload detail page** (`/apps/[id]`): new "Log Rules" tab/panel listing
+  effective rules for this workload. Each global shows an "Override for this
+  workload" button. Each override / workload-only shows edit + delete.
+- **Events page** (`/events`): entries with `source=logscan` get a small icon
+  + tooltip showing rule name. Click → jumps to rule detail.
+- **Settings sidebar**: links to `/log-scan-rules` and `/event-triggers` under
+  a new "Observability" group.
+
+### i18n keys to add
+
+Roughly 40–60 keys across `en.json` + `ru.json`. Namespace: `logscan.*` and
+`triggers.*`.
+
+---
+
+## API surface
+
+```
+GET    /api/log-scan-rules                 — list (filter: ?workload_id=, ?global=true)
+POST   /api/log-scan-rules                 — create
+GET    /api/log-scan-rules/{id}            — detail
+PATCH  /api/log-scan-rules/{id}            — update
+DELETE /api/log-scan-rules/{id}            — delete
+POST   /api/log-scan-rules/{id}/test       — body: {sample_line}; returns matched: bool, captures
+GET    /api/workloads/{id}/effective-rules — computed effective ruleset for a workload
+
+GET    /api/event-triggers                 — list
+POST   /api/event-triggers                 — create
+GET    /api/event-triggers/{id}            — detail
+PATCH  /api/event-triggers/{id}            — update
+DELETE /api/event-triggers/{id}            — delete
+POST   /api/event-triggers/{id}/test       — dispatches a synthetic event to verify the action target
+```
+
+`POST .../test` endpoints are worth shipping in v1 — they make the rule /
+trigger editing UX dramatically nicer and avoid "did I get the regex right?"
+deploy-and-pray cycles.
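
The core of a rule `/test` handler is small: compile the pattern, match the sample line, and report captures. A sketch under stated assumptions: `testPattern` and `testResult` are hypothetical names, and only named capture groups are reported here, which may differ from what the real endpoint returns.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// testResult is an assumed response shape for the /test endpoint.
type testResult struct {
	Matched  bool              `json:"matched"`
	Captures map[string]string `json:"captures,omitempty"`
}

// testPattern matches sampleLine against pattern and collects named groups.
// A compile error is returned so the caller can report it to the user.
func testPattern(pattern, sampleLine string) (testResult, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return testResult{}, err
	}
	m := re.FindStringSubmatch(sampleLine)
	if m == nil {
		return testResult{Matched: false}, nil
	}
	caps := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if i == 0 || name == "" {
			continue // skip the whole match and unnamed groups
		}
		caps[name] = m[i]
	}
	return testResult{Matched: true, Captures: caps}, nil
}

func main() {
	res, err := testPattern(`exit code (?P<code>\d+)`, "process exit code 137")
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	out, _ := json.Marshal(res)
	fmt.Println(string(out))
}
```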
+
+---
+
+## File pointers (when work starts)
+
+**Backend, new:**
+- `internal/logscanner/{manager,tail,engine,rules}.go`
+- `internal/api/log_scan_rules.go`
+- `internal/api/event_triggers.go`
+- `internal/store/log_scan_rules.go`
+- `internal/store/event_triggers.go`
+- `internal/events/dispatcher.go` (or extend `bus.go` with `RegisterEventTriggerDispatcher`)
+
+**Backend, modified:**
+- [internal/docker/container.go:362](../internal/docker/container.go#L362) — expose stream selection on `ContainerLogs`
+- [internal/api/router.go](../internal/api/router.go) — register new routes
+- [cmd/server/main.go](../cmd/server/main.go) — wire `RegisterEventTriggerDispatcher` next to `RegisterPersistentLogger`, start `logscanner.Manager`
+- migrations: `internal/store/migrations/00XX_log_scan_rules.sql`, `00XX_event_triggers.sql`
+
+**Frontend, new:**
+- `web/src/routes/log-scan-rules/+page.svelte`, `[id]/+page.svelte`, `new/+page.svelte`
+- `web/src/routes/event-triggers/+page.svelte`, `[id]/+page.svelte`, `new/+page.svelte`
+- `web/src/lib/components/LogRulePanel.svelte` (workload detail tab)
+- `web/src/lib/components/RegexTestBox.svelte` (reusable)
+
+**Frontend, modified:**
+- `web/src/routes/apps/[id]/+page.svelte` — add Log Rules tab
+- `web/src/routes/events/+page.svelte` — logscan source icon + rule tooltip
+- `web/src/routes/+layout.svelte` — Observability nav group
+- `web/src/lib/i18n/{en,ru}.json` — new key namespaces
+- `web/src/lib/api.ts`, `web/src/lib/types.ts` — typed clients
+
+---
+
+## Open questions to revisit before coding
+
+1. **Container start/stop signal source** — bus events (low latency, two new
+   event types) vs polling (simpler, ~5s latency). Tentative: polling.
+2. **Trigger delivery retry** — does the dispatcher retry on webhook failure,
+   or is one shot enough since `webhook_deliveries` records failures? Tentative:
+   one shot v1; revisit if reliability complaints surface.
+3. **Where does the "logscan source icon" link go on the events page** — rule
+   detail page, or the workload's effective-rules tab? The latter is probably
+   more useful since it shows context.
+
+---
+
+## Memory pointer
+
+Add a memory after this lands describing the event_log = observe-self,
+webhook_deliveries = talk-to-outside boundary — it's the kind of invariant
+that's easy to violate accidentally when adding new event types later.
@@ -0,0 +1,334 @@
+# Workload-First Refactor — Remaining Work
+
+Handoff for resuming the refactor. The plugin architecture (Source × Trigger),
+`/api/workloads` surface, `/apps` UI, env/volume/webhook/logs/chain panels,
+multi-face proxy routes, blue-green image deploys, schema-driven wizard, and
+test coverage on triggers / image helpers / webhook parser / store upserts are
+**already landed and live**. What follows is what's still pending, in priority
+order.
+
+## Status at a glance
+
+| Item | Priority | Status |
+| ---- | -------- | ------ |
+| Static source inline port | 1 | **PENDING** — only remaining blocker for hard cutover |
+| Hard legacy cutover | 1 | **PENDING** — gated by static port (volume scopes blocker is resolved) |
+| Generalized volume scopes | 2 | DONE |
+| Kind-aware editors (compose / image / static) | 2 | DONE |
+| Vendor-specific webhook parsing | 2 | DONE |
+| Chain-panel CSS | 3 | DONE |
+| Log Rules panel on `/apps/[id]` | adjacent | DONE — uses `getEffectiveLogScanRules` + per-workload override action |
+| i18n for `/apps/*` page strings | 3 | **PARTIAL** — Log Rules panel + Observability surfaces i18n'd; `apps.*` namespace still pending |
+| Docs / codemap entries for `internal/workload/plugin/` | 3 | **PENDING** |
+| API-handler / dispatcher / compose-source / static-backend tests | 4 | **PENDING** |
+| Triggers as first-class reusable entities (post-cutover) | 5 | **PENDING** |
+
+Cross-references to the adjacent Observability work (Event Triggers + Log
+Scanner backend + drop-counter stats panel) live in
+[docs/LOGSCAN_AND_TRIGGERS_TODO.md](LOGSCAN_AND_TRIGGERS_TODO.md).
+
+## Priority 1 — Architecture unlock
+
+### Static source inline port — ~2150 LOC across 8 files
+
+The current `internal/workload/plugin/source/static/` delegates to
+`staticsite.Manager` via a phantom-row adapter
+(`cmd/server/static_backend.go`) that keeps a synthetic row in the legacy
+`static_sites` table per workload. This works but blocks the hard cutover —
+you can't drop `static_sites` until the adapter is gone.
+
+To port inline, the deploy pipeline body has to move into
+`internal/workload/plugin/source/static/`:
+
+| Source file | Lines | What to keep / port |
+| --- | --- | --- |
+| `internal/staticsite/manager.go` | 834 | Deploy / Stop / status pipeline. State should move to `containers` rows + `workload_env` instead of `static_sites`. |
+| `internal/staticsite/gitea_content.go` | 360 | Keep as helper — Gitea content download/listing. |
+| `internal/staticsite/github_provider.go` | 276 | Keep as helper. |
+| `internal/staticsite/gitlab_provider.go` | 254 | Keep as helper. |
+| `internal/staticsite/healthcheck.go` | 111 | Convert to plugin Reconcile body. |
+| `internal/staticsite/markdown.go` | 83 | Keep as helper. |
+| `internal/staticsite/provider.go` | 171 | Keep — provider abstraction. |
+| `internal/staticsite/deno/` | (sub-pkg) | Keep — Dockerfile + router.ts codegen. |
+
+Estimated as its own dedicated turn (or two). Strategy: keep the provider
+abstraction + helpers exported; rewrite only `Manager.Deploy` body into a new
+`source/static/deploy.go` that operates against `plugin.Workload` directly and
+writes container rows + workload_env rather than the `static_sites` table.
+
+### Hard legacy cutover
+
+Sole remaining blocker is the static source inline port above. The
+generalized-volume-scopes blocker is resolved (legacy `ResolvePath`
+stays in place for legacy callers and dies with the cutover). When the
+static port lands:
+
+- Delete `/api/projects`, `/api/stacks`, `/api/sites`, `/api/stages` handlers.
+- Drop tables: `projects`, `stages`, `stacks`, `stack_revisions`,
+  `stack_deploys`, `static_sites`, `static_site_secrets`, `deploys`,
+  `poll_states`.
+- Delete `internal/stack/`, `internal/staticsite/` packages.
+- Delete frontend `/projects`, `/sites`, `/stacks` routes.
+- Delete legacy `volume.ResolvePath` + `internal/api/volume_browser.go`
+  callers (the only remaining users).
+
+## Priority 2 — Behavior gaps
+
+### ~~Generalized volume scopes~~ — DONE
+
+Landed: `internal/volume.ResolveWorkloadPath` (workload-keyed; sits next to the
+legacy `ResolvePath` so legacy code paths keep working) plus the wired-through
+`computeMounts` in `internal/workload/plugin/source/image/image.go`. All
+`VolumeScope` values are now honored at deploy time:
+
+- `absolute` — host bind, validated against `settings.AllowedVolumePaths`.
+- `ephemeral` — tmpfs.
+- `instance` — per-tag dir under `<base>/<workload>-<idShort>/instance-<tag>/<source>`.
+- `stage`, `project` — both collapse to `<base>/<workload>-<idShort>/<source>`.
+- `project_named` — Docker named volume prefixed `tf-<idShort>-<name>`.
+- `named` — Docker named volume by raw name.
+
+Test coverage: `internal/volume/resolver_test.go` (table-driven, portable
+Linux/Windows). The legacy `ResolvePath` stays in place for legacy deployer +
+volume-browser callers and dies with the hard cutover.
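
The scope table above maps onto a single switch. The sketch below is not `ResolveWorkloadPath` itself: the `resolve` function, its parameters, and the `mount` struct are hypothetical, the allow-list check for `absolute` is omitted, and it uses `path` (forward slashes) for determinism where real code would presumably use `path/filepath`.

```go
package main

import (
	"fmt"
	"path"
)

// mount is an assumed, simplified result: Kind is "bind", "tmpfs", or "volume".
type mount struct {
	Kind   string
	Source string // host path or Docker volume name; empty for tmpfs
}

// resolve maps a volume scope to a mount, following the layout in the list
// above. source is the configured path or volume name.
func resolve(scope, base, workload, idShort, tag, source string) (mount, error) {
	root := path.Join(base, workload+"-"+idShort)
	switch scope {
	case "absolute":
		// Real code validates source against settings.AllowedVolumePaths.
		return mount{Kind: "bind", Source: source}, nil
	case "ephemeral":
		return mount{Kind: "tmpfs"}, nil
	case "instance":
		return mount{Kind: "bind", Source: path.Join(root, "instance-"+tag, source)}, nil
	case "stage", "project":
		return mount{Kind: "bind", Source: path.Join(root, source)}, nil
	case "project_named":
		return mount{Kind: "volume", Source: "tf-" + idShort + "-" + source}, nil
	case "named":
		return mount{Kind: "volume", Source: source}, nil
	}
	return mount{}, fmt.Errorf("unknown volume scope %q", scope)
}

func main() {
	for _, scope := range []string{"instance", "project", "project_named", "ephemeral"} {
		m, err := resolve(scope, "/data", "web", "ab12", "v1", "uploads")
		fmt.Println(scope, m, err)
	}
}
```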
+
+### ~~Kind-aware editors on `/apps/new` and `/apps/[id]` edit~~ — DONE
+
+All three Source plugins now have hand-rolled forms on both pages, with
+an "Advanced JSON" toggle preserved as the power-user escape hatch.
+Submit logic marshals form fields back into the same JSON shape the
+backend already expects — no API or store changes required.
+
+**Principle:** the plugin contract makes new Source / Trigger kinds cheap
+on the backend, but the UI is not cheap by default — every kind needs a
+paired hand-rolled form to be daily-driver usable. The shared JSON
+editor is the fallback for power users and brand-new plugins, not the
+end state. New Source / Trigger merge requests should treat "ship the
+kind-aware form" as part of done, not a follow-up.
+
+**Landed:**
+
+- `compose`: YAML textarea + project_name input on both `/apps/new`
+  and `/apps/[id]`.
+- `image`: form fields for image / port / healthcheck / default_tag /
+  registry_name / cpu_limit / memory_limit / max_instances on both
+  pages. Registry name is a select populated from `/api/registries`
+  (with text-input fallback when the list is empty). env + volumes
+  stay in their detail-page panels and round-trip through the form
+  via `imageFormBody` so manual edits aren't clobbered.
+- `static`: provider select (gitea / github / gitlab), base URL,
+  repo_owner / repo_name (both required), branch (default "main"),
+  folder_path, access_token (password input, for private repos),
+  mode radio (static / deno), render_markdown checkbox. The
+  storage_enabled / storage_limit_mb fields aren't surfaced as
+  form controls yet, but they round-trip through `staticFormBody`
+  so values set via the raw JSON editor survive form edits.
+
+**Still pending forms:** none — all three Source plugins now have
+hand-rolled forms on both `/apps/new` and `/apps/[id]`.
+
+The raw JSON editor stays available behind the "Advanced JSON" toggle
+(shipped with compose) so the plugin's full sample is still reachable
+for power users and for any new plugin kind without a hand-rolled form.
+
+Effort: per-kind form roughly half a turn each; can land incrementally.
+Touches `web/src/routes/apps/new/+page.svelte` and the edit block in
+`web/src/routes/apps/[id]/+page.svelte`. The Svelte side keeps
+serializing into the same `source_config` JSON shape the backend
+already expects — no API or store change required.
+
+### ~~Vendor-specific webhook parsing for `/api/webhook/workloads/{secret}`~~ — DONE
+
+Landed: `internal/webhook/vendor_parsers.go` plus rewrites in
+`internal/webhook/handler.go` `buildInboundEvent`. The dispatch order is now:
+
+1. Empty body → manual event.
+2. Vendor-specific parsers, short-circuit on a recognized `X-*-Event`
+   header — Gitea package, GitHub `package` / `registry_package`, GitHub
+   push, Gitea push, GitLab `Push Hook` / `Tag Push Hook`.
+3. Generic simple-body fallback: top-level `image` or top-level `ref` —
+   what the legacy CI integrations already send.
+
+Vendor parsers can populate fields the generic parser cannot: image
+digest, `GitEvent.Vendor`, registry host. When a vendor parser claims a
+request (header matches) it is authoritative — a malformed Gitea
+package payload surfaces as an error rather than silently falling
+through to the generic parser. Test coverage:
+`internal/webhook/vendor_parsers_test.go` covers each vendor branch +
+the routed-via-`buildInboundEvent` integration cases.
+
+Open follow-ups deferred to future turns:
+
+- GitLab Container Registry events use a custom envelope outside the
+  webhook event surface — handle if a user reports needing it.
+- Docker Hub webhook (push event) uses `{"push_data": {"tag": ...}, "repository": {...}}` — add when there's a user request.
+
+## Priority 3 — Polish
+
+### ~~Chain-panel CSS~~ — DONE
+
+Landed: rules for `.chain-row`, `.chain-card` (with hover/transform on
+anchors), `.chain-self` (brand-tinted highlight), `.chain-name`,
+`.chain-label` (70px fixed-width mono column), `.chain-children-list`
+(flex-wrap), plus a sub-600px stack to keep the panel usable on narrow
+screens. Appended at the end of the `<style>` block in
+`web/src/routes/apps/[id]/+page.svelte`.
+
+### Docs / codemap entries
+
+Nothing under `docs/CODEMAPS/` for `internal/workload/plugin/`. Should cover:
+
+- The Source × Trigger contract + registry pattern (`init()` + blank-import in
+  `cmd/server/main.go`).
+- How a new Source kind is added (write `init()` registration, blank-import,
+  add to wizard via `SchemaSample`).
+- The dispatcher seam: `deployer.DispatchPlugin` / `DispatchTeardown` /
+  `DispatchReconcile` and how the reconciler / webhook ingress / API
+  handlers all flow through it.
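
As a starting point for the codemap entry, the registry pattern named above generally looks like the sketch below. It is a generic single-file illustration, not the contents of `internal/workload/plugin/registry.go`: the `Source` interface is reduced to one method and all function names are assumptions. In the real layout, the `init()` would live in the plugin's own package and `cmd/server/main.go` would blank-import it.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Source is a reduced stand-in for the plugin contract.
type Source interface {
	Kind() string
}

var (
	mu      sync.RWMutex
	sources = map[string]Source{}
)

// RegisterSource adds a Source under its kind. Registering the same kind
// twice is a programming error, so it panics at startup.
func RegisterSource(s Source) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := sources[s.Kind()]; dup {
		panic("duplicate source kind: " + s.Kind())
	}
	sources[s.Kind()] = s
}

// LookupSource returns the Source registered for kind, if any.
func LookupSource(kind string) (Source, bool) {
	mu.RLock()
	defer mu.RUnlock()
	s, ok := sources[kind]
	return s, ok
}

// Kinds lists the registered kinds in sorted order.
func Kinds() []string {
	mu.RLock()
	defer mu.RUnlock()
	out := make([]string, 0, len(sources))
	for k := range sources {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

type imageSource struct{}

func (imageSource) Kind() string { return "image" }

// init runs before main, so the plugin is available without explicit wiring.
func init() { RegisterSource(imageSource{}) }

func main() {
	fmt.Println(Kinds())
}
```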
+
+`README.md` should mention `/apps` as the new user surface and that
+`/projects` / `/sites` / `/stacks` carry `Deprecation: true` headers.
+
+### i18n: page-level strings — PARTIAL
+
+Already i18n'd:
+
+- `nav.apps`, `nav.eventTriggers`, `nav.logScanRules` — top nav labels.
+- Log Rules panel on `/apps/[id]` reuses `logscan.panel.*` keys
+  (shipped with the Observability work).
+- All `/event-triggers/*` and `/log-scan-rules/*` page strings — keys
+  live under `triggers.*` and `logscan.*` namespaces in
+  `web/src/lib/i18n/{en,ru}.json`.
+
+Still hardcoded English:
+
+- `/apps/+page.svelte` — list page (hero, lede, stats, empty state,
+  table headers, status pills).
+- `/apps/new/+page.svelte` — wizard labels, form copy, kind-aware
+  form rows (compose / image / static all hardcoded English today).
+- `/apps/[id]/+page.svelte` — detail page sections (chain, env,
+  volumes, webhook, manual deploy, danger zone) — the Log Rules
+  panel embedded inside it is the only i18n'd section.
+
+Roughly 80–100 keys across the three `/apps/*` pages once extracted.
+Namespace: `apps.*` (with sub-namespaces `apps.list.*`, `apps.new.*`,
+`apps.detail.*`, `apps.form.*`).
+
+## Priority 4 — Tests we still don't have
+
+Solid pure-function coverage landed in the prior turn. Still missing:
+
+- **API-handler integration tests** for `/api/workloads/*` (CRUD, deploy,
+  env, volumes, webhook, chain, promote-from). Pattern: in-memory store +
+  fake deployer + fake docker / proxy / dns providers, exercise via
+  `httptest`.
+- **Deployer dispatcher**: `DispatchPlugin` / `DispatchTeardown` /
+  `DispatchReconcile` with a fake Source registered.
+- **Compose source**: `composeProjectName` sanitizer, `writeYAMLIfChanged`
+  short-circuit. (Both pure; just need fixtures.)
+- **Static source Backend adapter** in `cmd/server/static_backend.go`.
+
+## Priority 5 — Post-cutover roadmap
+
+### Triggers as first-class reusable entities
+
+Today a trigger's config lives embedded in the workload row
+(`workload.trigger_kind` plus `workload.trigger_config` JSON via the plugin
+contract). One workload owns exactly one trigger; one trigger serves exactly
+one workload. This couples two concepts that users increasingly want
+orthogonal:
+
+- One **inbound webhook** fanning out to several workloads (a single CI push
+  rebuilds dev + staging together).
+- One **registry watcher** driving multiple workloads off the same image
+  (different tag filters per binding, shared poll state).
+- One **schedule** kicking off a batch of jobs.
+- One **git push** filter shared by sibling stack services.
+
+**Direction:** promote triggers to their own table with a join.
+
+- `triggers` — `id`, `kind` (registry / git / webhook / schedule / manual /
+  log_scan), `config` JSON, `secret`, `created_at`, audit fields.
+- `workload_trigger_bindings` — `workload_id`, `trigger_id`, `binding_config`
+  JSON (per-binding overrides: tag filter, path filter, branch filter), plus
+  ordering / enabled flag.
+
+The dispatcher seam stays unchanged — `deployer.DispatchPlugin` still receives
+a `(Workload, TriggerEvent)` pair; the only change is that the event's source
+is resolved through the binding row instead of the workload row.
+
+**UX principle: first-class on the backend, inline by default in the UI.**
+The workload create/edit form still has an "Add trigger" control that creates
+a fresh trigger record in one step, so the 1:1 case (git push → this workload)
+feels unchanged from today. Reuse is **opt-in** via a "Pick existing trigger"
+picker on the same control. Triggers also get their own list/detail pages under
+`/triggers` so the fan-out cases are discoverable and centrally manageable
+(rotate secret once, audit once).
+
+**Per-kind modal applies, same rule as Source plugins** — the create/edit
+form for a trigger switches body by `kind` (git: repo / branch / path;
+registry: image / tag regex; webhook: secret + payload preview; schedule:
+cron). Backend cheap, UI requires a paired hand-rolled form per kind. Treat
+"ship the kind-aware form" as part of done for any new trigger kind.
+
+**Migration:** clean break (no migration) per the workload-first memory —
+at cutover, each workload's embedded trigger config becomes a single
+auto-created trigger record with a single binding row. No user-visible change
+on day one; reuse becomes possible thereafter.
+
+**Sequencing:** lands **after** the Priority 1 hard cutover. The embedded
+trigger config works fine for the 1:1 case that dominates today; the
+static-source inline port is the higher-value blocker. Treat this as the
+next major arc once cutover ships.
+
+**Touch points to expect:**
+
+- `internal/workload/plugin/trigger/*` — kind handlers stay; only their input
+  shape changes (read from binding + trigger row, not workload row).
+- `internal/store/` — new `triggers` + `workload_trigger_bindings` tables and
+  CRUD; remove `trigger_kind` / `trigger_config` from the workload row.
+- `internal/api/workloads.go` — adapt the workload create/edit handlers to
+  accept either "inline new trigger" or "bind existing trigger" payloads.
+- New `/api/triggers` surface + `/triggers` frontend pages.
+- `internal/webhook/handler.go` — inbound webhook now resolves to a trigger,
+  fans out to all bound workloads.
+- `internal/reconciler/reconciler.go` — registry watchers iterate triggers,
+  not workloads; each trigger may fire N bindings.
+
+## Open architectural questions
+
+### Stages chain vs explicit Stage entity
+
+`parent_workload_id` is now the canonical mechanism for stage chains
+(dev → staging → prod). Decision deferred: do we need a separate `Stage`
+entity at all, or is the chain sufficient? For now the chain appears to
+cover the use case: `promote-from` works and the UI shows the relationship.
+The legacy `stages` table can probably be dropped entirely once the cutover
+proceeds.
+
+### `Container.extra_json` evolution
+
+Currently only the image source uses it (per-face proxy route IDs). If
+other sources gain similar needs (compose service health metadata, static
+build SHAs), the schema there should stay versionless and additive — every
+reader must tolerate unknown keys. Document this in the source plugin
+guide alongside the codemap entry.
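
The "versionless and additive" rule has two halves: readers ignore keys they do not know, and writers preserve keys they do not own. A sketch with `encoding/json`, where the `proxy_route_ids` key, the `imageExtra` struct, and both helper names are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// imageExtra is what one reader cares about; other keys are ignored because
// encoding/json skips unknown fields by default.
type imageExtra struct {
	ProxyRouteIDs map[string]string `json:"proxy_route_ids,omitempty"`
}

// readImageExtra decodes only the keys this reader understands.
func readImageExtra(raw string) (imageExtra, error) {
	var e imageExtra
	if raw == "" {
		return e, nil
	}
	err := json.Unmarshal([]byte(raw), &e)
	return e, err
}

// setKey updates a single top-level key and leaves every other key untouched,
// so one source never clobbers data written by another.
func setKey(raw, key string, value any) (string, error) {
	doc := map[string]json.RawMessage{}
	if raw != "" {
		if err := json.Unmarshal([]byte(raw), &doc); err != nil {
			return "", err
		}
	}
	if doc == nil { // raw was the JSON literal null
		doc = map[string]json.RawMessage{}
	}
	encoded, err := json.Marshal(value)
	if err != nil {
		return "", err
	}
	doc[key] = encoded
	out, err := json.Marshal(doc)
	return string(out), err
}

func main() {
	raw := `{"build_sha":"abc"}`
	updated, err := setKey(raw, "proxy_route_ids", map[string]string{"web": "r1"})
	if err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Println(updated)
	extra, _ := readImageExtra(updated)
	fmt.Println(extra.ProxyRouteIDs["web"])
}
```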
+
+## File pointers for the next session
+
+- Plugin contracts: `internal/workload/plugin/{plugin,source,trigger,types,registry}.go`
+- Source implementations: `internal/workload/plugin/source/{image,compose,static}/`
+- Trigger implementations: `internal/workload/plugin/trigger/{registry,git,manual}/`
+- Dispatcher: `internal/deployer/dispatch.go`
+- Webhook ingress (plugin path): `internal/webhook/handler.go` `handlePluginWorkloadWebhook`
+- Reconciler hook: `internal/reconciler/reconciler.go` `reconcilePluginWorkloads`
+- Static backend adapter (to be deleted post-port): `cmd/server/static_backend.go`
+- Frontend pages: `web/src/routes/apps/+page.svelte`, `web/src/routes/apps/new/+page.svelte`, `web/src/routes/apps/[id]/+page.svelte`
+- Tests: `internal/workload/plugin/trigger/*/*_test.go`, `internal/workload/plugin/source/image/image_helpers_test.go`, `internal/webhook/inbound_event_test.go`, `internal/store/workload_env_test.go`
+
+## Memory pointer
+
+Memory at
+`C:/Users/Alexei/.claude/projects/c--Users-Alexei-Documents-docker-watcher/memory/`
+already covers the Workload-first decision and the no-migration constraint.
+Refresh as the cutover lands.