docs: workload refactor + observability progress
Build / build (push) Successful in 10m40s

Two design + handoff docs:

- docs/WORKLOAD_REFACTOR_TODO.md — status-at-a-glance table
  showing what's done (volume scopes, kind-aware editors,
  vendor webhook parsing, chain-panel CSS, Log Rules panel)
  and what's still pending (static source inline port + the
  hard legacy cutover gated on it; codemap entries; /apps
  page-level i18n; Priority 4 integration tests).

- docs/LOGSCAN_AND_TRIGGERS_TODO.md — companion design + status
  doc for the two Observability features. Records the
  loop-prevention invariant (event_log = system observing
  itself, webhook_deliveries = system talking to outside) so
  the next contributor doesn't accidentally break it by adding
  a new EventLog subscriber that re-publishes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 22:18:51 +03:00
parent 4707db1c3b
commit 30133bc1eb
2 changed files with 719 additions and 0 deletions
+385
@@ -0,0 +1,385 @@
# Log Scanner + Event Triggers — Design Handoff
Two related features. They can ship independently, but were designed together
because they share the event_log seam.
- **A. Log scanner** — tail container logs, match against rules, emit event_log
entries. Producer of events.
- **B. Event triggers** — turn event_log entries into webhook / notification
dispatches. Consumer of events. Generalizes the existing
`RegisterPersistentLogger` pattern.
Either half is useful alone:
- A without B = errors get surfaced in the events UI, no external delivery.
- B without A = manual + reconciler + deploy events can drive notifications.
Recommended ship order: B first (smaller, self-contained generalization), then
A (more moving parts, depends on container-lifecycle hooks).
---
## A. Log scanner — BACKEND LANDED
Status:
- **Schema + store CRUD** — `internal/store/log_scan_rules.go` +
`log_scan_rules` table added to the `observabilityTables` block.
Includes the `EffectiveLogScanRules(workloadID)` helper that
resolves global rules minus per-workload overrides plus workload-
only additions in one Go-side pass.
- **Stream-selectable docker reads** — `internal/docker/container.go`
`ContainerLogsOpts` accepts a `ContainerLogOptions{ShowStdout,
ShowStderr, Follow, Tail}` so the scanner can subscribe to one
stream when a rule scopes itself to stdout or stderr. The legacy
`ContainerLogs` is preserved as a thin wrapper for back-compat.
- **Engine** — `internal/logscanner/engine.go`: per-rule cooldown
(keyed on container+rule), per-container token bucket (default 10
events / 60s, override-able), regex match per line, hits returned
for the manager to persist. Pure logic, fully unit-tested.
- **Tail goroutine** — `internal/logscanner/tail.go`: per-container
loop reading docker's multiplexed log frames (with TTY fallback),
strips the prepended RFC3339 timestamp, runs every line through the
engine + snapshot. Exits on container stop or context cancel.
- **Manager** — `internal/logscanner/manager.go`: 5s polling diff
against `ListContainers(state=running)`, atomic.Pointer[Snapshot]
hot-reload, structural HitEmitter that writes event_log rows AND
publishes `EventLog` on the bus (so event-trigger dispatchers can
pick them up immediately).
- **API** — `internal/api/log_scan_rules.go`: full CRUD,
`/test` endpoint accepting `{"sample_line": "..."}` and returning
matched/captures, plus
`GET /api/workloads/{id}/effective-rules` for the workload detail
page's future Log Rules tab. Admin-gated mutations.
- **Wired in main.go** before the API server is constructed so the
reload callback is plugged via `apiServer.SetLogScanReloader`.
- **Loop-prevention** — Same boundary as feature B: scanner publishes
EventLog events, dispatcher consumes them, neither writes to
event_log on the consume side.
- **Tests** — `internal/logscanner/{engine,rules}_test.go` cover
cooldown isolation, token bucket refill, stream filtering,
override-replaces-global, disabled-override-suppresses-global,
compile-error reporting. `internal/store/log_scan_rules_test.go`
covers validation + cascade delete.
**Frontend still pending:** `/log-scan-rules` pages, regex test box
component, Log Rules tab on `/apps/[id]`, i18n keys. Not touched this
turn.
### Where it plugs in
[internal/docker/container.go:362](../internal/docker/container.go#L362) already
exposes `ContainerLogs(ctx, id, follow=true, tail)`. The existing SSE handler at
[internal/api/workloads.go:43](../internal/api/workloads.go#L43)
(`streamWorkloadContainerLogs`) is per-viewer and dies on browser disconnect —
**do not hook the scanner there**. The scanner is a separate long-lived
subsystem owned by the server process.
Minor required change to `ContainerLogs`: expose `ShowStdout` / `ShowStderr` as
caller-controlled. Currently hardcoded to `true`/`true`. Single existing caller
passes "both" → no friction. Add an options struct or two booleans.
### New package: `internal/logscanner/`
```
internal/logscanner/
manager.go — Manager: map[containerID]*tail, lifecycle hooks
tail.go — per-container goroutine; reads logs, fans to engine
engine.go — rule evaluation + cooldown + rate limit
rules.go — Rule struct, regex compile cache, effective-set resolver
```
**Manager lifecycle.** Subscribes to container start/stop signals. Options for
the signal source:
1. Add a `ContainerStarted` / `ContainerStopped` event type to the bus and
publish from the reconciler + deployer. Cleanest, but adds two event types.
2. Manager polls `docker.ListContainers` every N seconds and diffs. Lazier,
robust to missed signals, slightly higher idle CPU. Probably fine.
Pick (1) if you want zero-latency start, (2) if you want fewer moving parts.
Defaulting to **(2) with 5s poll** — Docker container starts already take
seconds; sub-second matching is not a requirement.
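
The polling diff in option (2) can be sketched as below. This is a minimal illustration, not the code in `manager.go`: `diffContainers` is a hypothetical helper, and the running list stands in for whatever `ListContainers(state=running)` returns.

```go
package main

import (
	"fmt"
	"sort"
)

// diffContainers compares the set of container IDs that already have a tail
// with the IDs reported as running, and returns which tails to start and
// which to stop. The tailed map is treated as a set (all values true).
func diffContainers(tailed map[string]bool, running []string) (start, stop []string) {
	seen := make(map[string]bool, len(running))
	for _, id := range running {
		if seen[id] {
			continue // tolerate duplicate IDs in the listing
		}
		seen[id] = true
		if !tailed[id] {
			start = append(start, id)
		}
	}
	for id := range tailed {
		if !seen[id] {
			stop = append(stop, id)
		}
	}
	// Sort for deterministic output; map iteration order is random.
	sort.Strings(start)
	sort.Strings(stop)
	return start, stop
}

func main() {
	tailed := map[string]bool{"a": true, "b": true}
	start, stop := diffContainers(tailed, []string{"b", "c"})
	fmt.Println("start:", start, "stop:", stop)
}
```

Because every poll recomputes the full diff, a missed or late signal is corrected on the next tick, which is the robustness argument made above.
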
**Tail goroutine.** On container start: open `ContainerLogs(follow=true,
tail="0")` with stdout/stderr filters per rules in scope. Read line-by-line via
`bufio.Scanner`. For each line: run through engine. On container stop or ctx
cancel: drain and exit.
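
A rough sketch of the per-line loop, assuming the stream has already been demultiplexed into plain text and that each line may carry a leading RFC3339 timestamp. `stripTimestamp`, `scanLines`, and `collect` are illustrative names, not the functions in `tail.go`.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"time"
)

// stripTimestamp removes a leading RFC3339 timestamp followed by a space.
// Lines without a parseable timestamp are returned unchanged.
func stripTimestamp(line string) string {
	ts, rest, ok := strings.Cut(line, " ")
	if !ok {
		return line
	}
	if _, err := time.Parse(time.RFC3339Nano, ts); err != nil {
		return line
	}
	return rest
}

// scanLines reads r line by line and hands each stripped line to handle.
// It returns when the reader is exhausted or fails.
func scanLines(r io.Reader, handle func(line string)) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		handle(stripTimestamp(sc.Text()))
	}
	return sc.Err()
}

// collect is a demo helper: it scans an in-memory log and returns the lines.
func collect(log string) []string {
	var out []string
	_ = scanLines(strings.NewReader(log), func(l string) { out = append(out, l) })
	return out
}

func main() {
	for _, l := range collect("2026-05-11T22:18:51.123456789Z ERROR boom\nplain line\n") {
		fmt.Println(l)
	}
}
```

One caveat worth keeping in mind: `bufio.Scanner` rejects lines longer than its buffer (64 KiB by default), so a real tail probably wants `Scanner.Buffer` or a fallback for oversized lines.
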
**Engine.** Holds compiled regexes per rule. For each line:
- Walk effective ruleset for this workload (see schema below).
- For each matching rule: check cooldown (`map[ruleID]time.Time`, mutex
guarded). If the cooldown has elapsed, insert event_log row + publish + update timestamp.
- Per-container token bucket (default: 10 events/min/container) to prevent
catastrophic event_log floods if a regex is too greedy.
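
The two gates described above (cooldown keyed on container plus rule, then a per-container token bucket) could look roughly like this. It is a sketch under assumptions: the `limiter` type, its field names, and passing `now` explicitly for testability are all hypothetical, not the shape of `engine.go`.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

type cooldownKey struct {
	containerID string
	ruleID      int64
}

type bucket struct {
	tokens float64
	last   time.Time
}

// limiter gates hits by a per-(container, rule) cooldown and a
// per-container token bucket that refills continuously.
type limiter struct {
	mu           sync.Mutex
	capacity     float64
	refillPerSec float64
	cooldowns    map[cooldownKey]time.Time
	buckets      map[string]*bucket
}

func newLimiter(capacity int, window time.Duration) *limiter {
	return &limiter{
		capacity:     float64(capacity),
		refillPerSec: float64(capacity) / window.Seconds(),
		cooldowns:    map[cooldownKey]time.Time{},
		buckets:      map[string]*bucket{},
	}
}

// allow reports whether a hit for (containerID, ruleID) may be emitted at now.
func (l *limiter) allow(containerID string, ruleID int64, cooldown time.Duration, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	key := cooldownKey{containerID, ruleID}
	if last, ok := l.cooldowns[key]; ok && now.Sub(last) < cooldown {
		return false // still cooling down; the bucket is not charged
	}
	b := l.buckets[containerID]
	if b == nil {
		b = &bucket{tokens: l.capacity, last: now}
		l.buckets[containerID] = b
	}
	b.tokens = math.Min(l.capacity, b.tokens+now.Sub(b.last).Seconds()*l.refillPerSec)
	b.last = now
	if b.tokens < 1 {
		return false // container is over its event budget
	}
	b.tokens--
	l.cooldowns[key] = now
	return true
}

// demo runs a fixed scenario: 2 events per 60s per container, 10s cooldown.
func demo() []bool {
	l := newLimiter(2, 60*time.Second)
	t0 := time.Unix(0, 0)
	cd := 10 * time.Second
	return []bool{
		l.allow("c1", 1, cd, t0),                     // first hit
		l.allow("c1", 1, cd, t0.Add(time.Second)),    // same rule, inside cooldown
		l.allow("c1", 2, cd, t0.Add(time.Second)),    // other rule, last token
		l.allow("c1", 3, cd, t0.Add(2*time.Second)),  // bucket empty
		l.allow("c2", 1, cd, t0.Add(2*time.Second)),  // other container, own bucket
		l.allow("c1", 3, cd, t0.Add(62*time.Second)), // bucket refilled
	}
}

func main() {
	fmt.Println(demo())
}
```

Checking the cooldown before touching the bucket means a noisy rule that is already cooling down does not consume budget that other rules on the same container could use.
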
### Schema
Single table, global + override pattern. No separate "overrides" table.
```sql
CREATE TABLE log_scan_rules (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    workload_id      TEXT,                         -- NULL = global rule
    overrides_id     INTEGER,                      -- if set, this row overrides a global rule for one workload
    name             TEXT NOT NULL,
    pattern          TEXT NOT NULL,                -- regex, compiled at load
    severity         TEXT NOT NULL,                -- info|warn|error
    streams          TEXT NOT NULL DEFAULT 'all',  -- all|stdout|stderr
    cooldown_seconds INTEGER NOT NULL DEFAULT 60,
    enabled          INTEGER NOT NULL DEFAULT 1,
    created_at       TEXT NOT NULL,
    FOREIGN KEY (workload_id)  REFERENCES workloads(id)      ON DELETE CASCADE,
    FOREIGN KEY (overrides_id) REFERENCES log_scan_rules(id) ON DELETE CASCADE
);
CREATE INDEX idx_log_scan_rules_workload ON log_scan_rules(workload_id);
CREATE INDEX idx_log_scan_rules_overrides ON log_scan_rules(overrides_id);
```
**Effective ruleset for workload X:**
1. All rows where `workload_id IS NULL AND overrides_id IS NULL` (pure globals),
*minus* any global that has a row with `workload_id = X AND overrides_id = global.id`.
2. Plus all rows where `workload_id = X AND overrides_id IS NULL` (workload-only additions).
3. Plus all override rows where `workload_id = X AND overrides_id IS NOT NULL`
(substitute for the global; their fields win, including `enabled=false` to
disable the global for this workload).
A pure SQL implementation is doable with a `LEFT JOIN ... WHERE override.id IS
NULL` for step 1 plus a `UNION ALL` for steps 2 and 3. Or compute in Go after
two simpler queries — fine since rule counts will be small.
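
The Go-side variant of the three steps can be sketched as follows. The `rule` struct is a trimmed, hypothetical stand-in for the store model (empty `WorkloadID` and zero `OverridesID` play the role of SQL NULL), and the sketch assumes a non-empty workload ID and drops disabled rows at the end.

```go
package main

import "fmt"

// rule is a trimmed stand-in for a log_scan_rules row.
type rule struct {
	ID          int64
	WorkloadID  string // "" = global
	OverridesID int64  // 0 = not an override
	Name        string
	Enabled     bool
}

// effectiveRules resolves the enabled ruleset for one workload:
// globals minus overridden globals, plus workload-only rows, plus overrides.
func effectiveRules(all []rule, workloadID string) []rule {
	// Collect which globals this workload overrides, enabled or not:
	// a disabled override still suppresses its global.
	overridden := map[int64]bool{}
	for _, r := range all {
		if r.WorkloadID == workloadID && r.OverridesID != 0 {
			overridden[r.OverridesID] = true
		}
	}
	var out []rule
	for _, r := range all {
		if !r.Enabled {
			continue
		}
		switch {
		case r.WorkloadID == "" && r.OverridesID == 0:
			if !overridden[r.ID] {
				out = append(out, r) // step 1: pure global, not overridden
			}
		case r.WorkloadID == workloadID:
			out = append(out, r) // steps 2 and 3: workload-only row or override
		}
	}
	return out
}

func sampleRules() []rule {
	return []rule{
		{ID: 1, Name: "oom", Enabled: true},
		{ID: 2, Name: "panic", Enabled: true},
		{ID: 3, WorkloadID: "w1", OverridesID: 1, Name: "oom (off)", Enabled: false},
		{ID: 4, WorkloadID: "w1", Name: "w1-only", Enabled: true},
		{ID: 5, WorkloadID: "w2", OverridesID: 2, Name: "panic (strict)", Enabled: true},
	}
}

// ids extracts rule IDs in order, for compact printing.
func ids(rs []rule) []int64 {
	out := make([]int64, 0, len(rs))
	for _, r := range rs {
		out = append(out, r.ID)
	}
	return out
}

func main() {
	fmt.Println("w1:", ids(effectiveRules(sampleRules(), "w1")))
	fmt.Println("w2:", ids(effectiveRules(sampleRules(), "w2")))
}
```

In the sample data, workload `w1` disables global rule 1 through a disabled override and adds its own rule 4, while `w2` replaces global rule 2 with row 5.
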
### Output
Scanner calls `store.InsertEvent` with:
- `Source = "logscan"`
- `Severity` from the matched rule
- `Message` = raw matched line (truncated to ~500 chars)
- `Metadata` JSON = `{"workload_id": ..., "container_id": ..., "rule_id": ..., "rule_name": ..., "captures": {...}}`
Then `bus.Publish(EventLog, payload)`. This reuses exactly the path
[internal/events/bus.go:158](../internal/events/bus.go#L158)
(`RegisterPersistentLogger`) already established. SSE clients see it live, and
the dispatcher from feature B picks it up.
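
As an illustration of the payload shape listed above, the sketch below truncates the message on rune boundaries and encodes the metadata object. `hitMetadata`, `truncateRunes`, and `encodeMetadata` are hypothetical names; the real `store.InsertEvent` signature is not shown in this document and is not assumed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxMessageRunes mirrors the "~500 chars" limit for the stored message.
const maxMessageRunes = 500

// truncateRunes cuts s to at most max runes so multi-byte characters
// are never split in the middle.
func truncateRunes(s string, max int) string {
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max])
}

// hitMetadata matches the JSON keys listed for the event metadata.
type hitMetadata struct {
	WorkloadID  string            `json:"workload_id"`
	ContainerID string            `json:"container_id"`
	RuleID      int64             `json:"rule_id"`
	RuleName    string            `json:"rule_name"`
	Captures    map[string]string `json:"captures"`
}

// encodeMetadata renders the metadata as a compact JSON string.
func encodeMetadata(m hitMetadata) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	msg := truncateRunes("ERROR: process exited with code 137", maxMessageRunes)
	meta, err := encodeMetadata(hitMetadata{
		WorkloadID:  "w1",
		ContainerID: "c1",
		RuleID:      7,
		RuleName:    "oom",
		Captures:    map[string]string{"code": "137"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
	fmt.Println(meta)
}
```
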
### Hot-reload
When a rule is created/updated/deleted via the API, the manager must rebuild
the effective ruleset for affected containers. Cheapest path: a single
`*atomic.Pointer[ruleSnapshot]` shared across tails, replaced wholesale on any
rule change. Each tail dereferences the snapshot per line — no locking on the
hot path.
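
A minimal sketch of the wholesale-swap idea using `sync/atomic`'s generic `atomic.Pointer` (Go 1.19+). The `ruleSnapshot` fields, `reload`, and `matchLine` are simplified assumptions; the real snapshot presumably carries full rule structs rather than bare regexes.

```go
package main

import (
	"fmt"
	"regexp"
	"sync/atomic"
)

// ruleSnapshot is immutable once published; readers never see a partial update.
type ruleSnapshot struct {
	version int
	rules   []*regexp.Regexp
}

var current atomic.Pointer[ruleSnapshot]

// reload compiles a fresh snapshot and swaps it in atomically.
// On a compile error the previous snapshot stays in place.
func reload(version int, patterns ...string) error {
	next := &ruleSnapshot{version: version}
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return err
		}
		next.rules = append(next.rules, re)
	}
	current.Store(next)
	return nil
}

// matchLine is what a tail would do per line: one atomic load, no locks.
func matchLine(line string) bool {
	snap := current.Load()
	if snap == nil {
		return false // nothing published yet
	}
	for _, re := range snap.rules {
		if re.MatchString(line) {
			return true
		}
	}
	return false
}

func main() {
	if err := reload(1, "ERROR"); err != nil {
		panic(err)
	}
	fmt.Println(matchLine("ERROR boom"), matchLine("WARN slow"))
	if err := reload(2, "WARN"); err != nil {
		panic(err)
	}
	fmt.Println(matchLine("ERROR boom"), matchLine("WARN slow"))
}
```

Since a published snapshot is never mutated, a tail that loaded the old pointer simply finishes its current line against the old rules and picks up the new set on the next one.
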
---
## B. Event triggers — BACKEND LANDED
Status:
- **Schema + store CRUD** — `internal/store/event_triggers.go` + table
creation in `internal/store/store.go` `observabilityTables`. Model:
`EventTrigger` in `internal/store/models.go`.
- **Dispatcher** — `internal/events/dispatcher.go`
`RegisterEventTriggerDispatcher(bus, triggerSource, notifier)`.
Filter eval is AND-composed across severity (CSV), source (CSV), and
optional message regex. Compiled regexes are memoized.
- **Webhook delivery** — extended `notify.Notifier` with
`SendPayload(url, secret, eventType, payload)` which reuses the
existing HMAC + headers infra (`X-Hub-Signature-256`, etc.). New
`TierEventTrigger` tier is recorded for telemetry / audit.
- **Loop-prevention** — dispatcher does **not** call `InsertEvent`.
Delivery outcomes go through the notifier's existing logging only.
- **API** — `internal/api/event_triggers.go` with admin-gated mutations:
```http
GET /api/event-triggers
POST /api/event-triggers
GET /api/event-triggers/{id}
PATCH /api/event-triggers/{id}
DELETE /api/event-triggers/{id}
POST /api/event-triggers/{id}/test   (synthetic event_log -> notifier.SendSyncForTest)
```
- **Wired in main.go** next to `RegisterPersistentLogger`.
- **Tests** — `internal/events/dispatcher_test.go`: 10 cases covering
filter eval, regex caching, dispatcher fan-out, unsupported
action_type, trigger-source errors. CSV filter helper has dedicated
table-driven coverage.
**Frontend still pending:** `/event-triggers` list + detail + new
pages, the Send-test UX, i18n keys. Not touched this turn.
### Where it plugs in
Mirrors the `RegisterPersistentLogger` shape at
[internal/events/bus.go:158](../internal/events/bus.go#L158):
```go
func RegisterEventTriggerDispatcher(b *Bus, triggers TriggerSource, notifier Notifier) func() {
	// Only EventLog events are relevant to triggers.
	sub := b.Subscribe(func(evt Event) bool { return evt.Type == EventLog })
	go func() {
		for evt := range sub {
			payload, ok := evt.Payload.(EventLogPayload)
			if !ok {
				continue
			}
			for _, t := range triggers.Enabled() {
				if t.matches(payload) {
					// Deliver only; never write back to event_log (loop-prevention).
					notifier.Send(t.ActionTarget, buildBody(t, payload))
				}
			}
		}
	}()
	// The returned func unsubscribes the dispatcher.
	return func() { b.Unsubscribe(sub) }
}
```
Reuses the existing notifier at
[internal/notify/notifier.go](../internal/notify/notifier.go) — including the
signed-delivery and `webhook_deliveries` audit trail.
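
For readers unfamiliar with signed delivery, the sketch below shows one common way an `X-Hub-Signature-256` value is produced and checked. It assumes the GitHub-style format (`sha256=` followed by the hex HMAC-SHA256 of the raw body); whether Tinyforge's notifier uses exactly this format is not confirmed by this document, and `sign` / `verify` are illustrative names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the header value for body under the shared secret.
// Assumption: "sha256=" + hex(HMAC-SHA256(secret, body)).
func sign(secret string, body []byte) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the signature and compares in constant time.
func verify(secret string, body []byte, header string) bool {
	return hmac.Equal([]byte(sign(secret, body)), []byte(header))
}

func main() {
	body := []byte(`{"severity":"error","source":"logscan"}`)
	sig := sign("s3cret", body)
	fmt.Println(sig)
	fmt.Println("valid:", verify("s3cret", body, sig))
	fmt.Println("tampered:", verify("s3cret", []byte(`{"severity":"info"}`), sig))
}
```
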
### Schema
```sql
CREATE TABLE event_triggers (
    id                   INTEGER PRIMARY KEY AUTOINCREMENT,
    name                 TEXT NOT NULL,
    filter_severity      TEXT,               -- nullable; comma-list like 'warn,error'
    filter_source        TEXT,               -- nullable; comma-list like 'logscan,deploy'
    filter_message_regex TEXT,               -- nullable; matched against message
    action_type          TEXT NOT NULL,      -- 'webhook' | 'notification_channel'
    action_target        TEXT NOT NULL,      -- URL or channel ID
    enabled              INTEGER NOT NULL DEFAULT 1,
    created_at           TEXT NOT NULL
);
```
Filters AND together. Empty filters match all.
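
The AND-composition with "empty matches all" can be sketched like this. The `trigger` and `event` types are cut-down stand-ins for `EventTrigger` and `EventLogPayload`, and the `sync.Map` cache is just one plausible way to memoize compiled regexes, not necessarily what `dispatcher.go` does.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"sync"
)

type trigger struct {
	FilterSeverity     string // CSV, "" = any
	FilterSource       string // CSV, "" = any
	FilterMessageRegex string // "" = any
}

type event struct{ Severity, Source, Message string }

var regexCache sync.Map // pattern string -> *regexp.Regexp

// compileCached compiles each distinct pattern once.
func compileCached(pattern string) (*regexp.Regexp, error) {
	if re, ok := regexCache.Load(pattern); ok {
		return re.(*regexp.Regexp), nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	regexCache.Store(pattern, re)
	return re, nil
}

// csvAllows reports whether value is listed in filter; an empty filter allows all.
func csvAllows(filter, value string) bool {
	if strings.TrimSpace(filter) == "" {
		return true
	}
	for _, item := range strings.Split(filter, ",") {
		if strings.TrimSpace(item) == value {
			return true
		}
	}
	return false
}

// matches ANDs the three filters together.
func (t trigger) matches(e event) (bool, error) {
	if !csvAllows(t.FilterSeverity, e.Severity) || !csvAllows(t.FilterSource, e.Source) {
		return false, nil
	}
	if t.FilterMessageRegex == "" {
		return true, nil
	}
	re, err := compileCached(t.FilterMessageRegex)
	if err != nil {
		return false, err
	}
	return re.MatchString(e.Message), nil
}

func main() {
	trg := trigger{FilterSeverity: "warn,error", FilterSource: "logscan", FilterMessageRegex: "timeout"}
	ok, err := trg.matches(event{"error", "logscan", "upstream timeout"})
	fmt.Println(ok, err)
	ok, err = trigger{}.matches(event{"info", "deploy", "rolled out"})
	fmt.Println(ok, err)
}
```
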
### Loop-prevention
**Critical constraint: the dispatcher must not write to event_log.** All
delivery successes / failures land in `webhook_deliveries` (existing table) so
the audit trail is preserved without risking trigger recursion. Keeps the
boundary crisp:
- `event_log` = system observing itself
- `webhook_deliveries` = system talking to the outside
If a user-visible "trigger fired" entry is desired in the events UI, add a
*read-only join* from `webhook_deliveries` into the events page rather than
writing event_log rows.
---
## What to defer
| Item | Why | Add when |
|---|---|---|
| Multi-line stack trace coalescing | Real rabbit hole (which lines belong together?). | Real user pain. |
| Capture-group templating in messages (`{{.captures.code}}`) | v1 stores captures in metadata, displays raw line. | Once real rules exist and patterns emerge. |
| Backfilling history search | This is Loki/Grafana scope-creep. | Never (push to Loki instead if it comes up). |
| Per-rule alert routing | v1 fans out by `(severity, source)` filter on trigger side. | When users want one rule → one channel. |
| YAML config-as-code | Tinyforge is UI-driven everywhere else. | Probably never. |
| Retry / backoff on trigger delivery failure | Notifier already handles delivery; whether *triggers* retry is a separate question. | If trigger reliability becomes an SLO. |
---
## UI footprint
All boolean inputs use `ToggleSwitch` per project CLAUDE.md. All destructive
actions use `ConfirmDialog` per memory note (no inline Yes/No strips).
### New pages
- **`/log-scan-rules`** — list with severity / workload filter, "+ New rule" button.
- Detail page: name, pattern (regex with live test box that takes a sample log line), severity, streams, cooldown, enabled toggle, scope picker (global / workload).
- **`/event-triggers`** — list, "+ New trigger" button.
- Detail page: name, filters (severity multiselect, source multiselect, optional message regex), action type, action target, enabled toggle.
### Augmentations
- **Workload detail page** (`/apps/[id]`): new "Log Rules" tab/panel listing
effective rules for this workload. Each global shows an "Override for this
workload" button. Each override / workload-only shows edit + delete.
- **Events page** (`/events`): entries with `source=logscan` get a small icon
+ tooltip showing rule name. Click → jumps to rule detail.
- **Settings sidebar**: links to `/log-scan-rules` and `/event-triggers` under
a new "Observability" group.
### i18n keys to add
Roughly 40 to 60 keys across `en.json` + `ru.json`. Namespace: `logscan.*` and
`triggers.*`.
---
## API surface
```
GET /api/log-scan-rules — list (filter: ?workload_id=, ?global=true)
POST /api/log-scan-rules — create
GET /api/log-scan-rules/{id} — detail
PATCH /api/log-scan-rules/{id} — update
DELETE /api/log-scan-rules/{id} — delete
POST /api/log-scan-rules/{id}/test — body: {sample_line}; returns matched: bool, captures
GET /api/workloads/{id}/effective-rules — computed effective ruleset for a workload
GET /api/event-triggers — list
POST /api/event-triggers — create
GET /api/event-triggers/{id} — detail
PATCH /api/event-triggers/{id} — update
DELETE /api/event-triggers/{id} — delete
POST /api/event-triggers/{id}/test — dispatches a synthetic event to verify the action target
```
`POST .../test` endpoints are worth shipping in v1 — they make the rule /
trigger editing UX dramatically nicer and avoid "did I get the regex right?"
deploy-and-pray cycles.
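
The core of the rule `/test` endpoint is small. A sketch of the matching step, assuming named capture groups are returned by name and unnamed groups by index (`testPattern` and `testResult` are hypothetical; the real handler's response shape is only described above as `matched` plus `captures`):

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

type testResult struct {
	Matched  bool
	Captures map[string]string
}

// testPattern compiles pattern and runs it against one sample line.
// A compile error is returned so the UI can show it next to the regex box.
func testPattern(pattern, sampleLine string) (testResult, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return testResult{}, err
	}
	m := re.FindStringSubmatch(sampleLine)
	if m == nil {
		return testResult{Matched: false}, nil
	}
	caps := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if i == 0 {
			continue // index 0 is the whole match
		}
		if name == "" {
			name = strconv.Itoa(i) // unnamed groups are keyed by position
		}
		caps[name] = m[i]
	}
	return testResult{Matched: true, Captures: caps}, nil
}

func main() {
	res, err := testPattern(`exit code (?P<code>\d+)`, "process exit code 137")
	fmt.Println(res.Matched, res.Captures, err)
}
```

Go's `regexp` package uses RE2 syntax, so patterns relying on backreferences or lookaround fail at compile time; the test endpoint is a natural place to surface that early.
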
---
## File pointers (when work starts)
**Backend, new:**
- `internal/logscanner/{manager,tail,engine,rules}.go`
- `internal/api/log_scan_rules.go`
- `internal/api/event_triggers.go`
- `internal/store/log_scan_rules.go`
- `internal/store/event_triggers.go`
- `internal/events/dispatcher.go` (or extend `bus.go` with `RegisterEventTriggerDispatcher`)
**Backend, modified:**
- [internal/docker/container.go:362](../internal/docker/container.go#L362) — expose stream selection on `ContainerLogs`
- [internal/api/router.go](../internal/api/router.go) — register new routes
- [cmd/server/main.go](../cmd/server/main.go) — wire `RegisterEventTriggerDispatcher` next to `RegisterPersistentLogger`, start `logscanner.Manager`
- migrations: `internal/store/migrations/00XX_log_scan_rules.sql`, `00XX_event_triggers.sql`
**Frontend, new:**
- `web/src/routes/log-scan-rules/+page.svelte`, `[id]/+page.svelte`, `new/+page.svelte`
- `web/src/routes/event-triggers/+page.svelte`, `[id]/+page.svelte`, `new/+page.svelte`
- `web/src/lib/components/LogRulePanel.svelte` (workload detail tab)
- `web/src/lib/components/RegexTestBox.svelte` (reusable)
**Frontend, modified:**
- `web/src/routes/apps/[id]/+page.svelte` — add Log Rules tab
- `web/src/routes/events/+page.svelte` — logscan source icon + rule tooltip
- `web/src/routes/+layout.svelte` — Observability nav group
- `web/src/lib/i18n/{en,ru}.json` — new key namespaces
- `web/src/lib/api.ts`, `web/src/lib/types.ts` — typed clients
---
## Open questions to revisit before coding
1. **Container start/stop signal source** — bus events (low latency, two new
event types) vs polling (simpler, ~5s latency). Tentative: polling.
2. **Trigger delivery retry** — does the dispatcher retry on webhook failure,
or is one shot enough since `webhook_deliveries` records failures? Tentative:
one shot v1; revisit if reliability complaints surface.
3. **Where does the "logscan source icon" link go on the events page** — rule
detail page, or the workload's effective-rules tab? Latter is probably more
useful since it shows context.
---
## Memory pointer
Add a memory after this lands describing the event_log = observe-self,
webhook_deliveries = talk-to-outside boundary — it's the kind of invariant
that's easy to violate accidentally when adding new event types later.
+334
@@ -0,0 +1,334 @@
# Workload-First Refactor — Remaining Work
Handoff for resuming the refactor. The plugin architecture (Source × Trigger),
`/api/workloads` surface, `/apps` UI, env/volume/webhook/logs/chain panels,
multi-face proxy routes, blue-green image deploys, schema-driven wizard, and
test coverage on triggers / image helpers / webhook parser / store upserts are
**already landed and live**. What follows is what's still pending, in priority
order.
## Status at a glance
| Item | Priority | Status |
| ---- | -------- | ------ |
| Static source inline port | 1 | **PENDING** — only remaining blocker for hard cutover |
| Hard legacy cutover | 1 | **PENDING** — gated by static port (volume scopes blocker is resolved) |
| Generalized volume scopes | 2 | DONE |
| Kind-aware editors (compose / image / static) | 2 | DONE |
| Vendor-specific webhook parsing | 2 | DONE |
| Chain-panel CSS | 3 | DONE |
| Log Rules panel on `/apps/[id]` | adjacent | DONE — uses `getEffectiveLogScanRules` + per-workload override action |
| i18n for `/apps/*` page strings | 3 | **PARTIAL** — Log Rules panel + Observability surfaces i18n'd; `apps.*` namespace still pending |
| Docs / codemap entries for `internal/workload/plugin/` | 3 | **PENDING** |
| API-handler / dispatcher / compose-source / static-backend tests | 4 | **PENDING** |
| Triggers as first-class reusable entities (post-cutover) | 5 | **PENDING** |
Cross-references to the adjacent Observability work (Event Triggers + Log
Scanner backend + drop-counter stats panel) live in
[docs/LOGSCAN_AND_TRIGGERS_TODO.md](LOGSCAN_AND_TRIGGERS_TODO.md).
## Priority 1 — Architecture unlock
### Static source inline port — ~2150 LOC across 8 files
The current `internal/workload/plugin/source/static/` delegates to
`staticsite.Manager` via a phantom-row adapter
(`cmd/server/static_backend.go`) that keeps a synthetic row in the legacy
`static_sites` table per workload. This works but blocks the hard cutover —
you can't drop `static_sites` until the adapter is gone.
To port inline, the deploy pipeline body has to move into
`internal/workload/plugin/source/static/`:
| Source file | Lines | What to keep / port |
| --- | --- | --- |
| `internal/staticsite/manager.go` | 834 | Deploy / Stop / status pipeline. State should move to `containers` rows + `workload_env` instead of `static_sites`. |
| `internal/staticsite/gitea_content.go` | 360 | Keep as helper — Gitea content download/listing. |
| `internal/staticsite/github_provider.go` | 276 | Keep as helper. |
| `internal/staticsite/gitlab_provider.go` | 254 | Keep as helper. |
| `internal/staticsite/healthcheck.go` | 111 | Convert to plugin Reconcile body. |
| `internal/staticsite/markdown.go` | 83 | Keep as helper. |
| `internal/staticsite/provider.go` | 171 | Keep — provider abstraction. |
| `internal/staticsite/deno/` | (sub-pkg) | Keep — Dockerfile + router.ts codegen. |
Estimated as its own dedicated turn (or two). Strategy: keep the provider
abstraction + helpers exported; rewrite only `Manager.Deploy` body into a new
`source/static/deploy.go` that operates against `plugin.Workload` directly and
writes container rows + workload_env rather than the `static_sites` table.
### Hard legacy cutover
Sole remaining blocker is the static source inline port above. The
generalized-volume-scopes blocker is resolved (legacy `ResolvePath`
stays in place for legacy callers and dies with the cutover). When the
static port lands:
- Delete `/api/projects`, `/api/stacks`, `/api/sites`, `/api/stages` handlers.
- Drop tables: `projects`, `stages`, `stacks`, `stack_revisions`,
`stack_deploys`, `static_sites`, `static_site_secrets`, `deploys`,
`poll_states`.
- Delete `internal/stack/`, `internal/staticsite/` packages.
- Delete frontend `/projects`, `/sites`, `/stacks` routes.
- Delete legacy `volume.ResolvePath` + `internal/api/volume_browser.go`
callers (the only remaining users).
## Priority 2 — Behavior gaps
### ~~Generalized volume scopes~~ — DONE
Landed: `internal/volume.ResolveWorkloadPath` (workload-keyed; sits next to the
legacy `ResolvePath` so legacy code paths keep working) plus the wired-through
`computeMounts` in `internal/workload/plugin/source/image/image.go`. All
`VolumeScope` values are now honored at deploy time:
- `absolute` — host bind, validated against `settings.AllowedVolumePaths`.
- `ephemeral` — tmpfs.
- `instance` — per-tag dir under `<base>/<workload>-<idShort>/instance-<tag>/<source>`.
- `stage`, `project` — both collapse to `<base>/<workload>-<idShort>/<source>`.
- `project_named` — Docker named volume prefixed `tf-<idShort>-<name>`.
- `named` — Docker named volume by raw name.
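
The scope-to-location mapping above can be summarized in a small switch. This is only a sketch of the layout: `resolveMount` and the `mount` type are hypothetical, it uses `path.Join` for deterministic forward slashes where the real resolver is described as portable across Linux and Windows, and it omits the `AllowedVolumePaths` check and any traversal validation.

```go
package main

import (
	"fmt"
	"path"
)

// mount describes where a volume ends up: a host bind, tmpfs, or a named volume.
type mount struct {
	Kind   string // "bind", "tmpfs", or "volume"
	Source string // host path or volume name; empty for tmpfs
}

// resolveMount maps a volume scope to its location following the layout
// listed above. Validation is deliberately left out of this sketch.
func resolveMount(scope, base, workload, idShort, tag, source string) (mount, error) {
	root := path.Join(base, workload+"-"+idShort)
	switch scope {
	case "absolute":
		return mount{"bind", source}, nil
	case "ephemeral":
		return mount{"tmpfs", ""}, nil
	case "instance":
		return mount{"bind", path.Join(root, "instance-"+tag, source)}, nil
	case "stage", "project":
		return mount{"bind", path.Join(root, source)}, nil
	case "project_named":
		return mount{"volume", "tf-" + idShort + "-" + source}, nil
	case "named":
		return mount{"volume", source}, nil
	}
	return mount{}, fmt.Errorf("unknown volume scope %q", scope)
}

func main() {
	for _, scope := range []string{"instance", "stage", "project_named", "named", "ephemeral"} {
		m, err := resolveMount(scope, "/data", "web", "ab12", "v1", "uploads")
		fmt.Println(scope, m, err)
	}
}
```
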
Test coverage: `internal/volume/resolver_test.go` (table-driven, portable
Linux/Windows). The legacy `ResolvePath` stays in place for legacy deployer +
volume-browser callers and dies with the hard cutover.
### ~~Kind-aware editors on `/apps/new` and `/apps/[id]` edit~~ — DONE
All three Source plugins now have hand-rolled forms on both pages, with
an "Advanced JSON" toggle preserved as the power-user escape hatch.
Submit logic marshals form fields back into the same JSON shape the
backend already expects — no API or store changes required.
**Principle:** the plugin contract makes new Source / Trigger kinds cheap
on the backend, but the UI is not cheap by default — every kind needs a
paired hand-rolled form to be daily-driver usable. The shared JSON
editor is the fallback for power users and brand-new plugins, not the
end state. New Source / Trigger merge requests should treat "ship the
kind-aware form" as part of done, not a follow-up.
**Landed:**
- `compose`: YAML textarea + project_name input on both `/apps/new`
and `/apps/[id]`.
- `image`: form fields for image / port / healthcheck / default_tag /
registry_name / cpu_limit / memory_limit / max_instances on both
pages. Registry name is a select populated from `/api/registries`
(with text-input fallback when the list is empty). env + volumes
stay in their detail-page panels and round-trip through the form
via `imageFormBody` so manual edits aren't clobbered.
- `static`: provider select (gitea / github / gitlab), base URL,
repo_owner / repo_name (both required), branch (default "main"),
folder_path, access_token (password input, for private repos),
mode radio (static / deno), render_markdown checkbox. The
storage_enabled / storage_limit_mb fields aren't surfaced as
form controls yet, but they round-trip through `staticFormBody`
so values set via the raw JSON editor survive form edits.
**Still pending forms:** none — all three Source plugins now have
hand-rolled forms on both `/apps/new` and `/apps/[id]`.
The raw JSON editor stays available behind the "Advanced JSON" toggle
(shipped with compose) so the plugin's full sample is still reachable
for power users and for any new plugin kind without a hand-rolled form.
Effort: per-kind form roughly half a turn each; can land incrementally.
Touches `web/src/routes/apps/new/+page.svelte` and the edit block in
`web/src/routes/apps/[id]/+page.svelte`. The Svelte side keeps
serializing into the same `source_config` JSON shape the backend
already expects — no API or store change required.
### ~~Vendor-specific webhook parsing for `/api/webhook/workloads/{secret}`~~ — DONE
Landed: `internal/webhook/vendor_parsers.go` plus rewrites in
`internal/webhook/handler.go` `buildInboundEvent`. The dispatch order is now:
1. Empty body → manual event.
2. Vendor-specific parsers, short-circuit on a recognized `X-*-Event`
header — Gitea package, GitHub `package` / `registry_package`, GitHub
push, Gitea push, GitLab `Push Hook` / `Tag Push Hook`.
3. Generic simple-body fallback: top-level `image` or top-level `ref`,
which is what the legacy CI integrations already send.
Vendor parsers can populate fields the generic parser cannot: image
digest, `GitEvent.Vendor`, registry host. When a vendor parser claims a
request (header matches) it is authoritative — a malformed Gitea
package payload surfaces as an error rather than silently falling
through to the generic parser. Test coverage:
`internal/webhook/vendor_parsers_test.go` covers each vendor branch +
the routed-via-`buildInboundEvent` integration cases.
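
The routing rule (first matching vendor header wins and is authoritative, generic parsing only as a fallback) might be sketched as below. `classify`, `vendorRoutes`, and the parser labels are hypothetical, the header values are assumed from the list above, and JSON validity stands in for real payload parsing.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// vendorRoutes lists header/value pairs in dispatch order; first match wins.
var vendorRoutes = []struct{ header, value, parser string }{
	{"X-Gitea-Event", "package", "gitea-package"},
	{"X-GitHub-Event", "package", "github-package"},
	{"X-GitHub-Event", "registry_package", "github-package"},
	{"X-GitHub-Event", "push", "github-push"},
	{"X-Gitea-Event", "push", "gitea-push"},
	{"X-Gitlab-Event", "Push Hook", "gitlab-push"},
	{"X-Gitlab-Event", "Tag Push Hook", "gitlab-push"},
}

// classify decides which parser owns a request.
func classify(h http.Header, body []byte) (string, error) {
	if len(body) == 0 {
		return "manual", nil
	}
	for _, r := range vendorRoutes {
		if h.Get(r.header) != r.value {
			continue
		}
		// A claimed request is authoritative: a malformed payload is an
		// error and never falls through to the generic parser.
		if !json.Valid(body) {
			return "", fmt.Errorf("%s: malformed payload", r.parser)
		}
		return r.parser, nil
	}
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(body, &fields); err != nil {
		return "", fmt.Errorf("generic: %w", err)
	}
	if _, ok := fields["image"]; ok {
		return "generic", nil
	}
	if _, ok := fields["ref"]; ok {
		return "generic", nil
	}
	return "", fmt.Errorf("unrecognized webhook payload")
}

// header builds a single-entry http.Header for the demo.
func header(key, value string) http.Header {
	h := http.Header{}
	h.Set(key, value)
	return h
}

func main() {
	fmt.Println(classify(header("X-Gitlab-Event", "Tag Push Hook"), []byte(`{"ref":"refs/tags/v1"}`)))
	fmt.Println(classify(header("X-Gitea-Event", "package"), []byte(`{not json`)))
	fmt.Println(classify(http.Header{}, []byte(`{"image":"app:1.2"}`)))
}
```
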
Open follow-ups deferred to future turns:
- GitLab Container Registry events use a custom envelope outside the
webhook event surface — handle if a user reports needing it.
- Docker Hub webhook (push event) uses `{"push_data": {"tag": ...}, "repository": {...}}` — add when there's a user request.
## Priority 3 — Polish
### ~~Chain-panel CSS~~ — DONE
Landed: rules for `.chain-row`, `.chain-card` (with hover/transform on
anchors), `.chain-self` (brand-tinted highlight), `.chain-name`,
`.chain-label` (70px fixed-width mono column), `.chain-children-list`
(flex-wrap), plus a sub-600px stack to keep the panel usable on narrow
screens. Appended at the end of the `<style>` block in
`web/src/routes/apps/[id]/+page.svelte`.
### Docs / codemap entries
Nothing under `docs/CODEMAPS/` for `internal/workload/plugin/`. Should cover:
- The Source × Trigger contract + registry pattern (`init()` + blank-import in
`cmd/server/main.go`).
- How a new Source kind is added (write `init()` registration, blank-import,
add to wizard via `SchemaSample`).
- The dispatcher seam: `deployer.DispatchPlugin` / `DispatchTeardown` /
`DispatchReconcile` and how the reconciler / webhook ingress / API
handlers all flow through it.
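
For the codemap, the registry pattern could be illustrated with something like the following. It is a single-file sketch: in the real layout each Source would call the register function from `init()` in its own package and `cmd/server/main.go` would blank-import that package. The `Source` interface here is reduced to a hypothetical `Kind()` method.

```go
package main

import (
	"fmt"
	"sort"
)

// Source is a reduced stand-in for the plugin contract.
type Source interface {
	Kind() string
}

var sources = map[string]Source{}

// RegisterSource adds a Source under its kind. Registering the same kind
// twice is a programming error, so it panics at startup.
func RegisterSource(s Source) {
	if _, dup := sources[s.Kind()]; dup {
		panic("source kind registered twice: " + s.Kind())
	}
	sources[s.Kind()] = s
}

// LookupSource returns the Source registered for kind, if any.
func LookupSource(kind string) (Source, bool) {
	s, ok := sources[kind]
	return s, ok
}

// Kinds lists registered kinds in sorted order, e.g. for a wizard dropdown.
func Kinds() []string {
	out := make([]string, 0, len(sources))
	for k := range sources {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

type imageSource struct{}

func (imageSource) Kind() string { return "image" }

// In the real tree this init would live in the plugin's own package.
func init() { RegisterSource(imageSource{}) }

func main() {
	_, ok := LookupSource("image")
	fmt.Println("kinds:", Kinds(), "image registered:", ok)
}
```
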
`README.md` should mention `/apps` as the new user surface and that
`/projects` / `/sites` / `/stacks` carry `Deprecation: true` headers.
### i18n: page-level strings — PARTIAL
Already i18n'd:
- `nav.apps`, `nav.eventTriggers`, `nav.logScanRules` — top nav labels.
- Log Rules panel on `/apps/[id]` reuses `logscan.panel.*` keys
(shipped with the Observability work).
- All `/event-triggers/*` and `/log-scan-rules/*` page strings — keys
live under `triggers.*` and `logscan.*` namespaces in
`web/src/lib/i18n/{en,ru}.json`.
Still hardcoded English:
- `/apps/+page.svelte` — list page (hero, lede, stats, empty state,
table headers, status pills).
- `/apps/new/+page.svelte` — wizard labels, form copy, kind-aware
form rows (compose / image / static all hardcoded English today).
- `/apps/[id]/+page.svelte` — detail page sections (chain, env,
volumes, webhook, manual deploy, danger zone) — the Log Rules
panel embedded inside it is the only i18n'd section.
Roughly 80 to 100 keys across the three `/apps/*` pages once extracted.
Namespace: `apps.*` (with sub-namespaces `apps.list.*`, `apps.new.*`,
`apps.detail.*`, `apps.form.*`).
## Priority 4 — Tests we still don't have
Solid pure-function coverage landed in the prior turn. Still missing:
- **API-handler integration tests** for `/api/workloads/*` (CRUD, deploy,
env, volumes, webhook, chain, promote-from). Pattern: in-memory store +
fake deployer + fake docker / proxy / dns providers, exercise via
`httptest`.
- **Deployer dispatcher**: `DispatchPlugin` / `DispatchTeardown` /
`DispatchReconcile` with a fake Source registered.
- **Compose source**: `composeProjectName` sanitizer, `writeYAMLIfChanged`
short-circuit. (Both pure; just need fixtures.)
- **Static source Backend adapter** in `cmd/server/static_backend.go`.
## Priority 5 — Post-cutover roadmap
### Triggers as first-class reusable entities
Today a trigger's config lives embedded in the workload row
(`workload.trigger_kind` plus `workload.trigger_config` JSON via the plugin
contract). One workload owns exactly one trigger; one trigger serves exactly
one workload. This couples two concepts that users increasingly want
orthogonal:
- One **inbound webhook** fanning out to several workloads (a single CI push
rebuilds dev + staging together).
- One **registry watcher** driving multiple workloads off the same image
(different tag filters per binding, shared poll state).
- One **schedule** kicking off a batch of jobs.
- One **git push** filter shared by sibling stack services.
**Direction:** promote triggers to their own table with a join.
- `triggers`: `id`, `kind` (registry / git / webhook / schedule / manual /
log_scan), `config` JSON, `secret`, `created_at`, audit fields.
- `workload_trigger_bindings`: `workload_id`, `trigger_id`, `binding_config`
JSON (per-binding overrides: tag filter, path filter, branch filter), plus
ordering / enabled flag.
The dispatcher seam stays unchanged — `deployer.DispatchPlugin` still receives
a `(Workload, TriggerEvent)` pair; the only change is that the event's source
is resolved through the binding row instead of the workload row.
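
A sketch of the binding lookup that would sit in front of the unchanged dispatcher seam. Everything here is hypothetical (the tables do not exist yet): `binding` mirrors the proposed `workload_trigger_bindings` columns, and `boundWorkloads` returns the workloads one fired trigger should fan out to.

```go
package main

import (
	"fmt"
	"sort"
)

// binding mirrors the proposed workload_trigger_bindings row.
type binding struct {
	WorkloadID string
	TriggerID  int64
	Order      int
	Enabled    bool
}

// boundWorkloads returns the workload IDs bound to triggerID, skipping
// disabled bindings and honoring the ordering column.
func boundWorkloads(bindings []binding, triggerID int64) []string {
	var matched []binding
	for _, bd := range bindings {
		if bd.TriggerID == triggerID && bd.Enabled {
			matched = append(matched, bd)
		}
	}
	sort.SliceStable(matched, func(i, j int) bool { return matched[i].Order < matched[j].Order })
	ids := make([]string, 0, len(matched))
	for _, bd := range matched {
		ids = append(ids, bd.WorkloadID)
	}
	return ids
}

func sampleBindings() []binding {
	return []binding{
		{WorkloadID: "staging", TriggerID: 1, Order: 2, Enabled: true},
		{WorkloadID: "dev", TriggerID: 1, Order: 1, Enabled: true},
		{WorkloadID: "prod", TriggerID: 1, Order: 3, Enabled: false},
		{WorkloadID: "docs", TriggerID: 2, Order: 1, Enabled: true},
	}
}

func main() {
	// One CI push (trigger 1) fans out to dev, then staging.
	fmt.Println(boundWorkloads(sampleBindings(), 1))
}
```

Each returned workload would then be paired with the trigger event and handed to the dispatcher one at a time, so the 1:1 case is just a fan-out of size one.
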
**UX principle: first-class on the backend, inline by default in the UI.**
The workload create/edit form still has an "Add trigger" control that creates
a fresh trigger record in one step, so the 1:1 case (git push → this workload)
feels unchanged from today. Reuse is **opt-in** via a "Pick existing trigger"
picker on the same control. Triggers also get their own list/detail pages under
`/triggers` so the fan-out cases are discoverable and centrally manageable
(rotate secret once, audit once).
**Per-kind modal applies, same rule as Source plugins** — the create/edit
form for a trigger switches body by `kind` (git: repo / branch / path;
registry: image / tag regex; webhook: secret + payload preview; schedule:
cron). Backend cheap, UI requires a paired hand-rolled form per kind. Treat
"ship the kind-aware form" as part of done for any new trigger kind.
**Migration:** clean break (no migration) per the workload-first memory —
at cutover, each workload's embedded trigger config becomes a single
auto-created trigger record with a single binding row. No user-visible change
on day one; reuse becomes possible thereafter.
**Sequencing:** lands **after** the Priority 1 hard cutover. The embedded
trigger config works fine for the 1:1 case that dominates today; the
static-source inline port is the higher-value blocker. Treat this as the
next major arc once cutover ships.
**Touch points to expect:**
- `internal/workload/plugin/trigger/*` — kind handlers stay; only their input
shape changes (read from binding + trigger row, not workload row).
- `internal/store/` — new `triggers` + `workload_trigger_bindings` tables and
CRUD; remove `trigger_kind` / `trigger_config` from the workload row.
- `internal/api/workloads.go` — adapt the workload create/edit handlers to
accept either "inline new trigger" or "bind existing trigger" payloads.
- New `/api/triggers` surface + `/triggers` frontend pages.
- `internal/webhook/handler.go` — inbound webhook now resolves to a trigger,
fans out to all bound workloads.
- `internal/reconciler/reconciler.go` — registry watchers iterate triggers,
not workloads; each trigger may fire N bindings.
## Open architectural questions
### Stages chain vs explicit Stage entity
`parent_workload_id` is now the canonical mechanism for stage chains
(dev → staging → prod). Decision deferred: do we need a separate `Stage`
entity at all, or is the chain sufficient? Currently feels like the chain
covers the use case — `promote-from` works, the UI shows the relationship.
The legacy `stages` table can probably be dropped entirely once the cutover
proceeds.
### `Container.extra_json` evolution
Currently only the image source uses it (per-face proxy route IDs). If
other sources gain similar needs (compose service health metadata, static
build SHAs), the schema there should stay versionless and additive — every
reader must tolerate unknown keys. Document this in the source plugin
guide alongside the codemap entry.
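
One way to keep `extra_json` additive is to read through a typed struct (unknown keys are ignored by `encoding/json` by default) and to write through a raw map so keys owned by other writers survive. The sketch below assumes a hypothetical `route_ids` key; the actual key names used by the image source are not given in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// imageExtra reads only the keys this reader knows about.
type imageExtra struct {
	RouteIDs []string `json:"route_ids"` // hypothetical key name
}

// readRouteIDs tolerates unknown keys and an empty document.
func readRouteIDs(raw string) ([]string, error) {
	if raw == "" {
		return nil, nil
	}
	var extra imageExtra
	if err := json.Unmarshal([]byte(raw), &extra); err != nil {
		return nil, err
	}
	return extra.RouteIDs, nil
}

// setExtraKey updates one key and preserves every other key verbatim.
func setExtraKey(raw, key string, value any) (string, error) {
	fields := map[string]json.RawMessage{}
	if raw != "" {
		if err := json.Unmarshal([]byte(raw), &fields); err != nil {
			return "", err
		}
	}
	if fields == nil { // raw was the JSON literal null
		fields = map[string]json.RawMessage{}
	}
	encoded, err := json.Marshal(value)
	if err != nil {
		return "", err
	}
	fields[key] = encoded
	out, err := json.Marshal(fields)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	raw := `{"future":{"x":1},"route_ids":["a"]}`
	ids, _ := readRouteIDs(raw)
	updated, _ := setExtraKey(raw, "route_ids", []string{"b"})
	fmt.Println(ids, updated)
}
```
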
## File pointers for the next session
- Plugin contracts: `internal/workload/plugin/{plugin,source,trigger,types,registry}.go`
- Source implementations: `internal/workload/plugin/source/{image,compose,static}/`
- Trigger implementations: `internal/workload/plugin/trigger/{registry,git,manual}/`
- Dispatcher: `internal/deployer/dispatch.go`
- Webhook ingress (plugin path): `internal/webhook/handler.go` `handlePluginWorkloadWebhook`
- Reconciler hook: `internal/reconciler/reconciler.go` `reconcilePluginWorkloads`
- Static backend adapter (to be deleted post-port): `cmd/server/static_backend.go`
- Frontend pages: `web/src/routes/apps/+page.svelte`, `web/src/routes/apps/new/+page.svelte`, `web/src/routes/apps/[id]/+page.svelte`
- Tests: `internal/workload/plugin/trigger/*/*_test.go`, `internal/workload/plugin/source/image/image_helpers_test.go`, `internal/webhook/inbound_event_test.go`, `internal/store/workload_env_test.go`
## Memory pointer
Memory at
`C:/Users/Alexei/.claude/projects/c--Users-Alexei-Documents-docker-watcher/memory/`
already covers the Workload-first decision and the no-migration constraint.
Refresh as the cutover lands.