Two design + handoff docs: - docs/WORKLOAD_REFACTOR_TODO.md — status-at-a-glance table showing what's done (volume scopes, kind-aware editors, vendor webhook parsing, chain-panel CSS, Log Rules panel) and what's still pending (static source inline port + the hard legacy cutover gated on it; codemap entries; /apps page-level i18n; Priority 4 integration tests). - docs/LOGSCAN_AND_TRIGGERS_TODO.md — companion design + status doc for the two Observability features. Records the loop-prevention invariant (event_log = system observing itself, webhook_deliveries = system talking to outside) so the next contributor doesn't accidentally break it by adding a new EventLog subscriber that re-publishes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,385 @@
|
||||
# Log Scanner + Event Triggers — Design Handoff
|
||||
|
||||
Two related features. They can ship independently, but were designed together
|
||||
because they share the event_log seam.
|
||||
|
||||
- **A. Log scanner** — tail container logs, match against rules, emit event_log
|
||||
entries. Producer of events.
|
||||
- **B. Event triggers** — turn event_log entries into webhook / notification
|
||||
dispatches. Consumer of events. Generalizes the existing
|
||||
`RegisterPersistentLogger` pattern.
|
||||
|
||||
Either half is useful alone:
|
||||
- A without B = errors get surfaced in the events UI, no external delivery.
|
||||
- B without A = manual + reconciler + deploy events can drive notifications.
|
||||
|
||||
Recommended ship order: B first (smaller, self-contained generalization), then
|
||||
A (more moving parts, depends on container-lifecycle hooks).
|
||||
|
||||
---
|
||||
|
||||
## A. Log scanner — BACKEND LANDED
|
||||
|
||||
Status:
|
||||
|
||||
- **Schema + store CRUD** — `internal/store/log_scan_rules.go` +
|
||||
`log_scan_rules` table added to the `observabilityTables` block.
|
||||
Includes the `EffectiveLogScanRules(workloadID)` helper that
|
||||
resolves global rules minus per-workload overrides plus workload-
|
||||
only additions in one Go-side pass.
|
||||
- **Stream-selectable docker reads** — `internal/docker/container.go`
|
||||
`ContainerLogsOpts` accepts a `ContainerLogOptions{ShowStdout,
|
||||
ShowStderr, Follow, Tail}` so the scanner can subscribe to one
|
||||
stream when a rule scopes itself to stdout or stderr. The legacy
|
||||
`ContainerLogs` is preserved as a thin wrapper for back-compat.
|
||||
- **Engine** — `internal/logscanner/engine.go`: per-rule cooldown
|
||||
(keyed on container+rule), per-container token bucket (default 10
|
||||
events / 60s, override-able), regex match per line, hits returned
|
||||
for the manager to persist. Pure logic, fully unit-tested.
|
||||
- **Tail goroutine** — `internal/logscanner/tail.go`: per-container
|
||||
loop reading docker's multiplexed log frames (with TTY fallback),
|
||||
strips the prepended RFC3339 timestamp, runs every line through the
|
||||
engine + snapshot. Exits on container stop or context cancel.
|
||||
- **Manager** — `internal/logscanner/manager.go`: 5s polling diff
|
||||
against `ListContainers(state=running)`, atomic.Pointer[Snapshot]
|
||||
hot-reload, structural HitEmitter that writes event_log rows AND
|
||||
publishes `EventLog` on the bus (so event-trigger dispatchers can
|
||||
pick them up immediately).
|
||||
- **API** — `internal/api/log_scan_rules.go`: full CRUD,
|
||||
`/test` endpoint accepting `{"sample_line": "..."}` and returning
|
||||
matched/captures, plus
|
||||
`GET /api/workloads/{id}/effective-rules` for the workload detail
|
||||
page's future Log Rules tab. Admin-gated mutations.
|
||||
- **Wired in main.go** before the API server is constructed so the
|
||||
reload callback is plugged via `apiServer.SetLogScanReloader`.
|
||||
- **Loop-prevention** — Same boundary as feature B: scanner publishes
|
||||
EventLog events, dispatcher consumes them, neither writes to
|
||||
event_log on the consume side.
|
||||
- **Tests** — `internal/logscanner/{engine,rules}_test.go` cover
|
||||
cooldown isolation, token bucket refill, stream filtering,
|
||||
override-replaces-global, disabled-override-suppresses-global,
|
||||
compile-error reporting. `internal/store/log_scan_rules_test.go`
|
||||
covers validation + cascade delete.
|
||||
|
||||
**Frontend still pending** — `/log-scan-rules` pages, regex test box
|
||||
component, Log Rules tab on `/apps/[id]`, i18n keys. Not touched this
|
||||
turn.
|
||||
|
||||
### Where it plugs in
|
||||
|
||||
[internal/docker/container.go:362](../internal/docker/container.go#L362) already
|
||||
exposes `ContainerLogs(ctx, id, follow=true, tail)`. The existing SSE handler at
|
||||
[internal/api/workloads.go:43](../internal/api/workloads.go#L43)
|
||||
(`streamWorkloadContainerLogs`) is per-viewer and dies on browser disconnect —
|
||||
**do not hook the scanner there**. The scanner is a separate long-lived
|
||||
subsystem owned by the server process.
|
||||
|
||||
Minor required change to `ContainerLogs`: expose `ShowStdout` / `ShowStderr` as
|
||||
caller-controlled. Currently hardcoded to `true`/`true`. Single existing caller
|
||||
passes "both" → no friction. Add an options struct or two booleans.
|
||||
|
||||
### New package: `internal/logscanner/`
|
||||
|
||||
```
|
||||
internal/logscanner/
|
||||
manager.go — Manager: map[containerID]*tail, lifecycle hooks
|
||||
tail.go — per-container goroutine; reads logs, fans to engine
|
||||
engine.go — rule evaluation + cooldown + rate limit
|
||||
rules.go — Rule struct, regex compile cache, effective-set resolver
|
||||
```
|
||||
|
||||
**Manager lifecycle.** Subscribes to container start/stop signals. Options for
|
||||
the signal source:
|
||||
1. Add a `ContainerStarted` / `ContainerStopped` event type to the bus and
|
||||
publish from the reconciler + deployer. Cleanest, but adds two event types.
|
||||
2. Manager polls `docker.ListContainers` every N seconds and diffs. Lazier,
|
||||
robust to missed signals, slightly higher idle CPU. Probably fine.
|
||||
|
||||
Pick (1) if you want zero-latency start, (2) if you want fewer moving parts.
|
||||
Defaulting to **(2) with 5s poll** — Docker container starts already take
|
||||
seconds; sub-second matching is not a requirement.
|
||||
|
||||
**Tail goroutine.** On container start: open `ContainerLogs(follow=true,
|
||||
tail="0")` with stdout/stderr filters per rules in scope. Read line-by-line via
|
||||
`bufio.Scanner`. For each line: run through engine. On container stop or ctx
|
||||
cancel: drain and exit.
|
||||
|
||||
**Engine.** Holds compiled regexes per rule. For each line:
|
||||
- Walk effective ruleset for this workload (see schema below).
|
||||
- For each matching rule: check cooldown (`map[ruleID]time.Time`, mutex
|
||||
guarded). If cooled down, insert event_log row + publish + update timestamp.
|
||||
- Per-container token bucket (default: 10 events/min/container) to prevent
|
||||
catastrophic event_log floods if a regex is too greedy.
|
||||
|
||||
### Schema
|
||||
|
||||
Single table, global + override pattern. No separate "overrides" table.
|
||||
|
||||
```sql
|
||||
CREATE TABLE log_scan_rules (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
workload_id TEXT, -- NULL = global rule
|
||||
overrides_id INTEGER, -- if set, this row overrides a global rule for one workload
|
||||
name TEXT NOT NULL,
|
||||
pattern TEXT NOT NULL, -- regex, compiled at load
|
||||
severity TEXT NOT NULL, -- info|warn|error
|
||||
streams TEXT NOT NULL DEFAULT 'all', -- all|stdout|stderr
|
||||
cooldown_seconds INTEGER NOT NULL DEFAULT 60,
|
||||
enabled INTEGER NOT NULL DEFAULT 1,
|
||||
created_at TEXT NOT NULL,
|
||||
FOREIGN KEY (workload_id) REFERENCES workloads(id) ON DELETE CASCADE,
|
||||
FOREIGN KEY (overrides_id) REFERENCES log_scan_rules(id) ON DELETE CASCADE
|
||||
);
|
||||
CREATE INDEX idx_log_scan_rules_workload ON log_scan_rules(workload_id);
|
||||
CREATE INDEX idx_log_scan_rules_overrides ON log_scan_rules(overrides_id);
|
||||
```
|
||||
|
||||
**Effective ruleset for workload X:**
|
||||
1. All rows where `workload_id IS NULL AND overrides_id IS NULL` (pure globals),
|
||||
*minus* any global that has a row with `workload_id = X AND overrides_id = global.id`.
|
||||
2. Plus all rows where `workload_id = X AND overrides_id IS NULL` (workload-only additions).
|
||||
3. Plus all override rows where `workload_id = X AND overrides_id IS NOT NULL`
|
||||
(substitute for the global; their fields win, including `enabled=false` to
|
||||
disable the global for this workload).
|
||||
|
||||
A pure SQL implementation is doable with a `LEFT JOIN ... WHERE override.id IS
|
||||
NULL` for step 1 plus a `UNION ALL` for steps 2 and 3. Or compute in Go after
|
||||
two simpler queries — fine since rule counts will be small.
|
||||
|
||||
### Output
|
||||
|
||||
Scanner calls `store.InsertEvent` with:
|
||||
- `Source = "logscan"`
|
||||
- `Severity` from the matched rule
|
||||
- `Message` = raw matched line (truncated to ~500 chars)
|
||||
- `Metadata` JSON = `{"workload_id": ..., "container_id": ..., "rule_id": ..., "rule_name": ..., "captures": {...}}`
|
||||
|
||||
Then `bus.Publish(EventLog, payload)`. This reuses exactly the path
|
||||
[internal/events/bus.go:158](../internal/events/bus.go#L158)
|
||||
(`RegisterPersistentLogger`) already established. SSE clients see it live, and
|
||||
the dispatcher from feature B picks it up.
|
||||
|
||||
### Hot-reload
|
||||
|
||||
When a rule is created/updated/deleted via the API, the manager must rebuild
|
||||
the effective ruleset for affected containers. Cheapest path: a single
|
||||
`*atomic.Pointer[ruleSnapshot]` shared across tails, replaced wholesale on any
|
||||
rule change. Each tail dereferences the snapshot per line — no locking on the
|
||||
hot path.
|
||||
|
||||
---
|
||||
|
||||
## B. Event triggers — BACKEND LANDED
|
||||
|
||||
Status:
|
||||
|
||||
- **Schema + store CRUD** — `internal/store/event_triggers.go` + table
|
||||
creation in `internal/store/store.go` `observabilityTables`. Model:
|
||||
`EventTrigger` in `internal/store/models.go`.
|
||||
- **Dispatcher** — `internal/events/dispatcher.go`
|
||||
`RegisterEventTriggerDispatcher(bus, triggerSource, notifier)`.
|
||||
Filter eval is AND-composed across severity (CSV), source (CSV), and
|
||||
optional message regex. Compiled regexes are memoized.
|
||||
- **Webhook delivery** — extended `notify.Notifier` with
|
||||
`SendPayload(url, secret, eventType, payload)` which reuses the
|
||||
existing HMAC + headers infra (`X-Hub-Signature-256`, etc.). New
|
||||
`TierEventTrigger` tier is recorded for telemetry / audit.
|
||||
- **Loop-prevention** — dispatcher does **not** call `InsertEvent`.
|
||||
Delivery outcomes go through the notifier's existing logging only.
|
||||
- **API** — `internal/api/event_triggers.go` with admin-gated mutations:
|
||||
|
||||
```http
|
||||
GET /api/event-triggers
|
||||
POST /api/event-triggers
|
||||
GET /api/event-triggers/{id}
|
||||
PATCH /api/event-triggers/{id}
|
||||
DELETE /api/event-triggers/{id}
|
||||
POST /api/event-triggers/{id}/test — synthetic event_log → notifier.SendSyncForTest
|
||||
```
|
||||
|
||||
- **Wired in main.go** next to `RegisterPersistentLogger`.
|
||||
- **Tests** — `internal/events/dispatcher_test.go`: 10 cases covering
|
||||
filter eval, regex caching, dispatcher fan-out, unsupported
|
||||
action_type, trigger-source errors. CSV filter helper has dedicated
|
||||
table-driven coverage.
|
||||
|
||||
**Frontend still pending** — `/event-triggers` list + detail + new
|
||||
pages, the Send-test UX, i18n keys. Not touched this turn.
|
||||
|
||||
### Where it plugs in
|
||||
|
||||
Mirrors the `RegisterPersistentLogger` shape at
|
||||
[internal/events/bus.go:158](../internal/events/bus.go#L158):
|
||||
|
||||
```go
|
||||
func RegisterEventTriggerDispatcher(b *Bus, triggers TriggerSource, notifier Notifier) func() {
|
||||
sub := b.Subscribe(func(evt Event) bool { return evt.Type == EventLog })
|
||||
go func() {
|
||||
for evt := range sub {
|
||||
payload, ok := evt.Payload.(EventLogPayload)
|
||||
if !ok { continue }
|
||||
for _, t := range triggers.Enabled() {
|
||||
if t.matches(payload) {
|
||||
notifier.Send(t.ActionTarget, buildBody(t, payload))
|
||||
}
|
||||
}
|
||||
}
|
||||
}()
|
||||
return func() { b.Unsubscribe(sub) }
|
||||
}
|
||||
```
|
||||
|
||||
Reuses the existing notifier at
|
||||
[internal/notify/notifier.go](../internal/notify/notifier.go) — including the
|
||||
signed-delivery and `webhook_deliveries` audit trail.
|
||||
|
||||
### Schema
|
||||
|
||||
```sql
|
||||
CREATE TABLE event_triggers (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
name TEXT NOT NULL,
|
||||
filter_severity TEXT, -- nullable; comma-list like 'warn,error'
|
||||
filter_source TEXT, -- nullable; comma-list like 'logscan,deploy'
|
||||
filter_message_regex TEXT, -- nullable; matched against message
|
||||
action_type TEXT NOT NULL, -- 'webhook' | 'notification_channel'
|
||||
action_target TEXT NOT NULL, -- URL or channel ID
|
||||
enabled INTEGER NOT NULL DEFAULT 1,
|
||||
created_at TEXT NOT NULL
|
||||
);
|
||||
```
|
||||
|
||||
Filters AND together. Empty filters match all.
|
||||
|
||||
### Loop-prevention
|
||||
|
||||
**Critical constraint: the dispatcher must not write to event_log.** All
|
||||
delivery successes / failures land in `webhook_deliveries` (existing table) so
|
||||
the audit trail is preserved without risking trigger recursion. Keeps the
|
||||
boundary crisp:
|
||||
|
||||
- `event_log` = system observing itself
|
||||
- `webhook_deliveries` = system talking to the outside
|
||||
|
||||
If a user-visible "trigger fired" entry is desired in the events UI, add a
|
||||
*read-only join* from `webhook_deliveries` into the events page rather than
|
||||
writing event_log rows.
|
||||
|
||||
---
|
||||
|
||||
## What to defer
|
||||
|
||||
| Item | Why | Add when |
|
||||
|---|---|---|
|
||||
| Multi-line stack trace coalescing | Real rabbit hole (which lines belong together?). | Real user pain. |
|
||||
| Capture-group templating in messages (`{{.captures.code}}`) | v1 stores captures in metadata, displays raw line. | Once real rules exist and patterns emerge. |
|
||||
| Backfilling history search | This is Loki/Grafana scope-creep. | Never (push to Loki instead if it comes up). |
|
||||
| Per-rule alert routing | v1 fans out by `(severity, source)` filter on trigger side. | When users want one rule → one channel. |
|
||||
| YAML config-as-code | Tinyforge is UI-driven everywhere else. | Probably never. |
|
||||
| Retry / backoff on trigger delivery failure | Notifier already handles delivery; whether *triggers* retry is a separate question. | If trigger reliability becomes an SLO. |
|
||||
|
||||
---
|
||||
|
||||
## UI footprint
|
||||
|
||||
All boolean inputs use `ToggleSwitch` per project CLAUDE.md. All destructive
|
||||
actions use `ConfirmDialog` per memory note (no inline Yes/No strips).
|
||||
|
||||
### New pages
|
||||
|
||||
- **`/log-scan-rules`** — list with severity / workload filter, "+ New rule" button.
|
||||
- Detail page: name, pattern (regex with live test box that takes a sample log line), severity, streams, cooldown, enabled toggle, scope picker (global / workload).
|
||||
- **`/event-triggers`** — list, "+ New trigger" button.
|
||||
- Detail page: name, filters (severity multiselect, source multiselect, optional message regex), action type, action target, enabled toggle.
|
||||
|
||||
### Augmentations
|
||||
|
||||
- **Workload detail page** (`/apps/[id]`): new "Log Rules" tab/panel listing
|
||||
effective rules for this workload. Each global shows an "Override for this
|
||||
workload" button. Each override / workload-only shows edit + delete.
|
||||
- **Events page** (`/events`): entries with `source=logscan` get a small icon
|
||||
+ tooltip showing rule name. Click → jumps to rule detail.
|
||||
- **Settings sidebar**: links to `/log-scan-rules` and `/event-triggers` under
|
||||
a new "Observability" group.
|
||||
|
||||
### i18n keys to add
|
||||
|
||||
Roughly 40–60 keys across `en.json` + `ru.json`. Namespace: `logscan.*` and
|
||||
`triggers.*`.
|
||||
|
||||
---
|
||||
|
||||
## API surface
|
||||
|
||||
```
|
||||
GET /api/log-scan-rules — list (filter: ?workload_id=, ?global=true)
|
||||
POST /api/log-scan-rules — create
|
||||
GET /api/log-scan-rules/{id} — detail
|
||||
PATCH /api/log-scan-rules/{id} — update
|
||||
DELETE /api/log-scan-rules/{id} — delete
|
||||
POST /api/log-scan-rules/{id}/test — body: {sample_line}; returns matched: bool, captures
|
||||
GET /api/workloads/{id}/effective-rules — computed effective ruleset for a workload
|
||||
|
||||
GET /api/event-triggers — list
|
||||
POST /api/event-triggers — create
|
||||
GET /api/event-triggers/{id} — detail
|
||||
PATCH /api/event-triggers/{id} — update
|
||||
DELETE /api/event-triggers/{id} — delete
|
||||
POST /api/event-triggers/{id}/test — dispatches a synthetic event to verify the action target
|
||||
```
|
||||
|
||||
`POST .../test` endpoints are worth shipping in v1 — they make the rule /
|
||||
trigger editing UX dramatically nicer and avoid "did I get the regex right?"
|
||||
deploy-and-pray cycles.
|
||||
|
||||
---
|
||||
|
||||
## File pointers (when work starts)
|
||||
|
||||
**Backend, new:**
|
||||
- `internal/logscanner/{manager,tail,engine,rules}.go`
|
||||
- `internal/api/log_scan_rules.go`
|
||||
- `internal/api/event_triggers.go`
|
||||
- `internal/store/log_scan_rules.go`
|
||||
- `internal/store/event_triggers.go`
|
||||
- `internal/events/dispatcher.go` (or extend `bus.go` with `RegisterEventTriggerDispatcher`)
|
||||
|
||||
**Backend, modified:**
|
||||
- [internal/docker/container.go:362](../internal/docker/container.go#L362) — expose stream selection on `ContainerLogs`
|
||||
- [internal/api/router.go](../internal/api/router.go) — register new routes
|
||||
- [cmd/server/main.go](../cmd/server/main.go) — wire `RegisterEventTriggerDispatcher` next to `RegisterPersistentLogger`, start `logscanner.Manager`
|
||||
- migrations: `internal/store/migrations/00XX_log_scan_rules.sql`, `00XX_event_triggers.sql`
|
||||
|
||||
**Frontend, new:**
|
||||
- `web/src/routes/log-scan-rules/+page.svelte`, `[id]/+page.svelte`, `new/+page.svelte`
|
||||
- `web/src/routes/event-triggers/+page.svelte`, `[id]/+page.svelte`, `new/+page.svelte`
|
||||
- `web/src/lib/components/LogRulePanel.svelte` (workload detail tab)
|
||||
- `web/src/lib/components/RegexTestBox.svelte` (reusable)
|
||||
|
||||
**Frontend, modified:**
|
||||
- `web/src/routes/apps/[id]/+page.svelte` — add Log Rules tab
|
||||
- `web/src/routes/events/+page.svelte` — logscan source icon + rule tooltip
|
||||
- `web/src/routes/+layout.svelte` — Observability nav group
|
||||
- `web/src/lib/i18n/{en,ru}.json` — new key namespaces
|
||||
- `web/src/lib/api.ts`, `web/src/lib/types.ts` — typed clients
|
||||
|
||||
---
|
||||
|
||||
## Open questions to revisit before coding
|
||||
|
||||
1. **Container start/stop signal source** — bus events (low latency, two new
|
||||
event types) vs polling (simpler, ~5s latency). Tentative: polling.
|
||||
2. **Trigger delivery retry** — does the dispatcher retry on webhook failure,
|
||||
or is one shot enough since `webhook_deliveries` records failures? Tentative:
|
||||
one shot v1; revisit if reliability complaints surface.
|
||||
3. **Where does the "logscan source icon" link go on the events page** — rule
|
||||
detail page, or the workload's effective-rules tab? Latter is probably more
|
||||
useful since it shows context.
|
||||
|
||||
---
|
||||
|
||||
## Memory pointer
|
||||
|
||||
Add a memory after this lands describing the event_log = observe-self,
|
||||
webhook_deliveries = talk-to-outside boundary — it's the kind of invariant
|
||||
that's easy to violate accidentally when adding new event types later.
|
||||
@@ -0,0 +1,334 @@
|
||||
# Workload-First Refactor — Remaining Work
|
||||
|
||||
Handoff for resuming the refactor. The plugin architecture (Source × Trigger),
|
||||
`/api/workloads` surface, `/apps` UI, env/volume/webhook/logs/chain panels,
|
||||
multi-face proxy routes, blue-green image deploys, schema-driven wizard, and
|
||||
test coverage on triggers / image helpers / webhook parser / store upserts are
|
||||
**already landed and live**. What follows is what's still pending, in priority
|
||||
order.
|
||||
|
||||
## Status at a glance
|
||||
|
||||
| Item | Priority | Status |
|
||||
| ---- | -------- | ------ |
|
||||
| Static source inline port | 1 | **PENDING** — only remaining blocker for hard cutover |
|
||||
| Hard legacy cutover | 1 | **PENDING** — gated by static port (volume scopes blocker is resolved) |
|
||||
| Generalized volume scopes | 2 | DONE |
|
||||
| Kind-aware editors (compose / image / static) | 2 | DONE |
|
||||
| Vendor-specific webhook parsing | 2 | DONE |
|
||||
| Chain-panel CSS | 3 | DONE |
|
||||
| Log Rules panel on `/apps/[id]` | adjacent | DONE — uses `getEffectiveLogScanRules` + per-workload override action |
|
||||
| i18n for `/apps/*` page strings | 3 | **PARTIAL** — Log Rules panel + Observability surfaces i18n'd; `apps.*` namespace still pending |
|
||||
| Docs / codemap entries for `internal/workload/plugin/` | 3 | **PENDING** |
|
||||
| API-handler / dispatcher / compose-source / static-backend tests | 4 | **PENDING** |
|
||||
| Triggers as first-class reusable entities (post-cutover) | 5 | **PENDING** |
|
||||
|
||||
Cross-references to the adjacent Observability work (Event Triggers + Log
|
||||
Scanner backend + drop-counter stats panel) live in
|
||||
[docs/LOGSCAN_AND_TRIGGERS_TODO.md](LOGSCAN_AND_TRIGGERS_TODO.md).
|
||||
|
||||
## Priority 1 — Architecture unlock
|
||||
|
||||
### Static source inline port — ~2150 LOC across 8 files
|
||||
|
||||
The current `internal/workload/plugin/source/static/` delegates to
|
||||
`staticsite.Manager` via a phantom-row adapter
|
||||
(`cmd/server/static_backend.go`) that keeps a synthetic row in the legacy
|
||||
`static_sites` table per workload. This works but blocks the hard cutover —
|
||||
you can't drop `static_sites` until the adapter is gone.
|
||||
|
||||
To port inline, the deploy pipeline body has to move into
|
||||
`internal/workload/plugin/source/static/`:
|
||||
|
||||
| Source file | Lines | What to keep / port |
|
||||
| --- | --- | --- |
|
||||
| `internal/staticsite/manager.go` | 834 | Deploy / Stop / status pipeline. State should move to `containers` rows + `workload_env` instead of `static_sites`. |
|
||||
| `internal/staticsite/gitea_content.go` | 360 | Keep as helper — Gitea content download/listing. |
|
||||
| `internal/staticsite/github_provider.go` | 276 | Keep as helper. |
|
||||
| `internal/staticsite/gitlab_provider.go` | 254 | Keep as helper. |
|
||||
| `internal/staticsite/healthcheck.go` | 111 | Convert to plugin Reconcile body. |
|
||||
| `internal/staticsite/markdown.go` | 83 | Keep as helper. |
|
||||
| `internal/staticsite/provider.go` | 171 | Keep — provider abstraction. |
|
||||
| `internal/staticsite/deno/` | (sub-pkg) | Keep — Dockerfile + router.ts codegen. |
|
||||
|
||||
Estimated as its own dedicated turn (or two). Strategy: keep the provider
|
||||
abstraction + helpers exported; rewrite only `Manager.Deploy` body into a new
|
||||
`source/static/deploy.go` that operates against `plugin.Workload` directly and
|
||||
writes container rows + workload_env rather than the `static_sites` table.
|
||||
|
||||
### Hard legacy cutover
|
||||
|
||||
Sole remaining blocker is the static source inline port above. The
|
||||
generalized-volume-scopes blocker is resolved (legacy `ResolvePath`
|
||||
stays in place for legacy callers and dies with the cutover). When the
|
||||
static port lands:
|
||||
|
||||
- Delete `/api/projects`, `/api/stacks`, `/api/sites`, `/api/stages` handlers.
|
||||
- Drop tables: `projects`, `stages`, `stacks`, `stack_revisions`,
|
||||
`stack_deploys`, `static_sites`, `static_site_secrets`, `deploys`,
|
||||
`poll_states`.
|
||||
- Delete `internal/stack/`, `internal/staticsite/` packages.
|
||||
- Delete frontend `/projects`, `/sites`, `/stacks` routes.
|
||||
- Delete legacy `volume.ResolvePath` + `internal/api/volume_browser.go`
|
||||
callers (the only remaining users).
|
||||
|
||||
## Priority 2 — Behavior gaps
|
||||
|
||||
### ~~Generalized volume scopes~~ — DONE
|
||||
|
||||
Landed: `internal/volume.ResolveWorkloadPath` (workload-keyed; sits next to the
|
||||
legacy `ResolvePath` so legacy code paths keep working) plus the wired-through
|
||||
`computeMounts` in `internal/workload/plugin/source/image/image.go`. All
|
||||
`VolumeScope` values are now honored at deploy time:
|
||||
|
||||
- `absolute` — host bind, validated against `settings.AllowedVolumePaths`.
|
||||
- `ephemeral` — tmpfs.
|
||||
- `instance` — per-tag dir under `<base>/<workload>-<idShort>/instance-<tag>/<source>`.
|
||||
- `stage`, `project` — both collapse to `<base>/<workload>-<idShort>/<source>`.
|
||||
- `project_named` — Docker named volume prefixed `tf-<idShort>-<name>`.
|
||||
- `named` — Docker named volume by raw name.
|
||||
|
||||
Test coverage: `internal/volume/resolver_test.go` (table-driven, portable
|
||||
Linux/Windows). The legacy `ResolvePath` stays in place for legacy deployer +
|
||||
volume-browser callers and dies with the hard cutover.
|
||||
|
||||
### ~~Kind-aware editors on `/apps/new` and `/apps/[id]` edit~~ — DONE
|
||||
|
||||
All three Source plugins now have hand-rolled forms on both pages, with
|
||||
an "Advanced JSON" toggle preserved as the power-user escape hatch.
|
||||
Submit logic marshals form fields back into the same JSON shape the
|
||||
backend already expects — no API or store changes required.
|
||||
|
||||
**Principle:** the plugin contract makes new Source / Trigger kinds cheap
|
||||
on the backend, but the UI is not cheap by default — every kind needs a
|
||||
paired hand-rolled form to be daily-driver usable. The shared JSON
|
||||
editor is the fallback for power users and brand-new plugins, not the
|
||||
end state. New Source / Trigger merge requests should treat "ship the
|
||||
kind-aware form" as part of done, not a follow-up.
|
||||
|
||||
**Landed:**
|
||||
|
||||
- `compose`: YAML textarea + project_name input on both `/apps/new`
|
||||
and `/apps/[id]`.
|
||||
- `image`: form fields for image / port / healthcheck / default_tag /
|
||||
registry_name / cpu_limit / memory_limit / max_instances on both
|
||||
pages. Registry name is a select populated from `/api/registries`
|
||||
(with text-input fallback when the list is empty). env + volumes
|
||||
stay in their detail-page panels and round-trip through the form
|
||||
via `imageFormBody` so manual edits aren't clobbered.
|
||||
- `static`: provider select (gitea / github / gitlab), base URL,
|
||||
repo_owner / repo_name (both required), branch (default "main"),
|
||||
folder_path, access_token (password input, for private repos),
|
||||
mode radio (static / deno), render_markdown checkbox. The
|
||||
storage_enabled / storage_limit_mb fields aren't surfaced as
|
||||
form controls yet, but they round-trip through `staticFormBody`
|
||||
so values set via the raw JSON editor survive form edits.
|
||||
|
||||
**Still pending forms:** none — all three Source plugins now have
|
||||
hand-rolled forms on both `/apps/new` and `/apps/[id]`.
|
||||
|
||||
The raw JSON editor stays available behind the "Advanced JSON" toggle
|
||||
(shipped with compose) so the plugin's full sample is still reachable
|
||||
for power users and for any new plugin kind without a hand-rolled form.
|
||||
|
||||
Effort: per-kind form roughly half a turn each; can land incrementally.
|
||||
Touches `web/src/routes/apps/new/+page.svelte` and the edit block in
|
||||
`web/src/routes/apps/[id]/+page.svelte`. The Svelte side keeps
|
||||
serializing into the same `source_config` JSON shape the backend
|
||||
already expects — no API or store change required.
|
||||
|
||||
### ~~Vendor-specific webhook parsing for `/api/webhook/workloads/{secret}`~~ — DONE
|
||||
|
||||
Landed: `internal/webhook/vendor_parsers.go` plus rewrites in
|
||||
`internal/webhook/handler.go` `buildInboundEvent`. The dispatch order is now:
|
||||
|
||||
1. Empty body → manual event.
|
||||
2. Vendor-specific parsers, short-circuit on a recognized `X-*-Event`
|
||||
header — Gitea package, GitHub `package` / `registry_package`, GitHub
|
||||
push, Gitea push, GitLab `Push Hook` / `Tag Push Hook`.
|
||||
3. Generic simple-body fallback: top-level `image` or top-level `ref` —
|
||||
what the legacy CI integrations already send.
|
||||
|
||||
Vendor parsers can populate fields the generic parser cannot: image
|
||||
digest, `GitEvent.Vendor`, registry host. When a vendor parser claims a
|
||||
request (header matches) it is authoritative — a malformed Gitea
|
||||
package payload surfaces as an error rather than silently falling
|
||||
through to the generic parser. Test coverage:
|
||||
`internal/webhook/vendor_parsers_test.go` covers each vendor branch +
|
||||
the routed-via-`buildInboundEvent` integration cases.
|
||||
|
||||
Open follow-ups deferred to future turns:
|
||||
|
||||
- GitLab Container Registry events use a custom envelope outside the
|
||||
webhook event surface — handle if a user reports needing it.
|
||||
- Docker Hub webhook (push event) uses `{"push_data": {"tag": ...}, "repository": {...}}` — add when there's a user request.
|
||||
|
||||
## Priority 3 — Polish
|
||||
|
||||
### ~~Chain-panel CSS~~ — DONE
|
||||
|
||||
Landed: rules for `.chain-row`, `.chain-card` (with hover/transform on
|
||||
anchors), `.chain-self` (brand-tinted highlight), `.chain-name`,
|
||||
`.chain-label` (70px fixed-width mono column), `.chain-children-list`
|
||||
(flex-wrap), plus a sub-600px stack to keep the panel usable on narrow
|
||||
screens. Appended at the end of the `<style>` block in
|
||||
`web/src/routes/apps/[id]/+page.svelte`.
|
||||
|
||||
### Docs / codemap entries
|
||||
|
||||
Nothing under `docs/CODEMAPS/` for `internal/workload/plugin/`. Should cover:
|
||||
|
||||
- The Source × Trigger contract + registry pattern (`init()` + blank-import in
|
||||
`cmd/server/main.go`).
|
||||
- How a new Source kind is added (write `init()` registration, blank-import,
|
||||
add to wizard via `SchemaSample`).
|
||||
- The dispatcher seam: `deployer.DispatchPlugin` / `DispatchTeardown` /
|
||||
`DispatchReconcile` and how the reconciler / webhook ingress / API
|
||||
handlers all flow through it.
|
||||
|
||||
`README.md` should mention `/apps` as the new user surface and that
|
||||
`/projects` / `/sites` / `/stacks` carry `Deprecation: true` headers.
|
||||
|
||||
### i18n: page-level strings — PARTIAL
|
||||
|
||||
Already i18n'd:
|
||||
|
||||
- `nav.apps`, `nav.eventTriggers`, `nav.logScanRules` — top nav labels.
|
||||
- Log Rules panel on `/apps/[id]` reuses `logscan.panel.*` keys
|
||||
(shipped with the Observability work).
|
||||
- All `/event-triggers/*` and `/log-scan-rules/*` page strings — keys
|
||||
live under `triggers.*` and `logscan.*` namespaces in
|
||||
`web/src/lib/i18n/{en,ru}.json`.
|
||||
|
||||
Still hardcoded English:
|
||||
|
||||
- `/apps/+page.svelte` — list page (hero, lede, stats, empty state,
|
||||
table headers, status pills).
|
||||
- `/apps/new/+page.svelte` — wizard labels, form copy, kind-aware
|
||||
form rows (compose / image / static all hardcoded English today).
|
||||
- `/apps/[id]/+page.svelte` — detail page sections (chain, env,
|
||||
volumes, webhook, manual deploy, danger zone) — the Log Rules
|
||||
panel embedded inside it is the only i18n'd section.
|
||||
|
||||
Roughly 80–100 keys across the three `/apps/*` pages once extracted.
|
||||
Namespace: `apps.*` (with sub-namespaces `apps.list.*`, `apps.new.*`,
|
||||
`apps.detail.*`, `apps.form.*`).
|
||||
|
||||
## Priority 4 — Tests we still don't have
|
||||
|
||||
Solid pure-function coverage landed in the prior turn. Still missing:
|
||||
|
||||
- **API-handler integration tests** for `/api/workloads/*` (CRUD, deploy,
|
||||
env, volumes, webhook, chain, promote-from). Pattern: in-memory store +
|
||||
fake deployer + fake docker / proxy / dns providers, exercise via
|
||||
`httptest`.
|
||||
- **Deployer dispatcher**: `DispatchPlugin` / `DispatchTeardown` /
|
||||
`DispatchReconcile` with a fake Source registered.
|
||||
- **Compose source**: `composeProjectName` sanitizer, `writeYAMLIfChanged`
|
||||
short-circuit. (Both pure; just need fixtures.)
|
||||
- **Static source Backend adapter** in `cmd/server/static_backend.go`.
|
||||
|
||||
## Priority 5 — Post-cutover roadmap
|
||||
|
||||
### Triggers as first-class reusable entities
|
||||
|
||||
Today a trigger's config lives embedded in the workload row
|
||||
(`workload.trigger_kind` plus `workload.trigger_config` JSON via the plugin
|
||||
contract). One workload owns exactly one trigger; one trigger serves exactly
|
||||
one workload. This couples two concepts that users increasingly want
|
||||
orthogonal:
|
||||
|
||||
- One **inbound webhook** fanning out to several workloads (a single CI push
|
||||
rebuilds dev + staging together).
|
||||
- One **registry watcher** driving multiple workloads off the same image
|
||||
(different tag filters per binding, shared poll state).
|
||||
- One **schedule** kicking off a batch of jobs.
|
||||
- One **git push** filter shared by sibling stack services.
|
||||
|
||||
**Direction:** promote triggers to their own table with a join.
|
||||
|
||||
- `triggers` — `id`, `kind` (registry / git / webhook / schedule / manual /
|
||||
log_scan), `config` JSON, `secret`, `created_at`, audit fields.
|
||||
- `workload_trigger_bindings` — `workload_id`, `trigger_id`, `binding_config`
|
||||
JSON (per-binding overrides: tag filter, path filter, branch filter), plus
|
||||
ordering / enabled flag.
|
||||
|
||||
The dispatcher seam stays unchanged — `deployer.DispatchPlugin` still receives
|
||||
a `(Workload, TriggerEvent)` pair; the only change is that the event's source
|
||||
is resolved through the binding row instead of the workload row.
|
||||
|
||||
**UX principle: first-class on the backend, inline by default in the UI.**
|
||||
The workload create/edit form still has an "Add trigger" control that creates
|
||||
a fresh trigger record in one step, so the 1:1 case (git push → this workload)
|
||||
feels unchanged from today. Reuse is **opt-in** via a "Pick existing trigger"
|
||||
picker on the same control. Triggers also get their own list/detail pages under
|
||||
`/triggers` so the fan-out cases are discoverable and centrally manageable
|
||||
(rotate secret once, audit once).
|
||||
|
||||
**Per-kind modal applies, same rule as Source plugins** — the create/edit
|
||||
form for a trigger switches body by `kind` (git: repo / branch / path;
|
||||
registry: image / tag regex; webhook: secret + payload preview; schedule:
|
||||
cron). Backend cheap, UI requires a paired hand-rolled form per kind. Treat
|
||||
"ship the kind-aware form" as part of done for any new trigger kind.
|
||||
|
||||
**Migration:** clean break (no migration) per the workload-first memory —
|
||||
at cutover, each workload's embedded trigger config becomes a single
|
||||
auto-created trigger record with a single binding row. No user-visible change
|
||||
on day one; reuse becomes possible thereafter.
|
||||
|
||||
**Sequencing:** lands **after** the Priority 1 hard cutover. The embedded
|
||||
trigger config works fine for the 1:1 case that dominates today; the
|
||||
static-source inline port is the higher-value blocker. Treat this as the
|
||||
next major arc once cutover ships.
|
||||
|
||||
**Touch points to expect:**
|
||||
|
||||
- `internal/workload/plugin/trigger/*` — kind handlers stay; only their input
|
||||
shape changes (read from binding + trigger row, not workload row).
|
||||
- `internal/store/` — new `triggers` + `workload_trigger_bindings` tables and
|
||||
CRUD; remove `trigger_kind` / `trigger_config` from the workload row.
|
||||
- `internal/api/workloads.go` — adapt the workload create/edit handlers to
|
||||
accept either "inline new trigger" or "bind existing trigger" payloads.
|
||||
- New `/api/triggers` surface + `/triggers` frontend pages.
|
||||
- `internal/webhook/handler.go` — inbound webhook now resolves to a trigger,
|
||||
fans out to all bound workloads.
|
||||
- `internal/reconciler/reconciler.go` — registry watchers iterate triggers,
|
||||
not workloads; each trigger may fire N bindings.
|
||||
|
||||
## Open architectural questions
|
||||
|
||||
### Stages chain vs explicit Stage entity
|
||||
|
||||
`parent_workload_id` is now the canonical mechanism for stage chains
|
||||
(dev → staging → prod). Decision deferred: do we need a separate `Stage`
|
||||
entity at all, or is the chain sufficient? Currently feels like the chain
|
||||
covers the use case — `promote-from` works, the UI shows the relationship.
|
||||
Probably can leave the legacy `stages` table dropped entirely once cutover
|
||||
proceeds.
|
||||
|
||||
### `Container.extra_json` evolution
|
||||
|
||||
Currently only the image source uses it (per-face proxy route IDs). If
|
||||
other sources gain similar needs (compose service health metadata, static
|
||||
build SHAs), the schema there should stay versionless and additive — every
|
||||
reader must tolerate unknown keys. Document this in the source plugin
|
||||
guide alongside the codemap entry.
|
||||
|
||||
## File pointers for the next session
|
||||
|
||||
- Plugin contracts: `internal/workload/plugin/{plugin,source,trigger,types,registry}.go`
|
||||
- Source implementations: `internal/workload/plugin/source/{image,compose,static}/`
|
||||
- Trigger implementations: `internal/workload/plugin/trigger/{registry,git,manual}/`
|
||||
- Dispatcher: `internal/deployer/dispatch.go`
|
||||
- Webhook ingress (plugin path): `internal/webhook/handler.go` `handlePluginWorkloadWebhook`
|
||||
- Reconciler hook: `internal/reconciler/reconciler.go` `reconcilePluginWorkloads`
|
||||
- Static backend adapter (to be deleted post-port): `cmd/server/static_backend.go`
|
||||
- Frontend pages: `web/src/routes/apps/+page.svelte`, `web/src/routes/apps/new/+page.svelte`, `web/src/routes/apps/[id]/+page.svelte`
|
||||
- Tests: `internal/workload/plugin/trigger/*/!(_test).go`, `internal/workload/plugin/source/image/image_helpers_test.go`, `internal/webhook/inbound_event_test.go`, `internal/store/workload_env_test.go`
|
||||
|
||||
## Memory pointer
|
||||
|
||||
Memory at
|
||||
`C:/Users/Alexei/.claude/projects/c--Users-Alexei-Documents-docker-watcher/memory/`
|
||||
already covers the Workload-first decision and the no-migration constraint.
|
||||
Refresh as the cutover lands.
|
||||
Reference in New Issue
Block a user