tiny-forge/docs/WORKLOAD_REFACTOR_TODO.md
alexei.dolgolyov 234c3c711e
feat(static): inline static-source plugin; drop phantom-row adapter
Lift the static-site deploy pipeline from internal/staticsite/manager.go
into internal/workload/plugin/source/static/ so plugin-native static
workloads operate directly on plugin.Workload + the containers table +
workload_env. The cmd/server/static_backend.go phantom-row adapter is
gone; the legacy static_sites table is no longer touched on plugin
deploys.

Backend
- new state.go: runtimeState (last_commit_sha, last_sync_at,
  last_error, status) persisted in containers.extra_json under the
  deterministic row id <workloadID>:site
- per-workload sync.Mutex serializes saveState read-modify-write so
  parallel deploys for the same workload can't race container_id /
  proxy_route_id writes
- extra_json round-trips through map[string]json.RawMessage so
  unknown keys survive — typed runtimeStateKeys are stripped before
  merge so clearing a typed field actually drops the key
- new env.go reads workload_env (replaces static_site_secrets for
  plugin-native sites); decrypt-failure logs and skips one entry
  rather than failing the whole deploy
- new build.go ports prepareDenoBuild + prepareStaticBuild + copyDir;
  copyDir uses filepath.WalkDir + Lstat to refuse symlinks and
  non-regular files
- new deploy.go is the ~300-line core; intent.Reason gates force vs
  skip-if-no-changes; success-path saveState failure rolls back
  container + proxy route and writes "failed" state (no orphans)
- new teardown.go combines Remove + Stop; idempotent on
  never-deployed workloads
- new reconcile.go refreshes container state from Docker; flips
  runtimeState.Status to failed when the container is missing/crashed

Hardening (from go-reviewer + security-reviewer subagent passes;
1 CRITICAL + 5 HIGH + 3 MEDIUM addressed before merge)
- path-traversal defense in all 3 providers (gitea_content,
  github_provider, gitlab_provider): reject tree entries whose
  resolved local path escapes destDir
- verifyDownloadInsideRoot walks the build dir post-download as a
  second line of defense
- sanitizeError redacts the access token, collapses to one line, and
  clamps to 240 bytes before persisting to extra_json or fanning out
  to the notification webhook
- container/image/volume names suffixed with workload-id short prefix
  (workload name is not UNIQUE in schema)
- primaryDomain reads settings.Domain to complete a bare subdomain
  face into a full FQDN (matches legacy Manager behavior)
- ctx-aware health-check sleep
- json.Marshal for event metadata (was fmt.Sprintf JSON template)
- strings.HasPrefix for failed-status detection (was brittle slice
  expression)

Wire-up
- cmd/server/main.go: removed wireStaticBackend(...) call; existing
  blank import on _ ".../source/static" drives init() registration
- cmd/server/static_backend.go deleted

Doc
- WORKLOAD_REFACTOR_TODO: static port marked DONE; next focus is
  the hard legacy cutover (drop /api/projects, /api/stacks,
  /api/sites, /api/stages + their tables, internal/stack +
  internal/staticsite packages, frontend /projects /stacks /sites)

Behavior notes for operators
- plugin-native static workloads no longer write to static_sites;
  legacy /api/sites/* still serves original rows unchanged
- legacy tinyforge.static-site / .static-site-name container labels
  dropped on plugin deploys; canonical tinyforge.workload.id / .kind
  cover ownership
- container/image/volume names gained an 8-char ID suffix
  (e.g. dw-site-mysite-a1b2c3d4); legacy-deployed sites keep the
  old shape until redeployed through the plugin path
2026-05-16 02:56:23 +03:00


Workload-First Refactor — Remaining Work

Handoff for resuming the refactor. The plugin architecture (Source × Trigger), /api/workloads surface, /apps UI, env/volume/webhook/logs/chain panels, multi-face proxy routes, blue-green image deploys, schema-driven wizard, and test coverage on triggers / image helpers / webhook parser / store upserts are already landed and live. What follows is what's still pending, in priority order.

Current focus (read this first)

Triggers as first-class reusable entities — DONE (2026-05-16) and Static source inline port — DONE (2026-05-16). The phantom-row adapter (cmd/server/static_backend.go) is gone; the static plugin now operates directly on plugin.Workload + containers + workload_env, with runtime state (last_commit_sha, last_sync_at, last_error, status) carried in containers.extra_json. Provider downloads enforce path-traversal rejection, error strings are sanitized before persistence, and Docker resource names are suffixed with the workload ID short prefix to dodge name collisions.

Next on Priority 1 is the hard legacy cutover — drop /api/projects, /api/stacks, /api/sites, /api/stages handlers, drop their tables, delete internal/stack/ + internal/staticsite/ packages, delete frontend /projects / /stacks / /sites routes. The internal/staticsite package stays alive only for the legacy /api/sites/* HTTP routes — once those drop, it dies with them.

Status at a glance

| Item | Priority | Status |
| --- | --- | --- |
| Triggers as first-class reusable entities | 1 | DONE (2026-05-16) |
| Static source inline port | 1 | DONE (2026-05-16) |
| Hard legacy cutover | 1 | PENDING (current focus) |
| Generalized volume scopes | 2 | DONE |
| Kind-aware editors (compose / image / static) | 2 | DONE |
| Vendor-specific webhook parsing | 2 | DONE |
| Chain-panel CSS | 3 | DONE |
| Log Rules panel on /apps/[id] | adjacent | DONE; uses getEffectiveLogScanRules + per-workload override action |
| i18n for /apps/* page strings | 3 | PARTIAL; Log Rules panel + Observability surfaces i18n'd, apps.* namespace still pending |
| Docs / codemap entries for internal/workload/plugin/ | 3 | PENDING |
| API-handler / dispatcher / compose-source / static-backend tests | 4 | PENDING |

Cross-references to the adjacent Observability work (Event Triggers + Log Scanner backend + drop-counter stats panel) live in docs/LOGSCAN_AND_TRIGGERS_TODO.md.

Priority 1 — Architecture unlock

Triggers as first-class reusable entities — DONE (2026-05-16)

Trigger config used to live embedded in the workload row (workload.trigger_kind + workload.trigger_config). One workload owned exactly one trigger; one trigger served exactly one workload. The split makes a Trigger its own record so one inbound webhook / registry watcher / schedule / git-push filter fans out to many workloads.

Schema + store: triggers + workload_trigger_bindings tables with ON DELETE CASCADE. binding_config JSON merges on top of trigger.config (top-level merge, binding wins). Boot-time backfill lifts every existing embedded trigger into a standalone trigger row + binding inside a per-workload transaction so a partial failure rolls back cleanly. Trigger names are id-suffixed unconditionally to dodge the (name, kind) collision race. store.ErrUnique sentinel translates SQLite UNIQUE violations at the store boundary; API handlers use errors.Is instead of substring match. MergeJSONConfig always returns a freshly allocated slice (no aliasing under fan-out).
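The top-level merge described above can be sketched roughly as follows. This is a minimal illustration, not the project's MergeJSONConfig: the function name mergeTopLevel and its exact error wrapping are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeTopLevel is a hypothetical stand-in for MergeJSONConfig. It overlays the
// binding's top-level keys on the trigger's config and returns a new slice.
func mergeTopLevel(triggerCfg, bindingCfg []byte) ([]byte, error) {
	merged := map[string]json.RawMessage{}
	if len(triggerCfg) > 0 {
		if err := json.Unmarshal(triggerCfg, &merged); err != nil {
			return nil, fmt.Errorf("trigger config: %w", err)
		}
	}
	if merged == nil { // a literal JSON null unmarshals to a nil map
		merged = map[string]json.RawMessage{}
	}
	var override map[string]json.RawMessage
	if len(bindingCfg) > 0 {
		if err := json.Unmarshal(bindingCfg, &override); err != nil {
			return nil, fmt.Errorf("binding config: %w", err)
		}
	}
	for key, value := range override {
		merged[key] = value // binding wins; nested objects are replaced, not deep-merged
	}
	// json.Marshal allocates its own output, so no caller shares a backing array.
	return json.Marshal(merged)
}
```

Because the merge is top-level only, a binding that overrides a nested object replaces that object wholesale rather than patching individual fields inside it.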

Webhook fan-out — new POST /api/webhook/triggers/{secret} resolves to one Trigger and fans out to every enabled binding via a bounded worker pool (maxTriggerFanOutConcurrency = 4). Per-binding errors are isolated (one broken workload doesn't block siblings). Outcome accounting splits deployed / skipped / no-match / errored cleanly. Legacy POST /api/webhook/workloads/{secret} route dropped (clean break per the workload-first memory; the boot backfill kept secrets resolvable at the new path).

API: /api/triggers CRUD, /api/triggers/{id}/webhook, /api/triggers/{id}/bindings (list + bind), /api/bindings/{id} for update and delete, and /api/workloads/{id}/triggers (list + bind, accepts either trigger_id or inline {kind, name, config, ...}). Inline-create path runs trigger insert + binding insert inside one transaction (CreateTriggerWithBindingTx) so a binding failure can't leak an orphan trigger. validateBindingConfig enforces 8 KiB cap and runs the trigger plugin's Validate() against the merged shape on every bind/update. List endpoints use LEFT JOIN ... GROUP BY (ListTriggersWithBindingCount, ListBindingsForTriggerWithNames, ListBindingsForWorkloadWithNames); no per-row N+1.
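The two checks attributed to validateBindingConfig (size cap on the raw override, then plugin validation of the merged shape) could be arranged as in this sketch. The validator interface, the requireImage toy plugin, and the checkBindingConfig name are all hypothetical; only the 8 KiB figure comes from the text.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

const maxBindingConfigBytes = 8 << 10 // the 8 KiB cap

var errConfigTooLarge = errors.New("binding config exceeds 8 KiB")

// validator is a hypothetical slice of the trigger plugin contract.
type validator interface {
	Validate(merged []byte) error
}

// requireImage is a toy plugin that insists on a top-level "image" key.
type requireImage struct{}

func (requireImage) Validate(merged []byte) error {
	var cfg map[string]json.RawMessage
	if err := json.Unmarshal(merged, &cfg); err != nil {
		return fmt.Errorf("merged config: %w", err)
	}
	if _, ok := cfg["image"]; !ok {
		return errors.New(`merged config is missing "image"`)
	}
	return nil
}

// checkBindingConfig caps the raw override first, then validates the merged
// (trigger + binding) config, so the plugin sees the shape it will run with.
func checkBindingConfig(v validator, override, merged []byte) error {
	if len(override) > maxBindingConfigBytes {
		return errConfigTooLarge
	}
	return v.Validate(merged)
}
```

Validating the merged shape rather than the override alone means a binding may legitimately omit keys that the trigger already supplies.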

Plugin contract unchanged: Trigger.Match still takes (Workload, InboundEvent). The fan-out path uses plugin.WithEffectiveTrigger to stuff the merged config into a copied workload before the call, so the existing registry, git, and manual plugins work unchanged.
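The copy-then-override idea behind WithEffectiveTrigger can be illustrated as below. The workload struct and its field names are assumptions; the real plugin.Workload type is not shown in this document.

```go
package main

// workload is a hypothetical, trimmed-down stand-in for plugin.Workload.
type workload struct {
	ID            string
	TriggerKind   string
	TriggerConfig []byte
}

// withEffectiveTrigger returns a copy of w carrying the merged trigger config.
// The caller's workload is left untouched, so concurrent fan-out goroutines
// never observe each other's binding overrides.
func withEffectiveTrigger(w workload, kind string, merged []byte) workload {
	cp := w // struct copy
	cp.TriggerKind = kind
	cp.TriggerConfig = append([]byte(nil), merged...) // private copy of the bytes
	return cp
}
```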

Reconciler — gate dropped from (SourceKind != "" && TriggerKind != "") to SourceKind != "". A workload with a Source but no triggers still gets Source.Reconcile called every tick (manual-only deploys are common during early setup).

Frontend — new pages under web/src/routes/triggers/:

  • +page.svelte — list with kind chips, binding count, webhook status, empty state.
  • new/+page.svelte — wizard with kind picker (cards), name, kind-aware config form (registry / git / manual + JSON fallback), webhook toggles.
  • [id]/+page.svelte — editable per-kind form, webhook URL panel (origin-prefixed, copy + ConfirmDialog-gated rotate), bindings list with per-row enabled <ToggleSwitch> + ConfirmDialog-gated unbind, danger-zone delete.

Workload UI — embedded trigger fields removed.

  • apps/new/+page.svelte — wizard now has Trigger step with NEW / PICK / SKIP modes; bind happens after createPluginWorkload succeeds.
  • apps/[id]/+page.svelte — Bindings panel above Containers, "Add trigger" modal with Inline / Pick-existing tabs, per-binding override editor (inline disclosure with read-only base config, editable JSON override, merged preview, 8 KiB byte cap, save / reset-to-inherit). Per-row "OVERRIDES n FIELDS" badge surfaces deviation from the trigger.

Shared component: web/src/lib/components/TriggerKindForm.svelte hosts the kind picker + name + per-kind config + JSON fallback + webhook toggles. Reused on both /triggers/new and the workload Add-trigger modal.

i18n — full EN + RU coverage under redeployTriggers.* (standalone pages), apps.detail.bindings.* (workload bindings panel including override.*), apps.new.triggers.* (wizard mode picker), nav.triggers. The existing /event-triggers nav label was disambiguated to "Event Triggers" to coexist with the new /triggers entry.

Compliance — three pre-existing raw <input type="checkbox"> instances in apps/new + apps/[id] (render-markdown, env-encrypted) replaced with <ToggleSwitch> to honor the project rule.

Touch points (final):

  • internal/store/triggers.go, workload_trigger_bindings.go, models.go, store.go (schema + backfill + translateSQLError).
  • internal/workload/plugin/binding.go (MergeJSONConfig, WithEffectiveTrigger).
  • internal/webhook/trigger_handler.go + handler.go (route mount, legacy route removed).
  • internal/reconciler/reconciler.go (trigger gate dropped).
  • internal/api/triggers.go + router.go (REST surface).
  • web/src/routes/triggers/, web/src/routes/apps/{new,[id]}, web/src/lib/components/TriggerKindForm.svelte, web/src/lib/api.ts, web/src/lib/i18n/{en,ru}.json, web/src/routes/+layout.svelte.

Reviews shipped through go-reviewer + security-reviewer + typescript-reviewer subagents — 0 CRITICAL; 5 HIGH and 4 MEDIUM findings addressed inline before merge.

Static source inline port — DONE (2026-05-16)

The phantom-row adapter (cmd/server/static_backend.go) is deleted; the static plugin now operates directly on plugin.Workload, the containers table, and workload_env. The deploy pipeline body lives inline in internal/workload/plugin/source/static/{deploy,teardown,reconcile,state,env,build,naming,static}.go.

State migration: the legacy static_sites columns (last_commit_sha, last_sync_at, last_error, status, container_id, proxy_route_id) are now persisted on the container row keyed <workloadID>:site — deterministic ID, single row per workload. First-class fields (container_id, proxy_route_id, subdomain, state, port, image_ref) move into their dedicated columns on the containers table; the rest live in containers.extra_json via a typed runtimeState struct that preserves unknown keys on round-trip (so future writers can extend extra_json without forcing this struct to grow). workload_env replaces static_site_secrets for plugin-native workloads.
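One way to get the round-trip behaviour described above (unknown keys survive, cleared typed fields actually disappear) is sketched here. The field types, the mergeState name, and the use of a plain string for last_sync_at are assumptions; only the key names come from the text.

```go
package main

import "encoding/json"

// runtimeState mirrors the typed fields named in the text (types assumed).
type runtimeState struct {
	LastCommitSHA string `json:"last_commit_sha,omitempty"`
	LastSyncAt    string `json:"last_sync_at,omitempty"`
	LastError     string `json:"last_error,omitempty"`
	Status        string `json:"status,omitempty"`
}

var runtimeStateKeys = []string{"last_commit_sha", "last_sync_at", "last_error", "status"}

// mergeState writes st into an existing extra_json document. Typed keys are
// stripped first so an emptied field drops its key; everything else is kept.
func mergeState(extra []byte, st runtimeState) ([]byte, error) {
	doc := map[string]json.RawMessage{}
	if len(extra) > 0 {
		if err := json.Unmarshal(extra, &doc); err != nil {
			return nil, err
		}
	}
	if doc == nil { // extra was a literal JSON null
		doc = map[string]json.RawMessage{}
	}
	for _, key := range runtimeStateKeys {
		delete(doc, key)
	}
	typed, err := json.Marshal(st)
	if err != nil {
		return nil, err
	}
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(typed, &fields); err != nil {
		return nil, err
	}
	for key, value := range fields {
		doc[key] = value
	}
	return json.Marshal(doc)
}
```

Without the strip step, a cleared last_error would be omitted from the typed encoding and the stale value from the previous write would survive the merge.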

Reused helpers: internal/staticsite/{provider,gitea_content,github_provider,gitlab_provider,markdown,deno} stay alive (and exported) as helpers; providers are still imported via staticsite.NewGitProvider. The staticsite.Manager itself stays alive only to service the legacy /api/sites/* HTTP routes; once those drop in the cutover the package can be deleted entirely.

Hardening landed alongside the port (from go-reviewer + security-reviewer subagent passes — 1 CRITICAL, 5 HIGH, 3 MEDIUM addressed before merge):

  • Path-traversal defense: providers (gitea_content.go, github_provider.go, gitlab_provider.go) reject any tree entry whose resolved local path escapes destDir; the static plugin's verifyDownloadInsideRoot walks the build dir post-download as a second line of defense; copyDir uses filepath.WalkDir + Lstat to refuse symlinks and non-regular files.
  • Error sanitization: a sanitizeError helper redacts the decrypted access token, collapses to one line, and clamps to 240 bytes before any error string lands in runtimeState.LastError (persisted in extra_json) or fans out to the notification webhook.
  • Resource naming with workload-ID short suffix: container, image, and storage volume names all carry idShort(w) so two workloads sharing a name can't clobber each other's resources (workload name is not UNIQUE in the schema).
  • Per-workload mutex on saveState: serializes the read-modify-write of containers.extra_json so two parallel deploys for the same workload can't race to clobber each other's container_id / proxy_route_id.
  • saveState failure on the success path is fatal: rolls back the just-created container + proxy route and writes a "failed" state, so we don't leak a running container with no row pointing at it.
  • primaryDomain reads settings.Domain to complete a bare subdomain face into a full FQDN (matches legacy Manager behavior).
  • The post-start health-window sleep is ctx-aware: it returns early on ctx.Done() instead of blocking in an uninterruptible time.Sleep.
  • json.Marshal for event metadata + strings.HasPrefix for failed-status detection — replaces the prior fmt.Sprintf JSON template + brittle slice expression.
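As an illustration of the path-traversal guard in the first bullet, a provider could resolve each tree entry through a helper like the one below before writing anything. The safeJoin name and signature are assumptions, not the providers' actual code, and the sketch does not cover symlinks (which the text says copyDir handles separately).

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin resolves a repository tree entry under destDir and rejects any
// entry whose cleaned path would land outside destDir.
func safeJoin(destDir, entry string) (string, error) {
	root := filepath.Clean(destDir)
	// Join cleans the result, so "a/../../x" collapses before the check.
	target := filepath.Join(root, filepath.FromSlash(entry))
	rel, err := filepath.Rel(root, target)
	if err != nil {
		return "", fmt.Errorf("resolve %q: %w", entry, err)
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("entry %q escapes %q", entry, destDir)
	}
	return target, nil
}
```

Checking the relative path instead of a raw string prefix avoids the classic mistake where /srv/build-evil passes a HasPrefix test against /srv/build.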

Touch points (final):

  • internal/workload/plugin/source/static/{static,deploy,teardown,reconcile,state,env,build,naming}.go: the inline plugin.
  • internal/staticsite/{gitea_content,github_provider,gitlab_provider}.go: added the path-traversal guards.
  • cmd/server/main.go: wireStaticBackend(...) call removed; the existing blank import on _ "internal/workload/plugin/source/static" now drives init() registration.
  • cmd/server/static_backend.go — deleted.

Behavioral notes for operators:

  • Plugin-native static workloads no longer write to the static_sites table at all — anything querying that table for plugin-native workloads (operator dashboards, ad-hoc SQL) sees stale or absent values. The legacy /api/sites/* routes still serve original rows unchanged.
  • Container labels tinyforge.static-site / tinyforge.static-site-name are no longer set on plugin-native deploys; the canonical tinyforge.workload.id / .kind labels (added by docker.ContainerConfig) cover ownership.
  • Container, image, and volume names all gained an 8-char ID suffix (e.g. dw-site-mysite-a1b2c3d4). Existing legacy-deployed sites keep their old dw-site-mysite shape until they're redeployed through the plugin path.
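The naming change in the last bullet can be pictured with this small sketch. It assumes idShort simply takes the first 8 characters of the workload ID and that the prefix is dw-site-, as in the example name above; the real helper takes a workload value and may differ.

```go
package main

// idShort returns the first 8 characters of the workload ID (assumed rule,
// based on the 8-char suffix described above).
func idShort(workloadID string) string {
	if len(workloadID) > 8 {
		return workloadID[:8]
	}
	return workloadID
}

// siteResourceName builds a Docker resource name that stays unique even when
// two workloads share the same human-readable name.
func siteResourceName(name, workloadID string) string {
	return "dw-site-" + name + "-" + idShort(workloadID)
}
```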

Hard legacy cutover

The static-source inline port (above) is now complete; the cutover is unblocked. Proceeding with the cutover means:

  • Delete /api/projects, /api/stacks, /api/sites, /api/stages handlers.
  • Drop tables: projects, stages, stacks, stack_revisions, stack_deploys, static_sites, static_site_secrets, deploys, poll_states.
  • Delete internal/stack/, internal/staticsite/ packages.
  • Delete frontend /projects, /sites, /stacks routes.
  • Delete legacy volume.ResolvePath + internal/api/volume_browser.go callers (the only remaining users).

Priority 2 — Behavior gaps

Generalized volume scopes — DONE

Landed: internal/volume.ResolveWorkloadPath (workload-keyed; sits next to the legacy ResolvePath so legacy code paths keep working) plus the wired-through computeMounts in internal/workload/plugin/source/image/image.go. All VolumeScope values are now honored at deploy time:

  • absolute — host bind, validated against settings.AllowedVolumePaths.
  • ephemeral — tmpfs.
  • instance — per-tag dir under <base>/<workload>-<idShort>/instance-<tag>/<source>.
  • stage, project — both collapse to <base>/<workload>-<idShort>/<source>.
  • project_named — Docker named volume prefixed tf-<idShort>-<name>.
  • named — Docker named volume by raw name.

Test coverage: internal/volume/resolver_test.go (table-driven, portable Linux/Windows). The legacy ResolvePath stays in place for legacy deployer + volume-browser callers and dies with the hard cutover.

Kind-aware editors on /apps/new and /apps/[id] edit — DONE

All three Source plugins now have hand-rolled forms on both pages, with an "Advanced JSON" toggle preserved as the power-user escape hatch. Submit logic marshals form fields back into the same JSON shape the backend already expects — no API or store changes required.

Principle: the plugin contract makes new Source / Trigger kinds cheap on the backend, but the UI is not cheap by default — every kind needs a paired hand-rolled form to be daily-driver usable. The shared JSON editor is the fallback for power users and brand-new plugins, not the end state. New Source / Trigger merge requests should treat "ship the kind-aware form" as part of done, not a follow-up.

Landed:

  • compose: YAML textarea + project_name input on both /apps/new and /apps/[id].
  • image: form fields for image / port / healthcheck / default_tag / registry_name / cpu_limit / memory_limit / max_instances on both pages. Registry name is a select populated from /api/registries (with text-input fallback when the list is empty). env + volumes stay in their detail-page panels and round-trip through the form via imageFormBody so manual edits aren't clobbered.
  • static: provider select (gitea / github / gitlab), base URL, repo_owner / repo_name (both required), branch (default "main"), folder_path, access_token (password input, for private repos), mode radio (static / deno), render_markdown checkbox. The storage_enabled / storage_limit_mb fields aren't surfaced as form controls yet, but they round-trip through staticFormBody so values set via the raw JSON editor survive form edits.

Still pending forms: none — all three Source plugins now have hand-rolled forms on both /apps/new and /apps/[id].

The raw JSON editor stays available behind the "Advanced JSON" toggle (shipped with compose) so the plugin's full sample is still reachable for power users and for any new plugin kind without a hand-rolled form.

Effort: per-kind form roughly half a turn each; can land incrementally. Touches web/src/routes/apps/new/+page.svelte and the edit block in web/src/routes/apps/[id]/+page.svelte. The Svelte side keeps serializing into the same source_config JSON shape the backend already expects — no API or store change required.

Vendor-specific webhook parsing for /api/webhook/workloads/{secret} — DONE

Landed: internal/webhook/vendor_parsers.go plus rewrites in internal/webhook/handler.go buildInboundEvent. The dispatch order is now:

  1. Empty body → manual event.
  2. Vendor-specific parsers, short-circuit on a recognized X-*-Event header — Gitea package, GitHub package / registry_package, GitHub push, Gitea push, GitLab Push Hook / Tag Push Hook.
  3. Generic simple-body fallback: top-level image or top-level ref — what the legacy CI integrations already send.

Vendor parsers can populate fields the generic parser cannot: image digest, GitEvent.Vendor, registry host. When a vendor parser claims a request (header matches) it is authoritative — a malformed Gitea package payload surfaces as an error rather than silently falling through to the generic parser. Test coverage: internal/webhook/vendor_parsers_test.go covers each vendor branch + the routed-via-buildInboundEvent integration cases.

Open follow-ups deferred to future turns:

  • GitLab Container Registry events use a custom envelope outside the webhook event surface — handle if a user reports needing it.
  • Docker Hub webhook (push event) uses {"push_data": {"tag": ...}, "repository": {...}} — add when there's a user request.

Priority 3 — Polish

Chain-panel CSS — DONE

Landed: rules for .chain-row, .chain-card (with hover/transform on anchors), .chain-self (brand-tinted highlight), .chain-name, .chain-label (70px fixed-width mono column), .chain-children-list (flex-wrap), plus a sub-600px stack to keep the panel usable on narrow screens. Appended at the end of the <style> block in web/src/routes/apps/[id]/+page.svelte.

Docs / codemap entries

Nothing under docs/CODEMAPS/ for internal/workload/plugin/. Should cover:

  • The Source × Trigger contract + registry pattern (init() + blank-import in cmd/server/main.go).
  • How a new Source kind is added (write init() registration, blank-import, add to wizard via SchemaSample).
  • The dispatcher seam: deployer.DispatchPlugin / DispatchTeardown / DispatchReconcile and how the reconciler / webhook ingress / API handlers all flow through it.
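A codemap entry for the registry pattern could open with a sketch along these lines. The Source interface here is reduced to a single Kind method, and RegisterSource / LookupSource are assumed names; in the real tree the init() lives in the plugin package and is pulled in by a blank import from cmd/server/main.go, whereas this single-file version registers in package main.

```go
package main

import (
	"fmt"
	"sync"
)

// Source is a hypothetical, minimal slice of the Source plugin contract.
type Source interface {
	Kind() string
}

var (
	registryMu sync.RWMutex
	sources    = map[string]Source{}
)

// RegisterSource adds a Source under its kind; duplicates are a programmer error.
func RegisterSource(s Source) {
	registryMu.Lock()
	defer registryMu.Unlock()
	if _, dup := sources[s.Kind()]; dup {
		panic(fmt.Sprintf("source kind %q registered twice", s.Kind()))
	}
	sources[s.Kind()] = s
}

// LookupSource is what a dispatcher would call to resolve a workload's kind.
func LookupSource(kind string) (Source, bool) {
	registryMu.RLock()
	defer registryMu.RUnlock()
	s, ok := sources[kind]
	return s, ok
}

type staticSource struct{}

func (staticSource) Kind() string { return "static" }

// init runs when the package is linked in, which is why a blank import alone
// is enough to make the kind available.
func init() {
	RegisterSource(staticSource{})
}
```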

README.md should mention /apps as the new user surface and that /projects / /sites / /stacks carry Deprecation: true headers.

i18n: page-level strings — PARTIAL

Already i18n'd:

  • nav.apps, nav.eventTriggers, nav.logScanRules — top nav labels.
  • Log Rules panel on /apps/[id] reuses logscan.panel.* keys (shipped with the Observability work).
  • All /event-triggers/* and /log-scan-rules/* page strings — keys live under triggers.* and logscan.* namespaces in web/src/lib/i18n/{en,ru}.json.

Still hardcoded English:

  • /apps/+page.svelte — list page (hero, lede, stats, empty state, table headers, status pills).
  • /apps/new/+page.svelte — wizard labels, form copy, kind-aware form rows (compose / image / static all hardcoded English today).
  • /apps/[id]/+page.svelte — detail page sections (chain, env, volumes, webhook, manual deploy, danger zone) — the Log Rules panel embedded inside it is the only i18n'd section.

Roughly 80 to 100 keys across the three /apps/* pages once extracted. Namespace: apps.* (with sub-namespaces apps.list.*, apps.new.*, apps.detail.*, apps.form.*).

Priority 4 — Tests we still don't have

Solid pure-function coverage landed in the prior turn. Still missing:

  • API-handler integration tests for /api/workloads/* (CRUD, deploy, env, volumes, webhook, chain, promote-from). Pattern: in-memory store + fake deployer + fake docker / proxy / dns providers, exercise via httptest.
  • Deployer dispatcher: DispatchPlugin / DispatchTeardown / DispatchReconcile with a fake Source registered.
  • Compose source: composeProjectName sanitizer, writeYAMLIfChanged short-circuit. (Both pure; just need fixtures.)
  • Static source plugin in internal/workload/plugin/source/static/ (the cmd/server/static_backend.go adapter this item originally targeted has been deleted).

Open architectural questions

Stages chain vs explicit Stage entity

parent_workload_id is now the canonical mechanism for stage chains (dev → staging → prod). Decision deferred: do we need a separate Stage entity at all, or is the chain sufficient? Currently feels like the chain covers the use case — promote-from works, the UI shows the relationship. Probably can leave the legacy stages table dropped entirely once cutover proceeds.

Container.extra_json evolution

The image source uses it for per-face proxy route IDs, and the static source now uses it for its typed runtimeState (last_commit_sha, last_sync_at, last_error, status). If other sources gain similar needs (compose service health metadata, for example), the schema there should stay versionless and additive: every reader must tolerate unknown keys. Document this in the source plugin guide alongside the codemap entry.

File pointers for the next session

  • Plugin contracts: internal/workload/plugin/{plugin,source,trigger,types,registry}.go
  • Source implementations: internal/workload/plugin/source/{image,compose,static}/
  • Trigger implementations: internal/workload/plugin/trigger/{registry,git,manual}/
  • Dispatcher: internal/deployer/dispatch.go
  • Webhook ingress (plugin path): internal/webhook/handler.go handlePluginWorkloadWebhook
  • Reconciler hook: internal/reconciler/reconciler.go reconcilePluginWorkloads
  • Static source plugin (replaces the deleted cmd/server/static_backend.go adapter): internal/workload/plugin/source/static/
  • Frontend pages: web/src/routes/apps/+page.svelte, web/src/routes/apps/new/+page.svelte, web/src/routes/apps/[id]/+page.svelte
  • Tests: internal/workload/plugin/trigger/*/*_test.go, internal/workload/plugin/source/image/image_helpers_test.go, internal/webhook/inbound_event_test.go, internal/store/workload_env_test.go

Memory pointer

Memory at C:/Users/Alexei/.claude/projects/c--Users-Alexei-Documents-docker-watcher/memory/ already covers the Workload-first decision and the no-migration constraint. Refresh as the cutover lands.