Wraps up the workload refactor with the fixes that came out of the multi-agent code review (see docs/plans/workload-refactor.md "What actually shipped").

Backend:
- store.ReconcileContainer: separate write path so the 30s reconciler tick no longer overwrites deployer-owned fields (subdomain, proxy_route_id, npm_proxy_id, image_tag).
- Container.stage_id column + index; ListProxyRoutes / ListContainersByStageID join via stage_id (survives stage rename), with legacy fallback to (project_id, role=stage_name).
- Reconciler: workload-existence check (rejects forged tinyforge.workload.id labels), skips inventing project-kind rows, child-context cancel before wg.Wait() on shutdown.
- Transactional CRUD across projects / stacks / static_sites: parent UPDATE and workload sync land in one transaction so secret rotations are durable.
- Webhook routing reads exclusively through workloads.webhook_secret; legacy GetProjectByWebhookSecret / GetStaticSiteByWebhookSecret fallback removed.
- store.GetStackByComposeProjectName + indexed lookup (no more full-table stack scan per compose container per tick).
- store.ListMissingSweepRows: filtered query for the missing-sweep.
- /api/instances/* handlers verify (workload_id, role) match URL (project_id, stage_name) before mutating; closes the cross-project hijack the security review flagged.
- extra_json no longer referenced from Go (column kept on disk for now).

Frontend:
- WorkloadContainers.svelte: generic detail-page panel reusable by stack and site detail pages.
- Containers page polish: client-side kind/state filters over an unfiltered fetch, URL-synced filters, race-safe loads via sequence number, EN+RU i18n, sidebar counter via navCounts.containers.

Misc:
- scripts/dev-server.sh: tolerate empty netstat grep result.
- .gitignore: ignore docker-watcher binaries, .claude/worktrees/, .facts-sync.json.
Workload Refactor — Compressed Plan
Status: Shipped (with explicit deferrals; see "What actually shipped" at the bottom)
Owner: alexei.dolgolyov
Date: 2026-05-07
Last updated: 2026-05-09 (post multi-agent review fixes)
Goal
Unify Project, Stack, and StaticSite under a single Workload primitive, and introduce a normalized containers index so every Tinyforge-managed container has one canonical row. This unblocks a global Containers view today and lets future workload kinds (cron jobs, one-shot tasks, databases-as-resource, functions) plug in without another tab/store/deployer branch.
Why this is the compressed plan
The original 8-PR plan was designed for a live system with dual-writes and soak periods. Tinyforge has no production users yet, so all defenses against live runtime state collapse: no external label consumers, no third-party CI hitting webhook URLs, no orphaned containers to recover. Everything ships in 3 PRs against a clean slate. Solo-dev reversibility is preserved by branching, not by dual-write gymnastics.
Target architecture
- Workload is the unifying primitive with kind ∈ {project, stack, site, …}. Each existing Project/Stack/StaticSite becomes a Workload row.
- containers is a normalized index: every Tinyforge-managed container has one row with workload_id, workload_kind, role, Docker container ID, host, state, last_seen.
- Optional apps table (thin nullable app_id on Workload) added empty; UI gated behind a feature flag, defer indefinitely until pull.
- Stable Docker labels: tinyforge.workload.id, tinyforge.workload.kind, tinyforge.role, tinyforge.managed. Legacy tinyforge.project / tinyforge.stage / tinyforge.instance-id are removed in the same wave.
- Global /containers UI route; per-workload container panel becomes a shared <WorkloadContainers> component reused by project, stack, and site detail pages.
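To make the label scheme above concrete, here is a minimal Go sketch that builds the four stable labels for one container. The constant names follow the ones this plan adds to internal/docker/client.go; the helper name workloadLabels and the value "true" for tinyforge.managed are assumptions made for illustration, since the plan only fixes the label keys.

```go
package main

import "fmt"

// Label keys named in the plan.
const (
	LabelWorkloadID   = "tinyforge.workload.id"
	LabelWorkloadKind = "tinyforge.workload.kind"
	LabelRole         = "tinyforge.role"
	LabelManaged      = "tinyforge.managed"
)

// workloadLabels is a hypothetical helper that returns the label set a
// deployer would attach to a container it creates.
func workloadLabels(workloadID, kind, role string) map[string]string {
	return map[string]string{
		LabelWorkloadID:   workloadID,
		LabelWorkloadKind: kind,
		LabelRole:         role,   // stage name, service name, or "" for a site
		LabelManaged:      "true", // assumed value; the plan only names the key
	}
}

func main() {
	labels := workloadLabels("wl-123", "project", "staging")
	fmt.Println(labels[LabelWorkloadID], labels[LabelWorkloadKind], labels[LabelRole])
}
```

Keeping the keys in one place means the deployer, stack manager, and reconciler cannot drift apart on spelling.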
Schema
Appended to internal/store/store.go::runMigrations() as additive CREATE TABLE statements (idempotent via CREATE TABLE IF NOT EXISTS).
CREATE TABLE IF NOT EXISTS workloads (
    id TEXT PRIMARY KEY,
    kind TEXT NOT NULL,                     -- 'project' | 'stack' | 'site'
    ref_id TEXT NOT NULL,                   -- FK into projects/stacks/static_sites by kind
    name TEXT NOT NULL,
    app_id TEXT,                            -- nullable FK into apps.id
    notification_url TEXT NOT NULL DEFAULT '',
    notification_secret TEXT NOT NULL DEFAULT '',
    webhook_secret TEXT NOT NULL DEFAULT '',
    webhook_signing_secret TEXT NOT NULL DEFAULT '',
    webhook_require_signature INTEGER NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    UNIQUE(kind, ref_id)
);
CREATE INDEX IF NOT EXISTS idx_workloads_app_id ON workloads(app_id);
CREATE INDEX IF NOT EXISTS idx_workloads_kind ON workloads(kind);

CREATE TABLE IF NOT EXISTS containers (
    id TEXT PRIMARY KEY,
    workload_id TEXT NOT NULL,
    workload_kind TEXT NOT NULL,            -- denormalized for filtered queries
    role TEXT NOT NULL,                     -- stage name (project), service name (stack), '' (site)
    container_id TEXT NOT NULL DEFAULT '',  -- Docker ID, '' between create+start
    image_ref TEXT NOT NULL DEFAULT '',
    host TEXT NOT NULL DEFAULT 'local',
    state TEXT NOT NULL DEFAULT '',         -- running | stopped | failed | removing | missing
    port INTEGER NOT NULL DEFAULT 0,
    last_seen_at TEXT NOT NULL DEFAULT '',
    extra_json TEXT NOT NULL DEFAULT '{}',  -- {subdomain, npm_proxy_id, proxy_route_id, ...}
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_containers_workload ON containers(workload_id);
CREATE INDEX IF NOT EXISTS idx_containers_state ON containers(state);
CREATE INDEX IF NOT EXISTS idx_containers_container_id ON containers(container_id);

CREATE TABLE IF NOT EXISTS apps (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
extra_json carries kind-specific fields (subdomain, npm_proxy_id, proxy_route_id) so the spine stays narrow. SQLite JSON1 is required for queries against extra_json; verify the driver in go.mod supports it before committing — fall back to dedicated columns if not.
PR 1 — Spine: schema, Workload package, reconciler
Single PR, lands the data layer end-to-end. No dual-writes; project/stack/site CRUD writes directly to workloads.
New files
- internal/store/workloads.go: CreateWorkload, GetWorkloadByID, GetWorkloadByRef(kind, refID), ListWorkloads, UpdateWorkload, DeleteWorkload.
- internal/store/containers.go: UpsertContainer, GetContainerByDockerID, ListContainersByWorkload, ListContainers(filter), MarkContainerMissing, new ListProxyRoutes (mirrors the join shape from internal/store/instances.go::ListProxyRoutes, reading extra_json via json_extract).
- internal/store/apps.go: minimal CRUD; not wired anywhere yet.
- internal/workload/workload.go: Workload interface (ID, Kind, Name, Deploy, Stop, Start, Delete, Containers).
- internal/workload/adapters/project_adapter.go: wraps internal/deployer.
- internal/workload/adapters/stack_adapter.go: wraps internal/stack/manager.go.
- internal/workload/adapters/site_adapter.go: wraps internal/staticsite/manager.go.
- internal/reconciler/reconciler.go: single writer to containers. Reads docker ps --filter label=tinyforge.managed, groups by (workload.id, role), upserts rows, marks absent rows state='missing'. Boot-time one-shot run + 30s tick.
- internal/reconciler/reconciler_test.go: table-driven tests with a fake Docker client.
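The reconciler pass described in the list above (group by (workload.id, role), upsert, mark absent rows missing) can be sketched in memory as follows. This is an illustration under stated assumptions, not the shipped code: the observed, row, and key types and the reconcile function are hypothetical stand-ins for the Docker listing and the containers table, and the sketch keeps only one container per key.

```go
package main

import "fmt"

// observed is a simplified stand-in for one entry of the docker ps listing.
type observed struct {
	DockerID string
	State    string
	Labels   map[string]string
}

// row is a simplified stand-in for one row of the containers table.
type row struct {
	WorkloadID, Role, DockerID, State string
}

// key is the grouping key used by the reconciler: (workload id, role).
type key struct{ WorkloadID, Role string }

// reconcile upserts one row per key seen on the host and marks every
// previously known row that was not seen as "missing".
func reconcile(rows map[key]row, seen []observed) {
	present := make(map[key]bool, len(seen))
	for _, c := range seen {
		if c.Labels["tinyforge.managed"] == "" {
			continue // not a Tinyforge-managed container
		}
		k := key{c.Labels["tinyforge.workload.id"], c.Labels["tinyforge.role"]}
		rows[k] = row{k.WorkloadID, k.Role, c.DockerID, c.State}
		present[k] = true
	}
	for k, r := range rows {
		if !present[k] {
			r.State = "missing"
			rows[k] = r
		}
	}
}

func main() {
	rows := map[key]row{
		{"wl-1", "web"}: {"wl-1", "web", "abc", "running"},
		{"wl-2", ""}:    {"wl-2", "", "def", "running"},
	}
	seen := []observed{{
		DockerID: "abc2",
		State:    "running",
		Labels: map[string]string{
			"tinyforge.managed":     "true",
			"tinyforge.workload.id": "wl-1",
			"tinyforge.role":        "web",
		},
	}}
	reconcile(rows, seen)
	fmt.Println(rows[key{"wl-1", "web"}].DockerID, rows[key{"wl-2", ""}].State)
}
```

Because the function takes plain data in and mutates a map, it lends itself to the table-driven tests with a fake Docker client that the plan calls for.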
Modified files
- internal/store/store.go::runMigrations: append the three CREATE TABLE statements (after line ~165 where the existing migrations end).
- internal/store/models.go: add Workload, Container, App structs.
- internal/store/projects.go: CreateProject, UpdateProject, DeleteProject wrap the write in s.db.Begin() and also write the matching workloads row. Webhook/notification secret setters update workloads.webhook_secret / webhook_signing_secret / notification_secret directly.
- internal/store/stacks.go: same Workload write on CreateStack / UpdateStack / DeleteStack.
- internal/store/static_sites.go: same.
- internal/docker/client.go: add label constants LabelWorkloadID, LabelWorkloadKind, LabelRole, LabelManaged. Remove the old LabelProject, LabelStage, LabelInstanceID writes from the deployer.
- internal/deployer/deployer.go (label injection ~line 388): emit only the new labels.
- internal/deployer/bluegreen.go (~line 97): same.
- internal/stack/manager.go: after docker compose up, stamp new labels on each compose-managed container via docker container update --label-add. Compose's own com.docker.compose.service becomes role.
- internal/staticsite/manager.go: stamp new labels at container start.
- internal/store/instances.go: delete this file. The deployer no longer creates instance rows; reconciler owns container state.
- internal/api/instances.go: delete or alias to /api/containers filtered by workload. Solo dev → delete is cleaner.
- internal/api/proxies.go: switch the ListProxyRoutes import to containers.ListProxyRoutes.
- internal/api/docker.go::buildActiveImagesSet (~line 251): replace the ListAllInstances walk with a single containers.image_ref query.
- internal/api/stale.go, internal/stale/scanner.go: read from containers instead of instances.
- internal/webhook/matcher.go: query workloads.webhook_secret directly.
- cmd/server/main.go: start the reconciler goroutine after store.New. Drop any startup code that touched instances.
Tests
- Extend internal/store/store_test.go with TestCreateProjectAlsoCreatesWorkload, TestDeleteProjectCascadesWorkload, TestUpsertContainerIdempotent, TestListProxyRoutesShape.
- New internal/reconciler/reconciler_test.go with a dockerClient interface and a fake; assert that a slice of types.Container produces the expected containers upserts.
- Run the existing test suite under -race.
Deliverable
System builds, deploys a project end-to-end, deploys a stack end-to-end, deploys a static site end-to-end. containers table reflects reality after each deploy and after a 30s reconciler tick. The legacy instances table is gone.
PR 2 — API + frontend
New files
- internal/api/workloads.go: GET /api/workloads, GET /api/workloads/{id}, GET /api/workloads/{id}/containers, PATCH /api/workloads/{id} (sets app_id and notification/webhook config).
- internal/api/containers.go: GET /api/containers?workload_id=&kind=&state=&app_id=, GET /api/containers/{id}.
- internal/api/apps.go: GET /api/apps, POST /api/apps, PATCH /api/apps/{id}, DELETE /api/apps/{id} (gated by settings flag features.apps_grouping=true).
- web/src/routes/containers/+page.svelte: global filterable table. Reuses table patterns from web/src/routes/proxies/+page.svelte and web/src/routes/containers/stale/+page.svelte (the existing stale/ route stays untouched).
- web/src/lib/components/WorkloadContainers.svelte: shared container panel. Takes workloadId prop, hits /api/workloads/{id}/containers. Handles 1..N container rows.
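As an illustration of how the GET /api/containers query parameters listed above could map onto ListContainers(filter), here is a small Go sketch. The ContainerFilter type, parseContainerFilter, the where method, and the table aliases c (containers) and w (workloads) are all assumptions; the plan names only the endpoint and its four parameters.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ContainerFilter is a hypothetical shape for the ListContainers filter.
// Empty fields mean "do not filter on this column".
type ContainerFilter struct {
	WorkloadID, Kind, State, AppID string
}

// parseContainerFilter reads the four query parameters named in the plan.
func parseContainerFilter(q url.Values) ContainerFilter {
	return ContainerFilter{
		WorkloadID: q.Get("workload_id"),
		Kind:       q.Get("kind"),
		State:      q.Get("state"),
		AppID:      q.Get("app_id"),
	}
}

// where builds a parameterized WHERE clause. It assumes a query of the form
// "FROM containers c JOIN workloads w ON w.id = c.workload_id".
func (f ContainerFilter) where() (string, []any) {
	var conds []string
	var args []any
	add := func(col, val string) {
		if val != "" {
			conds = append(conds, col+" = ?")
			args = append(args, val)
		}
	}
	add("c.workload_id", f.WorkloadID)
	add("c.workload_kind", f.Kind)
	add("c.state", f.State)
	add("w.app_id", f.AppID)
	if len(conds) == 0 {
		return "", nil
	}
	return "WHERE " + strings.Join(conds, " AND "), args
}

func main() {
	u, err := url.Parse("/api/containers?kind=stack&state=running")
	if err != nil {
		panic(err)
	}
	clause, args := parseContainerFilter(u.Query()).where()
	fmt.Println(clause, args)
}
```

Values are always passed as bind arguments, never concatenated into the SQL text, so user-supplied filter strings cannot alter the query.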
Modified files
- internal/api/router.go: register the new endpoints. Remove /api/instances registration.
- web/src/routes/projects/[id]/+page.svelte: replace the inline instance list with <WorkloadContainers workloadId={...}/>.
- web/src/routes/stacks/[id]/+page.svelte: same.
- web/src/routes/sites/[id]/+page.svelte: same.
- Top nav component (find under web/src/lib/components/): insert a "Containers" tab between "Projects" and "Stacks". Existing tabs stay.
- web/src/lib/api.ts (or wherever API client functions live): add listWorkloads, getWorkload, listContainers, getContainer, listApps. Remove instance-shaped helpers.
- web/src/lib/types.ts: add Workload, Container, App types. Remove Instance once unreferenced.
Deliverable
User-visible: a Containers tab in the top nav showing every running container with kind/state/workload filters, links into the owning project/stack/site detail page, and a per-workload container panel that looks identical on all three detail pages.
PR 3 — Polish + optional Apps UI
Defer indefinitely if no pull. Lands as a single PR when wanted.
Scope
- Apps UI: web/src/routes/apps/+page.svelte, [id]/+page.svelte. Workload detail pages get an "App" dropdown to assign app_id. Gated by features.apps_grouping=true in settings.
- Drop any leftover dead code referencing Instance types.
- Documentation: update CLAUDE.md and README.md to describe the Workload model.
- Optional: consolidate internal/deployer and internal/stack/manager into a single orchestrator. Out of scope for this refactor; adapters wrap the existing kind-specific code and that's fine. Revisit only if the duplication starts hurting.
What's explicitly deferred
- Deployer + stack-manager consolidation.
- Apps UI (schema added in PR 1, UI in PR 3 behind flag).
- Multi-host containers (containers.host exists but is always 'local').
- Workload-kind plugin model: the adapter registry has three hardcoded entries.
- Webhook secret handling for old per-project URLs that may already be in CI configs (no users yet → don't care).
Risks (compressed)
- SQLite JSON1 availability. Verify the driver in go.mod supports json_extract before committing to extra_json. If not, hoist subdomain, npm_proxy_id, proxy_route_id to dedicated columns on containers.
- ListProxyRoutes shape regression. The new query reads from containers + workloads instead of instances + projects + stages. Worth a golden-output test before flipping internal/api/proxies.go over.
- Stack containers and label stamping. The plan relies on docker container update --label-add to label compose-managed containers post-up. Docker's documented docker container update options cover resource limits and restart policy, and container labels are normally fixed at creation time, so this flag may not be available on any engine version. If it is not supported, fall back to relying on com.docker.compose.project + com.docker.compose.service for reconciler joins.
- Boot-time backfill from docker ps. First run needs to populate containers from currently-running containers using the legacy tinyforge.instance-id and com.docker.compose.project labels (since pre-refactor containers don't have the new labels). Solo-dev workaround: docker compose down test workloads, run the new binary against an empty Docker host, redeploy.
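The label-stamping fallback in the risks above amounts to choosing a join key per container: prefer the Tinyforge labels, otherwise use the Compose ones. A minimal Go sketch of that choice follows; the joinKey type and resolveJoinKey function are hypothetical names, and mapping a compose project name back to a stack row is left to the store.

```go
package main

import "fmt"

// joinKey says how a container should be matched to a workload.
type joinKey struct {
	Source string // "tinyforge" or "compose"
	Owner  string // workload id, or compose project name
	Role   string // tinyforge.role, or compose service name
}

// resolveJoinKey prefers the Tinyforge labels and falls back to the labels
// Docker Compose sets on every container it manages. The bool result is
// false when the container carries neither label set.
func resolveJoinKey(labels map[string]string) (joinKey, bool) {
	if id := labels["tinyforge.workload.id"]; id != "" {
		return joinKey{"tinyforge", id, labels["tinyforge.role"]}, true
	}
	if project := labels["com.docker.compose.project"]; project != "" {
		return joinKey{"compose", project, labels["com.docker.compose.service"]}, true
	}
	return joinKey{}, false
}

func main() {
	k, ok := resolveJoinKey(map[string]string{
		"com.docker.compose.project": "blog",
		"com.docker.compose.service": "db",
	})
	fmt.Println(k, ok)
}
```

With this shape the reconciler works whether or not post-up label stamping succeeds, at the cost of one extra lookup from compose project name to stack.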
Concrete file paths
Modified:
- internal/store/store.go (migrations at line ~75–165)
- internal/store/projects.go, stacks.go, static_sites.go, models.go, store_test.go
- internal/docker/client.go
- internal/deployer/deployer.go (~line 388), internal/deployer/bluegreen.go (~line 97)
- internal/stack/manager.go, internal/staticsite/manager.go
- internal/api/router.go, proxies.go, docker.go (buildActiveImagesSet at line 251), stale.go
- internal/stale/scanner.go, internal/webhook/matcher.go
- cmd/server/main.go
- web/src/routes/projects/[id]/+page.svelte, stacks/[id]/+page.svelte, sites/[id]/+page.svelte
- web/src/lib/api.ts, web/src/lib/types.ts
- Top nav component in web/src/lib/components/
Created:
- internal/store/workloads.go, containers.go, apps.go
- internal/workload/workload.go, adapters/project_adapter.go, adapters/stack_adapter.go, adapters/site_adapter.go
- internal/reconciler/reconciler.go, reconciler_test.go
- internal/api/workloads.go, containers.go, apps.go
- web/src/routes/containers/+page.svelte
- web/src/lib/components/WorkloadContainers.svelte
Deleted:
- internal/store/instances.go
- internal/api/instances.go
What actually shipped (2026-05-09)
After a multi-agent code review caught several issues, the refactor landed with the following deviations from the original plan. They are documented here so a future reader doesn't have to reconstruct them from git log.
Deferred / dropped
- internal/workload/ package + adapters. The plan called for a Workload interface (Deploy, Stop, Start, Delete, Containers) with project_adapter.go, stack_adapter.go, site_adapter.go. Not built. The adapters would have been thin pass-throughs to the existing kind-specific code; the duplication is real but small and the per-kind paths still type-check cleanly. The data-layer "Workload" (DB row) is the only Workload primitive today. Revisit if the per-kind branching becomes painful.
- internal/api/instances.go URL space. Plan said "delete or alias to /api/containers." Kept alive, but every handler that mutates a container now calls resolveAndAuthorizeInstance to verify that the row's (workload_id, role) matches the URL's (project_id, stage_name). This closes the cross-project hijack the security review flagged. URL renaming deferred until the frontend InstanceCard is renamed too (next refactor wave).
- InstanceCard.svelte rename. The component is now generic enough to be ContainerCard, but the rename would touch 3+ call sites and i18n keys. Deferred.
- extra_json SQL column. Schema still has the column (NOT NULL DEFAULT '{}'); Go code no longer references it (struct field, scan, INSERT, UPDATE all dropped). When/if a kind-specific need surfaces, hoist a dedicated column rather than re-introducing JSON1.
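The ownership check performed by resolveAndAuthorizeInstance can be pictured with the following Go sketch. It is not the shipped handler: containerRow, authorizeInstance, errForbidden, and the workloadByProject map (standing in for a lookup of the project's workload row) are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// containerRow holds the fields of a containers row relevant to the check.
type containerRow struct{ ID, WorkloadID, Role string }

var errForbidden = errors.New("container does not belong to this project stage")

// authorizeInstance rejects a mutation unless the row's (workload_id, role)
// matches the (project_id, stage_name) taken from the request URL.
// workloadByProject maps a project id to its workload id.
func authorizeInstance(row containerRow, workloadByProject map[string]string, projectID, stageName string) error {
	workloadID, ok := workloadByProject[projectID]
	if !ok || row.WorkloadID != workloadID || row.Role != stageName {
		return errForbidden
	}
	return nil
}

func main() {
	workloads := map[string]string{"proj-a": "wl-a", "proj-b": "wl-b"}
	row := containerRow{ID: "c1", WorkloadID: "wl-a", Role: "staging"}

	// Same project and stage: allowed.
	fmt.Println(authorizeInstance(row, workloads, "proj-a", "staging"))
	// Container id replayed under another project's URL: rejected.
	fmt.Println(authorizeInstance(row, workloads, "proj-b", "staging"))
}
```

The second call models the hijack scenario: a valid container ID is presented under a URL for a project that does not own it, and the handler refuses before any mutation happens.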
Built but not in the original plan
- Container.stage_id column + index + ListProxyRoutes / ListContainersByStageID join. Survives stage renames; the original plan joined on stages.name = containers.role, which would orphan rows on rename. The deployer populates stage_id for project containers; stack/site rows leave it empty.
- store.ReconcileContainer: separate write path for the reconciler. The original UpsertContainer ON CONFLICT clause overwrote subdomain, proxy_route_id, npm_proxy_id, image_tag from the reconciler's empty values on every 30s tick, silently wiping deployer state. ReconcileContainer only updates Docker-derived fields on conflict (container_id, image_ref, state, port, last_seen_at, updated_at).
- Workload-existence check in the reconciler: a tinyforge.workload.id label that doesn't resolve to a known workload is now rejected. Anyone with Docker socket access could otherwise spawn a container with a forged label and steal the canonical row for an existing workload.
- Project-kind row invention skipped. When the reconciler sees a container with tinyforge.workload.kind=project and no existing row matches the docker container ID, it skips the upsert (deployer is the authoritative writer for project rows). Inventing a deterministic-key row would race with MaxInstances > 1 deploys.
- Reconciler shutdown ordering: Stop() cancels its child context before wg.Wait() so a hung docker ps doesn't block process shutdown.
- Transactional CRUD + workload sync. Every Create*, Update*, Delete*, and Set*Secret path on projects / stacks / static_sites now wraps the parent UPDATE and the workload row sync in a single transaction. Closes the rotation-durability gap the security review flagged.
- Workload-only webhook lookup. The legacy fallback (GetProjectByWebhookSecret, GetStaticSiteByWebhookSecret) is gone; webhook routing reads exclusively through workloads.webhook_secret, so a rotation that didn't commit doesn't get silently accepted.
- store.GetStackByComposeProjectName + indexed lookup. Reconciler used to do a full-table stack scan per compose container per tick.
- store.ListMissingSweepRows: filtered query (container_id != '' AND state != 'missing') so the missing-sweep reads only candidate rows instead of the whole index.
- web/src/lib/components/WorkloadContainers.svelte: generic detail-page panel reusable by stack and site detail pages. Project detail keeps its stage-grouped InstanceCard layout (containers there are sharded per-stage, not flat).
- Containers page polish: kind/state filters now apply client-side over an unfiltered fetch (so tab counters reflect the whole population), URL-synced filters (?kind=stack&state=running) for shareable links, race-safe loads via a sequence number, full i18n with EN+RU strings, and a counter badge in the sidebar via navCounts.containers.
- stage_id migration. New rows get stage_id from the deployer; legacy rows fall back to the (project_id, role=stage_name) join inside ListContainersByStageID.
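The difference between the original upsert and store.ReconcileContainer is easiest to see in a small in-memory model. The Go sketch below is an assumption-laden illustration, not the store code: the Container struct fields and the reconcileContainer function are hypothetical, and a map stands in for the containers table keyed by row id.

```go
package main

import "fmt"

// Container models a containers row, split by who owns each field.
type Container struct {
	ID string
	// Docker-derived fields: the reconciler may overwrite these.
	ContainerID, ImageRef, State string
	Port                         int
	LastSeenAt, UpdatedAt        string
	// Deployer-owned fields: the reconciler must leave these alone.
	Subdomain, ProxyRouteID, NPMProxyID, ImageTag string
}

// reconcileContainer inserts the row when it is new and, on conflict,
// copies only the Docker-derived fields onto the existing row.
func reconcileContainer(index map[string]Container, in Container) {
	cur, exists := index[in.ID]
	if !exists {
		index[in.ID] = in
		return
	}
	cur.ContainerID = in.ContainerID
	cur.ImageRef = in.ImageRef
	cur.State = in.State
	cur.Port = in.Port
	cur.LastSeenAt = in.LastSeenAt
	cur.UpdatedAt = in.UpdatedAt
	index[in.ID] = cur
}

func main() {
	index := map[string]Container{
		"row-1": {ID: "row-1", ContainerID: "abc", State: "running", Subdomain: "blog", ImageTag: "v1"},
	}
	// A reconciler tick knows nothing about subdomain or image_tag,
	// so those fields arrive empty.
	reconcileContainer(index, Container{ID: "row-1", ContainerID: "abc", State: "stopped", LastSeenAt: "2026-05-09T10:00:00Z"})
	got := index["row-1"]
	fmt.Println(got.State, got.Subdomain, got.ImageTag)
}
```

A whole-row overwrite on conflict would have replaced Subdomain and ImageTag with the tick's empty values, which is the state-wiping behavior the separate write path removes.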