# Workload Refactor — Compressed Plan Status: Shipped (with explicit deferrals — see "What actually shipped" at the bottom) Owner: alexei.dolgolyov Date: 2026-05-07 Last updated: 2026-05-09 (post multi-agent review fixes) ## Goal Unify `Project`, `Stack`, and `StaticSite` under a single `Workload` primitive, and introduce a normalized `containers` index so every Tinyforge-managed container has one canonical row. This unblocks a global Containers view today and lets future workload kinds (cron jobs, one-shot tasks, databases-as-resource, functions) plug in without another tab/store/deployer branch. ## Why this is the compressed plan The original 8-PR plan was designed for a live system with dual-writes and soak periods. Tinyforge has no production users yet, so all defenses against live runtime state collapse: no external label consumers, no third-party CI hitting webhook URLs, no orphaned containers to recover. Everything ships in 3 PRs against a clean slate. Solo-dev reversibility is preserved by branching, not by dual-write gymnastics. ## Target architecture - `Workload` is the unifying primitive with `kind ∈ {project, stack, site, …}`. Each existing Project/Stack/StaticSite becomes a Workload row. - `containers` is a normalized index: every Tinyforge-managed container has one row with `workload_id`, `workload_kind`, `role`, Docker container ID, host, state, last_seen. - Optional `apps` table (thin nullable `app_id` on Workload) added empty; UI gated behind a feature flag, defer indefinitely until pull. - Stable Docker labels: `tinyforge.workload.id`, `tinyforge.workload.kind`, `tinyforge.role`, `tinyforge.managed`. Legacy `tinyforge.project` / `tinyforge.stage` / `tinyforge.instance-id` are removed in the same wave. - Global `/containers` UI route; per-workload container panel becomes a shared `` component reused by project, stack, and site detail pages. ## Schema Appended to `internal/store/store.go::runMigrations()` as additive `CREATE TABLE` statements (idempotent via `CREATE TABLE IF NOT EXISTS`). ```sql CREATE TABLE IF NOT EXISTS workloads ( id TEXT PRIMARY KEY, kind TEXT NOT NULL, -- 'project' | 'stack' | 'site' ref_id TEXT NOT NULL, -- FK into projects/stacks/static_sites by kind name TEXT NOT NULL, app_id TEXT, -- nullable FK into apps.id notification_url TEXT NOT NULL DEFAULT '', notification_secret TEXT NOT NULL DEFAULT '', webhook_secret TEXT NOT NULL DEFAULT '', webhook_signing_secret TEXT NOT NULL DEFAULT '', webhook_require_signature INTEGER NOT NULL DEFAULT 0, created_at TEXT NOT NULL, updated_at TEXT NOT NULL, UNIQUE(kind, ref_id) ); CREATE INDEX IF NOT EXISTS idx_workloads_app_id ON workloads(app_id); CREATE INDEX IF NOT EXISTS idx_workloads_kind ON workloads(kind); CREATE TABLE IF NOT EXISTS containers ( id TEXT PRIMARY KEY, workload_id TEXT NOT NULL, workload_kind TEXT NOT NULL, -- denormalized for filtered queries role TEXT NOT NULL, -- stage name (project), service name (stack), '' (site) container_id TEXT NOT NULL DEFAULT '', -- Docker ID, '' between create+start image_ref TEXT NOT NULL DEFAULT '', host TEXT NOT NULL DEFAULT 'local', state TEXT NOT NULL DEFAULT '', -- running | stopped | failed | removing | missing port INTEGER NOT NULL DEFAULT 0, last_seen_at TEXT NOT NULL DEFAULT '', extra_json TEXT NOT NULL DEFAULT '{}', -- {subdomain, npm_proxy_id, proxy_route_id, ...} created_at TEXT NOT NULL, updated_at TEXT NOT NULL ); CREATE INDEX IF NOT EXISTS idx_containers_workload ON containers(workload_id); CREATE INDEX IF NOT EXISTS idx_containers_state ON containers(state); CREATE INDEX IF NOT EXISTS idx_containers_container_id ON containers(container_id); CREATE TABLE IF NOT EXISTS apps ( id TEXT PRIMARY KEY, name TEXT NOT NULL, description TEXT NOT NULL DEFAULT '', created_at TEXT NOT NULL, updated_at TEXT NOT NULL ); ``` `extra_json` carries kind-specific fields (`subdomain`, `npm_proxy_id`, `proxy_route_id`) so the spine stays narrow. SQLite JSON1 is required for queries against `extra_json`; verify the driver in `go.mod` supports it before committing — fall back to dedicated columns if not. ## PR 1 — Spine: schema, Workload package, reconciler Single PR, lands the data layer end-to-end. No dual-writes; project/stack/site CRUD writes directly to `workloads`. ### New files - `internal/store/workloads.go` — `CreateWorkload`, `GetWorkloadByID`, `GetWorkloadByRef(kind, refID)`, `ListWorkloads`, `UpdateWorkload`, `DeleteWorkload`. - `internal/store/containers.go` — `UpsertContainer`, `GetContainerByDockerID`, `ListContainersByWorkload`, `ListContainers(filter)`, `MarkContainerMissing`, new `ListProxyRoutes` (mirrors the join shape from `internal/store/instances.go::ListProxyRoutes`, reading `extra_json` via `json_extract`). - `internal/store/apps.go` — minimal CRUD; not wired anywhere yet. - `internal/workload/workload.go` — `Workload` interface (`ID`, `Kind`, `Name`, `Deploy`, `Stop`, `Start`, `Delete`, `Containers`). - `internal/workload/adapters/project_adapter.go` — wraps `internal/deployer`. - `internal/workload/adapters/stack_adapter.go` — wraps `internal/stack/manager.go`. - `internal/workload/adapters/site_adapter.go` — wraps `internal/staticsite/manager.go`. - `internal/reconciler/reconciler.go` — single writer to `containers`. Reads `docker ps --filter label=tinyforge.managed`, groups by `(workload.id, role)`, upserts rows, marks absent rows `state='missing'`. Boot-time one-shot run + 30s tick. - `internal/reconciler/reconciler_test.go` — table-driven tests with a fake Docker client. ### Modified files - `internal/store/store.go::runMigrations` — append the three `CREATE TABLE` statements (after line ~165 where the existing migrations end). - `internal/store/models.go` — add `Workload`, `Container`, `App` structs. - `internal/store/projects.go` — `CreateProject`, `UpdateProject`, `DeleteProject` wrap the write in `s.db.Begin()` and also write the matching `workloads` row. Webhook/notification secret setters update `workloads.webhook_secret` / `webhook_signing_secret` / `notification_secret` directly. - `internal/store/stacks.go` — same Workload write on `CreateStack` / `UpdateStack` / `DeleteStack`. - `internal/store/static_sites.go` — same. - `internal/docker/client.go` — add label constants `LabelWorkloadID`, `LabelWorkloadKind`, `LabelRole`, `LabelManaged`. **Remove** the old `LabelProject`, `LabelStage`, `LabelInstanceID` writes from the deployer. - `internal/deployer/deployer.go` (label injection ~line 388) — emit only the new labels. - `internal/deployer/bluegreen.go` (~line 97) — same. - `internal/stack/manager.go` — after `docker compose up`, stamp new labels on each compose-managed container via `docker container update --label-add`. Compose's own `com.docker.compose.service` becomes `role`. - `internal/staticsite/manager.go` — stamp new labels at container start. - `internal/store/instances.go` — **delete this file**. The deployer no longer creates instance rows; reconciler owns container state. - `internal/api/instances.go` — **delete or alias** to `/api/containers` filtered by workload. Solo dev → delete is cleaner. - `internal/api/proxies.go` — switch the `ListProxyRoutes` import to `containers.ListProxyRoutes`. - `internal/api/docker.go::buildActiveImagesSet` (~line 251) — replace the `ListAllInstances` walk with a single `containers.image_ref` query. - `internal/api/stale.go`, `internal/stale/scanner.go` — read from `containers` instead of `instances`. - `internal/webhook/matcher.go` — query `workloads.webhook_secret` directly. - `cmd/server/main.go` — start the reconciler goroutine after `store.New`. Drop any startup code that touched `instances`. ### Tests - Extend `internal/store/store_test.go` with `TestCreateProjectAlsoCreatesWorkload`, `TestDeleteProjectCascadesWorkload`, `TestUpsertContainerIdempotent`, `TestListProxyRoutesShape`. - New `internal/reconciler/reconciler_test.go` with a `dockerClient` interface and a fake — assert that a slice of `types.Container` produces the expected `containers` upserts. - Run the existing test suite under `-race`. ### Deliverable System builds, deploys a project end-to-end, deploys a stack end-to-end, deploys a static site end-to-end. `containers` table reflects reality after each deploy and after a 30s reconciler tick. The legacy `instances` table is gone. ## PR 2 — API + frontend ### New files - `internal/api/workloads.go` — `GET /api/workloads`, `GET /api/workloads/{id}`, `GET /api/workloads/{id}/containers`, `PATCH /api/workloads/{id}` (sets `app_id` and notification/webhook config). - `internal/api/containers.go` — `GET /api/containers?workload_id=&kind=&state=&app_id=`, `GET /api/containers/{id}`. - `internal/api/apps.go` — `GET /api/apps`, `POST /api/apps`, `PATCH /api/apps/{id}`, `DELETE /api/apps/{id}` (gated by settings flag `features.apps_grouping=true`). - `web/src/routes/containers/+page.svelte` — global filterable table. Reuses table patterns from `web/src/routes/proxies/+page.svelte` and `web/src/routes/containers/stale/+page.svelte` (the existing `stale/` route stays untouched). - `web/src/lib/components/WorkloadContainers.svelte` — shared container panel. Takes `workloadId` prop, hits `/api/workloads/{id}/containers`. Handles 1..N container rows. ### Modified files - `internal/api/router.go` — register the new endpoints. Remove `/api/instances` registration. - `web/src/routes/projects/[id]/+page.svelte` — replace the inline instance list with ``. - `web/src/routes/stacks/[id]/+page.svelte` — same. - `web/src/routes/sites/[id]/+page.svelte` — same. - Top nav component (find under `web/src/lib/components/`) — insert a "Containers" tab between "Projects" and "Stacks". Existing tabs stay. - `web/src/lib/api.ts` (or wherever API client functions live) — add `listWorkloads`, `getWorkload`, `listContainers`, `getContainer`, `listApps`. Remove instance-shaped helpers. - `web/src/lib/types.ts` — add `Workload`, `Container`, `App` types. Remove `Instance` once unreferenced. ### Deliverable User-visible: a `Containers` tab in the top nav showing every running container with kind/state/workload filters, links into the owning project/stack/site detail page, and a per-workload container panel that looks identical on all three detail pages. ## PR 3 — Polish + optional Apps UI Defer indefinitely if no pull. Lands as a single PR when wanted. ### Scope - Apps UI: `web/src/routes/apps/+page.svelte`, `[id]/+page.svelte`. Workload detail pages get an "App" dropdown to assign `app_id`. Gated by `features.apps_grouping=true` in settings. - Drop any leftover dead code referencing `Instance` types. - Documentation: update `CLAUDE.md` and `README.md` to describe the Workload model. - Optional: consolidate `internal/deployer` and `internal/stack/manager` into a single orchestrator. **Out of scope for this refactor** — adapters wrap the existing kind-specific code and that's fine. Revisit only if the duplication starts hurting. ## What's explicitly deferred - Deployer + stack-manager consolidation. - Apps UI (schema added in PR 1, UI in PR 3 behind flag). - Multi-host containers (`containers.host` exists but is always `'local'`). - Workload-kind plugin model — the adapter registry has three hardcoded entries. - Webhook secret handling for old per-project URLs that may already be in CI configs (no users yet → don't care). ## Risks (compressed) - **SQLite JSON1 availability.** Verify the driver in `go.mod` supports `json_extract` before committing to `extra_json`. If not, hoist `subdomain`, `npm_proxy_id`, `proxy_route_id` to dedicated columns on `containers`. - **`ListProxyRoutes` shape regression.** The new query reads from `containers` + `workloads` instead of `instances` + `projects` + `stages`. Worth a golden-output test before flipping `internal/api/proxies.go` over. - **Stack containers and label stamping.** `docker container update --label-add` is required to label compose-managed containers post-up. If the local Docker engine version doesn't support it, fall back to relying on `com.docker.compose.project` + `com.docker.compose.service` for reconciler joins. - **Boot-time backfill from `docker ps`.** First run needs to populate `containers` from currently-running containers using the legacy `tinyforge.instance-id` and `com.docker.compose.project` labels (since pre-refactor containers don't have the new labels). Solo-dev workaround: `docker compose down` test workloads, run the new binary against an empty Docker host, redeploy. ## Concrete file paths Modified: - `internal/store/store.go` (migrations at line ~75–165) - `internal/store/projects.go`, `stacks.go`, `static_sites.go`, `models.go`, `store_test.go` - `internal/docker/client.go` - `internal/deployer/deployer.go` (~line 388), `internal/deployer/bluegreen.go` (~line 97) - `internal/stack/manager.go`, `internal/staticsite/manager.go` - `internal/api/router.go`, `proxies.go`, `docker.go` (`buildActiveImagesSet` at line 251), `stale.go` - `internal/stale/scanner.go`, `internal/webhook/matcher.go` - `cmd/server/main.go` - `web/src/routes/projects/[id]/+page.svelte`, `stacks/[id]/+page.svelte`, `sites/[id]/+page.svelte` - `web/src/lib/api.ts`, `web/src/lib/types.ts` - Top nav component in `web/src/lib/components/` Created: - `internal/store/workloads.go`, `containers.go`, `apps.go` - `internal/workload/workload.go`, `adapters/project_adapter.go`, `adapters/stack_adapter.go`, `adapters/site_adapter.go` - `internal/reconciler/reconciler.go`, `reconciler_test.go` - `internal/api/workloads.go`, `containers.go`, `apps.go` - `web/src/routes/containers/+page.svelte` - `web/src/lib/components/WorkloadContainers.svelte` Deleted: - `internal/store/instances.go` - `internal/api/instances.go` ## What actually shipped (2026-05-09) After a multi-agent code review caught several issues, the refactor landed with the following deviations from the original plan. They are documented here so a future reader doesn't have to reconstruct them from git log. ### Deferred / dropped - **`internal/workload/` package + adapters.** The plan called for a `Workload` interface (`Deploy`, `Stop`, `Start`, `Delete`, `Containers`) with `project_adapter.go`, `stack_adapter.go`, `site_adapter.go`. **Not built.** The adapters would have been thin pass-throughs to the existing kind-specific code; the duplication is real but small and the per-kind paths still type-check cleanly. The data-layer "Workload" (DB row) is the only Workload primitive today. Revisit if the per-kind branching becomes painful. - **`internal/api/instances.go` URL space.** Plan said "delete or alias to /api/containers." **Kept alive** but every handler that mutates a container now calls `resolveAndAuthorizeInstance` to verify the row's `(workload_id, role)` match the URL's `(project_id, stage_name)` — closes the cross-project hijack the security review flagged. URL renaming deferred until the frontend `InstanceCard` is renamed too (next refactor wave). - **`InstanceCard.svelte` rename.** The component is now generic enough to be `ContainerCard`, but the rename would touch 3+ call sites and i18n keys. Deferred. - **`extra_json` SQL column.** Schema still has the column (NOT NULL DEFAULT '{}'); Go code no longer references it (struct field, scan, INSERT, UPDATE all dropped). When/if a kind-specific need surfaces, hoist a dedicated column rather than re-introducing JSON1. ### Built but not in the original plan - **`Container.stage_id` column** + index + ListProxyRoutes / ListContainersByStageID join. Survives stage renames; the original plan joined on `stages.name = containers.role` which would orphan rows on rename. The deployer populates `stage_id` for project containers; stack/site rows leave it empty. - **`store.ReconcileContainer`** — separate write path for the reconciler. The original `UpsertContainer` ON CONFLICT clause overwrote `subdomain`, `proxy_route_id`, `npm_proxy_id`, `image_tag` from the reconciler's empty values on every 30s tick, silently wiping deployer state. `ReconcileContainer` only updates Docker-derived fields on conflict (`container_id`, `image_ref`, `state`, `port`, `last_seen_at`, `updated_at`). - **Workload-existence check in the reconciler** — a `tinyforge.workload.id` label that doesn't resolve to a known workload is now rejected. Anyone with Docker socket access could otherwise spawn a container with a forged label and steal the canonical row for an existing workload. - **Project-kind row invention skipped.** When the reconciler sees a container with `tinyforge.workload.kind=project` and no existing row matches the docker container ID, it skips the upsert (deployer is the authoritative writer for project rows). Inventing a deterministic-key row would race with `MaxInstances > 1` deploys. - **Reconciler shutdown ordering** — `Stop()` cancels its child context before `wg.Wait()` so a hung `docker ps` doesn't block process shutdown. - **Transactional CRUD + workload sync.** Every `Create*`, `Update*`, `Delete*`, and `Set*Secret` path on `projects` / `stacks` / `static_sites` now wraps the parent UPDATE and the workload row sync in a single transaction. Closes the rotation-durability gap the security review flagged. - **Workload-only webhook lookup.** The legacy fallback (`GetProjectByWebhookSecret`, `GetStaticSiteByWebhookSecret`) is gone — webhook routing reads exclusively through `workloads.webhook_secret`, so a rotation that didn't commit doesn't get silently accepted. - **`store.GetStackByComposeProjectName`** + indexed lookup. Reconciler used to do a full-table stack scan per compose container per tick. - **`store.ListMissingSweepRows`** — filtered query (`container_id != '' AND state != 'missing'`) so the missing-sweep reads only candidate rows instead of the whole index. - **`web/src/lib/components/WorkloadContainers.svelte`** — generic detail-page panel reusable by stack and site detail pages. Project detail keeps its stage-grouped `InstanceCard` layout (containers there are sharded per-stage, not flat). - **Containers page polish** — kind/state filters now apply client-side over an unfiltered fetch (so tab counters reflect the whole population), URL-synced filters (`?kind=stack&state=running`) for shareable links, race-safe loads via a sequence number, full i18n with EN+RU strings, and a counter badge in the sidebar via `navCounts.containers`. - **`stage_id` migration.** New rows get `stage_id` from the deployer; legacy rows fall back to the (project_id, role=stage_name) join inside `ListContainersByStageID`.