refactor(workload): finalize containers index + post-review hardening
Wraps up the workload refactor with the fixes that came out of the multi-agent code review (see docs/plans/workload-refactor.md "What actually shipped").

Backend:
- store.ReconcileContainer: separate write path so the 30s reconciler tick no longer overwrites deployer-owned fields (subdomain, proxy_route_id, npm_proxy_id, image_tag).
- Container.stage_id column + index; ListProxyRoutes / ListContainersByStageID join via stage_id (survives stage rename), with legacy fallback to (project_id, role=stage_name).
- Reconciler: workload-existence check (rejects forged tinyforge.workload.id labels), skips inventing project-kind rows, child-context cancel before wg.Wait() on shutdown.
- Transactional CRUD across projects / stacks / static_sites: parent UPDATE and workload sync land in one transaction so secret rotations are durable.
- Webhook routing reads exclusively through workloads.webhook_secret; legacy GetProjectByWebhookSecret / GetStaticSiteByWebhookSecret fallback removed.
- store.GetStackByComposeProjectName + indexed lookup (no more full-table stack scan per compose container per tick).
- store.ListMissingSweepRows: filtered query for the missing-sweep.
- /api/instances/* handlers verify (workload_id, role) match URL (project_id, stage_name) before mutating; this closes the cross-project hijack the security review flagged.
- extra_json no longer referenced from Go (column kept on disk for now).

Frontend:
- WorkloadContainers.svelte: generic detail-page panel reusable by stack and site detail pages.
- Containers page polish: client-side kind/state filters over an unfiltered fetch, URL-synced filters, race-safe loads via sequence number, EN+RU i18n, sidebar counter via navCounts.containers.

Misc:
- scripts/dev-server.sh: tolerate empty netstat grep result.
- .gitignore: ignore docker-watcher binaries, .claude/worktrees/, .facts-sync.json.
@@ -1,8 +1,9 @@
# Workload Refactor — Compressed Plan
-Status: Draft, pre-implementation
+Status: Shipped (with explicit deferrals — see "What actually shipped" at the bottom)
Owner: alexei.dolgolyov
Date: 2026-05-07
Last updated: 2026-05-09 (post multi-agent review fixes)
## Goal
@@ -195,3 +196,29 @@ Created:
Deleted:
- `internal/store/instances.go`
- `internal/api/instances.go`
## What actually shipped (2026-05-09)
After a multi-agent code review caught several issues, the refactor landed with the following deviations from the original plan. They are documented here so a future reader doesn't have to reconstruct them from git log.
### Deferred / dropped
- **`internal/workload/` package + adapters.** The plan called for a `Workload` interface (`Deploy`, `Stop`, `Start`, `Delete`, `Containers`) with `project_adapter.go`, `stack_adapter.go`, `site_adapter.go`. **Not built.** The adapters would have been thin pass-throughs to the existing kind-specific code; the duplication is real but small and the per-kind paths still type-check cleanly. The data-layer "Workload" (DB row) is the only Workload primitive today. Revisit if the per-kind branching becomes painful.
- **`internal/api/instances.go` URL space.** Plan said "delete or alias to /api/containers." **Kept alive**, but every handler that mutates a container now calls `resolveAndAuthorizeInstance` to verify that the row's `(workload_id, role)` matches the URL's `(project_id, stage_name)`, which closes the cross-project hijack the security review flagged. URL renaming is deferred until the frontend `InstanceCard` is renamed too (next refactor wave).
- **`InstanceCard.svelte` rename.** The component is now generic enough to be `ContainerCard`, but the rename would touch 3+ call sites and i18n keys. Deferred.
- **`extra_json` SQL column.** Schema still has the column (NOT NULL DEFAULT '{}'); Go code no longer references it (struct field, scan, INSERT, UPDATE all dropped). When/if a kind-specific need surfaces, hoist a dedicated column rather than re-introducing JSON1.
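
The ownership check described for the `/api/instances/*` handlers can be sketched roughly as follows. This is a minimal illustration, not the actual `resolveAndAuthorizeInstance` code: the `Container` struct, `ErrInstanceMismatch`, and `authorizeInstance` are hypothetical names, and the sketch assumes the URL's `project_id` has already been resolved to the ID of the workload that backs the project.

```go
package main

import (
	"errors"
	"fmt"
)

// Container is a hypothetical, trimmed-down view of a row in the containers index.
type Container struct {
	ID         string
	WorkloadID string // workload that owns the row
	Role       string // for project containers, the stage name
}

// ErrInstanceMismatch (hypothetical) signals that the addressed row is outside
// the project/stage scope named in the request URL.
var ErrInstanceMismatch = errors.New("instance does not belong to the requested project/stage")

// authorizeInstance returns nil only when the row's (workload_id, role) pair
// equals the pair derived from the URL. projectWorkloadID is assumed to be the
// workload ID that the URL's project_id resolves to.
func authorizeInstance(c Container, projectWorkloadID, stageName string) error {
	if c.WorkloadID != projectWorkloadID || c.Role != stageName {
		return fmt.Errorf("container %s: %w", c.ID, ErrInstanceMismatch)
	}
	return nil
}
```

A handler would run a check like this before any mutation and return an error response on mismatch, so a container ID from one project cannot be acted on through another project's URL.
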
### Built but not in the original plan
- **`Container.stage_id` column** + index + `ListProxyRoutes` / `ListContainersByStageID` join. Survives stage renames; the original plan joined on `stages.name = containers.role`, which would orphan rows on rename. The deployer populates `stage_id` for project containers; stack/site rows leave it empty.
- **`store.ReconcileContainer`** — separate write path for the reconciler. The original `UpsertContainer` ON CONFLICT clause overwrote `subdomain`, `proxy_route_id`, `npm_proxy_id`, `image_tag` from the reconciler's empty values on every 30s tick, silently wiping deployer state. `ReconcileContainer` only updates Docker-derived fields on conflict (`container_id`, `image_ref`, `state`, `port`, `last_seen_at`, `updated_at`).
- **Workload-existence check in the reconciler** — a `tinyforge.workload.id` label that doesn't resolve to a known workload is now rejected. Anyone with Docker socket access could otherwise spawn a container with a forged label and steal the canonical row for an existing workload.
- **Project-kind row invention skipped.** When the reconciler sees a container with `tinyforge.workload.kind=project` and no existing row matches the docker container ID, it skips the upsert (deployer is the authoritative writer for project rows). Inventing a deterministic-key row would race with `MaxInstances > 1` deploys.
- **Reconciler shutdown ordering** — `Stop()` cancels its child context before `wg.Wait()` so a hung `docker ps` doesn't block process shutdown.
- **Transactional CRUD + workload sync.** Every `Create*`, `Update*`, `Delete*`, and `Set*Secret` path on `projects` / `stacks` / `static_sites` now wraps the parent UPDATE and the workload row sync in a single transaction. Closes the rotation-durability gap the security review flagged.
- **Workload-only webhook lookup.** The legacy fallback (`GetProjectByWebhookSecret`, `GetStaticSiteByWebhookSecret`) is gone — webhook routing reads exclusively through `workloads.webhook_secret`, so a rotation that didn't commit doesn't get silently accepted.
- **`store.GetStackByComposeProjectName`** + indexed lookup. Reconciler used to do a full-table stack scan per compose container per tick.
- **`store.ListMissingSweepRows`** — filtered query (`container_id != '' AND state != 'missing'`) so the missing-sweep reads only candidate rows instead of the whole index.
- **`web/src/lib/components/WorkloadContainers.svelte`** — generic detail-page panel reusable by stack and site detail pages. Project detail keeps its stage-grouped `InstanceCard` layout (containers there are sharded per-stage, not flat).
- **Containers page polish** — kind/state filters now apply client-side over an unfiltered fetch (so tab counters reflect the whole population), URL-synced filters (`?kind=stack&state=running`) for shareable links, race-safe loads via a sequence number, full i18n with EN+RU strings, and a counter badge in the sidebar via `navCounts.containers`.
- **`stage_id` migration.** New rows get `stage_id` from the deployer; legacy rows fall back to the (project_id, role=stage_name) join inside `ListContainersByStageID`.
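
To illustrate the field ownership split behind `store.ReconcileContainer`, here is a small in-memory model of its on-conflict behaviour. The real write path is a SQL upsert; this sketch only mirrors which columns the reconciler may touch. The `ContainerRow` struct, its field types, and `applyReconcile` are assumptions made for illustration.

```go
package main

import "time"

// ContainerRow is a hypothetical model of a containers-index row.
// Field types are assumptions; only the column names come from the plan.
type ContainerRow struct {
	// Deployer-owned fields: the reconciler must never overwrite these.
	Subdomain    string
	ProxyRouteID string
	NPMProxyID   string
	ImageTag     string

	// Docker-derived fields: refreshed on every reconciler tick.
	ContainerID string
	ImageRef    string
	State       string
	Port        int
	LastSeenAt  time.Time
	UpdatedAt   time.Time
}

// applyReconcile models the conflict branch: start from the existing row and
// copy over only the Docker-derived fields from what the reconciler observed.
// Empty deployer-owned values in observed are ignored rather than written.
func applyReconcile(existing, observed ContainerRow) ContainerRow {
	now := time.Now().UTC()
	merged := existing
	merged.ContainerID = observed.ContainerID
	merged.ImageRef = observed.ImageRef
	merged.State = observed.State
	merged.Port = observed.Port
	merged.LastSeenAt = now
	merged.UpdatedAt = now
	return merged
}
```

In the original `UpsertContainer` behaviour described above, the equivalent of `merged := observed` was effectively applied, which is why the reconciler's empty `subdomain` and `image_tag` values wiped deployer state on each tick.
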