739b67856a
Build / build (push) Successful in 10m39s
The clean-break delete that closes the workload-first refactor arc.
Net diff: ~30 backend files deleted, ~20 modified, ~12k LOC removed
on the Go side; entire /projects /stacks /sites /deploy frontend
trees gone; ~6.7k LOC removed on the Svelte/TypeScript side.

Backend
- API handlers gone: internal/api/{projects,stages,stage_env,stacks,
  static_sites,deploys,instances,volume_browser}.go
- Store CRUD + tests gone: internal/store/{projects,stages,stage_env,
  stacks,static_sites,static_site_secrets,deploys,poll_state,volumes,
  workload_sync}.go (+ _test.go siblings)
- Legacy deployer pipeline gone: internal/deployer/{bluegreen,promote,
  rollback,subdomain,resolver_test}.go; deployer.go trimmed to just the
  dispatch surface used by the plugin pipeline
- internal/staticsite/{manager,healthcheck}.go and
  internal/stack/manager.go gone (the rest of those packages stay as
  helpers imported by the static + compose plugins)
- internal/registry/poller.go gone (legacy registry poller)
- internal/volume.ResolvePath gone; ResolveWorkloadPath stays
- internal/webhook: handleWebhook (project) + handleSiteWebhook (site)
  gone; only POST /api/webhook/triggers/{secret} remains
- workload-side webhook URL handlers (getWorkloadWebhook +
  regenerateWorkloadWebhook + EnsureWorkloadWebhookSecret +
  SetWorkloadWebhookSecret + GetWorkloadByWebhookSecret) gone; they
  minted URLs that would 404 against the new trigger-only ingress
- cmd/server/main.go: dropped staticsite.Manager, stack.Manager,
  staticsite.HealthChecker, registry poller, SetSiteSyncTriggerer,
  SetStaticSiteManager, SetStackManager, wireStaticBackend
- store/store.go: idempotent DROP TABLE IF EXISTS for every legacy
  table (projects, stages, stage_env, volumes, deploys, deploy_logs,
  poll_states, stacks, stack_revisions, stack_deploys, static_sites,
  static_site_secrets); FK order children-then-parents
- store/models.go: dropped Project, Stage, Deploy, DeployLog, StageEnv,
  Volume, StaticSite, StaticSiteSecret, Stack, StackRevision,
  StackDeploy types; kept WorkloadKind constants as documented strings
- internal/store/helpers.go (new): BoolToInt, rowScanner,
  GenerateWebhookSecret extracted from deleted CRUD files
- internal/api/secrets.go (new): forwards to store.GenerateWebhookSecret
  so api + store paths share one secret-generation impl (no
  panic-vs-UUID-fallback divergence)
- internal/reconciler/reconciler.go: dropped legacy stack-by-compose
  + static-site label paths; only canonical tinyforge.workload.id
  dispatch remains
- providers (gitea_content/github_provider/gitlab_provider) gained
  path-traversal rejection on every tree entry
- internal/webhook ParsedImage / ParseImageRef demoted to
  package-private (no external callers)
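
The "FK order children-then-parents" rule for the DROP TABLE pass can be sketched as a small ordering helper. This is an illustration only, not the code in store/store.go: dropOrder, dropStatements, and the foreign-key edges passed in main are hypothetical, since the real FK relationships between the legacy tables are not listed here.

```go
package main

import (
	"fmt"
	"sort"
)

// dropOrder returns the tables ordered so that every table appears after
// all tables that reference it (children first, parents last).
// refs maps a child table to the parent tables it references.
// Assumes the reference graph has no cycles.
func dropOrder(tables []string, refs map[string][]string) []string {
	children := make(map[string][]string)
	for child, parents := range refs {
		for _, p := range parents {
			children[p] = append(children[p], child)
		}
	}
	for _, c := range children {
		sort.Strings(c) // deterministic output
	}
	sorted := append([]string(nil), tables...)
	sort.Strings(sorted)

	visited := make(map[string]bool)
	var order []string
	var visit func(string)
	visit = func(t string) {
		if visited[t] {
			return
		}
		visited[t] = true
		for _, c := range children[t] {
			visit(c) // drop everything that references t first
		}
		order = append(order, t)
	}
	for _, t := range sorted {
		visit(t)
	}
	return order
}

// dropStatements renders idempotent DROP statements in the given order.
func dropStatements(order []string) []string {
	out := make([]string, 0, len(order))
	for _, t := range order {
		out = append(out, fmt.Sprintf("DROP TABLE IF EXISTS %s;", t))
	}
	return out
}

func main() {
	// Hypothetical FK edges, for illustration only.
	refs := map[string][]string{
		"stages":      {"projects"},
		"deploys":     {"stages"},
		"deploy_logs": {"deploys"},
	}
	tables := []string{"projects", "stages", "deploys", "deploy_logs"}
	for _, stmt := range dropStatements(dropOrder(tables, refs)) {
		fmt.Println(stmt)
	}
}
```

Because each statement uses IF EXISTS, re-running the whole list on a later boot is harmless, which is what makes the migration idempotent.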

Frontend
- /projects /stacks /sites /deploy routes deleted (entire trees)
- ProjectCard / InstanceCard / StaleContainerCard components deleted
- api.ts: dropped every project/stage/stack/site/deploy/instance
  helper + types (Project, Stage, Stack, StaticSite, Deploy,
  Instance, Volume, etc.); kept Workload, Container, App, Settings,
  Registry, EventTrigger, LogScanRule, webhook envelopes
- WorkloadWebhook type + getWorkloadWebhook/regenerateWorkloadWebhook
  api functions gone (mirror of the backend deletion above)
- web/src/routes/+layout.svelte: dropped /projects /sites /stacks
  /deploy nav entries, trimmed quick-nav keymap
- web/src/routes/+page.svelte: dashboard rewrite: reads
  listWorkloads + listContainers only; 4-card stat grid
  (workloads/running/failed/stale) + recent workloads strip
- navCounts.ts, SystemHealthCard.svelte, ContainerLogs.svelte,
  ContainerStats.svelte, StatusBadge.svelte, TagCombobox.svelte,
  proxies/+page.svelte, containers/+page.svelte all rewired to the
  workload-first surface
- AbortController plumbing on dashboard, nav-counts, stale page,
  SystemHealthCard so navigation doesn't leave dangling fetches
- i18n: dropped projects.*, projectDetail.*, envEditor.*,
  volumeEditor.*, volumeBrowser.*, quickDeploy.*, sites.*, stacks.*,
  instance.*, confirm.* namespaces; en/ru parity preserved (1042
  keys each)

Hardening from go-reviewer + security-reviewer + typescript-reviewer
subagent passes (0 CRITICAL across all three; 1 HIGH + ~12 MEDIUM
addressed inline before commit):
- Sec H1: dead-end workload webhook URL handlers (would mint URLs
  that 404 against the new trigger-only ingress) deleted across
  backend + frontend
- Go M1: IsTerminalDeployStatus dropped (no production callers)
- Go M2: ParsedImage/ParseImageRef lowercased (in-package only)
- Go M6: generateWebhookSecret unified; the api shim forwards to
  store.GenerateWebhookSecret
- Doc/comment freshness: stage_id (no longer FK), ProxyRoute legacy
  field names, workloadIDRow rationale, webhook_deliveries.target_type
  enum, WebhookDeliveryLog component header

Doc
- WORKLOAD_REFACTOR_TODO: cutover marked DONE; all three Priority 1
  items are now shipped. Next focus is Priority 3 polish (apps.* i18n
  + codemap entries) and Priority 4 tests.

Behavioral notes for operators upgrading from a pre-cutover build
- Existing rows in the dropped tables disappear on first boot.
- Legacy webhook URLs at /api/webhook/{secret} and
  /api/webhook/sites/{secret} return 404; CI configs must repoint to
  /api/webhook/triggers/{secret} (the trigger-split boot backfill
  lifted any embedded workload secret onto a Trigger row, so the
  secret value itself carries over).
- Frontend routes /projects /stacks /sites /deploy are gone; nav
  links replaced with /apps and /triggers.
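
For operators with many CI configs to repoint, the URL change described above can be sketched as a small rewrite helper. This is a hedged illustration, not part of the commit: repointWebhookURL and the forge.example.com host are hypothetical, and the sketch relies only on the stated fact that the secret value carries over unchanged.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// repointWebhookURL rewrites a legacy webhook URL of the form
// /api/webhook/{secret} or /api/webhook/sites/{secret} to the
// trigger-only form /api/webhook/triggers/{secret}.
// It returns the (possibly unchanged) URL and whether a rewrite happened.
func repointWebhookURL(raw string) (string, bool) {
	u, err := url.Parse(raw)
	if err != nil {
		return raw, false
	}
	const prefix = "/api/webhook/"
	if !strings.HasPrefix(u.Path, prefix) {
		return raw, false
	}
	parts := strings.Split(strings.TrimPrefix(u.Path, prefix), "/")

	var secret string
	switch {
	case len(parts) == 1 && parts[0] != "" && parts[0] != "sites" && parts[0] != "triggers":
		secret = parts[0] // legacy project webhook
	case len(parts) == 2 && parts[0] == "sites" && parts[1] != "":
		secret = parts[1] // legacy site webhook
	default:
		// Already a trigger URL, or not a recognized legacy shape.
		return raw, false
	}
	u.Path = prefix + "triggers/" + secret
	u.RawPath = "" // let String() re-derive the escaped path from Path
	return u.String(), true
}

func main() {
	for _, raw := range []string{
		"https://forge.example.com/api/webhook/abc123",
		"https://forge.example.com/api/webhook/sites/abc123",
		"https://forge.example.com/api/webhook/triggers/abc123",
	} {
		out, changed := repointWebhookURL(raw)
		fmt.Println(raw, "=>", out, "changed:", changed)
	}
}
```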
369 lines
12 KiB
Go
// Package reconciler keeps the normalized containers index in sync with the
// Docker daemon. It runs on a tick (and one-shot at boot): for every
// Tinyforge-managed container in `docker ps`, it resolves a workload by the
// canonical workload-id label and writes a Container row through
// ReconcileContainer (which only touches Docker-derived fields on conflict,
// never deployer-owned columns like subdomain / proxy_route_id /
// npm_proxy_id / image_tag / stage_id). Rows whose Docker container ID is no
// longer present are flipped to state='missing'.
//
// Only the tinyforge.workload.id label is honored after the hard cutover;
// every Source plugin labels its containers with the workload identity at
// create time. The legacy tinyforge.static-site / compose-project paths
// were dropped along with the static_sites / stacks tables.
package reconciler

import (
	"context"
	"encoding/json"
	"errors"
	"log/slog"
	"sync"
	"time"

	"github.com/alexei/tinyforge/internal/docker"
	"github.com/alexei/tinyforge/internal/store"
	"github.com/alexei/tinyforge/internal/workload/plugin"
)

// DockerLister is the subset of docker.Client the reconciler depends on.
// Defined here (where it's used) so tests can substitute a fake without
// pulling in the full docker package.
type DockerLister interface {
	ListAllForReconciler(ctx context.Context) ([]docker.ReconcileItem, error)
}

// PluginReconciler is the optional dispatch surface for per-workload
// Source.Reconcile calls. Nil-safe: when unset, the reconciler skips
// the plugin pass and only refreshes the containers index from Docker.
type PluginReconciler interface {
	DispatchReconcile(ctx context.Context, w plugin.Workload) error
}

// Reconciler is the background worker that syncs the containers index.
type Reconciler struct {
	store    *store.Store
	docker   DockerLister
	interval time.Duration
	plugins  PluginReconciler // optional; nil disables the per-workload Source.Reconcile pass.

	stop   chan struct{}
	cancel context.CancelFunc // populated in Start; invoked by Stop so an in-flight tick is unblocked.
	wg     sync.WaitGroup
}

// New constructs a Reconciler. interval is the tick period; values <=0 fall
// back to 30s. interval > 5m is clamped to 5m so a manual misconfiguration
// can't silently disable timely state updates.
func New(st *store.Store, dockerClient DockerLister, interval time.Duration) *Reconciler {
	if interval <= 0 {
		interval = 30 * time.Second
	}
	if interval > 5*time.Minute {
		interval = 5 * time.Minute
	}
	return &Reconciler{
		store:    st,
		docker:   dockerClient,
		interval: interval,
		stop:     make(chan struct{}),
	}
}

// SetPluginReconciler injects the per-workload Source.Reconcile dispatch.
// Safe to call before or after Start; tick uses whatever's set at the
// time.
func (r *Reconciler) SetPluginReconciler(p PluginReconciler) { r.plugins = p }

// Start kicks off the background reconciliation loop. Runs one tick
// immediately so startup populates the index without waiting for the first
// timer fire. The provided context is wrapped with a child cancel func so
// Stop() can unblock an in-flight Docker call.
func (r *Reconciler) Start(ctx context.Context) {
	ctx, cancel := context.WithCancel(ctx)
	r.cancel = cancel
	r.wg.Add(1)
	go r.loop(ctx)
}

// Stop signals the loop to exit. Cancels the child context FIRST so any
// in-flight `docker ps` (which can hang on a stuck daemon) returns promptly,
// then waits for the goroutine to finish. Idempotent.
func (r *Reconciler) Stop() {
	if r.cancel != nil {
		r.cancel()
	}
	select {
	case <-r.stop:
		// already closed
	default:
		close(r.stop)
	}
	r.wg.Wait()
}

// ReconcileOnce runs a single reconciliation pass. Exposed for tests and for
// callers that want to force a sync after a known mutation (e.g., right after
// a deploy succeeds, before the next tick).
func (r *Reconciler) ReconcileOnce(ctx context.Context) error {
	items, err := r.docker.ListAllForReconciler(ctx)
	if err != nil {
		return err
	}
	seen := make(map[string]struct{}, len(items)) // container row IDs we touched

	for _, item := range items {
		rowID := r.upsertFromItem(item)
		if rowID != "" {
			seen[rowID] = struct{}{}
		}
	}

	r.markMissingRows(seen)
	r.reconcilePluginWorkloads(ctx)
	return nil
}

// reconcilePluginWorkloads iterates every workload row that has a
// Source plugin and asks the dispatcher to invoke Source.Reconcile.
// Failures are logged per-workload; one workload's broken state must
// not stop sweeping the rest.
//
// Trigger configuration is no longer required to reconcile: a workload
// with a Source but no trigger bindings is still a deployed thing whose
// container state must stay in sync (manual-only deploys are common
// during early setup). After the trigger-split refactor triggers live
// in their own table, so the only gate here is SourceKind.
//
// No-op when the plugin dispatcher hasn't been wired (boot-time race,
// disabled deployments, tests).
func (r *Reconciler) reconcilePluginWorkloads(ctx context.Context) {
	if r.plugins == nil {
		return
	}
	rows, err := r.store.ListWorkloads("")
	if err != nil {
		slog.Warn("reconciler: list workloads for plugin pass", "error", err)
		return
	}
	for _, w := range rows {
		if w.SourceKind == "" {
			continue
		}
		pw := toPluginWorkload(w)
		if err := r.plugins.DispatchReconcile(ctx, pw); err != nil {
			slog.Warn("reconciler: plugin reconcile failed",
				"workload", w.ID, "kind", w.SourceKind, "error", err)
		}
	}
}

// toPluginWorkload mirrors the api / webhook converters; kept local to
// avoid an import dependency between those packages.
func toPluginWorkload(w store.Workload) plugin.Workload {
	var faces []plugin.PublicFace
	if w.PublicFaces != "" {
		_ = json.Unmarshal([]byte(w.PublicFaces), &faces)
	}
	return plugin.Workload{
		ID:                      w.ID,
		Name:                    w.Name,
		GroupID:                 w.AppID,
		ParentWorkloadID:        w.ParentWorkloadID,
		SourceKind:              w.SourceKind,
		SourceConfig:            json.RawMessage(w.SourceConfig),
		TriggerKind:             w.TriggerKind,
		TriggerConfig:           json.RawMessage(w.TriggerConfig),
		PublicFaces:             faces,
		NotificationURL:         w.NotificationURL,
		NotificationSecret:      w.NotificationSecret,
		WebhookSecret:           w.WebhookSecret,
		WebhookSigningSecret:    w.WebhookSigningSecret,
		WebhookRequireSignature: w.WebhookRequireSignature,
		CreatedAt:               w.CreatedAt,
		UpdatedAt:               w.UpdatedAt,
	}
}

func (r *Reconciler) loop(ctx context.Context) {
	defer r.wg.Done()

	// Boot tick.
	if err := r.ReconcileOnce(ctx); err != nil {
		slog.Warn("reconciler: initial pass", "error", err)
	}

	ticker := time.NewTicker(r.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-r.stop:
			return
		case <-ticker.C:
			if err := r.ReconcileOnce(ctx); err != nil {
				slog.Warn("reconciler: tick", "error", err)
			}
		}
	}
}

// upsertFromItem dispatches one container to its workload and writes the
// Container row. Returns the row ID on success or "" if no dispatch matched.
// After the hard cutover only the canonical tinyforge.workload.id label
// path is honored; every Source plugin labels its containers with the
// workload identity at create time.
func (r *Reconciler) upsertFromItem(item docker.ReconcileItem) string {
	if id := item.Labels[docker.LabelWorkloadID]; id != "" {
		return r.upsertByWorkloadLabel(item, id)
	}
	return ""
}

// upsertByWorkloadLabel is the canonical path. Project containers are owned by
// the deployer: the deployer pre-creates the row with a per-instance UUID and
// proxy/subdomain metadata. The reconciler resolves the existing row by
// docker container ID and only touches Docker-derived fields. If no existing
// row matches and the kind is project, we skip the upsert: inventing a
// deterministic-ID row would race with the deployer's UUID rows for stages
// with MaxInstances > 1, leaving ghost rows behind.
//
// Untrusted-label defense: a workload_id label that doesn't resolve to a
// known workload row is silently ignored. Anyone with Docker socket access
// could otherwise spawn a container with a forged label and steal the
// canonical slot for an existing workload.
func (r *Reconciler) upsertByWorkloadLabel(item docker.ReconcileItem, workloadID string) string {
	w, err := r.store.GetWorkloadByID(workloadID)
	if err != nil {
		// Forged or stale label: log once at debug; tick rate keeps logs quiet.
		slog.Debug("reconciler: unknown workload_id label", "workload_id", workloadID, "container_id", item.ID)
		return ""
	}
	role := item.Labels[docker.LabelRole]
	kind := item.Labels[docker.LabelWorkloadKind]
	if kind != "" && kind != w.Kind {
		slog.Warn("reconciler: workload kind mismatch", "label_kind", kind, "stored_kind", w.Kind, "workload_id", workloadID)
		return ""
	}
	if kind == "" {
		kind = w.Kind
	}

	// Resolve to existing row by Docker container ID.
	existing, lookupErr := r.store.GetContainerByDockerID(item.ID)
	if lookupErr == nil {
		port := 0
		if len(item.Ports) > 0 {
			port = int(item.Ports[0])
		}
		if err := r.store.ReconcileContainer(store.Container{
			ID:           existing.ID,
			WorkloadID:   workloadID,
			WorkloadKind: kind,
			Role:         role,
			ContainerID:  item.ID,
			ImageRef:     item.Image,
			Host:         "local",
			State:        normalizeState(item.State),
			Port:         port,
			LastSeenAt:   store.Now(),
		}); err != nil {
			slog.Warn("reconciler: reconcile by workload label", "container_id", item.ID, "error", err)
			return ""
		}
		return existing.ID
	}
	if !errors.Is(lookupErr, store.ErrNotFound) {
		slog.Warn("reconciler: lookup container by docker id", "container_id", item.ID, "error", lookupErr)
		return ""
	}

	// No row yet. For project workloads, the deployer is the authoritative
	// writer: wait for the deployer to create the row rather than
	// inventing one with a deterministic key (which would collide with
	// MaxInstances > 1 deploys).
	if kind == string(store.WorkloadKindProject) {
		return ""
	}

	// Site/stack reach this branch only when their plugin hasn't yet
	// upserted the row (e.g. a boot tick that races the first deploy).
	// The deterministic ID computed here matches what the static and
	// compose plugins write in their state-save paths, so a subsequent
	// plugin write upserts in place rather than creating a sibling row.
	rowID := workloadIDRow(workloadID, kind, role, item.ID)
	port := 0
	if len(item.Ports) > 0 {
		port = int(item.Ports[0])
	}
	if err := r.store.ReconcileContainer(store.Container{
		ID:           rowID,
		WorkloadID:   workloadID,
		WorkloadKind: kind,
		Role:         role,
		ContainerID:  item.ID,
		ImageRef:     item.Image,
		Host:         "local",
		State:        normalizeState(item.State),
		Port:         port,
		LastSeenAt:   store.Now(),
	}); err != nil {
		slog.Warn("reconciler: reconcile by workload label (insert)", "container_id", item.ID, "error", err)
		return ""
	}
	return rowID
}

// markMissingRows flips state to 'missing' for any container row whose Docker
// container ID was not seen in this pass. Uses ListMissingSweepRows to scan
// only rows that are bound to a real container and not already missing.
func (r *Reconciler) markMissingRows(seen map[string]struct{}) {
	rows, err := r.store.ListMissingSweepRows()
	if err != nil {
		slog.Warn("reconciler: list rows for missing-sweep", "error", err)
		return
	}
	for _, row := range rows {
		if _, ok := seen[row.ID]; ok {
			continue
		}
		if err := r.store.MarkContainerMissing(row.ID); err != nil {
			slog.Warn("reconciler: mark missing", "row_id", row.ID, "error", err)
		}
	}
}

// workloadIDRow picks the row ID for a non-project workload-labelled
// container that has no existing row. Sites use `<workloadID>:site`
// (matches the static plugin's `containerRowID` helper). Stack
// services use `<workloadID>:<service-role>` (matches the compose
// plugin). Project rows are never invented here; the deployer
// pre-creates per-instance UUID rows so the reconciler must wait.
func workloadIDRow(workloadID, kind, role, containerID string) string {
	if kind == string(store.WorkloadKindSite) {
		return workloadID + ":site"
	}
	if role != "" {
		return workloadID + ":" + role
	}
	return workloadID + ":" + containerID
}

// normalizeState maps Docker container states onto our condensed set
// (running | stopped | failed | removing | missing) where a mapping exists.
// created, restarting, paused, and any unrecognized state pass through
// unchanged; this function itself never returns failed or missing.
func normalizeState(dockerState string) string {
	switch dockerState {
	case "running":
		return "running"
	case "exited", "dead", "stopped":
		return "stopped"
	case "created", "restarting", "paused":
		return dockerState
	case "removing":
		return "removing"
	default:
		return dockerState
	}
}