tiny-forge/internal/reconciler/reconciler.go
alexei.dolgolyov 739b67856a
feat(cutover): hard legacy cutover — drop projects/stacks/sites/deploys
The clean-break delete that closes the workload-first refactor arc.
Net diff: ~30 backend files deleted, ~20 modified, ~12k LOC removed
on the Go side; entire /projects /stacks /sites /deploy frontend
trees gone; ~6.7k LOC removed on the Svelte/TypeScript side.

Backend
- API handlers gone: internal/api/{projects,stages,stage_env,stacks,
  static_sites,deploys,instances,volume_browser}.go
- Store CRUD + tests gone: internal/store/{projects,stages,stage_env,
  stacks,static_sites,static_site_secrets,deploys,poll_state,volumes,
  workload_sync}.go (+ _test.go siblings)
- Legacy deployer pipeline gone: internal/deployer/{bluegreen,promote,
  rollback,subdomain,resolver_test}.go; deployer.go trimmed to just the
  dispatch surface used by the plugin pipeline
- internal/staticsite/{manager,healthcheck}.go and
  internal/stack/manager.go gone (the rest of those packages stay as
  helpers imported by the static + compose plugins)
- internal/registry/poller.go gone (legacy registry poller)
- internal/volume.ResolvePath gone; ResolveWorkloadPath stays
- internal/webhook: handleWebhook (project) + handleSiteWebhook (site)
  gone; only POST /api/webhook/triggers/{secret} remains
- workload-side webhook URL handlers (getWorkloadWebhook +
  regenerateWorkloadWebhook + EnsureWorkloadWebhookSecret +
  SetWorkloadWebhookSecret + GetWorkloadByWebhookSecret) gone — they
  minted URLs that would 404 against the new trigger-only ingress
- cmd/server/main.go: dropped staticsite.Manager, stack.Manager,
  staticsite.HealthChecker, registry poller, SetSiteSyncTriggerer,
  SetStaticSiteManager, SetStackManager, wireStaticBackend
- store/store.go: idempotent DROP TABLE IF EXISTS for every legacy
  table (projects, stages, stage_env, volumes, deploys, deploy_logs,
  poll_states, stacks, stack_revisions, stack_deploys, static_sites,
  static_site_secrets); FK order children-then-parents
- store/models.go: dropped Project, Stage, Deploy, DeployLog, StageEnv,
  Volume, StaticSite, StaticSiteSecret, Stack, StackRevision,
  StackDeploy types; kept WorkloadKind constants as documented strings
- internal/store/helpers.go (new): BoolToInt, rowScanner,
  GenerateWebhookSecret extracted from deleted CRUD files
- internal/api/secrets.go (new): forwards to store.GenerateWebhookSecret
  so api + store paths share one secret-generation impl (no
  panic-vs-UUID-fallback divergence)
- internal/reconciler/reconciler.go: dropped legacy stack-by-compose
  + static-site label paths; only canonical tinyforge.workload.id
  dispatch remains
- providers (gitea_content/github_provider/gitlab_provider) gained
  path-traversal rejection on every tree entry
- internal/webhook ParsedImage / ParseImageRef demoted to package-
  private (no external callers)
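
The children-then-parents ordering mentioned in the store/store.go bullet can be sketched as below. This is a minimal illustration, not the actual migration code: the table names come from the list above, but the specific order, the assumed parent/child relationships, and the helper name dropStatements are all assumptions.

```go
package main

import "fmt"

// legacyTables lists the dropped tables with (assumed) child tables before
// their (assumed) parents, so a foreign-key-enforcing database never sees a
// parent dropped while a child still references it.
var legacyTables = []string{
	// project family: assumed leaf tables first, projects last
	"deploy_logs", "deploys", "stage_env", "volumes", "poll_states", "stages", "projects",
	// stack family
	"stack_deploys", "stack_revisions", "stacks",
	// static-site family
	"static_site_secrets", "static_sites",
}

// dropStatements (hypothetical helper) renders one idempotent statement per
// table. IF EXISTS makes a second boot a no-op.
func dropStatements(tables []string) []string {
	stmts := make([]string, 0, len(tables))
	for _, t := range tables {
		stmts = append(stmts, "DROP TABLE IF EXISTS "+t)
	}
	return stmts
}

func main() {
	for _, s := range dropStatements(legacyTables) {
		fmt.Println(s + ";")
	}
}
```

Because every statement carries IF EXISTS, running the list on an already-migrated database changes nothing, which is what makes the cutover safe to repeat on each boot.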

Frontend
- /projects /stacks /sites /deploy routes deleted (entire trees)
- ProjectCard / InstanceCard / StaleContainerCard components deleted
- api.ts: dropped every project/stage/stack/site/deploy/instance
  helper + types (Project, Stage, Stack, StaticSite, Deploy,
  Instance, Volume, etc.); kept Workload, Container, App, Settings,
  Registry, EventTrigger, LogScanRule, webhook envelopes
- WorkloadWebhook type + getWorkloadWebhook/regenerateWorkloadWebhook
  api functions gone (mirror of the backend deletion above)
- web/src/routes/+layout.svelte: dropped /projects /sites /stacks
  /deploy nav entries, trimmed quick-nav keymap
- web/src/routes/+page.svelte: dashboard rewrite — reads
  listWorkloads + listContainers only; 4-card stat grid
  (workloads/running/failed/stale) + recent workloads strip
- navCounts.ts, SystemHealthCard.svelte, ContainerLogs.svelte,
  ContainerStats.svelte, StatusBadge.svelte, TagCombobox.svelte,
  proxies/+page.svelte, containers/+page.svelte all rewired to the
  workload-first surface
- AbortController plumbing on dashboard, nav-counts, stale page,
  SystemHealthCard so navigation doesn't leave dangling fetches
- i18n: dropped projects.*, projectDetail.*, envEditor.*,
  volumeEditor.*, volumeBrowser.*, quickDeploy.*, sites.*, stacks.*,
  instance.*, confirm.* namespaces; en/ru parity preserved (1042
  keys each)

Hardening from go-reviewer + security-reviewer + typescript-reviewer
subagent passes (0 CRITICAL across all three; 1 HIGH + ~12 MEDIUM
addressed inline before commit):

- Sec H1: dead-end workload webhook URL handlers (would mint URLs
  that 404 against the new trigger-only ingress) deleted across
  backend + frontend
- Go M1: IsTerminalDeployStatus dropped (no production callers)
- Go M2: ParsedImage/ParseImageRef lowercased (in-package only)
- Go M6: generateWebhookSecret unified — api shim forwards to
  store.GenerateWebhookSecret
- Doc/comment freshness: stage_id (no longer FK), ProxyRoute legacy
  field names, workloadIDRow rationale, webhook_deliveries.target_type
  enum, WebhookDeliveryLog component header

Doc
- WORKLOAD_REFACTOR_TODO: cutover marked DONE; all three Priority 1
  items are now shipped. Next focus is Priority 3 polish (apps.* i18n
  + codemap entries) and Priority 4 tests.

Behavioral notes for operators upgrading from a pre-cutover build
- Existing rows in the dropped tables disappear on first boot.
- Legacy webhook URLs at /api/webhook/{secret} and
  /api/webhook/sites/{secret} return 404; CI configs must repoint to
  /api/webhook/triggers/{secret} (the trigger-split boot backfill
  lifted any embedded workload secret onto a Trigger row, so the
  secret value itself carries over).
- Frontend routes /projects /stacks /sites /deploy are gone; nav
  links replaced with /apps and /triggers.
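
For operators with many CI configs, the repointing described above is a mechanical path rewrite, since the secret value carries over. A minimal sketch follows; the helper name repointWebhookPath is hypothetical and not part of this repository, and only the three URL shapes listed above are taken from the notes.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	legacyPrefix  = "/api/webhook/"          // old project webhook: /api/webhook/{secret}
	sitesPrefix   = "/api/webhook/sites/"    // old site webhook: /api/webhook/sites/{secret}
	triggerPrefix = "/api/webhook/triggers/" // the only ingress left after the cutover
)

// repointWebhookPath (hypothetical helper) rewrites a legacy webhook path to
// the trigger ingress, keeping the secret unchanged. The bool reports whether
// a rewrite happened.
func repointWebhookPath(path string) (string, bool) {
	if strings.HasPrefix(path, triggerPrefix) {
		return path, false // already on the new ingress
	}
	var secret string
	switch {
	case strings.HasPrefix(path, sitesPrefix):
		secret = strings.TrimPrefix(path, sitesPrefix)
	case strings.HasPrefix(path, legacyPrefix):
		secret = strings.TrimPrefix(path, legacyPrefix)
	default:
		return path, false // not a webhook path
	}
	if secret == "" || strings.Contains(secret, "/") {
		return path, false // not of the form <prefix>{secret}
	}
	return triggerPrefix + secret, true
}

func main() {
	for _, p := range []string{"/api/webhook/abc123", "/api/webhook/sites/abc123"} {
		out, changed := repointWebhookPath(p)
		fmt.Println(p, "->", out, changed)
	}
}
```

The sites prefix is checked before the bare legacy prefix because the former is a longer match of the latter.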
2026-05-16 06:00:21 +03:00

369 lines
12 KiB
Go

// Package reconciler keeps the normalized containers index in sync with the
// Docker daemon. It runs on a tick (and one-shot at boot) — for every
// Tinyforge-managed container in `docker ps`, it resolves a workload by the
// canonical workload-id label and writes a Container row through
// ReconcileContainer (which only touches Docker-derived fields on conflict,
// never deployer-owned columns like subdomain / proxy_route_id /
// npm_proxy_id / image_tag / stage_id). Rows whose Docker container ID is no
// longer present are flipped to state='missing'.
//
// Only the tinyforge.workload.id label is honored after the hard cutover —
// every Source plugin labels its containers with the workload identity at
// create time. The legacy tinyforge.static-site / compose-project paths
// were dropped along with the static_sites / stacks tables.
package reconciler

import (
	"context"
	"encoding/json"
	"errors"
	"log/slog"
	"sync"
	"time"

	"github.com/alexei/tinyforge/internal/docker"
	"github.com/alexei/tinyforge/internal/store"
	"github.com/alexei/tinyforge/internal/workload/plugin"
)

// DockerLister is the subset of docker.Client the reconciler depends on.
// Defined here (where it's used) so tests can substitute a fake without
// pulling in the full docker package.
type DockerLister interface {
	ListAllForReconciler(ctx context.Context) ([]docker.ReconcileItem, error)
}

// PluginReconciler is the optional dispatch surface for per-workload
// Source.Reconcile calls. Nil-safe: when unset, the reconciler skips
// the plugin pass and only refreshes the containers index from Docker.
type PluginReconciler interface {
	DispatchReconcile(ctx context.Context, w plugin.Workload) error
}

// Reconciler is the background worker that syncs the containers index.
type Reconciler struct {
	store    *store.Store
	docker   DockerLister
	interval time.Duration
	plugins  PluginReconciler // optional; nil disables the per-workload Source.Reconcile pass.
	stop     chan struct{}
	cancel   context.CancelFunc // populated in Start; invoked by Stop so an in-flight tick is unblocked.
	wg       sync.WaitGroup
}

// New constructs a Reconciler. interval is the tick period; values <=0 fall
// back to 30s. interval > 5m is clamped to 5m so a manual misconfiguration
// can't silently disable timely state updates.
func New(st *store.Store, dockerClient DockerLister, interval time.Duration) *Reconciler {
	if interval <= 0 {
		interval = 30 * time.Second
	}
	if interval > 5*time.Minute {
		interval = 5 * time.Minute
	}
	return &Reconciler{
		store:    st,
		docker:   dockerClient,
		interval: interval,
		stop:     make(chan struct{}),
	}
}
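
// SetPluginReconciler injects the per-workload Source.Reconcile dispatch.
// Call it before Start: the field is read by the loop goroutine without
// synchronization, so setting it while the loop is running is a data race.
// Each tick uses whatever value is set at the time it runs.
func (r *Reconciler) SetPluginReconciler(p PluginReconciler) { r.plugins = p }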

// Start kicks off the background reconciliation loop. Runs one tick
// immediately so startup populates the index without waiting for the first
// timer fire. The provided context is wrapped with a child cancel func so
// Stop() can unblock an in-flight Docker call.
func (r *Reconciler) Start(ctx context.Context) {
	ctx, cancel := context.WithCancel(ctx)
	r.cancel = cancel
	r.wg.Add(1)
	go r.loop(ctx)
}

// Stop signals the loop to exit. Cancels the child context FIRST so any
// in-flight `docker ps` (which can hang on a stuck daemon) returns promptly,
// then waits for the goroutine to finish. Idempotent.
func (r *Reconciler) Stop() {
	if r.cancel != nil {
		r.cancel()
	}
	select {
	case <-r.stop:
		// already closed
	default:
		close(r.stop)
	}
	r.wg.Wait()
}

// ReconcileOnce runs a single reconciliation pass. Exposed for tests and for
// callers that want to force a sync after a known mutation (e.g., right after
// a deploy succeeds, before the next tick).
func (r *Reconciler) ReconcileOnce(ctx context.Context) error {
	items, err := r.docker.ListAllForReconciler(ctx)
	if err != nil {
		return err
	}
	seen := make(map[string]struct{}, len(items)) // container row IDs we touched
	for _, item := range items {
		rowID := r.upsertFromItem(item)
		if rowID != "" {
			seen[rowID] = struct{}{}
		}
	}
	r.markMissingRows(seen)
	r.reconcilePluginWorkloads(ctx)
	return nil
}

// reconcilePluginWorkloads iterates every workload row that has a
// Source plugin and asks the dispatcher to invoke Source.Reconcile.
// Failures are logged per-workload: one workload's broken state must
// not stop sweeping the rest.
//
// Trigger configuration is no longer required to reconcile: a workload
// with a Source but no trigger bindings is still a deployed thing whose
// container state must stay in sync (manual-only deploys are common
// during early setup). After the trigger-split refactor triggers live
// in their own table, so the only gate here is SourceKind.
//
// No-op when the plugin dispatcher hasn't been wired (boot-time race,
// disabled deployments, tests).
func (r *Reconciler) reconcilePluginWorkloads(ctx context.Context) {
	if r.plugins == nil {
		return
	}
	rows, err := r.store.ListWorkloads("")
	if err != nil {
		slog.Warn("reconciler: list workloads for plugin pass", "error", err)
		return
	}
	for _, w := range rows {
		if w.SourceKind == "" {
			continue
		}
		pw := toPluginWorkload(w)
		if err := r.plugins.DispatchReconcile(ctx, pw); err != nil {
			slog.Warn("reconciler: plugin reconcile failed",
				"workload", w.ID, "kind", w.SourceKind, "error", err)
		}
	}
}

// toPluginWorkload mirrors the api / webhook converters; kept local to
// avoid an import dependency between those packages.
func toPluginWorkload(w store.Workload) plugin.Workload {
	var faces []plugin.PublicFace
	if w.PublicFaces != "" {
		// Best-effort decode: the error is deliberately ignored, so a
		// malformed PublicFaces value yields whatever was decoded so far
		// (typically nil) rather than aborting the reconcile pass.
		_ = json.Unmarshal([]byte(w.PublicFaces), &faces)
	}
	return plugin.Workload{
		ID:                      w.ID,
		Name:                    w.Name,
		GroupID:                 w.AppID,
		ParentWorkloadID:        w.ParentWorkloadID,
		SourceKind:              w.SourceKind,
		SourceConfig:            json.RawMessage(w.SourceConfig),
		TriggerKind:             w.TriggerKind,
		TriggerConfig:           json.RawMessage(w.TriggerConfig),
		PublicFaces:             faces,
		NotificationURL:         w.NotificationURL,
		NotificationSecret:      w.NotificationSecret,
		WebhookSecret:           w.WebhookSecret,
		WebhookSigningSecret:    w.WebhookSigningSecret,
		WebhookRequireSignature: w.WebhookRequireSignature,
		CreatedAt:               w.CreatedAt,
		UpdatedAt:               w.UpdatedAt,
	}
}

// loop runs the boot tick and then one ReconcileOnce per ticker fire until
// the context is cancelled or Stop closes r.stop.
func (r *Reconciler) loop(ctx context.Context) {
	defer r.wg.Done()
	// Boot tick.
	if err := r.ReconcileOnce(ctx); err != nil {
		slog.Warn("reconciler: initial pass", "error", err)
	}
	ticker := time.NewTicker(r.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-r.stop:
			return
		case <-ticker.C:
			if err := r.ReconcileOnce(ctx); err != nil {
				slog.Warn("reconciler: tick", "error", err)
			}
		}
	}
}

// upsertFromItem dispatches one container to its workload and writes the
// Container row. Returns the row ID on success or "" if no dispatch matched.
// After the hard cutover only the canonical tinyforge.workload.id label
// path is honored: every Source plugin labels its containers with the
// workload identity at create time.
func (r *Reconciler) upsertFromItem(item docker.ReconcileItem) string {
	if id := item.Labels[docker.LabelWorkloadID]; id != "" {
		return r.upsertByWorkloadLabel(item, id)
	}
	return ""
}

// upsertByWorkloadLabel is the canonical path. Project containers are owned
// by the deployer: the deployer pre-creates the row with a per-instance UUID
// and proxy/subdomain metadata. The reconciler resolves the existing row by
// docker container ID and only touches Docker-derived fields. If no existing
// row matches and the kind is project, we skip the upsert; inventing a
// deterministic-ID row would race with the deployer's UUID rows for stages
// with MaxInstances > 1, leaving ghost rows behind.
//
// Untrusted-label defense: a workload_id label that doesn't resolve to a
// known workload row is silently ignored. Anyone with Docker socket access
// could otherwise spawn a container with a forged label and steal the
// canonical slot for an existing workload.
func (r *Reconciler) upsertByWorkloadLabel(item docker.ReconcileItem, workloadID string) string {
	w, err := r.store.GetWorkloadByID(workloadID)
	if err != nil {
		// Forged or stale label. This fires on every tick while the
		// container exists, so it is logged at debug level to keep
		// default logs quiet.
		slog.Debug("reconciler: unknown workload_id label", "workload_id", workloadID, "container_id", item.ID)
		return ""
	}
	role := item.Labels[docker.LabelRole]
	kind := item.Labels[docker.LabelWorkloadKind]
	if kind != "" && kind != w.Kind {
		slog.Warn("reconciler: workload kind mismatch", "label_kind", kind, "stored_kind", w.Kind, "workload_id", workloadID)
		return ""
	}
	if kind == "" {
		kind = w.Kind
	}
	// Resolve to existing row by Docker container ID.
	existing, lookupErr := r.store.GetContainerByDockerID(item.ID)
	if lookupErr == nil {
		port := 0
		if len(item.Ports) > 0 {
			port = int(item.Ports[0])
		}
		if err := r.store.ReconcileContainer(store.Container{
			ID:           existing.ID,
			WorkloadID:   workloadID,
			WorkloadKind: kind,
			Role:         role,
			ContainerID:  item.ID,
			ImageRef:     item.Image,
			Host:         "local",
			State:        normalizeState(item.State),
			Port:         port,
			LastSeenAt:   store.Now(),
		}); err != nil {
			slog.Warn("reconciler: reconcile by workload label", "container_id", item.ID, "error", err)
			return ""
		}
		return existing.ID
	}
	if !errors.Is(lookupErr, store.ErrNotFound) {
		slog.Warn("reconciler: lookup container by docker id", "container_id", item.ID, "error", lookupErr)
		return ""
	}
	// No row yet. For project workloads, the deployer is the authoritative
	// writer: wait for the deployer to create the row rather than
	// inventing one with a deterministic key (which would collide with
	// MaxInstances > 1 deploys).
	if kind == string(store.WorkloadKindProject) {
		return ""
	}
	// Site/stack reach this branch only when their plugin hasn't yet
	// upserted the row (e.g. a boot tick that races the first deploy).
	// The deterministic ID computed here matches what the static and
	// compose plugins write in their state-save paths, so a subsequent
	// plugin write upserts in place rather than creating a sibling row.
	rowID := workloadIDRow(workloadID, kind, role, item.ID)
	port := 0
	if len(item.Ports) > 0 {
		port = int(item.Ports[0])
	}
	if err := r.store.ReconcileContainer(store.Container{
		ID:           rowID,
		WorkloadID:   workloadID,
		WorkloadKind: kind,
		Role:         role,
		ContainerID:  item.ID,
		ImageRef:     item.Image,
		Host:         "local",
		State:        normalizeState(item.State),
		Port:         port,
		LastSeenAt:   store.Now(),
	}); err != nil {
		slog.Warn("reconciler: reconcile by workload label (insert)", "container_id", item.ID, "error", err)
		return ""
	}
	return rowID
}

// markMissingRows flips state to 'missing' for any container row whose Docker
// container ID was not seen in this pass. Uses ListMissingSweepRows to scan
// only rows that are bound to a real container and not already missing.
func (r *Reconciler) markMissingRows(seen map[string]struct{}) {
	rows, err := r.store.ListMissingSweepRows()
	if err != nil {
		slog.Warn("reconciler: list rows for missing-sweep", "error", err)
		return
	}
	for _, row := range rows {
		if _, ok := seen[row.ID]; ok {
			continue
		}
		if err := r.store.MarkContainerMissing(row.ID); err != nil {
			slog.Warn("reconciler: mark missing", "row_id", row.ID, "error", err)
		}
	}
}

// workloadIDRow picks the row ID for a non-project workload-labelled
// container that has no existing row. Sites use `<workloadID>:site`
// (matches the static plugin's `containerRowID` helper). Stack
// services use `<workloadID>:<service-role>` (matches the compose
// plugin). Project rows are never invented here: the deployer
// pre-creates per-instance UUID rows so the reconciler must wait.
func workloadIDRow(workloadID, kind, role, containerID string) string {
	if kind == string(store.WorkloadKindSite) {
		return workloadID + ":site"
	}
	if role != "" {
		return workloadID + ":" + role
	}
	return workloadID + ":" + containerID
}
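
// normalizeState maps Docker container states to the condensed set stored on
// container rows. exited, dead, and stopped collapse to "stopped"; running
// and removing map to themselves; every other state (created, restarting,
// paused, or anything unrecognized) is passed through unchanged. This
// function never returns "failed" or "missing"; "missing" is set by
// markMissingRows.
func normalizeState(dockerState string) string {
	switch dockerState {
	case "running":
		return "running"
	case "exited", "dead", "stopped":
		return "stopped"
	case "created", "restarting", "paused":
		// Transitional states are kept as reported by Docker.
		return dockerState
	case "removing":
		return "removing"
	default:
		return dockerState
	}
}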
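
The deterministic row-ID scheme described for workloadIDRow above can be exercised in isolation. The sketch below is a standalone copy for illustration only: the kind strings "site" and "stack" are assumptions standing in for the store.WorkloadKind constants, whose actual values are not shown in this file, and the workload and container IDs are made up.

```go
package main

import "fmt"

// kindSite is an assumed value for store.WorkloadKindSite.
const kindSite = "site"

// rowID mirrors workloadIDRow: sites get one fixed slot, stack services are
// keyed by role, and anything else falls back to the Docker container ID.
func rowID(workloadID, kind, role, containerID string) string {
	if kind == kindSite {
		return workloadID + ":site"
	}
	if role != "" {
		return workloadID + ":" + role
	}
	return workloadID + ":" + containerID
}

func main() {
	fmt.Println(rowID("wl-1", "site", "", "c0ffee"))      // wl-1:site
	fmt.Println(rowID("wl-2", "stack", "db", "deadbeef")) // wl-2:db
	fmt.Println(rowID("wl-3", "stack", "", "deadbeef"))   // wl-3:deadbeef
}
```

Because the ID depends only on the workload and the role (or, for sites, on the workload alone), a later plugin write that computes the same key updates the row in place instead of creating a sibling, which is the property the reconciler relies on when it races the first deploy.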