Add a deploy_strategy field to each source's config blob: "" (default), "recreate", or "blue-green", validated in each source's Validate and read on the deploy path. No new DB column, no migration: the field rides inside the existing SourceConfig JSON, and every existing workload decodes "" to its historical behavior (image -> blue-green, others -> recreate).

The real gap this closes: dockerfile and static stopped the old container before creating the new one on every redeploy, a downtime window image never had. Their blue-green branch now:
- names the new "green" container with a unique suffix so it coexists with the still-serving blue (plumbed into both the container name AND the proxy forwardHost);
- skips the collision teardown that destroyed blue early;
- gates green: an HTTP readiness probe (deps.Health.Check) when a healthcheck is configured, else the existing liveness window;
- swaps the route via a pure upsert (no pre-DeleteRoute) so NPM repoints in place with no gap;
- persists green into the single runtime-state row BEFORE reaping blue, so a crash mid-swap can never orphan green or leave the row pointing at a removed container (state.go/teardown.go/reconcile.go stay untouched).

image honors an explicit "recreate" (reap existing containers after pull, before cutover); its default blue-green path is unchanged. compose stays stack-managed and rejects "blue-green" at Validate so the contract is honest. static forces recreate for storage-backed deno sites, since blue-green would mount the same RW volume into both containers at once.

Shared helper: internal/workload/plugin/strategy.go (ValidateStrategy + BuildGreenName). Backend-only (phase 1); the field is usable today via the app's advanced-JSON editor, and a friendly toggle + i18n follow in phase 2.

Tests: ValidateStrategy matrix, per-source Validate (incl. the empty-key backward-compat lock), and effectiveStrategy defaults + the deno gate. Design + adversarial review: docs/plans/DEPLOY_STRATEGY_PLAN.md.
Configurable Deploy Strategy — Implementation Plan
Status: planned (workflow-designed + adversarially reviewed) · Feature rank: #3 · Date: 2026-06-19
Problem
image does zero-downtime blue-green; dockerfile and static stop+remove the old
container before creating the new one on every redeploy (a real downtime window).
compose is stack-managed. Give operators a per-workload deploy strategy and bring
blue-green to the built-from-source sources.
Design (chosen via a 3-proposal judge panel; "minimal" won, 9/10)
Per-source deploy_strategy field inside each source's SourceConfig JSON blob —
no new DB column, no migration, no dispatch.go change. Values: "" (back-compat
default), "recreate", "blue-green". Round-trips opaquely through
plugin.WorkloadFromStore / SourceConfigOf[Config]; validated in each source's existing
Validate(json.RawMessage) (runs on create and update at workloads_plugin.go:291).
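A minimal sketch of why existing rows need no migration, assuming a trimmed-down config struct: DockerfileConfig, its repo field, decodeConfig and encodeConfig are hypothetical, and the omitempty tag is an assumption about how the real field is declared. encoding/json leaves a missing key at the field's zero value, so a pre-feature blob decodes to "", and with omitempty it is written back without gaining the key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DockerfileConfig is a hypothetical, trimmed-down stand-in for one source's
// SourceConfig blob; only a placeholder field and the new field are shown.
type DockerfileConfig struct {
	Repo           string `json:"repo"`
	DeployStrategy string `json:"deploy_strategy,omitempty"` // "", "recreate" or "blue-green"
}

// decodeConfig unmarshals a stored blob. A row written before this feature has
// no deploy_strategy key, so DeployStrategy keeps its zero value "".
func decodeConfig(raw json.RawMessage) (DockerfileConfig, error) {
	var cfg DockerfileConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

// encodeConfig re-serializes a config; omitempty drops an empty strategy, so a
// legacy row is written back without a deploy_strategy key.
func encodeConfig(cfg DockerfileConfig) (string, error) {
	b, err := json.Marshal(cfg)
	return string(b), err
}

func main() {
	legacy, err := decodeConfig(json.RawMessage(`{"repo":"app"}`))
	if err != nil {
		panic(err)
	}
	out, err := encodeConfig(legacy)
	if err != nil {
		panic(err)
	}
	fmt.Printf("strategy=%q reencoded=%s\n", legacy.DeployStrategy, out)
}
```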
Per-source default (load-bearing): a single shared default would silently flip
image's native blue-green to recreate, so each source has a tiny effectiveStrategy:
image: "" → blue-green
dockerfile/static/compose: "" → recreate
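A sketch of those per-source defaults, with one hypothetical function per source standing in for each package's own effectiveStrategy (the real ones share that name in separate packages). The static variant also folds in the deno + storage gate from the review fixes; modeling StorageEnabled and the mode as plain parameters is an assumption.

```go
package main

import "fmt"

const (
	StrategyRecreate  = "recreate"
	StrategyBlueGreen = "blue-green"
)

// imageEffectiveStrategy: "" keeps image's historical blue-green cutover.
func imageEffectiveStrategy(s string) string {
	if s == "" {
		return StrategyBlueGreen
	}
	return s
}

// dockerfileEffectiveStrategy: "" keeps the historical stop-then-create
// behavior. compose uses the same "" → recreate default.
func dockerfileEffectiveStrategy(s string) string {
	if s == "" {
		return StrategyRecreate
	}
	return s
}

// staticEffectiveStrategy defaults like dockerfile, but a storage-backed deno
// site is always recreated: blue-green would mount the same read-write volume
// into blue and green at once.
func staticEffectiveStrategy(s string, storageEnabled bool, mode string) string {
	if storageEnabled && mode == "deno" {
		return StrategyRecreate
	}
	return dockerfileEffectiveStrategy(s)
}

func main() {
	fmt.Println(imageEffectiveStrategy(""))                               // historical image default
	fmt.Println(dockerfileEffectiveStrategy(""))                          // historical dockerfile default
	fmt.Println(staticEffectiveStrategy(StrategyBlueGreen, true, "deno")) // deno gate wins
}
```

Keeping the defaults per source, rather than one shared constant, is what stops an empty value from silently switching image to recreate.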
The blue-green branch for dockerfile/static uses a transient two-container / single-row
swap so state.go, teardown.go, and reconcile.go (which read one deterministic row)
stay untouched — the lowest-risk way to ship gap-free cutover.
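The single-row swap can be sketched against a hypothetical Runtime interface; none of these method names come from the codebase, they only stand in for the container, proxy, health and state calls. The sketch uses the create → gate → route (pure upsert) → persist → reap order that the review fixes make mandatory, so the row never names a removed container. The recorder type is a test fake that only logs calls.

```go
package main

import (
	"errors"
	"fmt"
)

// Runtime is a hypothetical stand-in for the deploy path's dependencies.
type Runtime interface {
	CreateAndStart(name string) error
	Ready(name string) error                 // readiness or liveness gate on green
	ConfigureRoute(forwardHost string) error // upsert by FQDN; no DeleteRoute first
	SaveState(container string) error        // overwrites the single runtime-state row
	Remove(name string) error
}

// deployBlueGreen swaps blue for green. blue is captured by the caller before
// SaveState overwrites the row that names it.
func deployBlueGreen(rt Runtime, blue, green string) error {
	if err := rt.CreateAndStart(green); err != nil {
		return err
	}
	if err := rt.Ready(green); err != nil {
		_ = rt.Remove(green) // blue and its route were never touched
		return fmt.Errorf("green %s failed its gate: %w", green, err)
	}
	if err := rt.ConfigureRoute(green); err != nil {
		_ = rt.Remove(green) // assumes a failed upsert leaves the route on blue
		return err
	}
	if err := rt.SaveState(green); err != nil {
		return err // keep blue: the row still names it, so it must stay running
	}
	return rt.Remove(blue) // only now is it safe to reap blue
}

// recorder is a fake Runtime that logs each call in order.
type recorder struct {
	calls     []string
	failReady bool
}

func (r *recorder) log(s string) { r.calls = append(r.calls, s) }

func (r *recorder) CreateAndStart(n string) error {
	r.log("start " + n)
	return nil
}

func (r *recorder) Ready(n string) error {
	r.log("ready " + n)
	if r.failReady {
		return errors.New("probe failed")
	}
	return nil
}

func (r *recorder) ConfigureRoute(h string) error {
	r.log("route " + h)
	return nil
}

func (r *recorder) SaveState(c string) error {
	r.log("save " + c)
	return nil
}

func (r *recorder) Remove(n string) error {
	r.log("remove " + n)
	return nil
}

func main() {
	ok := &recorder{}
	err := deployBlueGreen(ok, "blue", "green")
	fmt.Println(err, ok.calls)

	bad := &recorder{failReady: true}
	err = deployBlueGreen(bad, "blue", "green")
	fmt.Println(err, bad.calls)
}
```

A recording fake like this is one way to assert the ordering and failure-path properties listed in the test plan without a container runtime.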
Review fixes folded in (adversarial pass)
1. BLOCKER: ordering / crash-safety. Blue-green order MUST be: create+start green → readiness-gate green → ConfigureRoute(green) (upsert) → saveState(green) into the single row FIRST → only THEN stop+remove blue (captured before saveState). The single row must always point at a running container; reaping blue before persisting green orphans green and makes the reconciler flip a healthy workload to failed.
2. Unique green name is load-bearing. dockerfile/static names are deterministic (tf-build-<name>-<id> / dw-site-<name>-<id>) and double as the proxy forwardHost. The green container needs a genuinely unique name (…-<ms-hex>, lifted from image.buildContainerName) set in both cc.Name and the ConfigureRoute forwardHost.
3. Readiness, not liveness. Before cutover, use deps.Health.Check(ctx, http://<green>:<port><healthcheck>) when a healthcheck path is configured (dockerfile has Healthcheck); fall back to the existing 3s liveness gate otherwise. Don't advertise "zero-downtime" on the liveness-only path.
4. Pure upsert. Drop the pre-DeleteRoute and call only ConfigureRoute (it upserts by FQDN, so NPM repoints in place; Traefik is label-driven). Traefik caveat: blue and green briefly carry the same host-rule labels, so traffic is momentarily dual-served; this is documented as a Traefik-only phase-1 limitation (NPM, the common case, is gap-free).
5. deno + storage → force recreate. When static has StorageEnabled && mode==deno, effectiveStrategy forces recreate: blue-green would mount the same RW named volume into both containers (a concurrent-writer window recreate never had).
6. image recreate gets its own shape. Don't reuse rollbackNew (it assumes blue survives). image recreate = reap existing running containers after a successful pull, then create green; on green failure the downtime is the accepted recreate contract (logged distinctly, not as a non-disruptive rollback).
7. Image tag :latest shared by blue/green is safe: containers pin image-by-id at create (no fix needed).
Files (phase 1, backend-only)
- NEW internal/workload/plugin/strategy.go: StrategyRecreate/StrategyBlueGreen consts, ValidateStrategy(value string, allowBlueGreen bool) error, BuildGreenName(name, id string, ts time.Time) string (lifted unique-suffix scheme), + strategy_test.go.
- image/image.go: DeployStrategy on Config; effectiveStrategy ("" → blue-green); Validate; honor recreate (reap-after-pull + dedicated log).
- dockerfile/dockerfile.go (Config + Validate) + dockerfile/deploy.go (blue-green branch, fixes 1–4) + dockerfile/deploy_test.go.
- static/static.go (Config + Validate) + static/deploy.go (blue-green branch + deno gate, fixes 1–5) + static/deploy_test.go.
- compose/compose.go: Config field + Validate rejects blue-green (allowBlueGreen=false) + test.
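A minimal sketch of what strategy.go could look like, using the two signatures named above. The error messages are hypothetical, and the name format (name-id-<UnixMilli in hex>) is an assumption: the real suffix scheme is lifted from image.buildContainerName and is not spelled out in this plan, nor is how the tf-build-/dw-site- prefixes are passed in.

```go
package main

import (
	"fmt"
	"time"
)

const (
	StrategyRecreate  = "recreate"
	StrategyBlueGreen = "blue-green"
)

// ValidateStrategy accepts "" (back-compat default) and "recreate" everywhere,
// accepts "blue-green" only where the source allows it (compose passes false),
// and rejects the reserved "rolling" plus any unknown value.
func ValidateStrategy(value string, allowBlueGreen bool) error {
	switch value {
	case "", StrategyRecreate:
		return nil
	case StrategyBlueGreen:
		if allowBlueGreen {
			return nil
		}
		return fmt.Errorf("deploy_strategy %q is not supported by this source", value)
	case "rolling":
		return fmt.Errorf("deploy_strategy %q is reserved and not implemented", value)
	default:
		return fmt.Errorf("unknown deploy_strategy %q", value)
	}
}

// BuildGreenName appends a millisecond timestamp in hex so green can coexist
// with blue's deterministic name (hypothetical format).
func BuildGreenName(name, id string, ts time.Time) string {
	return fmt.Sprintf("%s-%s-%x", name, id, ts.UnixMilli())
}

func main() {
	fmt.Println(ValidateStrategy("", false))
	fmt.Println(ValidateStrategy(StrategyBlueGreen, false))
	ts := time.Date(2026, 6, 19, 12, 0, 0, 0, time.UTC)
	fmt.Println(BuildGreenName("dw-site-blog", "42", ts))
}
```

The same name string would then be used for both cc.Name and the ConfigureRoute forwardHost, so the proxy always targets the container that was actually gated.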
Phase 1 backward-compat lock (mandatory, unit-tested)
ValidateStrategy("", …) returns nil; every effectiveStrategy("") returns the source's
historical default. Existing rows (no deploy_strategy key) decode "" → today's exact
behavior, byte-for-byte.
Later phases (deferred)
- P2 (UI): sourceForms.ts seed/serialize + /apps/new & /apps/[id] select + en/ru i18n (hide blue-green for compose).
- P3 (harden): mandatory HTTP readiness probe for static; connection draining before blue removal; Traefik label suppression at cutover.
- P4 (architecture): extract image's proven sequence into a shared plugin.DeploySingleContainer; migrate dockerfile/static to the multi-row model (crash-safe mid-swap; unlocks MaxInstances>1).
- P5: true rolling (needs a backend-pool primitive on proxy.Provider) + compose green-project blue-green.
Test plan
Table-driven, TDD: ValidateStrategy accept/reject matrix (incl. allowBlueGreen=false,
reserved rolling rejected, "" accepted); per-source effectiveStrategy defaults +
deno-storage→recreate; dockerfile/static blue-green deploy tests asserting (a) green name
≠ deterministic name, (b) collision teardown NOT run, (c) ConfigureRoute called with
forwardHost==green and NO preceding DeleteRoute, (d) saveState(green) before
RemoveContainer(blue), (e) single row ends at green; failure path: green fails gate →
green removed, blue + route untouched; compose rejects blue-green. Gates: go build,
go vet, go test ./internal/..., npm run check/test, ./scripts/dev-server.sh.