tiny-forge/docs/plans/DEPLOY_STRATEGY_PLAN.md
alexei.dolgolyov e3d140c57a feat(deployer): configurable per-workload deploy strategy (blue-green for built sources)
Add a deploy_strategy field to each source's config blob — "" (default),
"recreate", or "blue-green" — validated in each source's Validate and read on
the deploy path. No new DB column, no migration: the field rides inside the
existing SourceConfig JSON and every existing workload decodes "" to its
historical behavior (image -> blue-green, others -> recreate).

The real gap this closes: dockerfile and static stopped the old container
before creating the new one on every redeploy — a downtime window image never
had. Their blue-green branch now:
- names the new "green" container with a unique suffix so it coexists with the
  still-serving blue (plumbed into both the container name AND the proxy
  forwardHost);
- skips the collision teardown that destroyed blue early;
- gates green — an HTTP readiness probe (deps.Health.Check) when a healthcheck
  is configured, else the existing liveness window;
- swaps the route via a pure upsert (no pre-DeleteRoute) so NPM repoints in
  place with no gap;
- persists green into the single runtime-state row BEFORE reaping blue, so a
  crash mid-swap can never orphan green or leave the row pointing at a removed
  container (state.go/teardown.go/reconcile.go stay untouched).

image honors explicit "recreate" (reap existing containers after pull, before
cutover); its default blue-green path is unchanged. compose stays
stack-managed and rejects "blue-green" at Validate so the contract is honest.
static forces recreate for storage-backed deno sites — blue-green would mount
the same RW volume into both containers at once.

Shared helper internal/workload/plugin/strategy.go (ValidateStrategy +
BuildGreenName). Backend-only (phase 1); the field is usable today via the
app's advanced-JSON editor — a friendly toggle + i18n follow in phase 2.
Tests: ValidateStrategy matrix, per-source Validate (incl. the empty-key
backward-compat lock), and effectiveStrategy defaults + the deno gate. Design
+ adversarial review: docs/plans/DEPLOY_STRATEGY_PLAN.md.
2026-06-19 16:51:20 +03:00

# Configurable Deploy Strategy — Implementation Plan
**Status:** planned (workflow-designed + adversarially reviewed) · **Feature rank:** #3 · **Date:** 2026-06-19
## Problem
`image` does zero-downtime blue-green; `dockerfile` and `static` **stop+remove the old
container before creating the new one** on every redeploy (a real downtime window).
`compose` is stack-managed. Give operators a per-workload **deploy strategy** and bring
blue-green to the built-from-source sources.
## Design (chosen via a 3-proposal judge panel; "minimal" won, 9/10)
Per-source `deploy_strategy` field **inside each source's `SourceConfig` JSON blob**:
**no new DB column, no migration, no `dispatch.go` change**. Values: `""` (back-compat
default), `"recreate"`, `"blue-green"`. Round-trips opaquely through
`plugin.WorkloadFromStore` / `SourceConfigOf[Config]`; validated in each source's existing
`Validate(json.RawMessage)` (runs on create **and** update at `workloads_plugin.go:291`).
**Per-source default (load-bearing):** a single shared default would silently flip
image's native blue-green to recreate, so each source has a tiny `effectiveStrategy`:
- `image`: `""` → **blue-green**
- `dockerfile` / `static` / `compose`: `""` → **recreate**
The blue-green branch for dockerfile/static uses a **transient two-container / single-row
swap** so `state.go`, `teardown.go`, and `reconcile.go` (which read one deterministic row)
stay **untouched** — the lowest-risk way to ship gap-free cutover.
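
The per-source defaults above can be sketched as follows. This is a minimal illustration, not the repository's code: the plan gives each source package its own `effectiveStrategy`, so the two function names here (`imageEffectiveStrategy`, `builtEffectiveStrategy`) are hypothetical stand-ins that let both variants live in one file.

```go
package main

import "fmt"

// Strategy values as named in the plan.
const (
	StrategyRecreate  = "recreate"
	StrategyBlueGreen = "blue-green"
)

// imageEffectiveStrategy is a hypothetical stand-in for the image source's
// effectiveStrategy: an empty value keeps image's historical blue-green.
func imageEffectiveStrategy(configured string) string {
	if configured == "" {
		return StrategyBlueGreen
	}
	return configured
}

// builtEffectiveStrategy is a hypothetical stand-in for the dockerfile,
// static, and compose sources: an empty value keeps their historical recreate.
func builtEffectiveStrategy(configured string) string {
	if configured == "" {
		return StrategyRecreate
	}
	return configured
}

func main() {
	fmt.Println(imageEffectiveStrategy(""))           // blue-green
	fmt.Println(builtEffectiveStrategy(""))           // recreate
	fmt.Println(imageEffectiveStrategy("recreate"))   // recreate
	fmt.Println(builtEffectiveStrategy("blue-green")) // blue-green
}
```

Keeping the default next to each source, rather than in one shared helper, is what prevents an empty value from changing image's behavior.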
## Review fixes folded in (adversarial pass)
1. **BLOCKER — ordering / crash-safety.** Blue-green order MUST be: create+start green →
readiness-gate green → `ConfigureRoute(green)` (upsert) → **`saveState(green)` into the
single row FIRST** → only THEN stop+remove blue (its identity captured before `saveState` overwrites the row). The single
row must always point at a running container; reaping blue before persisting green
orphans green and makes the reconciler flip a healthy workload to `failed`.
2. **Unique green name is load-bearing.** dockerfile/static names are deterministic
(`tf-build-<name>-<id>` / `dw-site-<name>-<id>`) and double as the proxy `forwardHost`.
The green container needs a genuinely unique name (`…-<ms-hex>`, lifted from
`image.buildContainerName`) set in **both** `cc.Name` **and** the `ConfigureRoute`
`forwardHost`.
3. **Readiness, not liveness.** Before cutover, use
`deps.Health.Check(ctx, http://<green>:<port><healthcheck>)` when a healthcheck path is
configured (dockerfile has `Healthcheck`);
fall back to the existing 3s liveness gate otherwise. Don't advertise "zero-downtime" on
the liveness-only path.
4. **Pure upsert.** Drop the pre-`DeleteRoute`; call only `ConfigureRoute` (upsert-by-FQDN,
so NPM repoints in place; Traefik is label-driven). **Traefik caveat:** blue+green
briefly carry the same host-rule labels → momentary dual-serve; documented as a
Traefik-only phase-1 limitation (NPM, the common case, is gap-free).
5. **deno + storage → force recreate.** When `static` has `StorageEnabled && mode==deno`,
`effectiveStrategy` forces `recreate` — blue-green would mount the same RW named volume
into both containers (a concurrent-writer window recreate never had).
6. **image `recreate` gets its own shape.** Don't reuse `rollbackNew` (assumes blue
survives). image `recreate` = reap existing running containers **after** a successful
pull, then create green; on green failure the downtime is the accepted recreate
contract (logged distinctly, not as a non-disruptive rollback).
7. Image tag `:latest` shared by blue/green is **safe** — containers pin image-by-id at
create (no fix needed).
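
The ordering in fix 1 and the gate-failure path can be sketched with recording fakes. Everything named here (`swapOps`, `blueGreenSwap`, `recordingOps`, `errProbe`) is hypothetical and only models the sequence; the real deploy path talks to Docker, the proxy provider, and the store. The handling of `ConfigureRoute` and `SaveState` failures is an assumption, since the plan only specifies the gate-failure case.

```go
package main

import (
	"errors"
	"fmt"
)

// errProbe simulates a failed readiness probe (hypothetical).
var errProbe = errors.New("probe timeout")

// swapOps bundles the side effects of a swap so the order can be observed.
type swapOps struct {
	CreateAndStart func(name string) error
	ReadinessGate  func(name string) error
	ConfigureRoute func(forwardHost string) error
	SaveState      func(name string) error
	StopAndRemove  func(name string) error
}

// blueGreenSwap follows the order from fix 1: create green, gate green,
// upsert the route, persist green, and only then reap blue.
func blueGreenSwap(ops swapOps, blue, green string) error {
	if err := ops.CreateAndStart(green); err != nil {
		return fmt.Errorf("create green: %w", err)
	}
	if err := ops.ReadinessGate(green); err != nil {
		_ = ops.StopAndRemove(green) // blue and its route stay untouched
		return fmt.Errorf("green not ready: %w", err)
	}
	if err := ops.ConfigureRoute(green); err != nil {
		_ = ops.StopAndRemove(green) // assumption: treat like a gate failure
		return fmt.Errorf("configure route: %w", err)
	}
	if err := ops.SaveState(green); err != nil {
		// Assumption: leave both containers running rather than reap blue
		// while the row still points at it.
		return fmt.Errorf("save state: %w", err)
	}
	return ops.StopAndRemove(blue)
}

// recordingOps returns fakes that append "step:name" to calls.
func recordingOps(calls *[]string, gateErr error) swapOps {
	rec := func(step string, err error) func(string) error {
		return func(name string) error {
			*calls = append(*calls, step+":"+name)
			return err
		}
	}
	return swapOps{
		CreateAndStart: rec("create", nil),
		ReadinessGate:  rec("gate", gateErr),
		ConfigureRoute: rec("route", nil),
		SaveState:      rec("save", nil),
		StopAndRemove:  rec("remove", nil),
	}
}

func main() {
	var calls []string
	_ = blueGreenSwap(recordingOps(&calls, nil), "blue", "green")
	fmt.Println(calls) // [create:green gate:green route:green save:green remove:blue]

	calls = nil
	err := blueGreenSwap(recordingOps(&calls, errProbe), "blue", "green")
	fmt.Println(calls, err) // [create:green gate:green remove:green] green not ready: probe timeout
}
```

A deploy test along the lines of the test plan could assert on the recorded slice: `save:green` must appear before `remove:blue`, and no route call may appear on the failure path.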
## Files (phase 1, backend-only)
- **NEW** `internal/workload/plugin/strategy.go` — `StrategyRecreate`/`StrategyBlueGreen`
consts, `ValidateStrategy(value string, allowBlueGreen bool) error`,
`BuildGreenName(name, id string, ts time.Time) string` (lifted unique-suffix scheme).
`+ strategy_test.go`.
- `image/image.go` — `DeployStrategy` on Config; `effectiveStrategy` (""→blue-green);
Validate; honor `recreate` (reap-after-pull + dedicated log).
- `dockerfile/dockerfile.go` (Config + Validate) + `dockerfile/deploy.go` (blue-green
branch, fixes 1-4) + `dockerfile/deploy_test.go`.
- `static/static.go` (Config + Validate) + `static/deploy.go` (blue-green branch + deno
gate, fixes 1-5) + `static/deploy_test.go`.
- `compose/compose.go` — Config field + Validate rejects `blue-green` (allowBlueGreen=false)
+ test.
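
A minimal sketch of what the shared helper might look like, using the two signatures listed above. The bodies are assumptions: the error messages are invented, and `BuildGreenName` here simply appends the timestamp's Unix milliseconds in hex to a base name the caller has already prefixed (for example `tf-build-web`), which may differ from the scheme lifted from `image.buildContainerName`.

```go
package main

import (
	"fmt"
	"time"
)

const (
	StrategyRecreate  = "recreate"
	StrategyBlueGreen = "blue-green"
)

// ValidateStrategy accepts "" and "recreate" everywhere, accepts
// "blue-green" only when the source allows it, and rejects anything else
// (including the reserved "rolling").
func ValidateStrategy(value string, allowBlueGreen bool) error {
	switch value {
	case "", StrategyRecreate:
		return nil
	case StrategyBlueGreen:
		if allowBlueGreen {
			return nil
		}
		return fmt.Errorf("deploy_strategy %q is not supported by this source", value)
	default:
		return fmt.Errorf("unknown deploy_strategy %q", value)
	}
}

// BuildGreenName appends a millisecond-hex suffix so green can coexist
// with the deterministically named blue container.
func BuildGreenName(name, id string, ts time.Time) string {
	return fmt.Sprintf("%s-%s-%x", name, id, ts.UnixMilli())
}

func main() {
	fmt.Println(ValidateStrategy("", false))           // <nil>
	fmt.Println(ValidateStrategy("blue-green", true))  // <nil>
	fmt.Println(ValidateStrategy("blue-green", false)) // compose-style rejection
	fmt.Println(ValidateStrategy("rolling", true))     // unknown value
	fmt.Println(BuildGreenName("tf-build-web", "42", time.UnixMilli(255)))
	// tf-build-web-42-ff
}
```

The same name would then be passed as both the container name and the proxy `forwardHost`, as fix 2 requires.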
## Phase 1 backward-compat lock (mandatory, unit-tested)
`ValidateStrategy("", …)` returns nil; every `effectiveStrategy("")` returns the source's
historical default. Existing rows (no `deploy_strategy` key) decode `""` → today's exact
behavior, byte-for-byte.
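
To illustrate why no migration is needed, the sketch below decodes a config blob that predates the field. The `Config` struct is hypothetical and trimmed down (only `DeployStrategy` and its `deploy_strategy` key come from the plan), and the `omitempty` tag is an assumption that keeps re-saved blobs free of an empty key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical, trimmed-down source config.
type Config struct {
	Image          string `json:"image"`
	DeployStrategy string `json:"deploy_strategy,omitempty"`
}

// decode unmarshals a stored config blob.
func decode(raw json.RawMessage) (Config, error) {
	var c Config
	err := json.Unmarshal(raw, &c)
	return c, err
}

// encode marshals a config back into its stored form.
func encode(c Config) (string, error) {
	out, err := json.Marshal(c)
	return string(out), err
}

func main() {
	// A blob written before deploy_strategy existed.
	old, err := decode(json.RawMessage(`{"image":"nginx:1.27"}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", old.DeployStrategy) // ""

	// Re-saving it does not introduce the key.
	out, _ := encode(old)
	fmt.Println(out) // {"image":"nginx:1.27"}
}
```

Because a missing key decodes to the zero value `""`, each source's `effectiveStrategy("")` is the only place the historical default needs to be pinned.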
## Later phases (deferred)
- **P2 (UI):** `sourceForms.ts` seed/serialize + `/apps/new` & `/apps/[id]` select +
en/ru i18n (hide blue-green for compose).
- **P3 (harden):** mandatory HTTP readiness probe for static; connection draining before
blue removal; Traefik label suppression at cutover.
- **P4 (architecture):** extract image's proven sequence into a shared
`plugin.DeploySingleContainer`; migrate dockerfile/static to the multi-row model
(crash-safe mid-swap; unlocks `MaxInstances>1`).
- **P5:** true `rolling` (needs a backend-pool primitive on `proxy.Provider`) + compose
green-project blue-green.
## Test plan
Table-driven, TDD: `ValidateStrategy` accept/reject matrix (incl. `allowBlueGreen=false`,
reserved `rolling` rejected, `""` accepted); per-source `effectiveStrategy` defaults +
deno-storage→recreate; dockerfile/static blue-green deploy tests asserting (a) green named
≠ deterministic name, (b) collision teardown NOT run, (c) `ConfigureRoute` called with
`forwardHost==green` and NO preceding `DeleteRoute`, (d) `saveState(green)` **before**
`RemoveContainer(blue)`, (e) single row ends at green; failure path: green fails gate →
green removed, blue + route untouched; compose rejects blue-green. Gates: `go build`,
`go vet`, `go test ./internal/...`, `npm run check/test`, `./scripts/dev-server.sh`.
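
As one example of the table-driven style described above, the deno-storage gate for `static` could be checked like this. `StaticConfig` and its field names are hypothetical (the plan only mentions `StorageEnabled` and a `deno` mode), and the `"plain"` mode value is invented for contrast. The table runs from `main` so the sketch is runnable on its own; in the repository it would presumably live in a `_test.go` file using the `testing` package.

```go
package main

import "fmt"

const (
	StrategyRecreate  = "recreate"
	StrategyBlueGreen = "blue-green"
)

// StaticConfig is a hypothetical subset of the static source's config.
type StaticConfig struct {
	Mode           string
	StorageEnabled bool
	DeployStrategy string
}

// effectiveStrategy forces recreate for storage-backed deno sites, then
// falls back to static's historical default for an empty value.
func (c StaticConfig) effectiveStrategy() string {
	if c.StorageEnabled && c.Mode == "deno" {
		return StrategyRecreate
	}
	if c.DeployStrategy == "" {
		return StrategyRecreate
	}
	return c.DeployStrategy
}

func main() {
	cases := []struct {
		name string
		cfg  StaticConfig
		want string
	}{
		{"empty defaults to recreate", StaticConfig{Mode: "plain"}, StrategyRecreate},
		{"explicit blue-green honored", StaticConfig{Mode: "plain", DeployStrategy: StrategyBlueGreen}, StrategyBlueGreen},
		{"deno without storage honored", StaticConfig{Mode: "deno", DeployStrategy: StrategyBlueGreen}, StrategyBlueGreen},
		{"deno with storage forced", StaticConfig{Mode: "deno", StorageEnabled: true, DeployStrategy: StrategyBlueGreen}, StrategyRecreate},
	}
	for _, tc := range cases {
		got := tc.cfg.effectiveStrategy()
		status := "ok"
		if got != tc.want {
			status = "MISMATCH"
		}
		fmt.Printf("%-30s got=%-10s want=%-10s %s\n", tc.name, got, tc.want, status)
	}
}
```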