feat(deployer): configurable per-workload deploy strategy (blue-green for built sources)

Add a deploy_strategy field to each source's config blob — "" (default),
"recreate", or "blue-green" — validated in each source's Validate and read on
the deploy path. No new DB column, no migration: the field rides inside the
existing SourceConfig JSON and every existing workload decodes "" to its
historical behavior (image -> blue-green, others -> recreate).

The real gap this closes: dockerfile and static stopped the old container
before creating the new one on every redeploy — a downtime window image never
had. Their blue-green branch now:
- names the new "green" container with a unique suffix so it coexists with the
  still-serving blue (plumbed into both the container name AND the proxy
  forwardHost);
- skips the collision teardown that destroyed blue early;
- gates green — an HTTP readiness probe (deps.Health.Check) when a healthcheck
  is configured, else the existing liveness window;
- swaps the route via a pure upsert (no pre-DeleteRoute) so NPM repoints in
  place with no gap;
- persists green into the single runtime-state row BEFORE reaping blue, so a
  crash mid-swap can never orphan green or leave the row pointing at a removed
  container (state.go/teardown.go/reconcile.go stay untouched).

image honors explicit "recreate" (reap existing containers after pull, before
cutover); its default blue-green path is unchanged. compose stays
stack-managed and rejects "blue-green" at Validate so the contract is honest.
static forces recreate for storage-backed deno sites — blue-green would mount
the same RW volume into both containers at once.

Shared helper internal/workload/plugin/strategy.go (ValidateStrategy +
BuildGreenName). Backend-only (phase 1); the field is usable today via the
app's advanced-JSON editor — a friendly toggle + i18n follow in phase 2.
Tests: ValidateStrategy matrix, per-source Validate (incl. the empty-key
backward-compat lock), and effectiveStrategy defaults + the deno gate. Design
+ adversarial review: docs/plans/DEPLOY_STRATEGY_PLAN.md.
This commit is contained in:
2026-06-19 16:51:20 +03:00
parent 0c4c338bfe
commit e3d140c57a
13 changed files with 592 additions and 12 deletions
+98
View File
@@ -0,0 +1,98 @@
# Configurable Deploy Strategy — Implementation Plan
**Status:** planned (workflow-designed + adversarially reviewed) · **Feature rank:** #3 · **Date:** 2026-06-19
## Problem
`image` does zero-downtime blue-green; `dockerfile` and `static` **stop+remove the old
container before creating the new one** on every redeploy (a real downtime window).
`compose` is stack-managed. Give operators a per-workload **deploy strategy** and bring
blue-green to the built-from-source sources.
## Design (chosen via a 3-proposal judge panel; "minimal" won, 9/10)
Per-source `deploy_strategy` field **inside each source's `SourceConfig` JSON blob**
**no new DB column, no migration, no `dispatch.go` change**. Values: `""` (back-compat
default), `"recreate"`, `"blue-green"`. Round-trips opaquely through
`plugin.WorkloadFromStore` / `SourceConfigOf[Config]`; validated in each source's existing
`Validate(json.RawMessage)` (runs on create **and** update at `workloads_plugin.go:291`).
**Per-source default (load-bearing):** a single shared default would silently flip
image's native blue-green to recreate, so each source has a tiny `effectiveStrategy`:
- `image`: `""`**blue-green**
- `dockerfile` / `static` / `compose`: `""`**recreate**
The blue-green branch for dockerfile/static uses a **transient two-container / single-row
swap** so `state.go`, `teardown.go`, and `reconcile.go` (which read one deterministic row)
stay **untouched** — the lowest-risk way to ship gap-free cutover.
## Review fixes folded in (adversarial pass)
1. **BLOCKER — ordering / crash-safety.** Blue-green order MUST be: create+start green →
readiness-gate green → `ConfigureRoute(green)` (upsert) → **`saveState(green)` into the
single row FIRST** → only THEN stop+remove blue (captured before saveState). The single
row must always point at a running container; reaping blue before persisting green
orphans green and makes the reconciler flip a healthy workload to `failed`.
2. **Unique green name is load-bearing.** dockerfile/static names are deterministic
(`tf-build-<name>-<id>` / `dw-site-<name>-<id>`) and double as the proxy `forwardHost`.
The green container needs a genuinely unique name (`…-<ms-hex>`, lifted from
`image.buildContainerName`) set in **both** `cc.Name` **and** the `ConfigureRoute`
`forwardHost`.
3. **Readiness, not liveness.** Before cutover, use `deps.Health.Check(ctx, http://<green>:
<port><healthcheck>)` when a healthcheck path is configured (dockerfile has `Healthcheck`);
fall back to the existing 3s liveness gate otherwise. Don't advertise "zero-downtime" on
the liveness-only path.
4. **Pure upsert.** Drop the pre-`DeleteRoute`; call only `ConfigureRoute` (upsert-by-FQDN
for NPM repoints in place; Traefik is label-driven). **Traefik caveat:** blue+green
briefly carry the same host-rule labels → momentary dual-serve; documented as a
Traefik-only phase-1 limitation (NPM, the common case, is gap-free).
5. **deno + storage → force recreate.** When `static` has `StorageEnabled && mode==deno`,
`effectiveStrategy` forces `recreate` — blue-green would mount the same RW named volume
into both containers (a concurrent-writer window recreate never had).
6. **image `recreate` gets its own shape.** Don't reuse `rollbackNew` (assumes blue
survives). image `recreate` = reap existing running containers **after** a successful
pull, then create green; on green failure the downtime is the accepted recreate
contract (logged distinctly, not as a non-disruptive rollback).
7. Image tag `:latest` shared by blue/green is **safe** — containers pin image-by-id at
create (no fix needed).
## Files (phase 1, backend-only)
- **NEW** `internal/workload/plugin/strategy.go` — `StrategyRecreate`/`StrategyBlueGreen`
consts, `ValidateStrategy(value string, allowBlueGreen bool) error`,
`BuildGreenName(name, id string, ts time.Time) string` (lifted unique-suffix scheme).
`+ strategy_test.go`.
- `image/image.go` — `DeployStrategy` on Config; `effectiveStrategy` (""→blue-green);
Validate; honor `recreate` (reap-after-pull + dedicated log).
- `dockerfile/dockerfile.go` (Config + Validate) + `dockerfile/deploy.go` (blue-green
branch, fixes 14) + `dockerfile/deploy_test.go`.
- `static/static.go` (Config + Validate) + `static/deploy.go` (blue-green branch + deno
gate, fixes 15) + `static/deploy_test.go`.
- `compose/compose.go` — Config field + Validate rejects `blue-green` (allowBlueGreen=false)
+ test.
## Phase 1 backward-compat lock (mandatory, unit-tested)
`ValidateStrategy("", …)` returns nil; every `effectiveStrategy("")` returns the source's
historical default. Existing rows (no `deploy_strategy` key) decode `""` → today's exact
behavior, byte-for-byte.
## Later phases (deferred)
- **P2 (UI):** `sourceForms.ts` seed/serialize + `/apps/new` & `/apps/[id]` select +
en/ru i18n (hide blue-green for compose).
- **P3 (harden):** mandatory HTTP readiness probe for static; connection draining before
blue removal; Traefik label suppression at cutover.
- **P4 (architecture):** extract image's proven sequence into a shared
`plugin.DeploySingleContainer`; migrate dockerfile/static to the multi-row model
(crash-safe mid-swap; unlocks `MaxInstances>1`).
- **P5:** true `rolling` (needs a backend-pool primitive on `proxy.Provider`) + compose
green-project blue-green.
## Test plan
Table-driven, TDD: `ValidateStrategy` accept/reject matrix (incl. `allowBlueGreen=false`,
reserved `rolling` rejected, `""` accepted); per-source `effectiveStrategy` defaults +
deno-storage→recreate; dockerfile/static blue-green deploy tests asserting (a) green named
≠ deterministic name, (b) collision teardown NOT run, (c) `ConfigureRoute` called with
`forwardHost==green` and NO preceding `DeleteRoute`, (d) `saveState(green)` **before**
`RemoveContainer(blue)`, (e) single row ends at green; failure path: green fails gate →
green removed, blue + route untouched; compose rejects blue-green. Gates: `go build`,
`go vet`, `go test ./internal/...`, `npm run check/test`, `./scripts/dev-server.sh`.
@@ -28,6 +28,12 @@ import (
type Config struct {
ComposeYAML string `json:"compose_yaml"`
ComposeProjectName string `json:"compose_project_name"`
// DeployStrategy is accepted for parity with the other sources but a
// compose stack only supports recreate (docker compose up -d
// --remove-orphans). "" and "recreate" are honored; "blue-green" is
// rejected at Validate so the contract is honest in the UI rather than
// silently accepting a value compose can't deliver.
DeployStrategy string `json:"deploy_strategy,omitempty"`
}
type source struct{}
@@ -70,6 +76,11 @@ func (*source) Validate(cfg json.RawMessage) error {
if strings.TrimSpace(c.ComposeYAML) == "" {
return fmt.Errorf("compose source: compose_yaml is required")
}
// allowBlueGreen=false: a whole-stack blue-green is not implemented, so
// reject it here rather than silently running recreate.
if err := plugin.ValidateStrategy(c.DeployStrategy, false); err != nil {
return fmt.Errorf("compose source: %w", err)
}
spec, err := stack.Parse(c.ComposeYAML)
if err != nil {
return fmt.Errorf("compose source: parse yaml: %w", err)
@@ -0,0 +1,37 @@
package compose
import (
"encoding/json"
"testing"
)
const validComposeYAML = "services:\n web:\n image: nginx:alpine\n ports:\n - \"80\"\n"
func composeCfg(strategy string) json.RawMessage {
m := map[string]any{"compose_yaml": validComposeYAML}
if strategy != "" {
m["deploy_strategy"] = strategy
}
b, _ := json.Marshal(m)
return b
}
func TestValidate_Strategy_RejectsBlueGreen(t *testing.T) {
cases := []struct {
strategy string
wantErr bool
}{
{"", false}, // backward-compat
{"recreate", false}, // the only thing compose can do
{"blue-green", true}, // not supported for a whole stack
{"rolling", true},
}
for _, c := range cases {
t.Run("strategy="+c.strategy, func(t *testing.T) {
err := (&source{}).Validate(composeCfg(c.strategy))
if (err != nil) != c.wantErr {
t.Fatalf("Validate(strategy=%q) err=%v, wantErr=%v", c.strategy, err, c.wantErr)
}
})
}
}
@@ -48,6 +48,11 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
return fmt.Errorf("dockerfile source: decode config: %w", err)
}
// bg selects the zero-downtime path: a unique green name so the new
// container coexists with the still-serving blue, an in-place route
// upsert, and blue reaped only AFTER green is persisted + routed.
bg := effectiveStrategy(cfg) == plugin.StrategyBlueGreen
prev, prevContainer, err := loadState(deps, w)
if err != nil {
return err
@@ -224,6 +229,13 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
}
containerName := containerNameFor(w)
if bg {
// Unique green name so the new container coexists with the still-
// serving blue one — the deterministic name would collide on
// Docker's per-daemon unique-name constraint. This name is also the
// proxy forwardHost below, so green receives traffic after cutover.
containerName = plugin.BuildGreenName(containerName, time.Now())
}
// Per-face proxy labels (Traefik consumes these; NPM ignores them).
labels := map[string]string{}
@@ -254,8 +266,16 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
containerID, err := deps.Docker.CreateContainer(ctx, cc)
if err != nil {
// Name conflict — best-effort cleanup of any prior container
// (by ID first; by name as a fallback) and one retry.
if bg {
// Green has a unique name, so this is a genuine create failure, not
// a name conflict — must NOT remove the still-serving blue.
updateStatus(deps, w, "failed", latestSHA,
sanitizeError(fmt.Sprintf("create container: %v", err), token))
return fmt.Errorf("create container: %w", err)
}
// recreate: the deterministic name may still be held by the prior
// container — best-effort cleanup (by ID first; by name fallback) and
// one retry. This is the recreate downtime window.
if prevContainerID != "" {
deps.Docker.StopContainer(ctx, prevContainerID, 10)
deps.Docker.RemoveContainer(ctx, prevContainerID, true)
@@ -308,6 +328,22 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
return fmt.Errorf("container not running: %s", logMsg)
}
// Blue-green readiness gate: the 3s window above only proves green did not
// crash, not that it is SERVING. Before swapping the route, probe green's
// healthcheck over the network (when configured) so traffic never flips to
// a not-yet-listening container. On failure, remove green and leave blue +
// its route untouched — a non-disruptive rollback. recreate skips this (it
// already removed blue, so there is no live fallback to protect).
if bg && cfg.Healthcheck != "" && deps.Health != nil {
healthURL := fmt.Sprintf("http://%s:%d%s", containerName, cfg.Port, cfg.Healthcheck)
if herr := deps.Health.Check(ctx, healthURL); herr != nil {
deps.Docker.RemoveContainer(ctx, containerID, true)
updateStatus(deps, w, "failed", latestSHA,
sanitizeError(fmt.Sprintf("readiness check %s: %v", cfg.Healthcheck, herr), token))
return fmt.Errorf("readiness check failed: %w", herr)
}
}
// Resolve proxy target: in-network DNS by default, NPM-remote
// override uses (settings.ServerIP, hostPort).
forwardHost := containerName
@@ -329,7 +365,12 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
// in-place so traffic shifts atomically over to the new container.
proxyRouteID := prevProxyRouteID
if domain != "" {
if prevProxyRouteID != "" {
// Blue-green relies on ConfigureRoute being an upsert-by-FQDN (NPM
// finds the host by domain and repoints it in place, gap-free), so we
// must NOT delete blue's route first — that would open a window.
// recreate already removed blue, so the pre-delete is harmless there
// but kept to preserve its exact prior behavior.
if !bg && prevProxyRouteID != "" {
deps.Proxy.DeleteRoute(ctx, prevProxyRouteID)
}
routeID, rerr := deps.Proxy.ConfigureRoute(ctx, domain, forwardHost, forwardPort, proxy.RouteOptions{
@@ -347,10 +388,12 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
}
}
// Drop the previous container only after the new one is healthy
// + routed. Different-ID-than-previous tells us we created a
// fresh one (vs returning the same ID via UpsertContainer reuse).
if prevContainerID != "" && prevContainerID != containerID {
// recreate: drop the previous container now that the new one is healthy +
// routed. Blue-green DEFERS this until AFTER saveState (below) so the
// persisted single row always points at a running container — a crash
// between cutover and saveState must not orphan green or leave the row
// pointing at a reaped blue (which the reconciler would then flag failed).
if !bg && prevContainerID != "" && prevContainerID != containerID {
deps.Docker.StopContainer(ctx, prevContainerID, 10)
deps.Docker.RemoveContainer(ctx, prevContainerID, true)
}
@@ -384,6 +427,14 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
return fmt.Errorf("persist deploy state: %w", err)
}
// Blue-green: green is now persisted in the single row AND serving behind
// the swapped route — only now is it safe to reap blue. (recreate already
// removed blue before saveState.)
if bg && prevContainerID != "" && prevContainerID != containerID {
deps.Docker.StopContainer(ctx, prevContainerID, 10)
deps.Docker.RemoveContainer(ctx, prevContainerID, true)
}
publishEvent(deps, w, "deployed")
dispatchBuildNotification(deps, w, domain, "deployed", "")
@@ -64,6 +64,23 @@ type Config struct {
// git provider as a commit status (pending/success/failure) on the
// built SHA. Best-effort — a reporting failure never fails a deploy.
ReportCommitStatus bool `json:"report_commit_status"`
// DeployStrategy selects how a redeploy cuts over. "" (default) and
// "recreate" stop the old container before starting the new one (a brief
// downtime window). "blue-green" starts the new build alongside the old,
// gates it, swaps the proxy route in place, then reaps the old —
// zero-downtime under NPM. Validated via plugin.ValidateStrategy.
DeployStrategy string `json:"deploy_strategy,omitempty"`
}
// effectiveStrategy resolves the configured strategy for the dockerfile
// source. Empty maps to recreate — the source's historical behavior — so
// existing workloads are unchanged.
func effectiveStrategy(cfg Config) string {
if cfg.DeployStrategy == "" {
return plugin.StrategyRecreate
}
return cfg.DeployStrategy
}
type source struct{}
@@ -120,6 +137,9 @@ func (*source) Validate(cfg json.RawMessage) error {
return fmt.Errorf("dockerfile source: %q must not contain '..'", p)
}
}
if err := plugin.ValidateStrategy(c.DeployStrategy, true); err != nil {
return fmt.Errorf("dockerfile source: %w", err)
}
return nil
}
@@ -0,0 +1,49 @@
package dockerfile
import (
"encoding/json"
"testing"
"github.com/alexei/tinyforge/internal/workload/plugin"
)
// validCfg is the smallest config that passes the non-strategy checks, so a
// test isolates the deploy_strategy behavior.
func validCfg(strategy string) json.RawMessage {
m := map[string]any{"repo_owner": "o", "repo_name": "r", "port": 8080}
if strategy != "" {
m["deploy_strategy"] = strategy
}
b, _ := json.Marshal(m)
return b
}
func TestValidate_Strategy(t *testing.T) {
cases := []struct {
strategy string
wantErr bool
}{
{"", false}, // backward-compat: no key -> valid
{"recreate", false},
{"blue-green", false}, // dockerfile supports blue-green
{"rolling", true}, // reserved, not yet implemented
{"junk", true},
}
for _, c := range cases {
t.Run("strategy="+c.strategy, func(t *testing.T) {
err := (&source{}).Validate(validCfg(c.strategy))
if (err != nil) != c.wantErr {
t.Fatalf("Validate(strategy=%q) err=%v, wantErr=%v", c.strategy, err, c.wantErr)
}
})
}
}
func TestEffectiveStrategy_Default(t *testing.T) {
if got := effectiveStrategy(Config{}); got != plugin.StrategyRecreate {
t.Fatalf("empty strategy = %q, want recreate (historical default)", got)
}
if got := effectiveStrategy(Config{DeployStrategy: plugin.StrategyBlueGreen}); got != plugin.StrategyBlueGreen {
t.Fatalf("explicit blue-green = %q, want blue-green", got)
}
}
@@ -40,6 +40,21 @@ type Config struct {
MemoryLimit int `json:"memory_limit"` // megabytes; 0 = unlimited
DefaultTag string `json:"default_tag"` // tag used when intent.Reference is empty
MaxInstances int `json:"max_instances"` // simultaneous containers to keep; 0/1 = strict blue-green
// DeployStrategy selects how a redeploy cuts over. "" defaults to the
// image source's native zero-downtime blue-green; "recreate" reaps the
// old container before the new one comes up (opt-in downtime). Validated
// via plugin.ValidateStrategy. Orthogonal to MaxInstances.
DeployStrategy string `json:"deploy_strategy,omitempty"`
}
// effectiveStrategy resolves the configured strategy for the image source.
// Empty maps to blue-green so every existing image workload keeps its
// current zero-downtime behavior byte-for-byte.
func effectiveStrategy(cfg Config) string {
if cfg.DeployStrategy == "" {
return plugin.StrategyBlueGreen
}
return cfg.DeployStrategy
}
// VolumeMount mirrors the existing store.Volume scope shape but as a flat
@@ -88,6 +103,9 @@ func (*source) Validate(cfg json.RawMessage) error {
if c.Port < 0 || c.Port > 65535 {
return fmt.Errorf("image source: port must be 0-65535")
}
if err := plugin.ValidateStrategy(c.DeployStrategy, true); err != nil {
return fmt.Errorf("image source: %w", err)
}
for i, v := range c.Volumes {
if strings.TrimSpace(v.Target) == "" {
return fmt.Errorf("image source: volumes[%d].target is required", i)
@@ -189,6 +207,33 @@ func (*source) Deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload,
return fmt.Errorf("image source: ensure network: %w", err)
}
// recreate strategy (opt-in): tear down the existing containers BEFORE
// the new one comes up — the operator chose a downtime window. The
// default blue-green path skips this; its new container coexists with
// the old and the proxy route swaps atomically (enforceMaxInstances
// reaps the old AFTER cutover). Reaped here (after a successful pull, so
// a pull failure doesn't take the workload down for nothing). On a
// later create/health/route failure the recreate path has no blue to
// fall back to — inherent to recreate, distinct from blue-green's
// non-disruptive rollbackNew.
if effectiveStrategy(cfg) == plugin.StrategyRecreate {
for _, c := range existing {
if c.ContainerID != "" {
_ = deps.Docker.RemoveContainer(ctx, c.ContainerID, true)
}
if c.ProxyRouteID != "" {
_ = deps.Proxy.DeleteRoute(ctx, c.ProxyRouteID)
}
if delErr := deps.Store.DeleteContainer(c.ID); delErr != nil && !errors.Is(delErr, store.ErrNotFound) {
slog.Warn("image source: recreate reap old row", "workload", w.ID, "row", c.ID, "error", delErr)
}
}
if len(existing) > 0 {
slog.Info("image source: recreate strategy reaped old containers before cutover",
"workload", w.ID, "count", len(existing))
}
}
// Unique-per-deploy name so the new container can run alongside the
// old one. The suffix is monotonic ms; collisions are not a real
// concern for human-driven or webhook-driven deploys.
@@ -0,0 +1,49 @@
package image
import (
"encoding/json"
"testing"
"github.com/alexei/tinyforge/internal/workload/plugin"
)
func imageCfg(strategy string) json.RawMessage {
m := map[string]any{"image": "registry.example.com/o/app", "port": 8080}
if strategy != "" {
m["deploy_strategy"] = strategy
}
b, _ := json.Marshal(m)
return b
}
func TestValidate_Strategy(t *testing.T) {
cases := []struct {
strategy string
wantErr bool
}{
{"", false},
{"recreate", false},
{"blue-green", false},
{"rolling", true},
{"junk", true},
}
for _, c := range cases {
t.Run("strategy="+c.strategy, func(t *testing.T) {
err := (&source{}).Validate(imageCfg(c.strategy))
if (err != nil) != c.wantErr {
t.Fatalf("Validate(strategy=%q) err=%v, wantErr=%v", c.strategy, err, c.wantErr)
}
})
}
}
func TestEffectiveStrategy_DefaultsToBlueGreen(t *testing.T) {
// image's historical default is blue-green — empty must NOT flip it to
// recreate (the load-bearing per-source default).
if got := effectiveStrategy(Config{}); got != plugin.StrategyBlueGreen {
t.Fatalf("empty strategy = %q, want blue-green (image default)", got)
}
if got := effectiveStrategy(Config{DeployStrategy: plugin.StrategyRecreate}); got != plugin.StrategyRecreate {
t.Fatalf("explicit recreate = %q, want recreate", got)
}
}
@@ -44,6 +44,12 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
return fmt.Errorf("static source: decode config: %w", err)
}
// bg selects the zero-downtime path: a unique green name so the new
// container coexists with the still-serving blue, an in-place route
// upsert, and blue reaped only AFTER green is persisted + routed.
// effectiveStrategy forces recreate for storage-backed deno sites.
bg := effectiveStrategy(cfg) == plugin.StrategyBlueGreen
prev, prevContainer, err := loadState(deps, w)
if err != nil {
return err
@@ -238,6 +244,13 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
}
containerName := containerNameFor(w)
if bg {
// Unique green name so the new container coexists with the still-
// serving blue one — the deterministic name would collide on
// Docker's per-daemon unique-name constraint. This name is also the
// proxy forwardHost below, so green receives traffic after cutover.
containerName = plugin.BuildGreenName(containerName, time.Now())
}
var mounts []mount.Mount
if cfg.StorageEnabled && mode == "deno" {
@@ -283,8 +296,16 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
containerID, err := deps.Docker.CreateContainer(ctx, cc)
if err != nil {
// Container with this name might already exist — best-effort
// cleanup of any prior container by ID and by name, then retry.
if bg {
// Green has a unique name, so this is a genuine create failure, not
// a name conflict — must NOT remove the still-serving blue.
updateStatus(deps, w, "failed", latestSHA,
sanitizeError(fmt.Sprintf("create container: %v", err), token))
return fmt.Errorf("create container: %w", err)
}
// recreate: the deterministic name might still be held by the prior
// container — best-effort cleanup (by ID, then by name) and one retry.
// This is the recreate downtime window.
if prevContainerID != "" {
deps.Docker.StopContainer(ctx, prevContainerID, 10)
deps.Docker.RemoveContainer(ctx, prevContainerID, true)
@@ -353,7 +374,11 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
// place so traffic shifts atomically.
proxyRouteID := prevProxyRouteID
if domain != "" {
if prevProxyRouteID != "" {
// Blue-green relies on ConfigureRoute being an upsert-by-FQDN (NPM
// repoints the host in place, gap-free), so we must NOT delete blue's
// route first. recreate already removed blue, so the pre-delete is
// harmless there but kept to preserve its exact prior behavior.
if !bg && prevProxyRouteID != "" {
deps.Proxy.DeleteRoute(ctx, prevProxyRouteID)
}
routeID, rerr := deps.Proxy.ConfigureRoute(ctx, domain, forwardHost, forwardPort, proxy.RouteOptions{
@@ -371,8 +396,12 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
}
}
// Drop the old container if a fresh one was created (different ID).
if prevContainerID != "" && prevContainerID != containerID {
// recreate: drop the old container now that the new one is healthy +
// routed. Blue-green DEFERS this until AFTER saveState (below) so the
// persisted single row always points at a running container — a crash
// between cutover and saveState must not orphan green or leave the row
// pointing at a reaped blue (which the reconciler would then flag failed).
if !bg && prevContainerID != "" && prevContainerID != containerID {
deps.Docker.StopContainer(ctx, prevContainerID, 10)
deps.Docker.RemoveContainer(ctx, prevContainerID, true)
}
@@ -409,6 +438,14 @@ func deploy(ctx context.Context, deps plugin.Deps, w plugin.Workload, intent plu
return fmt.Errorf("persist deploy state: %w", err)
}
// Blue-green: green is now persisted in the single row AND serving behind
// the swapped route — only now is it safe to reap blue. (recreate already
// removed blue before saveState.)
if bg && prevContainerID != "" && prevContainerID != containerID {
deps.Docker.StopContainer(ctx, prevContainerID, 10)
deps.Docker.RemoveContainer(ctx, prevContainerID, true)
}
publishEvent(deps, w, "deployed")
// updateStatus normally fires the terminal-state notification; the
@@ -41,6 +41,30 @@ type Config struct {
// git provider as a commit status (pending/success/failure) on the
// deployed SHA. Best-effort — a reporting failure never fails a deploy.
ReportCommitStatus bool `json:"report_commit_status"`
// DeployStrategy selects how a redeploy cuts over. "" (default) and
// "recreate" stop the old container before the new one comes up (a brief
// downtime window). "blue-green" starts the new container alongside the
// old, gates it, swaps the proxy route in place, then reaps the old —
// zero-downtime under NPM. Validated via plugin.ValidateStrategy.
DeployStrategy string `json:"deploy_strategy,omitempty"`
}
// effectiveStrategy resolves the configured strategy for the static source.
// Empty maps to recreate — the source's historical behavior. Storage-backed
// deno sites are forced to recreate even when blue-green is requested: a
// blue-green overlap would mount the same RW named volume into BOTH
// containers at once (a concurrent-writer window recreate never has, since
// recreate stops blue before green starts).
func effectiveStrategy(cfg Config) string {
s := cfg.DeployStrategy
if s == "" {
s = plugin.StrategyRecreate
}
if s == plugin.StrategyBlueGreen && cfg.StorageEnabled && cfg.Mode == "deno" {
return plugin.StrategyRecreate
}
return s
}
type source struct{}
@@ -78,6 +102,9 @@ func (*source) Validate(cfg json.RawMessage) error {
if c.Mode != "" && c.Mode != "static" && c.Mode != "deno" {
return fmt.Errorf("static source: mode must be \"static\" or \"deno\"")
}
if err := plugin.ValidateStrategy(c.DeployStrategy, true); err != nil {
return fmt.Errorf("static source: %w", err)
}
return nil
}
@@ -0,0 +1,58 @@
package static
import (
"encoding/json"
"testing"
"github.com/alexei/tinyforge/internal/workload/plugin"
)
func validCfg(extra map[string]any) json.RawMessage {
m := map[string]any{"repo_owner": "o", "repo_name": "r"}
for k, v := range extra {
m[k] = v
}
b, _ := json.Marshal(m)
return b
}
func TestValidate_Strategy(t *testing.T) {
cases := []struct {
strategy string
wantErr bool
}{
{"", false},
{"recreate", false},
{"blue-green", false},
{"rolling", true},
{"junk", true},
}
for _, c := range cases {
t.Run("strategy="+c.strategy, func(t *testing.T) {
err := (&source{}).Validate(validCfg(map[string]any{"deploy_strategy": c.strategy}))
if (err != nil) != c.wantErr {
t.Fatalf("Validate(strategy=%q) err=%v, wantErr=%v", c.strategy, err, c.wantErr)
}
})
}
}
func TestEffectiveStrategy_DefaultAndDenoGate(t *testing.T) {
if got := effectiveStrategy(Config{}); got != plugin.StrategyRecreate {
t.Fatalf("empty strategy = %q, want recreate", got)
}
if got := effectiveStrategy(Config{DeployStrategy: plugin.StrategyBlueGreen}); got != plugin.StrategyBlueGreen {
t.Fatalf("plain blue-green = %q, want blue-green", got)
}
// Storage-backed deno site requesting blue-green is forced to recreate to
// avoid a concurrent-writer overlap on the shared /app/data volume.
denoStorage := Config{DeployStrategy: plugin.StrategyBlueGreen, StorageEnabled: true, Mode: "deno"}
if got := effectiveStrategy(denoStorage); got != plugin.StrategyRecreate {
t.Fatalf("deno+storage blue-green = %q, want recreate (forced)", got)
}
// A deno site WITHOUT storage may use blue-green.
denoNoStorage := Config{DeployStrategy: plugin.StrategyBlueGreen, Mode: "deno"}
if got := effectiveStrategy(denoNoStorage); got != plugin.StrategyBlueGreen {
t.Fatalf("deno (no storage) blue-green = %q, want blue-green", got)
}
}
+50
View File
@@ -0,0 +1,50 @@
package plugin
import (
"fmt"
"time"
)
// Deploy strategy values for a source's DeployStrategy config field.
//
// - "" (empty) — back-compat default; each source resolves it to its
// historical behavior (image -> blue-green, others -> recreate). Every
// pre-existing workload row decodes to this.
// - StrategyRecreate — stop the old container, then start the new one
// (a brief downtime window; what dockerfile/static/compose do today).
// - StrategyBlueGreen — start the new container alongside the old, gate it,
// swap the proxy route, then reap the old (zero-downtime under NPM).
const (
StrategyRecreate = "recreate"
StrategyBlueGreen = "blue-green"
)
// ValidateStrategy checks a deploy_strategy config value. "" is always valid
// (the back-compat default). StrategyRecreate is always valid. StrategyBlueGreen
// is valid only when the source supports it (allowBlueGreen) — compose passes
// false because a whole-stack blue-green is not implemented. Reserved values
// such as "rolling" are rejected until implemented so a config can't silently
// claim a behavior the deployer won't honor.
func ValidateStrategy(value string, allowBlueGreen bool) error {
switch value {
case "", StrategyRecreate:
return nil
case StrategyBlueGreen:
if allowBlueGreen {
return nil
}
return fmt.Errorf("deploy_strategy %q is not supported for this source kind; use \"recreate\"", value)
default:
return fmt.Errorf("invalid deploy_strategy %q (valid: \"\", %q, %q)", value, StrategyRecreate, StrategyBlueGreen)
}
}
// BuildGreenName appends a unique millisecond-hex suffix to a source's
// otherwise-deterministic container name so a new "green" container can run
// alongside the old "blue" during a blue-green cutover. Sources whose names
// are deterministic (dockerfile, static) collide on Docker's per-daemon
// unique-name constraint without this; the suffix lets both coexist until the
// route swaps and blue is reaped. Mirrors the image source's ms-hex scheme.
func BuildGreenName(base string, ts time.Time) string {
return fmt.Sprintf("%s-%x", base, ts.UnixMilli())
}
+48
View File
@@ -0,0 +1,48 @@
package plugin
import (
"testing"
"time"
)
func TestValidateStrategy(t *testing.T) {
cases := []struct {
name string
value string
allowBlueGreen bool
wantErr bool
}{
{"empty always ok (backward compat)", "", true, false},
{"empty ok when blue-green disallowed", "", false, false},
{"recreate ok", StrategyRecreate, true, false},
{"recreate ok when blue-green disallowed", StrategyRecreate, false, false},
{"blue-green ok when allowed", StrategyBlueGreen, true, false},
{"blue-green rejected when disallowed (compose)", StrategyBlueGreen, false, true},
{"reserved rolling rejected (allowed)", "rolling", true, true},
{"reserved rolling rejected (disallowed)", "rolling", false, true},
{"junk rejected", "banana", true, true},
}
for _, c := range cases {
t.Run(c.name, func(t *testing.T) {
err := ValidateStrategy(c.value, c.allowBlueGreen)
if (err != nil) != c.wantErr {
t.Fatalf("ValidateStrategy(%q, %v) err=%v, wantErr=%v", c.value, c.allowBlueGreen, err, c.wantErr)
}
})
}
}
func TestBuildGreenName_UniqueSuffixAndDistinct(t *testing.T) {
base := "tf-build-app-1234abcd"
a := BuildGreenName(base, time.Unix(1000, 0))
b := BuildGreenName(base, time.Unix(2000, 0))
if a == base || b == base {
t.Fatal("green name must differ from the deterministic base")
}
if a == b {
t.Fatal("different timestamps must yield different green names")
}
if len(a) <= len(base) {
t.Fatalf("green name %q should extend the base %q", a, base)
}
}