tiny-forge/plans/volume-snapshot-restore/phase-2-lifecycle-locking.md
alexei.dolgolyov 1c47030854 feat(volsnap): volume snapshot restore (backlog #6)
Restore a captured volume snapshot onto an image workload's live host-bind
data volumes, then redeploy — the most destructive workload action, built to
the adversarially-reviewed design (C1–C6) with all data-loss guards.

- Engine.Restore (engine-owned): all-or-nothing pre-flight re-resolution from
  the workload's CURRENT config (never the tamperable manifest), per-filesystem
  disk pre-check, per-workload lock, container quiesce, extract-to-tmp, durable
  pre-restore snapshot, write-ahead journal, atomic rename swap, redeploy, and
  crash-recovery sweep (RecoverInterruptedRestores) wired before serving.
- internal/keyedmutex: shared per-key lock; deployer now serializes every
  deploy entrypoint per workload via DispatchPlugin (+ LockWorkload/RedeployLocked
  for the restore re-dispatch, no deadlock).
- Untrusted-archive extractor: zip-slip containment, type allow-list (reg/dir
  only), decompression-bomb cap, manifest-index bounds.
- POST /api/workloads/{id}/snapshots/{sid}/restore: admin, X-Confirm-Restore
  header (CSRF), per-workload single-flight (409).
- WebUI: Restore button + danger ConfirmDialog + busy state + i18n (en/ru).

Scope: image-source only; scopes absolute/stage/project (driven off the same
supportedScopes constant capture uses).

Plan-reviewed before coding; per-phase go/security/ts reviews; final review
READY TO MERGE. Security review caught + fixed a CRITICAL manifest-Source path
traversal (re-derive target from current config + base containment).

Plan: plans/volume-snapshot-restore/
2026-06-22 17:23:52 +03:00


Phase 2: Engine.Restore orchestration + lifecycle/locking + rollback

Status: Complete
Parent plan: PLAN.md
Domain: backend

Objective

Wire the Phase 1 primitives into the full stop → swap → redeploy sequence under a per-workload lock, with crash-safe rollback (journal + recovery sweep) and a durable pre-restore auto-capture. Define the Lifecycle seam; modify the Deployer for per-workload locking + an unlocked redeploy.

Tasks

  • internal/keyedmutex/keyedmutex.go — extract the gitops.go pattern into a shared package: type Mutex with Lock(key string) func() and TryLock(key string) (func(), bool) (the Try variant serves the Phase 3 API single-flight → 409). Unit test both.
  • Deployer locking (C1) in internal/deployer/:
    • add workloadLocks keyedmutex.Mutex field.
    • refactor DispatchPlugin → unlock := d.workloadLocks.Lock(w.ID); defer unlock(); return d.dispatchLocked(ctx, w, intent); move the current body into unexported dispatchLocked.
    • wrap DispatchTeardown in the same per-workload lock.
    • do NOT lock DispatchReconcile (periodic; image Reconcile is a no-op; reconciler markMissingRows only flips labels = benign; locking it would stall the reconcile loop behind long deploys).
    • expose func (d *Deployer) LockWorkload(id string) func() and func (d *Deployer) RedeployLocked(ctx, w, intent) error (= dispatchLocked, doc: "caller already holds the workload lock; calling DispatchPlugin would deadlock").
  • volsnap.Lifecycle interface (in volsnap):
    • Lock(workloadID string) func()
    • StopContainers(ctx, workloadID string) (runningTag string, err error) — stop every running container for the workload; return the newest-running container's ImageTag (so redeploy pins the same version; empty ⇒ source default). Mark stopped rows State="stopped".
    • Redeploy(ctx, w store.Workload, reference string) error — unlocked re-dispatch, Reason "restore", Reference=tag.
  • Engine.Restore(ctx, snapshotID, workloadID string) error in internal/volsnap/restore.go (engine owns it). Sequence — does NOT hold e.mu (R1):
    1. load snap; verify snap.WorkloadID == workloadID; load workload + settings; require source_kind=="image".
    2. parseManifest; preflightResolve (C3 — abort if any fails); archiveUncompressedSize + per-filesystem freeDiskBytes pre-check (C5/R4 — abort).
    3. unlock := lc.Lock(workloadID); defer unlock() (C1).
    4. re-validate the workload still exists (R4 — teardown may have won the lock); abort if gone.
    5. tag, _ := lc.StopContainers(ctx, workloadID) (C4 stop).
    6. durably capture pre-restore snapshot: e.Create(w, settings, "pre-restore") (folded; AFTER stop = quiesced; BEFORE any rename = R3). Create takes its own e.mu — Restore must hold none.
    7. write restore journal <snapDir>/restore-<workloadID>.json (snapshotID, per-volume {live, old, tmp, swapped:false}).
    8. extract ALL volumes to their tmp staging dirs (safeExtractIndex) — R3 (shrinks the destructive window to pure renames).
    9. swap each volume (swapVolumeDir), updating the journal swapped=true per volume.
    10. on ANY error in steps 8-9 → rollbackSwaps + lc.Redeploy(ctx, w, tag) + delete journal + return wrapped error.
    11. success → lc.Redeploy(ctx, w, tag) (C4 redeploy); remove .old staging dirs (reclaim disk); delete journal; best-effort audit event (store.InsertEvent source "volsnap").
    • Engine.SetLifecycle(lc Lifecycle) setter; Restore errors clearly if lifecycle is nil.
  • Engine.RecoverInterruptedRestores() (int, error) (R3): startup sweep, mirrors CleanOrphans. For each restore-*.json journal, per volume:
    • if swapped → remove old + tmp;
    • else if live is missing and old exists → rename old → live (reverts a mid-rename crash), remove tmp;
    • else (live present, not swapped) → remove tmp.
    Then delete the journal and log loudly. (Wiring at startup happens in Phase 3's main.go change, beside CleanOrphans.)

Files to Modify/Create

  • internal/keyedmutex/keyedmutex.go (+ _test.go) — shared lock (new)
  • internal/deployer/deployer.go, internal/deployer/dispatch.go — workloadLocks, dispatchLocked, LockWorkload, RedeployLocked, locked Teardown
  • internal/volsnap/restore.go — Lifecycle interface, Engine.Restore, RecoverInterruptedRestores, SetLifecycle, journal type
  • internal/volsnap/restore_test.go — fake-Lifecycle orchestration tests (extends Phase 1 file)
  • internal/api/gitops.go — (optional, low-risk) migrate keyedMutex → keyedmutex.Mutex for DRY

Acceptance Criteria

  • Lock re-entrancy: Engine.Restore → RedeployLocked does NOT re-acquire the workload lock (no deadlock). All existing deployer tests still pass (the lock is externally transparent).
  • Happy-path orchestration test uses the REAL Engine.Create (real store + t.TempDir()) for the pre-restore capture so the e.mu deadlock (R1) would fail go test, not prod. Asserts call order: preflight → lock → stop → create → extract-all → swap-all → redeploy → cleanup.
  • Rollback test: a swap fails midway → originals restored, redeploy called, journal deleted, error returned.
  • Preflight-fail test: lock/stop NEVER called (abort before lock).
  • Disk-pre-check-fail test: abort before lock.
  • RecoverInterruptedRestores test: simulate journals in each crash state → correct revert/keep/cleanup.
  • go build ./..., go vet ./internal/..., go test ./internal/... green.
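
The RecoverInterruptedRestores rules can be expressed as a pure per-volume decision, which keeps the "each crash state" test table-driven. decideRecovery and recoverAction are hypothetical names. The real recoverVolume presumably acts on the filesystem directly.

```go
package main

// recoverAction is what the startup sweep does for one journaled volume.
type recoverAction int

const (
	removeOldAndTmp recoverAction = iota // swap finished: keep the restored data
	revertOldToLive                      // crash between the two renames: undo
	removeTmpOnly                        // crash before this volume was swapped
)

// decideRecovery applies the per-volume rules in order: a journaled swap
// always wins; otherwise a missing live dir next to an existing old dir
// means the crash hit mid-rename; anything else only needs tmp removed.
func decideRecovery(swapped, liveExists, oldExists bool) recoverAction {
	switch {
	case swapped:
		return removeOldAndTmp
	case !liveExists && oldExists:
		return revertOldToLive
	default:
		return removeTmpOnly
	}
}
```

With the decision kept apart from the renames, the property the review checked (a completed swap is never reverted) is visible in the first case of one switch.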

Notes

  • ⚠️ The Deployer lock change touches the hot deploy path — verify no existing path re-enters DispatchPlugin under a held lock (webhook preview = sequential teardown-then-deploy on the child, not nested — confirmed safe).
  • The API single-flight (Phase 3) is a fast 409 reject; the deployer lock is the real mutex — they compose (document).

Review Checklist

  • All tasks completed
  • Code follows project conventions
  • No unintended side effects (existing deploy/teardown behavior unchanged externally)
  • Build passes
  • Tests pass (new + existing)

Handoff to Next Phase

Implemented: internal/keyedmutex (Lock+TryLock, tested); deployer workloadLocks + dispatchLocked + LockWorkload + RedeployLocked, DispatchPlugin/DispatchTeardown now per-workload-locked (reconciler intentionally NOT). volsnap.Lifecycle interface, Engine.Restore, restoreJournal (atomic write — W1), RecoverInterruptedRestores, recoverVolume, checkDiskSpace, SetLifecycle. Tests: restore_engine_test.go (happy/real-Create, redeploy-fail, preflight-abort, extract-fail-after-lock, nil-lifecycle, wrong-workload, recovery×3 states), keyedmutex_test.go. Full go test ./internal/... green.

Review (go-reviewer, APPROVE WITH NOTES): no functional blockers in this diff. Verified: no lock re-entrancy or e.mu self-deadlock, no prune race (extract-all precedes e.Create), and the recovery state machine does not revert good data. Addressed in-phase: W1 (atomic journal) and W3 (extract-failure orchestration test). The remaining part of W3 (mid-swap fault injection) is accepted as residual risk.

🔴 HARD PREREQUISITES for Phase 3 (B1 + N1 from review):

  1. Wire snapshotEngine.RecoverInterruptedRestores() at startup in cmd/server/main.go, BEFORE the API server serves — beside the existing CleanOrphans() call (~main.go:333). Without it the journal/WAL protects nothing — a crash mid-restore is unrecovered.
  2. Wire snapshotEngine.SetLifecycle(adapter) strictly BEFORE serving (same place as SetSnapshotEngine) so the e.lifecycle field is safely published (no race).
  3. The restore endpoint MUST NOT be reachable until both are wired.

Lifecycle adapter (Phase 3, main.go) maps: Lock → deployer.LockWorkload; StopContainers → store.ListContainersByWorkload + docker.StopContainer for each running container + UpdateContainerState(..., "stopped") + return the newest-running ImageTag; Redeploy → deployer.RedeployLocked with a restore-reason intent (Reference=tag).