1c47030854
Restore a captured volume snapshot onto an image workload's live host-bind
data volumes, then redeploy — the most destructive workload action, built to
the adversarially-reviewed design (C1–C6) with all data-loss guards.
- Engine.Restore (engine-owned): all-or-nothing pre-flight re-resolution from
the workload's CURRENT config (never the tamperable manifest), per-filesystem
disk pre-check, per-workload lock, container quiesce, extract-to-tmp, durable
pre-restore snapshot, write-ahead journal, atomic rename swap, redeploy, and
crash-recovery sweep (RecoverInterruptedRestores) wired before serving.
- internal/keyedmutex: shared per-key lock; deployer now serializes every
deploy entrypoint per workload via DispatchPlugin (+ LockWorkload/RedeployLocked
for the restore re-dispatch, no deadlock).
- Untrusted-archive extractor: zip-slip containment, type allow-list (reg/dir
only), decompression-bomb cap, manifest-index bounds.
- POST /api/workloads/{id}/snapshots/{sid}/restore: admin, X-Confirm-Restore
header (CSRF), per-workload single-flight (409).
- WebUI: Restore button + danger ConfirmDialog + busy state + i18n (en/ru).
Scope: image-source only; scopes absolute/stage/project (driven off the same
supportedScopes constant capture uses).
Plan-reviewed before coding; per-phase go/security/ts reviews; final review
READY TO MERGE. Security review caught + fixed a CRITICAL manifest-Source path
traversal (re-derive target from current config + base containment).
Plan: plans/volume-snapshot-restore/
107 lines
7.9 KiB
Markdown
# Phase 2: Engine.Restore orchestration + lifecycle/locking + rollback

**Status:** ✅ Complete
**Parent plan:** [PLAN.md](./PLAN.md)
**Domain:** backend

## Objective

Wire the Phase 1 primitives into the full **stop → swap → redeploy** sequence under a
per-workload lock, with crash-safe rollback (journal + recovery sweep) and a durable
pre-restore auto-capture. Define the `Lifecycle` seam; modify the Deployer to take a
per-workload lock and to expose a redeploy entrypoint that does not re-acquire it.

## Tasks

- [ ] **`internal/keyedmutex/keyedmutex.go`**: extract the `gitops.go` pattern into a shared
  package: `type Mutex` with `Lock(key string) func()` and `TryLock(key string) (func(), bool)`
  (the Try variant serves the Phase 3 API single-flight → 409). Unit test both.
- [ ] **Deployer locking (C1)** in `internal/deployer/`:
  - add `workloadLocks keyedmutex.Mutex` field.
  - refactor `DispatchPlugin` → `unlock := d.workloadLocks.Lock(w.ID); defer unlock(); return d.dispatchLocked(ctx, w, intent)`; move the current body into unexported `dispatchLocked`.
  - wrap `DispatchTeardown` in the same per-workload lock.
  - do NOT lock `DispatchReconcile` (periodic; image Reconcile is a no-op; reconciler `markMissingRows` only flips labels, which is benign; locking it would stall the reconcile loop behind long deploys).
  - expose `func (d *Deployer) LockWorkload(id string) func()` and `func (d *Deployer) RedeployLocked(ctx, w, intent) error` (= `dispatchLocked`, doc: "caller already holds the workload lock; calling DispatchPlugin would deadlock").
- [ ] **`volsnap.Lifecycle` interface** (in volsnap):
  - `Lock(workloadID string) func()`
  - `StopContainers(ctx, workloadID string) (runningTag string, err error)`: stop every running container for the workload; return the **newest-running** container's `ImageTag` (so redeploy pins the same version; empty ⇒ source default). Mark stopped rows `State="stopped"`.
  - `Redeploy(ctx, w store.Workload, reference string) error`: re-dispatch without re-acquiring the workload lock (the caller holds it), Reason `"restore"`, Reference=tag.
- [ ] **`Engine.Restore(ctx, snapshotID, workloadID string) error`** in `internal/volsnap/restore.go`
  (engine owns it). Sequence, which **does NOT hold `e.mu`** (R1):
  1. load snap; verify `snap.WorkloadID == workloadID`; load workload + settings; require `source_kind=="image"`.
  2. `parseManifest`; `preflightResolve` (C3: abort if any fails); `archiveUncompressedSize` + per-filesystem `freeDiskBytes` pre-check (C5/R4: abort).
  3. `unlock := lc.Lock(workloadID); defer unlock()` (C1).
  4. **re-validate** the workload still exists (R4: teardown may have won the lock); abort if gone.
  5. `tag, _ := lc.StopContainers(ctx, workloadID)` (C4 stop).
  6. **durably** capture pre-restore snapshot: `e.Create(w, settings, "pre-restore")` (folded; AFTER stop = quiesced; BEFORE any rename = R3). `Create` takes its own `e.mu`, so Restore must hold none.
  7. write **restore journal** `<snapDir>/restore-<workloadID>.json` (snapshotID, per-volume {live, old, tmp, swapped:false}).
  8. **extract ALL** volumes to their `tmp` staging dirs (`safeExtractIndex`); R3 (shrinks the destructive window to pure renames).
  9. **swap** each volume (`swapVolumeDir`), updating the journal `swapped=true` per volume.
  10. on ANY error in 8–9 → `rollbackSwaps` + `lc.Redeploy(ctx, w, tag)` + delete journal + return wrapped error.
  11. success → `lc.Redeploy(ctx, w, tag)` (C4 redeploy); remove `.old` staging dirs (reclaim disk); delete journal; best-effort audit event (`store.InsertEvent` source `"volsnap"`).
  - `Engine.SetLifecycle(lc Lifecycle)` setter; `Restore` errors clearly if lifecycle is nil.
- [ ] **`Engine.RecoverInterruptedRestores() (int, error)`** (R3): startup sweep, mirrors
  `CleanOrphans`. For each `restore-*.json` journal, per volume:
  - if `swapped` → remove `old` + `tmp`;
  - else if live missing && old exists → rename old→live (revert mid-rename crash), remove tmp;
  - else (live present, not swapped) → remove tmp.

  Delete journal. Log loudly. (Wiring at startup happens in Phase 3's main.go change, beside `CleanOrphans`.)

## Files to Modify/Create

- `internal/keyedmutex/keyedmutex.go` (+ `_test.go`): shared lock (new)
- `internal/deployer/deployer.go`, `internal/deployer/dispatch.go`: workloadLocks, dispatchLocked, LockWorkload, RedeployLocked, locked Teardown
- `internal/volsnap/restore.go`: Lifecycle interface, Engine.Restore, RecoverInterruptedRestores, SetLifecycle, journal type
- `internal/volsnap/restore_test.go`: fake-Lifecycle orchestration tests (extends Phase 1 file)
- `internal/api/gitops.go`: (optional, low-risk) migrate `keyedMutex`→`keyedmutex.Mutex` for DRY

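For orientation, the shared lock in the first entry could look roughly like the sketch below. Only the `Lock` and `TryLock` signatures come from the plan; the internals (a map of `sync.Mutex` guarded by another mutex, with no eviction of idle keys) are an assumption and may differ from the `gitops.go` pattern being extracted. `sync.Mutex.TryLock` requires Go 1.18 or later.

```go
package main

import "sync"

// Mutex is a per-key lock: callers using different keys never block each other.
// The zero value is ready to use. Sketch only: entries are never evicted.
type Mutex struct {
	mu    sync.Mutex             // guards the map below
	locks map[string]*sync.Mutex // one lock per key, created on first use
}

// get returns the lock for key, creating it if needed.
func (m *Mutex) get(key string) *sync.Mutex {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.locks == nil {
		m.locks = make(map[string]*sync.Mutex)
	}
	l, ok := m.locks[key]
	if !ok {
		l = &sync.Mutex{}
		m.locks[key] = l
	}
	return l
}

// Lock blocks until key is free and returns the matching unlock function.
func (m *Mutex) Lock(key string) func() {
	l := m.get(key)
	l.Lock()
	return l.Unlock
}

// TryLock returns immediately; ok is false when key is already held.
func (m *Mutex) TryLock(key string) (unlock func(), ok bool) {
	l := m.get(key)
	if !l.TryLock() {
		return nil, false
	}
	return l.Unlock, true
}
```

Returning the unlock function keeps call sites to the `unlock := m.Lock(id); defer unlock()` shape used throughout the plan.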
## Acceptance Criteria

- Lock re-entrancy: `Engine.Restore` → `RedeployLocked` does NOT re-acquire the workload lock (no deadlock). All existing deployer tests still pass (lock is externally transparent).
- **Happy-path orchestration test uses the REAL `Engine.Create` (real store + `t.TempDir()`)** for the pre-restore capture so the `e.mu` deadlock (R1) would fail `go test`, not prod. Asserts call order: preflight → lock → stop → create → extract-all → swap-all → redeploy → cleanup.
- Rollback test: a swap fails midway → originals restored, redeploy called, journal deleted, error returned.
- Preflight-fail test: lock/stop NEVER called (abort before lock).
- Disk-pre-check-fail test: abort before lock.
- `RecoverInterruptedRestores` test: simulate journals in each crash state → correct revert/keep/cleanup.
- `go build ./...`, `go vet ./internal/...`, `go test ./internal/...` green.

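The first criterion depends on splitting each entrypoint into a locking wrapper and a body that assumes the lock is held, because Go's `sync.Mutex` is not re-entrant. The sketch below shows that split with a stripped-down stand-in for the deployer: the method names follow the plan, but the simplified signatures (string IDs instead of `ctx, w, intent`), the `calls` slice and the `restore` helper are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Deployer is a simplified stand-in for the real type.
type Deployer struct {
	mu    sync.Mutex             // guards locks and calls
	locks map[string]*sync.Mutex // per-workload locks
	calls []string               // records dispatches so the sketch is observable
}

// LockWorkload blocks until the workload's lock is free and returns its unlock func.
func (d *Deployer) LockWorkload(id string) func() {
	d.mu.Lock()
	if d.locks == nil {
		d.locks = make(map[string]*sync.Mutex)
	}
	l, ok := d.locks[id]
	if !ok {
		l = &sync.Mutex{}
		d.locks[id] = l
	}
	d.mu.Unlock()
	l.Lock()
	return l.Unlock
}

// DispatchPlugin is the public entrypoint: take the lock, then run the body.
func (d *Deployer) DispatchPlugin(id, reason string) error {
	unlock := d.LockWorkload(id)
	defer unlock()
	return d.dispatchLocked(id, reason)
}

// RedeployLocked runs the body without locking. The caller must already hold
// the workload lock; calling DispatchPlugin there would deadlock.
func (d *Deployer) RedeployLocked(id, reason string) error {
	return d.dispatchLocked(id, reason)
}

// dispatchLocked is the former DispatchPlugin body (here it only records the call).
func (d *Deployer) dispatchLocked(id, reason string) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.calls = append(d.calls, fmt.Sprintf("%s:%s", id, reason))
	return nil
}

// restore mimics the restore path: hold the lock across the whole operation
// and redeploy through the non-locking entrypoint.
func restore(d *Deployer, id string) error {
	unlock := d.LockWorkload(id)
	defer unlock()
	return d.RedeployLocked(id, "restore")
}
```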
## Notes

- ⚠️ The Deployer lock change touches the hot deploy path: verify no existing path re-enters `DispatchPlugin` under a held lock (webhook preview = sequential teardown-then-deploy on the child, not nested; confirmed safe).
- The API single-flight (Phase 3) is a fast 409 reject; the deployer lock is the real mutex. The two compose (document this).

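One way to picture how the two locks compose is the sketch below. It is a single-workload simplification and not the Phase 3 handler: two plain `sync.Mutex` values stand in for the keyed single-flight lock and the deployer's per-workload lock, and `restoreAPI` and `handleRestore` are hypothetical names. In the plan the blocking lock is actually taken inside `Engine.Restore` via `lc.Lock`.

```go
package main

import (
	"net/http"
	"sync"
)

// restoreAPI holds both locks for one workload (simplified stand-ins).
type restoreAPI struct {
	inflight sync.Mutex // stands in for the API single-flight TryLock
	deployer sync.Mutex // stands in for the deployer's per-workload lock
}

// handleRestore returns the HTTP status the endpoint would send.
func (a *restoreAPI) handleRestore(restore func() error) int {
	// Fast reject: a second restore for the same workload never queues.
	if !a.inflight.TryLock() {
		return http.StatusConflict
	}
	defer a.inflight.Unlock()

	// Real mutual exclusion: may wait behind an in-progress deploy or teardown.
	a.deployer.Lock()
	defer a.deployer.Unlock()

	if err := restore(); err != nil {
		return http.StatusInternalServerError
	}
	return http.StatusOK
}
```

The single-flight lock only protects the caller from queueing duplicate restores; correctness against concurrent deploys still rests on the blocking lock.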
## Review Checklist

- [ ] All tasks completed
- [ ] Code follows project conventions
- [ ] No unintended side effects (existing deploy/teardown behavior unchanged externally)
- [ ] Build passes
- [ ] Tests pass (new + existing)

## Handoff to Next Phase

Implemented: `internal/keyedmutex` (Lock+TryLock, tested); deployer `workloadLocks` +
`dispatchLocked` + `LockWorkload` + `RedeployLocked`, `DispatchPlugin`/`DispatchTeardown`
now per-workload-locked (reconciler intentionally NOT). `volsnap.Lifecycle` interface,
`Engine.Restore`, `restoreJournal` (atomic write, W1), `RecoverInterruptedRestores`,
`recoverVolume`, `checkDiskSpace`, `SetLifecycle`. Tests: `restore_engine_test.go`
(happy/real-Create, redeploy-fail, preflight-abort, extract-fail-after-lock, nil-lifecycle,
wrong-workload, recovery×3 states), `keyedmutex_test.go`. Full `go test ./internal/...` green.

**Review (go-reviewer, APPROVE WITH NOTES):** no functional blockers in this diff. Verified:
no lock re-entrancy/`e.mu` self-deadlock, no prune-race (extract-all precedes `e.Create`),
recovery state machine doesn't revert good data. Addressed in-phase: W1 (atomic journal),
W3 (extract-failure orchestration test). Residual W3 (mid-swap fault injection) accepted.

**🔴 HARD PREREQUISITES for Phase 3 (B1 + N1 from review):**
1. Wire `snapshotEngine.RecoverInterruptedRestores()` at startup in `cmd/server/main.go`,
   BEFORE the API server serves, beside the existing `CleanOrphans()` call (~main.go:333).
   Without it the journal/WAL protects nothing: a crash mid-restore goes unrecovered.
2. Wire `snapshotEngine.SetLifecycle(adapter)` strictly BEFORE serving (same place as
   `SetSnapshotEngine`) so the `e.lifecycle` field is safely published (no race).
3. The restore endpoint MUST NOT be reachable until both are wired.

**Lifecycle adapter (Phase 3, main.go) maps:** `Lock`→`deployer.LockWorkload`;
`StopContainers`→`store.ListContainersByWorkload` + `docker.StopContainer` each running +
`UpdateContainerState(...,"stopped")` + return newest-running `ImageTag`;
`Redeploy`→`deployer.RedeployLocked` with a `restore`-reason intent (Reference=tag).