Restore a captured volume snapshot onto an image workload's live host-bind
data volumes, then redeploy — the most destructive workload action, built to
the adversarially-reviewed design (C1–C6) with all data-loss guards.
- Engine.Restore (engine-owned): all-or-nothing pre-flight re-resolution from
the workload's CURRENT config (never the tamperable manifest), per-filesystem
disk pre-check, per-workload lock, container quiesce, extract-to-tmp, durable
pre-restore snapshot, write-ahead journal, atomic rename swap, redeploy, and
crash-recovery sweep (RecoverInterruptedRestores) wired before serving.
- internal/keyedmutex: shared per-key lock; deployer now serializes every
deploy entrypoint per workload via DispatchPlugin (+ LockWorkload/RedeployLocked
for the restore re-dispatch, no deadlock).
- Untrusted-archive extractor: zip-slip containment, type allow-list (reg/dir
only), decompression-bomb cap, manifest-index bounds.
- POST /api/workloads/{id}/snapshots/{sid}/restore: admin, X-Confirm-Restore
header (CSRF), per-workload single-flight (409).
- WebUI: Restore button + danger ConfirmDialog + busy state + i18n (en/ru).
Scope: image-source only; scopes absolute/stage/project (driven off the same
supportedScopes constant capture uses).
Plan-reviewed before coding; per-phase go/security/ts reviews; final review
READY TO MERGE. Security review caught + fixed a CRITICAL manifest-Source path
traversal (re-derive target from current config + base containment).
Plan: plans/volume-snapshot-restore/
4.9 KiB
Phase 3: API endpoint + CSRF header + single-flight + wiring + tests
Status: ✅ Complete Parent plan: PLAN.md Domain: backend
Objective
Expose restore over HTTP behind the destructive-action guards, and wire the real
Lifecycle adapter + startup recovery sweep at the composition root.
Tasks
restoreWorkloadSnapshothandler ininternal/api/volume_snapshots.go:POST /api/workloads/{id}/snapshots/{sid}/restore.- require
X-Confirm-Restore: <sid>header == pathsid(C6, mirrorbackups.gorestoreBackup); mismatch → 400. - per-workload single-flight via
keyedmutex.TryLock(id)(or async.MapLoadOrStore) → 409 if a restore for this workload is already running; release on completion. (Different apps restore concurrently.) - load snapshot via
snapshotEngine.Get(sid); verifysnap.WorkloadID == id; load workload; requiresource_kind=="image"(else 400). - call
s.snapshotEngine.Restore(r.Context(), id, sid). Synchronous (mirrorsdeployPluginWorkloadwhich blocks on dispatch).ErrNoSnapshotData-class / client-actionable → 400; success → 200{"status":"restored",...}; other → 500 with a generic message (no internal detail leak — mirror existing handlers; raw error toslog).
- require
- Route registration in
internal/api/router.gounder the admin group on/workloads/{id}, beside the existingPOST /snapshots. Admin-gated. - Lifecycle adapter + recovery wiring in
cmd/server/main.go(composition root — already imports deployer + docker + store + volsnap):- a small struct implementing
volsnap.Lifecycle:Lock→deployer.LockWorkload;StopContainers→store.ListContainersByWorkload+docker.StopContainereach running +store.UpdateContainerState(...,"stopped")+ return newest-runningImageTag;Redeploy→deployer.RedeployLockedwithtoPluginWorkload/WorkloadFromStore+ arestore-reason intent. snapshotEngine.SetLifecycle(adapter).- call
snapshotEngine.RecoverInterruptedRestores()at startup beside the existingCleanOrphans()call; log result. - keeps the
apipackage decoupled fromdeployer(router.go deliberately avoids importing deployer).
- a small struct implementing
Files to Modify/Create
internal/api/volume_snapshots.go—restoreWorkloadSnapshothandler + single-flight field onServerinternal/api/router.go— route registrationcmd/server/main.go— Lifecycle adapter,SetLifecycle,RecoverInterruptedRestoresstartup wiringinternal/api/volume_snapshots_test.go— handler tests (extends existing file)
Acceptance Criteria
- Missing/mismatched
X-Confirm-Restore→ 400. - Concurrent second restore for the same workload → 409.
- Non-image workload → 400; snapshot belonging to another workload → 400/404.
- Happy path → 200 (with a fake engine/lifecycle or a seeded real engine).
go build ./...,go vet ./internal/...,go test ./internal/...green; existing snapshot tests pass.
Notes
- The handler is thin: validation + single-flight + delegate to
Engine.Restore. All the dangerous logic stays in the engine (folded requirement). - Confirm the admin group +
X-Confirm-Restoretogether match the DB-restore threat model (custom header defeats CSRF form/img posts; admin JWT + AdminOnly gates authz).
Review Checklist
- All tasks completed
- Code follows project conventions
- No unintended side effects
- Build passes
- Tests pass (new + existing)
Handoff to Next Phase
Implemented: restoreWorkloadSnapshot handler (internal/api/volume_snapshots.go) — admin,
X-Confirm-Restore: <sid> header, per-workload single-flight (volRestoreInFlight.TryLock
→ 409); route POST /api/workloads/{id}/snapshots/{sid}/restore; restoreLifecycle adapter
(cmd/server/restore_lifecycle.go); main.go wires SetLifecycle + RecoverInterruptedRestores
BEFORE serving (B1/N1 resolved). Tests: header miss/mismatch→400, wrong-workload→400,
non-image→400, not-found→404, happy-path→200, single-flight→409.
SECURITY FIX (review BLOCK → resolved): the security review caught a CRITICAL — the
manifest's persisted Source/Scope (attacker-influenceable) was being trusted to compute
the destructive swap target, so Source:"../../etc" could clobber /etc. Fixed by
re-deriving the swap target from the workload's CURRENT config keyed by container Target path
(volumesByTarget shared helper) + a pathWithinBase containment assertion for base-relative
scopes. Regression guards: TestPreflightResolve_IgnoresManifestSource,
TestPreflightResolve_AllOrNothing. Both reviews now clear.
API for Phase 4 (frontend): POST /api/workloads/{id}/snapshots/{sid}/restore with header
X-Confirm-Restore: <sid> (REQUIRED — mirror how the DB restore sends it). 200 {status:"restored"},
400 (header/ownership/non-image), 404 (not found), 409 (already in progress), 500 (engine).
Restore is synchronous (blocks until done, like deploy).