# Phase 3: API endpoint + CSRF header + single-flight + wiring + tests **Status:** ✅ Complete **Parent plan:** [PLAN.md](./PLAN.md) **Domain:** backend ## Objective Expose restore over HTTP behind the destructive-action guards, and wire the real `Lifecycle` adapter + startup recovery sweep at the composition root. ## Tasks - [ ] **`restoreWorkloadSnapshot` handler** in `internal/api/volume_snapshots.go`: `POST /api/workloads/{id}/snapshots/{sid}/restore`. - require `X-Confirm-Restore: ` header == path `sid` (C6, mirror `backups.go` `restoreBackup`); mismatch → 400. - per-workload single-flight via `keyedmutex.TryLock(id)` (or a `sync.Map` LoadOrStore) → 409 if a restore for this workload is already running; release on completion. (Different apps restore concurrently.) - load snapshot via `snapshotEngine.Get(sid)`; verify `snap.WorkloadID == id`; load workload; require `source_kind=="image"` (else 400). - call `s.snapshotEngine.Restore(r.Context(), id, sid)`. Synchronous (mirrors `deployPluginWorkload` which blocks on dispatch). `ErrNoSnapshotData`-class / client-actionable → 400; success → 200 `{"status":"restored",...}`; other → 500 with a generic message (no internal detail leak — mirror existing handlers; raw error to `slog`). - [ ] **Route registration** in `internal/api/router.go` under the admin group on `/workloads/{id}`, beside the existing `POST /snapshots`. Admin-gated. - [ ] **Lifecycle adapter + recovery wiring** in `cmd/server/main.go` (composition root — already imports deployer + docker + store + volsnap): - a small struct implementing `volsnap.Lifecycle`: `Lock`→`deployer.LockWorkload`; `StopContainers`→`store.ListContainersByWorkload` + `docker.StopContainer` each running + `store.UpdateContainerState(...,"stopped")` + return newest-running `ImageTag`; `Redeploy`→`deployer.RedeployLocked` with `toPluginWorkload`/`WorkloadFromStore` + a `restore`-reason intent. - `snapshotEngine.SetLifecycle(adapter)`. - call `snapshotEngine.RecoverInterruptedRestores()` at startup beside the existing `CleanOrphans()` call; log result. - keeps the `api` package decoupled from `deployer` (router.go deliberately avoids importing deployer). ## Files to Modify/Create - `internal/api/volume_snapshots.go` — `restoreWorkloadSnapshot` handler + single-flight field on `Server` - `internal/api/router.go` — route registration - `cmd/server/main.go` — Lifecycle adapter, `SetLifecycle`, `RecoverInterruptedRestores` startup wiring - `internal/api/volume_snapshots_test.go` — handler tests (extends existing file) ## Acceptance Criteria - Missing/mismatched `X-Confirm-Restore` → 400. - Concurrent second restore for the same workload → 409. - Non-image workload → 400; snapshot belonging to another workload → 400/404. - Happy path → 200 (with a fake engine/lifecycle or a seeded real engine). - `go build ./...`, `go vet ./internal/...`, `go test ./internal/...` green; existing snapshot tests pass. ## Notes - The handler is thin: validation + single-flight + delegate to `Engine.Restore`. All the dangerous logic stays in the engine (folded requirement). - Confirm the admin group + `X-Confirm-Restore` together match the DB-restore threat model (custom header defeats CSRF form/img posts; admin JWT + AdminOnly gates authz). ## Review Checklist - [ ] All tasks completed - [ ] Code follows project conventions - [ ] No unintended side effects - [ ] Build passes - [ ] Tests pass (new + existing) ## Handoff to Next Phase Implemented: `restoreWorkloadSnapshot` handler (`internal/api/volume_snapshots.go`) — admin, `X-Confirm-Restore: ` header, per-workload single-flight (`volRestoreInFlight.TryLock` → 409); route `POST /api/workloads/{id}/snapshots/{sid}/restore`; `restoreLifecycle` adapter (`cmd/server/restore_lifecycle.go`); main.go wires `SetLifecycle` + `RecoverInterruptedRestores` BEFORE serving (B1/N1 resolved). Tests: header miss/mismatch→400, wrong-workload→400, non-image→400, not-found→404, happy-path→200, single-flight→409. **SECURITY FIX (review BLOCK → resolved):** the security review caught a CRITICAL — the manifest's persisted `Source`/`Scope` (attacker-influenceable) was being trusted to compute the destructive swap target, so `Source:"../../etc"` could clobber `/etc`. Fixed by re-deriving the swap target from the workload's CURRENT config keyed by container Target path (`volumesByTarget` shared helper) + a `pathWithinBase` containment assertion for base-relative scopes. Regression guards: `TestPreflightResolve_IgnoresManifestSource`, `TestPreflightResolve_AllOrNothing`. Both reviews now clear. **API for Phase 4 (frontend):** `POST /api/workloads/{id}/snapshots/{sid}/restore` with header `X-Confirm-Restore: ` (REQUIRED — mirror how the DB restore sends it). 200 `{status:"restored"}`, 400 (header/ownership/non-image), 404 (not found), 409 (already in progress), 500 (engine). Restore is synchronous (blocks until done, like deploy).