tiny-forge/plans/volume-snapshot-restore/PLAN.md at 1c47030854c5a42e86bd79bcdddb263089d67c65

alexei.dolgolyov 1c47030854 feat(volsnap): volume snapshot restore (backlog #6)

Restore a captured volume snapshot onto an image workload's live host-bind
data volumes, then redeploy — the most destructive workload action, built to
the adversarially-reviewed design (C1–C6) with all data-loss guards.

- Engine.Restore (engine-owned): all-or-nothing pre-flight re-resolution from
  the workload's CURRENT config (never the tamperable manifest), per-filesystem
  disk pre-check, per-workload lock, container quiesce, extract-to-tmp, durable
  pre-restore snapshot, write-ahead journal, atomic rename swap, redeploy, and
  crash-recovery sweep (RecoverInterruptedRestores) wired before serving.
- internal/keyedmutex: shared per-key lock; deployer now serializes every
  deploy entrypoint per workload via DispatchPlugin (+ LockWorkload/RedeployLocked
  for the restore re-dispatch, no deadlock).
- Untrusted-archive extractor: zip-slip containment, type allow-list (reg/dir
  only), decompression-bomb cap, manifest-index bounds.
- POST /api/workloads/{id}/snapshots/{sid}/restore: admin, X-Confirm-Restore
  header (CSRF), per-workload single-flight (409).
- WebUI: Restore button + danger ConfirmDialog + busy state + i18n (en/ru).
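
The per-key lock and the per-workload single-flight behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the actual `internal/keyedmutex` code: the type name `KeyedMutex`, its `Lock`/`TryLock` methods, and the reference-counting scheme are all assumptions, since the commit message does not show the package's API. A failed `TryLock` is the case an HTTP handler could map to 409.

```go
package main

import (
	"fmt"
	"sync"
)

// KeyedMutex hands out one lock per key (hypothetical shape; the real
// internal/keyedmutex API is not shown in the commit message).
type KeyedMutex struct {
	mu    sync.Mutex
	locks map[string]*entry
}

type entry struct {
	mu   sync.Mutex
	refs int // holders plus waiters; the entry is deleted when this hits zero
}

func New() *KeyedMutex { return &KeyedMutex{locks: make(map[string]*entry)} }

// Lock blocks until the lock for key is held and returns its unlock func.
func (k *KeyedMutex) Lock(key string) (unlock func()) {
	k.mu.Lock()
	e, ok := k.locks[key]
	if !ok {
		e = &entry{}
		k.locks[key] = e
	}
	e.refs++ // count this caller before waiting so the entry is not deleted
	k.mu.Unlock()

	e.mu.Lock()
	return func() { k.release(key, e) }
}

// TryLock returns (nil, false) immediately when the key is already held.
func (k *KeyedMutex) TryLock(key string) (unlock func(), ok bool) {
	k.mu.Lock()
	defer k.mu.Unlock()
	e, found := k.locks[key]
	if !found {
		e = &entry{}
		k.locks[key] = e
	}
	if !e.mu.TryLock() { // sync.Mutex.TryLock requires Go 1.18+
		return nil, false
	}
	e.refs++
	return func() { k.release(key, e) }, true
}

// release unlocks the per-key mutex and drops the entry once unused.
func (k *KeyedMutex) release(key string, e *entry) {
	e.mu.Unlock()
	k.mu.Lock()
	e.refs--
	if e.refs == 0 {
		delete(k.locks, key)
	}
	k.mu.Unlock()
}

func main() {
	km := New()
	unlock, _ := km.TryLock("workload-42")
	_, ok := km.TryLock("workload-42")
	fmt.Println("second restore admitted:", ok) // false: a handler could answer 409 here
	unlock()
}
```

Deleting idle entries keeps the map from growing with every workload ID ever seen; whether the real package does this is not stated in the source.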

Scope: image-source only; scopes absolute/stage/project (driven off the same
supportedScopes constant capture uses).

Plan-reviewed before coding; per-phase go/security/ts reviews; final review
READY TO MERGE. Security review caught + fixed a CRITICAL manifest-Source path
traversal (re-derive target from current config + base containment).

Plan: plans/volume-snapshot-restore/

| Phase | Domain | Status | Review | Build | Committed |
|---|---|---|---|---|---|
| Phase 1: engine primitives | backend | ✅ Done | ✅ Passed (APPROVE w/ notes) | ✅ Passed | ⬜ |
| Phase 2: lifecycle/locking | backend | ✅ Done | ✅ Passed (APPROVE w/ notes) | ✅ Passed | ⬜ |
| Phase 3: API endpoint | backend | ✅ Done | ✅ Passed (go: APPROVE w/ notes; security: fixed CRITICAL) | ✅ Passed | ⬜ |
| Phase 4: frontend | frontend | ✅ Done | ✅ Passed (ts: APPROVE) | ✅ Passed (check 0 err, build, 26 tests) | ⬜ |

| Phase | Warning | Severity | Status (open / resolved / accepted) |
|---|---|---|---|
| (design) | Mid-restore crash can leave a per-volume MIXED state (some restored, some original); each volume is individually intact and the pre-restore snapshot is the full escape hatch. | 🟡 | accepted (documented v1 limit) |
| 2→3 | B1 (was Blocker): `RecoverInterruptedRestores()` + `SetLifecycle()` MUST be wired at startup BEFORE the API server serves — restore endpoint must not be reachable without them. | 🔴→tracked | open — HARD Phase 3 prerequisite |
| 2 | W3 residual: the swap-failure-after-partial-swap ORCHESTRATION branch (rollbackSwaps glue) is covered by primitive unit tests + recovery test + extract-failure orchestration test, but not a full mid-swap fault-injection (needs an fs-fault seam not worth the production complexity). | 🟡 | accepted |


Feature: Volume Snapshot Restore (backlog #6)

Summary

Mandatory design fixes (non-negotiable — a wrong design = permanent data loss)

Folded-in (also mandatory)

Resolutions from the phase-breakdown plan review (2026-06-22)

Build & Test Commands

Phases

Parallelizable Phase Groups (Orchestrator mode only)

Phase Progress Log

Outstanding Warnings

Final Review

Amendment Log
