tiny-forge/plans/volume-snapshot-restore/phase-1-engine-primitives.md
alexei.dolgolyov 1c47030854 feat(volsnap): volume snapshot restore (backlog #6)
Restore a captured volume snapshot onto an image workload's live host-bind
data volumes, then redeploy — the most destructive workload action, built to
the adversarially-reviewed design (C1–C6) with all data-loss guards.

- Engine.Restore (engine-owned): all-or-nothing pre-flight re-resolution from
  the workload's CURRENT config (never the tamperable manifest), per-filesystem
  disk pre-check, per-workload lock, container quiesce, extract-to-tmp, durable
  pre-restore snapshot, write-ahead journal, atomic rename swap, redeploy, and
  crash-recovery sweep (RecoverInterruptedRestores) wired before serving.
- internal/keyedmutex: shared per-key lock; deployer now serializes every
  deploy entrypoint per workload via DispatchPlugin (+ LockWorkload/RedeployLocked
  for the restore re-dispatch, no deadlock).
- Untrusted-archive extractor: zip-slip containment, type allow-list (reg/dir
  only), decompression-bomb cap, manifest-index bounds.
- POST /api/workloads/{id}/snapshots/{sid}/restore: admin, X-Confirm-Restore
  header (CSRF), per-workload single-flight (409).
- WebUI: Restore button + danger ConfirmDialog + busy state + i18n (en/ru).

Scope: image-source only; scopes absolute/stage/project (driven off the same
supportedScopes constant capture uses).

Plan-reviewed before coding; per-phase go/security/ts reviews; final review
READY TO MERGE. Security review caught + fixed a CRITICAL manifest-Source path
traversal (re-derive target from current config + base containment).

Plan: plans/volume-snapshot-restore/
2026-06-22 17:23:52 +03:00


Phase 1: Restore engine primitives + path-safe extractor + unit tests

Status: Complete
Parent plan: PLAN.md
Domain: backend

Objective

Build the dangerous filesystem primitives in isolation, fully unit-tested, with NO docker/lifecycle wiring. Each is a pure function over directories + the store + a parsed manifest. No caller yet (exercised by tests so not "unused"). Zero behavior change to existing capture.

Tasks

  • internal/volsnap/extract.go: safeExtractIndex(archivePath string, index int, dest string, bombCap int64) (int64, error): open the gzip tar, extract only entries under the "<index>/" prefix into dest, return bytes written. UNTRUSTED-input guards (C6):
    • zip-slip: target := filepath.Join(dest, rel); require strings.HasPrefix(filepath.Clean(target)+sep, cleanDest+sep) (or target == cleanDest); reject otherwise.
    • allow ONLY tar.TypeReg + tar.TypeDir; reject symlink/hardlink/char/block/fifo/socket with an error (never follow).
    • decompression-bomb cap: running byte counter; abort when it would exceed bombCap.
    • create parent dirs as needed; files 0o600, dirs 0o700 (data dirs; ownership is the container's concern).
    • skip manifest.json and any entry whose leading path segment ≠ index.
  • internal/volsnap/restore.go (primitives only — NO orchestration):
    • archiveUncompressedSize(archivePath string, bombCap int64) (int64, error) — header-only sizing pass summing hdr.Size, enforcing bombCap (feeds C5). Per-index sizes too (map[int]int64) so C5 can check per filesystem.
    • parseManifest(snap store.VolumeSnapshot) ([]SnapshotVolume, error).
    • preflightResolve(w store.Workload, settings store.Settings, manifest []SnapshotVolume) ([]resolvedVol, error) — ALL-OR-NOTHING (C3): for every manifest volume require supportedScopes[scope] AND volume.ResolveWorkloadPath succeeds; on first failure return an error naming target+scope+reason (abort signal). resolvedVol{Index, Target, Scope, LivePath}. Reuses the SAME supportedScopes map.
    • swap helpers (C2 + R2 + R3): staging is a sibling of the live dir (it sits inside the live dir's parent), so renames stay on the same filesystem. stagingRoot(live string) string = filepath.Join(filepath.Dir(live), ".tf-restore-"+token). swapVolumeDir(live, tmp, old string) error = rename(live→old) then rename(tmp→live); detect EXDEV/cross-device and return a clear error WITHOUT having moved anything irreversibly (check device equivalence up-front or treat the rename error as fatal/rollback). rollbackSwaps(done []swap) error = for each completed swap in reverse, rename(live→discard), rename(old→live).
    • freeDiskBytes(path string) (uint64, error) — platform helper. Build-tag split mirroring the repo's lockfile_windows.go/lockfile_unix.go precedent: disk_unix.go (//go:build !windows, syscall.Statfs) + disk_windows.go (golang.org/x/sys/windows.GetDiskFreeSpaceEx). Production target is Linux.
  • Constants: maxRestoreUncompressedBytes (decompression-bomb cap) + diskFreeSafetyMargin (implemented as diskFreeHeadroomBytes, see Handoff), both named consts with rationale comments.
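
The C6 path and type guards listed above can be sketched roughly as follows. This is a minimal illustration, not the code in extract.go: checkEntry is a hypothetical helper name, and the explicit absolute-path rejection is an assumption added here because filepath.Join would otherwise fold an absolute entry name back under dest. The prefix comparison also assumes dest is not the filesystem root.

```go
package main

import (
	"archive/tar"
	"fmt"
	"path/filepath"
	"strings"
)

// withinDir reports whether target, once cleaned, is dest itself or a path
// below it. Appending the separator to both sides stops "/x/stage-evil"
// from matching "/x/stage". Assumes dest is not the filesystem root.
func withinDir(dest, target string) bool {
	sep := string(filepath.Separator)
	cleanDest := filepath.Clean(dest)
	cleanTarget := filepath.Clean(target)
	return cleanTarget == cleanDest || strings.HasPrefix(cleanTarget+sep, cleanDest+sep)
}

// checkEntry (hypothetical helper) applies the type allow-list and the
// containment check to one tar entry and returns the path it may be
// written to. It never touches the filesystem.
func checkEntry(dest, name string, typeflag byte) (string, error) {
	switch typeflag {
	case tar.TypeReg, tar.TypeDir:
		// Only regular files and directories are allowed.
	default:
		return "", fmt.Errorf("entry %q: type %q not allowed", name, typeflag)
	}
	if strings.HasPrefix(name, "/") || filepath.IsAbs(name) {
		return "", fmt.Errorf("entry %q: absolute path", name)
	}
	target := filepath.Join(dest, filepath.FromSlash(name))
	if !withinDir(dest, target) {
		return "", fmt.Errorf("entry %q: escapes %s", name, dest)
	}
	return target, nil
}

func main() {
	dest := filepath.Join("tmp", "stage")
	cases := []struct {
		name string
		typ  byte
	}{
		{"data/file.txt", tar.TypeReg},
		{"../escape.txt", tar.TypeReg},
		{"/etc/passwd", tar.TypeReg},
		{"data/link", tar.TypeSymlink},
	}
	for _, c := range cases {
		target, err := checkEntry(dest, c.name, c.typ)
		fmt.Printf("%q target=%q err=%v\n", c.name, target, err)
	}
}
```

Keeping the check free of filesystem access makes each rejection case trivial to unit-test before any byte is written.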

Files to Modify/Create

  • internal/volsnap/extract.go — untrusted extractor (new)
  • internal/volsnap/restore.go — primitives: sizing, manifest parse, preflight, swap/rollback, free-disk (new)
  • internal/volsnap/disk_unix.go, internal/volsnap/disk_windows.go — free-disk platform split (new)
  • internal/volsnap/extract_test.go, internal/volsnap/restore_test.go — unit tests (new)
  • go.mod: golang.org/x/sys promoted indirect→direct (already present v0.33.0)

Acceptance Criteria

  • Zip-slip (../, absolute, ..\\ on win), symlink, hardlink, device, fifo entries all rejected by safeExtractIndex.
  • Decompression-bomb cap aborts extraction + sizing past the cap.
  • Happy-path extract round-trip restores file tree + contents byte-for-byte under dest.
  • swapVolumeDir + rollbackSwaps: full and PARTIAL-swap rollback leave the original live dirs byte-identical.
  • preflightResolve is all-or-nothing: one unresolvable/unsupported-scope volume → error, and the caller renames nothing.
  • archiveUncompressedSize matches the real extracted total.
  • go test ./internal/volsnap/..., go build ./..., go vet ./internal/... all green.
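
For the sizing and bomb-cap criteria, a header-only pass might look like the sketch below. It is an illustration under assumptions: sizeByIndex and buildArchive are hypothetical names, the function takes the archive as bytes rather than a path so the example stays self-contained, and entries whose leading segment is not a non-negative integer (such as manifest.json) are simply skipped.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
	"strconv"
	"strings"
)

var errBombCap = errors.New("uncompressed size exceeds cap")

// sizeByIndex sums hdr.Size of regular files per leading "<index>/" segment.
// It never reads file bodies into memory or writes to disk; tar.Reader.Next
// discards them (the gzip stream is still decompressed while skipping).
func sizeByIndex(archive []byte, bombCap int64) (map[int]int64, int64, error) {
	gz, err := gzip.NewReader(bytes.NewReader(archive))
	if err != nil {
		return nil, 0, fmt.Errorf("open gzip: %w", err)
	}
	defer gz.Close()
	tr := tar.NewReader(gz)
	perIndex := make(map[int]int64)
	var total int64
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return perIndex, total, nil
		}
		if err != nil {
			return nil, 0, fmt.Errorf("read header: %w", err)
		}
		if hdr.Typeflag != tar.TypeReg {
			continue
		}
		seg, _, _ := strings.Cut(hdr.Name, "/")
		idx, err := strconv.Atoi(seg)
		if err != nil || idx < 0 {
			continue // not a volume subtree, e.g. manifest.json
		}
		// Compare against the remaining budget so the sum cannot overflow.
		if hdr.Size < 0 || hdr.Size > bombCap-total {
			return nil, 0, fmt.Errorf("entry %q: %w", hdr.Name, errBombCap)
		}
		total += hdr.Size
		perIndex[idx] += hdr.Size
	}
}

// buildArchive (demo helper) packs name→content pairs into a gzip tar.
func buildArchive(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for name, body := range files {
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o600, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := buildArchive(map[string]string{
		"manifest.json": "{}", "0/a.txt": "hello", "0/b.txt": "xy", "1/c.bin": "12345678",
	})
	if err != nil {
		panic(err)
	}
	perIndex, total, err := sizeByIndex(data, 1<<20)
	fmt.Println(perIndex, total, err)
	_, _, err = sizeByIndex(data, 10)
	fmt.Println("with a 10-byte cap:", err)
}
```

Because only file payload sizes are summed, the result undercounts real disk usage (directories and inode overhead are ignored), which is why it can only serve as a lower bound.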

Notes

  • Open the archive once per pass. On Unix an open fd survives a concurrent unlink, which defends against a racing snapshot Delete; Windows instead refuses to delete an open file. Either behaviour is acceptable.
  • safeExtractIndex writes into a caller-provided dest (the staging tmp), never directly onto the live path — the swap is a separate step (C2).

Review Checklist

  • All tasks completed
  • Code follows project conventions (gofmt, wrapped errors, small funcs)
  • No unintended side effects (no change to Create/List/Delete)
  • Build passes
  • Tests pass (new + existing)

Handoff to Next Phase

Implemented files: extract.go (safeExtractIndex, stripIndexPrefix, leadingIndex, withinDir), restore.go (parseManifest, preflightResolve, archiveUncompressedSize, swap/swapVolumeDir/rollbackSwaps/stagingDirs, consts maxRestoreUncompressedBytes = 50 GiB, diskFreeHeadroomBytes = 256 MiB), disk_unix.go/disk_windows.go (freeDiskBytes). Tests in extract_test.go + restore_test.go. go.mod: x/sys → direct.

API contract for Phase 2 (Engine.Restore):

  • safeExtractIndex(archivePath, index, dest, bombCap) — extracts ONE volume's subtree into a FRESH dest (uses O_EXCL); returns bytes written. Call once per resolved volume into its tmp staging dir.
  • preflightResolve(w, settings, manifest) → []resolvedVol{Index,Target,Scope,LivePath}, ALL-OR-NOTHING; already rejects unsupported scopes AND negative indices. Run BEFORE Lock/StopContainers.
  • stagingDirs(live, token, index) → (tmp, old), both placed under filepath.Dir(live) as siblings of live (same-fs ⇒ atomic rename). Use a per-restore token.
  • swapVolumeDir(live, tmp, old) → (hadOld, err); self-reverts the first rename on failure (live never left missing). Collect each completed swap into []swap{live,old,tmp,hadOld} and call rollbackSwaps(done) on any later failure.
  • archiveUncompressedSize(archivePath, bombCap) → (perIndex map[int]int64, total, err) for the C5 per-filesystem free-disk check. NOTE: it is a LOWER BOUND (ignores dir/inode overhead), so treat it as advisory; the staged-extract+swap is the real safety net.
  • freeDiskBytes(path) — pass the live dir's PARENT (where tmp/old land).
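
The swap and rollback contract above could be exercised along these lines. This is a sketch, not the restore.go implementation: the function bodies are reconstructed from the contract, using the vacated tmp path as the "discard" location during rollback is an assumption, demo is a hypothetical driver, and errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// swap records one completed swap so it can be undone later.
type swap struct {
	live, old, tmp string
	hadOld         bool
}

// swapVolumeDir moves live aside to old, then moves tmp into place. If the
// second rename fails it puts the original back, so live is never missing.
func swapVolumeDir(live, tmp, old string) (bool, error) {
	hadOld := true
	if _, err := os.Lstat(live); errors.Is(err, fs.ErrNotExist) {
		hadOld = false // nothing to preserve
	} else if err != nil {
		return false, fmt.Errorf("stat live: %w", err)
	}
	if hadOld {
		if err := os.Rename(live, old); err != nil {
			return false, fmt.Errorf("move live aside: %w", err)
		}
	}
	if err := os.Rename(tmp, live); err != nil {
		if hadOld {
			if rerr := os.Rename(old, live); rerr != nil {
				return false, fmt.Errorf("swap failed (%v); revert failed: %w", err, rerr)
			}
		}
		return false, fmt.Errorf("move staged dir into place: %w", err)
	}
	return hadOld, nil
}

// rollbackSwaps undoes completed swaps in reverse order. The restored dir is
// parked back at tmp (assumed free after a successful swap) before the
// original is renamed into place.
func rollbackSwaps(done []swap) error {
	var errs []error
	for i := len(done) - 1; i >= 0; i-- {
		s := done[i]
		if err := os.Rename(s.live, s.tmp); err != nil {
			errs = append(errs, fmt.Errorf("discard %s: %w", s.live, err))
			continue
		}
		if s.hadOld {
			if err := os.Rename(s.old, s.live); err != nil {
				errs = append(errs, fmt.Errorf("restore %s: %w", s.live, err))
			}
		}
	}
	return errors.Join(errs...)
}

// demo swaps a "v2" staging dir over a "v1" live dir, then rolls back, and
// returns the live file content observed after each step.
func demo() (afterSwap, afterRollback string, err error) {
	root, err := os.MkdirTemp("", "swap-demo-")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)
	live := filepath.Join(root, "data")
	tmp := filepath.Join(root, ".tf-restore-demo-tmp")
	old := filepath.Join(root, ".tf-restore-demo-old")
	for dir, body := range map[string]string{live: "v1", tmp: "v2"} {
		if err := os.MkdirAll(dir, 0o700); err != nil {
			return "", "", err
		}
		if err := os.WriteFile(filepath.Join(dir, "f"), []byte(body), 0o600); err != nil {
			return "", "", err
		}
	}
	hadOld, err := swapVolumeDir(live, tmp, old)
	if err != nil {
		return "", "", err
	}
	a, err := os.ReadFile(filepath.Join(live, "f"))
	if err != nil {
		return "", "", err
	}
	done := []swap{{live: live, old: old, tmp: tmp, hadOld: hadOld}}
	if err := rollbackSwaps(done); err != nil {
		return "", "", err
	}
	b, err := os.ReadFile(filepath.Join(live, "f"))
	if err != nil {
		return "", "", err
	}
	return string(a), string(b), nil
}

func main() {
	a, b, err := demo()
	fmt.Println(a, b, err)
}
```

Both renames stay inside one parent directory, which is what keeps them on a single filesystem and lets each one be atomic.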

Phase 2 must: extract ALL tmp dirs first, THEN swap all (shrinks the destructive window); validate each manifest index maps to an existing archive subtree (W2 — only the negative check is done so far); the disk pre-check should sum per-target-filesystem.
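
One way Phase 2 might sum the disk pre-check per target filesystem is sketched below. Everything here is hypothetical: the need and fsProbe types, checkFreeDisk, and fakeProbe are illustrative names, and a real probe would presumably combine freeDiskBytes with some filesystem identifier (for example a device ID), which this plan does not specify.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// need is the estimated byte count one volume will occupy after restore.
type need struct {
	LivePath string
	Bytes    int64
}

// fsProbe returns an identifier for the filesystem holding dir plus its free
// bytes. Injected so the grouping logic can be tested without real disks.
type fsProbe func(dir string) (fsID string, free uint64, err error)

// checkFreeDisk groups requirements by filesystem and fails if any
// filesystem cannot hold its summed requirement plus headroom.
func checkFreeDisk(needs []need, probe fsProbe, headroom uint64) error {
	required := make(map[string]uint64)
	free := make(map[string]uint64)
	for _, n := range needs {
		if n.Bytes < 0 {
			return fmt.Errorf("%s: negative size estimate", n.LivePath)
		}
		// tmp/old staging dirs land in the live dir's parent.
		parent := filepath.Dir(n.LivePath)
		id, avail, err := probe(parent)
		if err != nil {
			return fmt.Errorf("probe %s: %w", parent, err)
		}
		required[id] += uint64(n.Bytes)
		free[id] = avail
	}
	for id, want := range required {
		if free[id] < headroom || free[id]-headroom < want {
			return fmt.Errorf("filesystem %s: need %d bytes plus %d headroom, have %d",
				id, want, headroom, free[id])
		}
	}
	return nil
}

// fakeProbe pretends /srv/a and everything else are two 100-byte filesystems.
func fakeProbe(dir string) (string, uint64, error) {
	if strings.HasPrefix(filepath.ToSlash(dir), "/srv/a") {
		return "fsA", 100, nil
	}
	return "fsB", 100, nil
}

func main() {
	sameFS := []need{{"/srv/a/db", 60}, {"/srv/a/uploads", 60}}
	splitFS := []need{{"/srv/a/db", 60}, {"/srv/b/uploads", 60}}
	fmt.Println("same filesystem:", checkFreeDisk(sameFS, fakeProbe, 10))
	fmt.Println("split filesystems:", checkFreeDisk(splitFS, fakeProbe, 10))
}
```

The first case shows why a per-volume check is not enough: each 60-byte volume fits on its own, but together they exceed what the shared filesystem can hold.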

Review (go-reviewer, APPROVE WITH NOTES): no blockers. Addressed in-phase: W2 (negative index reject), W3 (explicit second-rename self-revert test), W4 (stagingDirs test), N1/N2/N4 (comments + sparse-type rejection test). W1 (disk estimate is lower-bound) folded into Phase 2 guidance above.