tiny-forge/plans/volume-snapshot-restore/phase-1-engine-primitives.md
alexei.dolgolyov 1c47030854 feat(volsnap): volume snapshot restore (backlog #6)
Restore a captured volume snapshot onto an image workload's live host-bind
data volumes, then redeploy — the most destructive workload action, built to
the adversarially-reviewed design (C1–C6) with all data-loss guards.

- Engine.Restore (engine-owned): all-or-nothing pre-flight re-resolution from
  the workload's CURRENT config (never the tamperable manifest), per-filesystem
  disk pre-check, per-workload lock, container quiesce, extract-to-tmp, durable
  pre-restore snapshot, write-ahead journal, atomic rename swap, redeploy, and
  crash-recovery sweep (RecoverInterruptedRestores) wired before serving.
- internal/keyedmutex: shared per-key lock; deployer now serializes every
  deploy entrypoint per workload via DispatchPlugin (+ LockWorkload/RedeployLocked
  for the restore re-dispatch, no deadlock).
- Untrusted-archive extractor: zip-slip containment, type allow-list (reg/dir
  only), decompression-bomb cap, manifest-index bounds.
- POST /api/workloads/{id}/snapshots/{sid}/restore: admin, X-Confirm-Restore
  header (CSRF), per-workload single-flight (409).
- WebUI: Restore button + danger ConfirmDialog + busy state + i18n (en/ru).

Scope: image-source only; scopes absolute/stage/project (driven off the same
supportedScopes constant capture uses).

Plan-reviewed before coding; per-phase go/security/ts reviews; final review
READY TO MERGE. Security review caught + fixed a CRITICAL manifest-Source path
traversal (re-derive target from current config + base containment).

Plan: plans/volume-snapshot-restore/
2026-06-22 17:23:52 +03:00


# Phase 1: Restore engine primitives + path-safe extractor + unit tests
**Status:** ✅ Complete
**Parent plan:** [PLAN.md](./PLAN.md)
**Domain:** backend
## Objective
Build the dangerous filesystem primitives in isolation, fully unit-tested, with NO
docker/lifecycle wiring. Each is a pure function over directories + the store + a parsed
manifest. No caller yet (exercised by tests so not "unused"). Zero behavior change to
existing capture.
## Tasks
- [ ] **`internal/volsnap/extract.go`** — `safeExtractIndex(archivePath string, index int, dest string, bombCap int64) (int64, error)`:
  open the gzip tar, extract only entries under the `"<index>/"` prefix into `dest`, return
  bytes written. UNTRUSTED-input guards (C6):
  - zip-slip: `target := filepath.Join(dest, rel)`; require `strings.HasPrefix(filepath.Clean(target)+sep, cleanDest+sep)` (or `target == cleanDest`); reject otherwise.
  - allow ONLY `tar.TypeReg` + `tar.TypeDir`; reject symlink/hardlink/char/block/fifo/socket with an error (never follow).
  - decompression-bomb cap: running byte counter; abort when it would exceed `bombCap`.
  - create parent dirs as needed; files `0o600`, dirs `0o700` (data dirs; ownership is the container's concern).
  - skip `manifest.json` and any entry whose leading path segment ≠ `index`.
- [ ] **`internal/volsnap/restore.go`** (primitives only — NO orchestration):
  - `archiveUncompressedSize(archivePath string, bombCap int64) (int64, error)`: header-only sizing pass summing `hdr.Size`, enforcing `bombCap` (feeds C5). Also reports per-index sizes (`map[int]int64`) so C5 can check per filesystem.
  - `parseManifest(snap store.VolumeSnapshot) ([]SnapshotVolume, error)`.
  - `preflightResolve(w store.Workload, settings store.Settings, manifest []SnapshotVolume) ([]resolvedVol, error)`: ALL-OR-NOTHING (C3). For every manifest volume, require `supportedScopes[scope]` AND that `volume.ResolveWorkloadPath` succeeds; on the first failure return an error naming target+scope+reason (abort signal). `resolvedVol{Index, Target, Scope, LivePath}`. Reuses the SAME `supportedScopes` map.
  - swap helpers (C2 + R2 + R3): staging is a **sibling of the live dir** (created inside the live dir's parent) so renames stay on the same filesystem.
    - `stagingRoot(live string) string` = `filepath.Join(filepath.Dir(live), ".tf-restore-"+token)`.
    - `swapVolumeDir(live, tmp, old string) error` = rename(live→old) then rename(tmp→live); detect `EXDEV`/cross-device and return a clear error WITHOUT having moved anything irreversibly (check device equivalence up-front or treat the rename error as fatal/rollback).
    - `rollbackSwaps(done []swap) error` = for each completed swap in reverse, rename(live→discard), rename(old→live).
  - `freeDiskBytes(path string) (uint64, error)`: platform helper. Build-tag split mirroring the repo's `lockfile_windows.go`/`lockfile_unix.go` precedent: `disk_unix.go` (`//go:build !windows`, `syscall.Statfs`) + `disk_windows.go` (`golang.org/x/sys/windows.GetDiskFreeSpaceEx`). Production target is Linux.
- [ ] **Constants:** `maxRestoreUncompressedBytes` (decompression-bomb cap) + `diskFreeSafetyMargin` named consts with rationale comments.
## Files to Modify/Create
- `internal/volsnap/extract.go` — untrusted extractor (new)
- `internal/volsnap/restore.go` — primitives: sizing, manifest parse, preflight, swap/rollback, free-disk (new)
- `internal/volsnap/disk_unix.go`, `internal/volsnap/disk_windows.go` — free-disk platform split (new)
- `internal/volsnap/extract_test.go`, `internal/volsnap/restore_test.go` — unit tests (new)
- `go.mod`: `golang.org/x/sys` promoted indirect→direct (already present at v0.33.0)
## Acceptance Criteria
- Zip-slip (`../`, absolute, `..\\` on win), symlink, hardlink, device, fifo entries all rejected by `safeExtractIndex`.
- Decompression-bomb cap aborts extraction + sizing past the cap.
- Happy-path extract round-trip restores file tree + contents byte-for-byte under `dest`.
- `swapVolumeDir` + `rollbackSwaps`: full and PARTIAL-swap rollback leave the original live dirs byte-identical.
- `preflightResolve` is all-or-nothing: one unresolvable/unsupported-scope volume → error, and the caller renames nothing.
- `archiveUncompressedSize` matches the real extracted total.
- `go test ./internal/volsnap/...`, `go build ./...`, `go vet ./internal/...` all green.
## Notes
- Open the archive once per pass; on Unix an open fd survives a concurrent `Delete` unlink (defence against a racing snapshot delete); Windows refuses delete of an open file. Acceptable.
- `safeExtractIndex` writes into a caller-provided `dest` (the staging `tmp`), never directly onto the live path — the swap is a separate step (C2).
## Review Checklist
- [ ] All tasks completed
- [ ] Code follows project conventions (gofmt, wrapped errors, small funcs)
- [ ] No unintended side effects (no change to Create/List/Delete)
- [ ] Build passes
- [ ] Tests pass (new + existing)
## Handoff to Next Phase
Implemented files: `extract.go` (`safeExtractIndex`, `stripIndexPrefix`, `leadingIndex`,
`withinDir`), `restore.go` (`parseManifest`, `preflightResolve`, `archiveUncompressedSize`,
`swap`/`swapVolumeDir`/`rollbackSwaps`/`stagingDirs`, consts `maxRestoreUncompressedBytes`
= 50 GiB, `diskFreeHeadroomBytes` = 256 MiB), `disk_unix.go`/`disk_windows.go`
(`freeDiskBytes`). Tests in `extract_test.go` + `restore_test.go`. `go.mod`: `x/sys`
direct.
**API contract for Phase 2 (Engine.Restore):**
- `safeExtractIndex(archivePath, index, dest, bombCap)` — extracts ONE volume's subtree into
a FRESH `dest` (uses `O_EXCL`); returns bytes written. Call once per resolved volume into
its `tmp` staging dir.
- `preflightResolve(w, settings, manifest)` → `[]resolvedVol{Index,Target,Scope,LivePath}`,
ALL-OR-NOTHING; already rejects unsupported scopes AND negative indices. Run BEFORE
Lock/StopContainers.
- `stagingDirs(live, token, index)` → `(tmp, old)`, siblings of `live` inside `filepath.Dir(live)` (same-fs
⇒ atomic rename). Use a per-restore `token`.
- `swapVolumeDir(live, tmp, old)` → `(hadOld, err)`; self-reverts the first rename on failure
(live never left missing). Collect each completed swap into `[]swap{live,old,tmp,hadOld}`
and call `rollbackSwaps(done)` on any later failure.
- `archiveUncompressedSize(archivePath, bombCap)` → `(perIndex map[int]int64, total, err)`
for the C5 per-filesystem free-disk check. NOTE: it's a LOWER-BOUND (ignores dir/inode
overhead) — treat as advisory; the staged-extract+swap is the real net.
- `freeDiskBytes(path)` — pass the live dir's PARENT (where tmp/old land).
**Phase 2 must:** extract ALL tmp dirs first, THEN swap all (shrinks the destructive
window); validate each manifest index maps to an existing archive subtree (W2 — only the
negative check is done so far); the disk pre-check should sum per-target-filesystem.
**Review (go-reviewer, APPROVE WITH NOTES):** no blockers. Addressed in-phase: W2 (negative
index reject), W3 (explicit second-rename self-revert test), W4 (stagingDirs test), N1/N2/N4
(comments + sparse-type rejection test). W1 (disk estimate is lower-bound) folded into Phase
2 guidance above.