1c47030854
Restore a captured volume snapshot onto an image workload's live host-bind
data volumes, then redeploy — the most destructive workload action, built to
the adversarially-reviewed design (C1–C6) with all data-loss guards.
- Engine.Restore (engine-owned): all-or-nothing pre-flight re-resolution from
the workload's CURRENT config (never the tamperable manifest), per-filesystem
disk pre-check, per-workload lock, container quiesce, extract-to-tmp, durable
pre-restore snapshot, write-ahead journal, atomic rename swap, redeploy, and
crash-recovery sweep (RecoverInterruptedRestores) wired before serving.
- internal/keyedmutex: shared per-key lock; deployer now serializes every
deploy entrypoint per workload via DispatchPlugin (+ LockWorkload/RedeployLocked
for the restore re-dispatch, no deadlock).
- Untrusted-archive extractor: zip-slip containment, type allow-list (reg/dir
only), decompression-bomb cap, manifest-index bounds.
- POST /api/workloads/{id}/snapshots/{sid}/restore: admin, X-Confirm-Restore
header (CSRF), per-workload single-flight (409).
- WebUI: Restore button + danger ConfirmDialog + busy state + i18n (en/ru).
Scope: image-source only; scopes absolute/stage/project (driven off the same
supportedScopes constant capture uses).
Plan-reviewed before coding; per-phase go/security/ts reviews; final review
READY TO MERGE. Security review caught + fixed a CRITICAL manifest-Source path
traversal (re-derive target from current config + base containment).
Plan: plans/volume-snapshot-restore/
97 lines
7.0 KiB
Markdown
# Phase 1: Restore engine primitives + path-safe extractor + unit tests

**Status:** ✅ Complete
**Parent plan:** [PLAN.md](./PLAN.md)
**Domain:** backend

## Objective

Build the dangerous filesystem primitives in isolation, fully unit-tested, with NO
docker/lifecycle wiring. Each is a pure function over directories + the store + a parsed
manifest. No caller yet (exercised by tests so not "unused"). Zero behavior change to
existing capture.

## Tasks

- [ ] **`internal/volsnap/extract.go`** — `safeExtractIndex(archivePath string, index int, dest string, bombCap int64) (int64, error)`:
  open the gzip tar, extract only entries under the `"<index>/"` prefix into `dest`, return
  bytes written. UNTRUSTED-input guards (C6):
  - zip-slip: `target := filepath.Join(dest, rel)`; require `strings.HasPrefix(filepath.Clean(target)+sep, cleanDest+sep)` (or `target == cleanDest`); reject otherwise.
  - allow ONLY `tar.TypeReg` + `tar.TypeDir`; reject symlink/hardlink/char/block/fifo/socket with an error (never follow).
  - decompression-bomb cap: running byte counter; abort when it would exceed `bombCap`.
  - create parent dirs as needed; files `0o600`, dirs `0o700` (data dirs; ownership is the container's concern).
  - skip `manifest.json` and any entry whose leading path segment ≠ `index`.
- [ ] **`internal/volsnap/restore.go`** (primitives only — NO orchestration):
  - `archiveUncompressedSize(archivePath string, bombCap int64) (int64, error)` — header-only sizing pass summing `hdr.Size`, enforcing `bombCap` (feeds C5). Per-index sizes too (`map[int]int64`) so C5 can check per filesystem.
  - `parseManifest(snap store.VolumeSnapshot) ([]SnapshotVolume, error)`.
  - `preflightResolve(w store.Workload, settings store.Settings, manifest []SnapshotVolume) ([]resolvedVol, error)` — ALL-OR-NOTHING (C3): for every manifest volume require `supportedScopes[scope]` AND `volume.ResolveWorkloadPath` succeeds; on first failure return an error naming target+scope+reason (abort signal). `resolvedVol{Index, Target, Scope, LivePath}`. Reuses the SAME `supportedScopes` map.
  - swap helpers (C2 + R2 + R3): staging is a **sibling of the live dir** (created under the live dir's parent) so renames are same-filesystem. `stagingRoot(live string) string` = `filepath.Join(filepath.Dir(live), ".tf-restore-"+token)`. `swapVolumeDir(live, tmp, old string) error` = rename(live→old) then rename(tmp→live); detect `EXDEV`/cross-device and return a clear error WITHOUT having moved anything irreversibly (check device equivalence up-front or treat the rename error as fatal/rollback). `rollbackSwaps(done []swap) error` = for each completed swap in reverse, rename(live→discard), rename(old→live).
  - `freeDiskBytes(path string) (uint64, error)` — platform helper. Build-tag split mirroring the repo's `lockfile_windows.go`/`lockfile_unix.go` precedent: `disk_unix.go` (`//go:build !windows`, `syscall.Statfs`) + `disk_windows.go` (`golang.org/x/sys/windows.GetDiskFreeSpaceEx`). Production target is Linux.
- [ ] **Constants:** `maxRestoreUncompressedBytes` (decompression-bomb cap) + `diskFreeSafetyMargin` named consts with rationale comments.
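
The zip-slip and type allow-list guards listed for `safeExtractIndex` can be sketched as below. This is a minimal illustration, not the repository's code: `checkEntry` is a hypothetical helper, its signature is an assumption, and `withinDir` only borrows the name of the helper mentioned in the handoff notes. It covers the containment test, the absolute-path rejection, and the `tar.TypeReg`/`tar.TypeDir` allow-list; the bomb cap and file creation are left out.

```go
package main

import (
	"archive/tar"
	"fmt"
	"path/filepath"
	"strings"
)

// withinDir reports whether target, once cleaned, is dest itself or lies below it.
func withinDir(dest, target string) bool {
	cleanDest := filepath.Clean(dest)
	cleanTarget := filepath.Clean(target)
	sep := string(filepath.Separator)
	// Appending the separator stops "/stage-evil" from matching the prefix "/stage".
	return strings.HasPrefix(cleanTarget+sep, strings.TrimSuffix(cleanDest, sep)+sep)
}

// checkEntry (hypothetical) validates one tar entry and returns the path to write.
// rel is the entry path with the "<index>/" prefix already stripped.
func checkEntry(dest, rel string, typeflag byte) (string, error) {
	switch typeflag {
	case tar.TypeReg, tar.TypeDir:
		// Only regular files and directories are extracted.
	default:
		return "", fmt.Errorf("entry %q: type %q is not allowed", rel, typeflag)
	}
	local := filepath.FromSlash(rel)
	if filepath.IsAbs(local) || strings.HasPrefix(rel, "/") {
		return "", fmt.Errorf("entry %q: absolute path", rel)
	}
	target := filepath.Join(dest, local)
	if !withinDir(dest, target) {
		return "", fmt.Errorf("entry %q: escapes %s", rel, dest)
	}
	return target, nil
}

func main() {
	dest := filepath.FromSlash("/data/.tf-restore-demo/0")
	entries := []struct {
		rel      string
		typeflag byte
	}{
		{"db/data.bin", tar.TypeReg},
		{"../../etc/passwd", tar.TypeReg},
		{"/etc/passwd", tar.TypeReg},
		{"link", tar.TypeSymlink},
	}
	for _, e := range entries {
		target, err := checkEntry(dest, e.rel, e.typeflag)
		if err != nil {
			fmt.Println("reject:", err)
			continue
		}
		fmt.Println("write: ", target)
	}
}
```

Rejecting absolute names explicitly matters because `filepath.Join` would otherwise fold `/etc/passwd` under `dest` and let it pass the containment test silently. Windows-style `..\` names are not handled by this sketch.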

## Files to Modify/Create

- `internal/volsnap/extract.go` — untrusted extractor (new)
- `internal/volsnap/restore.go` — primitives: sizing, manifest parse, preflight, swap/rollback, free-disk (new)
- `internal/volsnap/disk_unix.go`, `internal/volsnap/disk_windows.go` — free-disk platform split (new)
- `internal/volsnap/extract_test.go`, `internal/volsnap/restore_test.go` — unit tests (new)
- `go.mod` — `golang.org/x/sys` promoted indirect→direct (already present v0.33.0)

## Acceptance Criteria

- Zip-slip (`../`, absolute, `..\\` on win), symlink, hardlink, device, fifo entries all rejected by `safeExtractIndex`.
- Decompression-bomb cap aborts extraction + sizing past the cap.
- Happy-path extract round-trip restores file tree + contents byte-for-byte under `dest`.
- `swapVolumeDir` + `rollbackSwaps`: full and PARTIAL-swap rollback leave the original live dirs byte-identical.
- `preflightResolve` is all-or-nothing: one unresolvable/unsupported-scope volume → error, and the caller renames nothing.
- `archiveUncompressedSize` matches the real extracted total.
- `go test ./internal/volsnap/...`, `go build ./...`, `go vet ./internal/...` all green.
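
The swap and rollback criterion can be illustrated with a small sketch of the rename sequence. It is written under assumptions rather than taken from the implementation: `hadOld` is read as "a live dir existed and was parked at `old`", the now-free `tmp` path doubles as the discard location during rollback, and `demo` is a hypothetical driver that performs one swap and then undoes it.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// swap records one completed directory swap so it can be undone later.
type swap struct {
	live, old, tmp string
	hadOld         bool
}

// swapVolumeDir parks live at old, then promotes tmp to live.
func swapVolumeDir(live, tmp, old string) (bool, error) {
	hadOld := true
	if err := os.Rename(live, old); err != nil {
		if !errors.Is(err, fs.ErrNotExist) {
			return false, fmt.Errorf("park %s: %w", live, err)
		}
		hadOld = false // no live dir yet, nothing to park
	}
	if err := os.Rename(tmp, live); err != nil {
		if hadOld {
			// Self-revert so the live path is never left missing.
			if rerr := os.Rename(old, live); rerr != nil {
				return false, errors.Join(err, rerr)
			}
		}
		return false, fmt.Errorf("promote %s: %w", tmp, err)
	}
	return hadOld, nil
}

// rollbackSwaps undoes completed swaps in reverse order.
func rollbackSwaps(done []swap) error {
	var errs []error
	for i := len(done) - 1; i >= 0; i-- {
		s := done[i]
		// Assumption: the restored tree is discarded back to the free tmp path.
		if err := os.Rename(s.live, s.tmp); err != nil {
			errs = append(errs, err)
			continue
		}
		if s.hadOld {
			if err := os.Rename(s.old, s.live); err != nil {
				errs = append(errs, err)
			}
		}
	}
	return errors.Join(errs...)
}

// demo swaps a restored tree in, then rolls back, reporting the live content each time.
func demo() (afterSwap, afterRollback string, err error) {
	root, err := os.MkdirTemp("", "swap-demo-")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)
	live := filepath.Join(root, "data")
	tmp := filepath.Join(root, ".tf-restore-demo-tmp")
	old := filepath.Join(root, ".tf-restore-demo-old")
	for dir, content := range map[string]string{live: "original", tmp: "restored"} {
		if err := os.Mkdir(dir, 0o700); err != nil {
			return "", "", err
		}
		if err := os.WriteFile(filepath.Join(dir, "f"), []byte(content), 0o600); err != nil {
			return "", "", err
		}
	}
	read := func() string {
		b, _ := os.ReadFile(filepath.Join(live, "f"))
		return string(b)
	}
	hadOld, err := swapVolumeDir(live, tmp, old)
	if err != nil {
		return "", "", err
	}
	afterSwap = read()
	err = rollbackSwaps([]swap{{live: live, old: old, tmp: tmp, hadOld: hadOld}})
	return afterSwap, read(), err
}

func main() {
	fmt.Println(demo())
}
```

Because all three paths share one parent directory, every `os.Rename` here stays on a single filesystem, which is the property the staging layout is meant to guarantee.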

## Notes

- Open the archive once per pass; on Unix an open fd survives a concurrent `Delete` unlink (defence against a racing snapshot delete); Windows refuses delete of an open file. Acceptable.
- `safeExtractIndex` writes into a caller-provided `dest` (the staging `tmp`), never directly onto the live path — the swap is a separate step (C2).
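
The first note relies on Unix unlink semantics, which a short experiment makes concrete. This is a sketch assuming a Unix-like host; `readAfterUnlink` and the file name are hypothetical, and on Windows the `os.Remove` call is expected to fail instead, as the note says.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// readAfterUnlink opens a file, removes its directory entry, and then reads
// the content through the descriptor that is still open.
func readAfterUnlink() (string, error) {
	dir, err := os.MkdirTemp("", "unlink-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "snapshot.tar.gz")
	if err := os.WriteFile(path, []byte("archive bytes"), 0o600); err != nil {
		return "", err
	}
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	// Simulates a racing snapshot delete while the restore holds the file open.
	if err := os.Remove(path); err != nil {
		return "", err
	}
	if _, err := os.Stat(path); err == nil {
		return "", fmt.Errorf("%s is still visible after remove", path)
	}
	b, err := io.ReadAll(f)
	return string(b), err
}

func main() {
	fmt.Println(readAfterUnlink())
}
```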

## Review Checklist

- [ ] All tasks completed
- [ ] Code follows project conventions (gofmt, wrapped errors, small funcs)
- [ ] No unintended side effects (no change to Create/List/Delete)
- [ ] Build passes
- [ ] Tests pass (new + existing)

## Handoff to Next Phase

Implemented files: `extract.go` (`safeExtractIndex`, `stripIndexPrefix`, `leadingIndex`,
`withinDir`), `restore.go` (`parseManifest`, `preflightResolve`, `archiveUncompressedSize`,
`swap`/`swapVolumeDir`/`rollbackSwaps`/`stagingDirs`, consts `maxRestoreUncompressedBytes`
= 50 GiB, `diskFreeHeadroomBytes` = 256 MiB), `disk_unix.go`/`disk_windows.go`
(`freeDiskBytes`). Tests in `extract_test.go` + `restore_test.go`. `go.mod`: `x/sys` →
direct.

**API contract for Phase 2 (Engine.Restore):**

- `safeExtractIndex(archivePath, index, dest, bombCap)` — extracts ONE volume's subtree into
  a FRESH `dest` (uses `O_EXCL`); returns bytes written. Call once per resolved volume into
  its `tmp` staging dir.
- `preflightResolve(w, settings, manifest)` → `[]resolvedVol{Index,Target,Scope,LivePath}`,
  ALL-OR-NOTHING; already rejects unsupported scopes AND negative indices. Run BEFORE
  Lock/StopContainers.
- `stagingDirs(live, token, index)` → `(tmp, old)`, siblings of `live` under `filepath.Dir(live)`
  (same-fs ⇒ atomic rename). Use a per-restore `token`.
- `swapVolumeDir(live, tmp, old)` → `(hadOld, err)`; self-reverts the first rename on failure
  (live never left missing). Collect each completed swap into `[]swap{live,old,tmp,hadOld}`
  and call `rollbackSwaps(done)` on any later failure.
- `archiveUncompressedSize(archivePath, bombCap)` → `(perIndex map[int]int64, total, err)`
  for the C5 per-filesystem free-disk check. NOTE: it's a LOWER-BOUND (ignores dir/inode
  overhead) — treat as advisory; the staged-extract+swap is the real safety net.
- `freeDiskBytes(path)` — pass the live dir's PARENT (where tmp/old land).
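
The sizing contract above (per-index totals plus a capped grand total) can be sketched as a single pass over the tar headers. This is an illustration under assumptions, not the shipped function: `sizeArchive` takes an `io.Reader` instead of a path so the example can run in memory, `demoSizes` and the archive contents are made up, and only regular files under a numeric leading segment are counted.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// sizeArchive sums hdr.Size per leading index without writing any file body.
func sizeArchive(r io.Reader, bombCap int64) (map[int]int64, int64, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, 0, fmt.Errorf("open gzip stream: %w", err)
	}
	defer gz.Close()

	perIndex := map[int]int64{}
	var total int64
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next() // skips any unread body of the previous entry
		if errors.Is(err, io.EOF) {
			return perIndex, total, nil
		}
		if err != nil {
			return nil, 0, fmt.Errorf("read tar header: %w", err)
		}
		if hdr.Typeflag != tar.TypeReg {
			continue
		}
		first, _, nested := strings.Cut(hdr.Name, "/")
		idx, convErr := strconv.Atoi(first)
		if !nested || convErr != nil || idx < 0 {
			continue // e.g. manifest.json at the archive root
		}
		if hdr.Size < 0 || hdr.Size > bombCap-total {
			return nil, 0, fmt.Errorf("archive exceeds cap of %d bytes", bombCap)
		}
		total += hdr.Size
		perIndex[idx] += hdr.Size
	}
}

// demoSizes builds a tiny two-volume archive in memory and sizes it.
func demoSizes(bombCap int64) (map[int]int64, int64, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	files := []struct{ name, body string }{
		{"manifest.json", "{}"},
		{"0/a.txt", "hello"},
		{"0/sub/b.txt", "abc"},
		{"1/c.bin", "0123456789"},
	}
	for _, f := range files {
		hdr := &tar.Header{Name: f.name, Typeflag: tar.TypeReg, Mode: 0o600, Size: int64(len(f.body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, 0, err
		}
		if _, err := tw.Write([]byte(f.body)); err != nil {
			return nil, 0, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, 0, err
	}
	if err := gz.Close(); err != nil {
		return nil, 0, err
	}
	return sizeArchive(&buf, bombCap)
}

func main() {
	perIndex, total, err := demoSizes(1 << 20)
	fmt.Println(perIndex, total, err)
}
```

The pass still decompresses the stream in order to skip bodies, so it costs CPU but no disk writes. The sum counts file payload only, which is why the contract calls it a lower bound.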

**Phase 2 must:** extract ALL tmp dirs first, THEN swap all (shrinks the destructive
window); validate each manifest index maps to an existing archive subtree (W2 — only the
negative check is done so far); the disk pre-check should sum per-target-filesystem.
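
One possible shape for the per-target-filesystem pre-check is sketched below. Everything in it is hypothetical: `volNeed`, `checkDiskPerFS`, and the injected `fsID` and `free` probes are illustration names, and the demo uses fakes. In a real build the probes could plausibly be backed by a device-ID lookup and `freeDiskBytes`, but that wiring is an assumption.

```go
package main

import "fmt"

// volNeed (hypothetical) pairs a staging parent directory with the bytes the
// restore expects to write there.
type volNeed struct {
	Parent string
	Bytes  uint64
}

// checkDiskPerFS sums the required bytes per filesystem and compares each sum,
// plus headroom, with the free space reported for that filesystem.
func checkDiskPerFS(needs []volNeed, fsID, free func(string) (uint64, error), headroom uint64) error {
	required := map[uint64]uint64{}
	probe := map[uint64]string{} // one representative path per filesystem
	var order []uint64
	for _, n := range needs {
		id, err := fsID(n.Parent)
		if err != nil {
			return fmt.Errorf("identify filesystem of %s: %w", n.Parent, err)
		}
		if _, seen := probe[id]; !seen {
			probe[id] = n.Parent
			order = append(order, id)
		}
		required[id] += n.Bytes
	}
	for _, id := range order {
		avail, err := free(probe[id])
		if err != nil {
			return fmt.Errorf("free space of %s: %w", probe[id], err)
		}
		if need := required[id] + headroom; avail < need {
			return fmt.Errorf("filesystem of %s: need %d bytes, %d free", probe[id], need, avail)
		}
	}
	return nil
}

func main() {
	// Fake probes stand in for real device-ID and free-space lookups.
	ids := map[string]uint64{"/srv/app": 1, "/srv/db": 1, "/mnt/media": 2}
	fsID := func(p string) (uint64, error) { return ids[p], nil }
	free := func(string) (uint64, error) { return 1000, nil }
	needs := []volNeed{{"/srv/app", 600}, {"/srv/db", 600}, {"/mnt/media", 600}}
	fmt.Println(checkDiskPerFS(needs, fsID, free, 100))
}
```

In the demo each volume fits on its own, yet the two volumes sharing filesystem 1 together exceed its free space, which is the case a per-volume check would miss.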

**Review (go-reviewer, APPROVE WITH NOTES):** no blockers. Addressed in-phase: W2 (negative
index reject), W3 (explicit second-rename self-revert test), W4 (stagingDirs test), N1/N2/N4
(comments + sparse-type rejection test). W1 (disk estimate is lower-bound) folded into Phase
2 guidance above.