1c47030854
Restore a captured volume snapshot onto an image workload's live host-bind
data volumes, then redeploy. As the most destructive workload action, it is
built to the adversarially-reviewed design (C1–C6) with all data-loss guards.
- Engine.Restore (engine-owned): all-or-nothing pre-flight re-resolution from
the workload's CURRENT config (never the tamperable manifest), per-filesystem
disk pre-check, per-workload lock, container quiesce, extract-to-tmp, durable
pre-restore snapshot, write-ahead journal, atomic rename swap, redeploy, and
crash-recovery sweep (RecoverInterruptedRestores) wired before serving.
- internal/keyedmutex: shared per-key lock; deployer now serializes every
deploy entrypoint per workload via DispatchPlugin (+ LockWorkload/RedeployLocked
so the restore can re-dispatch while already holding the lock, without deadlock).
- Untrusted-archive extractor: zip-slip containment, type allow-list (reg/dir
only), decompression-bomb cap, manifest-index bounds.
- POST /api/workloads/{id}/snapshots/{sid}/restore: admin, X-Confirm-Restore
header (CSRF), per-workload single-flight (409).
- WebUI: Restore button + danger ConfirmDialog + busy state + i18n (en/ru).
Scope: image sources only; volume scopes absolute/stage/project (driven off the
same supportedScopes constant that capture uses).
Plan-reviewed before coding; per-phase go/security/ts reviews; final review
READY TO MERGE. Security review caught + fixed a CRITICAL manifest-Source path
traversal (re-derive target from current config + base containment).
Plan: plans/volume-snapshot-restore/
61 lines
3.2 KiB
Markdown
# CONTEXT — Volume Snapshot Restore
Working memory across phases. The orchestrator owns this file.
## Settings (from PLAN.md header)
- Mode: **Automated** · Execution: **Hybrid** (backend Direct, Phase 4 frontend implementer) · Strategy: **Incremental**
- Base: `main` · Branch: `feature/volume-snapshot-restore` · Remote: origin (Gitea)
- Build: `go build ./...` · Test: `go test ./internal/...` + `npm run test` · Lint: `go vet ./internal/...` + `npm run check`
## Key codebase facts (verified during planning)
- **Deploy choke point:** every deploy entrypoint calls `deployer.DispatchPlugin` →
  put the per-workload lock there (C1). Entrypoints: `deployPluginWorkload`,
  `rollbackWorkload`, `promoteFromWorkload`, `dispatchGeneric`, webhook
  `fireBinding`/`handlePreviewIntent`.
- **`activeWg`/`drainMu`** in `deployer.go` = global drain barrier, NOT a per-workload lock.
- **Image idempotency short-circuit** (`image.go` Deploy ~L170-181) only fires for a
  *verified-running* container → after stop, redeploy makes a fresh container; blue-green
  `enforceMaxInstances` reaps the old stopped one. ⇒ stop→swap→redeploy (C4) is correct.
- **Scope resolution** (`internal/volume/resolver.go`): stage/project → `<base>/<workload>/<source>`
  (shared per-workload dir); absolute → operator's allowed path. Stage the tmp/old dirs as siblings
  under the live dir's PARENT so the renames stay on the same filesystem (R2).
- **`volsnap.Engine`** has `e.mu` taken by Create/Delete/pruneWorkload/CleanOrphans.
  `Restore` must NOT hold `e.mu` (R1).
- **Archive layout:** gzip tar, each volume under integer subdir `0/`,`1/`…, `manifest.json`
  at root = `[]SnapshotVolume{Index,Target,Scope,Source}`. `supportedScopes` =
  absolute/stage/project (volumes.go).
- **Precedent:** `internal/api/backups.go` `restoreBackup`: X-Confirm-Restore==id,
  `restoreInFlight` CAS→409, pre-restore safety backup, atomic rename swap.
- **Composition root:** `cmd/server/main.go` constructs `deployer.New` + `volsnap.New` +
  `docker` + `store`; calls `CleanOrphans` at startup (wire `RecoverInterruptedRestores` there).
- **Frontend:** `WorkloadSnapshotsPanel.svelte`; api fns `web/src/lib/api.ts` ~L581;
  i18n `apps.detail.snapshots.*` in en.json + ru.json.
- `golang.org/x/sys v0.33.0` already in go.mod (indirect); build-tag precedent exists
  (`lockfile_windows.go`/`lockfile_unix.go`).
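
The manifest travels inside the archive, so restore treats it as untrusted. It bounds-checks `Index`, rejects duplicate indices, and pairs each entry with the workload's current config by `Target`. The manifest's `Source` is never used, which closes off the manifest-Source traversal the security review caught. This sketch assumes the JSON keys match the Go field names. `ConfigVolume` and `planRestore` are hypothetical, and the map stands in for the real `supportedScopes` constant.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SnapshotVolume mirrors a manifest.json entry; the JSON key names are assumed.
type SnapshotVolume struct {
	Index  int
	Target string
	Scope  string
	Source string // recorded at capture time; never trusted for the restore path
}

// ConfigVolume is a hypothetical view of one volume in the CURRENT config.
type ConfigVolume struct {
	Target string
	Scope  string
	Source string
}

// Stand-in for the real supportedScopes constant in volumes.go.
var supportedScopes = map[string]bool{"absolute": true, "stage": true, "project": true}

// planRestore validates the whole manifest up front (all-or-nothing) and maps
// each archive index to the volume spec taken from the current config.
func planRestore(manifestJSON []byte, current []ConfigVolume) (map[int]ConfigVolume, error) {
	var vols []SnapshotVolume
	if err := json.Unmarshal(manifestJSON, &vols); err != nil {
		return nil, fmt.Errorf("manifest: %w", err)
	}
	byTarget := make(map[string]ConfigVolume, len(current))
	for _, c := range current {
		byTarget[c.Target] = c
	}
	plan := make(map[int]ConfigVolume, len(vols))
	for _, v := range vols {
		if v.Index < 0 || v.Index >= len(vols) {
			return nil, fmt.Errorf("manifest index %d out of range", v.Index)
		}
		if _, dup := plan[v.Index]; dup {
			return nil, fmt.Errorf("duplicate manifest index %d", v.Index)
		}
		cfg, ok := byTarget[v.Target]
		if !ok {
			return nil, fmt.Errorf("volume %s is not in the current workload config", v.Target)
		}
		if !supportedScopes[cfg.Scope] {
			return nil, fmt.Errorf("volume %s: scope %q is not restorable", v.Target, cfg.Scope)
		}
		plan[v.Index] = cfg // the live path is derived from cfg, never from v.Source
	}
	return plan, nil
}
```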
## Decisions / invariants
- `Engine.Restore` holds NO `e.mu`; per-workload `Lifecycle.Lock` is the serialization.
- Extract ALL tmp dirs BEFORE any rename; swap is pure renames; journal tracks per-volume `swapped`.
- Pre-restore snapshot captured AFTER stop, BEFORE first rename (durable escape hatch).
- Redeploy pins the newest-running container's tag (same version back up).
- Mixed per-volume state after a mid-restore crash is an accepted v1 limit (each volume intact; pre-restore snapshot = full revert).
## Deferred / out of scope
- Named/project_named/instance/ephemeral scopes (consistent with capture).
- Non-image sources.
- Fully-atomic all-volumes-or-nothing restore (v1 is per-volume atomic + journal recovery).
## Failed approaches / gotchas
- (none yet)
## Phase handoffs
- Phase 1 → 2: _(filled after Phase 1)_
- Phase 2 → 3: _(filled after Phase 2)_
- Phase 3 → 4: _(filled after Phase 3)_