1c47030854
Restore a captured volume snapshot onto an image workload's live host-bind
data volumes, then redeploy — the most destructive workload action, built to
the adversarially-reviewed design (C1–C6) with all data-loss guards.
- Engine.Restore (engine-owned): all-or-nothing pre-flight re-resolution from
the workload's CURRENT config (never the tamperable manifest), per-filesystem
disk pre-check, per-workload lock, container quiesce, extract-to-tmp, durable
pre-restore snapshot, write-ahead journal, atomic rename swap, redeploy, and
crash-recovery sweep (RecoverInterruptedRestores) wired before serving.
- internal/keyedmutex: shared per-key lock; deployer now serializes every
deploy entrypoint per workload via DispatchPlugin (+ LockWorkload/RedeployLocked
for the restore re-dispatch, no deadlock).
- Untrusted-archive extractor: zip-slip containment, type allow-list (reg/dir
only), decompression-bomb cap, manifest-index bounds.
- POST /api/workloads/{id}/snapshots/{sid}/restore: admin, X-Confirm-Restore
header (CSRF), per-workload single-flight (409).
- WebUI: Restore button + danger ConfirmDialog + busy state + i18n (en/ru).
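The extractor guards listed above can be illustrated with a small per-entry check. This is a minimal sketch, not the extractor in this commit: `checkEntry`, `budget`, and `errRejected` are hypothetical names, and charging the declared size against a byte budget is only one possible way to enforce a decompression cap. `filepath.IsLocal` requires Go 1.20 or newer and is a lexical check only.

```go
package main

import (
	"archive/tar"
	"errors"
	"fmt"
	"path/filepath"
)

// errRejected marks an archive entry the extractor refuses to write.
var errRejected = errors.New("archive entry rejected")

// budget is the number of uncompressed bytes still allowed (bomb cap).
type budget struct{ remaining int64 }

// checkEntry validates one tar entry and returns its destination under tmpDir.
// All names here are hypothetical; the real extractor may differ.
func checkEntry(tmpDir, name string, typeflag byte, size int64, b *budget) (string, error) {
	// Type allow-list: regular files and directories only. Symlinks, hard
	// links, devices, and FIFOs are refused outright.
	if typeflag != tar.TypeReg && typeflag != tar.TypeDir {
		return "", fmt.Errorf("%w: %s has type %q", errRejected, name, typeflag)
	}
	// Zip-slip containment: the name must stay lexically inside tmpDir.
	local := filepath.FromSlash(name)
	if !filepath.IsLocal(local) {
		return "", fmt.Errorf("%w: %s escapes the extraction root", errRejected, name)
	}
	// Decompression-bomb cap: charge the declared size against the budget.
	if size < 0 || size > b.remaining {
		return "", fmt.Errorf("%w: %s exceeds the size budget", errRejected, name)
	}
	b.remaining -= size
	return filepath.Join(tmpDir, local), nil
}
```

A caller would invoke `checkEntry` for every header returned by `tar.Reader.Next` and should still bound the actual copy (for example with `io.CopyN`), since the check above only looks at metadata.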
Scope: image-source only; scopes absolute/stage/project (driven off the same
supportedScopes constant capture uses).
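As a rough illustration of driving restore off the same scope set as capture, the sketch below decodes a manifest and rejects unsupported scopes and out-of-range indices. The field types, the JSON key casing, the dense 0..n-1 index assumption, and `maxVolumes` are all assumptions made for the example; only the field names and the three scope values come from the plan.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SnapshotVolume mirrors the manifest entry named in the plan. Field types
// are assumed for this sketch.
type SnapshotVolume struct {
	Index  int
	Target string
	Scope  string
	Source string
}

// supportedScopes is a stand-in for the shared constant capture uses.
var supportedScopes = map[string]bool{"absolute": true, "stage": true, "project": true}

// maxVolumes is a hypothetical upper bound on manifest entries.
const maxVolumes = 64

// parseManifest decodes manifest.json and applies bounds and scope checks.
func parseManifest(raw []byte) ([]SnapshotVolume, error) {
	var vols []SnapshotVolume
	if err := json.Unmarshal(raw, &vols); err != nil {
		return nil, fmt.Errorf("decode manifest: %w", err)
	}
	if len(vols) == 0 || len(vols) > maxVolumes {
		return nil, fmt.Errorf("manifest has %d volumes, want 1..%d", len(vols), maxVolumes)
	}
	seen := make(map[int]bool, len(vols))
	for _, v := range vols {
		// Index bounds: assumes subdirs are numbered densely from 0.
		if v.Index < 0 || v.Index >= len(vols) || seen[v.Index] {
			return nil, fmt.Errorf("manifest index %d is out of range or duplicated", v.Index)
		}
		seen[v.Index] = true
		if !supportedScopes[v.Scope] {
			return nil, fmt.Errorf("volume %d: scope %q is not restorable", v.Index, v.Scope)
		}
	}
	return vols, nil
}
```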
Plan-reviewed before coding; per-phase go/security/ts reviews; final review
READY TO MERGE. Security review caught + fixed a CRITICAL manifest-Source path
traversal (re-derive target from current config + base containment).
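The fix described above (re-derive the target from current config, then check base containment) might look roughly like the following for stage/project scopes. `resolveTarget` and `within` are hypothetical helpers, not the functions in internal/volume/resolver.go, and the check is purely lexical, so it assumes symlinks are handled elsewhere.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// within reports whether child is strictly inside parent, lexically.
func within(parent, child string) bool {
	rel, err := filepath.Rel(parent, child)
	if err != nil {
		return false
	}
	return rel != "." && rel != ".." &&
		!strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// resolveTarget derives the live directory for a volume from the workload's
// current config values and refuses anything outside <base>/<workload>.
// The manifest's own Source field is deliberately not an input here.
func resolveTarget(base, workload, source string) (string, error) {
	root := filepath.Join(base, workload)
	target := filepath.Join(root, source)
	if !within(base, root) {
		return "", fmt.Errorf("workload %q resolves outside %s", workload, base)
	}
	if !within(root, target) {
		return "", fmt.Errorf("source %q resolves outside %s", source, root)
	}
	return target, nil
}
```

With a base of `/srv/volumes`, workload `blog`, and source `db`, this yields `/srv/volumes/blog/db` on a Unix host, while a source such as `../other/db` is rejected.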
Plan: plans/volume-snapshot-restore/
CONTEXT — Volume Snapshot Restore
Working memory across phases. The orchestrator owns this file.
Settings (from PLAN.md header)
- Mode: Automated · Execution: Hybrid (backend Direct, Phase 4 frontend implementer) · Strategy: Incremental
- Base: main · Branch: feature/volume-snapshot-restore · Remote: origin (Gitea)
- Build: go build ./... · Test: go test ./internal/... + npm run test · Lint: go vet ./internal/... + npm run check
Key codebase facts (verified during planning)
- Deploy choke point: every deploy entrypoint calls deployer.DispatchPlugin → put the per-workload lock there (C1). Entrypoints: deployPluginWorkload, rollbackWorkload, promoteFromWorkload, dispatchGeneric, webhook fireBinding/handlePreviewIntent.
- activeWg/drainMu in deployer.go = global drain barrier, NOT a per-workload lock.
- Image idempotency short-circuit (image.go Deploy ~L170-181) only fires for a verified-running container → after stop, redeploy makes a fresh container; blue-green enforceMaxInstances reaps the old stopped one. ⇒ stop→swap→redeploy (C4) is correct.
- Scope resolution (internal/volume/resolver.go): stage/project → <base>/<workload>/<source> (shared per-workload dir); absolute → operator's allowed path. Stage tmp/old siblings under the live dir's PARENT so renames are same-fs (R2).
- volsnap.Engine has e.mu taken by Create/Delete/pruneWorkload/CleanOrphans. Restore must NOT hold e.mu (R1).
- Archive layout: gzip tar, each volume under integer subdir 0/, 1/ …, manifest.json at root = []SnapshotVolume{Index,Target,Scope,Source}. supportedScopes = absolute/stage/project (volumes.go).
- Precedent: internal/api/backups.go restoreBackup (X-Confirm-Restore==id, restoreInFlight CAS→409, pre-restore safety backup, atomic rename swap).
- Composition root: cmd/server/main.go constructs deployer.New + volsnap.New + docker + store; calls CleanOrphans at startup (wire RecoverInterruptedRestores there).
- Frontend: WorkloadSnapshotsPanel.svelte; api fns web/src/lib/api.ts ~L581; i18n apps.detail.snapshots.* in en.json + ru.json.
- golang.org/x/sys v0.33.0 already in go.mod (indirect); build-tag precedent exists (lockfile_windows.go/lockfile_unix.go).
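To make the distinction between a global drain barrier and a per-workload lock concrete, here is a minimal sketch of a per-key mutex. It is not the internal/keyedmutex API: `KeyedMutex`, `keyLock`, and the returned unlock closure are assumptions chosen for the example.

```go
package main

import "sync"

// keyLock is one key's mutex plus a count of holders and waiters.
type keyLock struct {
	mu   sync.Mutex
	refs int
}

// KeyedMutex serializes callers that share a key and lets different keys
// proceed in parallel. The zero value is ready to use. Hypothetical sketch.
type KeyedMutex struct {
	mu    sync.Mutex
	locks map[string]*keyLock
}

// Lock blocks until the lock for key is held and returns its unlock function.
func (k *KeyedMutex) Lock(key string) (unlock func()) {
	k.mu.Lock()
	if k.locks == nil {
		k.locks = make(map[string]*keyLock)
	}
	l, ok := k.locks[key]
	if !ok {
		l = &keyLock{}
		k.locks[key] = l
	}
	l.refs++ // counted before blocking so the entry is not freed under us
	k.mu.Unlock()

	l.mu.Lock()
	return func() {
		l.mu.Unlock()
		k.mu.Lock()
		l.refs--
		if l.refs == 0 {
			delete(k.locks, key) // last user frees the entry
		}
		k.mu.Unlock()
	}
}
```

Placing such a lock at the single dispatch choke point means a deploy of workload A never waits on a restore of workload B, which a global barrier cannot offer.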
Decisions / invariants
- Engine.Restore holds NO e.mu; per-workload Lifecycle.Lock is the serialization.
- Extract ALL tmp dirs BEFORE any rename; swap is pure renames; journal tracks per-volume swapped.
- Pre-restore snapshot captured AFTER stop, BEFORE first rename (durable escape hatch).
- Redeploy pins the newest-running container's tag (same version back up).
- Mixed per-volume state after a mid-restore crash is an accepted v1 limit (each volume intact; pre-restore snapshot = full revert).
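The swap invariants above can be sketched as follows, assuming every tmp dir has already been extracted and the initial journal has been written before the first rename. `volumeSwap`, `planSwap`, `swapAll`, the sibling naming scheme, and the two-rename swap are illustrative assumptions; the real engine may use a different primitive. The rename and persist steps are injected so the ordering can be exercised without touching disk.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// volumeSwap is one journal record: Live is the bind-mounted dir, Tmp holds
// the extracted snapshot, Old receives the displaced live data.
type volumeSwap struct {
	Live, Tmp, Old string
	Swapped        bool // set once this volume's renames are complete
}

// planSwap places Tmp and Old beside Live, under the same parent directory,
// so both renames stay on one filesystem. The naming scheme is hypothetical.
func planSwap(live, restoreID string) volumeSwap {
	parent, name := filepath.Dir(live), filepath.Base(live)
	prefix := filepath.Join(parent, "."+name+".restore-"+restoreID)
	return volumeSwap{Live: live, Tmp: prefix + ".tmp", Old: prefix + ".old"}
}

// swapAll performs the rename-only phase. persist is called after each volume
// so a crash leaves the journal showing exactly which volumes were swapped.
func swapAll(vols []volumeSwap, rename func(oldpath, newpath string) error,
	persist func([]volumeSwap) error) error {
	for i := range vols {
		v := &vols[i]
		if v.Swapped {
			continue // already completed before an interruption
		}
		if err := rename(v.Live, v.Old); err != nil {
			return fmt.Errorf("move %s aside: %w", v.Live, err)
		}
		if err := rename(v.Tmp, v.Live); err != nil {
			// Put the original data back so this volume stays intact.
			rbErr := rename(v.Old, v.Live)
			return fmt.Errorf("install %s: %w (rollback error: %v)", v.Live, err, rbErr)
		}
		v.Swapped = true
		if err := persist(vols); err != nil {
			return fmt.Errorf("update journal: %w", err)
		}
	}
	return nil
}
```

In production the `rename` argument would typically be `os.Rename`. A failure partway through leaves earlier volumes swapped and later ones untouched, which matches the accepted v1 limit of per-volume atomicity.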
Deferred / out of scope
- Named/project_named/instance/ephemeral scopes (consistent with capture).
- Non-image sources.
- Fully-atomic all-volumes-or-nothing restore (v1 is per-volume atomic + journal recovery).
Failed approaches / gotchas
- (none yet)
Phase handoffs
- Phase 1 → 2: (filled after Phase 1)
- Phase 2 → 3: (filled after Phase 2)
- Phase 3 → 4: (filled after Phase 3)