feat(volsnap): volume snapshot restore (backlog #6)

Restore a captured volume snapshot onto an image workload's live host-bind
data volumes, then redeploy. This is the most destructive workload action, so
it is built to the adversarially reviewed design (C1–C6) with all of its
data-loss guards in place.

- Engine.Restore (engine-owned): all-or-nothing pre-flight re-resolution from
  the workload's CURRENT config (never the tamperable manifest), per-filesystem
  disk pre-check, per-workload lock, container quiesce, extract-to-tmp, durable
  pre-restore snapshot, write-ahead journal, atomic rename swap, redeploy, and
  crash-recovery sweep (RecoverInterruptedRestores) wired in before the server
  starts serving.
- internal/keyedmutex: shared per-key lock. The deployer now serializes every
  deploy entrypoint per workload via DispatchPlugin, plus LockWorkload and
  RedeployLocked so the restore path can re-dispatch while already holding the
  workload lock without deadlocking.
- Untrusted-archive extractor: zip-slip containment, type allow-list (regular
  files and directories only), decompression-bomb cap, manifest-index bounds.
- POST /api/workloads/{id}/snapshots/{sid}/restore: admin-only, requires the
  X-Confirm-Restore header (CSRF guard), per-workload single-flight (a
  concurrent restore gets 409 Conflict).
- WebUI: Restore button + danger ConfirmDialog + busy state + i18n (en/ru).

Scope: image-source workloads only; covers the absolute, stage and project
scopes (driven by the same supportedScopes constant that capture uses).

Plan-reviewed before coding; per-phase Go/security/TS reviews; final review
verdict READY TO MERGE. The security review caught and fixed a CRITICAL path
traversal via the manifest's Source field (fix: re-derive the restore target
from the current config and enforce containment under the base directory).

Plan: plans/volume-snapshot-restore/
2026-06-22 17:23:52 +03:00
parent 8a5f69af87
commit 1c47030854
33 changed files with 2825 additions and 34 deletions
@@ -419,6 +419,16 @@ func main() {
 	apiServer.SetLogScanReloader(logScanMgr)
 	apiServer.SetBackupEngine(backupEngine)
 	apiServer.SetSnapshotEngine(snapshotEngine)
+	// Wire the restore lifecycle seam and reconcile any restore interrupted by a
+	// crash, BEFORE the HTTP server starts serving — so a half-applied restore is
+	// completed/reverted first and the restore endpoint is never reachable
+	// without its safety net.
+	snapshotEngine.SetLifecycle(&restoreLifecycle{dep: dep, docker: dockerClient, store: db})
+	if n, err := snapshotEngine.RecoverInterruptedRestores(); err != nil {
+		slog.Warn("snapshots: recover interrupted restores on startup", "error", err)
+	} else if n > 0 {
+		slog.Info("snapshots: recovered interrupted restores on startup", "count", n)
+	}
 	apiServer.SetDBPath(dbPath)
 	apiServer.SetBackupSettingsChangedCallback(scheduleAutobackup)
 	apiServer.SetDNSProvider(dnsProvider)