alexei.dolgolyov 1c47030854 feat(volsnap): volume snapshot restore (backlog #6)
Restore a captured volume snapshot onto an image workload's live host-bind
data volumes, then redeploy — the most destructive workload action, built to
the adversarially-reviewed design (C1–C6) with all data-loss guards.

- Engine.Restore (engine-owned): all-or-nothing pre-flight re-resolution from
  the workload's CURRENT config (never the tamperable manifest), per-filesystem
  disk pre-check, per-workload lock, container quiesce, extract-to-tmp, durable
  pre-restore snapshot, write-ahead journal, atomic rename swap, redeploy, and
  crash-recovery sweep (RecoverInterruptedRestores) wired before serving.
- internal/keyedmutex: shared per-key lock; deployer now serializes every
  deploy entrypoint per workload via DispatchPlugin (+ LockWorkload/RedeployLocked
  for the restore re-dispatch, no deadlock).
- Untrusted-archive extractor: zip-slip containment, type allow-list (reg/dir
  only), decompression-bomb cap, manifest-index bounds.
- POST /api/workloads/{id}/snapshots/{sid}/restore: admin, X-Confirm-Restore
  header (CSRF), per-workload single-flight (409).
- WebUI: Restore button + danger ConfirmDialog + busy state + i18n (en/ru).

Scope: image-source only; scopes absolute/stage/project (driven off the same
supportedScopes constant capture uses).

Plan-reviewed before coding; per-phase go/security/ts reviews; final review
READY TO MERGE. Security review caught + fixed a CRITICAL manifest-Source path
traversal (re-derive target from current config + base containment).

Plan: plans/volume-snapshot-restore/
2026-06-22 17:23:52 +03:00

Tinyforge

Dev Server

Start/restart with: ./scripts/dev-server.sh

  • Runs on port 8090 (avoids 8080 conflict with other local services)
  • Auto-generates ENCRYPTION_KEY if not set
  • Default login: admin / admin123
  • Override port: LISTEN_ADDR=:9000 ./scripts/dev-server.sh

Frontend

  • Boolean inputs use ToggleSwitch ($lib/components/ToggleSwitch.svelte) — the slide-style switch is the unified control across the WebUI. Do not introduce raw <input type="checkbox"> elements; place a <ToggleSwitch> next to a label/help block instead.
  • Confirmations & destructive actions use ConfirmDialog ($lib/components/ConfirmDialog.svelte) — never native window.confirm / alert. For navigation guards (e.g. the unsaved-changes prompt on /apps/new), cancel() the navigation in beforeNavigate, open ConfirmDialog, and re-issue the navigation with a bypass flag on confirm. Native beforeunload is acceptable only for hard tab-close/reload, where the browser forbids custom UI.
  • Source-config shape: $lib/workload/sourceForms.ts is the single source of truth (seed/serialize/validity for image/compose/static/dockerfile), consumed by both /apps/new and /apps/[id]. Don't re-inline seed/serialize logic.
  • "App" = workload with source_kind !== ''. Triggers are first-class bindings (workload_trigger_bindings), NOT on the workload row — never gate app lists/counts on trigger_kind (it's empty for plugin workloads). Legacy pre-cutover kind:project/stack/site rows have an empty source_kind and must be excluded everywhere.
  • i18n parity is mandatory — every key in BOTH web/src/lib/i18n/{en,ru}.json. A missing key is NOT a build error ($t returns the key string), so verify parity manually.
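Because a missing i18n key does not fail the build, the parity check can be scripted. The following is a minimal sketch in Go (chosen to match the backend tooling), assuming the locale files are JSON objects that may be nested; the helper names collectKeys, leafKeys, and MissingKeys are hypothetical and not part of the repo.

```go
package main

import (
	"encoding/json"
	"sort"
)

// collectKeys records the dotted path of every leaf value in a decoded
// JSON object (e.g. "restore.confirm"). Hypothetical helper.
func collectKeys(prefix string, v any, out map[string]bool) {
	obj, ok := v.(map[string]any)
	if !ok {
		out[prefix] = true // anything that is not an object counts as a leaf
		return
	}
	for k, child := range obj {
		path := k
		if prefix != "" {
			path = prefix + "." + k
		}
		collectKeys(path, child, out)
	}
}

// leafKeys decodes one locale file and returns its set of leaf key paths.
func leafKeys(data []byte) (map[string]bool, error) {
	var root map[string]any
	if err := json.Unmarshal(data, &root); err != nil {
		return nil, err
	}
	out := make(map[string]bool)
	collectKeys("", root, out)
	return out, nil
}

// MissingKeys returns, in sorted order, the keys present in a but absent
// from b. Run it in both directions to check full parity.
func MissingKeys(a, b []byte) ([]string, error) {
	ak, err := leafKeys(a)
	if err != nil {
		return nil, err
	}
	bk, err := leafKeys(b)
	if err != nil {
		return nil, err
	}
	var missing []string
	for k := range ak {
		if !bk[k] {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing, nil
}
```

One way to use it would be to read web/src/lib/i18n/en.json and ru.json with os.ReadFile, call MissingKeys(en, ru) and MissingKeys(ru, en), and fail if either result is non-empty.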

Backend

  • Per-workload deploy lock. Every deploy entrypoint (API deploy, rollback, promote, generic-hooks, webhook trigger dispatch) funnels through deployer.DispatchPlugin, which holds a per-workload keyedmutex lock (internal/keyedmutex) for the whole dispatch; DispatchTeardown takes it too. This serializes all container/volume mutation per workload. Do NOT add a deploy/teardown path that bypasses DispatchPlugin. Operations that must run a deploy while already holding the lock (volume-snapshot restore) use Deployer.LockWorkload + RedeployLocked (the unlocked dispatch) — calling DispatchPlugin under the held lock would deadlock (Go mutexes are not reentrant). activeWg is a global drain barrier for shutdown, NOT a per-workload lock.
  • Volume snapshot restore lives in volsnap.Engine.Restore (engine-owned, not the API handler): preflight re-resolves volumes from the workload's CURRENT config (never the snapshot manifest, which can be tampered with) → lock → stop → extract-to-tmp → pre-restore snapshot → journal → atomic rename swap → redeploy. A startup RecoverInterruptedRestores sweep replays the journal after a crash; it MUST be wired (with SetLifecycle) before the API serves. The archive extractor treats the tar as untrusted (zip-slip/type-allowlist/bomb-cap); the endpoint requires an X-Confirm-Restore: <sid> header (CSRF), like the DB restore.
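The locking rule above can be illustrated with a small sketch. It assumes a per-key mutex and a reduced Deployer. The method names DispatchPlugin, LockWorkload, and RedeployLocked come from the text, but their signatures, the KeyedMutex type, the Deploy field, and the restore helper are all hypothetical; the real internal/keyedmutex and deployer APIs may differ.

```go
package main

import "sync"

// KeyedMutex hands out one mutex per key. Hypothetical stand-in for
// internal/keyedmutex; the real package's API may differ.
type KeyedMutex struct {
	mu      sync.Mutex
	entries map[string]*keyEntry
}

type keyEntry struct {
	mu   sync.Mutex
	refs int // holders plus waiters; the entry is dropped at zero
}

// Lock blocks until the caller owns key and returns the matching unlock
// function, which must be called exactly once.
func (k *KeyedMutex) Lock(key string) (unlock func()) {
	k.mu.Lock()
	if k.entries == nil {
		k.entries = make(map[string]*keyEntry)
	}
	e := k.entries[key]
	if e == nil {
		e = &keyEntry{}
		k.entries[key] = e
	}
	e.refs++
	k.mu.Unlock()

	e.mu.Lock()
	return func() {
		e.mu.Unlock()
		k.mu.Lock()
		e.refs--
		if e.refs == 0 {
			delete(k.entries, key)
		}
		k.mu.Unlock()
	}
}

// Deployer is a hypothetical reduction of the real deployer.
type Deployer struct {
	locks  KeyedMutex
	Deploy func(workloadID string) // stands in for the container/volume mutation
}

// DispatchPlugin is the locked entrypoint used by every normal deploy path.
func (d *Deployer) DispatchPlugin(workloadID string) {
	unlock := d.locks.Lock(workloadID)
	defer unlock()
	d.Deploy(workloadID)
}

// LockWorkload lets a longer operation (such as restore) hold the lock itself.
func (d *Deployer) LockWorkload(workloadID string) (unlock func()) {
	return d.locks.Lock(workloadID)
}

// RedeployLocked runs the deploy WITHOUT taking the lock; the caller must
// already hold it via LockWorkload.
func (d *Deployer) RedeployLocked(workloadID string) {
	d.Deploy(workloadID)
}

// restore shows the call order: lock once, mutate volumes, redeploy unlocked.
// Calling d.DispatchPlugin here instead would block forever, because
// sync.Mutex is not reentrant.
func restore(d *Deployer, workloadID string, swapVolumes func()) {
	unlock := d.LockWorkload(workloadID)
	defer unlock()
	swapVolumes()
	d.RedeployLocked(workloadID)
}
```

The reference count keeps the map from growing without bound: an entry is deleted only when no goroutine holds or waits on that workload's lock.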

Build & Test

  • Frontend (from web/): npm run check (svelte-check — expect 0 errors), npm run build, npm run test (vitest; pure-logic units like sourceForms.test.ts).
  • Backend (repo root): go build ./..., go vet ./internal/..., go test ./internal/....
  • ./scripts/dev-server.sh rebuilds the SPA + restarts the Go server on :8090; it kills the prior process, so a previous background dev-server task reporting exit 1 is expected, not a failure.