tiny-forge/plans/volume-snapshot-restore/CONTEXT.md

# CONTEXT — Volume Snapshot Restore

Working memory across phases. The orchestrator owns this file.

## Settings (from PLAN.md header)

- Mode: **Automated** · Execution: **Hybrid** (backend Direct, Phase 4 frontend implementer) · Strategy: **Incremental**
- Base: `main` · Branch: `feature/volume-snapshot-restore` · Remote: origin (Gitea)
- Build: `go build ./...` · Test: `go test ./internal/...` + `npm run test` · Lint: `go vet ./internal/...` + `npm run check`

## Key codebase facts (verified during planning)

- **Deploy choke point:** every deploy entrypoint calls `deployer.DispatchPlugin` →
  put the per-workload lock there (C1). Entrypoints: `deployPluginWorkload`,
  `rollbackWorkload`, `promoteFromWorkload`, `dispatchGeneric`, webhook
  `fireBinding`/`handlePreviewIntent`.
- **`activeWg`/`drainMu`** in `deployer.go` = global drain barrier, NOT a per-workload lock.
- **Image idempotency short-circuit** (`image.go` Deploy ~L170-181) only fires for a
  *verified-running* container → after stop, redeploy makes a fresh container; blue-green
  `enforceMaxInstances` reaps the old stopped one. ⇒ stop→swap→redeploy (C4) is correct.
- **Scope resolution** (`internal/volume/resolver.go`): stage/project → `<base>/<workload>/<source>`
  (shared per-workload dir); absolute → operator's allowed path. Stage tmp/old siblings under
  the live dir's PARENT so renames are same-fs (R2).
- **`volsnap.Engine`** has `e.mu` taken by Create/Delete/pruneWorkload/CleanOrphans.
  `Restore` must NOT hold `e.mu` (R1).
- **Archive layout:** gzip tar, each volume under integer subdir `0/`,`1/`…, `manifest.json`
  at root = `[]SnapshotVolume{Index,Target,Scope,Source}`. `supportedScopes` =
  absolute/stage/project (volumes.go).
- **Precedent:** `internal/api/backups.go` `restoreBackup` — X-Confirm-Restore==id,
  `restoreInFlight` CAS→409, pre-restore safety backup, atomic rename swap.
- **Composition root:** `cmd/server/main.go` constructs `deployer.New` + `volsnap.New` +
  `docker` + `store`; calls `CleanOrphans` at startup (wire `RecoverInterruptedRestores` there).
- **Frontend:** `WorkloadSnapshotsPanel.svelte`; api fns `web/src/lib/api.ts` ~L581;
  i18n `apps.detail.snapshots.*` in en.json + ru.json.
- `golang.org/x/sys v0.33.0` already in go.mod (indirect); build-tag precedent exists
  (`lockfile_windows.go`/`lockfile_unix.go`).

## Decisions / invariants

- `Engine.Restore` holds NO `e.mu`; per-workload `Lifecycle.Lock` is the serialization.
- Extract ALL tmp dirs BEFORE any rename; swap is pure renames; journal tracks per-volume `swapped`.
- Pre-restore snapshot captured AFTER stop, BEFORE first rename (durable escape hatch).
- Redeploy pins the tag of the newest running container, so the same version comes back up.
- Mixed per-volume state after a mid-restore crash is an accepted v1 limitation (each volume is individually intact; the pre-restore snapshot is the full revert).

## Deferred / out of scope

- Named/project_named/instance/ephemeral scopes (consistent with capture).
- Non-image sources.
- Fully-atomic all-volumes-or-nothing restore (v1 is per-volume atomic + journal recovery).

## Failed approaches / gotchas

- (none yet)

## Phase handoffs

- Phase 1 → 2: _(filled after Phase 1)_
- Phase 2 → 3: _(filled after Phase 2)_
- Phase 3 → 4: _(filled after Phase 3)_