diff --git a/docs/plans/DEPLOY_HISTORY_ROLLBACK_PLAN.md b/docs/plans/DEPLOY_HISTORY_ROLLBACK_PLAN.md
new file mode 100644
index 0000000..1041586
--- /dev/null
+++ b/docs/plans/DEPLOY_HISTORY_ROLLBACK_PLAN.md
@@ -0,0 +1,223 @@
+# Deploy History + One-Click Rollback — Implementation Plan
+
+**Status:** planned (review incorporated) · **Feature rank:** #1 · **Date:** 2026-06-19
+
+## Review findings incorporated (adversarial pass)
+
+- **BLOCKER — never persist the raw deploy error** (it can carry registry-auth bytes /
+ compose stdout — see `compose.go` SECURITY comment + `workloads_plugin.go:198`).
+ `deploy_history.error` only ever gets a **fixed generic marker**
+ (`"deploy failed (see server logs)"`) on failure; the raw error goes to `slog` only.
+ `capDeployStatus(err.Error())` is rejected.
+- **BLOCKER — don't double-count metrics.** `DispatchPlugin` already calls
+ `metrics.DeploysTotal.Inc(...)`; recording slots into the **existing** outcome block,
+ not a re-added metrics line.
+- **FIX — no runtime-state store getter exists.** static/dockerfile `LastCommitSHA`
+ lives in `containers.extra_json` on a deterministic-ID row
+ (`GetContainerByID(w.ID+":site")` / `+":dockerfile"`, decode `ExtraJSON`). Moot for
+ Phase-1 rollback (image-only) but the resolver must use this, not a fictional getter.
+- **FIX — cascade is distrusted here.** `DeleteWorkload` explicitly deletes containers
+ rather than relying on the FK. Match that: add `DELETE FROM deploy_history WHERE
+ workload_id = ?` inside the `DeleteWorkload` transaction, and make the cascade test a
+ hard gate.
+- **FIX — keep recording off the hot path's tail.** `DispatchPlugin` runs synchronously
+ on the request goroutine; the INSERT is cheap and stays inline, while `PruneDeployHistory`
+ runs in a separate goroutine. Draining-rejected attempts (`beginDispatch` fails) record
+ nothing, which is correct: a never-run deploy must not appear as a rollback target.
+- **FIX — pagination:** use `parseLimit(raw, 50, 200)` (not the unclamped
+ `listWorkloadEvents` style); parse `offset` separately, clamp negatives to 0.
+
+
+## Problem
+
+Tinyforge has *failure* rollback (a failed deploy unwinds its own new container —
+[image.go:258](../../internal/workload/plugin/source/image/image.go)), but **no way to
+revert a *successful* deploy to a prior version.** Blue-green's `enforceMaxInstances`
+deletes the old container rows after cutover, so once `v3` replaces `v2` there is no
+record of `v2` and nothing to roll back to. The only "history" is free-text
+`event_log` rows (`"deployed"`) — not structured, not version-pinned, not replayable.
+
+This is the single most-requested capability for any deploy tool, and the plumbing is
+90% there: every deploy flows through one choke point, and the manual-deploy endpoint
+already accepts a `reference` override.
+
+## Key architectural facts (verified against current code)
+
+- **Single dispatch choke point:** `Deployer.DispatchPlugin(ctx, w, intent)` in
+ [internal/deployer/dispatch.go](../../internal/deployer/dispatch.go) routes *every*
+ source kind and already computes a success/failure `outcome`. This is where history
+ is recorded.
+- **`intent.Reference` is the version handle:** image source resolves
+ `tag := intent.Reference` (falling back to `DefaultTag`/`latest`). The manual deploy
+ endpoint ([workloads_plugin.go](../../internal/api/workloads_plugin.go)) already accepts
+ `{reference, note}` and builds a `manual` intent. **Rollback = deploy with a pinned
+ reference + a distinct reason.**
+- **Effective vs requested reference:** for a *manual* image deploy `intent.Reference`
+ is often `""` (means `DefaultTag`). The *effective* deployed tag is written onto the
+ freshest container row (`store.Container.ImageTag`). For static/dockerfile the
+ effective version is `runtime_state.LastCommitSHA`, resolved inside the source.
+- **Built-from-source sources don't honor a SHA reference on Deploy** — static and
+ dockerfile clone `cfg.Branch` HEAD and capture `latestSHA`; they cannot yet check out
+ an arbitrary commit. So **SHA-pinned rollback for them needs a source change (later
+ phase).** Image-tag rollback works today.
+- **Migration pattern:** additive statements in `runMigrations()` /
+ `workloadTables` in [store.go](../../internal/store/store.go); workload-scoped tables
+ use `REFERENCES workloads(id) ON DELETE CASCADE`. Per-table CRUD lives in its own
+ file under `internal/store/`, model in `models.go`.
+- **Idempotency note:** the image source's same-tag short-circuit returns *before* it
+ arms its `EmitDeployEvent` defer, so a no-op deploy emits no timeline event. History
+ recorded at `DispatchPlugin` will still log it as a `success` attempt — acceptable
+ (history = ledger of attempts), but called out so the divergence is intentional.
+
+## Scope
+
+### Phase 1 (this plan)
+1. Persistent, structured **deploy-history ledger** for **all** source kinds (success
+ *and* failure) — powers an audit timeline and the rollback action.
+2. **One-click rollback** for the **image** source (redeploy a pinned tag).
+3. Read-only history panel on `/apps/[id]`; rollback button shown only for entries that
+ are `success` + have a non-empty reference + a rollback-capable source kind.
+
+### Explicitly out of scope (future phases, table already supports them)
+- SHA-pinned rebuild rollback for static/dockerfile (needs source checkout-by-commit).
+- Config-snapshot rollback for compose (no artifact reference).
+- Promotion (dev→staging→prod) — separate feature, will reuse this ledger.
+
+## Data model
+
+New table `deploy_history` (added to `workloadTables` in `runMigrations`):
+
+```sql
+CREATE TABLE IF NOT EXISTS deploy_history (
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
+ workload_id TEXT NOT NULL REFERENCES workloads(id) ON DELETE CASCADE,
+ source_kind TEXT NOT NULL DEFAULT '',
+ reference TEXT NOT NULL DEFAULT '', -- effective artifact: image tag | commit sha | ''
+ reason TEXT NOT NULL DEFAULT '', -- manual|registry-push|git-push|cron|rollback|promote
+ triggered_by TEXT NOT NULL DEFAULT '',
+ note TEXT NOT NULL DEFAULT '',
+ outcome TEXT NOT NULL DEFAULT '', -- success | failure
+ error TEXT NOT NULL DEFAULT '', -- fixed generic marker on failure; never the raw error
+ started_at TEXT NOT NULL DEFAULT '',
+ finished_at TEXT NOT NULL DEFAULT ''
+);
+CREATE INDEX IF NOT EXISTS idx_deploy_history_workload
+ ON deploy_history(workload_id, id DESC);
+```
+
+**Why a dedicated table (not `event_log`):** structured + queryable, version-pinned,
+carries the replayable `reference`, and its retention is independent of the human event
+feed. `event_log` stays the free-text timeline; `deploy_history` is the version ledger.
+
+Go model in `models.go` (`DeployHistoryEntry`, mirrors `MetricAlertRule` style).
+
+## Backend changes
+
+### 1. Store — `internal/store/deploy_history.go` (new) + `models.go` + `store.go`
+- `DeployHistoryEntry` struct.
+- `InsertDeployHistory(e DeployHistoryEntry) (DeployHistoryEntry, error)`.
+- `ListDeployHistory(workloadID string, limit, offset int) ([]DeployHistoryEntry, error)`
+ — ordered `id DESC`; default/clamped limit (e.g. 50, max 200) via existing `parseLimit`
+ conventions at the API layer.
+- `GetDeployHistory(id int64) (DeployHistoryEntry, error)` — for rollback lookup;
+ `ErrNotFound` on miss.
+- `PruneDeployHistory(workloadID string, keep int) error` — keep newest `keep` per
+ workload (mirror the stats-prune pattern). Called best-effort after insert.
+- Migration: append `CREATE TABLE` + index to `workloadTables`.
+- Table test `deploy_history_test.go` (insert/list/get/prune, cascade-on-workload-delete).
+
+### 2. Deployer — record at the choke point (`internal/deployer/dispatch.go`)
+Wrap the existing `src.Deploy(...)` call:
+```go
+started := store.Now()
+err = src.Deploy(ctx, d.PluginDeps(), w, intent)
+outcome := "success"; if err != nil { outcome = "failure" }
+metrics.DeploysTotal.Inc(w.SourceKind, outcome) // already present; do not add a second call
+d.recordDeployHistory(w, intent, outcome, err, started) // best-effort, never blocks
+return err
+```
+- `recordDeployHistory` resolves the **effective reference** and inserts a row.
+ Best-effort: a store failure is logged, never propagated (same contract as
+ `maybeBackupBeforeDeploy` and `EmitDeployEvent`).
+- **Effective-reference resolver** (`internal/deployer/deploy_ref.go`, unit-tested):
+ 1. start from `intent.Reference`;
+ 2. `image`: read newest `ListContainersByWorkload(w.ID)` row (by `CreatedAt`), prefer
+ its `ImageTag` when non-empty — captures the `DefaultTag`/`latest` resolution;
+ 3. `static`/`dockerfile`: when still empty, read `LastCommitSHA` from
+ `containers.extra_json` on the deterministic-ID row (no dedicated getter exists);
+ 4. `compose`/unknown: leave as-is (may be `""`).
+- **Error sanitization:** on failure, `error` holds only the fixed generic marker
+ `"deploy failed (see server logs)"`; a `capDeployStatus`-style cap on the raw text is
+ not sufficient. The raw error goes to `slog` only: the wrapped detail must not be
+ persisted, even truncated, because it can echo registry-auth / compose-stdout bytes
+ (same caller contract documented on `EmitDeployEvent`).
+- Recording does **not** run for `DispatchReconcile` (periodic, not a deploy) or
+ `DispatchTeardown`.
+
+### 3. API — `internal/api/deploy_history.go` (new) + `router.go`
+- `GET /api/workloads/{id}/deploys?limit=&offset=` → `listWorkloadDeploys` (read; any
+ authenticated user — mirrors `listWorkloadEvents`). Uses `parseLimit`.
+- `POST /api/workloads/{id}/rollback` → `rollbackWorkload` (`auth.AdminOnly`), body
+ `{deploy_id}`:
+ 1. load workload (404 if missing; 400 if `source_kind == ""`);
+ 2. `GetDeployHistory(deploy_id)`; 404 if missing, 400 if its `workload_id` ≠ path id
+ (no cross-workload replay);
+ 3. guard: `outcome == "success"`, `reference != ""`, and `source_kind` is
+ rollback-capable (`image` in Phase 1) → else 400 with a clear message;
+ 4. build `manual`-shaped intent `{Reason: "rollback", Reference: row.reference,
+ Metadata: {"note": "rollback to " + row.reference, "rollback_of": row.id},
+ TriggeredBy: actor}`;
+ 5. `deployer.DispatchPlugin(...)`; 202 on accept (same shape as deploy).
+- Register both routes inside the existing `r.Route("/workloads/{id}", …)` block in
+ [router.go](../../internal/api/router.go), next to `/deploy` and `/events`.
+- A `RollbackCapable(sourceKind) bool` helper (single source of truth, shared with the
+ list response so the frontend can render the button state without hardcoding kinds).
+- The list response includes a per-entry `rollbackable bool` computed server-side.
+
+## Frontend changes (`web/`)
+
+- **`DeployHistoryPanel.svelte`** (new, in `lib/components/`): table of entries —
+ short reference, reason badge, `outcome` `StatusBadge` (ok/bad), `triggered_by`,
+ relative time. For `rollbackable` rows a **Roll back** button → `ConfirmDialog`
+ ("Roll back to <reference>?") → `POST …/rollback {deploy_id}` → `Toast` +
+ refresh history and container state. Loading via `Skeleton`; `EmptyState` when no
+ rows. Reuses existing components only.
+- Mount the panel on **`/apps/[id]`** alongside the activity timeline (it is the
+ *structured, actionable* sibling of the free-text timeline).
+- **i18n:** add keys under a `deployHistory.*` namespace to **both**
+ `web/src/lib/i18n/en.json` and `ru.json` (parity is mandatory, but a missing key is not
+ a build error, so verify manually per CLAUDE.md).
+- API client: add `listDeploys(id, params)` and `rollback(id, deployId)` to the existing
+ workload API module.
+
+## Testing
+
+- **Store:** `deploy_history_test.go` — insert/list ordering, get, prune-keeps-newest,
+ cascade delete with workload.
+- **Deployer:** extend `deployer` tests — `DispatchPlugin` writes one `success` row and
+ one `failure` row (with sanitized error); reconcile/teardown write none. Resolver unit
+ test (`deploy_ref_test.go`) for the image read-back + empty fallbacks.
+- **API:** rollback guards — cross-workload id → 400; non-success/empty-ref/
+ non-image → 400; happy path → 202 and a `rollback`-reason history row appears.
+- **Web:** keep it light (the panel is mostly presentational); a `sourceForms`-style
+ pure-logic unit only if a non-trivial helper emerges.
+- Gates: `go build ./...`, `go vet ./internal/...`, `go test ./internal/...`,
+ `cd web && npm run check && npm run test`, then `./scripts/dev-server.sh`.
+
+## Risks / mitigations
+
+- **Recording must never break a deploy** → best-effort insert, errors only logged
+ (matches existing `EmitDeployEvent` / pre-deploy-backup contracts).
+- **Secret leakage via `error`** → store only the fixed generic marker; raw error to
+ `slog` only.
+- **Unbounded growth** → `PruneDeployHistory` keeps newest N per workload.
+- **Rollback to a vanished image tag** → the image source's `PullImage` fails and its
+ own failure-rollback leaves the live container untouched; the rollback attempt is
+ recorded as `failure`. No special handling needed.
+- **No-op rollback (target already running, `MaxInstances>1`)** → image short-circuit
+ returns `nil`; recorded as `success`. Acceptable.
+
+## Rollout
+
+Single PR. Additive migration (no destructive DDL). No settings changes. Backward
+compatible: existing workloads simply start accumulating history on their next deploy.
diff --git a/docs/plans/WORKLOAD_METRICS_GRAPH_PLAN.md b/docs/plans/WORKLOAD_METRICS_GRAPH_PLAN.md
new file mode 100644
index 0000000..747db25
--- /dev/null
+++ b/docs/plans/WORKLOAD_METRICS_GRAPH_PLAN.md
@@ -0,0 +1,84 @@
+# Per-Workload Metrics Graph — Implementation Plan
+
+**Status:** planned · **Feature rank:** #2 · **Date:** 2026-06-19
+
+## Problem
+
+Stats are collected per container (`container_stats_samples`, CPU/mem/net/disk) and
+charted **globally** on the dashboard (`SystemResourcesCard` + `ResourceChart`), but
+`/apps/[id]` shows only live snapshots — there's no per-workload "is my app leaking
+memory / pegging CPU over the last few hours" view. This is a daily question and the
+data already exists; we just need a per-workload query + a panel that reuses the chart.
+
+## Verified facts
+
+- `ContainerStatsSample.OwnerID` == the **container row id** (`containers.id`), confirmed
+ by `lookupInstanceName` → `GetContainerByID(sm.OwnerID)` in
+ [stats_history.go](../../internal/api/stats_history.go). `OwnerType` ∈ {instance, site}.
+- Each sample's `ts` is that container's own Docker-stats `Timestamp.Unix()`
+ ([collector.go](../../internal/stats/collector.go)) — NOT one shared tick stamp. In a
+ multi-container tick the per-second truncation usually collapses them to the same
+ integer `ts`, so per-`ts` aggregation works; a ±1s split at a second boundary is
+ cosmetic for a trend line. (Reviewer-corrected.) The handler 404s on an unknown
+ workload id but returns `[]` for a known workload with no samples yet.
+- `ResourceChart.svelte` takes a fully-built `EChartsOption` from the parent; the parent
+ owns series/axes (see `SystemResourcesCard`). Reads stay available when Docker is down
+ (samples come from SQLite, not the daemon).
+- Per-workload reads (`/events`, `/runtime-state`) are open to any authenticated user;
+ this endpoint follows suit (no `AdminOnly`).
+
+## Backend
+
+1. **Store** — `ListContainerStatsSamplesByWorkload(workloadID string, sinceTS int64)`:
+ ```sql
+ SELECT cs.container_id, cs.owner_type, cs.owner_id, cs.ts,
+ cs.cpu_percent, cs.memory_usage, cs.memory_limit,
+ cs.network_rx, cs.network_tx, cs.block_read, cs.block_write
+ FROM container_stats_samples cs
+ JOIN containers c ON c.id = cs.owner_id
+ WHERE c.workload_id = ? AND cs.ts >= ?
+ ORDER BY cs.ts ASC
+ ```
+ Returns `[]ContainerStatsSample`.
+
+2. **API** — `getWorkloadStatsHistory` (GET `/api/workloads/{id}/stats/history?window=`):
+ reuse `parseWindow`/`sinceTimestamp`; aggregate samples **per ts** into a compact
+ series so multi-container workloads (compose) sum correctly:
+ ```go
+ type workloadStatsPoint struct {
+ TS int64 `json:"ts"`
+ CPUPercent float64 `json:"cpu_percent"` // sum across the workload's containers
+ MemoryUsage int64 `json:"memory_usage"` // sum bytes
+ MemoryLimit int64 `json:"memory_limit"` // max (effective ceiling)
+ }
+ ```
+ For a known workload it always returns an array (never 503), empty when stats are
+ disabled, Docker was down, or the workload is new. Register in the `/workloads/{id}` route block.
+
+3. **Tests** — store: join scopes to the right workload (A's samples ≠ B's); API:
+ per-ts aggregation sums two containers at the same tick.
+
+## Frontend
+
+4. **api.ts** — `WorkloadStatsPoint` type + `fetchWorkloadStatsHistory(id, window, signal)`.
+5. **`WorkloadMetricsPanel.svelte`** — window selector (30m / 2h / 6h), fetch + 15s poll
+ (mirror `SystemResourcesCard`), build an `EChartsOption` with **two series**: CPU %
+ on the left axis, Memory (MiB) on the right axis (absolute bytes, because
+ `memory_limit` is often 0/unlimited so a % would divide by zero). `EmptyState`/hint
+ when there are no samples. Render via `ResourceChart`. Mount on `/apps/[id]` near the
+ deploy-history panel.
+6. **i18n** — `apps.detail.metrics.*` in both en.json and ru.json (parity mandatory).
+
+## Risks / mitigations
+
+- **Docker down / stats disabled** → empty series, friendly hint (no error). SQLite read
+ path is independent of the daemon.
+- **memory_limit = 0 (unlimited)** → plot absolute MiB, not %, to avoid div-by-zero.
+- **Sparse sampling** → chart shows whatever ticks exist; window selector lets the user
+ widen. No interpolation.
+- **Auth** → read-only, any authenticated user (consistent with other per-workload reads).
+
+## Rollout
+
+Single change set, additive, no migration. Reuses the existing `echarts` dependency and
+`ResourceChart` component.
diff --git a/internal/api/deploy_history.go b/internal/api/deploy_history.go
new file mode 100644
index 0000000..d5f164c
--- /dev/null
+++ b/internal/api/deploy_history.go
@@ -0,0 +1,151 @@
+package api
+
+import (
+ "errors"
+ "log/slog"
+ "net/http"
+ "strconv"
+ "time"
+
+ "github.com/go-chi/chi/v5"
+
+ "github.com/alexei/tinyforge/internal/auth"
+ "github.com/alexei/tinyforge/internal/store"
+ "github.com/alexei/tinyforge/internal/workload/plugin"
+)
+
+// parseOffset parses a pagination offset, clamping anything invalid or
+// negative to 0. parseLimit (secrets.go) handles the limit half.
+func parseOffset(raw string) int {
+ n, err := strconv.Atoi(raw)
+ if err != nil || n < 0 {
+ return 0
+ }
+ return n
+}
+
+// rollbackCapableKinds is the single source of truth for which source kinds
+// support reference-pinned redeploy. The image source resolves
+// intent.Reference as the tag, so replaying a prior tag is a real rollback.
+// static/dockerfile clone branch HEAD and cannot yet check out an arbitrary
+// commit (a later phase); compose has no single artifact handle.
+var rollbackCapableKinds = map[string]bool{"image": true}
+
+// RollbackCapable reports whether a source kind supports one-click rollback.
+// Used by both the list response (per-row `rollbackable` flag) and the
+// rollback guard so the UI and the server never disagree.
+func RollbackCapable(sourceKind string) bool { return rollbackCapableKinds[sourceKind] }
+
+// listWorkloadDeploys handles GET /api/workloads/{id}/deploys. Read-only,
+// open to any authenticated user (mirrors the per-workload events feed).
+// Returns the structured deploy ledger newest-first with a server-computed
+// `rollbackable` flag per row.
+func (s *Server) listWorkloadDeploys(w http.ResponseWriter, r *http.Request) {
+ id := chi.URLParam(r, "id")
+ if id == "" {
+ respondError(w, http.StatusBadRequest, "workload id is required")
+ return
+ }
+
+ q := r.URL.Query()
+ limit := parseLimit(q.Get("limit"), 50, 200)
+ offset := parseOffset(q.Get("offset"))
+
+ rows, err := s.store.ListDeployHistory(id, limit, offset)
+ if err != nil {
+ slog.Error("failed to list deploy history", "workload", id, "error", err)
+ respondError(w, http.StatusInternalServerError, "failed to list deploy history")
+ return
+ }
+ for i := range rows {
+ rows[i].Rollbackable = rows[i].Outcome == "success" &&
+ rows[i].Reference != "" &&
+ RollbackCapable(rows[i].SourceKind)
+ }
+ respondJSON(w, http.StatusOK, rows)
+}
+
+// rollbackWorkload handles POST /api/workloads/{id}/rollback. Admin-only
+// (same gate as /deploy). Body: {"deploy_id": <id>}. It resolves the pinned
+// reference from a prior successful, rollback-capable ledger row belonging
+// to this workload and replays it as a `rollback`-reason deploy.
+func (s *Server) rollbackWorkload(w http.ResponseWriter, r *http.Request) {
+ id := chi.URLParam(r, "id")
+
+ row, err := s.store.GetWorkloadByID(id)
+ if err != nil {
+ if errors.Is(err, store.ErrNotFound) {
+ respondNotFound(w, "workload")
+ return
+ }
+ respondError(w, http.StatusInternalServerError, "get workload")
+ return
+ }
+ if row.SourceKind == "" {
+ respondError(w, http.StatusBadRequest, "workload has no source_kind; cannot roll back")
+ return
+ }
+
+ var body struct {
+ DeployID int64 `json:"deploy_id"`
+ }
+ if !decodeJSONStrict(w, r, &body) {
+ return
+ }
+ if body.DeployID <= 0 {
+ respondError(w, http.StatusBadRequest, "deploy_id is required")
+ return
+ }
+
+ entry, err := s.store.GetDeployHistory(body.DeployID)
+ if err != nil {
+ if errors.Is(err, store.ErrNotFound) {
+ respondNotFound(w, "deploy history entry")
+ return
+ }
+ respondError(w, http.StatusInternalServerError, "get deploy history")
+ return
+ }
+ // No cross-workload replay: the entry must belong to the path workload.
+ if entry.WorkloadID != id {
+ respondError(w, http.StatusBadRequest, "deploy entry does not belong to this workload")
+ return
+ }
+ if entry.Outcome != "success" {
+ respondError(w, http.StatusBadRequest, "cannot roll back to a failed deploy")
+ return
+ }
+ if entry.Reference == "" || !RollbackCapable(entry.SourceKind) || !RollbackCapable(row.SourceKind) {
+ respondError(w, http.StatusBadRequest, "this deploy is not rollback-capable")
+ return
+ }
+
+ actor := "manual"
+ if claims, ok := auth.ClaimsFromContext(r.Context()); ok && claims.Username != "" {
+ actor = claims.Username
+ }
+ intent := plugin.DeploymentIntent{
+ Reason: "rollback",
+ Reference: entry.Reference,
+ Metadata: map[string]string{
+ "note": "rollback to " + entry.Reference,
+ "rollback_of": strconv.FormatInt(entry.ID, 10),
+ },
+ TriggeredAt: time.Now().UTC(),
+ TriggeredBy: actor,
+ }
+ if err := s.deployer.DispatchPlugin(r.Context(), toPluginWorkload(row), intent); err != nil {
+ // Raw error stays in the server log; client gets a generic message
+ // (the wrapped error can carry registry-auth bytes).
+ slog.Warn("rollback dispatch failed", "workload", id, "actor", actor,
+ "reference", entry.Reference, "error", err)
+ respondError(w, http.StatusInternalServerError, "rollback failed; see server logs")
+ return
+ }
+ respondJSON(w, http.StatusAccepted, map[string]any{
+ "workload_id": id,
+ "reference": entry.Reference,
+ "rollback_of": entry.ID,
+ "triggered_by": actor,
+ })
+}
diff --git a/internal/api/deploy_history_test.go b/internal/api/deploy_history_test.go
new file mode 100644
index 0000000..14ebb94
--- /dev/null
+++ b/internal/api/deploy_history_test.go
@@ -0,0 +1,126 @@
+package api
+
+import (
+ "net/http"
+ "testing"
+
+ "github.com/alexei/tinyforge/internal/store"
+ "github.com/alexei/tinyforge/internal/workload/plugin"
+)
+
+// createImageWorkload creates an image-source workload through the API so
+// source_kind is persisted exactly as production does, returning its id.
+func createImageWorkload(t *testing.T, e *apiTestEnv, name string) string {
+ t.Helper()
+ resp := e.do(t, http.MethodPost, "/api/workloads", pluginWorkloadRequest{
+ Name: name, SourceKind: "image", SourceConfig: validImageSourceConfig(),
+ })
+ if resp.StatusCode != http.StatusCreated {
+ _ = decodeEnvelope(t, resp, nil)
+ t.Fatalf("create workload: status %d", resp.StatusCode)
+ }
+ var got plugin.Workload
+ if errMsg := decodeEnvelope(t, resp, &got); errMsg != "" {
+ t.Fatalf("create workload envelope error: %q", errMsg)
+ }
+ return got.ID
+}
+
+func TestListWorkloadDeploys_ComputesRollbackable(t *testing.T) {
+ e := newAPITestEnv(t)
+ id := createImageWorkload(t, e, "app")
+
+ // success + reference + image => rollbackable
+ e.store.InsertDeployHistory(store.DeployHistoryEntry{
+ WorkloadID: id, SourceKind: "image", Reference: "v1", Outcome: "success",
+ })
+ // failure => not rollbackable
+ e.store.InsertDeployHistory(store.DeployHistoryEntry{
+ WorkloadID: id, SourceKind: "image", Reference: "v2", Outcome: "failure",
+ })
+ // success but empty reference => not rollbackable
+ e.store.InsertDeployHistory(store.DeployHistoryEntry{
+ WorkloadID: id, SourceKind: "image", Reference: "", Outcome: "success",
+ })
+
+ resp := e.do(t, http.MethodGet, "/api/workloads/"+id+"/deploys", nil)
+ if resp.StatusCode != http.StatusOK {
+ t.Fatalf("status = %d, want 200", resp.StatusCode)
+ }
+ var rows []store.DeployHistoryEntry
+ if errMsg := decodeEnvelope(t, resp, &rows); errMsg != "" {
+ t.Fatalf("envelope error: %q", errMsg)
+ }
+ if len(rows) != 3 {
+ t.Fatalf("expected 3 rows, got %d", len(rows))
+ }
+ // Newest-first: empty-ref success, failure, then v1 success.
+ if !rows[2].Rollbackable {
+ t.Fatalf("v1 success row should be rollbackable: %+v", rows[2])
+ }
+ if rows[1].Rollbackable || rows[0].Rollbackable {
+ t.Fatalf("failure / empty-ref rows must not be rollbackable")
+ }
+}
+
+func TestRollback_HappyPath_DispatchesRollbackIntent(t *testing.T) {
+ e := newAPITestEnv(t)
+ id := createImageWorkload(t, e, "app")
+ entry, _ := e.store.InsertDeployHistory(store.DeployHistoryEntry{
+ WorkloadID: id, SourceKind: "image", Reference: "v1", Outcome: "success",
+ })
+
+ before := e.dispatcher.deployCount.Load()
+ resp := e.do(t, http.MethodPost, "/api/workloads/"+id+"/rollback",
+ map[string]any{"deploy_id": entry.ID})
+ if resp.StatusCode != http.StatusAccepted {
+ errMsg := decodeEnvelope(t, resp, nil)
+ t.Fatalf("status = %d, want 202 (err=%q)", resp.StatusCode, errMsg)
+ }
+ if got := e.dispatcher.deployCount.Load(); got != before+1 {
+ t.Fatalf("expected one dispatch, got delta %d", got-before)
+ }
+ intent := e.dispatcher.lastIntent.Load()
+ if intent == nil || intent.Reason != "rollback" || intent.Reference != "v1" {
+ t.Fatalf("expected rollback intent for v1, got %+v", intent)
+ }
+}
+
+func TestRollback_Guards(t *testing.T) {
+ e := newAPITestEnv(t)
+ imageID := createImageWorkload(t, e, "img")
+ otherID := createImageWorkload(t, e, "other")
+
+ success, _ := e.store.InsertDeployHistory(store.DeployHistoryEntry{
+ WorkloadID: imageID, SourceKind: "image", Reference: "v1", Outcome: "success",
+ })
+ failed, _ := e.store.InsertDeployHistory(store.DeployHistoryEntry{
+ WorkloadID: imageID, SourceKind: "image", Reference: "v2", Outcome: "failure",
+ })
+ otherWL, _ := e.store.InsertDeployHistory(store.DeployHistoryEntry{
+ WorkloadID: otherID, SourceKind: "image", Reference: "v1", Outcome: "success",
+ })
+
+ cases := []struct {
+ name string
+ workload string
+ body any
+ wantCode int
+ }{
+ {"missing deploy_id", imageID, map[string]any{}, http.StatusBadRequest},
+ {"zero deploy_id", imageID, map[string]any{"deploy_id": 0}, http.StatusBadRequest},
+ {"unknown deploy_id", imageID, map[string]any{"deploy_id": 999999}, http.StatusNotFound},
+ {"unknown workload", "nope", map[string]any{"deploy_id": success.ID}, http.StatusNotFound},
+ {"failed deploy", imageID, map[string]any{"deploy_id": failed.ID}, http.StatusBadRequest},
+ {"cross-workload entry", imageID, map[string]any{"deploy_id": otherWL.ID}, http.StatusBadRequest},
+ }
+ for _, c := range cases {
+ t.Run(c.name, func(t *testing.T) {
+ resp := e.do(t, http.MethodPost, "/api/workloads/"+c.workload+"/rollback", c.body)
+ if resp.StatusCode != c.wantCode {
+ errMsg := decodeEnvelope(t, resp, nil)
+ t.Fatalf("status = %d, want %d (err=%q)", resp.StatusCode, c.wantCode, errMsg)
+ }
+ })
+ }
+}
diff --git a/internal/api/router.go b/internal/api/router.go
index d169dc8..ff6b686 100644
--- a/internal/api/router.go
+++ b/internal/api/router.go
@@ -336,6 +336,12 @@ func (s *Server) Router() chi.Router {
r.With(auth.AdminOnly).Post("/start", s.startPluginWorkload)
r.With(auth.AdminOnly).Delete("/", s.deletePluginWorkload)
+ // Deploy ledger + rollback. The history feed is read-only
+ // (any authenticated user); rollback is a redeploy, so it is
+ // admin-gated like /deploy.
+ r.Get("/deploys", s.listWorkloadDeploys)
+ r.With(auth.AdminOnly).Post("/rollback", s.rollbackWorkload)
+
// Volume snapshots (admin-only). Capture/list a workload's
// host-bind data volumes; {sid}-scoped download/delete live
// in the global admin group alongside backups.
@@ -348,6 +354,10 @@ func (s *Server) Router() chi.Router {
r.Get("/runtime-state", s.getWorkloadRuntimeState)
r.Get("/storage", s.getWorkloadStorage)
+ // Per-workload metrics history (CPU/memory time-series),
+ // aggregated across the workload's containers. Read-only.
+ r.Get("/stats/history", s.getWorkloadStatsHistory)
+
// Per-workload activity / deploy timeline (read-only). Scoped
// to this workload's event-log rows; the global feed lives at
// /events/log.
diff --git a/internal/api/stats_history.go b/internal/api/stats_history.go
index 14f68ee..5c931d5 100644
--- a/internal/api/stats_history.go
+++ b/internal/api/stats_history.go
@@ -1,12 +1,15 @@
package api
import (
+ "errors"
"log/slog"
"net/http"
"sort"
"strconv"
"time"
+ "github.com/go-chi/chi/v5"
+
"github.com/alexei/tinyforge/internal/auth"
"github.com/alexei/tinyforge/internal/store"
)
@@ -85,6 +88,76 @@ func (s *Server) getSystemStatsHistory(w http.ResponseWriter, r *http.Request) {
respondJSON(w, http.StatusOK, samples)
}
+// workloadStatsPoint is one aggregated time bucket for a workload's metrics
+// graph: every container the workload owns is summed at each timestamp so a
+// multi-container (compose) workload reads as a single line. MemoryLimit is
+// the max across containers — the effective ceiling — though the UI plots
+// absolute MiB because the limit is often 0 (unlimited).
+type workloadStatsPoint struct {
+ TS int64 `json:"ts"`
+ CPUPercent float64 `json:"cpu_percent"`
+ MemoryUsage int64 `json:"memory_usage"`
+ MemoryLimit int64 `json:"memory_limit"`
+}
+
+// getWorkloadStatsHistory handles GET /api/workloads/{id}/stats/history?window=1h.
+// Read-only and open to any authenticated user (mirrors the per-workload
+// events/runtime-state feeds). Always returns a (possibly empty) array — never
+// 503 — because samples come from SQLite, which is available even when the
+// Docker daemon is down or stats collection is disabled. Unknown workload id
+// 404s; a known workload with no samples yet returns [].
+func (s *Server) getWorkloadStatsHistory(w http.ResponseWriter, r *http.Request) {
+ id := chi.URLParam(r, "id")
+ if id == "" {
+ respondError(w, http.StatusBadRequest, "workload id is required")
+ return
+ }
+ if _, err := s.store.GetWorkloadByID(id); err != nil {
+ if errors.Is(err, store.ErrNotFound) {
+ respondNotFound(w, "workload")
+ return
+ }
+ respondError(w, http.StatusInternalServerError, "get workload")
+ return
+ }
+
+ samples, err := s.store.ListContainerStatsSamplesByWorkload(id, sinceTimestamp(parseWindow(r)))
+ if err != nil {
+ slog.Error("failed to list workload stats samples", "workload", id, "error", err)
+ respondError(w, http.StatusInternalServerError, "failed to list samples")
+ return
+ }
+
+ respondJSON(w, http.StatusOK, aggregateWorkloadStats(samples))
+}
+
+// aggregateWorkloadStats folds per-container samples into one series keyed by
+// timestamp: CPU% and memory usage are summed across the workload's containers,
+// memory limit takes the max. Samples arrive ts-ascending, so the output keeps
+// that order without an extra sort.
+func aggregateWorkloadStats(samples []store.ContainerStatsSample) []workloadStatsPoint {
+ points := make([]workloadStatsPoint, 0)
+ idx := make(map[int64]int) // ts → index in points
+ for _, sm := range samples {
+ if i, ok := idx[sm.TS]; ok {
+ points[i].CPUPercent += sm.CPUPercent
+ points[i].MemoryUsage += sm.MemoryUsage
+ if sm.MemoryLimit > points[i].MemoryLimit {
+ points[i].MemoryLimit = sm.MemoryLimit
+ }
+ continue
+ }
+ idx[sm.TS] = len(points)
+ points = append(points, workloadStatsPoint{
+ TS: sm.TS,
+ CPUPercent: sm.CPUPercent,
+ MemoryUsage: sm.MemoryUsage,
+ MemoryLimit: sm.MemoryLimit,
+ })
+ }
+ return points
+}
+
// listTopContainers handles GET /api/system/stats/top?limit=5&by=cpu.
// Returns the top-N most recent samples across containers, sorted by CPU or
// memory. Container IDs are stripped for non-admins so a low-privilege viewer
diff --git a/internal/api/stats_history_test.go b/internal/api/stats_history_test.go
new file mode 100644
index 0000000..1a6ea2e
--- /dev/null
+++ b/internal/api/stats_history_test.go
@@ -0,0 +1,64 @@
+package api
+
+import (
+ "testing"
+
+ "github.com/alexei/tinyforge/internal/store"
+)
+
+func TestAggregateWorkloadStats_SumsPerTimestamp(t *testing.T) {
+ // Two containers reporting at the same two ticks → summed per ts.
+ samples := []store.ContainerStatsSample{
+ {TS: 100, CPUPercent: 10, MemoryUsage: 1000, MemoryLimit: 4000},
+ {TS: 100, CPUPercent: 5, MemoryUsage: 500, MemoryLimit: 8000},
+ {TS: 200, CPUPercent: 20, MemoryUsage: 2000, MemoryLimit: 4000},
+ }
+ pts := aggregateWorkloadStats(samples)
+ if len(pts) != 2 {
+ t.Fatalf("expected 2 buckets, got %d", len(pts))
+ }
+ if pts[0].TS != 100 || pts[0].CPUPercent != 15 || pts[0].MemoryUsage != 1500 {
+ t.Fatalf("ts=100 bucket wrong: %+v", pts[0])
+ }
+ // Memory limit takes the max across containers.
+ if pts[0].MemoryLimit != 8000 {
+ t.Fatalf("expected max memory limit 8000, got %d", pts[0].MemoryLimit)
+ }
+ if pts[1].TS != 200 || pts[1].CPUPercent != 20 {
+ t.Fatalf("ts=200 bucket wrong: %+v", pts[1])
+ }
+}
+
+func TestAggregateWorkloadStats_Empty(t *testing.T) {
+ pts := aggregateWorkloadStats(nil)
+ if pts == nil {
+ t.Fatal("expected non-nil empty slice for clean JSON []")
+ }
+ if len(pts) != 0 {
+ t.Fatalf("expected 0 points, got %d", len(pts))
+ }
+}
+
+func TestWorkloadStatsHistory_UnknownWorkload404(t *testing.T) {
+ e := newAPITestEnv(t)
+ resp := e.do(t, "GET", "/api/workloads/nope/stats/history", nil)
+ if resp.StatusCode != 404 {
+ t.Fatalf("expected 404 for unknown workload, got %d", resp.StatusCode)
+ }
+}
+
+func TestWorkloadStatsHistory_KnownWorkloadEmpty(t *testing.T) {
+ e := newAPITestEnv(t)
+ id := createImageWorkload(t, e, "metrics-app")
+ resp := e.do(t, "GET", "/api/workloads/"+id+"/stats/history", nil)
+ if resp.StatusCode != 200 {
+ t.Fatalf("expected 200, got %d", resp.StatusCode)
+ }
+ var pts []workloadStatsPoint
+ if errMsg := decodeEnvelope(t, resp, &pts); errMsg != "" {
+ t.Fatalf("envelope error: %q", errMsg)
+ }
+ if len(pts) != 0 {
+ t.Fatalf("expected empty series for app with no samples, got %d", len(pts))
+ }
+}
diff --git a/internal/deployer/deploy_history.go b/internal/deployer/deploy_history.go
new file mode 100644
index 0000000..4bf6c2b
--- /dev/null
+++ b/internal/deployer/deploy_history.go
@@ -0,0 +1,76 @@
+package deployer
+
+import (
+ "log/slog"
+
+ "github.com/alexei/tinyforge/internal/store"
+ "github.com/alexei/tinyforge/internal/workload/plugin"
+)
+
+// deployHistoryKeepPerWorkload bounds the ledger per workload. Newer rows
+// always have larger ids, so pruning keeps the most recent N — enough for a
+// useful rollback menu without unbounded growth on hot workloads.
+const deployHistoryKeepPerWorkload = 50
+
+// recordDeployHistory appends one ledger row for a completed dispatch.
+//
+// Best-effort: a store failure is logged and swallowed — recording must
+// never turn a successful deploy into a failed request (same contract as
+// EmitDeployEvent and the pre-deploy backup). The raw deploy error is NEVER
+// persisted: it can carry registry-auth bytes or compose stdout, so only a
+// fixed, secret-free marker lands in the row (raw detail goes to slog at the
+// call site). Called only from DispatchPlugin — reconcile/teardown ticks are
+// not deploys and must not appear in the ledger.
+func (d *Deployer) recordDeployHistory(w plugin.Workload, intent plugin.DeploymentIntent, outcome string, deployErr error, startedAt string) {
+ if d.store == nil {
+ return
+ }
+ entry := store.DeployHistoryEntry{
+ WorkloadID: w.ID,
+ SourceKind: w.SourceKind,
+ Reference: d.effectiveReference(w, intent),
+ Reason: intent.Reason,
+ TriggeredBy: intent.TriggeredBy,
+ Note: intent.Metadata["note"], // nil map read is safe
+ Outcome: outcome,
+ StartedAt: startedAt,
+ FinishedAt: store.Now(),
+ }
+ if deployErr != nil {
+ entry.Error = "deploy failed (see server logs)"
+ }
+ if _, err := d.store.InsertDeployHistory(entry); err != nil {
+ slog.Warn("deploy history: insert failed", "workload", w.ID, "error", err)
+ return
+ }
+ // Cheap indexed DELETE — negligible next to a multi-second deploy, so it
+ // stays inline rather than on an untracked goroutine that could outrace
+ // graceful shutdown's db.Close().
+ if err := d.store.PruneDeployHistory(w.ID, deployHistoryKeepPerWorkload); err != nil {
+ slog.Warn("deploy history: prune failed", "workload", w.ID, "error", err)
+ }
+}
+
+// effectiveReference resolves the artifact handle to record (and, for
+// rollback-capable sources, to replay). It starts from the trigger-supplied
+// intent.Reference and, for the image source, prefers the tag actually
+// written onto the freshest container row — capturing the DefaultTag /
+// "latest" resolution the source performs when intent.Reference is empty
+// (e.g. a manual deploy with no override). ListContainersByWorkload returns
+// newest-first, so rows[0] is the just-deployed container on success.
+//
+// For static/dockerfile the git trigger already supplies the commit SHA as
+// intent.Reference; a manual deploy of those may record an empty reference
+// (acceptable — they are not rollback-capable in this phase). compose has no
+// single artifact handle.
+func (d *Deployer) effectiveReference(w plugin.Workload, intent plugin.DeploymentIntent) string {
+ ref := intent.Reference
+ if w.SourceKind == "image" && d.store != nil {
+ if rows, err := d.store.ListContainersByWorkload(w.ID); err == nil && len(rows) > 0 {
+ if tag := rows[0].ImageTag; tag != "" {
+ ref = tag
+ }
+ }
+ }
+ return ref
+}
diff --git a/internal/deployer/dispatch.go b/internal/deployer/dispatch.go
index 78689fc..a4ffd46 100644
--- a/internal/deployer/dispatch.go
+++ b/internal/deployer/dispatch.go
@@ -5,6 +5,7 @@ import (
"fmt"
"github.com/alexei/tinyforge/internal/metrics"
+ "github.com/alexei/tinyforge/internal/store"
"github.com/alexei/tinyforge/internal/workload/plugin"
)
@@ -33,12 +34,17 @@ func (d *Deployer) DispatchPlugin(ctx context.Context, w plugin.Workload, intent
// check (e.g. the image source's same-tag short-circuit), so a same-tag
// redeploy still snapshots — "backup before every deploy attempt".
d.maybeBackupBeforeDeploy(w.ID)
+ startedAt := store.Now()
err = src.Deploy(ctx, d.PluginDeps(), w, intent)
outcome := "success"
if err != nil {
outcome = "failure"
}
metrics.DeploysTotal.Inc(w.SourceKind, outcome)
+ // Append to the structured deploy ledger (powers the per-app history
+ // panel + rollback). Best-effort and secret-free; see recordDeployHistory.
+ // Only DispatchPlugin records — reconcile/teardown are not deploys.
+ d.recordDeployHistory(w, intent, outcome, err, startedAt)
return err
}
diff --git a/internal/deployer/dispatch_test.go b/internal/deployer/dispatch_test.go
index 4b4a7f1..10d9107 100644
--- a/internal/deployer/dispatch_test.go
+++ b/internal/deployer/dispatch_test.go
@@ -250,6 +250,84 @@ func TestDispatchReconcile_PropagatesSourceError(t *testing.T) {
}
}
+// ---- Deploy history recording ----------------------------------------------
+
+// seedDispatchWorkload inserts a real workloads row so deploy_history's FK
+// (workload_id REFERENCES workloads) is satisfied, then returns a plugin
+// workload pointing at the fake source.
+func seedDispatchWorkload(t *testing.T, d *Deployer) plugin.Workload {
+ t.Helper()
+ row, err := d.store.CreateWorkload(store.Workload{Kind: "project", RefID: "dh", Name: "dh"})
+ if err != nil {
+ t.Fatalf("CreateWorkload: %v", err)
+ }
+ return plugin.Workload{ID: row.ID, Name: "dh", SourceKind: "dispatchertest"}
+}
+
+func TestDispatchPlugin_RecordsSuccessHistory(t *testing.T) {
+ resetFake(t)
+ d := newTestDeployer(t)
+ w := seedDispatchWorkload(t, d)
+
+ intent := plugin.DeploymentIntent{Reason: "manual", Reference: "v9", TriggeredBy: "alice",
+ Metadata: map[string]string{"note": "ship it"}}
+ if err := d.DispatchPlugin(context.Background(), w, intent); err != nil {
+ t.Fatalf("DispatchPlugin: %v", err)
+ }
+ rows, err := d.store.ListDeployHistory(w.ID, 10, 0)
+ if err != nil {
+ t.Fatalf("ListDeployHistory: %v", err)
+ }
+ if len(rows) != 1 {
+ t.Fatalf("expected 1 history row, got %d", len(rows))
+ }
+ got := rows[0]
+ if got.Outcome != "success" || got.Reason != "manual" || got.Reference != "v9" {
+ t.Fatalf("unexpected row: %+v", got)
+ }
+ if got.TriggeredBy != "alice" || got.Note != "ship it" {
+ t.Fatalf("intent fields not recorded: %+v", got)
+ }
+ if got.Error != "" {
+ t.Fatalf("success row must have empty error, got %q", got.Error)
+ }
+}
+
+func TestDispatchPlugin_RecordsFailureWithoutLeakingError(t *testing.T) {
+ resetFake(t)
+ d := newTestDeployer(t)
+ w := seedDispatchWorkload(t, d)
+
+ // A deploy error carrying a "secret" must never reach the persisted row.
+ dispatchTestSource.setDeployErr(errors.New("compose up failed (output: SUPER_SECRET=hunter2)"))
+ _ = d.DispatchPlugin(context.Background(), w, plugin.DeploymentIntent{Reason: "manual"})
+
+ rows, _ := d.store.ListDeployHistory(w.ID, 10, 0)
+ if len(rows) != 1 {
+ t.Fatalf("expected 1 history row, got %d", len(rows))
+ }
+ if rows[0].Outcome != "failure" {
+ t.Fatalf("expected failure outcome, got %q", rows[0].Outcome)
+ }
+ if strings.Contains(rows[0].Error, "hunter2") || strings.Contains(rows[0].Error, "SECRET") {
+ t.Fatalf("raw error leaked into history: %q", rows[0].Error)
+ }
+}
+
+func TestDispatchReconcile_RecordsNoHistory(t *testing.T) {
+ resetFake(t)
+ d := newTestDeployer(t)
+ w := seedDispatchWorkload(t, d)
+
+ if err := d.DispatchReconcile(context.Background(), w); err != nil {
+ t.Fatalf("DispatchReconcile: %v", err)
+ }
+ rows, _ := d.store.ListDeployHistory(w.ID, 10, 0)
+ if len(rows) != 0 {
+ t.Fatalf("reconcile must not write history, got %d rows", len(rows))
+ }
+}
+
// ---- PluginDeps -------------------------------------------------------------
func TestPluginDeps_PassesStoreAndEncKey(t *testing.T) {
diff --git a/internal/store/deploy_history.go b/internal/store/deploy_history.go
new file mode 100644
index 0000000..de3d8eb
--- /dev/null
+++ b/internal/store/deploy_history.go
@@ -0,0 +1,123 @@
+package store
+
+import (
+ "database/sql"
+ "errors"
+ "fmt"
+)
+
+// InsertDeployHistory appends one row to the per-workload deploy ledger.
+// Callers (the deployer choke point) treat this as best-effort: a failure
+// here must never fail an otherwise-successful deploy. Error is expected to
+// be a fixed, secret-free marker — never the raw source error.
+func (s *Store) InsertDeployHistory(e DeployHistoryEntry) (DeployHistoryEntry, error) {
+ if e.StartedAt == "" {
+ e.StartedAt = Now()
+ }
+ if e.FinishedAt == "" {
+ e.FinishedAt = Now()
+ }
+ res, err := s.db.Exec(
+ `INSERT INTO deploy_history
+ (workload_id, source_kind, reference, reason, triggered_by,
+ note, outcome, error, started_at, finished_at)
+ VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`,
+ e.WorkloadID, e.SourceKind, e.Reference, e.Reason, e.TriggeredBy,
+ e.Note, e.Outcome, e.Error, e.StartedAt, e.FinishedAt,
+ )
+ if err != nil {
+ return DeployHistoryEntry{}, fmt.Errorf("insert deploy history: %w", err)
+ }
+ id, err := res.LastInsertId()
+ if err != nil {
+ return DeployHistoryEntry{}, fmt.Errorf("get deploy history id: %w", err)
+ }
+ e.ID = id
+ return e, nil
+}
+
+// ListDeployHistory returns a workload's ledger newest-first. limit/offset
+// are assumed pre-clamped by the API layer; a non-positive limit falls back
+// to a sane default so a bad query can't return the whole table.
+func (s *Store) ListDeployHistory(workloadID string, limit, offset int) ([]DeployHistoryEntry, error) {
+ if limit <= 0 {
+ limit = 50
+ }
+ if offset < 0 {
+ offset = 0
+ }
+ rows, err := s.db.Query(
+ `SELECT id, workload_id, source_kind, reference, reason, triggered_by,
+ note, outcome, error, started_at, finished_at
+ FROM deploy_history
+ WHERE workload_id = ?
+ ORDER BY id DESC
+ LIMIT ? OFFSET ?`,
+ workloadID, limit, offset,
+ )
+ if err != nil {
+ return nil, fmt.Errorf("query deploy history: %w", err)
+ }
+ defer rows.Close()
+
+ out := make([]DeployHistoryEntry, 0, limit)
+ for rows.Next() {
+ var e DeployHistoryEntry
+ if err := rows.Scan(
+ &e.ID, &e.WorkloadID, &e.SourceKind, &e.Reference, &e.Reason,
+ &e.TriggeredBy, &e.Note, &e.Outcome, &e.Error, &e.StartedAt, &e.FinishedAt,
+ ); err != nil {
+ return nil, fmt.Errorf("scan deploy history: %w", err)
+ }
+ out = append(out, e)
+ }
+ return out, rows.Err()
+}
+
+// GetDeployHistory fetches one ledger row by id, or ErrNotFound. The
+// rollback handler uses this to resolve the pinned reference to replay.
+func (s *Store) GetDeployHistory(id int64) (DeployHistoryEntry, error) {
+ row := s.db.QueryRow(
+ `SELECT id, workload_id, source_kind, reference, reason, triggered_by,
+ note, outcome, error, started_at, finished_at
+ FROM deploy_history WHERE id = ?`, id,
+ )
+ var e DeployHistoryEntry
+ err := row.Scan(
+ &e.ID, &e.WorkloadID, &e.SourceKind, &e.Reference, &e.Reason,
+ &e.TriggeredBy, &e.Note, &e.Outcome, &e.Error, &e.StartedAt, &e.FinishedAt,
+ )
+ if errors.Is(err, sql.ErrNoRows) {
+ return DeployHistoryEntry{}, fmt.Errorf("deploy history %d: %w", id, ErrNotFound)
+ }
+ if err != nil {
+ return DeployHistoryEntry{}, fmt.Errorf("scan deploy history: %w", err)
+ }
+ return e, nil
+}
+
+// PruneDeployHistory keeps only the newest `keep` rows for a workload and
+// deletes the rest, capping ledger growth on hot workloads. Ids are
+// monotonic (newer rows always have larger ids), so "newest" means the
+// `keep` largest ids for that workload. A non-positive keep is treated
+// as "keep a sane default" rather than "delete everything".
+func (s *Store) PruneDeployHistory(workloadID string, keep int) error {
+ if keep <= 0 {
+ keep = 50
+ }
+ _, err := s.db.Exec(
+ `DELETE FROM deploy_history
+ WHERE workload_id = ?
+ AND id NOT IN (
+ SELECT id FROM deploy_history
+ WHERE workload_id = ?
+ ORDER BY id DESC
+ LIMIT ?
+ )`,
+ workloadID, workloadID, keep,
+ )
+ if err != nil {
+ return fmt.Errorf("prune deploy history: %w", err)
+ }
+ return nil
+}
diff --git a/internal/store/deploy_history_test.go b/internal/store/deploy_history_test.go
new file mode 100644
index 0000000..11ddd18
--- /dev/null
+++ b/internal/store/deploy_history_test.go
@@ -0,0 +1,133 @@
+package store
+
+import (
+ "errors"
+ "testing"
+)
+
+func seedWorkload(t *testing.T, s *Store, name string) Workload {
+ t.Helper()
+ w, err := s.CreateWorkload(Workload{Kind: "project", RefID: name, Name: name})
+ if err != nil {
+ t.Fatalf("CreateWorkload(%s): %v", name, err)
+ }
+ return w
+}
+
+func TestDeployHistory_InsertListGet(t *testing.T) {
+ s := newTestStore(t)
+ w := seedWorkload(t, s, "app1")
+
+ first, err := s.InsertDeployHistory(DeployHistoryEntry{
+ WorkloadID: w.ID, SourceKind: "image", Reference: "v1",
+ Reason: "manual", TriggeredBy: "admin", Outcome: "success",
+ })
+ if err != nil {
+ t.Fatalf("InsertDeployHistory: %v", err)
+ }
+ if first.ID == 0 {
+ t.Fatal("expected non-zero id")
+ }
+ if first.StartedAt == "" || first.FinishedAt == "" {
+ t.Fatal("expected timestamps to be defaulted")
+ }
+
+ second, _ := s.InsertDeployHistory(DeployHistoryEntry{
+ WorkloadID: w.ID, SourceKind: "image", Reference: "v2",
+ Reason: "registry-push", Outcome: "success",
+ })
+
+ list, err := s.ListDeployHistory(w.ID, 10, 0)
+ if err != nil {
+ t.Fatalf("ListDeployHistory: %v", err)
+ }
+ if len(list) != 2 {
+ t.Fatalf("expected 2 rows, got %d", len(list))
+ }
+ // Newest-first ordering.
+ if list[0].ID != second.ID || list[1].ID != first.ID {
+ t.Fatalf("expected newest-first ordering, got %d then %d", list[0].ID, list[1].ID)
+ }
+
+ got, err := s.GetDeployHistory(first.ID)
+ if err != nil {
+ t.Fatalf("GetDeployHistory: %v", err)
+ }
+ if got.Reference != "v1" || got.SourceKind != "image" {
+ t.Fatalf("unexpected row: %+v", got)
+ }
+}
+
+func TestDeployHistory_GetNotFound(t *testing.T) {
+ s := newTestStore(t)
+ _, err := s.GetDeployHistory(999)
+ if !errors.Is(err, ErrNotFound) {
+ t.Fatalf("expected ErrNotFound, got %v", err)
+ }
+}
+
+func TestDeployHistory_ListScopedToWorkload(t *testing.T) {
+ s := newTestStore(t)
+ a := seedWorkload(t, s, "a")
+ b := seedWorkload(t, s, "b")
+ s.InsertDeployHistory(DeployHistoryEntry{WorkloadID: a.ID, Outcome: "success"})
+ s.InsertDeployHistory(DeployHistoryEntry{WorkloadID: b.ID, Outcome: "success"})
+
+ list, _ := s.ListDeployHistory(a.ID, 10, 0)
+ if len(list) != 1 || list[0].WorkloadID != a.ID {
+ t.Fatalf("expected only workload a's rows, got %+v", list)
+ }
+}
+
+func TestDeployHistory_Pagination(t *testing.T) {
+ s := newTestStore(t)
+ w := seedWorkload(t, s, "paged")
+ for i := 0; i < 5; i++ {
+ s.InsertDeployHistory(DeployHistoryEntry{WorkloadID: w.ID, Outcome: "success"})
+ }
+ page1, _ := s.ListDeployHistory(w.ID, 2, 0)
+ page2, _ := s.ListDeployHistory(w.ID, 2, 2)
+ if len(page1) != 2 || len(page2) != 2 {
+ t.Fatalf("expected 2 per page, got %d and %d", len(page1), len(page2))
+ }
+ if page1[0].ID == page2[0].ID {
+ t.Fatal("expected distinct rows across pages")
+ }
+}
+
+func TestDeployHistory_Prune(t *testing.T) {
+ s := newTestStore(t)
+ w := seedWorkload(t, s, "noisy")
+ for i := 0; i < 10; i++ {
+ s.InsertDeployHistory(DeployHistoryEntry{WorkloadID: w.ID, Outcome: "success"})
+ }
+ if err := s.PruneDeployHistory(w.ID, 3); err != nil {
+ t.Fatalf("PruneDeployHistory: %v", err)
+ }
+ list, _ := s.ListDeployHistory(w.ID, 100, 0)
+ if len(list) != 3 {
+ t.Fatalf("expected 3 rows after prune, got %d", len(list))
+ }
+ // Surviving rows are still returned newest-first.
+ all, _ := s.ListDeployHistory(w.ID, 100, 0)
+ for i := 1; i < len(all); i++ {
+ if all[i-1].ID < all[i].ID {
+ t.Fatal("expected newest-first after prune")
+ }
+ }
+}
+
+func TestDeployHistory_CascadeOnWorkloadDelete(t *testing.T) {
+ s := newTestStore(t)
+ w := seedWorkload(t, s, "doomed")
+ s.InsertDeployHistory(DeployHistoryEntry{WorkloadID: w.ID, Outcome: "success"})
+ s.InsertDeployHistory(DeployHistoryEntry{WorkloadID: w.ID, Outcome: "failure"})
+
+ if err := s.DeleteWorkload(w.ID); err != nil {
+ t.Fatalf("DeleteWorkload: %v", err)
+ }
+ list, _ := s.ListDeployHistory(w.ID, 100, 0)
+ if len(list) != 0 {
+ t.Fatalf("expected history removed with workload, got %d rows", len(list))
+ }
+}
diff --git a/internal/store/models.go b/internal/store/models.go
index c2bcf9e..7bcf4d6 100644
--- a/internal/store/models.go
+++ b/internal/store/models.go
@@ -507,3 +507,28 @@ type App struct {
CreatedAt string `json:"created_at"`
UpdatedAt string `json:"updated_at"`
}
+
+// DeployHistoryEntry is one row in the per-workload deploy ledger. Unlike
+// event_log (free-text human timeline), this is the structured, version-
+// pinned record the rollback action replays from. Reference is the
+// effective deployed artifact handle (image tag for image sources, commit
+// sha for git-built sources, "" when none applies). Error is NEVER the raw
+// source error — that can carry registry-auth bytes or compose stdout; it
+// holds only a fixed, secret-free marker. Raw detail goes to slog.
+type DeployHistoryEntry struct {
+ ID int64 `json:"id"`
+ WorkloadID string `json:"workload_id"`
+ SourceKind string `json:"source_kind"`
+ Reference string `json:"reference"` // effective tag | commit sha | ""
+ Reason string `json:"reason"` // manual|registry-push|git-push|cron|rollback|promote
+ TriggeredBy string `json:"triggered_by"`
+ Note string `json:"note"`
+ Outcome string `json:"outcome"` // success | failure
+ Error string `json:"error"` // generic, secret-free marker on failure
+ StartedAt string `json:"started_at"`
+ FinishedAt string `json:"finished_at"`
+ // Rollbackable is computed at the API layer (not persisted): a row is
+ // rollbackable when it succeeded, has a non-empty Reference, and its
+ // source kind supports reference-pinned redeploy.
+ Rollbackable bool `json:"rollbackable"`
+}
diff --git a/internal/store/stats_by_workload_test.go b/internal/store/stats_by_workload_test.go
new file mode 100644
index 0000000..82956f4
--- /dev/null
+++ b/internal/store/stats_by_workload_test.go
@@ -0,0 +1,56 @@
+package store
+
+import "testing"
+
+func TestListContainerStatsSamplesByWorkload_ScopedToWorkload(t *testing.T) {
+ s := newTestStore(t)
+ wa := seedWorkload(t, s, "wa")
+ wb := seedWorkload(t, s, "wb")
+
+ ca, err := s.CreateContainer(Container{WorkloadID: wa.ID, WorkloadKind: "image", ContainerID: "da", Host: "local", State: "running"})
+ if err != nil {
+ t.Fatalf("CreateContainer a: %v", err)
+ }
+ cb, err := s.CreateContainer(Container{WorkloadID: wb.ID, WorkloadKind: "image", ContainerID: "db", Host: "local", State: "running"})
+ if err != nil {
+ t.Fatalf("CreateContainer b: %v", err)
+ }
+
+ // owner_id is the container ROW id.
+ mustInsertSample(t, s, ca.ID, 100, 12.5, 2048)
+ mustInsertSample(t, s, ca.ID, 200, 15.0, 3072)
+ mustInsertSample(t, s, cb.ID, 150, 99.0, 9999)
+
+ got, err := s.ListContainerStatsSamplesByWorkload(wa.ID, 0)
+ if err != nil {
+ t.Fatalf("ListContainerStatsSamplesByWorkload: %v", err)
+ }
+ if len(got) != 2 {
+ t.Fatalf("expected 2 samples for workload a, got %d", len(got))
+ }
+ // ts ascending.
+ if got[0].TS != 100 || got[1].TS != 200 {
+ t.Fatalf("expected ts-ascending 100,200, got %d,%d", got[0].TS, got[1].TS)
+ }
+ for _, sm := range got {
+ if sm.OwnerID != ca.ID {
+ t.Fatalf("leaked a sample from another workload: %+v", sm)
+ }
+ }
+
+ // Since-cutoff filters older samples.
+ recent, _ := s.ListContainerStatsSamplesByWorkload(wa.ID, 150)
+ if len(recent) != 1 || recent[0].TS != 200 {
+ t.Fatalf("expected only ts=200 after cutoff, got %+v", recent)
+ }
+}
+
+func mustInsertSample(t *testing.T, s *Store, ownerID string, ts int64, cpu float64, mem int64) {
+ t.Helper()
+ if err := s.InsertContainerStatsSample(ContainerStatsSample{
+ ContainerID: "c-" + ownerID, OwnerType: "instance", OwnerID: ownerID, TS: ts,
+ CPUPercent: cpu, MemoryUsage: mem, MemoryLimit: mem * 2,
+ }); err != nil {
+ t.Fatalf("InsertContainerStatsSample: %v", err)
+ }
+}
diff --git a/internal/store/stats_samples.go b/internal/store/stats_samples.go
index 6e9de03..6ce18ef 100644
--- a/internal/store/stats_samples.go
+++ b/internal/store/stats_samples.go
@@ -74,6 +74,43 @@ func (s *Store) ListContainerStatsSamples(ownerType, ownerID string, sinceTS int
return out, rows.Err()
}
+// ListContainerStatsSamplesByWorkload returns every container sample owned by
+// a workload since the given unix timestamp, ordered by ts ascending. Samples
+// are linked to their workload through the containers index (owner_id is the
+// container row id), so this joins through it. Powers the per-workload metrics
+// graph on /apps/[id].
+func (s *Store) ListContainerStatsSamplesByWorkload(workloadID string, sinceTS int64) ([]ContainerStatsSample, error) {
+ rows, err := s.db.Query(
+ `SELECT cs.container_id, cs.owner_type, cs.owner_id, cs.ts,
+ cs.cpu_percent, cs.memory_usage, cs.memory_limit,
+ cs.network_rx, cs.network_tx, cs.block_read, cs.block_write
+ FROM container_stats_samples cs
+ JOIN containers c ON c.id = cs.owner_id
+ WHERE c.workload_id = ? AND cs.ts >= ?
+ ORDER BY cs.ts ASC`,
+ workloadID, sinceTS,
+ )
+ if err != nil {
+ return nil, fmt.Errorf("list container stats samples by workload: %w", err)
+ }
+ defer rows.Close()
+
+ var out []ContainerStatsSample
+ for rows.Next() {
+ var sm ContainerStatsSample
+ if err := rows.Scan(
+ &sm.ContainerID, &sm.OwnerType, &sm.OwnerID, &sm.TS,
+ &sm.CPUPercent, &sm.MemoryUsage, &sm.MemoryLimit,
+ &sm.NetworkRxBytes, &sm.NetworkTxBytes,
+ &sm.BlockReadBytes, &sm.BlockWriteBytes,
+ ); err != nil {
+ return nil, fmt.Errorf("scan container stats sample: %w", err)
+ }
+ out = append(out, sm)
+ }
+ return out, rows.Err()
+}
+
// ListAllRecentContainerStatsSamples returns samples across every owner since
// the given unix timestamp, ordered by ts ascending. Used by the system
// dashboard "top containers" widget where the UI wants a mixed pool.
diff --git a/internal/store/store.go b/internal/store/store.go
index 4a52947..e419ea3 100644
--- a/internal/store/store.go
+++ b/internal/store/store.go
@@ -459,6 +459,28 @@ func (s *Store) runMigrations() error {
)`,
`CREATE UNIQUE INDEX IF NOT EXISTS idx_shared_secrets_scope_name ON shared_secrets(scope, app_id, name)`,
`CREATE INDEX IF NOT EXISTS idx_shared_secrets_app ON shared_secrets(app_id)`,
+ // deploy_history: structured, version-pinned ledger of every deploy
+ // dispatch (success AND failure) per workload. Distinct from the
+ // free-text event_log — this carries the replayable `reference` the
+ // rollback action redeploys from. `error` holds only a generic,
+ // secret-free marker (the raw source error can echo registry-auth /
+ // compose stdout, so it goes to slog only). FK cascade is backed by
+ // PRAGMA foreign_keys=ON, but DeleteWorkload also deletes these rows
+ // explicitly (matching the containers cleanup convention).
+ `CREATE TABLE IF NOT EXISTS deploy_history (
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
+ workload_id TEXT NOT NULL REFERENCES workloads(id) ON DELETE CASCADE,
+ source_kind TEXT NOT NULL DEFAULT '',
+ reference TEXT NOT NULL DEFAULT '',
+ reason TEXT NOT NULL DEFAULT '',
+ triggered_by TEXT NOT NULL DEFAULT '',
+ note TEXT NOT NULL DEFAULT '',
+ outcome TEXT NOT NULL DEFAULT '',
+ error TEXT NOT NULL DEFAULT '',
+ started_at TEXT NOT NULL DEFAULT '',
+ finished_at TEXT NOT NULL DEFAULT ''
+ )`,
+ `CREATE INDEX IF NOT EXISTS idx_deploy_history_workload ON deploy_history(workload_id, id DESC)`,
}
for _, t := range observabilityTables {
if _, err := s.db.Exec(t); err != nil {
diff --git a/internal/store/workloads.go b/internal/store/workloads.go
index 9e0df70..7507911 100644
--- a/internal/store/workloads.go
+++ b/internal/store/workloads.go
@@ -190,6 +190,12 @@ func (s *Store) DeleteWorkload(id string) error {
if _, err := tx.Exec(`DELETE FROM containers WHERE workload_id = ?`, id); err != nil {
return fmt.Errorf("delete containers: %w", err)
}
+ // Deploy ledger rows are FK-cascaded, but we delete them explicitly in
+ // the same transaction — consistent with the containers cleanup above
+ // and robust even if the cascade is ever disabled.
+ if _, err := tx.Exec(`DELETE FROM deploy_history WHERE workload_id = ?`, id); err != nil {
+ return fmt.Errorf("delete deploy history: %w", err)
+ }
result, err := tx.Exec(`DELETE FROM workloads WHERE id = ?`, id)
if err != nil {
return fmt.Errorf("delete workload: %w", err)
diff --git a/web/src/lib/api.ts b/web/src/lib/api.ts
index 828a6b4..ddd1bb4 100644
--- a/web/src/lib/api.ts
+++ b/web/src/lib/api.ts
@@ -938,6 +938,66 @@ export function deployPluginWorkload(
return post(`/api/workloads/${id}/deploy`, body ?? {});
}
+// ── Deploy history + rollback ───────────────────────────────────────
+// Structured, version-pinned ledger of every deploy dispatch (success and
+// failure). `rollbackable` is computed server-side: a successful deploy of a
+// source kind that supports reference-pinned redeploy (image today).
+export interface DeployHistoryEntry {
+ id: number;
+ workload_id: string;
+ source_kind: string;
+ reference: string;
+ reason: string;
+ triggered_by: string;
+ note: string;
+ outcome: 'success' | 'failure';
+ error: string;
+ started_at: string;
+ finished_at: string;
+ rollbackable: boolean;
+}
+
+export function fetchWorkloadDeploys(
+ id: string,
+ params?: { limit?: number; offset?: number },
+ signal?: AbortSignal
+): Promise<DeployHistoryEntry[]> {
+ const query = new URLSearchParams();
+ if (params?.limit) query.set('limit', String(params.limit));
+ if (params?.offset) query.set('offset', String(params.offset));
+ const qs = query.toString();
+ return get(`/api/workloads/${id}/deploys${qs ? `?${qs}` : ''}`, signal);
+}
+
+export function rollbackWorkload(
+ id: string,
+ deployId: number
+): Promise<{ workload_id: string; reference: string; rollback_of: number; triggered_by: string }> {
+ return post(`/api/workloads/${id}/rollback`, { deploy_id: deployId });
+}
+
+// ── Per-workload metrics history ────────────────────────────────────
+// CPU% and memory (bytes) summed across the workload's containers, one
+// point per sampled timestamp. Empty when stats collection is off / Docker
+// was down / the workload is new.
+export interface WorkloadStatsPoint {
+ ts: number;
+ cpu_percent: number;
+ memory_usage: number;
+ memory_limit: number;
+}
+
+export function fetchWorkloadStatsHistory(
+ id: string,
+ window = '2h',
+ signal?: AbortSignal
+): Promise<WorkloadStatsPoint[]> {
+ return get(
+ `/api/workloads/${id}/stats/history?window=${encodeURIComponent(window)}`,
+ signal
+ );
+}
+
export function listHookKinds(signal?: AbortSignal): Promise {
return get('/api/hooks/kinds', signal);
}
diff --git a/web/src/lib/components/DeployHistoryPanel.svelte b/web/src/lib/components/DeployHistoryPanel.svelte
new file mode 100644
index 0000000..07f69d0
--- /dev/null
+++ b/web/src/lib/components/DeployHistoryPanel.svelte
@@ -0,0 +1,190 @@
+
+
+
+
+
+
+
+
+
+