0c4c338bfe
Two additions to the app detail page, each backed by a per-workload
endpoint.
Deploy history + rollback:
- New deploy_history table — a structured, version-pinned ledger of every
dispatch (success AND failure), distinct from the free-text event_log.
Recorded at the single DispatchPlugin choke point so every source kind
is covered. The raw deploy error is never persisted (it can carry
registry-auth / compose-stdout secrets) — only a generic marker, with
detail going to slog. Pruned to the newest N per workload; cascade-
deleted with the workload.
- GET /api/workloads/{id}/deploys lists the ledger; POST .../rollback
(admin) replays a prior successful deploy's pinned reference as a
rollback-reason dispatch. Phase 1 is image-source only (RollbackCapable);
git-built sources need checkout-by-commit, a later phase.
- DeployHistoryPanel.svelte renders the ledger with confirm-gated rollback.
Per-workload metrics:
- ListContainerStatsSamplesByWorkload joins the existing container stats
samples through the containers index; GET /api/workloads/{id}/stats/history
aggregates CPU/memory per timestamp across the workload's containers.
- WorkloadMetricsPanel.svelte reuses ResourceChart (CPU% + memory MiB,
windowed, 15s poll).
en/ru i18n added with parity. Tests: store CRUD + cascade + workload-scoped
join, deployer recording (incl. secret-non-leak on failure), API rollback
guards, and per-timestamp aggregation. Plans under docs/plans/.
85 lines
4.2 KiB
Markdown
# Per-Workload Metrics Graph — Implementation Plan

**Status:** planned · **Feature rank:** #2 · **Date:** 2026-06-19

## Problem

Stats are collected per container (`container_stats_samples`, CPU/mem/net/disk) and
charted **globally** on the dashboard (`SystemResourcesCard` + `ResourceChart`), but
`/apps/[id]` shows only live snapshots — there's no per-workload "is my app leaking
memory / pegging CPU over the last few hours" view. This is a daily question and the
data already exists; we just need a per-workload query + a panel that reuses the chart.

## Verified facts

- `ContainerStatsSample.OwnerID` == the **container row id** (`containers.id`), confirmed
  by `lookupInstanceName` → `GetContainerByID(sm.OwnerID)` in
  [stats_history.go](../../internal/api/stats_history.go). `OwnerType` ∈ {instance, site}.
- Each sample's `ts` is that container's own Docker-stats `Timestamp.Unix()`
  ([collector.go](../../internal/stats/collector.go)) — NOT one shared tick stamp. In a
  multi-container tick the per-second truncation usually collapses them to the same
  integer `ts`, so per-`ts` aggregation works; a ±1s split at a second boundary is
  cosmetic for a trend line. (Reviewer-corrected.) The handler 404s on an unknown
  workload id but returns `[]` for a known workload with no samples yet.
- `ResourceChart.svelte` takes a fully-built `EChartsOption` from the parent; the parent
  owns series/axes (see `SystemResourcesCard`). Reads stay available when Docker is down
  (samples come from SQLite, not the daemon).
- Per-workload reads (`/events`, `/runtime-state`) are open to any authenticated user;
  this endpoint follows suit (no `AdminOnly`).

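The per-second truncation noted above can be illustrated with a small sketch. It assumes only that `ts` comes from Go's `Time.Unix()`, as the facts state; `tsFromNanos` is a hypothetical helper for the demonstration, not a function in the codebase.

```go
package main

import (
	"fmt"
	"time"
)

// tsFromNanos is a hypothetical helper: it reduces a nanosecond timestamp to
// whole seconds via Time.Unix(), dropping the sub-second part.
func tsFromNanos(ns int64) int64 {
	return time.Unix(0, ns).Unix()
}

func main() {
	base := time.Date(2026, 6, 19, 12, 0, 0, 0, time.UTC).UnixNano()
	ms := int64(time.Millisecond)

	// Two containers read 40ms apart inside the same second share one ts.
	a := tsFromNanos(base + 100*ms)
	b := tsFromNanos(base + 140*ms)
	fmt.Println(a == b) // true

	// The same 40ms gap across a second boundary yields ts values 1s apart,
	// so the two containers land in separate aggregation buckets.
	c := tsFromNanos(base + 980*ms)
	d := tsFromNanos(base + 1020*ms)
	fmt.Println(d - c) // 1
}
```

In the second case each bucket holds only one container's share, which shows up as a brief dip in a summed trend line rather than as lost data.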
## Backend

1. **Store** — `ListContainerStatsSamplesByWorkload(workloadID string, sinceTS int64)`:
   ```sql
   SELECT cs.container_id, cs.owner_type, cs.owner_id, cs.ts,
          cs.cpu_percent, cs.memory_usage, cs.memory_limit,
          cs.network_rx, cs.network_tx, cs.block_read, cs.block_write
   FROM container_stats_samples cs
   JOIN containers c ON c.id = cs.owner_id
   WHERE c.workload_id = ? AND cs.ts >= ?
   ORDER BY cs.ts ASC
   ```
   Returns `[]ContainerStatsSample`.

2. **API** — `getWorkloadStatsHistory` (GET `/api/workloads/{id}/stats/history?window=`):
   reuse `parseWindow`/`sinceTimestamp`; aggregate samples **per ts** into a compact
   series so multi-container workloads (compose) sum correctly:
   ```go
   type workloadStatsPoint struct {
   	TS          int64   `json:"ts"`
   	CPUPercent  float64 `json:"cpu_percent"`  // sum across the workload's containers
   	MemoryUsage int64   `json:"memory_usage"` // sum bytes
   	MemoryLimit int64   `json:"memory_limit"` // max (effective ceiling)
   }
   ```
   For a known workload, always returns `[]` (never 503): empty when stats are disabled /
   Docker was down / the workload is new. An unknown workload id still 404s. Register in
   the `/workloads/{id}` route block.

3. **Tests** — store: join scopes to the right workload (A's samples ≠ B's); API:
   per-ts aggregation sums two containers at the same tick.

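The per-`ts` aggregation in step 2 could look roughly like the sketch below. It is not the actual handler: `sample` is a trimmed, hypothetical stand-in for `ContainerStatsSample`, `aggregateByTS` is an assumed name, and the single pass relies on the input already being sorted by `ts` (as the store query's `ORDER BY cs.ts ASC` provides).

```go
package main

import "fmt"

// sample is a hypothetical, trimmed stand-in for ContainerStatsSample.
type sample struct {
	TS          int64
	CPUPercent  float64
	MemoryUsage int64
	MemoryLimit int64
}

type workloadStatsPoint struct {
	TS          int64   `json:"ts"`
	CPUPercent  float64 `json:"cpu_percent"`
	MemoryUsage int64   `json:"memory_usage"`
	MemoryLimit int64   `json:"memory_limit"`
}

// aggregateByTS folds samples that share a ts into one point: CPU and memory
// usage are summed, memory limit takes the maximum. Input must be sorted by
// TS ascending. The result is never nil.
func aggregateByTS(samples []sample) []workloadStatsPoint {
	out := make([]workloadStatsPoint, 0, len(samples))
	for _, s := range samples {
		// Start a new point whenever the timestamp changes.
		if n := len(out); n == 0 || out[n-1].TS != s.TS {
			out = append(out, workloadStatsPoint{TS: s.TS})
		}
		p := &out[len(out)-1]
		p.CPUPercent += s.CPUPercent
		p.MemoryUsage += s.MemoryUsage
		if s.MemoryLimit > p.MemoryLimit {
			p.MemoryLimit = s.MemoryLimit
		}
	}
	return out
}

func main() {
	pts := aggregateByTS([]sample{
		{TS: 100, CPUPercent: 12.5, MemoryUsage: 64 << 20, MemoryLimit: 0},
		{TS: 100, CPUPercent: 7.5, MemoryUsage: 32 << 20, MemoryLimit: 256 << 20},
		{TS: 115, CPUPercent: 3, MemoryUsage: 16 << 20},
	})
	for _, p := range pts {
		fmt.Printf("ts=%d cpu=%.1f mem=%d limit=%d\n",
			p.TS, p.CPUPercent, p.MemoryUsage, p.MemoryLimit)
	}
}
```

The two samples at `ts` 100 collapse into a single point, which is the case the API test in step 3 is meant to cover. If the input order could not be trusted, a map keyed by `ts` followed by a sort would be the safer variant.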
## Frontend

4. **api.ts** — `WorkloadStatsPoint` type + `fetchWorkloadStatsHistory(id, window, signal)`.
5. **`WorkloadMetricsPanel.svelte`** — window selector (30m / 2h / 6h), fetch + 15s poll
   (mirror `SystemResourcesCard`), build an `EChartsOption` with **two series**: CPU %
   on the left axis, Memory (MiB) on the right axis (absolute usage rather than a
   percentage, because `memory_limit` is often 0/unlimited so a % would divide by zero).
   `EmptyState` / hint when there are no samples. Render via `ResourceChart`. Mount on
   `/apps/[id]` near the deploy-history panel.
6. **i18n** — `apps.detail.metrics.*` in both en.json and ru.json (parity mandatory).

## Risks / mitigations

- **Docker down / stats disabled** → empty series, friendly hint (no error). SQLite read
  path is independent of the daemon.
- **memory_limit = 0 (unlimited)** → plot absolute MiB, not %, to avoid div-by-zero.
- **Sparse sampling** → chart shows whatever ticks exist; window selector lets the user
  widen. No interpolation.
- **Auth** → read-only, any authenticated user (consistent with other per-workload reads).

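One detail worth guarding in the empty-series case: Go's `encoding/json` encodes a nil slice as `null`, not `[]`, so the handler needs a non-nil slice to honor the "empty series" contract. A minimal sketch, where `point` and `encodeSeries` are hypothetical names used only for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// point is a hypothetical, trimmed response element.
type point struct {
	TS int64 `json:"ts"`
}

// encodeSeries marshals the series, replacing a nil slice with an empty one
// so that clients always receive a JSON array.
func encodeSeries(points []point) (string, error) {
	if points == nil {
		points = []point{}
	}
	b, err := json.Marshal(points)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	var none []point // nil: e.g. no rows came back from the store

	raw, _ := json.Marshal(none)
	fmt.Println(string(raw)) // null

	guarded, _ := encodeSeries(none)
	fmt.Println(guarded) // []
}
```

With the guard in place the frontend can treat the response as an array unconditionally and show the empty-state hint when its length is zero.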
## Rollout

Single change set, additive, no migration. Reuses the existing `echarts` dependency and
`ResourceChart` component.