tiny-forge/docs/plans/WORKLOAD_METRICS_GRAPH_PLAN.md

# Per-Workload Metrics Graph — Implementation Plan

**Status:** planned · **Feature rank:** #2 · **Date:** 2026-06-19

## Problem

Stats are collected per container (`container_stats_samples`, CPU/mem/net/disk) and
charted **globally** on the dashboard (`SystemResourcesCard` + `ResourceChart`), but
`/apps/[id]` shows only live snapshots — there's no per-workload "is my app leaking
memory / pegging CPU over the last few hours" view. This is a daily question and the
data already exists; we just need a per-workload query + a panel that reuses the chart.

## Verified facts

- `ContainerStatsSample.OwnerID` == the **container row id** (`containers.id`), confirmed
  by `lookupInstanceName` → `GetContainerByID(sm.OwnerID)` in
  [stats_history.go](../../internal/api/stats_history.go). `OwnerType` ∈ {instance, site}.
- Each sample's `ts` is that container's own Docker-stats `Timestamp.Unix()`
  ([collector.go](../../internal/stats/collector.go)), NOT one shared tick stamp. Within a
  multi-container tick, truncation to whole seconds usually collapses the samples onto
  the same integer `ts`, so per-`ts` aggregation works; an occasional ±1s split at a
  second boundary is cosmetic for a trend line. (Reviewer-corrected.)
- The handler 404s on an unknown workload id but returns `[]` for a known workload with
  no samples yet.
- `ResourceChart.svelte` takes a fully-built `EChartsOption` from the parent; the parent
  owns series/axes (see `SystemResourcesCard`). Reads stay available when Docker is down
  (samples come from SQLite, not the daemon).
- Per-workload reads (`/events`, `/runtime-state`) are open to any authenticated user;
  this endpoint follows suit (no `AdminOnly`).

## Backend

1. **Store** — `ListContainerStatsSamplesByWorkload(workloadID string, sinceTS int64)`:
   ```sql
   SELECT cs.container_id, cs.owner_type, cs.owner_id, cs.ts,
          cs.cpu_percent, cs.memory_usage, cs.memory_limit,
          cs.network_rx, cs.network_tx, cs.block_read, cs.block_write
   FROM container_stats_samples cs
   JOIN containers c ON c.id = cs.owner_id
   WHERE c.workload_id = ? AND cs.ts >= ?
   ORDER BY cs.ts ASC
   ```
   Returns `[]ContainerStatsSample`.

2. **API** — `getWorkloadStatsHistory` (GET `/api/workloads/{id}/stats/history?window=`):
   reuse `parseWindow`/`sinceTimestamp`; aggregate samples **per ts** into a compact
   series so multi-container workloads (compose) sum correctly:
   ```go
   type workloadStatsPoint struct {
       TS          int64   `json:"ts"`
       CPUPercent  float64 `json:"cpu_percent"`   // sum across the workload's containers
       MemoryUsage int64   `json:"memory_usage"`  // sum bytes
       MemoryLimit int64   `json:"memory_limit"`  // max (effective ceiling)
   }
   ```
   For a known workload it always returns `[]` (never 503) when there are no samples:
   stats disabled, Docker was down, or the workload is new. An unknown workload id
   still 404s. Register in the `/workloads/{id}` route block.

3. **Tests** — store: join scopes to the right workload (A's samples ≠ B's); API:
   per-ts aggregation sums two containers at the same tick.
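
The per-`ts` aggregation in step 2 could look roughly like the sketch below. It is not the final handler code: `aggregateByTS` is a hypothetical helper name, and the `ContainerStatsSample` struct here is a trimmed stand-in whose field names are assumed rather than copied from the store.

```go
package main

import "sort"

// ContainerStatsSample is a trimmed stand-in for the store type; only the
// fields the aggregation needs are shown, and their names are assumptions.
type ContainerStatsSample struct {
	TS          int64
	CPUPercent  float64
	MemoryUsage int64
	MemoryLimit int64
}

type workloadStatsPoint struct {
	TS          int64   `json:"ts"`
	CPUPercent  float64 `json:"cpu_percent"`
	MemoryUsage int64   `json:"memory_usage"`
	MemoryLimit int64   `json:"memory_limit"`
}

// aggregateByTS (hypothetical helper) folds all samples that share a ts into
// one point: CPU and memory usage are summed, memory limit takes the max.
func aggregateByTS(samples []ContainerStatsSample) []workloadStatsPoint {
	byTS := make(map[int64]*workloadStatsPoint)
	for _, s := range samples {
		p, ok := byTS[s.TS]
		if !ok {
			p = &workloadStatsPoint{TS: s.TS}
			byTS[s.TS] = p
		}
		p.CPUPercent += s.CPUPercent
		p.MemoryUsage += s.MemoryUsage
		if s.MemoryLimit > p.MemoryLimit {
			p.MemoryLimit = s.MemoryLimit
		}
	}

	// Start from a non-nil slice so an empty input still yields an empty
	// (not nil) result.
	points := make([]workloadStatsPoint, 0, len(byTS))
	for _, p := range byTS {
		points = append(points, *p)
	}
	// Map iteration order is random, so restore ascending ts order.
	sort.Slice(points, func(i, j int) bool { return points[i].TS < points[j].TS })
	return points
}
```

Grouping through a map and sorting afterwards keeps the helper correct even if the input is not ordered; since the store query already returns rows `ORDER BY cs.ts ASC`, a single pass that merges adjacent equal timestamps would work as well.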

## Frontend

4. **api.ts** — `WorkloadStatsPoint` type + `fetchWorkloadStatsHistory(id, window, signal)`.
5. **`WorkloadMetricsPanel.svelte`** — window selector (30m / 2h / 6h), fetch + 15s poll
   (mirror `SystemResourcesCard`), build an `EChartsOption` with **two series**: CPU %
   on the left axis, Memory (MiB) on the right axis (absolute usage rather than a
   percentage, because `memory_limit` is often 0/unlimited so a % would divide by zero).
   Show an `EmptyState` hint when there are no samples. Render via `ResourceChart`. Mount
   on `/apps/[id]` near the deploy-history panel.
6. **i18n** — `apps.detail.metrics.*` in both en.json and ru.json (parity mandatory).

## Risks / mitigations

- **Docker down / stats disabled** → empty series, friendly hint (no error). SQLite read
  path is independent of the daemon.
- **memory_limit = 0 (unlimited)** → plot absolute MiB, not %, to avoid div-by-zero.
- **Sparse sampling** → chart shows whatever ticks exist; window selector lets the user
  widen. No interpolation.
- **Auth** → read-only, any authenticated user (consistent with other per-workload reads).
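
One Go detail matters for the "empty series, no error" mitigation: `encoding/json` encodes a nil slice as `null`, not `[]`, so the handler has to normalise before writing the response. A minimal sketch follows, assuming a hypothetical `encodePoints` helper; the real handler may use a different response writer utility.

```go
package main

import "encoding/json"

type workloadStatsPoint struct {
	TS          int64   `json:"ts"`
	CPUPercent  float64 `json:"cpu_percent"`
	MemoryUsage int64   `json:"memory_usage"`
	MemoryLimit int64   `json:"memory_limit"`
}

// encodePoints (hypothetical helper) guarantees the body is a JSON array.
// A nil slice would otherwise marshal to `null`, which the frontend would
// have to special-case before charting.
func encodePoints(points []workloadStatsPoint) ([]byte, error) {
	if points == nil {
		points = []workloadStatsPoint{}
	}
	return json.Marshal(points)
}
```

With this in place the panel can treat "no samples" as an array of length zero and show the hint, without a separate null check.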

## Rollout

Single change set, additive, no migration. Reuses the existing `echarts` dependency and
`ResourceChart` component.