diff --git a/cmd/server/main.go b/cmd/server/main.go index eb91f71..12b74be 100644 --- a/cmd/server/main.go +++ b/cmd/server/main.go @@ -373,8 +373,9 @@ func main() { poller.Stop() statsCollector.Stop() - // Drain in-progress deploys and notifications. + // Drain in-progress deploys, site syncs, and notifications. dep.Drain() + webhookHandler.Drain() notifier.Drain() // Shut down HTTP server. diff --git a/docs/reviews/functionality-review-2026-05-07.md b/docs/reviews/functionality-review-2026-05-07.md new file mode 100644 index 0000000..87237e1 --- /dev/null +++ b/docs/reviews/functionality-review-2026-05-07.md @@ -0,0 +1,464 @@ +# Functionality Review — 2026-05-07 + +Last 5 commits reviewed: + +1. `05440a5` feat(stats): resource metrics dashboard + sites logs/stats +2. `0632f51` feat(webhook): per-project and per-site webhook URLs +3. `e08acf5` refactor(settings): split General into focused pages +4. `03d58a0` fix: treat naive backend timestamps as UTC for relative labels +5. `90e6e59` feat: daemon health panel, brand-rail status chips, user timezone selector + +Method: desk review of `git diff HEAD~5 HEAD` plus targeted reads of large +new components. No dev-server execution. Citations use absolute paths. + +## TL;DR + +- **Stats dashboard, daemon panel, timezone selector, settings split, and + per-entity webhooks all wire end-to-end** — every Go endpoint added in + these commits has a Svelte caller, every new field on the settings/health + shapes is rendered, and i18n is parallel-keyed in `en.json` and `ru.json`. +- **One real flow gap:** the `WebhookPanel` confirm button (Project/Site + detail) does not auto-close when regenerate succeeds in the "no current + URL" case — it stays open until the user manually cancels. Minor. 
+- **i18n is 99 % complete but three hardcoded English fallbacks slipped in:** + `'Docker daemon is not reachable.'` in `SystemDaemonsCard.svelte:98`, and + `'Service status'` / `'Close sidebar'` aria-labels plus `'Docker daemon · … + reachable'` / `'Proxy unreachable'` tooltips in `+layout.svelte` (lines + 194, 201, 208, 225). All three are user-visible. +- **Stats collector skips ticks when Docker is unreachable** but still calls + `prune` — confirmed safe, but the very first sample after a Docker outage + will show no system row for the outage window. Acceptable; documented in + code. +- **Naive-UTC fix has full reach:** the fix lives in `toDate()` inside + `web/src/lib/format/datetime.ts:34-46`, so every one of the 15 components + that goes through `$fmt.*` benefits. `InstanceCard` was the only file + that had its own ad-hoc parser; that parser is removed. + +## Feature: Resource Metrics Dashboard (05440a5) + +**What it claims:** background CPU/memory/network/block I/O collector with +configurable interval (5–300s, default 15) and retention (0–24h, default +2h). New host snapshot/history/top-N API endpoints, ECharts visualisation, +sites logs/stats reuse instance components, Docker-down 503 handling. + +**What works** + +- Collector lives in `internal/stats/collector.go:50-309`. It re-reads + settings every tick (`run`/`readConfig`), so `/settings/maintenance` + changes propagate within one tick. `interval=0` legitimately disables + collection (`run` polls settings every minute in that branch). +- API endpoints and routing are wired: `internal/api/router.go:222,289-291,341-343` + mounts `/api/system/stats`, `/api/system/stats/history`, + `/api/system/stats/top`, plus the per-instance and per-site + `/stats/history` endpoints, all behind the auth middleware. 
+- Frontend has matching helpers in `web/src/lib/api.ts:683-731` + (`fetchSystemStats`, `fetchSystemStatsHistory`, `fetchTopContainers`, + `fetchInstanceStatsHistory`, `fetchStaticSiteStats(s)History`, + `fetchStaticSiteLogs`). +- `SystemResourcesCard.svelte:33-52` uses `Promise.allSettled` so a 503 on + the live snapshot does not blank out history (which is read from SQLite + and remains valid). Docker-unavailable detection at line 67 produces an + amber banner with the i18n key `resources.dockerUnavailable`. +- `ContainerStats.svelte:13-15` and `ContainerLogs.svelte:14-16` define the + `StatsSource`/`LogSource` discriminated unions exactly as the commit + message describes; the site detail page uses both at + `web/src/routes/sites/[id]/+page.svelte:255-279`. +- 30 m / 2 h / 6 h / 24 h window picker exists at + `SystemResourcesCard.svelte:213-220`. `parseWindow` in + `internal/api/stats_history.go:21-37` clamps any value to ≤ 24 h, so a + hand-crafted `?window=999h` query returns the maxed window (good). +- History persistence survives backend restart — samples live in SQLite + (`container_stats_samples`, `system_stats_samples`); migrations in + `internal/store/store.go:128-180` create them additively with + `IF NOT EXISTS`. + +**Gaps / broken flows** + +- **Top-consumer rows are unlabelled by name.** `SystemResourcesCard.svelte:259-264` + shows only `s.container_id.slice(0,12)` plus an `instance | site` chip. + No project/site name, so identifying the offender requires manual lookup. + Backend already knows `owner_id`; resolving to a friendly name would be a + one-extra-fetch fix. +- **No "stats off" UI hint.** When `stats_interval_seconds=0`, the + collector idles and history endpoints return `[]`. Frontend just shows + the "no samples yet" empty state with the *default* interval (15s) + hardcoded in the message (`resources.noSamples` in `en.json:51`, + `ru.json:51`) — it does not detect that collection is disabled. 
Users + who toggle stats off will see a confusing "samples every 15s" message + forever. +- **Stats settings live on Maintenance page, not on a dedicated card.** + `web/src/routes/settings/maintenance/+page.svelte:117-132` has 4 fields + (stale, prune, stats interval, stats retention) sharing one Save button. + Not broken, but "Stats collection" is *not* maintenance — it's a runtime + observability feature. Worth a follow-up split. +- **Top endpoint silently filters to last 2 minutes** (`stats_history.go:178`). + If the collector interval is 300 s, two of the last three minutes have no + samples and the top widget will look empty. Window should grow with + interval, e.g. `max(2*interval, 2m)`. + +**API/UI consistency** + +- All snake_case ↔ snake_case (Go `json:"…"` tags match the TS types in + `web/src/lib/types.ts:464-516`). Spot-checked + `ContainerStatsSample`, `SystemStats`, `SystemStatsSample` — perfect + alignment. +- One subtle naming asymmetry: in `SystemStats` (live snapshot) the field + is `disk_total_bytes` and category breakdowns are `disk_images_bytes` etc.; + in `SystemStatsSample` (history row) the field is just `disk_total_bytes` + with no breakdown. The chart only uses workload CPU/memory percent, so + this is fine, but a future "disk over time" chart would have to either + query the live snapshot or the schema would have to grow. + +**i18n** + +- Full coverage. New keys live under `dashboard`, `resources`, and + `statsSettings` namespaces, mirrored in `ru.json:42-87`. No untranslated + strings in the touched files. + +## Feature: Per-Project and Per-Site Webhook URLs (0632f51) + +**What it claims:** replace global `settings.webhook_secret` with per-row +secrets on `projects` and `static_sites`; remove webhook-driven autocreate; +make site `sync_trigger=push|tag` actually trigger a sync. 
+ +**What works** + +- Migration is additive and safe: + `internal/store/store.go:131-138` adds `webhook_secret TEXT NOT NULL DEFAULT ''` + to both tables and creates **partial unique indexes** (`WHERE webhook_secret != ''`) + at `store.go:240-241`, so multiple legacy rows with empty secrets do not + collide. +- Lazy backfill via `EnsureProjectWebhookSecret` / + `EnsureStaticSiteWebhookSecret` (`internal/store/projects.go:158-171`, + `internal/store/static_sites.go:296-308`). UI calls `GET /webhook` first, + which triggers backfill — old projects "just work" the first time you + open them. +- Routing in `internal/webhook/handler.go:127-133`: + `POST /api/webhook/{secret}` for projects, `POST /api/webhook/sites/{secret}` + for sites. Both return 404 for unknown/empty secrets (no information leak). + The order (`/sites/{secret}` first, then `/{secret}`) is correct chi-wise + because the literal `sites` segment beats the catch-all. +- `siteRefMatches` (`internal/webhook/matcher.go:46-90`) implements push and + tag separately, with empty-Branch ⇒ accept-any-heads, and empty-TagPattern + ⇒ `*`. Manual sites short-circuit at `handler.go:295-303`. +- Tests cover both happy and sad paths: + - `internal/webhook/matcher_test.go` (push, tag, manual, empty branch, + `ParseImageRef` cases) + - `internal/webhook/handler_test.go` (unknown-secret 404, image mismatch, + no-stage-match 200/skip, site push match, site manual skip, + site branch mismatch). +- `WebhookPanel.svelte` is generic, used by both detail pages + (`projects/[id]/+page.svelte:771-776`, `sites/[id]/+page.svelte:283-288`). + Absolutises the URL with `window.location.origin` at line 30 so users can + copy a working URL. +- Old global routes removed: no `/api/settings/webhook-url` or + `/api/settings/webhook-url/regenerate` in the diff (router.go:387-388 + shows the deletion). 
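The push/tag/manual rules described for `siteRefMatches` can be sketched as follows. This is a minimal illustration, not the code in `internal/webhook/matcher.go`: the `siteConfig` struct, the `refMatches` name, and the use of `path.Match` for tag globs are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// siteConfig is a hypothetical stand-in for the site fields the review
// mentions (sync trigger, branch, tag pattern). The real struct may differ.
type siteConfig struct {
	SyncTrigger string // "push", "tag", or "manual"
	Branch      string // empty means: accept any branch head
	TagPattern  string // empty means: "*"
}

// refMatches reports whether a git ref such as "refs/heads/main" or
// "refs/tags/v1.2.3" should trigger a sync for the given site.
func refMatches(cfg siteConfig, ref string) bool {
	switch cfg.SyncTrigger {
	case "push":
		if !strings.HasPrefix(ref, "refs/heads/") {
			return false
		}
		branch := strings.TrimPrefix(ref, "refs/heads/")
		return cfg.Branch == "" || cfg.Branch == branch
	case "tag":
		if !strings.HasPrefix(ref, "refs/tags/") {
			return false
		}
		tag := strings.TrimPrefix(ref, "refs/tags/")
		pattern := cfg.TagPattern
		if pattern == "" {
			pattern = "*"
		}
		// Assumption: glob semantics via path.Match, where "*" does not
		// cross "/" separators. The real matcher may use other rules.
		matched, err := path.Match(pattern, tag)
		return err == nil && matched
	default:
		// "manual" (and anything unknown) never triggers from a webhook.
		return false
	}
}

func main() {
	site := siteConfig{SyncTrigger: "tag", TagPattern: "v*"}
	fmt.Println(refMatches(site, "refs/tags/v1.2.3"))
	fmt.Println(refMatches(site, "refs/heads/main"))
}
```

Keeping push and tag in separate branches mirrors the review's observation that the two triggers are implemented separately, so a tag ref can never satisfy a push-triggered site and vice versa.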
+ +**Gaps / broken flows** + +- **WebhookPanel race / minor UX**: `handleRegenerate` (lines 47-57) hides + the confirm strip *before* the network call. If the call fails, the user + sees the toast but the regenerate button reappears with no inline state. + Acceptable, but a "retry" affordance would help. +- **Project image guardrail bypass when `project.Image` is empty.** + `handler.go:206-214`: the check is `if project.Image != "" && !imageMatches(...)`. + A project with an unset image accepts *any* image. Fine if treated as + intentional (commit message says guardrail is misconfig protection, not + security), but worth flagging. +- **No "test webhook" button anywhere.** With per-entity URLs, users have + no way to verify before pointing CI at it. The git diff doesn't add a + ping endpoint either. Follow-up. +- **Settings › Integrations page has a dead-end card** for incoming + webhooks (`integrations/+page.svelte:91-94`): just text saying "go to + the project page". No link, no list of projects. Adds friction. + +**API/UI consistency** + +- `WebhookUrlResponse` shape matches between Go (`internal/api/webhooks.go:17-20`) + and TS (`web/src/lib/api.ts:325-328`). +- `Project.WebhookSecret` and `StaticSite.WebhookSecret` use `json:"-"` + (`internal/store/models.go:14, 253`) — secrets never leak through the + general project/site list endpoints. Good. + +**i18n** + +- New keys `projectDetail.webhookTitle/webhookDesc`, `sites.webhookTitle/webhookDesc`, + `webhookPanel.*`, `settingsIntegrations.*` exist in both `en.json` and + `ru.json`. Verified parallel structure. + +## Feature: Settings Page Split (e08acf5) + +**What it claims:** split the 547-line `settings/+page.svelte` into +focused pages; group the sidebar; each page does its own partial PUT. 
+ +**Sidebar groups** (from `+layout.svelte:32-50` and `64-72`): + +- *Overview*: General, Integrations +- *Routing*: Registries, NPM/Traefik (conditional), DNS +- *System*: Maintenance, Backups +- *Security*: Authentication + +**Old setting → new page mapping** + +| Old setting (HEAD~5 `+page.svelte`) | New location | Status | +|---|---|---| +| Domain / Server IP / Public IP | `/settings` (Overview) | ✓ kept | +| Network / Subdomain pattern | `/settings` | ✓ kept | +| Polling interval / Base volume path | `/settings` | ✓ kept | +| Notification URL | `/settings/integrations` | ✓ moved | +| Stale threshold | `/settings/maintenance` | ✓ moved | +| Image prune threshold | `/settings/maintenance` (Danger zone card) | ✓ moved | +| Prune Images button | `/settings/maintenance` | ✓ moved into separate Danger card | +| Wildcard DNS / Cloudflare token / Zone | `/settings/dns` | ✓ moved | +| Test DNS connection | `/settings/dns` | ✓ moved | +| Proxy provider radio | `/settings` | ✓ kept (with link to /settings/{npm\|traefik}) | +| **Global webhook URL** | n/a — feature removed (per-entity now) | ✓ intentional | +| Stats interval / retention (NEW) | `/settings/maintenance` | ✓ added in same commit's diff | + +**Verdict:** every setting from the old page is reachable. Nothing +orphaned. Credentials page (`/settings/credentials/+page.svelte`) was +deleted and the sidebar entry was already gone at HEAD~5, so no broken +link. Checked by code reading (no dev server was run): the sidebar's +`provider`-conditional NPM / Traefik items are still wired +(`+layout.svelte:54-55`). + +**Gaps / broken flows** + +- **Each page issues an independent `getSettings()` on mount.** Navigating + through the sidebar reloads the entire 30-field settings blob each time. + Not broken, but a shared cache or layout-level fetch would halve the + payload. Follow-up. +- **Save scoping is correct** — each page builds a `Partial` of + only its own keys (e.g. `maintenance/+page.svelte:54-59`). Confirmed by + reading all four split pages. 
+- **DNS page does not have an inline link to fall back from "test failed"** + to the General/proxy page. Minor. + +**i18n** + +- New `settings.groupMain/groupProxy/groupSystem/groupSecurity`, + `settingsDns.*`, `settingsIntegrations.*`, `settingsMaintenance.*`, + `statsSettings.*`, `settingsGeneral.globalConfigDesc/configureNpm/...` + all present in both locales. + +## Fix: Naive UTC Timestamp Handling (03d58a0) + +**Reach:** the fix is in `toDate()` (`web/src/lib/format/datetime.ts:34-46`) +via `normalizeIsoUtc`. **Every** consumer of `$fmt.*` therefore inherits +the fix: + +``` +web/src/routes/+layout.svelte +web/src/routes/+page.svelte +web/src/routes/projects/+page.svelte +web/src/routes/projects/[id]/+page.svelte +web/src/routes/projects/[id]/volumes/[volId]/browse/+page.svelte +web/src/routes/sites/+page.svelte +web/src/routes/sites/[id]/+page.svelte +web/src/routes/stacks/+page.svelte +web/src/routes/stacks/[id]/+page.svelte +web/src/routes/settings/backup/+page.svelte +web/src/lib/components/EventLogEntry.svelte +web/src/lib/components/InstanceCard.svelte +web/src/lib/components/StaleContainerCard.svelte +web/src/lib/components/TimezoneSelector.svelte +``` + +**Audit for stragglers:** `Grep new Date(` across the frontend returns 5 +files. Two are inside `format/datetime.ts` and `stores/timezone.ts` (the +fix itself); two are in the `TimezoneSelector` and `+layout.svelte` clock +ticker (`new Date()` with no input — current time, not affected); one is +`routes/events/+page.svelte:55` building a `since` *query parameter* that +is sent to the backend, never displayed. Conclusion: **fix has 100 % reach +for displayed timestamps**. + +`InstanceCard.svelte` lost its private `timeSinceCreated` parser +(commit diff lines 32-43); now uses `$fmt.relative(instance.created_at)`. 
+ +## Feature: Daemon Health Panel + Timezone Selector (90e6e59) + +### Daemon health panel + +**What it claims:** rich Docker /info + /version + NPM aggregates exposed +via `/api/health`; status chips moved into the brand block; new +`SystemDaemonsCard` on the dashboard; shared health store de-duplicates +the 30 s poll. + +**What works** + +- `GET /api/health` (`internal/api/health.go:6-39`) now returns + `database`, `docker` (+ rich info), and conditionally `proxy` (with + NPM aggregates). 8 s timeout, NPM fields fetched only when ping succeeds + so an offline proxy doesn't amplify latency. +- `health.ts:38-66` shared store with single 30 s poll; the layout + consumes it via `$health.docker/proxy/checked` (`+layout.svelte:53-56`) + and `SystemDaemonsCard.svelte:13-19` does the same. No duplicate + fetches — verified by the `inFlight` guard at `health.ts:37`. +- Both panels render the rich payload: container running/paused/stopped + stacked bar, version/api/platform/kernel/cpu/memory/storage/images, + latency, root dir. Proxy panel shows total vs managed proxy hosts (with + proportion meter), access lists, certificates. +- Brand-rail chips at `+layout.svelte:201-242` show DKR + NPM/TRF, with + pulse animation classes (`chip-live`/`chip-down`), running container + count, and proxy host count. Click on a down chip toggles `hintsExpanded`. + +**Daemons checked, by name:** + +- **Docker Engine** — connected via socket; "unhealthy" means the ping + failed (text from `Ping`) or the client wasn't initialised. The user + hint is `daemons.dockerHint` ("Check that the Docker daemon is running…"). +- **Proxy provider** — only checked when one is configured (NPM or Traefik). + "Unhealthy" means `Ping` failed; the panel surfaces `proxy.error` and + the configured URL. If proxy_provider=`none`, panel shows + "Not configured" with a CTA link to `/settings`. +- **Database** — included in the JSON response but not surfaced on the + daemons card. 
The brand-rail also does not show a DB chip; if SQLite + is unreachable the chip rail goes "BOOT" forever (since + `health.ts:50-57` falls back to `prev.docker ?? {connected:false}` and + drops `database`). Minor — but a permanently-unreachable SQLite would + leave the user wondering why everything is dead with no indicator. + +**Gaps / broken flows** + +- **Hardcoded English fallbacks** (i18n leak): + - `web/src/routes/+layout.svelte:194` `aria-label="Close sidebar"` (was already English) + - `web/src/routes/+layout.svelte:201` `aria-label="Service status"` (new in this commit) + - `web/src/routes/+layout.svelte:208` tooltip + `` `Docker daemon · ${dockerHealth?.version ?? 'reachable'}` `` — + "Docker daemon" and "reachable" are English literals; commit added this code + - `web/src/routes/+layout.svelte:208` fallback `'Docker unreachable'` + - `web/src/routes/+layout.svelte:225` fallback `'Proxy unreachable'` + - `web/src/lib/components/SystemDaemonsCard.svelte:98` fallback + `'Docker daemon is not reachable.'` +- **Refresh button has no debounce window**, only an in-flight guard + (`SystemDaemonsCard.svelte:53-61`). Spamming it triggers serial calls. + Acceptable. +- **No DB-down indicator** anywhere visible to the user. Edge case but + worth noting. + +**API/UI consistency** + +- All Docker fields the frontend consumes (`web/src/lib/types.ts:258-285`) + are emitted by `dockerHealth` in `internal/api/health.go:60-100`. Cross-checked + every key (version, api_version, os, arch, kernel, storage_driver, root_dir, + ncpu, memory_total, containers, running, paused, stopped, images, + latency_ms). Matches. +- `ProxyHealth` TS shape (`types.ts:289-296`) matches Go fields: + `provider`, `connected`, `error`, `latency_ms`, `url`, `proxy_hosts`, + `proxy_hosts_managed`, `access_lists`, `certificates`. Matches. + +**i18n** + +- `daemons.*` namespace fully translated in both `en.json:917-953` and + `ru.json:917-953` (parallel keys verified). 
The hardcoded strings above + are the only gaps. + +### Timezone selector + +**What it claims:** user IANA timezone preference with auto-detect, +applied across all `$fmt.*` rendering, persisted in localStorage. + +**Persistence** + +- Stored at `localStorage.dw_timezone` via subscriber on the `timezonePreference` + writable (`web/src/lib/stores/timezone.ts:12,55-59`). Re-read on next page + load by `getInitialPreference` (lines 44-50). Validates the IANA string + before accepting it, falling back to `auto`. +- "Auto" is a sentinel; `effectiveTimezone` derives a concrete IANA zone + from `Intl.DateTimeFormat().resolvedOptions().timeZone` on every read + (lines 66-69), so changing browser zone with auto enabled re-resolves. + +**Application reach** + +- `effectiveTimezone` is consumed by `makeFormatters` in `datetime.ts:117-119`, + which is the single source for the entire `$fmt` reactive store. Every + zone-dependent formatter (`$fmt.dateTime`, `$fmt.date`, etc.) respects the + user zone. **Verified across all consumers listed under the naive-UTC fix + section.** +- One subtle case: `$fmt.relative` is timezone-independent (`datetime.ts:142-156`), + which is correct — "5 m ago" doesn't depend on display zone. + +**Gaps / broken flows** + +- **Selector lives only on `/settings`.** Reasonable home, but no quick + "switch zone" affordance from the brand rail or top bar; you have to + navigate. Minor. +- **No backend record.** The preference is browser-local, so the saved zone + does not follow the user to a fresh device. Commit message acknowledges this + ("purely client-side preference"). Acceptable. + +**i18n** + +- Full `timezone.*` namespace in both locales (`en.json:1117-1136`, + `ru.json:1117-1136`). Picker placeholder is translated. + +## Cross-cutting Issues + +### i18n leaks + +Three groups of runtime strings in user-visible places are still English-only: + +1. `web/src/routes/+layout.svelte:201` `aria-label="Service status"` (new) +2. 
`web/src/routes/+layout.svelte:208,225` chip tooltips include + English literals (`'Docker daemon'`, `'reachable'`, `'Docker unreachable'`, + `'Proxy unreachable'`). +3. `web/src/lib/components/SystemDaemonsCard.svelte:98` fallback message + when `docker.error` is empty. + +`+layout.svelte:194` (`Close sidebar`) was already English at HEAD~5; not a +regression but worth fixing while in the area. + +### Naming consistency + +- Backend uses `snake_case` JSON tags everywhere (`disk_total_bytes`, + `latency_ms`, `proxy_hosts_managed`). TypeScript interfaces use the same. + No drift detected. +- One naming asymmetry: `Settings.WebhookSecret` was deleted from the + Go struct — clean removal. `internal/store/static_sites.go:233`, + `projects.go:53` use new column. SQLite column `webhook_secret` on + `settings` table is left alone (per the migration comment); no row + emits it, so it's dead weight but harmless. + +### Dashboard polling + +`SystemResourcesCard` polls every 15 s on its own (`SystemResourcesCard.svelte:79`). +`ContainerStats` polls every 30 s. `health` store polls every 30 s. +`navCounts` store polls separately. Multiple uncoordinated timers; OK in +practice, but a future optimisation candidate. + +### Confirm dialog UX + +Both `WebhookPanel` and the maintenance "Prune Images" Danger zone use +inline confirms / `ConfirmDialog`. Consistent. The brand-rail "click a down +chip to expand hints" is a third confirm-ish pattern, fine but not +discoverable. + +## Suggested Follow-ups (prioritized) + +1. **Localise the three hardcoded English strings** in + `web/src/routes/+layout.svelte:194,201,208,225` and + `SystemDaemonsCard.svelte:98`. ~15 min, replaces 5 literals with + `$t('daemons.…')` keys (which already exist for most cases — e.g. + `daemons.docker`, `daemons.offline`). +2. **Add owner-name resolution to the "top consumers" widget** + (`SystemResourcesCard.svelte:259-264`). 
Currently only a 12-char ID + + `instance|site` chip; users have no way to know which container is + spiking. +3. **Detect "stats collection disabled" (`stats_interval_seconds=0`) and + tailor the empty-state message** in `SystemResourcesCard.svelte` + instead of always saying "samples every 15 s". +4. **Remove the dead `webhook_secret` column on `settings`** in a future + destructive migration window, OR officially document it as deprecated + in the schema comment. +5. **Add a "Test webhook" button to `WebhookPanel.svelte`** — POSTs a + minimal payload to the URL and surfaces the response. Replaces + guesswork when wiring CI. +6. **Add a DB-down indicator** to the brand rail (a 3rd chip "DB"). The + data is already in `/api/health`; only the UI needs the chip. +7. **Top-N samples 2-minute window** in `internal/api/stats_history.go:178` + should scale with collector interval (`max(2*interval, 2m)`) so users + on slow intervals don't see a falsely-empty widget. +8. **Settings › Integrations dead-end card** — link to the Projects and + Sites lists rather than just text saying "go look there". +9. **Auto-close the WebhookPanel confirm strip on success** (it already + resets, but the strip stays visible until the user clicks Cancel). 
diff --git a/go.mod b/go.mod index 47b6330..34b8649 100644 --- a/go.mod +++ b/go.mod @@ -1,8 +1,6 @@ module github.com/alexei/tinyforge -go 1.24.0 - -toolchain go1.25.0 +go 1.25.0 require ( github.com/coreos/go-oidc/v3 v3.11.0 @@ -43,6 +41,7 @@ require ( go.opentelemetry.io/otel/metric v1.35.0 // indirect go.opentelemetry.io/otel/trace v1.35.0 // indirect golang.org/x/mod v0.18.0 // indirect + golang.org/x/sync v0.20.0 // indirect golang.org/x/sys v0.33.0 // indirect golang.org/x/tools v0.22.0 // indirect modernc.org/libc v1.55.3 // indirect diff --git a/go.sum b/go.sum index 1bab368..6d1a0fa 100644 --- a/go.sum +++ b/go.sum @@ -87,6 +87,8 @@ golang.org/x/oauth2 v0.25.0 h1:CY4y7XT9v0cRI9oupztF8AgiIu99L/ksR/Xp/6jrZ70= golang.org/x/oauth2 v0.25.0/go.mod h1:XYTD2NtWslqkgxebSiOHnXEap4TF09sJSc7H1sXbhtI= golang.org/x/sync v0.7.0 h1:YsImfSBoP9QPYL0xyKJPq0gcaJdG3rInoqxTWbfQu9M= golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk= +golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4= +golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0= golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.33.0 h1:q3i8TbbEz+JRD9ywIRlyRAQbM0qF7hu24q3teo2hbuw= golang.org/x/sys v0.33.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k= diff --git a/internal/api/docker.go b/internal/api/docker.go index 5b6db93..80a6f4c 100644 --- a/internal/api/docker.go +++ b/internal/api/docker.go @@ -5,15 +5,38 @@ import ( "encoding/json" "errors" "fmt" + "io" "log/slog" "net/http" + "regexp" + "strconv" "strings" + "sync" + "time" "github.com/go-chi/chi/v5" "github.com/alexei/tinyforge/internal/store" ) +// Limits and constants for the log endpoints. 
+const ( + defaultLogTail = 200 + maxLogTail = 5000 + maxJSONLogBytes = 4 << 20 // 4 MiB cap for non-streaming log responses + maxLogLineBytes = 1 << 20 // 1 MiB max line length for the bufio.Scanner + logHeartbeatPeriod = 20 * time.Second +) + +// ANSI escape sequence patterns. Stripped from streamed log lines so a +// hostile container cannot inject terminal control sequences (cursor moves, +// hyperlink escapes, screen clears) into operator displays or pasted output. +var ( + ansiCSIPattern = regexp.MustCompile(`\x1b\[[0-9;?]*[ -/]*[@-~]`) + ansiOSCPattern = regexp.MustCompile(`\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)`) + ctlBytePattern = regexp.MustCompile(`[\x00-\x08\x0b-\x1a\x1c-\x1f\x7f]`) +) + // listProjectImages handles GET /api/projects/{id}/images. // Returns all local Docker images matching the project's image reference. func (s *Server) listProjectImages(w http.ResponseWriter, r *http.Request) { @@ -50,6 +73,8 @@ func (s *Server) listProjectImages(w http.ResponseWriter, r *http.Request) { // - tail: number of lines from end (default "200") // - follow: "true" to stream new lines in real-time func (s *Server) streamContainerLogs(w http.ResponseWriter, r *http.Request) { + projectID := chi.URLParam(r, "id") + stageID := chi.URLParam(r, "stage") instanceID := chi.URLParam(r, "iid") inst, err := s.store.GetInstanceByID(instanceID) @@ -63,6 +88,14 @@ func (s *Server) streamContainerLogs(w http.ResponseWriter, r *http.Request) { return } + // Verify the instance actually belongs to the project/stage in the path. + // Without this, a user could stream logs for any instance ID by guessing + // it under the wrong project — defence-in-depth for future per-project ACLs. 
+ if inst.ProjectID != projectID || inst.StageID != stageID { + respondNotFound(w, "instance") + return + } + if inst.ContainerID == "" { respondError(w, http.StatusBadRequest, "instance has no container") return @@ -80,10 +113,7 @@ func (s *Server) streamLogsForContainer(w http.ResponseWriter, r *http.Request, return } - tail := r.URL.Query().Get("tail") - if tail == "" { - tail = "200" - } + tail := parseTailParam(r.URL.Query().Get("tail")) follow := r.URL.Query().Get("follow") == "true" // Check if client accepts SSE. @@ -99,8 +129,10 @@ func (s *Server) streamLogsForContainer(w http.ResponseWriter, r *http.Request, defer logReader.Close() if !isSSE { - // JSON mode: read all lines and return as array. - scanner := bufio.NewScanner(logReader) + // JSON mode: cap the total bytes read so a chatty container with + // tail=large cannot exhaust server memory. + scanner := bufio.NewScanner(io.LimitReader(logReader, maxJSONLogBytes)) + scanner.Buffer(make([]byte, 0, 64*1024), maxLogLineBytes) var lines []string for scanner.Scan() { line := sanitizeDockerLogLine(scanner.Text()) @@ -116,6 +148,12 @@ func (s *Server) streamLogsForContainer(w http.ResponseWriter, r *http.Request, } // SSE mode: stream lines as they arrive. + release, ok := acquireSSESlot(w, s.sseGate) + if !ok { + return + } + defer release() + flusher, ok := w.(http.Flusher) if !ok { respondError(w, http.StatusInternalServerError, "streaming not supported") @@ -126,7 +164,31 @@ func (s *Server) streamLogsForContainer(w http.ResponseWriter, r *http.Request, w.Header().Set("Cache-Control", "no-cache") w.Header().Set("Connection", "keep-alive") + // Heartbeat keeps the connection warm through proxies that close idle + // streams. Sent as an SSE comment which the EventSource API ignores. 
+ heartbeat := time.NewTicker(logHeartbeatPeriod) + defer heartbeat.Stop() + heartbeatDone := make(chan struct{}) + defer close(heartbeatDone) + var hbMu sync.Mutex + go func() { + for { + select { + case <-heartbeat.C: + hbMu.Lock() + _, _ = io.WriteString(w, ": ping\n\n") + flusher.Flush() + hbMu.Unlock() + case <-heartbeatDone: + return + case <-r.Context().Done(): + return + } + } + }() + scanner := bufio.NewScanner(logReader) + scanner.Buffer(make([]byte, 0, 64*1024), maxLogLineBytes) for scanner.Scan() { line := sanitizeDockerLogLine(scanner.Text()) if line == "" { @@ -134,8 +196,10 @@ func (s *Server) streamLogsForContainer(w http.ResponseWriter, r *http.Request, } data, _ := json.Marshal(map[string]string{"line": line}) + hbMu.Lock() fmt.Fprintf(w, "data: %s\n\n", data) flusher.Flush() + hbMu.Unlock() // Check if client disconnected. select { @@ -146,17 +210,67 @@ func (s *Server) streamLogsForContainer(w http.ResponseWriter, r *http.Request, } } +// parseTailParam validates and clamps the ?tail= query value. Empty/invalid +// inputs fall back to the default; values above the cap are clamped down. +// "all" is rejected — letting the caller request unbounded log history is a +// trivial DoS vector. +func parseTailParam(raw string) string { + if raw == "" { + return strconv.Itoa(defaultLogTail) + } + n, err := strconv.Atoi(raw) + if err != nil || n <= 0 { + return strconv.Itoa(defaultLogTail) + } + if n > maxLogTail { + n = maxLogTail + } + return strconv.Itoa(n) +} + // sanitizeDockerLogLine strips the Docker log stream header (8-byte prefix) -// that Docker adds to non-TTY container logs. +// that Docker adds to non-TTY container logs, and removes terminal control +// sequences so a hostile container cannot inject ANSI escapes that hijack an +// operator's terminal when log output is pasted or rendered raw. func sanitizeDockerLogLine(line string) string { // Docker multiplexed stream: first 8 bytes are header (stream type + size). 
// If the line starts with a non-printable byte followed by 0x00 0x00 0x00, strip 8 bytes. if len(line) > 8 && (line[0] == 1 || line[0] == 2) && line[1] == 0 && line[2] == 0 && line[3] == 0 { - return line[8:] + line = line[8:] } + line = ansiOSCPattern.ReplaceAllString(line, "") + line = ansiCSIPattern.ReplaceAllString(line, "") + line = ctlBytePattern.ReplaceAllString(line, "") return line } +// buildActiveImagesSet returns the set of "image:tag" strings currently used +// by any instance, computed in a single DB pass instead of N×K queries. +// Returning an error (rather than swallowing) prevents prune logic from +// treating a transient DB failure as "nothing is active". +func buildActiveImagesSet(st *store.Store, projects []store.Project) (map[string]bool, error) { + imageByProject := make(map[string]string, len(projects)) + for _, p := range projects { + imageByProject[p.ID] = p.Image + } + instances, err := st.ListAllInstances() + if err != nil { + return nil, fmt.Errorf("list instances: %w", err) + } + active := make(map[string]bool, len(instances)) + for _, inst := range instances { + if inst.ImageTag == "" { + continue + } + image := imageByProject[inst.ProjectID] + if image == "" { + continue + } + active[image+":"+inst.ImageTag] = true + } + return active, nil +} + // unusedImageStats handles GET /api/docker/unused-images. // Returns the total size of unused project images and whether the threshold is exceeded. func (s *Server) unusedImageStats(w http.ResponseWriter, r *http.Request) { @@ -181,18 +295,14 @@ func (s *Server) unusedImageStats(w http.ResponseWriter, r *http.Request) { return } - // Build set of active image refs. 
- activeImages := make(map[string]bool) - for _, p := range projects { - stages, _ := s.store.GetStagesByProjectID(p.ID) - for _, st := range stages { - instances, _ := s.store.GetInstancesByStageID(st.ID) - for _, inst := range instances { - if inst.ImageTag != "" { - activeImages[p.Image+":"+inst.ImageTag] = true - } - } - } + // Build set of active image refs in one DB pass instead of N×K queries. + // A flaky read here previously masqueraded as "no images are active", + // which on the prune endpoint would have deleted *running* images. + activeImages, err := buildActiveImagesSet(s.store, projects) + if err != nil { + slog.Error("unused images: build active set", "error", err) + respondError(w, http.StatusInternalServerError, "internal server error") + return } // Sum unused image sizes. @@ -242,18 +352,14 @@ func (s *Server) pruneImages(w http.ResponseWriter, r *http.Request) { return } - // Build a set of image refs used by active instances. - activeImages := make(map[string]bool) - for _, p := range projects { - stages, _ := s.store.GetStagesByProjectID(p.ID) - for _, st := range stages { - instances, _ := s.store.GetInstancesByStageID(st.ID) - for _, inst := range instances { - if inst.ImageTag != "" { - activeImages[p.Image+":"+inst.ImageTag] = true - } - } - } + // Build a set of image refs used by active instances. Bail out on error + // — silently treating a DB blip as "no active images" would prune + // images currently in use by running containers. + activeImages, err := buildActiveImagesSet(s.store, projects) + if err != nil { + slog.Error("prune: build active set", "error", err) + respondError(w, http.StatusInternalServerError, "internal server error") + return } // Collect all unique image bases from projects (without tags). 
diff --git a/internal/api/health.go b/internal/api/health.go index 748977f..ec49f63 100644 --- a/internal/api/health.go +++ b/internal/api/health.go @@ -5,20 +5,57 @@ import ( "net/http" "time" + "github.com/alexei/tinyforge/internal/auth" "github.com/alexei/tinyforge/internal/proxy" ) +// healthProbeTimeout caps a single health probe so a stuck dependency does +// not hold the polling endpoint open. The UI polls every 30 s, so 8 s leaves +// headroom for the ping + Info + NPM list calls. +const healthProbeTimeout = 8 * time.Second + +// nonAdminDockerFields enumerates the fields any authenticated user is +// allowed to see: version, connectivity, container and image counts, and +// CPU/memory capacity. Host-detail fields (kernel, root_dir, hostname, OS, +// storage driver) are admin-only to avoid recon information leaks. +var nonAdminDockerFields = map[string]bool{ + "connected": true, + "latency_ms": true, + "error": true, + "version": true, + "api_version": true, + "containers": true, + "running": true, + "paused": true, + "stopped": true, + "images": true, + "ncpu": true, + "memory_total": true, +} + +// nonAdminProxyFields are the proxy fields safe to share with non-admins. +// Configured URLs and aggregate counts of internal lists/certs are stripped. +var nonAdminProxyFields = map[string]bool{ + "provider": true, + "connected": true, + "latency_ms": true, + "error": true, + "proxy_hosts_managed": true, +} + // getHealth handles GET /api/health. // -// Returns the connectivity state and (when connected) rich diagnostics for the -// Docker daemon and the active proxy provider. This endpoint is polled by the -// UI every 30 seconds — keep the calls cheap. The expensive NPM list calls -// are only issued when the initial ping succeeds, so a down proxy never -// amplifies latency. +// Returns the connectivity state and (when connected) diagnostics for the +// Docker daemon and the active proxy provider. 
Detailed host information +// (kernel, root_dir, internal NPM URL, …) is stripped for non-admin users to +// avoid leaking infrastructure details to read-only viewers. func (s *Server) getHealth(w http.ResponseWriter, r *http.Request) { - ctx, cancel := context.WithTimeout(r.Context(), 8*time.Second) + ctx, cancel := context.WithTimeout(r.Context(), healthProbeTimeout) defer cancel() + claims, _ := auth.ClaimsFromContext(r.Context()) + isAdmin := claims.Role == "admin" + now := time.Now().UTC().Format(time.RFC3339) result := map[string]any{ "checked_at": now, @@ -32,16 +69,35 @@ func (s *Server) getHealth(w http.ResponseWriter, r *http.Request) { } // ── Docker daemon ──────────────────────────────────────────────── - result["docker"] = s.dockerHealth(ctx) + docker := s.dockerHealth(ctx) + if !isAdmin { + docker = filterFields(docker, nonAdminDockerFields) + } + result["docker"] = docker // ── Proxy provider ─────────────────────────────────────────────── if s.proxyProvider != nil { - result["proxy"] = s.proxyHealth(ctx) + proxyInfo := s.proxyHealth(ctx) + if !isAdmin { + proxyInfo = filterFields(proxyInfo, nonAdminProxyFields) + } + result["proxy"] = proxyInfo } respondJSON(w, http.StatusOK, result) } +// filterFields returns a copy of m containing only the keys present in allow. +func filterFields(m map[string]any, allow map[string]bool) map[string]any { + out := make(map[string]any, len(allow)) + for k, v := range m { + if allow[k] { + out[k] = v + } + } + return out +} + // dockerHealth probes the Docker daemon and, if reachable, attaches a full // DaemonInfo snapshot. 
The caller does not need to error-check the Info() // call — if it fails, the connected flag remains true (ping succeeded) but diff --git a/internal/api/middleware.go b/internal/api/middleware.go index f0f0604..af814a4 100644 --- a/internal/api/middleware.go +++ b/internal/api/middleware.go @@ -4,12 +4,15 @@ import ( "log/slog" "net/http" "runtime/debug" + "strings" "sync" "time" ) // logging is an HTTP middleware that logs every request with method, path, -// status code, and duration. +// status code, and duration. Webhook URLs are redacted before being logged +// because the secret is the only authenticator — leaking it to log +// aggregators is equivalent to leaking the credential. func logging(next http.Handler) http.Handler { return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { start := time.Now() @@ -19,13 +22,26 @@ func logging(next http.Handler) http.Handler { slog.Info("http request", "method", r.Method, - "path", r.URL.Path, + "path", redactPath(r.URL.Path), "status", wrapped.status, "duration", time.Since(start).String(), ) }) } +// redactPath strips secrets from URL paths that carry them in segments. +func redactPath(path string) string { + const projectPrefix = "/api/webhook/" + const sitePrefix = "/api/webhook/sites/" + switch { + case strings.HasPrefix(path, sitePrefix): + return sitePrefix + "***" + case strings.HasPrefix(path, projectPrefix): + return projectPrefix + "***" + } + return path +} + // recovery is an HTTP middleware that catches panics and returns a 500 response. func recovery(next http.Handler) http.Handler { return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { @@ -145,6 +161,24 @@ func jsonContentType(next http.Handler) http.Handler { }) } +// rateLimitMiddleware wraps a handler with per-IP rate limiting using the +// supplied limiter. Requests over the limit get 429. 
+func rateLimitMiddleware(rl *rateLimiter) func(http.Handler) http.Handler { + return func(next http.Handler) http.Handler { + return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + ip := r.RemoteAddr + if fwd := r.Header.Get("X-Forwarded-For"); fwd != "" { + ip = fwd + } + if !rl.allow(ip) { + respondError(w, http.StatusTooManyRequests, "rate limit exceeded") + return + } + next.ServeHTTP(w, r) + }) + } +} + // statusRecorder wraps http.ResponseWriter to capture the status code. type statusRecorder struct { http.ResponseWriter diff --git a/internal/api/router.go b/internal/api/router.go index 216823e..ff46c84 100644 --- a/internal/api/router.go +++ b/internal/api/router.go @@ -47,6 +47,7 @@ type Server struct { staticSiteManager *staticsite.Manager stackManager *stack.Manager backupEngine *backup.Engine + sseGate *sseGate dbPath string shutdownFunc func() // called after restore to trigger graceful shutdown onBackupSettingsChanged func(enabled bool, intervalHours int) // called when backup settings change @@ -76,6 +77,7 @@ func NewServer( eventBus: eventBus, encKey: encKey, localAuth: localAuth, + sseGate: newSSEGate(maxConcurrentSSEStreams), } // Try to initialize OIDC provider from stored settings. @@ -187,6 +189,7 @@ func (s *Server) Router() chi.Router { r.Use(cors) loginLimiter := newRateLimiter() + webhookLimiter := newRateLimiter() r.Route("/api", func(r chi.Router) { // JSON content type and body size limit for API routes. @@ -201,7 +204,10 @@ func (s *Server) Router() chi.Router { r.Post("/auth/oidc/token", s.oidcExchangeToken) // Webhook handler (uses its own secret-based auth). - r.Mount("/webhook", s.webhook.Route()) + // Per-IP rate limit prevents an attacker who has guessed (or leaked) + // a secret from triggering a deploy storm, and rejects unauthenticated + // brute-force probes over the secret URL space. + r.With(rateLimitMiddleware(webhookLimiter)).Mount("/webhook", s.webhook.Route()) // Protected routes: require valid JWT. 
r.Group(func(r chi.Router) { @@ -340,7 +346,7 @@ func (s *Server) Router() chi.Router { // System resources (read-only). r.Get("/system/stats", s.getSystemStats) r.Get("/system/stats/history", s.getSystemStatsHistory) - r.Get("/system/stats/top", s.listTopContainersByCPU) + r.Get("/system/stats/top", s.listTopContainers) // Admin-only routes: require admin role. r.Group(func(r chi.Router) { diff --git a/internal/api/sse.go b/internal/api/sse.go index ce9085c..affe9a0 100644 --- a/internal/api/sse.go +++ b/internal/api/sse.go @@ -55,6 +55,12 @@ func (s *Server) streamDeployLogs(w http.ResponseWriter, r *http.Request) { } // SSE mode. + release, ok := acquireSSESlot(w, s.sseGate) + if !ok { + return + } + defer release() + flusher, ok := w.(http.Flusher) if !ok { respondError(w, http.StatusInternalServerError, "streaming not supported") @@ -140,6 +146,12 @@ func (s *Server) streamDeployLogs(w http.ResponseWriter, r *http.Request) { // streamEvents handles GET /api/events. // It streams instance status changes and deploy status changes via SSE. func (s *Server) streamEvents(w http.ResponseWriter, r *http.Request) { + release, ok := acquireSSESlot(w, s.sseGate) + if !ok { + return + } + defer release() + flusher, ok := w.(http.Flusher) if !ok { respondError(w, http.StatusInternalServerError, "streaming not supported") diff --git a/internal/api/sse_gate.go b/internal/api/sse_gate.go new file mode 100644 index 0000000..0511619 --- /dev/null +++ b/internal/api/sse_gate.go @@ -0,0 +1,40 @@ +package api + +import ( + "net/http" + "sync/atomic" +) + +// maxConcurrentSSEStreams caps the global number of in-flight SSE +// connections. Each stream holds a goroutine, an event-bus subscription, and +// (for log streams) a Docker daemon TCP socket; a single tab opening +// thousands of EventSources would otherwise exhaust file descriptors. +const maxConcurrentSSEStreams = 256 + +// sseGate is a counting gate that limits concurrent SSE streams. 
+type sseGate struct { + cap int64 + cur atomic.Int64 +} + +func newSSEGate(cap int) *sseGate { return &sseGate{cap: int64(cap)} } + +// enter reserves a slot and returns a release func, or nil if the gate is full. +func (g *sseGate) enter() func() { + if g.cur.Add(1) > g.cap { + g.cur.Add(-1) + return nil + } + return func() { g.cur.Add(-1) } +} + +// acquireSSESlot is a small helper used by every SSE handler to honour the +// global cap. Returns false (and writes a 503) if the cap is reached. +func acquireSSESlot(w http.ResponseWriter, gate *sseGate) (release func(), ok bool) { + release = gate.enter() + if release == nil { + respondError(w, http.StatusServiceUnavailable, "stream limit reached") + return nil, false + } + return release, true +} diff --git a/internal/api/stats_history.go b/internal/api/stats_history.go index 46eb7ef..9d8c3b0 100644 --- a/internal/api/stats_history.go +++ b/internal/api/stats_history.go @@ -4,15 +4,30 @@ import ( "errors" "log/slog" "net/http" + "sort" "strconv" "time" "github.com/go-chi/chi/v5" + "github.com/alexei/tinyforge/internal/auth" "github.com/alexei/tinyforge/internal/stats" "github.com/alexei/tinyforge/internal/store" ) +// topConsumerMinWindow is the minimum recency window a container sample must +// fall within to count toward the "top consumers" list. The handler widens it +// to twice the collector interval (read from settings) when sampling is sparse. +const topConsumerMinWindow = 2 * time.Minute + +// TopContainerSample augments a stats sample with the human-readable owner +// name so the UI can show "project/stage" or the static-site name without an +// extra round-trip per row. +type TopContainerSample struct { + store.ContainerStatsSample + OwnerName string `json:"owner_name"` +} + const ( // defaultHistoryWindow is used when no ?window= param is provided or the // value fails to parse. 
Matches the default retention so the "last 2h" @@ -175,11 +190,11 @@ func (s *Server) streamStaticSiteLogs(w http.ResponseWriter, r *http.Request) { s.streamLogsForContainer(w, r, site.ContainerID) } -// listTopContainersByCPU handles GET /api/system/stats/top?limit=5&by=cpu. +// listTopContainers handles GET /api/system/stats/top?limit=5&by=cpu. // Returns the top-N most recent samples across containers, sorted by CPU or -// memory. Useful for a system dashboard "top consumers" widget without -// requiring the frontend to aggregate per-container history on its own. -func (s *Server) listTopContainersByCPU(w http.ResponseWriter, r *http.Request) { +// memory. Container IDs are stripped for non-admins so a low-privilege viewer +// cannot enumerate workloads outside their scope. +func (s *Server) listTopContainers(w http.ResponseWriter, r *http.Request) { limit := 5 if raw := r.URL.Query().Get("limit"); raw != "" { if n, err := strconv.Atoi(raw); err == nil && n > 0 && n <= 50 { @@ -191,9 +206,16 @@ func (s *Server) listTopContainersByCPU(w http.ResponseWriter, r *http.Request) by = "cpu" } - // Samples from the last 2 minutes window so "top" reflects near-current - // load, not long-dead rows. - samples, err := s.store.ListAllRecentContainerStatsSamples(sinceTimestamp(2 * time.Minute)) + // Samples must be at least as recent as max(2*interval, 2 minutes) so the + // list reflects near-current load even when collection is sparse. 
+ window := topConsumerMinWindow + if settings, err := s.store.GetSettings(); err == nil && settings.StatsIntervalSeconds > 0 { + if scaled := time.Duration(settings.StatsIntervalSeconds*2) * time.Second; scaled > window { + window = scaled + } + } + + samples, err := s.store.ListAllRecentContainerStatsSamples(sinceTimestamp(window)) if err != nil { slog.Error("failed to list container samples for top", "error", err) respondError(w, http.StatusInternalServerError, "failed to list samples") @@ -213,33 +235,75 @@ func (s *Server) listTopContainersByCPU(w http.ResponseWriter, r *http.Request) top = append(top, sm) } - // Partial-sort by the requested metric, descending. For small N a simple - // insertion-like approach is plenty. - sortContainerSamples(top, by) + sort.Slice(top, func(i, j int) bool { + if by == "memory" { + return top[i].MemoryUsage > top[j].MemoryUsage + } + return top[i].CPUPercent > top[j].CPUPercent + }) if len(top) > limit { top = top[:limit] } - respondJSON(w, http.StatusOK, top) -} -// sortContainerSamples sorts in place by CPU (or memory) descending. -// Note: ListContainerStatsSamples with empty ownerID returns no rows — the -// caller uses per-owner-type queries and merges; this helper is applied to -// the already-merged slice. -func sortContainerSamples(s []store.ContainerStatsSample, by string) { - // O(n^2) is fine — N is small (bounded by the number of containers). - for i := 1; i < len(s); i++ { - for j := i; j > 0; j-- { - var less bool - if by == "memory" { - less = s[j].MemoryUsage > s[j-1].MemoryUsage - } else { - less = s[j].CPUPercent > s[j-1].CPUPercent - } - if !less { - break - } - s[j-1], s[j] = s[j], s[j-1] + // Resolve owner names so the UI can show "project/stage" or the site name + // without a per-row round trip. + enriched := s.enrichWithOwnerNames(top) + + // Scrub container IDs for non-admins. 
The owner name is the actionable + // identifier; the container ID is a host-level handle that reveals + // workload existence to viewers who shouldn't have it. + claims, _ := auth.ClaimsFromContext(r.Context()) + if claims.Role != "admin" { + for i := range enriched { + enriched[i].ContainerID = "" } } + + respondJSON(w, http.StatusOK, enriched) +} + +// enrichWithOwnerNames attaches a human-readable owner name to each sample. +// Names are resolved one sample at a time; the slice holds at most 'limit' +// entries (capped at 50), so the lookup cost stays bounded. +func (s *Server) enrichWithOwnerNames(samples []store.ContainerStatsSample) []TopContainerSample { + out := make([]TopContainerSample, len(samples)) + for i, sm := range samples { + out[i] = TopContainerSample{ContainerStatsSample: sm} + switch sm.OwnerType { + case stats.OwnerTypeInstance: + out[i].OwnerName = s.lookupInstanceName(sm.OwnerID) + case stats.OwnerTypeSite: + out[i].OwnerName = s.lookupSiteName(sm.OwnerID) + } + } + return out +} + +// lookupInstanceName returns "project/stage" for an instance, falling back to +// whichever name resolves; empty if the instance or both names cannot be loaded. +func (s *Server) lookupInstanceName(instanceID string) string { + inst, err := s.store.GetInstanceByID(instanceID) + if err != nil { + return "" + } + project, perr := s.store.GetProjectByID(inst.ProjectID) + stage, serr := s.store.GetStageByID(inst.StageID) + switch { + case perr == nil && serr == nil: + return project.Name + "/" + stage.Name + case perr == nil: + return project.Name + case serr == nil: + return stage.Name + } + return "" +} + +// lookupSiteName returns the site's display name or empty on lookup error. 
+func (s *Server) lookupSiteName(siteID string) string { + site, err := s.store.GetStaticSiteByID(siteID) + if err != nil { + return "" + } + return site.Name } diff --git a/internal/api/webhooks.go b/internal/api/webhooks.go index 02560f8..8307a56 100644 --- a/internal/api/webhooks.go +++ b/internal/api/webhooks.go @@ -1,16 +1,28 @@ package api import ( + "crypto/rand" + "encoding/hex" "errors" "log/slog" "net/http" "github.com/go-chi/chi/v5" - "github.com/google/uuid" "github.com/alexei/tinyforge/internal/store" ) +// generateWebhookSecret returns a 256-bit hex-encoded random token. Mirrors +// the helper in internal/store; kept here to avoid an import cycle and so the +// rotation handlers don't pretend to use uuid for what is really a secret. +func generateWebhookSecret() string { + b := make([]byte, 32) + if _, err := rand.Read(b); err != nil { + panic("crypto/rand failed: " + err.Error()) + } + return hex.EncodeToString(b) +} + // webhookURLResponse is the common payload returned by every webhook endpoint. // Clients never see raw secrets except at issue/rotate time via these fields; // the URL shape is "/api/webhook/..." so callers can prepend their own origin. 
@@ -58,7 +70,7 @@ func (s *Server) regenerateProjectWebhook(w http.ResponseWriter, r *http.Request return } - secret := uuid.New().String() + secret := generateWebhookSecret() if err := s.store.SetProjectWebhookSecret(id, secret); err != nil { slog.Error("regenerate project webhook: set secret", "project", id, "error", err) respondError(w, http.StatusInternalServerError, "failed to rotate webhook secret") @@ -107,7 +119,7 @@ func (s *Server) regenerateStaticSiteWebhook(w http.ResponseWriter, r *http.Requ return } - secret := uuid.New().String() + secret := generateWebhookSecret() if err := s.store.SetStaticSiteWebhookSecret(id, secret); err != nil { slog.Error("regenerate site webhook: set secret", "site", id, "error", err) respondError(w, http.StatusInternalServerError, "failed to rotate webhook secret") diff --git a/internal/docker/system.go b/internal/docker/system.go index 2fd92d0..e85af28 100644 --- a/internal/docker/system.go +++ b/internal/docker/system.go @@ -3,9 +3,11 @@ package docker import ( "context" "fmt" + "log/slog" "time" "github.com/moby/moby/client" + "golang.org/x/sync/errgroup" ) // SystemStats is a host-level snapshot combining daemon capacity @@ -42,33 +44,54 @@ type SystemStats struct { DiskTotalBytes int64 `json:"disk_total_bytes"` } -// GetSystemStats returns a one-shot host-level snapshot. The Info() call -// and disk usage call are made in sequence. Disk usage failures do not -// fail the whole call — the result degrades gracefully with zero disk fields. +// GetSystemStats returns a one-shot host-level snapshot. Info and DiskUsage +// are issued in parallel because DiskUsage walks every layer/volume and is +// often the slowest call on a busy host (1-3 s); Info typically completes in +// ~10 ms. Disk usage failures do not fail the whole call — the result +// degrades gracefully with zero disk fields and a warning log. 
func (c *Client) GetSystemStats(ctx context.Context) (SystemStats, error) { - info, err := c.Info(ctx) - if err != nil { - return SystemStats{}, fmt.Errorf("system stats: %w", err) - } + stats := SystemStats{Timestamp: time.Now().UTC()} - stats := SystemStats{ - Timestamp: time.Now().UTC(), - NCPU: info.NCPU, - MemoryTotal: info.MemoryTotal, - Containers: info.Containers, - Running: info.Running, - Paused: info.Paused, - Stopped: info.Stopped, - Images: info.Images, - } + g, gctx := errgroup.WithContext(ctx) - du, derr := c.api.DiskUsage(ctx, client.DiskUsageOptions{ - Containers: true, - Images: true, - Volumes: true, - BuildCache: true, + g.Go(func() error { + info, err := c.Info(gctx) + if err != nil { + return fmt.Errorf("system stats info: %w", err) + } + stats.NCPU = info.NCPU + stats.MemoryTotal = info.MemoryTotal + stats.Containers = info.Containers + stats.Running = info.Running + stats.Paused = info.Paused + stats.Stopped = info.Stopped + stats.Images = info.Images + return nil }) - if derr == nil { + + var du *client.DiskUsageResult + g.Go(func() error { + usage, err := c.api.DiskUsage(gctx, client.DiskUsageOptions{ + Containers: true, + Images: true, + Volumes: true, + BuildCache: true, + }) + if err != nil { + // Disk usage is best-effort; swallow but log so the dashboard + // shows zeroed disk fields rather than failing entirely. 
+ slog.Warn("system stats: disk usage failed", "error", err) + return nil + } + du = &usage + return nil + }) + + if err := g.Wait(); err != nil { + return SystemStats{}, err + } + + if du != nil { stats.DiskImagesBytes = du.Images.TotalSize stats.DiskContainersBytes = du.Containers.TotalSize stats.DiskVolumesBytes = du.Volumes.TotalSize diff --git a/internal/stats/collector.go b/internal/stats/collector.go index ad82fd7..f2e5b99 100644 --- a/internal/stats/collector.go +++ b/internal/stats/collector.go @@ -36,9 +36,11 @@ type Collector struct { store *store.Store docker *docker.Client - stopOnce sync.Once - stop chan struct{} - done chan struct{} + startOnce sync.Once + stopOnce sync.Once + started bool + stop chan struct{} + done chan struct{} } // New creates a new stats collector. Call Start to begin sampling. @@ -52,15 +54,24 @@ func New(s *store.Store, d *docker.Client) *Collector { } // Start launches the background loop. Returns immediately. The loop exits -// when Stop is called. +// when Stop is called. Safe to call multiple times — only the first call has +// an effect. func (c *Collector) Start() { - go c.run() + c.startOnce.Do(func() { + c.started = true + go c.run() + }) } // Stop signals the collector to exit and blocks until it has finished the -// in-flight tick. +// in-flight tick. If Start was never called, Stop returns immediately. func (c *Collector) Stop() { - c.stopOnce.Do(func() { close(c.stop) }) + c.stopOnce.Do(func() { + close(c.stop) + if !c.started { + close(c.done) + } + }) <-c.done } @@ -70,6 +81,15 @@ func (c *Collector) Stop() { func (c *Collector) run() { defer close(c.done) + // Derive a base context that's cancelled when Stop is called so in-flight + // Docker requests abort instead of waiting out their timeout. + baseCtx, cancel := context.WithCancel(context.Background()) + defer cancel() + go func() { + <-c.stop + cancel() + }() + // Wait a few seconds before the first sample so the app has settled. 
select { case <-time.After(3 * time.Second): @@ -90,7 +110,7 @@ func (c *Collector) run() { } } - c.tick(retention) + c.tick(baseCtx, retention) select { case <-time.After(time.Duration(interval) * time.Second): @@ -126,8 +146,8 @@ func (c *Collector) readConfig() (intervalSeconds, retentionHours int) { // persists samples, and prunes rows beyond the retention window. When // the Docker daemon is unreachable the whole tick is skipped with a // single debug log instead of one warning per container. -func (c *Collector) tick(retentionHours int) { - ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) +func (c *Collector) tick(parent context.Context, retentionHours int) { + ctx, cancel := context.WithTimeout(parent, 30*time.Second) defer cancel() pingCtx, pingCancel := context.WithTimeout(ctx, 2*time.Second) @@ -224,10 +244,20 @@ func (c *Collector) sampleAll(ctx context.Context, targets []target) []store.Con var wg sync.WaitGroup for i, t := range targets { + // Acquire the semaphore in the parent loop so ctx cancellation + // short-circuits the queue rather than spawning goroutines that + // block on an unreachable slot. 
+ select { + case sem <- struct{}{}: + case <-ctx.Done(): + // No slot acquired; the ctx.Err() check below exits the loop. + } + if ctx.Err() != nil { + break + } wg.Add(1) go func(i int, t target) { defer wg.Done() - sem <- struct{}{} defer func() { <-sem }() sampleCtx, cancel := context.WithTimeout(ctx, 10*time.Second) @@ -278,8 +308,12 @@ func (c *Collector) recordSystemSample(ctx context.Context, workloadCPU float64, slog.Warn("stats collector: get system stats", "error", err) return } + ts := sysStats.Timestamp.Unix() + if ts <= 0 { + ts = time.Now().UTC().Unix() + } sample := store.SystemStatsSample{ - TS: sysStats.Timestamp.Unix(), + TS: ts, NCPU: sysStats.NCPU, MemoryTotal: sysStats.MemoryTotal, WorkloadCPUPercent: workloadCPU, diff --git a/internal/store/projects.go b/internal/store/projects.go index 16788f3..eb2fa0e 100644 --- a/internal/store/projects.go +++ b/internal/store/projects.go @@ -1,13 +1,34 @@ package store import ( + "crypto/rand" "database/sql" + "encoding/hex" "errors" "fmt" "github.com/google/uuid" ) +// minWebhookSecretLength is the smallest user-supplied webhook secret accepted +// at insert time. Auto-generated secrets are 64 hex chars (256 bits); a +// 32-char floor still leaves 128 bits of brute-force resistance for hex +// alphabets (4 bits per char) and rejects obvious typos / placeholder strings. +const minWebhookSecretLength = 32 + +// generateWebhookSecret returns a 256-bit hex-encoded random token. We use +// crypto/rand directly rather than uuid.New() so the intent ("secret token, +// not identifier") is explicit and the entropy is unambiguous. +func generateWebhookSecret() string { + b := make([]byte, 32) + if _, err := rand.Read(b); err != nil { + // crypto/rand is documented to never fail on supported platforms. + // Best-effort fallback only: uuid.New() also reads crypto/rand and panics if that read fails. + return uuid.New().String() + } + return hex.EncodeToString(b) +} + // projectCols is the canonical column list for projects queries. 
const projectCols = `id, name, registry, image, port, healthcheck, env, volumes, npm_access_list_id, webhook_secret, created_at, updated_at` @@ -19,7 +40,9 @@ func (s *Store) CreateProject(p Project) (Project, error) { p.CreatedAt = Now() p.UpdatedAt = p.CreatedAt if p.WebhookSecret == "" { - p.WebhookSecret = uuid.New().String() + p.WebhookSecret = generateWebhookSecret() + } else if len(p.WebhookSecret) < minWebhookSecretLength { + return Project{}, fmt.Errorf("webhook_secret must be at least %d characters", minWebhookSecretLength) } _, err := s.db.Exec( @@ -163,7 +186,7 @@ func (s *Store) EnsureProjectWebhookSecret(id string) (string, error) { if project.WebhookSecret != "" { return project.WebhookSecret, nil } - secret := uuid.New().String() + secret := generateWebhookSecret() if err := s.SetProjectWebhookSecret(id, secret); err != nil { return "", err } diff --git a/internal/store/static_sites.go b/internal/store/static_sites.go index 8187942..2566ecb 100644 --- a/internal/store/static_sites.go +++ b/internal/store/static_sites.go @@ -22,7 +22,9 @@ func (s *Store) CreateStaticSite(site StaticSite) (StaticSite, error) { site.CreatedAt = Now() site.UpdatedAt = site.CreatedAt if site.WebhookSecret == "" { - site.WebhookSecret = uuid.New().String() + site.WebhookSecret = generateWebhookSecret() + } else if len(site.WebhookSecret) < minWebhookSecretLength { + return StaticSite{}, fmt.Errorf("webhook_secret must be at least %d characters", minWebhookSecretLength) } _, err := s.db.Exec( @@ -301,7 +303,7 @@ func (s *Store) EnsureStaticSiteWebhookSecret(id string) (string, error) { if site.WebhookSecret != "" { return site.WebhookSecret, nil } - secret := uuid.New().String() + secret := generateWebhookSecret() if err := s.SetStaticSiteWebhookSecret(id, secret); err != nil { return "", err } diff --git a/internal/store/stats_samples.go b/internal/store/stats_samples.go index 45e1dab..6e9de03 100644 --- a/internal/store/stats_samples.go +++ 
b/internal/store/stats_samples.go @@ -139,18 +139,28 @@ func (s *Store) ListSystemStatsSamples(sinceTS int64) ([]SystemStatsSample, erro return out, rows.Err() } -// PruneStatsSamplesBefore deletes all samples older than the given unix timestamp -// from both the container and system stats tables. Returns rows deleted across -// both tables. +// PruneStatsSamplesBefore deletes all samples older than the given unix +// timestamp from both the container and system stats tables in a single +// transaction so a crash between the two cannot leave one table pruned and +// the other not. Returns rows deleted across both tables. func (s *Store) PruneStatsSamplesBefore(ts int64) (int64, error) { - r1, err := s.db.Exec(`DELETE FROM container_stats_samples WHERE ts < ?`, ts) + tx, err := s.db.Begin() + if err != nil { + return 0, fmt.Errorf("begin prune tx: %w", err) + } + defer tx.Rollback() + + r1, err := tx.Exec(`DELETE FROM container_stats_samples WHERE ts < ?`, ts) if err != nil { return 0, fmt.Errorf("prune container stats samples: %w", err) } - r2, err := s.db.Exec(`DELETE FROM system_stats_samples WHERE ts < ?`, ts) + r2, err := tx.Exec(`DELETE FROM system_stats_samples WHERE ts < ?`, ts) if err != nil { return 0, fmt.Errorf("prune system stats samples: %w", err) } + if err := tx.Commit(); err != nil { + return 0, fmt.Errorf("commit prune tx: %w", err) + } n1, _ := r1.RowsAffected() n2, _ := r2.RowsAffected() return n1 + n2, nil diff --git a/internal/store/store.go b/internal/store/store.go index a58566d..27f0977 100644 --- a/internal/store/store.go +++ b/internal/store/store.go @@ -4,6 +4,7 @@ import ( "database/sql" "errors" "fmt" + "strings" "time" _ "modernc.org/sqlite" @@ -214,8 +215,17 @@ func (s *Store) runMigrations() error { } for _, m := range migrations { - // Ignore errors from already-applied migrations (duplicate column). 
- _, _ = s.db.Exec(m) + if _, err := s.db.Exec(m); err != nil { + // "duplicate column" / "already exists" are expected when a + // migration has already been applied. Anything else (typo, FK + // conflict, real schema bug) must surface, otherwise the store + // silently runs against the wrong shape. + msg := err.Error() + if !strings.Contains(msg, "duplicate column") && + !strings.Contains(msg, "already exists") { + return fmt.Errorf("apply migration %q: %w", m, err) + } + } } // Create indexes on foreign key columns for query performance. diff --git a/internal/webhook/handler.go b/internal/webhook/handler.go index 9182ce6..545d67e 100644 --- a/internal/webhook/handler.go +++ b/internal/webhook/handler.go @@ -5,15 +5,27 @@ import ( "encoding/json" "errors" "fmt" + "io" "log/slog" "net/http" "strings" + "sync" "github.com/go-chi/chi/v5" "github.com/alexei/tinyforge/internal/store" ) +// maxSiteConcurrentSyncs caps fan-out of background site syncs triggered by +// webhooks. Above this limit, requests are rejected with 503. +const maxSiteConcurrentSyncs = 4 + +// maxWebhookBodyBytes caps the request body size for webhook payloads. The +// /api routes already wrap the body with MaxBytesReader, but the webhook +// router relies on its own limit so changes to the parent middleware can't +// silently increase the cap. +const maxWebhookBodyBytes = 256 * 1024 // 256 KiB + // DeployTriggerer is called when a webhook determines a deploy should happen. // Same interface as registry.DeployTriggerer — kept separate to avoid import cycles. type DeployTriggerer interface { @@ -114,12 +126,28 @@ type Handler struct { store *store.Store deployer DeployTriggerer sites SiteSyncTriggerer + + // Site sync coordination — webhooks fire syncs in the background; Drain + // blocks until those goroutines finish, so a graceful shutdown does not + // kill an in-flight git fetch + container rebuild. 
+	siteSyncCtx    context.Context
+	siteSyncCancel context.CancelFunc
+	siteSyncWG     sync.WaitGroup
+	siteSyncSem    chan struct{}
 }
 
 // NewHandler creates a new webhook Handler. The sites triggerer is optional
 // and may be nil (site webhooks will return 404).
 func NewHandler(st *store.Store, deployer DeployTriggerer, sites SiteSyncTriggerer) *Handler {
-	return &Handler{store: st, deployer: deployer, sites: sites}
+	ctx, cancel := context.WithCancel(context.Background())
+	return &Handler{
+		store:          st,
+		deployer:       deployer,
+		sites:          sites,
+		siteSyncCtx:    ctx,
+		siteSyncCancel: cancel,
+		siteSyncSem:    make(chan struct{}, maxSiteConcurrentSyncs),
+	}
 }
 
 // SetSiteSyncTriggerer injects the static-site manager after construction.
@@ -130,6 +158,13 @@ func (h *Handler) SetSiteSyncTriggerer(s SiteSyncTriggerer) {
 	h.sites = s
 }
 
+// Drain cancels in-flight site syncs and waits for their goroutines to exit.
+// Safe to call from a graceful-shutdown path.
+func (h *Handler) Drain() {
+	h.siteSyncCancel()
+	h.siteSyncWG.Wait()
+}
+
 // Route returns a chi router with the webhook endpoints mounted.
 //
 // Routes:
@@ -183,7 +218,8 @@ func (h *Handler) handleWebhook(w http.ResponseWriter, r *http.Request) {
 	}
 
 	var payload Payload
-	if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
+	dec := json.NewDecoder(io.LimitReader(r.Body, maxWebhookBodyBytes))
+	if err := dec.Decode(&payload); err != nil {
 		respondWebhookError(w, http.StatusBadRequest, "invalid JSON payload")
 		return
 	}
@@ -302,10 +338,20 @@ func (h *Handler) handleSiteWebhook(w http.ResponseWriter, r *http.Request) {
 		return
 	}
 
-	// Body is optional — decode best-effort.
+	// Body is optional. We attempt to decode but accept an empty body (no Ref
+	// filter); a malformed non-empty body is treated as bad-request to avoid
+	// silently bypassing the branch/tag filter.
 	var payload SitePayload
-	if r.ContentLength > 0 {
-		_ = json.NewDecoder(r.Body).Decode(&payload)
+	body, err := io.ReadAll(io.LimitReader(r.Body, maxWebhookBodyBytes))
+	if err != nil {
+		respondWebhookError(w, http.StatusBadRequest, "failed to read request body")
+		return
+	}
+	if len(body) > 0 {
+		if err := json.Unmarshal(body, &payload); err != nil {
+			respondWebhookError(w, http.StatusBadRequest, "invalid JSON payload")
+			return
+		}
 	}
 
 	if payload.Ref != "" && !siteRefMatches(site, payload.Ref) {
@@ -320,9 +366,20 @@ func (h *Handler) handleSiteWebhook(w http.ResponseWriter, r *http.Request) {
 		return
 	}
 
-	// Fire and forget — sync may take a while (git fetch + container rebuild).
+	// Cap concurrent syncs so a runaway CI cannot fan out unbounded
+	// git-clone goroutines.
+	select {
+	case h.siteSyncSem <- struct{}{}:
+	default:
+		respondWebhookError(w, http.StatusServiceUnavailable, "site sync queue full")
+		return
+	}
+
+	h.siteSyncWG.Add(1)
 	go func(siteID, siteName string) {
-		if err := h.sites.Deploy(context.Background(), siteID, false); err != nil {
+		defer h.siteSyncWG.Done()
+		defer func() { <-h.siteSyncSem }()
+		if err := h.sites.Deploy(h.siteSyncCtx, siteID, false); err != nil {
 			slog.Error("webhook: site sync failed", "site", siteName, "error", err)
 		}
 	}(site.ID, site.Name)
diff --git a/internal/webhook/matcher.go b/internal/webhook/matcher.go
index 2fffe4d..a8c0961 100644
--- a/internal/webhook/matcher.go
+++ b/internal/webhook/matcher.go
@@ -2,6 +2,7 @@ package webhook
 
 import (
 	"fmt"
+	"log/slog"
 	"path"
 	"strings"
 
@@ -24,7 +25,8 @@ func matchStage(st *store.Store, projectID, tag string) (store.Stage, bool, erro
 
 		matched, err := path.Match(pattern, tag)
 		if err != nil {
-			// Invalid pattern — skip this stage.
+			slog.Warn("webhook: invalid tag pattern, skipping stage",
+				"project", projectID, "stage", stage.Name, "pattern", pattern, "error", err)
 			continue
 		}
 		if matched {
@@ -36,9 +38,21 @@ func matchStage(st *store.Store, projectID, tag string) (store.Stage, bool, erro
 }
 
 // imageMatches reports whether an incoming image reference matches the
-// project's stored image. The comparison is case-sensitive and exact.
+// project's stored image. The registry hostname is matched case-insensitively
+// (per RFC: registry hostnames are case-insensitive); the path/owner/name are
+// matched exactly.
 func imageMatches(projectImage, incomingImage string) bool {
-	return projectImage == incomingImage
+	if projectImage == incomingImage {
+		return true
+	}
+	pIdx := strings.IndexByte(projectImage, '/')
+	iIdx := strings.IndexByte(incomingImage, '/')
+	if pIdx <= 0 || iIdx <= 0 {
+		return false
+	}
+	pHost, pPath := projectImage[:pIdx], projectImage[pIdx:]
+	iHost, iPath := incomingImage[:iIdx], incomingImage[iIdx:]
+	return strings.EqualFold(pHost, iHost) && pPath == iPath
 }
 
 // siteRefMatches reports whether a Git ref (e.g.
"refs/heads/main" or diff --git a/web/src/lib/api.ts b/web/src/lib/api.ts index 7e1608c..4f0f73a 100644 --- a/web/src/lib/api.ts +++ b/web/src/lib/api.ts @@ -4,6 +4,7 @@ import type { ContainerStatsSample, SystemStats, SystemStatsSample, + TopContainerSample, Deploy, DeployLog, DockerHealth, @@ -708,8 +709,8 @@ export function fetchTopContainers( by: 'cpu' | 'memory' = 'cpu', limit = 5, signal?: AbortSignal -): Promise { - return get(`/api/system/stats/top?by=${by}&limit=${limit}`, signal); +): Promise { + return get(`/api/system/stats/top?by=${by}&limit=${limit}`, signal); } export function fetchStaticSiteStats(id: string, signal?: AbortSignal): Promise { diff --git a/web/src/lib/components/ContainerStats.svelte b/web/src/lib/components/ContainerStats.svelte index b1bf64d..555cc07 100644 --- a/web/src/lib/components/ContainerStats.svelte +++ b/web/src/lib/components/ContainerStats.svelte @@ -7,6 +7,7 @@ import type { ContainerStats, ContainerStatsSample } from '$lib/types'; import * as api from '$lib/api'; import { t } from '$lib/i18n'; + import { statsInterval } from '$lib/stores/statsInterval'; import ResourceChart from './ResourceChart.svelte'; import type { EChartsOption } from 'echarts'; @@ -74,24 +75,16 @@ }; }); - function formatBytes(bytes: number): string { - if (bytes < 1024) return `${bytes} B`; - const kb = bytes / 1024; - if (kb < 1024) return `${kb.toFixed(0)} KB`; - const mb = kb / 1024; - if (mb < 1024) return `${mb.toFixed(1)} MB`; - const gb = mb / 1024; - return `${gb.toFixed(2)} GB`; - } + import { formatBytes } from '$lib/format/bytes'; - const cpuColor = $derived(() => { + const cpuColor = $derived.by(() => { if (!stats) return 'bg-gray-300'; if (stats.cpu_percent > 80) return 'bg-red-500'; if (stats.cpu_percent > 50) return 'bg-amber-500'; return 'bg-emerald-500'; }); - const memColor = $derived(() => { + const memColor = $derived.by(() => { if (!stats) return 'bg-gray-300'; if (stats.memory_percent > 80) return 'bg-red-500'; if 
(stats.memory_percent > 50) return 'bg-amber-500'; @@ -151,7 +144,7 @@ {$t('stats.cpu')}
@@ -164,7 +157,7 @@ {$t('stats.mem')}
@@ -184,7 +177,9 @@ {#if expanded} {#if history.length === 0}

- {$t('resources.noSamples', { interval: '15' })}
+ {$statsInterval > 0
+ ? $t('resources.noSamples', { interval: String($statsInterval) })
+ : $t('resources.collectionDisabled')}

{:else}
diff --git a/web/src/lib/components/InstanceCard.svelte b/web/src/lib/components/InstanceCard.svelte index 1bb8f27..9d388c6 100644 --- a/web/src/lib/components/InstanceCard.svelte +++ b/web/src/lib/components/InstanceCard.svelte @@ -32,7 +32,7 @@ : instance.subdomain ? `https://${instance.subdomain}` : '' ); - const timeSinceCreated = $derived(() => $fmt.relative(instance.created_at)); + const timeSinceCreated = $derived($fmt.relative(instance.created_at)); async function handleAction(action: 'stop' | 'start' | 'restart' | 'remove') { loading = true; @@ -90,7 +90,7 @@
:{instance.port} - {timeSinceCreated()} + {timeSinceCreated}
diff --git a/web/src/lib/components/ProjectCard.svelte b/web/src/lib/components/ProjectCard.svelte
index d65fcee..914fdb7 100644
--- a/web/src/lib/components/ProjectCard.svelte
+++ b/web/src/lib/components/ProjectCard.svelte
@@ -19,7 +19,7 @@
 	const failedCount = $derived(instances.filter((i) => i.status === 'failed').length);
 	const totalCount = $derived(instances.length);
 
-	const overallStatus = $derived(() => {
+	const overallStatus = $derived.by<'failed' | 'running' | 'stopped'>(() => {
 		if (failedCount > 0) return 'failed';
 		if (runningCount > 0) return 'running';
 		if (stoppedCount > 0) return 'stopped';
@@ -41,7 +41,7 @@

{project.image}

-
+
diff --git a/web/src/lib/components/SystemDaemonsCard.svelte b/web/src/lib/components/SystemDaemonsCard.svelte
index 2a5ce4d..01eee50 100644
--- a/web/src/lib/components/SystemDaemonsCard.svelte
+++ b/web/src/lib/components/SystemDaemonsCard.svelte
@@ -36,12 +36,11 @@
 		totalContainers > 0 ? ((docker?.stopped ?? 0) / totalContainers) * 100 : 0
 	);
 
+	import { formatBytes as formatBytesShared } from '$lib/format/bytes';
+
 	function formatBytes(n: number | undefined): string {
 		if (!n || n <= 0) return '—';
-		const gb = n / 1024 ** 3;
-		if (gb >= 1) return `${gb.toFixed(1)} GB`;
-		const mb = n / 1024 ** 2;
-		return `${mb.toFixed(0)} MB`;
+		return formatBytesShared(n);
 	}
 
 	function formatMs(n: number | undefined): string {
@@ -95,7 +94,7 @@
 {:else if !dockerConnected}
- {docker?.error ?? 'Docker daemon is not reachable.'}
+ {docker?.error ?? $t('daemons.dockerNotReachable')}

{$t('daemons.dockerHint')}

{:else} diff --git a/web/src/lib/components/SystemResourcesCard.svelte b/web/src/lib/components/SystemResourcesCard.svelte index 4fafe1b..beb449c 100644 --- a/web/src/lib/components/SystemResourcesCard.svelte +++ b/web/src/lib/components/SystemResourcesCard.svelte @@ -3,31 +3,23 @@ breakdown + top consumers. Drops into the dashboard as its own section. -->