Files
tiny-forge/docs/reviews/functionality-review-2026-05-07.md
T
alexei.dolgolyov a4362b842d
Build / build (push) Successful in 11m42s
fix: harden security, fix concurrency bugs, and address review findings
Security:
- rate limit /api/webhook routes per-IP and cap concurrent site syncs
- global SSE connection cap (256) with new sse_gate
- validate ?tail= and cap JSON log responses at 4 MiB
- strip ANSI/CSI/OSC and control bytes from streamed log lines
- redact webhook secret from request log middleware
- scrub host details from /api/health for non-admin viewers
- drop container_id from /api/system/stats/top for non-admins
- generate webhook secrets via crypto/rand; require >=32 chars on insert
- verify iid path consistency in streamContainerLogs
- LimitReader on site webhook body; reject malformed non-empty bodies

Concurrency / correctness:
- stats collector: Stop() no longer hangs without Start(), semaphore
  acquired in parent loop so ctx cancellation short-circuits the queue,
  in-flight tick cancellable via shared base context, zero-ts guard
- webhook handler: replace fire-and-forget goroutine with WaitGroup-tracked
  workers + Drain() wired into graceful shutdown
- $derived(() => ...) mis-idiom fixed in ContainerStats / InstanceCard /
  ProjectCard (returned function instead of value)
- SystemResourcesCard: rename `window` and `t` locals to avoid shadowing
  globalThis.window and the i18n `t` import

Quality / performance:
- replace O(n^2) insertion sort with sort.Slice in stats top
- runMigrations only swallows duplicate-column / already-exists errors
- PruneStatsSamplesBefore wrapped in a transaction
- collapse N+1 in unusedImageStats / pruneImages to one ListAllInstances
  pass; surface DB errors instead of silently treating them as inactive
- run Docker Info + DiskUsage in parallel via errgroup
- container log SSE emits `: ping` heartbeat every 20 s
- imageMatches case-insensitive on registry host (RFC behaviour)
- log warning on invalid stage tag pattern instead of silent skip
- reject malformed non-empty site webhook payloads

Frontend / i18n:
- shared formatBytes utility replaces three local copies
- statsInterval store drives dynamic "no samples / collection disabled"
  copy across ContainerStats and SystemResourcesCard
- top consumers row now shows owner_name (project/stage or site name)
- drop seven `as any` casts on the Settings type; add cloudflare_api_token
  write-only field
- move "Service status", "Docker daemon", "Docker unreachable",
  "Proxy unreachable", "reachable", and "Docker daemon is not reachable."
  strings into en/ru i18n bundles
2026-05-07 00:56:14 +03:00

22 KiB
Raw Blame History

Functionality Review — 2026-05-07

Last 5 commits reviewed:

  1. 05440a5 feat(stats): resource metrics dashboard + sites logs/stats
  2. 0632f51 feat(webhook): per-project and per-site webhook URLs
  3. e08acf5 refactor(settings): split General into focused pages
  4. 03d58a0 fix: treat naive backend timestamps as UTC for relative labels
  5. 90e6e59 feat: daemon health panel, brand-rail status chips, user timezone selector

Method: desk review of git diff HEAD~5 HEAD plus targeted reads of large new components. No dev-server execution. Citations use absolute paths.

TL;DR

  • Stats dashboard, daemon panel, timezone selector, settings split, and per-entity webhooks all wire end-to-end — every Go endpoint added in these commits has a Svelte caller, every new field on the settings/health shapes is rendered, and i18n is parallel-keyed in en.json and ru.json.
  • One real flow gap: the WebhookPanel confirm button (Project/Site detail) does not auto-close when regenerate succeeds in the "no current URL" case — it stays open until the user manually cancels. Minor.
  • i18n is 99 % complete but three hardcoded English fallbacks slipped in: 'Docker daemon is not reachable.' in SystemDaemonsCard.svelte:98, and 'Service status' / 'Close sidebar' aria-labels plus 'Docker daemon · … reachable' / 'Proxy unreachable' tooltips in +layout.svelte (lines 194, 201, 208, 225). All three are user-visible.
  • Stats collector skips ticks when Docker is unreachable but still calls prune — confirmed safe, but the very first sample after a Docker outage will show no system row for the outage window. Acceptable; documented in code.
  • Naive-UTC fix has full reach: the fix lives in toDate() inside web/src/lib/format/datetime.ts:34-46, so every one of the 15 components that goes through $fmt.* benefits. InstanceCard was the only file that had its own ad-hoc parser; that parser is removed.

Feature: Resource Metrics Dashboard (05440a5)

What it claims: background CPU/memory/network/block I/O collector with configurable interval (5300s, default 15) and retention (024h, default 2h). New host snapshot/history/top-N API endpoints, ECharts visualisation, sites logs/stats reuse instance components, Docker-down 503 handling.

What works

  • Collector lives in internal/stats/collector.go:50-309. It re-reads settings every tick (run/readConfig), so /settings/maintenance changes propagate within one tick. interval=0 legitimately disables collection (run polls settings every minute in that branch).
  • API endpoints and routing are wired: internal/api/router.go:222,289-291,341-343 mounts /api/system/stats, /api/system/stats/history, /api/system/stats/top, plus the per-instance and per-site /stats/history endpoints, all behind the auth middleware.
  • Frontend has matching helpers in web/src/lib/api.ts:683-731 (fetchSystemStats, fetchSystemStatsHistory, fetchTopContainers, fetchInstanceStatsHistory, fetchStaticSiteStats(s)History, fetchStaticSiteLogs).
  • SystemResourcesCard.svelte:33-52 uses Promise.allSettled so a 503 on the live snapshot does not blank out history (which is read from SQLite and remains valid). Docker-unavailable detection at line 67 produces an amber banner with the i18n key resources.dockerUnavailable.
  • ContainerStats.svelte:13-15 and ContainerLogs.svelte:14-16 define the StatsSource/LogSource discriminated unions exactly as the commit message describes; the site detail page uses both at web/src/routes/sites/[id]/+page.svelte:255-279.
  • 30 m / 2 h / 6 h / 24 h window picker exists at SystemResourcesCard.svelte:213-220. parseWindow in internal/api/stats_history.go:21-37 clamps any value to ≤ 24 h, so a hand-crafted ?window=999h query returns the maxed window (good).
  • History persistence survives backend restart — samples live in SQLite (container_stats_samples, system_stats_samples); migrations in internal/store/store.go:128-180 create them additively with IF NOT EXISTS.

Gaps / broken flows

  • Top-consumer rows are unlabelled by name. SystemResourcesCard.svelte:259-264 shows only s.container_id.slice(0,12) plus an instance | site chip. No project/site name, so identifying the offender requires manual lookup. Backend already knows owner_id; resolving to a friendly name would be a one-extra-fetch fix.
  • No "stats off" UI hint. When stats_interval_seconds=0, the collector idles and history endpoints return []. Frontend just shows the "no samples yet" empty state with the default interval (15s) hardcoded in the message (resources.noSamples in en.json:51, ru.json:51) — it does not detect that collection is disabled. Users who toggle stats off will see a confusing "samples every 15s" message forever.
  • Stats settings live on Maintenance page, not on a dedicated card. web/src/routes/settings/maintenance/+page.svelte:117-132 has 4 fields (stale, prune, stats interval, stats retention) sharing one Save button. Not broken, but "Stats collection" is not maintenance — it's a runtime observability feature. Worth a follow-up split.
  • Top endpoint silently filters to last 2 minutes (stats_history.go:178). If the collector interval is 300 s, two of the last three minutes have no samples and the top widget will look empty. Window should grow with interval, e.g. max(2*interval, 2m).

API/UI consistency

  • All snake_case ↔ snake_case (Go json:"…" tags match the TS types in web/src/lib/types.ts:464-516). Spot-checked ContainerStatsSample, SystemStats, SystemStatsSample — perfect alignment.
  • One subtle naming asymmetry: in SystemStats (live snapshot) the field is disk_total_bytes and category breakdowns are disk_images_bytes etc.; in SystemStatsSample (history row) the field is just disk_total_bytes with no breakdown. The chart only uses workload CPU/memory percent, so this is fine, but a future "disk over time" chart would have to either query the live snapshot or the schema would have to grow.

i18n

  • Full coverage. New keys live under dashboard, resources, and statsSettings namespaces, mirrored in ru.json:42-87. No untranslated strings in the touched files.

Feature: Per-Project and Per-Site Webhook URLs (0632f51)

What it claims: replace global settings.webhook_secret with per-row secrets on projects and static_sites; remove webhook-driven autocreate; make site sync_trigger=push|tag actually trigger a sync.

What works

  • Migration is additive and safe: internal/store/store.go:131-138 adds webhook_secret TEXT NOT NULL DEFAULT '' to both tables and creates partial unique indexes (WHERE webhook_secret != '') at store.go:240-241, so multiple legacy rows with empty secrets do not collide.
  • Lazy backfill via EnsureProjectWebhookSecret / EnsureStaticSiteWebhookSecret (internal/store/projects.go:158-171, internal/store/static_sites.go:296-308). UI calls GET /webhook first, which triggers backfill — old projects "just work" the first time you open them.
  • Routing in internal/webhook/handler.go:127-133: POST /api/webhook/{secret} for projects, POST /api/webhook/sites/{secret} for sites. Both return 404 for unknown/empty secrets (no information leak). The order (/sites/{secret} first, then /{secret}) is correct chi-wise because the literal sites segment beats the catch-all.
  • siteRefMatches (internal/webhook/matcher.go:46-90) implements push and tag separately, with empty-Branch ⇒ accept-any-heads, and empty-TagPattern ⇒ *. Manual sites short-circuit at handler.go:295-303.
  • Tests cover both happy and sad paths:
    • internal/webhook/matcher_test.go (push, tag, manual, empty branch, ParseImageRef cases)
    • internal/webhook/handler_test.go (unknown-secret 404, image mismatch, no-stage-match 200/skip, site push match, site manual skip, site branch mismatch).
  • WebhookPanel.svelte is generic, used by both detail pages (projects/[id]/+page.svelte:771-776, sites/[id]/+page.svelte:283-288). Absolutises the URL with window.location.origin at line 30 so users can copy a working URL.
  • Old global routes removed: no /api/settings/webhook-url or /api/settings/webhook-url/regenerate in the diff (router.go:387-388 shows the deletion).

Gaps / broken flows

  • WebhookPanel race / minor UX: handleRegenerate (lines 47-57) hides the confirm strip before the network call. If the call fails, the user sees the toast but the regenerate button reappears with no inline state. Acceptable, but a "retry" affordance would help.
  • Project image guardrail bypass when project.Image is empty. handler.go:206-214: the check is if project.Image != "" && !imageMatches(...). A project with an unset image accepts any image. Fine if treated as intentional (commit message says guardrail is misconfig protection, not security), but worth flagging.
  • No "test webhook" button anywhere. With per-entity URLs, users have no way to verify before pointing CI at it. The git diff doesn't add a ping endpoint either. Follow-up.
  • Settings Integrations page has a dead-end card for incoming webhooks (integrations/+page.svelte:91-94): just text saying "go to the project page". No link, no list of projects. Adds friction.

API/UI consistency

  • WebhookUrlResponse shape matches between Go (internal/api/webhooks.go:17-20) and TS (web/src/lib/api.ts:325-328).
  • Project.WebhookSecret and StaticSite.WebhookSecret use json:"-" (internal/store/models.go:14, 253) — secrets never leak through the general project/site list endpoints. Good.

i18n

  • New keys projectDetail.webhookTitle/webhookDesc, sites.webhookTitle/webhookDesc, webhookPanel.*, settingsIntegrations.* exist in both en.json and ru.json. Verified parallel structure.

Feature: Settings Page Split (e08acf5)

What it claims: split the 547-line settings/+page.svelte into focused pages; group the sidebar; each page does its own partial PUT.

Sidebar groups (from +layout.svelte:32-50 and 64-72):

  • Overview: General, Integrations
  • Routing: Registries, NPM/Traefik (conditional), DNS
  • System: Maintenance, Backups
  • Security: Authentication

Old setting → new page mapping

Old setting (HEAD~5 +page.svelte) New location Status
Domain / Server IP / Public IP /settings (Overview) ✓ kept
Network / Subdomain pattern /settings ✓ kept
Polling interval / Base volume path /settings ✓ kept
Notification URL /settings/integrations ✓ moved
Stale threshold /settings/maintenance ✓ moved
Image prune threshold /settings/maintenance (Danger zone card) ✓ moved
Prune Images button /settings/maintenance ✓ moved into separate Danger card
Wildcard DNS / Cloudflare token / Zone /settings/dns ✓ moved
Test DNS connection /settings/dns ✓ moved
Proxy provider radio /settings ✓ kept (with link to /settings/{npm
Global webhook URL n/a — feature removed (per-entity now) ✓ intentional
Stats interval / retention (NEW) /settings/maintenance ✓ added in same commit's diff

Verdict: every setting from the old page is reachable. Nothing orphaned. Credentials page (/settings/credentials/+page.svelte) was deleted and the sidebar entry was already gone at HEAD~5, so no broken link. Tested: the sidebar's provider-conditional NPM / Traefik items still work (+layout.svelte:54-55).

Gaps / broken flows

  • Each page issues an independent getSettings() on mount. Navigating through the sidebar reloads the entire 30-field settings blob each time. Not broken, but a shared cache or layout-level fetch would halve the payload. Follow-up.
  • Save scoping is correct — each page builds a Partial<Settings> of only its own keys (e.g. maintenance/+page.svelte:54-59). Confirmed by reading all four split pages.
  • DNS page does not have an inline link to fall back from "test failed" to the General/proxy page. Minor.

i18n

  • New settings.groupMain/groupProxy/groupSystem/groupSecurity, settingsDns.*, settingsIntegrations.*, settingsMaintenance.*, statsSettings.*, settingsGeneral.globalConfigDesc/configureNpm/... all present in both locales.

Fix: Naive UTC Timestamp Handling (03d58a0)

Reach: the fix is in toDate() (web/src/lib/format/datetime.ts:34-46) via normalizeIsoUtc. Every consumer of $fmt.* therefore inherits the fix:

web/src/routes/+layout.svelte
web/src/routes/+page.svelte
web/src/routes/projects/+page.svelte
web/src/routes/projects/[id]/+page.svelte
web/src/routes/projects/[id]/volumes/[volId]/browse/+page.svelte
web/src/routes/sites/+page.svelte
web/src/routes/sites/[id]/+page.svelte
web/src/routes/stacks/+page.svelte
web/src/routes/stacks/[id]/+page.svelte
web/src/routes/settings/backup/+page.svelte
web/src/lib/components/EventLogEntry.svelte
web/src/lib/components/InstanceCard.svelte
web/src/lib/components/StaleContainerCard.svelte
web/src/lib/components/TimezoneSelector.svelte

Audit for stragglers: Grep new Date( across the frontend returns 5 files. Two are inside format/datetime.ts and stores/timezone.ts (the fix itself); two are in the TimezoneSelector and +layout.svelte clock ticker (new Date() with no input — current time, not affected); one is routes/events/+page.svelte:55 building a since query parameter that is sent to the backend, never displayed. Conclusion: fix has 100 % reach for displayed timestamps.

InstanceCard.svelte lost its private timeSinceCreated parser (commit diff lines 32-43); now uses $fmt.relative(instance.created_at).

Feature: Daemon Health Panel + Timezone Selector (90e6e59)

Daemon health panel

What it claims: rich Docker /info + /version + NPM aggregates exposed via /api/health; status chips moved into the brand block; new SystemDaemonsCard on the dashboard; shared health store de-duplicates the 30 s poll.

What works

  • GET /api/health (internal/api/health.go:6-39) now returns database, docker (+ rich info), and conditionally proxy (with NPM aggregates). 8 s timeout, NPM fields fetched only when ping succeeds so an offline proxy doesn't amplify latency.
  • health.ts:38-66 shared store with single 30 s poll; the layout consumes it via $health.docker/proxy/checked (+layout.svelte:53-56) and SystemDaemonsCard.svelte:13-19 does the same. No duplicate fetches — verified by the inFlight guard at health.ts:37.
  • Both panels render the rich payload: container running/paused/stopped stacked bar, version/api/platform/kernel/cpu/memory/storage/images, latency, root dir. Proxy panel shows total vs managed proxy hosts (with proportion meter), access lists, certificates.
  • Brand-rail chips at +layout.svelte:201-242 show DKR + NPM/TRF, with pulse animation classes (chip-live/chip-down), running container count, and proxy host count. Click on a down chip toggles hintsExpanded.

Daemons checked, by name:

  • Docker Engine — connected via socket; "unhealthy" means the ping failed (text from Ping) or the client wasn't initialised. The user hint is daemons.dockerHint ("Check that the Docker daemon is running…").
  • Proxy provider — only checked when one is configured (NPM or Traefik). "Unhealthy" means Ping failed; the panel surfaces proxy.error and the configured URL. If proxy_provider=none, panel shows "Not configured" with a CTA link to /settings.
  • Database — included in the JSON response but not surfaced on the daemons card. The brand-rail also does not show a DB chip; if SQLite is unreachable the chip rail goes "BOOT" forever (since health.ts:50-57 falls back to prev.docker ?? {connected:false} and drops database). Minor — but a permanently-unreachable SQLite would leave the user wondering why everything is dead with no indicator.

Gaps / broken flows

  • Hardcoded English fallbacks (i18n leak):
    • web/src/routes/+layout.svelte:194 aria-label="Close sidebar" (was already English)
    • web/src/routes/+layout.svelte:201 aria-label="Service status" (new in this commit)
    • web/src/routes/+layout.svelte:208 tooltip `Docker daemon · ${dockerHealth?.version ?? 'reachable'}` — "Docker daemon" and "reachable" are English literals; commit added this code
    • web/src/routes/+layout.svelte:208 fallback 'Docker unreachable'
    • web/src/routes/+layout.svelte:225 fallback 'Proxy unreachable'
    • web/src/lib/components/SystemDaemonsCard.svelte:98 fallback 'Docker daemon is not reachable.'
  • Refresh button has no debounce window, only an in-flight guard (SystemDaemonsCard.svelte:53-61). Spamming it triggers serial calls. Acceptable.
  • No DB-down indicator anywhere visible to the user. Edge case but worth noting.

API/UI consistency

  • All Docker fields the frontend consumes (web/src/lib/types.ts:258-285) are emitted by dockerHealth in internal/api/health.go:60-100. Cross-checked every key (version, api_version, os, arch, kernel, storage_driver, root_dir, ncpu, memory_total, containers, running, paused, stopped, images, latency_ms). Matches.
  • ProxyHealth TS shape (types.ts:289-296) matches Go fields: provider, connected, error, latency_ms, url, proxy_hosts, proxy_hosts_managed, access_lists, certificates. Matches.

i18n

  • daemons.* namespace fully translated in both en.json:917-953 and ru.json:917-953 (parallel keys verified). The hardcoded strings above are the only gaps.

Timezone selector

What it claims: user IANA timezone preference with auto-detect, applied across all $fmt.* rendering, persisted in localStorage.

Persistence

  • Stored at localStorage.dw_timezone via subscriber on the timezonePreference writable (web/src/lib/stores/timezone.ts:12,55-59). Re-read on next page load by getInitialPreference (lines 44-50). Validates the IANA string before accepting it, falling back to auto.
  • "Auto" is a sentinel; effectiveTimezone derives a concrete IANA zone from Intl.DateTimeFormat().resolvedOptions().timeZone on every read (lines 66-69), so changing browser zone with auto enabled re-resolves.

Application reach

  • effectiveTimezone is consumed by makeFormatters in datetime.ts:117-119, which is the single source for the entire $fmt reactive store. Every $fmt.dateTime, $fmt.date, $fmt.relative etc. respects the user zone. Verified across all 15 consumers listed under the naive-UTC fix section.
  • One subtle case: $fmt.relative is timezone-independent (datetime.ts:142-156), which is correct — "5 m ago" doesn't depend on display zone.

Gaps / broken flows

  • Selector lives only on /settings. Reasonable home, but no quick "switch zone" affordance from the brand rail or top bar; you have to navigate. Minor.
  • No backend record. The preference is browser-local, so logging in on a fresh device shows server time. Commit message acknowledges this ("purely client-side preference"). Acceptable.

i18n

  • Full timezone.* namespace in both locales (en.json:1117-1136, ru.json:1117-1136). Picker placeholder is translated.

Cross-cutting Issues

i18n leaks

Three runtime strings in user-visible places are still English-only:

  1. web/src/routes/+layout.svelte:201 aria-label="Service status" (new)
  2. web/src/routes/+layout.svelte:208,225 chip tooltips include English literals ('Docker daemon', 'reachable', 'Docker unreachable', 'Proxy unreachable').
  3. web/src/lib/components/SystemDaemonsCard.svelte:98 fallback message when docker.error is empty.

+layout.svelte:194 (Close sidebar) was already English at HEAD~5; not a regression but worth fixing while in the area.

Naming consistency

  • Backend uses snake_case JSON tags everywhere (disk_total_bytes, latency_ms, proxy_hosts_managed). TypeScript interfaces use the same. No drift detected.
  • One naming asymmetry: Settings.WebhookSecret was deleted from the Go struct — clean removal. internal/store/static_sites.go:233, projects.go:53 use new column. SQLite column webhook_secret on settings table is left alone (per the migration comment); no row emits it, so it's dead weight but harmless.

Dashboard polling

SystemResourcesCard polls every 15 s on its own (SystemResourcesCard.svelte:79). ContainerStats polls every 30 s. health store polls every 30 s. navCounts store polls separately. Multiple uncoordinated timers; OK in practice, but a future optimisation candidate.

Confirm dialog UX

Both WebhookPanel and the maintenance "Prune Images" Danger zone use inline confirms / ConfirmDialog. Consistent. The brand-rail "click a down chip to expand hints" is a third confirm-ish pattern, fine but not discoverable.

Suggested Follow-ups (prioritized)

  1. Localise the three hardcoded English strings in web/src/routes/+layout.svelte:194,201,208,225 and SystemDaemonsCard.svelte:98. ~15 min, replaces 5 literals with $t('daemons.…') keys (which already exist for most cases — e.g. daemons.docker, daemons.offline).
  2. Add owner-name resolution to the "top consumers" widget (SystemResourcesCard.svelte:259-264). Currently only a 12-char ID + instance|site chip; users have no way to know which container is spiking.
  3. Detect "stats collection disabled" (stats_interval_seconds=0) and tailor the empty-state message in SystemResourcesCard.svelte instead of always saying "samples every 15 s".
  4. Remove the dead webhook_secret column on settings in a future destructive migration window, OR officially document it as deprecated in the schema comment.
  5. Add a "Test webhook" button to WebhookPanel.svelte — POSTs a minimal payload to the URL and surfaces the response. Replaces guesswork when wiring CI.
  6. Add a DB-down indicator to the brand rail (a 3rd chip "DB"). The data is already in /api/health; only the UI needs the chip.
  7. Top-N samples 2-minute window in internal/api/stats_history.go:178 should scale with collector interval (max(2*interval, 2m)) so users on slow intervals don't see a falsely-empty widget.
  8. Settings Integrations dead-end card — link to the Projects and Sites lists rather than just text saying "go look there".
  9. Auto-close the WebhookPanel confirm strip on success (it already resets, but the strip stays visible until the user clicks Cancel).