Files

T

Build / build (push) Successful in 11m42s

Details

fix: harden security, fix concurrency bugs, and address review findings

Security:
- rate limit /api/webhook routes per-IP and cap concurrent site syncs
- global SSE connection cap (256) with new sse_gate
- validate ?tail= and cap JSON log responses at 4 MiB
- strip ANSI/CSI/OSC and control bytes from streamed log lines
- redact webhook secret from request log middleware
- scrub host details from /api/health for non-admin viewers
- drop container_id from /api/system/stats/top for non-admins
- generate webhook secrets via crypto/rand; require >=32 chars on insert
- verify iid path consistency in streamContainerLogs
- LimitReader on site webhook body; reject malformed non-empty bodies

Concurrency / correctness:
- stats collector: Stop() no longer hangs without Start(), semaphore
  acquired in parent loop so ctx cancellation short-circuits the queue,
  in-flight tick cancellable via shared base context, zero-ts guard
- webhook handler: replace fire-and-forget goroutine with WaitGroup-tracked
  workers + Drain() wired into graceful shutdown
- $derived(() => ...) mis-idiom fixed in ContainerStats / InstanceCard /
  ProjectCard (returned function instead of value)
- SystemResourcesCard: rename `window` and `t` locals to avoid shadowing
  globalThis.window and the i18n `t` import

Quality / performance:
- replace O(n^2) insertion sort with sort.Slice in stats top
- runMigrations only swallows duplicate-column / already-exists errors
- PruneStatsSamplesBefore wrapped in a transaction
- collapse N+1 in unusedImageStats / pruneImages to one ListAllInstances
  pass; surface DB errors instead of silently treating them as inactive
- run Docker Info + DiskUsage in parallel via errgroup
- container log SSE emits `: ping` heartbeat every 20 s
- imageMatches case-insensitive on registry host (RFC behaviour)
- log warning on invalid stage tag pattern instead of silent skip
- reject malformed non-empty site webhook payloads

Frontend / i18n:
- shared formatBytes utility replaces three local copies
- statsInterval store drives dynamic "no samples / collection disabled"
  copy across ContainerStats and SystemResourcesCard
- top consumers row now shows owner_name (project/stage or site name)
- drop seven `as any` casts on the Settings type; add cloudflare_api_token
  write-only field
- move "Service status", "Docker daemon", "Docker unreachable",
  "Proxy unreachable", "reachable", and "Docker daemon is not reachable."
  strings into en/ru i18n bundles

2026-05-07 00:56:14 +03:00

22 KiB

Raw Blame History

Functionality Review — 2026-05-07

Last 5 commits reviewed:

05440a5 feat(stats): resource metrics dashboard + sites logs/stats
0632f51 feat(webhook): per-project and per-site webhook URLs
e08acf5 refactor(settings): split General into focused pages
03d58a0 fix: treat naive backend timestamps as UTC for relative labels
90e6e59 feat: daemon health panel, brand-rail status chips, user timezone selector

Method: desk review of git diff HEAD~5 HEAD plus targeted reads of large new components. No dev-server execution. Citations use absolute paths.

TL;DR

Stats dashboard, daemon panel, timezone selector, settings split, and per-entity webhooks all wire end-to-end — every Go endpoint added in these commits has a Svelte caller, every new field on the settings/health shapes is rendered, and i18n is parallel-keyed in en.json and ru.json.
One real flow gap: the WebhookPanel confirm button (Project/Site detail) does not auto-close when regenerate succeeds in the "no current URL" case — it stays open until the user manually cancels. Minor.
i18n is 99 % complete but three hardcoded English fallbacks slipped in: 'Docker daemon is not reachable.' in SystemDaemonsCard.svelte:98, and 'Service status' / 'Close sidebar' aria-labels plus 'Docker daemon · … reachable' / 'Proxy unreachable' tooltips in +layout.svelte (lines 194, 201, 208, 225). All three are user-visible.
Stats collector skips ticks when Docker is unreachable but still calls prune — confirmed safe, but the very first sample after a Docker outage will show no system row for the outage window. Acceptable; documented in code.
Naive-UTC fix has full reach: the fix lives in toDate() inside web/src/lib/format/datetime.ts:34-46, so every one of the 15 components that goes through $fmt.* benefits. InstanceCard was the only file that had its own ad-hoc parser; that parser is removed.

Feature: Resource Metrics Dashboard (`05440a5`)

What it claims: background CPU/memory/network/block I/O collector with configurable interval (5–300s, default 15) and retention (0–24h, default 2h). New host snapshot/history/top-N API endpoints, ECharts visualisation, sites logs/stats reuse instance components, Docker-down 503 handling.

What works

Collector lives in internal/stats/collector.go:50-309. It re-reads settings every tick (run/readConfig), so /settings/maintenance changes propagate within one tick. interval=0 legitimately disables collection (run polls settings every minute in that branch).
API endpoints and routing are wired: internal/api/router.go:222,289-291,341-343 mounts /api/system/stats, /api/system/stats/history, /api/system/stats/top, plus the per-instance and per-site /stats/history endpoints, all behind the auth middleware.
Frontend has matching helpers in web/src/lib/api.ts:683-731 (fetchSystemStats, fetchSystemStatsHistory, fetchTopContainers, fetchInstanceStatsHistory, fetchStaticSiteStats(s)History, fetchStaticSiteLogs).
SystemResourcesCard.svelte:33-52 uses Promise.allSettled so a 503 on the live snapshot does not blank out history (which is read from SQLite and remains valid). Docker-unavailable detection at line 67 produces an amber banner with the i18n key resources.dockerUnavailable.
ContainerStats.svelte:13-15 and ContainerLogs.svelte:14-16 define the StatsSource/LogSource discriminated unions exactly as the commit message describes; the site detail page uses both at web/src/routes/sites/[id]/+page.svelte:255-279.
30 m / 2 h / 6 h / 24 h window picker exists at SystemResourcesCard.svelte:213-220. parseWindow in internal/api/stats_history.go:21-37 clamps any value to ≤ 24 h, so a hand-crafted ?window=999h query returns the maxed window (good).
History persistence survives backend restart — samples live in SQLite (container_stats_samples, system_stats_samples); migrations in internal/store/store.go:128-180 create them additively with IF NOT EXISTS.

Gaps / broken flows

Top-consumer rows are unlabelled by name. SystemResourcesCard.svelte:259-264 shows only s.container_id.slice(0,12) plus an instance | site chip. No project/site name, so identifying the offender requires manual lookup. Backend already knows owner_id; resolving to a friendly name would be a one-extra-fetch fix.
No "stats off" UI hint. When stats_interval_seconds=0, the collector idles and history endpoints return []. Frontend just shows the "no samples yet" empty state with the default interval (15s) hardcoded in the message (resources.noSamples in en.json:51, ru.json:51) — it does not detect that collection is disabled. Users who toggle stats off will see a confusing "samples every 15s" message forever.
Stats settings live on Maintenance page, not on a dedicated card. web/src/routes/settings/maintenance/+page.svelte:117-132 has 4 fields (stale, prune, stats interval, stats retention) sharing one Save button. Not broken, but "Stats collection" is not maintenance — it's a runtime observability feature. Worth a follow-up split.
Top endpoint silently filters to last 2 minutes (stats_history.go:178). If the collector interval is 300 s, two of the last three minutes have no samples and the top widget will look empty. Window should grow with interval, e.g. max(2*interval, 2m).

API/UI consistency

All snake_case ↔ snake_case (Go json:"…" tags match the TS types in web/src/lib/types.ts:464-516). Spot-checked ContainerStatsSample, SystemStats, SystemStatsSample — perfect alignment.
One subtle naming asymmetry: in SystemStats (live snapshot) the field is disk_total_bytes and category breakdowns are disk_images_bytes etc.; in SystemStatsSample (history row) the field is just disk_total_bytes with no breakdown. The chart only uses workload CPU/memory percent, so this is fine, but a future "disk over time" chart would have to either query the live snapshot or the schema would have to grow.

i18n

Full coverage. New keys live under dashboard, resources, and statsSettings namespaces, mirrored in ru.json:42-87. No untranslated strings in the touched files.

Feature: Per-Project and Per-Site Webhook URLs (`0632f51`)

What it claims: replace global settings.webhook_secret with per-row secrets on projects and static_sites; remove webhook-driven autocreate; make site sync_trigger=push|tag actually trigger a sync.

What works

Migration is additive and safe: internal/store/store.go:131-138 adds webhook_secret TEXT NOT NULL DEFAULT '' to both tables and creates partial unique indexes (WHERE webhook_secret != '') at store.go:240-241, so multiple legacy rows with empty secrets do not collide.
Lazy backfill via EnsureProjectWebhookSecret / EnsureStaticSiteWebhookSecret (internal/store/projects.go:158-171, internal/store/static_sites.go:296-308). UI calls GET /webhook first, which triggers backfill — old projects "just work" the first time you open them.
Routing in internal/webhook/handler.go:127-133: POST /api/webhook/{secret} for projects, POST /api/webhook/sites/{secret} for sites. Both return 404 for unknown/empty secrets (no information leak). The order (/sites/{secret} first, then /{secret}) is correct chi-wise because the literal sites segment beats the catch-all.
siteRefMatches (internal/webhook/matcher.go:46-90) implements push and tag separately, with empty-Branch ⇒ accept-any-heads, and empty-TagPattern ⇒ *. Manual sites short-circuit at handler.go:295-303.
Tests cover both happy and sad paths:
- internal/webhook/matcher_test.go (push, tag, manual, empty branch, ParseImageRef cases)
- internal/webhook/handler_test.go (unknown-secret 404, image mismatch, no-stage-match 200/skip, site push match, site manual skip, site branch mismatch).
WebhookPanel.svelte is generic, used by both detail pages (projects/[id]/+page.svelte:771-776, sites/[id]/+page.svelte:283-288). Absolutises the URL with window.location.origin at line 30 so users can copy a working URL.
Old global routes removed: no /api/settings/webhook-url or /api/settings/webhook-url/regenerate in the diff (router.go:387-388 shows the deletion).

Gaps / broken flows

WebhookPanel race / minor UX: handleRegenerate (lines 47-57) hides the confirm strip before the network call. If the call fails, the user sees the toast but the regenerate button reappears with no inline state. Acceptable, but a "retry" affordance would help.
Project image guardrail bypass when project.Image is empty. handler.go:206-214: the check is if project.Image != "" && !imageMatches(...). A project with an unset image accepts any image. Fine if treated as intentional (commit message says guardrail is misconfig protection, not security), but worth flagging.
No "test webhook" button anywhere. With per-entity URLs, users have no way to verify before pointing CI at it. The git diff doesn't add a ping endpoint either. Follow-up.
Settings › Integrations page has a dead-end card for incoming webhooks (integrations/+page.svelte:91-94): just text saying "go to the project page". No link, no list of projects. Adds friction.

API/UI consistency

WebhookUrlResponse shape matches between Go (internal/api/webhooks.go:17-20) and TS (web/src/lib/api.ts:325-328).
Project.WebhookSecret and StaticSite.WebhookSecret use json:"-" (internal/store/models.go:14, 253) — secrets never leak through the general project/site list endpoints. Good.

i18n

New keys projectDetail.webhookTitle/webhookDesc, sites.webhookTitle/webhookDesc, webhookPanel.*, settingsIntegrations.* exist in both en.json and ru.json. Verified parallel structure.

Feature: Settings Page Split (`e08acf5`)

What it claims: split the 547-line settings/+page.svelte into focused pages; group the sidebar; each page does its own partial PUT.

Sidebar groups (from +layout.svelte:32-50 and 64-72):

Overview: General, Integrations
Routing: Registries, NPM/Traefik (conditional), DNS
System: Maintenance, Backups
Security: Authentication

Old setting → new page mapping

Old setting (HEAD~5 `+page.svelte`)	New location	Status
Domain / Server IP / Public IP	`/settings` (Overview)	✓ kept
Network / Subdomain pattern	`/settings`	✓ kept
Polling interval / Base volume path	`/settings`	✓ kept
Notification URL	`/settings/integrations`	✓ moved
Stale threshold	`/settings/maintenance`	✓ moved
Image prune threshold	`/settings/maintenance` (Danger zone card)	✓ moved
Prune Images button	`/settings/maintenance`	✓ moved into separate Danger card
Wildcard DNS / Cloudflare token / Zone	`/settings/dns`	✓ moved
Test DNS connection	`/settings/dns`	✓ moved
Proxy provider radio	`/settings`	✓ kept (with link to /settings/{npm
Global webhook URL	n/a — feature removed (per-entity now)	✓ intentional
Stats interval / retention (NEW)	`/settings/maintenance`	✓ added in same commit's diff

Verdict: every setting from the old page is reachable. Nothing orphaned. Credentials page (/settings/credentials/+page.svelte) was deleted and the sidebar entry was already gone at HEAD~5, so no broken link. Tested: the sidebar's provider-conditional NPM / Traefik items still work (+layout.svelte:54-55).

Gaps / broken flows

Each page issues an independent getSettings() on mount. Navigating through the sidebar reloads the entire 30-field settings blob each time. Not broken, but a shared cache or layout-level fetch would halve the payload. Follow-up.
Save scoping is correct — each page builds a Partial<Settings> of only its own keys (e.g. maintenance/+page.svelte:54-59). Confirmed by reading all four split pages.
DNS page does not have an inline link to fall back from "test failed" to the General/proxy page. Minor.

i18n

New settings.groupMain/groupProxy/groupSystem/groupSecurity, settingsDns.*, settingsIntegrations.*, settingsMaintenance.*, statsSettings.*, settingsGeneral.globalConfigDesc/configureNpm/... all present in both locales.

Fix: Naive UTC Timestamp Handling (`03d58a0`)

Reach: the fix is in toDate() (web/src/lib/format/datetime.ts:34-46) via normalizeIsoUtc. Every consumer of $fmt.* therefore inherits the fix:

web/src/routes/+layout.svelte
web/src/routes/+page.svelte
web/src/routes/projects/+page.svelte
web/src/routes/projects/[id]/+page.svelte
web/src/routes/projects/[id]/volumes/[volId]/browse/+page.svelte
web/src/routes/sites/+page.svelte
web/src/routes/sites/[id]/+page.svelte
web/src/routes/stacks/+page.svelte
web/src/routes/stacks/[id]/+page.svelte
web/src/routes/settings/backup/+page.svelte
web/src/lib/components/EventLogEntry.svelte
web/src/lib/components/InstanceCard.svelte
web/src/lib/components/StaleContainerCard.svelte
web/src/lib/components/TimezoneSelector.svelte

Audit for stragglers: Grep new Date( across the frontend returns 5 files. Two are inside format/datetime.ts and stores/timezone.ts (the fix itself); two are in the TimezoneSelector and +layout.svelte clock ticker (new Date() with no input — current time, not affected); one is routes/events/+page.svelte:55 building a since query parameter that is sent to the backend, never displayed. Conclusion: fix has 100 % reach for displayed timestamps.

InstanceCard.svelte lost its private timeSinceCreated parser (commit diff lines 32-43); now uses $fmt.relative(instance.created_at).

Feature: Daemon Health Panel + Timezone Selector (`90e6e59`)

Daemon health panel

What it claims: rich Docker /info + /version + NPM aggregates exposed via /api/health; status chips moved into the brand block; new SystemDaemonsCard on the dashboard; shared health store de-duplicates the 30 s poll.

What works

GET /api/health (internal/api/health.go:6-39) now returns database, docker (+ rich info), and conditionally proxy (with NPM aggregates). 8 s timeout, NPM fields fetched only when ping succeeds so an offline proxy doesn't amplify latency.
health.ts:38-66 shared store with single 30 s poll; the layout consumes it via $health.docker/proxy/checked (+layout.svelte:53-56) and SystemDaemonsCard.svelte:13-19 does the same. No duplicate fetches — verified by the inFlight guard at health.ts:37.
Both panels render the rich payload: container running/paused/stopped stacked bar, version/api/platform/kernel/cpu/memory/storage/images, latency, root dir. Proxy panel shows total vs managed proxy hosts (with proportion meter), access lists, certificates.
Brand-rail chips at +layout.svelte:201-242 show DKR + NPM/TRF, with pulse animation classes (chip-live/chip-down), running container count, and proxy host count. Click on a down chip toggles hintsExpanded.

Daemons checked, by name:

Docker Engine — connected via socket; "unhealthy" means the ping failed (text from Ping) or the client wasn't initialised. The user hint is daemons.dockerHint ("Check that the Docker daemon is running…").
Proxy provider — only checked when one is configured (NPM or Traefik). "Unhealthy" means Ping failed; the panel surfaces proxy.error and the configured URL. If proxy_provider=none, panel shows "Not configured" with a CTA link to /settings.
Database — included in the JSON response but not surfaced on the daemons card. The brand-rail also does not show a DB chip; if SQLite is unreachable the chip rail goes "BOOT" forever (since health.ts:50-57 falls back to prev.docker ?? {connected:false} and drops database). Minor — but a permanently-unreachable SQLite would leave the user wondering why everything is dead with no indicator.

Gaps / broken flows

Hardcoded English fallbacks (i18n leak):
- web/src/routes/+layout.svelte:194 aria-label="Close sidebar" (was already English)
- web/src/routes/+layout.svelte:201 aria-label="Service status" (new in this commit)
- web/src/routes/+layout.svelte:208 tooltip `Docker daemon · ${dockerHealth?.version ?? 'reachable'}` — "Docker daemon" and "reachable" are English literals; commit added this code
- web/src/routes/+layout.svelte:208 fallback 'Docker unreachable'
- web/src/routes/+layout.svelte:225 fallback 'Proxy unreachable'
- web/src/lib/components/SystemDaemonsCard.svelte:98 fallback 'Docker daemon is not reachable.'
Refresh button has no debounce window, only an in-flight guard (SystemDaemonsCard.svelte:53-61). Spamming it triggers serial calls. Acceptable.
No DB-down indicator anywhere visible to the user. Edge case but worth noting.

API/UI consistency

All Docker fields the frontend consumes (web/src/lib/types.ts:258-285) are emitted by dockerHealth in internal/api/health.go:60-100. Cross-checked every key (version, api_version, os, arch, kernel, storage_driver, root_dir, ncpu, memory_total, containers, running, paused, stopped, images, latency_ms). Matches.
ProxyHealth TS shape (types.ts:289-296) matches Go fields: provider, connected, error, latency_ms, url, proxy_hosts, proxy_hosts_managed, access_lists, certificates. Matches.

i18n

daemons.* namespace fully translated in both en.json:917-953 and ru.json:917-953 (parallel keys verified). The hardcoded strings above are the only gaps.

Timezone selector

What it claims: user IANA timezone preference with auto-detect, applied across all $fmt.* rendering, persisted in localStorage.

Persistence

Stored at localStorage.dw_timezone via subscriber on the timezonePreference writable (web/src/lib/stores/timezone.ts:12,55-59). Re-read on next page load by getInitialPreference (lines 44-50). Validates the IANA string before accepting it, falling back to auto.
"Auto" is a sentinel; effectiveTimezone derives a concrete IANA zone from Intl.DateTimeFormat().resolvedOptions().timeZone on every read (lines 66-69), so changing browser zone with auto enabled re-resolves.

Application reach

effectiveTimezone is consumed by makeFormatters in datetime.ts:117-119, which is the single source for the entire $fmt reactive store. Every $fmt.dateTime, $fmt.date, $fmt.relative etc. respects the user zone. Verified across all 15 consumers listed under the naive-UTC fix section.
One subtle case: $fmt.relative is timezone-independent (datetime.ts:142-156), which is correct — "5 m ago" doesn't depend on display zone.

Gaps / broken flows

Selector lives only on /settings. Reasonable home, but no quick "switch zone" affordance from the brand rail or top bar; you have to navigate. Minor.
No backend record. The preference is browser-local, so logging in on a fresh device shows server time. Commit message acknowledges this ("purely client-side preference"). Acceptable.

i18n

Full timezone.* namespace in both locales (en.json:1117-1136, ru.json:1117-1136). Picker placeholder is translated.

Cross-cutting Issues

i18n leaks

Three runtime strings in user-visible places are still English-only:

web/src/routes/+layout.svelte:201 aria-label="Service status" (new)
web/src/routes/+layout.svelte:208,225 chip tooltips include English literals ('Docker daemon', 'reachable', 'Docker unreachable', 'Proxy unreachable').
web/src/lib/components/SystemDaemonsCard.svelte:98 fallback message when docker.error is empty.

+layout.svelte:194 (Close sidebar) was already English at HEAD~5; not a regression but worth fixing while in the area.

Naming consistency

Backend uses snake_case JSON tags everywhere (disk_total_bytes, latency_ms, proxy_hosts_managed). TypeScript interfaces use the same. No drift detected.
One naming asymmetry: Settings.WebhookSecret was deleted from the Go struct — clean removal. internal/store/static_sites.go:233, projects.go:53 use new column. SQLite column webhook_secret on settings table is left alone (per the migration comment); no row emits it, so it's dead weight but harmless.

Dashboard polling

SystemResourcesCard polls every 15 s on its own (SystemResourcesCard.svelte:79). ContainerStats polls every 30 s. health store polls every 30 s. navCounts store polls separately. Multiple uncoordinated timers; OK in practice, but a future optimisation candidate.

Confirm dialog UX

Both WebhookPanel and the maintenance "Prune Images" Danger zone use inline confirms / ConfirmDialog. Consistent. The brand-rail "click a down chip to expand hints" is a third confirm-ish pattern, fine but not discoverable.

Suggested Follow-ups (prioritized)

Localise the three hardcoded English strings in web/src/routes/+layout.svelte:194,201,208,225 and SystemDaemonsCard.svelte:98. ~15 min, replaces 5 literals with $t('daemons.…') keys (which already exist for most cases — e.g. daemons.docker, daemons.offline).
Add owner-name resolution to the "top consumers" widget (SystemResourcesCard.svelte:259-264). Currently only a 12-char ID + instance|site chip; users have no way to know which container is spiking.
Detect "stats collection disabled" (stats_interval_seconds=0) and tailor the empty-state message in SystemResourcesCard.svelte instead of always saying "samples every 15 s".
Remove the dead webhook_secret column on settings in a future destructive migration window, OR officially document it as deprecated in the schema comment.
Add a "Test webhook" button to WebhookPanel.svelte — POSTs a minimal payload to the URL and surfaces the response. Replaces guesswork when wiring CI.
Add a DB-down indicator to the brand rail (a 3rd chip "DB"). The data is already in /api/health; only the UI needs the chip.
Top-N samples 2-minute window in internal/api/stats_history.go:178 should scale with collector interval (max(2*interval, 2m)) so users on slow intervals don't see a falsely-empty widget.
Settings › Integrations dead-end card — link to the Projects and Sites lists rather than just text saying "go look there".
Auto-close the WebhookPanel confirm strip on success (it already resets, but the strip stays visible until the user clicks Cancel).

22 KiB Raw Blame History Unescape Escape

Functionality Review — 2026-05-07

TL;DR

Feature: Resource Metrics Dashboard (05440a5)

Feature: Per-Project and Per-Site Webhook URLs (0632f51)

Feature: Settings Page Split (e08acf5)

Fix: Naive UTC Timestamp Handling (03d58a0)

Feature: Daemon Health Panel + Timezone Selector (90e6e59)

Daemon health panel

Timezone selector

Cross-cutting Issues

i18n leaks

Naming consistency

Dashboard polling

Confirm dialog UX

Suggested Follow-ups (prioritized)

22 KiB

Raw Blame History

Feature: Resource Metrics Dashboard (`05440a5`)

Feature: Per-Project and Per-Site Webhook URLs (`0632f51`)

Feature: Settings Page Split (`e08acf5`)

Fix: Naive UTC Timestamp Handling (`03d58a0`)

Feature: Daemon Health Panel + Timezone Selector (`90e6e59`)