fix: harden security, fix concurrency bugs, and address review findings

Security:
- rate limit /api/webhook routes per-IP and cap concurrent site syncs
- global SSE connection cap (256) with new sse_gate
- validate ?tail= and cap JSON log responses at 4 MiB
- strip ANSI/CSI/OSC and control bytes from streamed log lines
- redact webhook secret from request log middleware
- scrub host details from /api/health for non-admin viewers
- drop container_id from /api/system/stats/top for non-admins
- generate webhook secrets via crypto/rand; require >=32 chars on insert
- verify iid path consistency in streamContainerLogs
- LimitReader on site webhook body; reject malformed non-empty bodies
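
A minimal sketch of how a global SSE connection cap could work, assuming the gate is a counting semaphore built on a buffered channel. The names `sseGate`, `newSSEGate`, and `tryAcquire` are hypothetical and are not taken from this commit; only the idea of a fixed cap with a release callback comes from the diff.

```go
package main

import (
	"fmt"
	"sync"
)

// sseGate is a hypothetical stand-in for the commit's sse_gate: a counting
// semaphore built on a buffered channel.
type sseGate struct {
	slots chan struct{}
}

func newSSEGate(limit int) *sseGate {
	return &sseGate{slots: make(chan struct{}, limit)}
}

// tryAcquire reserves a slot without blocking. When ok is true the caller
// should invoke release once the stream ends; calling it twice is harmless.
func (g *sseGate) tryAcquire() (release func(), ok bool) {
	select {
	case g.slots <- struct{}{}:
		var once sync.Once
		return func() { once.Do(func() { <-g.slots }) }, true
	default:
		// Cap reached. A handler could reject the request at this point.
		return nil, false
	}
}

func main() {
	gate := newSSEGate(2) // the commit uses 256; 2 keeps the demo short
	r1, ok1 := gate.tryAcquire()
	_, ok2 := gate.tryAcquire()
	_, ok3 := gate.tryAcquire()
	fmt.Println(ok1, ok2, ok3) // true true false
	r1()
	_, ok4 := gate.tryAcquire()
	fmt.Println(ok4) // true
}
```

The non-blocking `select` is the design point: a full gate rejects the new stream immediately instead of queueing it behind long-lived connections.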

Concurrency / correctness:
- stats collector: Stop() no longer hangs without Start(), semaphore
  acquired in parent loop so ctx cancellation short-circuits the queue,
  in-flight tick cancellable via shared base context, zero-ts guard
- webhook handler: replace fire-and-forget goroutine with WaitGroup-tracked
  workers + Drain() wired into graceful shutdown
- $derived(() => ...) mis-idiom fixed in ContainerStats / InstanceCard /
  ProjectCard (returned function instead of value)
- SystemResourcesCard: rename `window` and `t` locals to avoid shadowing
  globalThis.window and the i18n `t` import
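
The WaitGroup-tracked worker pattern mentioned above can be sketched as follows. This is an illustration under assumptions, not the handler's actual code: `syncHandler` and `dispatch` are hypothetical names, and only `Drain()` appears in the diff.

```go
package main

import (
	"fmt"
	"sync"
)

// syncHandler is a hypothetical stand-in for the webhook handler.
type syncHandler struct {
	wg sync.WaitGroup
}

// dispatch runs job in the background. The WaitGroup counter is incremented
// before the goroutine starts, so a later Drain cannot miss the job.
func (h *syncHandler) dispatch(job func()) {
	h.wg.Add(1)
	go func() {
		defer h.wg.Done()
		job()
	}()
}

// Drain blocks until every dispatched job has returned.
func (h *syncHandler) Drain() {
	h.wg.Wait()
}

func main() {
	var h syncHandler
	results := make(chan int, 3)
	for i := 1; i <= 3; i++ {
		id := i // copy so each job sees its own value on any Go version
		h.dispatch(func() { results <- id })
	}
	h.Drain()
	fmt.Println(len(results)) // 3
}
```

Compared with a fire-and-forget goroutine, the only extra cost is the `Add`/`Done` pair. This sketch waits without a deadline; a real shutdown path might bound the wait.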

Quality / performance:
- replace O(n^2) insertion sort with sort.Slice in stats top
- runMigrations only swallows duplicate-column / already-exists errors
- PruneStatsSamplesBefore wrapped in a transaction
- collapse N+1 in unusedImageStats / pruneImages to one ListAllInstances
  pass; surface DB errors instead of silently treating them as inactive
- run Docker Info + DiskUsage in parallel via errgroup
- container log SSE emits `: ping` heartbeat every 20 s
- imageMatches case-insensitive on registry host (RFC behaviour)
- log warning on invalid stage tag pattern instead of silent skip
- reject malformed non-empty site webhook payloads

Frontend / i18n:
- shared formatBytes utility replaces three local copies
- statsInterval store drives dynamic "no samples / collection disabled"
  copy across ContainerStats and SystemResourcesCard
- top consumers row now shows owner_name (project/stage or site name)
- drop seven `as any` casts on the Settings type; add cloudflare_api_token
  write-only field
- move "Service status", "Docker daemon", "Docker unreachable",
  "Proxy unreachable", "reachable", and "Docker daemon is not reachable."
  strings into en/ru i18n bundles
2026-05-07 00:56:14 +03:00
parent 05440a5f92
commit a4362b842d
39 changed files with 1249 additions and 213 deletions
+2 -1
@@ -373,8 +373,9 @@ func main() {
poller.Stop()
statsCollector.Stop()
// Drain in-progress deploys and notifications.
// Drain in-progress deploys, site syncs, and notifications.
dep.Drain()
webhookHandler.Drain()
notifier.Drain()
// Shut down HTTP server.
@@ -0,0 +1,464 @@
# Functionality Review — 2026-05-07
Last 5 commits reviewed:
1. `05440a5` feat(stats): resource metrics dashboard + sites logs/stats
2. `0632f51` feat(webhook): per-project and per-site webhook URLs
3. `e08acf5` refactor(settings): split General into focused pages
4. `03d58a0` fix: treat naive backend timestamps as UTC for relative labels
5. `90e6e59` feat: daemon health panel, brand-rail status chips, user timezone selector
Method: desk review of `git diff HEAD~5 HEAD` plus targeted reads of large
new components. No dev-server execution. Citations use absolute paths.
## TL;DR
- **Stats dashboard, daemon panel, timezone selector, settings split, and
per-entity webhooks all wire end-to-end** — every Go endpoint added in
these commits has a Svelte caller, every new field on the settings/health
shapes is rendered, and i18n is parallel-keyed in `en.json` and `ru.json`.
- **One real flow gap:** the `WebhookPanel` confirm button (Project/Site
detail) does not auto-close when regenerate succeeds in the "no current
URL" case — it stays open until the user manually cancels. Minor.
- **i18n is 99 % complete but three hardcoded English fallbacks slipped in:**
`'Docker daemon is not reachable.'` in `SystemDaemonsCard.svelte:98`, and
`'Service status'` / `'Close sidebar'` aria-labels plus `'Docker daemon · …
reachable'` / `'Proxy unreachable'` tooltips in `+layout.svelte` (lines
194, 201, 208, 225). All three are user-visible.
- **Stats collector skips ticks when Docker is unreachable** but still calls
`prune` — confirmed safe, but the very first sample after a Docker outage
will show no system row for the outage window. Acceptable; documented in
code.
- **Naive-UTC fix has full reach:** the fix lives in `toDate()` inside
`web/src/lib/format/datetime.ts:34-46`, so every one of the 15 components
that goes through `$fmt.*` benefits. `InstanceCard` was the only file
that had its own ad-hoc parser; that parser is removed.
## Feature: Resource Metrics Dashboard (05440a5)
**What it claims:** background CPU/memory/network/block I/O collector with
configurable interval (5–300s, default 15) and retention (0–24h, default
2h). New host snapshot/history/top-N API endpoints, ECharts visualisation,
sites logs/stats reuse instance components, Docker-down 503 handling.
**What works**
- Collector lives in `internal/stats/collector.go:50-309`. It re-reads
settings every tick (`run`/`readConfig`), so `/settings/maintenance`
changes propagate within one tick. `interval=0` legitimately disables
collection (`run` polls settings every minute in that branch).
- API endpoints and routing are wired: `internal/api/router.go:222,289-291,341-343`
mounts `/api/system/stats`, `/api/system/stats/history`,
`/api/system/stats/top`, plus the per-instance and per-site
`/stats/history` endpoints, all behind the auth middleware.
- Frontend has matching helpers in `web/src/lib/api.ts:683-731`
(`fetchSystemStats`, `fetchSystemStatsHistory`, `fetchTopContainers`,
`fetchInstanceStatsHistory`, `fetchStaticSiteStats(s)History`,
`fetchStaticSiteLogs`).
- `SystemResourcesCard.svelte:33-52` uses `Promise.allSettled` so a 503 on
the live snapshot does not blank out history (which is read from SQLite
and remains valid). Docker-unavailable detection at line 67 produces an
amber banner with the i18n key `resources.dockerUnavailable`.
- `ContainerStats.svelte:13-15` and `ContainerLogs.svelte:14-16` define the
`StatsSource`/`LogSource` discriminated unions exactly as the commit
message describes; the site detail page uses both at
`web/src/routes/sites/[id]/+page.svelte:255-279`.
- 30 m / 2 h / 6 h / 24 h window picker exists at
`SystemResourcesCard.svelte:213-220`. `parseWindow` in
`internal/api/stats_history.go:21-37` clamps any value to ≤ 24 h, so a
hand-crafted `?window=999h` query returns the maxed window (good).
- History persistence survives backend restart — samples live in SQLite
(`container_stats_samples`, `system_stats_samples`); migrations in
`internal/store/store.go:128-180` create them additively with
`IF NOT EXISTS`.
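
The clamping behaviour described for `parseWindow` can be sketched roughly as below. This is not the code from `stats_history.go`: the function name `clampWindow`, the 2 h default, and the fallback for unparseable input are assumptions made for illustration. Only the 24 h cap comes from the review.

```go
package main

import (
	"fmt"
	"time"
)

const (
	defaultWindow = 2 * time.Hour  // assumed default, not taken from the source
	maxWindow     = 24 * time.Hour // the cap the review describes
)

// clampWindow parses a ?window= value and bounds it. Invalid or non-positive
// input falls back to the default; oversized input is clamped to the cap.
func clampWindow(raw string) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return defaultWindow
	}
	if d > maxWindow {
		return maxWindow
	}
	return d
}

func main() {
	for _, raw := range []string{"30m", "6h", "999h", "bogus"} {
		fmt.Printf("%-6s -> %s\n", raw, clampWindow(raw))
	}
}
```

Clamping rather than rejecting keeps a hand-edited URL working while still bounding the size of the SQLite query.
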
**Gaps / broken flows**
- **Top-consumer rows are unlabelled by name.** `SystemResourcesCard.svelte:259-264`
shows only `s.container_id.slice(0,12)` plus an `instance | site` chip.
No project/site name, so identifying the offender requires manual lookup.
Backend already knows `owner_id`; resolving to a friendly name would be a
one-extra-fetch fix.
- **No "stats off" UI hint.** When `stats_interval_seconds=0`, the
collector idles and history endpoints return `[]`. Frontend just shows
the "no samples yet" empty state with the *default* interval (15s)
hardcoded in the message (`resources.noSamples` in `en.json:51`,
`ru.json:51`) — it does not detect that collection is disabled. Users
who toggle stats off will see a confusing "samples every 15s" message
forever.
- **Stats settings live on Maintenance page, not on a dedicated card.**
`web/src/routes/settings/maintenance/+page.svelte:117-132` has 4 fields
(stale, prune, stats interval, stats retention) sharing one Save button.
Not broken, but "Stats collection" is *not* maintenance — it's a runtime
observability feature. Worth a follow-up split.
- **Top endpoint silently filters to last 2 minutes** (`stats_history.go:178`).
If the collector interval is 300 s, two of the last three minutes have no
samples and the top widget will look empty. Window should grow with
interval, e.g. `max(2*interval, 2m)`.
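
The suggested `max(2*interval, 2m)` window for the top endpoint could look like this. The helper name `topWindow` is hypothetical; it only illustrates the arithmetic and is not part of the reviewed code.

```go
package main

import (
	"fmt"
	"time"
)

// topWindow returns the look-back window for the top-consumers query:
// twice the collector interval, but never less than two minutes.
func topWindow(interval time.Duration) time.Duration {
	const floor = 2 * time.Minute
	if w := 2 * interval; w > floor {
		return w
	}
	return floor
}

func main() {
	for _, iv := range []time.Duration{15 * time.Second, 300 * time.Second} {
		fmt.Printf("interval=%s window=%s\n", iv, topWindow(iv))
	}
}
```

With the default 15 s interval the window stays at 2 minutes. At 300 s it grows to 10 minutes, so at least one full collection cycle always falls inside it.
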
**API/UI consistency**
- All snake_case ↔ snake_case (Go `json:"…"` tags match the TS types in
`web/src/lib/types.ts:464-516`). Spot-checked
`ContainerStatsSample`, `SystemStats`, `SystemStatsSample` — perfect
alignment.
- One subtle naming asymmetry: in `SystemStats` (live snapshot) the field
is `disk_total_bytes` and category breakdowns are `disk_images_bytes` etc.;
in `SystemStatsSample` (history row) the field is just `disk_total_bytes`
with no breakdown. The chart only uses workload CPU/memory percent, so
this is fine, but a future "disk over time" chart would have to either
query the live snapshot or the schema would have to grow.
**i18n**
- Full coverage. New keys live under `dashboard`, `resources`, and
`statsSettings` namespaces, mirrored in `ru.json:42-87`. No untranslated
strings in the touched files.
## Feature: Per-Project and Per-Site Webhook URLs (0632f51)
**What it claims:** replace global `settings.webhook_secret` with per-row
secrets on `projects` and `static_sites`; remove webhook-driven autocreate;
make site `sync_trigger=push|tag` actually trigger a sync.
**What works**
- Migration is additive and safe:
`internal/store/store.go:131-138` adds `webhook_secret TEXT NOT NULL DEFAULT ''`
to both tables and creates **partial unique indexes** (`WHERE webhook_secret != ''`)
at `store.go:240-241`, so multiple legacy rows with empty secrets do not
collide.
- Lazy backfill via `EnsureProjectWebhookSecret` /
`EnsureStaticSiteWebhookSecret` (`internal/store/projects.go:158-171`,
`internal/store/static_sites.go:296-308`). UI calls `GET /webhook` first,
which triggers backfill — old projects "just work" the first time you
open them.
- Routing in `internal/webhook/handler.go:127-133`:
`POST /api/webhook/{secret}` for projects, `POST /api/webhook/sites/{secret}`
for sites. Both return 404 for unknown/empty secrets (no information leak).
The order (`/sites/{secret}` first, then `/{secret}`) is correct chi-wise
because the literal `sites` segment beats the catch-all.
- `siteRefMatches` (`internal/webhook/matcher.go:46-90`) implements push and
tag separately, with empty-Branch ⇒ accept-any-heads, and empty-TagPattern ⇒
`*`. Manual sites short-circuit at `handler.go:295-303`.
- Tests cover both happy and sad paths:
- `internal/webhook/matcher_test.go` (push, tag, manual, empty branch,
`ParseImageRef` cases)
- `internal/webhook/handler_test.go` (unknown-secret 404, image mismatch,
no-stage-match 200/skip, site push match, site manual skip,
site branch mismatch).
- `WebhookPanel.svelte` is generic, used by both detail pages
(`projects/[id]/+page.svelte:771-776`, `sites/[id]/+page.svelte:283-288`).
Absolutises the URL with `window.location.origin` at line 30 so users can
copy a working URL.
- Old global routes removed: no `/api/settings/webhook-url` or
`/api/settings/webhook-url/regenerate` in the diff (router.go:387-388
shows the deletion).
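
The push/tag matching rules described for `siteRefMatches` can be sketched as below. This is a simplified illustration, not the code in `matcher.go`: the `site` struct, the `refMatches` name, and the use of `path.Match` for tag patterns are all assumptions.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// site is a hypothetical, trimmed-down view of a static site's sync config.
type site struct {
	SyncTrigger string // "push", "tag", or "manual"
	Branch      string // empty means any branch
	TagPattern  string // empty means "*"
}

// refMatches reports whether a Git ref should trigger a sync for s.
func refMatches(s site, ref string) bool {
	switch s.SyncTrigger {
	case "push":
		branch, ok := strings.CutPrefix(ref, "refs/heads/")
		if !ok {
			return false
		}
		return s.Branch == "" || s.Branch == branch
	case "tag":
		tag, ok := strings.CutPrefix(ref, "refs/tags/")
		if !ok {
			return false
		}
		pattern := s.TagPattern
		if pattern == "" {
			pattern = "*"
		}
		// path.Match globbing: "*" does not cross "/" separators.
		matched, err := path.Match(pattern, tag)
		return err == nil && matched
	default:
		// "manual" and unknown triggers never sync from a webhook.
		return false
	}
}

func main() {
	fmt.Println(refMatches(site{SyncTrigger: "push"}, "refs/heads/dev"))
	fmt.Println(refMatches(site{SyncTrigger: "tag", TagPattern: "v*"}, "refs/tags/v1.0"))
	fmt.Println(refMatches(site{SyncTrigger: "manual"}, "refs/heads/main"))
}
```

Treating push and tag as separate branches of the switch means a tag ref can never satisfy a push-triggered site, and the reverse also holds.
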
**Gaps / broken flows**
- **WebhookPanel race / minor UX**: `handleRegenerate` (lines 47-57) hides
the confirm strip *before* the network call. If the call fails, the user
sees the toast but the regenerate button reappears with no inline state.
Acceptable, but a "retry" affordance would help.
- **Project image guardrail bypass when `project.Image` is empty.**
`handler.go:206-214`: the check is `if project.Image != "" && !imageMatches(...)`.
A project with an unset image accepts *any* image. Fine if treated as
intentional (commit message says guardrail is misconfig protection, not
security), but worth flagging.
- **No "test webhook" button anywhere.** With per-entity URLs, users have
no way to verify before pointing CI at it. The git diff doesn't add a
ping endpoint either. Follow-up.
- **Settings Integrations page has a dead-end card** for incoming
webhooks (`integrations/+page.svelte:91-94`): just text saying "go to
the project page". No link, no list of projects. Adds friction.
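
To make the guardrail bypass concrete, here is a small sketch of the check as the review describes it, combined with the host-only case-insensitive comparison from the commit message. `imageAllowed` and `splitHost` are hypothetical helpers, and the rule used to detect a registry host (first path component contains `.` or `:`, or equals `localhost`) is an assumption about how `imageMatches` might work.

```go
package main

import (
	"fmt"
	"strings"
)

// splitHost separates an optional registry host from the repository path.
// The host is lowercased because registry hostnames are case-insensitive.
func splitHost(ref string) (host, repo string) {
	i := strings.IndexByte(ref, '/')
	if i < 0 {
		return "", ref
	}
	first := ref[:i]
	if strings.ContainsAny(first, ".:") || first == "localhost" {
		return strings.ToLower(first), ref[i+1:]
	}
	return "", ref
}

// imageAllowed mirrors the reviewed condition: an empty configured image
// disables the guardrail, so any incoming image is accepted.
func imageAllowed(configured, incoming string) bool {
	if configured == "" {
		return true
	}
	ch, cr := splitHost(configured)
	ih, ir := splitHost(incoming)
	return ch == ih && cr == ir
}

func main() {
	fmt.Println(imageAllowed("", "ghcr.io/other/app"))                // guardrail off
	fmt.Println(imageAllowed("GHCR.io/acme/app", "ghcr.io/acme/app")) // host case ignored
	fmt.Println(imageAllowed("ghcr.io/acme/app", "ghcr.io/acme/api")) // repo differs
}
```

The first call shows the gap flagged above: with no configured image, nothing is compared at all.
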
**API/UI consistency**
- `WebhookUrlResponse` shape matches between Go (`internal/api/webhooks.go:17-20`)
and TS (`web/src/lib/api.ts:325-328`).
- `Project.WebhookSecret` and `StaticSite.WebhookSecret` use `json:"-"`
(`internal/store/models.go:14, 253`) — secrets never leak through the
general project/site list endpoints. Good.
**i18n**
- New keys `projectDetail.webhookTitle/webhookDesc`, `sites.webhookTitle/webhookDesc`,
`webhookPanel.*`, `settingsIntegrations.*` exist in both `en.json` and
`ru.json`. Verified parallel structure.
## Feature: Settings Page Split (e08acf5)
**What it claims:** split the 547-line `settings/+page.svelte` into
focused pages; group the sidebar; each page does its own partial PUT.
**Sidebar groups** (from `+layout.svelte:32-50` and `64-72`):
- *Overview*: General, Integrations
- *Routing*: Registries, NPM/Traefik (conditional), DNS
- *System*: Maintenance, Backups
- *Security*: Authentication
**Old setting → new page mapping**
| Old setting (HEAD~5 `+page.svelte`) | New location | Status |
|---|---|---|
| Domain / Server IP / Public IP | `/settings` (Overview) | ✓ kept |
| Network / Subdomain pattern | `/settings` | ✓ kept |
| Polling interval / Base volume path | `/settings` | ✓ kept |
| Notification URL | `/settings/integrations` | ✓ moved |
| Stale threshold | `/settings/maintenance` | ✓ moved |
| Image prune threshold | `/settings/maintenance` (Danger zone card) | ✓ moved |
| Prune Images button | `/settings/maintenance` | ✓ moved into separate Danger card |
| Wildcard DNS / Cloudflare token / Zone | `/settings/dns` | ✓ moved |
| Test DNS connection | `/settings/dns` | ✓ moved |
| Proxy provider radio | `/settings` | ✓ kept (with link to /settings/{npm|traefik}) |
| **Global webhook URL** | n/a — feature removed (per-entity now) | ✓ intentional |
| Stats interval / retention (NEW) | `/settings/maintenance` | ✓ added in same commit's diff |
**Verdict:** every setting from the old page is reachable. Nothing
orphaned. Credentials page (`/settings/credentials/+page.svelte`) was
deleted and the sidebar entry was already gone at HEAD~5, so no broken
link. Tested: the sidebar's `provider`-conditional NPM / Traefik items
still work (`+layout.svelte:54-55`).
**Gaps / broken flows**
- **Each page issues an independent `getSettings()` on mount.** Navigating
through the sidebar reloads the entire 30-field settings blob each time.
Not broken, but a shared cache or layout-level fetch would halve the
payload. Follow-up.
- **Save scoping is correct** — each page builds a `Partial<Settings>` of
only its own keys (e.g. `maintenance/+page.svelte:54-59`). Confirmed by
reading all four split pages.
- **DNS page does not have an inline link to fall back from "test failed"**
to the General/proxy page. Minor.
**i18n**
- New `settings.groupMain/groupProxy/groupSystem/groupSecurity`,
`settingsDns.*`, `settingsIntegrations.*`, `settingsMaintenance.*`,
`statsSettings.*`, `settingsGeneral.globalConfigDesc/configureNpm/...`
all present in both locales.
## Fix: Naive UTC Timestamp Handling (03d58a0)
**Reach:** the fix is in `toDate()` (`web/src/lib/format/datetime.ts:34-46`)
via `normalizeIsoUtc`. **Every** consumer of `$fmt.*` therefore inherits
the fix:
```
web/src/routes/+layout.svelte
web/src/routes/+page.svelte
web/src/routes/projects/+page.svelte
web/src/routes/projects/[id]/+page.svelte
web/src/routes/projects/[id]/volumes/[volId]/browse/+page.svelte
web/src/routes/sites/+page.svelte
web/src/routes/sites/[id]/+page.svelte
web/src/routes/stacks/+page.svelte
web/src/routes/stacks/[id]/+page.svelte
web/src/routes/settings/backup/+page.svelte
web/src/lib/components/EventLogEntry.svelte
web/src/lib/components/InstanceCard.svelte
web/src/lib/components/StaleContainerCard.svelte
web/src/lib/components/TimezoneSelector.svelte
```
**Audit for stragglers:** a grep for `new Date(` across the frontend returns 5
files. Two are inside `format/datetime.ts` and `stores/timezone.ts` (the
fix itself); two are in the `TimezoneSelector` and `+layout.svelte` clock
ticker (`new Date()` with no input — current time, not affected); one is
`routes/events/+page.svelte:55` building a `since` *query parameter* that
is sent to the backend, never displayed. Conclusion: **fix has 100 % reach
for displayed timestamps**.
`InstanceCard.svelte` lost its private `timeSinceCreated` parser
(commit diff lines 32-43); now uses `$fmt.relative(instance.created_at)`.
## Feature: Daemon Health Panel + Timezone Selector (90e6e59)
### Daemon health panel
**What it claims:** rich Docker /info + /version + NPM aggregates exposed
via `/api/health`; status chips moved into the brand block; new
`SystemDaemonsCard` on the dashboard; shared health store de-duplicates
the 30 s poll.
**What works**
- `GET /api/health` (`internal/api/health.go:6-39`) now returns
`database`, `docker` (+ rich info), and conditionally `proxy` (with
NPM aggregates). 8 s timeout, NPM fields fetched only when ping succeeds
so an offline proxy doesn't amplify latency.
- `health.ts:38-66` shared store with single 30 s poll; the layout
consumes it via `$health.docker/proxy/checked` (`+layout.svelte:53-56`)
and `SystemDaemonsCard.svelte:13-19` does the same. No duplicate
fetches — verified by the `inFlight` guard at `health.ts:37`.
- Both panels render the rich payload: container running/paused/stopped
stacked bar, version/api/platform/kernel/cpu/memory/storage/images,
latency, root dir. Proxy panel shows total vs managed proxy hosts (with
proportion meter), access lists, certificates.
- Brand-rail chips at `+layout.svelte:201-242` show DKR + NPM/TRF, with
pulse animation classes (`chip-live`/`chip-down`), running container
count, and proxy host count. Click on a down chip toggles `hintsExpanded`.
**Daemons checked, by name:**
- **Docker Engine** — connected via socket; "unhealthy" means the ping
failed (text from `Ping`) or the client wasn't initialised. The user
hint is `daemons.dockerHint` ("Check that the Docker daemon is running…").
- **Proxy provider** — only checked when one is configured (NPM or Traefik).
"Unhealthy" means `Ping` failed; the panel surfaces `proxy.error` and
the configured URL. If proxy_provider=`none`, panel shows
"Not configured" with a CTA link to `/settings`.
- **Database** — included in the JSON response but not surfaced on the
daemons card. The brand-rail also does not show a DB chip; if SQLite
is unreachable the chip rail goes "BOOT" forever (since
`health.ts:50-57` falls back to `prev.docker ?? {connected:false}` and
drops `database`). Minor — but a permanently-unreachable SQLite would
leave the user wondering why everything is dead with no indicator.
**Gaps / broken flows**
- **Hardcoded English fallbacks** (i18n leak):
- `web/src/routes/+layout.svelte:194` `aria-label="Close sidebar"` (was already English)
- `web/src/routes/+layout.svelte:201` `aria-label="Service status"` (new in this commit)
- `web/src/routes/+layout.svelte:208` tooltip
`` `Docker daemon · ${dockerHealth?.version ?? 'reachable'}` `` —
"Docker daemon" and "reachable" are English literals; commit added this code
- `web/src/routes/+layout.svelte:208` fallback `'Docker unreachable'`
- `web/src/routes/+layout.svelte:225` fallback `'Proxy unreachable'`
- `web/src/lib/components/SystemDaemonsCard.svelte:98` fallback
`'Docker daemon is not reachable.'`
- **Refresh button has no debounce window**, only an in-flight guard
(`SystemDaemonsCard.svelte:53-61`). Spamming it triggers serial calls.
Acceptable.
- **No DB-down indicator** anywhere visible to the user. Edge case but
worth noting.
**API/UI consistency**
- All Docker fields the frontend consumes (`web/src/lib/types.ts:258-285`)
are emitted by `dockerHealth` in `internal/api/health.go:60-100`. Cross-checked
every key (version, api_version, os, arch, kernel, storage_driver, root_dir,
ncpu, memory_total, containers, running, paused, stopped, images,
latency_ms). Matches.
- `ProxyHealth` TS shape (`types.ts:289-296`) matches Go fields:
`provider`, `connected`, `error`, `latency_ms`, `url`, `proxy_hosts`,
`proxy_hosts_managed`, `access_lists`, `certificates`. Matches.
**i18n**
- `daemons.*` namespace fully translated in both `en.json:917-953` and
`ru.json:917-953` (parallel keys verified). The hardcoded strings above
are the only gaps.
### Timezone selector
**What it claims:** user IANA timezone preference with auto-detect,
applied across all `$fmt.*` rendering, persisted in localStorage.
**Persistence**
- Stored at `localStorage.dw_timezone` via subscriber on the `timezonePreference`
writable (`web/src/lib/stores/timezone.ts:12,55-59`). Re-read on next page
load by `getInitialPreference` (lines 44-50). Validates the IANA string
before accepting it, falling back to `auto`.
- "Auto" is a sentinel; `effectiveTimezone` derives a concrete IANA zone
from `Intl.DateTimeFormat().resolvedOptions().timeZone` on every read
(lines 66-69), so changing browser zone with auto enabled re-resolves.
**Application reach**
- `effectiveTimezone` is consumed by `makeFormatters` in `datetime.ts:117-119`,
which is the single source for the entire `$fmt` reactive store. Every
`$fmt.dateTime`, `$fmt.date`, `$fmt.relative` etc. respects the user
zone. **Verified across all 15 consumers listed under the naive-UTC fix
section.**
- One subtle case: `$fmt.relative` is timezone-independent (`datetime.ts:142-156`),
which is correct — "5 m ago" doesn't depend on display zone.
**Gaps / broken flows**
- **Selector lives only on `/settings`.** Reasonable home, but no quick
"switch zone" affordance from the brand rail or top bar; you have to
navigate. Minor.
- **No backend record.** The preference is browser-local, so logging in
on a fresh device shows server time. Commit message acknowledges this
("purely client-side preference"). Acceptable.
**i18n**
- Full `timezone.*` namespace in both locales (`en.json:1117-1136`,
`ru.json:1117-1136`). Picker placeholder is translated.
## Cross-cutting Issues
### i18n leaks
Three runtime strings in user-visible places are still English-only:
1. `web/src/routes/+layout.svelte:201` `aria-label="Service status"` (new)
2. `web/src/routes/+layout.svelte:208,225` chip tooltips include
English literals (`'Docker daemon'`, `'reachable'`, `'Docker unreachable'`,
`'Proxy unreachable'`).
3. `web/src/lib/components/SystemDaemonsCard.svelte:98` fallback message
when `docker.error` is empty.
`+layout.svelte:194` (`Close sidebar`) was already English at HEAD~5; not a
regression but worth fixing while in the area.
### Naming consistency
- Backend uses `snake_case` JSON tags everywhere (`disk_total_bytes`,
`latency_ms`, `proxy_hosts_managed`). TypeScript interfaces use the same.
No drift detected.
- One naming asymmetry: `Settings.WebhookSecret` was deleted from the
Go struct — clean removal. `internal/store/static_sites.go:233`,
`projects.go:53` use new column. SQLite column `webhook_secret` on
`settings` table is left alone (per the migration comment); no row
emits it, so it's dead weight but harmless.
### Dashboard polling
`SystemResourcesCard` polls every 15 s on its own (`SystemResourcesCard.svelte:79`).
`ContainerStats` polls every 30 s. `health` store polls every 30 s.
`navCounts` store polls separately. Multiple uncoordinated timers; OK in
practice, but a future optimisation candidate.
### Confirm dialog UX
Both `WebhookPanel` and the maintenance "Prune Images" Danger zone use
inline confirms / `ConfirmDialog`. Consistent. The brand-rail "click a down
chip to expand hints" is a third confirm-ish pattern, fine but not
discoverable.
## Suggested Follow-ups (prioritized)
1. **Localise the three hardcoded English strings** in
`web/src/routes/+layout.svelte:194,201,208,225` and
`SystemDaemonsCard.svelte:98`. ~15 min, replaces 5 literals with
`$t('daemons.…')` keys (which already exist for most cases — e.g.
`daemons.docker`, `daemons.offline`).
2. **Add owner-name resolution to the "top consumers" widget**
(`SystemResourcesCard.svelte:259-264`). Currently only a 12-char ID +
`instance|site` chip; users have no way to know which container is
spiking.
3. **Detect "stats collection disabled" (`stats_interval_seconds=0`) and
tailor the empty-state message** in `SystemResourcesCard.svelte`
instead of always saying "samples every 15 s".
4. **Remove the dead `webhook_secret` column on `settings`** in a future
destructive migration window, OR officially document it as deprecated
in the schema comment.
5. **Add a "Test webhook" button to `WebhookPanel.svelte`** — POSTs a
minimal payload to the URL and surfaces the response. Replaces
guesswork when wiring CI.
6. **Add a DB-down indicator** to the brand rail (a 3rd chip "DB"). The
data is already in `/api/health`; only the UI needs the chip.
7. **Top-N samples 2-minute window** in `internal/api/stats_history.go:178`
should scale with collector interval (`max(2*interval, 2m)`) so users
on slow intervals don't see a falsely-empty widget.
8. **Settings Integrations dead-end card** — link to the Projects and
Sites lists rather than just text saying "go look there".
9. **Auto-close the WebhookPanel confirm strip on success** (it already
resets, but the strip stays visible until the user clicks Cancel).
+2 -3
@@ -1,8 +1,6 @@
module github.com/alexei/tinyforge
go 1.24.0
toolchain go1.25.0
go 1.25.0
require (
github.com/coreos/go-oidc/v3 v3.11.0
@@ -43,6 +41,7 @@ require (
go.opentelemetry.io/otel/metric v1.35.0 // indirect
go.opentelemetry.io/otel/trace v1.35.0 // indirect
golang.org/x/mod v0.18.0 // indirect
golang.org/x/sync v0.20.0 // indirect
golang.org/x/sys v0.33.0 // indirect
golang.org/x/tools v0.22.0 // indirect
modernc.org/libc v1.55.3 // indirect
+2
@@ -87,6 +87,8 @@ golang.org/x/oauth2 v0.25.0 h1:CY4y7XT9v0cRI9oupztF8AgiIu99L/ksR/Xp/6jrZ70=
golang.org/x/oauth2 v0.25.0/go.mod h1:XYTD2NtWslqkgxebSiOHnXEap4TF09sJSc7H1sXbhtI=
golang.org/x/sync v0.7.0 h1:YsImfSBoP9QPYL0xyKJPq0gcaJdG3rInoqxTWbfQu9M=
golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.33.0 h1:q3i8TbbEz+JRD9ywIRlyRAQbM0qF7hu24q3teo2hbuw=
golang.org/x/sys v0.33.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
+138 -32
@@ -5,15 +5,38 @@ import (
"encoding/json"
"errors"
"fmt"
"io"
"log/slog"
"net/http"
"regexp"
"strconv"
"strings"
"sync"
"time"
"github.com/go-chi/chi/v5"
"github.com/alexei/tinyforge/internal/store"
)
// Limits and constants for the log endpoints.
const (
defaultLogTail = 200
maxLogTail = 5000
maxJSONLogBytes = 4 << 20 // 4 MiB cap for non-streaming log responses
maxLogLineBytes = 1 << 20 // 1 MiB max line length for the bufio.Scanner
logHeartbeatPeriod = 20 * time.Second
)
// ANSI escape sequence patterns. Stripped from streamed log lines so a
// hostile container cannot inject terminal control sequences (cursor moves,
// hyperlink escapes, screen clears) into operator displays or pasted output.
var (
ansiCSIPattern = regexp.MustCompile(`\x1b\[[0-9;?]*[ -/]*[@-~]`)
ansiOSCPattern = regexp.MustCompile(`\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)`)
ctlBytePattern = regexp.MustCompile(`[\x00-\x08\x0b-\x1a\x1c-\x1f\x7f]`)
)
// listProjectImages handles GET /api/projects/{id}/images.
// Returns all local Docker images matching the project's image reference.
func (s *Server) listProjectImages(w http.ResponseWriter, r *http.Request) {
@@ -50,6 +73,8 @@ func (s *Server) listProjectImages(w http.ResponseWriter, r *http.Request) {
// - tail: number of lines from end (default "200")
// - follow: "true" to stream new lines in real-time
func (s *Server) streamContainerLogs(w http.ResponseWriter, r *http.Request) {
projectID := chi.URLParam(r, "id")
stageID := chi.URLParam(r, "stage")
instanceID := chi.URLParam(r, "iid")
inst, err := s.store.GetInstanceByID(instanceID)
@@ -63,6 +88,14 @@ func (s *Server) streamContainerLogs(w http.ResponseWriter, r *http.Request) {
return
}
// Verify the instance actually belongs to the project/stage in the path.
// Without this, a user could stream logs for any instance ID by guessing
// it under the wrong project — defence-in-depth for future per-project ACLs.
if inst.ProjectID != projectID || inst.StageID != stageID {
respondNotFound(w, "instance")
return
}
if inst.ContainerID == "" {
respondError(w, http.StatusBadRequest, "instance has no container")
return
@@ -80,10 +113,7 @@ func (s *Server) streamLogsForContainer(w http.ResponseWriter, r *http.Request,
return
}
tail := r.URL.Query().Get("tail")
if tail == "" {
tail = "200"
}
tail := parseTailParam(r.URL.Query().Get("tail"))
follow := r.URL.Query().Get("follow") == "true"
// Check if client accepts SSE.
@@ -99,8 +129,10 @@ func (s *Server) streamLogsForContainer(w http.ResponseWriter, r *http.Request,
defer logReader.Close()
if !isSSE {
// JSON mode: read all lines and return as array.
scanner := bufio.NewScanner(logReader)
// JSON mode: cap the total bytes read so a chatty container with
// tail=large cannot exhaust server memory.
scanner := bufio.NewScanner(io.LimitReader(logReader, maxJSONLogBytes))
scanner.Buffer(make([]byte, 0, 64*1024), maxLogLineBytes)
var lines []string
for scanner.Scan() {
line := sanitizeDockerLogLine(scanner.Text())
@@ -116,6 +148,12 @@ func (s *Server) streamLogsForContainer(w http.ResponseWriter, r *http.Request,
}
// SSE mode: stream lines as they arrive.
release, ok := acquireSSESlot(w, s.sseGate)
if !ok {
return
}
defer release()
flusher, ok := w.(http.Flusher)
if !ok {
respondError(w, http.StatusInternalServerError, "streaming not supported")
@@ -126,7 +164,31 @@ func (s *Server) streamLogsForContainer(w http.ResponseWriter, r *http.Request,
w.Header().Set("Cache-Control", "no-cache")
w.Header().Set("Connection", "keep-alive")
// Heartbeat keeps the connection warm through proxies that close idle
// streams. Sent as an SSE comment which the EventSource API ignores.
heartbeat := time.NewTicker(logHeartbeatPeriod)
defer heartbeat.Stop()
heartbeatDone := make(chan struct{})
defer close(heartbeatDone)
var hbMu sync.Mutex
go func() {
for {
select {
case <-heartbeat.C:
hbMu.Lock()
_, _ = io.WriteString(w, ": ping\n\n")
flusher.Flush()
hbMu.Unlock()
case <-heartbeatDone:
return
case <-r.Context().Done():
return
}
}
}()
scanner := bufio.NewScanner(logReader)
scanner.Buffer(make([]byte, 0, 64*1024), maxLogLineBytes)
for scanner.Scan() {
line := sanitizeDockerLogLine(scanner.Text())
if line == "" {
@@ -134,8 +196,10 @@ func (s *Server) streamLogsForContainer(w http.ResponseWriter, r *http.Request,
}
data, _ := json.Marshal(map[string]string{"line": line})
hbMu.Lock()
fmt.Fprintf(w, "data: %s\n\n", data)
flusher.Flush()
hbMu.Unlock()
// Check if client disconnected.
select {
@@ -146,17 +210,67 @@ func (s *Server) streamLogsForContainer(w http.ResponseWriter, r *http.Request,
}
}
// parseTailParam validates and clamps the ?tail= query value. Empty/invalid
// inputs fall back to the default; values above the cap are clamped down.
// "all" is rejected — letting the caller request unbounded log history is a
// trivial DoS vector.
func parseTailParam(raw string) string {
if raw == "" {
return strconv.Itoa(defaultLogTail)
}
n, err := strconv.Atoi(raw)
if err != nil || n <= 0 {
return strconv.Itoa(defaultLogTail)
}
if n > maxLogTail {
n = maxLogTail
}
return strconv.Itoa(n)
}
// sanitizeDockerLogLine strips the Docker log stream header (8-byte prefix)
// that Docker adds to non-TTY container logs.
// that Docker adds to non-TTY container logs, and removes terminal control
// sequences so a hostile container cannot inject ANSI escapes that hijack an
// operator's terminal when log output is pasted or rendered raw.
func sanitizeDockerLogLine(line string) string {
// Docker multiplexed stream: first 8 bytes are header (stream type + size).
// If the line starts with a non-printable byte followed by 0x00 0x00 0x00, strip 8 bytes.
if len(line) > 8 && (line[0] == 1 || line[0] == 2) && line[1] == 0 && line[2] == 0 && line[3] == 0 {
return line[8:]
line = line[8:]
}
line = ansiOSCPattern.ReplaceAllString(line, "")
line = ansiCSIPattern.ReplaceAllString(line, "")
line = ctlBytePattern.ReplaceAllString(line, "")
return line
}
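The three patterns used by `sanitizeDockerLogLine` are defined outside this hunk. As a rough sketch of what they might look like, the block below uses assumed regular expressions (OSC sequences terminated by BEL or ST, CSI sequences in the ECMA-48 shape, and C0 control bytes other than tab and newline); the actual definitions in the repository may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed patterns; the real ones are not shown in this diff.
var (
	// OSC: ESC ] ... terminated by BEL or by ESC \ (string terminator).
	ansiOSCPattern = regexp.MustCompile(`\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)`)
	// CSI: ESC [ parameter bytes, intermediate bytes, one final byte.
	ansiCSIPattern = regexp.MustCompile(`\x1b\[[0-?]*[ -/]*[@-~]`)
	// Remaining control bytes, keeping tab (0x09) and newline (0x0a).
	ctlBytePattern = regexp.MustCompile(`[\x00-\x08\x0b-\x1f\x7f]`)
)

// sanitize strips the 8-byte Docker stream header, then escape sequences,
// then any leftover control bytes, in the same order as the diff.
func sanitize(line string) string {
	if len(line) > 8 && (line[0] == 1 || line[0] == 2) &&
		line[1] == 0 && line[2] == 0 && line[3] == 0 {
		line = line[8:]
	}
	line = ansiOSCPattern.ReplaceAllString(line, "")
	line = ansiCSIPattern.ReplaceAllString(line, "")
	return ctlBytePattern.ReplaceAllString(line, "")
}

func main() {
	fmt.Printf("%q\n", sanitize("\x1b[31mred\x1b[0m"))
	fmt.Printf("%q\n", sanitize("\x1b]0;title\x07ok"))
	fmt.Printf("%q\n", sanitize("\x01\x00\x00\x00\x00\x00\x00\x05hello"))
}
```

Running the control-byte pattern last means a stray ESC left behind by an incomplete sequence is still removed.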
// buildActiveImagesSet returns the set of "image:tag" strings currently used
// by any instance, computed in a single DB pass instead of N×K queries.
// Returning an error (rather than swallowing) prevents prune logic from
// treating a transient DB failure as "nothing is active".
func buildActiveImagesSet(st *store.Store, projects []store.Project) (map[string]bool, error) {
imageByProject := make(map[string]string, len(projects))
for _, p := range projects {
imageByProject[p.ID] = p.Image
}
instances, err := st.ListAllInstances()
if err != nil {
return nil, fmt.Errorf("list instances: %w", err)
}
active := make(map[string]bool, len(instances))
for _, inst := range instances {
if inst.ImageTag == "" {
continue
}
image := imageByProject[inst.ProjectID]
if image == "" {
continue
}
active[image+":"+inst.ImageTag] = true
}
return active, nil
}
// unusedImageStats handles GET /api/docker/unused-images.
// Returns the total size of unused project images and whether the threshold is exceeded.
func (s *Server) unusedImageStats(w http.ResponseWriter, r *http.Request) {
@@ -181,18 +295,14 @@ func (s *Server) unusedImageStats(w http.ResponseWriter, r *http.Request) {
return
}
// Build set of active image refs.
activeImages := make(map[string]bool)
for _, p := range projects {
stages, _ := s.store.GetStagesByProjectID(p.ID)
for _, st := range stages {
instances, _ := s.store.GetInstancesByStageID(st.ID)
for _, inst := range instances {
if inst.ImageTag != "" {
activeImages[p.Image+":"+inst.ImageTag] = true
}
}
}
// Build set of active image refs in one DB pass instead of N×K queries.
// A flaky read here previously masqueraded as "no images are active",
// which on the prune endpoint would have deleted *running* images.
activeImages, err := buildActiveImagesSet(s.store, projects)
if err != nil {
slog.Error("unused images: build active set", "error", err)
respondError(w, http.StatusInternalServerError, "internal server error")
return
}
// Sum unused image sizes.
@@ -242,18 +352,14 @@ func (s *Server) pruneImages(w http.ResponseWriter, r *http.Request) {
return
}
// Build a set of image refs used by active instances.
activeImages := make(map[string]bool)
for _, p := range projects {
stages, _ := s.store.GetStagesByProjectID(p.ID)
for _, st := range stages {
instances, _ := s.store.GetInstancesByStageID(st.ID)
for _, inst := range instances {
if inst.ImageTag != "" {
activeImages[p.Image+":"+inst.ImageTag] = true
}
}
}
// Build a set of image refs used by active instances. Bail out on error
// — silently treating a DB blip as "no active images" would prune
// images currently in use by running containers.
activeImages, err := buildActiveImagesSet(s.store, projects)
if err != nil {
slog.Error("prune: build active set", "error", err)
respondError(w, http.StatusInternalServerError, "internal server error")
return
}
// Collect all unique image bases from projects (without tags).
+64 -8
@@ -5,20 +5,57 @@ import (
"net/http"
"time"
"github.com/alexei/tinyforge/internal/auth"
"github.com/alexei/tinyforge/internal/proxy"
)
// healthProbeTimeout caps a single health probe so a stuck dependency does
// not hold the polling endpoint open. The UI polls every 30 s, so 8 s leaves
// headroom for the ping + Info + NPM list calls.
const healthProbeTimeout = 8 * time.Second
// nonAdminDockerFields enumerates the fields any authenticated user is
// allowed to see — version + connectivity + container counts. Host-detail
// fields (kernel, root_dir, hostname, OS, storage driver) are admin-only to
// avoid recon information leaks.
var nonAdminDockerFields = map[string]bool{
"connected": true,
"latency_ms": true,
"error": true,
"version": true,
"api_version": true,
"containers": true,
"running": true,
"paused": true,
"stopped": true,
"images": true,
"ncpu": true,
"memory_total": true,
}
// nonAdminProxyFields are the proxy fields safe to share with non-admins.
// Configured URLs and aggregate counts of internal lists/certs are stripped.
var nonAdminProxyFields = map[string]bool{
"provider": true,
"connected": true,
"latency_ms": true,
"error": true,
"proxy_hosts_managed": true,
}
// getHealth handles GET /api/health.
//
// Returns the connectivity state and (when connected) rich diagnostics for the
// Docker daemon and the active proxy provider. This endpoint is polled by the
// UI every 30 seconds — keep the calls cheap. The expensive NPM list calls
// are only issued when the initial ping succeeds, so a down proxy never
// amplifies latency.
// Returns the connectivity state and (when connected) diagnostics for the
// Docker daemon and the active proxy provider. Detailed host information
// (kernel, root_dir, internal NPM URL, …) is stripped for non-admin users to
// avoid leaking infrastructure details to read-only viewers.
func (s *Server) getHealth(w http.ResponseWriter, r *http.Request) {
ctx, cancel := context.WithTimeout(r.Context(), 8*time.Second)
ctx, cancel := context.WithTimeout(r.Context(), healthProbeTimeout)
defer cancel()
claims, _ := auth.ClaimsFromContext(r.Context())
isAdmin := claims.Role == "admin"
now := time.Now().UTC().Format(time.RFC3339)
result := map[string]any{
"checked_at": now,
@@ -32,16 +69,35 @@ func (s *Server) getHealth(w http.ResponseWriter, r *http.Request) {
}
// ── Docker daemon ────────────────────────────────────────────────
result["docker"] = s.dockerHealth(ctx)
docker := s.dockerHealth(ctx)
if !isAdmin {
docker = filterFields(docker, nonAdminDockerFields)
}
result["docker"] = docker
// ── Proxy provider ───────────────────────────────────────────────
if s.proxyProvider != nil {
result["proxy"] = s.proxyHealth(ctx)
proxyInfo := s.proxyHealth(ctx)
if !isAdmin {
proxyInfo = filterFields(proxyInfo, nonAdminProxyFields)
}
result["proxy"] = proxyInfo
}
respondJSON(w, http.StatusOK, result)
}
// filterFields returns a copy of m containing only the keys present in allow.
func filterFields(m map[string]any, allow map[string]bool) map[string]any {
out := make(map[string]any, len(allow))
for k, v := range m {
if allow[k] {
out[k] = v
}
}
return out
}
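A short usage sketch of the allow-list filter follows. The sample field values are made up for illustration; only `filterFields` itself is taken from the diff.

```go
package main

import "fmt"

// filterFields returns a copy of m containing only the keys present in allow.
func filterFields(m map[string]any, allow map[string]bool) map[string]any {
	out := make(map[string]any, len(allow))
	for k, v := range m {
		if allow[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Hypothetical probe result with two host-detail fields.
	docker := map[string]any{
		"connected": true,
		"version":   "27.0",
		"kernel":    "6.8.0",
		"root_dir":  "/var/lib/docker",
	}
	allow := map[string]bool{"connected": true, "version": true}

	// Only the allow-listed keys survive; kernel and root_dir are dropped.
	fmt.Println(filterFields(docker, allow))
}
```

An allow-list fails closed: a field added to the probe later stays hidden from non-admins until someone lists it explicitly.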
// dockerHealth probes the Docker daemon and, if reachable, attaches a full
// DaemonInfo snapshot. The caller does not need to error-check the Info()
// call — if it fails, the connected flag remains true (ping succeeded) but
+36 -2
@@ -4,12 +4,15 @@ import (
"log/slog"
"net/http"
"runtime/debug"
"strings"
"sync"
"time"
)
// logging is an HTTP middleware that logs every request with method, path,
// status code, and duration.
// status code, and duration. Webhook URLs are redacted before being logged
// because the secret is the only authenticator — leaking it to log
// aggregators is equivalent to leaking the credential.
func logging(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
start := time.Now()
@@ -19,13 +22,26 @@ func logging(next http.Handler) http.Handler {
slog.Info("http request",
"method", r.Method,
"path", r.URL.Path,
"path", redactPath(r.URL.Path),
"status", wrapped.status,
"duration", time.Since(start).String(),
)
})
}
// redactPath strips secrets from URL paths that carry them in segments.
func redactPath(path string) string {
const projectPrefix = "/api/webhook/"
const sitePrefix = "/api/webhook/sites/"
switch {
case strings.HasPrefix(path, sitePrefix):
return sitePrefix + "***"
case strings.HasPrefix(path, projectPrefix):
return projectPrefix + "***"
}
return path
}
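The effect of the redaction on a few paths, as a small sketch. The example paths and secrets are invented; `redactPath` is copied from the diff.

```go
package main

import (
	"fmt"
	"strings"
)

// redactPath strips secrets from URL paths that carry them in segments.
func redactPath(path string) string {
	const projectPrefix = "/api/webhook/"
	const sitePrefix = "/api/webhook/sites/"
	switch {
	case strings.HasPrefix(path, sitePrefix):
		return sitePrefix + "***"
	case strings.HasPrefix(path, projectPrefix):
		return projectPrefix + "***"
	}
	return path
}

func main() {
	for _, p := range []string{
		"/api/webhook/0f3c9a", // made-up project secret
		"/api/webhook/sites/77aa10", // made-up site secret
		"/api/projects/42",
	} {
		fmt.Println(redactPath(p))
	}
}
```

The site prefix has to be tested first because it is itself matched by the shorter project prefix.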
// recovery is an HTTP middleware that catches panics and returns a 500 response.
func recovery(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
@@ -145,6 +161,24 @@ func jsonContentType(next http.Handler) http.Handler {
})
}
// rateLimitMiddleware wraps a handler with per-IP rate limiting using the
// supplied limiter. Requests over the limit get 429.
func rateLimitMiddleware(rl *rateLimiter) func(http.Handler) http.Handler {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ip := r.RemoteAddr
if fwd := r.Header.Get("X-Forwarded-For"); fwd != "" {
ip = fwd
}
if !rl.allow(ip) {
respondError(w, http.StatusTooManyRequests, "rate limit exceeded")
return
}
next.ServeHTTP(w, r)
})
}
}
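One thing to keep in mind with the middleware above: `r.RemoteAddr` has the form "IP:port", so using it verbatim gives each client connection its own bucket, and an unconditionally trusted `X-Forwarded-For` header lets a caller pick an arbitrary key. The sketch below shows one possible way to derive the limiter key. `clientKey` and its `trustProxy` flag are hypothetical and not part of this commit, and the choice of the last header entry assumes a single trusted reverse proxy that appends the peer address.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// clientKey is a hypothetical helper that derives a rate-limit key.
// remoteAddr is the "IP:port" value from http.Request.RemoteAddr and
// forwardedFor is the raw X-Forwarded-For header value.
func clientKey(remoteAddr, forwardedFor string, trustProxy bool) string {
	if trustProxy && forwardedFor != "" {
		// With one trusted proxy that appends, the last entry is the
		// address the proxy actually saw; earlier entries are client-supplied.
		parts := strings.Split(forwardedFor, ",")
		last := strings.TrimSpace(parts[len(parts)-1])
		if ip := net.ParseIP(last); ip != nil {
			return ip.String()
		}
	}
	// Drop the ephemeral port so all connections from one host share a bucket.
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr
	}
	return host
}

func main() {
	fmt.Println(clientKey("203.0.113.7:51234", "", false))
	fmt.Println(clientKey("10.0.0.2:4000", "198.51.100.9, 192.0.2.44", true))
	fmt.Println(clientKey("[::1]:8080", "1.2.3.4", false))
}
```

Whether the header should be honoured at all depends on the deployment; behind no proxy, ignoring it is the safer default.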
// statusRecorder wraps http.ResponseWriter to capture the status code.
type statusRecorder struct {
http.ResponseWriter
+8 -2
@@ -47,6 +47,7 @@ type Server struct {
staticSiteManager *staticsite.Manager
stackManager *stack.Manager
backupEngine *backup.Engine
sseGate *sseGate
dbPath string
shutdownFunc func() // called after restore to trigger graceful shutdown
onBackupSettingsChanged func(enabled bool, intervalHours int) // called when backup settings change
@@ -76,6 +77,7 @@ func NewServer(
eventBus: eventBus,
encKey: encKey,
localAuth: localAuth,
sseGate: newSSEGate(maxConcurrentSSEStreams),
}
// Try to initialize OIDC provider from stored settings.
@@ -187,6 +189,7 @@ func (s *Server) Router() chi.Router {
r.Use(cors)
loginLimiter := newRateLimiter()
webhookLimiter := newRateLimiter()
r.Route("/api", func(r chi.Router) {
// JSON content type and body size limit for API routes.
@@ -201,7 +204,10 @@ func (s *Server) Router() chi.Router {
r.Post("/auth/oidc/token", s.oidcExchangeToken)
// Webhook handler (uses its own secret-based auth).
r.Mount("/webhook", s.webhook.Route())
// Per-IP rate limit prevents an attacker who has guessed (or leaked)
// a secret from triggering a deploy storm, and rejects unauthenticated
// brute-force probes over the secret URL space.
r.With(rateLimitMiddleware(webhookLimiter)).Mount("/webhook", s.webhook.Route())
// Protected routes: require valid JWT.
r.Group(func(r chi.Router) {
@@ -340,7 +346,7 @@ func (s *Server) Router() chi.Router {
// System resources (read-only).
r.Get("/system/stats", s.getSystemStats)
r.Get("/system/stats/history", s.getSystemStatsHistory)
r.Get("/system/stats/top", s.listTopContainersByCPU)
r.Get("/system/stats/top", s.listTopContainers)
// Admin-only routes: require admin role.
r.Group(func(r chi.Router) {
+12
@@ -55,6 +55,12 @@ func (s *Server) streamDeployLogs(w http.ResponseWriter, r *http.Request) {
}
// SSE mode.
release, ok := acquireSSESlot(w, s.sseGate)
if !ok {
return
}
defer release()
flusher, ok := w.(http.Flusher)
if !ok {
respondError(w, http.StatusInternalServerError, "streaming not supported")
@@ -140,6 +146,12 @@ func (s *Server) streamDeployLogs(w http.ResponseWriter, r *http.Request) {
// streamEvents handles GET /api/events.
// It streams instance status changes and deploy status changes via SSE.
func (s *Server) streamEvents(w http.ResponseWriter, r *http.Request) {
release, ok := acquireSSESlot(w, s.sseGate)
if !ok {
return
}
defer release()
flusher, ok := w.(http.Flusher)
if !ok {
respondError(w, http.StatusInternalServerError, "streaming not supported")
+40
@@ -0,0 +1,40 @@
package api
import (
"net/http"
"sync/atomic"
)
// maxConcurrentSSEStreams caps the global number of in-flight SSE
// connections. Each stream holds a goroutine, an event-bus subscription, and
// (for log streams) a Docker daemon TCP socket; a single tab opening
// thousands of EventSources would otherwise exhaust file descriptors.
const maxConcurrentSSEStreams = 256
// sseGate is a counting gate that limits concurrent SSE streams.
type sseGate struct {
cap int64
cur atomic.Int64
}
func newSSEGate(cap int) *sseGate { return &sseGate{cap: int64(cap)} }
// enter reserves a slot and returns a release func, or nil if the gate is full.
func (g *sseGate) enter() func() {
if g.cur.Add(1) > g.cap {
g.cur.Add(-1)
return nil
}
return func() { g.cur.Add(-1) }
}
// acquireSSESlot is a small helper used by every SSE handler to honour the
// global cap. Returns false (and writes a 503) if the cap is reached.
func acquireSSESlot(w http.ResponseWriter, gate *sseGate) (release func(), ok bool) {
release = gate.enter()
if release == nil {
respondError(w, http.StatusServiceUnavailable, "stream limit reached")
return nil, false
}
return release, true
}
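To see the gate's behaviour at the limit, here is a condensed, standalone copy with a tiny cap. The field is renamed to `limit` in this sketch to avoid shadowing the `cap` builtin; everything else follows the file above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sseGate is a counting gate that limits concurrent streams.
type sseGate struct {
	limit int64
	cur   atomic.Int64
}

func newSSEGate(limit int) *sseGate { return &sseGate{limit: int64(limit)} }

// enter reserves a slot and returns a release func, or nil if the gate is full.
func (g *sseGate) enter() func() {
	if g.cur.Add(1) > g.limit {
		g.cur.Add(-1) // undo the optimistic increment
		return nil
	}
	return func() { g.cur.Add(-1) }
}

func main() {
	gate := newSSEGate(2)
	a, b := gate.enter(), gate.enter()
	// Two slots are taken, so the third caller is refused.
	fmt.Println(a != nil, b != nil, gate.enter() != nil)
	a() // free one slot
	fmt.Println(gate.enter() != nil)
}
```

The increment-then-check pattern needs no lock, at the cost of the counter briefly overshooting the limit under contention. Calling a release function twice would undercount, so handlers should invoke it exactly once, as the `defer release()` call sites do.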
+94 -30
@@ -4,15 +4,30 @@ import (
"errors"
"log/slog"
"net/http"
"sort"
"strconv"
"time"
"github.com/go-chi/chi/v5"
"github.com/alexei/tinyforge/internal/auth"
"github.com/alexei/tinyforge/internal/stats"
"github.com/alexei/tinyforge/internal/store"
)
// topConsumerMinWindow is the minimum look-back window within which a
// container sample counts toward the "top consumers" list. The effective
// window is widened to twice the collector interval (read from settings) so
// it stays meaningful even when sampling is sparse.
const topConsumerMinWindow = 2 * time.Minute
// TopContainerSample augments a stats sample with the human-readable owner
// name so the UI can show "project/stage" or the static-site name without an
// extra round-trip per row.
type TopContainerSample struct {
store.ContainerStatsSample
OwnerName string `json:"owner_name"`
}
const (
// defaultHistoryWindow is used when no ?window= param is provided or the
// value fails to parse. Matches the default retention so the "last 2h"
@@ -175,11 +190,11 @@ func (s *Server) streamStaticSiteLogs(w http.ResponseWriter, r *http.Request) {
s.streamLogsForContainer(w, r, site.ContainerID)
}
// listTopContainersByCPU handles GET /api/system/stats/top?limit=5&by=cpu.
// listTopContainers handles GET /api/system/stats/top?limit=5&by=cpu.
// Returns the top-N most recent samples across containers, sorted by CPU or
// memory. Useful for a system dashboard "top consumers" widget without
// requiring the frontend to aggregate per-container history on its own.
func (s *Server) listTopContainersByCPU(w http.ResponseWriter, r *http.Request) {
// memory. Container IDs are stripped for non-admins so a low-privilege viewer
// cannot enumerate workloads outside their scope.
func (s *Server) listTopContainers(w http.ResponseWriter, r *http.Request) {
limit := 5
if raw := r.URL.Query().Get("limit"); raw != "" {
if n, err := strconv.Atoi(raw); err == nil && n > 0 && n <= 50 {
@@ -191,9 +206,16 @@ func (s *Server) listTopContainersByCPU(w http.ResponseWriter, r *http.Request)
by = "cpu"
}
// Samples from the last 2 minutes window so "top" reflects near-current
// load, not long-dead rows.
samples, err := s.store.ListAllRecentContainerStatsSamples(sinceTimestamp(2 * time.Minute))
// Samples must be at least as recent as max(2*interval, 2 minutes) so the
// list reflects near-current load even when collection is sparse.
window := topConsumerMinWindow
if settings, err := s.store.GetSettings(); err == nil && settings.StatsIntervalSeconds > 0 {
// Named "scaled" rather than "w" so it does not shadow the ResponseWriter.
if scaled := time.Duration(settings.StatsIntervalSeconds*2) * time.Second; scaled > window {
window = scaled
}
}
samples, err := s.store.ListAllRecentContainerStatsSamples(sinceTimestamp(window))
if err != nil {
slog.Error("failed to list container samples for top", "error", err)
respondError(w, http.StatusInternalServerError, "failed to list samples")
@@ -213,33 +235,75 @@ func (s *Server) listTopContainersByCPU(w http.ResponseWriter, r *http.Request)
top = append(top, sm)
}
// Partial-sort by the requested metric, descending. For small N a simple
// insertion-like approach is plenty.
sortContainerSamples(top, by)
sort.Slice(top, func(i, j int) bool {
if by == "memory" {
return top[i].MemoryUsage > top[j].MemoryUsage
}
return top[i].CPUPercent > top[j].CPUPercent
})
if len(top) > limit {
top = top[:limit]
}
respondJSON(w, http.StatusOK, top)
}
// sortContainerSamples sorts in place by CPU (or memory) descending.
// Note: ListContainerStatsSamples with empty ownerID returns no rows — the
// caller uses per-owner-type queries and merges; this helper is applied to
// the already-merged slice.
func sortContainerSamples(s []store.ContainerStatsSample, by string) {
// O(n^2) is fine — N is small (bounded by the number of containers).
for i := 1; i < len(s); i++ {
for j := i; j > 0; j-- {
var less bool
if by == "memory" {
less = s[j].MemoryUsage > s[j-1].MemoryUsage
} else {
less = s[j].CPUPercent > s[j-1].CPUPercent
}
if !less {
break
}
s[j-1], s[j] = s[j], s[j-1]
// Resolve owner names so the UI can show "project/stage" or the site name
// without a per-row round trip.
enriched := s.enrichWithOwnerNames(top)
// Scrub container IDs for non-admins. The owner name is the actionable
// identifier; the container ID is a host-level handle that reveals
// workload existence to viewers who shouldn't have it.
claims, _ := auth.ClaimsFromContext(r.Context())
if claims.Role != "admin" {
for i := range enriched {
enriched[i].ContainerID = ""
}
}
respondJSON(w, http.StatusOK, enriched)
}
// enrichWithOwnerNames attaches a human-readable owner name to each sample.
// Owners are looked up one sample at a time; the cost stays bounded because
// the slice holds at most 'limit' entries (capped at 50).
func (s *Server) enrichWithOwnerNames(samples []store.ContainerStatsSample) []TopContainerSample {
out := make([]TopContainerSample, len(samples))
for i, sm := range samples {
out[i] = TopContainerSample{ContainerStatsSample: sm}
switch sm.OwnerType {
case stats.OwnerTypeInstance:
out[i].OwnerName = s.lookupInstanceName(sm.OwnerID)
case stats.OwnerTypeSite:
out[i].OwnerName = s.lookupSiteName(sm.OwnerID)
}
}
return out
}
// lookupInstanceName returns "project/stage" for an instance, or empty on
// any lookup error so a transient miss does not break the response.
func (s *Server) lookupInstanceName(instanceID string) string {
inst, err := s.store.GetInstanceByID(instanceID)
if err != nil {
return ""
}
project, perr := s.store.GetProjectByID(inst.ProjectID)
stage, serr := s.store.GetStageByID(inst.StageID)
switch {
case perr == nil && serr == nil:
return project.Name + "/" + stage.Name
case perr == nil:
return project.Name
case serr == nil:
return stage.Name
}
return ""
}
// lookupSiteName returns the site's display name or empty on lookup error.
func (s *Server) lookupSiteName(siteID string) string {
site, err := s.store.GetStaticSiteByID(siteID)
if err != nil {
return ""
}
return site.Name
}
+15 -3
@@ -1,16 +1,28 @@
package api
import (
"crypto/rand"
"encoding/hex"
"errors"
"log/slog"
"net/http"
"github.com/go-chi/chi/v5"
"github.com/google/uuid"
"github.com/alexei/tinyforge/internal/store"
)
// generateWebhookSecret returns a 256-bit hex-encoded random token. Mirrors
// the helper in internal/store; kept here to avoid an import cycle and so the
// rotation handlers don't pretend to use uuid for what is really a secret.
func generateWebhookSecret() string {
b := make([]byte, 32)
if _, err := rand.Read(b); err != nil {
panic("crypto/rand failed: " + err.Error())
}
return hex.EncodeToString(b)
}
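For comparison, a variant that returns the error to the caller instead of panicking could look like the sketch below. This is an alternative shape, not what the commit does; the function name `newWebhookSecret` is hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newWebhookSecret is a hypothetical variant that surfaces the error.
// 32 random bytes encode to 64 hex characters (256 bits of entropy).
func newWebhookSecret() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", fmt.Errorf("generate webhook secret: %w", err)
	}
	return hex.EncodeToString(b), nil
}

func main() {
	secret, err := newWebhookSecret()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(secret), "hex characters")
}
```

Either way, the generated value comfortably clears the 32-character minimum that the store enforces on user-supplied secrets.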
// webhookURLResponse is the common payload returned by every webhook endpoint.
// Clients never see raw secrets except at issue/rotate time via these fields;
// the URL shape is "/api/webhook/..." so callers can prepend their own origin.
@@ -58,7 +70,7 @@ func (s *Server) regenerateProjectWebhook(w http.ResponseWriter, r *http.Request
return
}
secret := uuid.New().String()
secret := generateWebhookSecret()
if err := s.store.SetProjectWebhookSecret(id, secret); err != nil {
slog.Error("regenerate project webhook: set secret", "project", id, "error", err)
respondError(w, http.StatusInternalServerError, "failed to rotate webhook secret")
@@ -107,7 +119,7 @@ func (s *Server) regenerateStaticSiteWebhook(w http.ResponseWriter, r *http.Requ
return
}
secret := uuid.New().String()
secret := generateWebhookSecret()
if err := s.store.SetStaticSiteWebhookSecret(id, secret); err != nil {
slog.Error("regenerate site webhook: set secret", "site", id, "error", err)
respondError(w, http.StatusInternalServerError, "failed to rotate webhook secret")
+46 -23
@@ -3,9 +3,11 @@ package docker
import (
"context"
"fmt"
"log/slog"
"time"
"github.com/moby/moby/client"
"golang.org/x/sync/errgroup"
)
// SystemStats is a host-level snapshot combining daemon capacity
@@ -42,33 +44,54 @@ type SystemStats struct {
DiskTotalBytes int64 `json:"disk_total_bytes"`
}
// GetSystemStats returns a one-shot host-level snapshot. The Info() call
// and disk usage call are made in sequence. Disk usage failures do not
// fail the whole call — the result degrades gracefully with zero disk fields.
// GetSystemStats returns a one-shot host-level snapshot. Info and DiskUsage
// are issued in parallel because DiskUsage walks every layer/volume and is
// often the slowest call on a busy host (1-3 s); Info typically completes in
// ~10 ms. Disk usage failures do not fail the whole call — the result
// degrades gracefully with zero disk fields and a warning log.
func (c *Client) GetSystemStats(ctx context.Context) (SystemStats, error) {
info, err := c.Info(ctx)
if err != nil {
return SystemStats{}, fmt.Errorf("system stats: %w", err)
}
stats := SystemStats{Timestamp: time.Now().UTC()}
stats := SystemStats{
Timestamp: time.Now().UTC(),
NCPU: info.NCPU,
MemoryTotal: info.MemoryTotal,
Containers: info.Containers,
Running: info.Running,
Paused: info.Paused,
Stopped: info.Stopped,
Images: info.Images,
}
g, gctx := errgroup.WithContext(ctx)
du, derr := c.api.DiskUsage(ctx, client.DiskUsageOptions{
Containers: true,
Images: true,
Volumes: true,
BuildCache: true,
g.Go(func() error {
info, err := c.Info(gctx)
if err != nil {
return fmt.Errorf("system stats info: %w", err)
}
stats.NCPU = info.NCPU
stats.MemoryTotal = info.MemoryTotal
stats.Containers = info.Containers
stats.Running = info.Running
stats.Paused = info.Paused
stats.Stopped = info.Stopped
stats.Images = info.Images
return nil
})
if derr == nil {
var du *client.DiskUsageResult
g.Go(func() error {
usage, err := c.api.DiskUsage(gctx, client.DiskUsageOptions{
Containers: true,
Images: true,
Volumes: true,
BuildCache: true,
})
if err != nil {
// Disk usage is best-effort; swallow but log so the dashboard
// shows zeroed disk fields rather than failing entirely.
slog.Warn("system stats: disk usage failed", "error", err)
return nil
}
du = &usage
return nil
})
if err := g.Wait(); err != nil {
return SystemStats{}, err
}
if du != nil {
stats.DiskImagesBytes = du.Images.TotalSize
stats.DiskContainersBytes = du.Containers.TotalSize
stats.DiskVolumesBytes = du.Volumes.TotalSize
+46 -12
@@ -36,9 +36,11 @@ type Collector struct {
store *store.Store
docker *docker.Client
stopOnce sync.Once
stop chan struct{}
done chan struct{}
startOnce sync.Once
stopOnce sync.Once
started bool
stop chan struct{}
done chan struct{}
}
// New creates a new stats collector. Call Start to begin sampling.
@@ -52,15 +54,24 @@ func New(s *store.Store, d *docker.Client) *Collector {
}
// Start launches the background loop. Returns immediately. The loop exits
// when Stop is called.
// when Stop is called. Safe to call multiple times — only the first call has
// an effect.
func (c *Collector) Start() {
go c.run()
c.startOnce.Do(func() {
c.started = true
go c.run()
})
}
// Stop signals the collector to exit and blocks until it has finished the
// in-flight tick.
// in-flight tick. If Start was never called, Stop returns immediately.
func (c *Collector) Stop() {
c.stopOnce.Do(func() { close(c.stop) })
c.stopOnce.Do(func() {
close(c.stop)
if !c.started {
close(c.done)
}
})
<-c.done
}
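One ordering the Start/Stop pair above does not appear to cover is Stop followed by a later Start: `run` would then close `done` a second time. The sketch below shows one way to make every ordering safe by guarding both flags with a mutex. The `worker` type is a hypothetical stand-in for the collector, not code from this commit.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for the collector lifecycle.
type worker struct {
	mu      sync.Mutex
	started bool
	stopped bool
	stop    chan struct{}
	done    chan struct{}
}

func newWorker() *worker {
	return &worker{stop: make(chan struct{}), done: make(chan struct{})}
}

// Start launches the loop once; it is a no-op after Stop.
func (w *worker) Start() {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.started || w.stopped {
		return
	}
	w.started = true
	go w.run()
}

// Stop signals the loop and waits for it. If the loop never started,
// Stop closes done itself so the wait below returns immediately.
func (w *worker) Stop() {
	w.mu.Lock()
	if !w.stopped {
		w.stopped = true
		close(w.stop)
		if !w.started {
			close(w.done)
		}
	}
	w.mu.Unlock()
	<-w.done
}

func (w *worker) run() {
	defer close(w.done)
	<-w.stop // the real loop would sample until stop is closed
}

func main() {
	w := newWorker()
	w.Stop()  // Stop before Start returns immediately
	w.Start() // ignored: the worker is already stopped
	w.Stop()  // idempotent
	fmt.Println("stopped cleanly")
}
```

The mutex also removes the unsynchronised read of `started` that a concurrent Start and Stop would otherwise race on.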
@@ -70,6 +81,15 @@ func (c *Collector) Stop() {
func (c *Collector) run() {
defer close(c.done)
// Derive a base context that's cancelled when Stop is called so in-flight
// Docker requests abort instead of waiting out their timeout.
baseCtx, cancel := context.WithCancel(context.Background())
defer cancel()
go func() {
<-c.stop
cancel()
}()
// Wait a few seconds before the first sample so the app has settled.
select {
case <-time.After(3 * time.Second):
@@ -90,7 +110,7 @@ func (c *Collector) run() {
}
}
c.tick(retention)
c.tick(baseCtx, retention)
select {
case <-time.After(time.Duration(interval) * time.Second):
@@ -126,8 +146,8 @@ func (c *Collector) readConfig() (intervalSeconds, retentionHours int) {
// persists samples, and prunes rows beyond the retention window. When
// the Docker daemon is unreachable the whole tick is skipped with a
// single debug log instead of one warning per container.
func (c *Collector) tick(retentionHours int) {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
func (c *Collector) tick(parent context.Context, retentionHours int) {
ctx, cancel := context.WithTimeout(parent, 30*time.Second)
defer cancel()
pingCtx, pingCancel := context.WithTimeout(ctx, 2*time.Second)
@@ -224,10 +244,20 @@ func (c *Collector) sampleAll(ctx context.Context, targets []target) []store.Con
var wg sync.WaitGroup
for i, t := range targets {
// Acquire the semaphore in the parent loop so ctx cancellation
// short-circuits the queue rather than spawning goroutines that
// block on an unreachable slot.
select {
case sem <- struct{}{}:
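case <-ctx.Done():
// Nothing to do here: a bare break would only leave the select, so
// the ctx.Err() check below is what actually exits the loop.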
}
if ctx.Err() != nil {
break
}
wg.Add(1)
go func(i int, t target) {
defer wg.Done()
sem <- struct{}{}
defer func() { <-sem }()
sampleCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
@@ -278,8 +308,12 @@ func (c *Collector) recordSystemSample(ctx context.Context, workloadCPU float64,
slog.Warn("stats collector: get system stats", "error", err)
return
}
ts := sysStats.Timestamp.Unix()
if ts <= 0 {
ts = time.Now().UTC().Unix()
}
sample := store.SystemStatsSample{
TS: sysStats.Timestamp.Unix(),
TS: ts,
NCPU: sysStats.NCPU,
MemoryTotal: sysStats.MemoryTotal,
WorkloadCPUPercent: workloadCPU,
+25 -2
@@ -1,13 +1,34 @@
package store
import (
"crypto/rand"
"database/sql"
"encoding/hex"
"errors"
"fmt"
"github.com/google/uuid"
)
// minWebhookSecretLength is the smallest user-supplied webhook secret accepted
// at insert time. Auto-generated secrets are 64 hex chars (256 bits); a
// 32-char floor still leaves 128 bits of brute-force resistance for hex
// alphabets and rejects obvious typos / placeholder strings.
const minWebhookSecretLength = 32
// generateWebhookSecret returns a 256-bit hex-encoded random token. We use
// crypto/rand directly rather than uuid.New() so the intent ("secret token,
// not identifier") is explicit and the entropy is unambiguous.
func generateWebhookSecret() string {
b := make([]byte, 32)
if _, err := rand.Read(b); err != nil {
// crypto/rand is documented to never fail on supported platforms;
// fall back to a UUID rather than panicking.
return uuid.New().String()
}
return hex.EncodeToString(b)
}
// projectCols is the canonical column list for projects queries.
const projectCols = `id, name, registry, image, port, healthcheck, env, volumes,
npm_access_list_id, webhook_secret, created_at, updated_at`
@@ -19,7 +40,9 @@ func (s *Store) CreateProject(p Project) (Project, error) {
p.CreatedAt = Now()
p.UpdatedAt = p.CreatedAt
if p.WebhookSecret == "" {
p.WebhookSecret = uuid.New().String()
p.WebhookSecret = generateWebhookSecret()
} else if len(p.WebhookSecret) < minWebhookSecretLength {
return Project{}, fmt.Errorf("webhook_secret must be at least %d characters", minWebhookSecretLength)
}
_, err := s.db.Exec(
@@ -163,7 +186,7 @@ func (s *Store) EnsureProjectWebhookSecret(id string) (string, error) {
if project.WebhookSecret != "" {
return project.WebhookSecret, nil
}
secret := uuid.New().String()
secret := generateWebhookSecret()
if err := s.SetProjectWebhookSecret(id, secret); err != nil {
return "", err
}
+4 -2
@@ -22,7 +22,9 @@ func (s *Store) CreateStaticSite(site StaticSite) (StaticSite, error) {
site.CreatedAt = Now()
site.UpdatedAt = site.CreatedAt
if site.WebhookSecret == "" {
site.WebhookSecret = uuid.New().String()
site.WebhookSecret = generateWebhookSecret()
} else if len(site.WebhookSecret) < minWebhookSecretLength {
return StaticSite{}, fmt.Errorf("webhook_secret must be at least %d characters", minWebhookSecretLength)
}
_, err := s.db.Exec(
@@ -301,7 +303,7 @@ func (s *Store) EnsureStaticSiteWebhookSecret(id string) (string, error) {
if site.WebhookSecret != "" {
return site.WebhookSecret, nil
}
secret := uuid.New().String()
secret := generateWebhookSecret()
if err := s.SetStaticSiteWebhookSecret(id, secret); err != nil {
return "", err
}
+15 -5
@@ -139,18 +139,28 @@ func (s *Store) ListSystemStatsSamples(sinceTS int64) ([]SystemStatsSample, erro
return out, rows.Err()
}
// PruneStatsSamplesBefore deletes all samples older than the given unix timestamp
// from both the container and system stats tables. Returns rows deleted across
// both tables.
// PruneStatsSamplesBefore deletes all samples older than the given unix
// timestamp from both the container and system stats tables in a single
// transaction so a crash between the two cannot leave one table pruned and
// the other not. Returns rows deleted across both tables.
func (s *Store) PruneStatsSamplesBefore(ts int64) (int64, error) {
r1, err := s.db.Exec(`DELETE FROM container_stats_samples WHERE ts < ?`, ts)
tx, err := s.db.Begin()
if err != nil {
return 0, fmt.Errorf("begin prune tx: %w", err)
}
defer tx.Rollback()
r1, err := tx.Exec(`DELETE FROM container_stats_samples WHERE ts < ?`, ts)
if err != nil {
return 0, fmt.Errorf("prune container stats samples: %w", err)
}
r2, err := s.db.Exec(`DELETE FROM system_stats_samples WHERE ts < ?`, ts)
r2, err := tx.Exec(`DELETE FROM system_stats_samples WHERE ts < ?`, ts)
if err != nil {
return 0, fmt.Errorf("prune system stats samples: %w", err)
}
if err := tx.Commit(); err != nil {
return 0, fmt.Errorf("commit prune tx: %w", err)
}
n1, _ := r1.RowsAffected()
n2, _ := r2.RowsAffected()
return n1 + n2, nil
+12 -2
@@ -4,6 +4,7 @@ import (
"database/sql"
"errors"
"fmt"
"strings"
"time"
_ "modernc.org/sqlite"
@@ -214,8 +215,17 @@ func (s *Store) runMigrations() error {
}
for _, m := range migrations {
// Ignore errors from already-applied migrations (duplicate column).
_, _ = s.db.Exec(m)
if _, err := s.db.Exec(m); err != nil {
// "duplicate column" / "already exists" are expected when a
// migration has already been applied. Anything else (typo, FK
// conflict, real schema bug) must surface, otherwise the store
// silently runs against the wrong shape.
msg := err.Error()
if !strings.Contains(msg, "duplicate column") &&
!strings.Contains(msg, "already exists") {
return fmt.Errorf("apply migration %q: %w", m, err)
}
}
}
// Create indexes on foreign key columns for query performance.
+64 -7
@@ -5,15 +5,27 @@ import (
"encoding/json"
"errors"
"fmt"
"io"
"log/slog"
"net/http"
"strings"
"sync"
"github.com/go-chi/chi/v5"
"github.com/alexei/tinyforge/internal/store"
)
// maxSiteConcurrentSyncs caps fan-out of background site syncs triggered by
// webhooks. Above this limit, requests are rejected with 503.
const maxSiteConcurrentSyncs = 4
// maxWebhookBodyBytes caps the request body size for webhook payloads. The
// /api routes already wrap the body with MaxBytesReader, but the webhook
// router relies on its own limit so changes to the parent middleware can't
// silently increase the cap.
const maxWebhookBodyBytes = 256 * 1024 // 256 KiB
// DeployTriggerer is called when a webhook determines a deploy should happen.
// Same interface as registry.DeployTriggerer — kept separate to avoid import cycles.
type DeployTriggerer interface {
@@ -114,12 +126,28 @@ type Handler struct {
store *store.Store
deployer DeployTriggerer
sites SiteSyncTriggerer
// Site sync coordination: webhooks fire syncs in the background. Drain
// cancels the shared context and blocks until those goroutines exit, so a
// graceful shutdown never leaves a git fetch + container rebuild running
// after the process has begun to stop.
siteSyncCtx context.Context
siteSyncCancel context.CancelFunc
siteSyncWG sync.WaitGroup
siteSyncSem chan struct{}
}
// NewHandler creates a new webhook Handler. The sites triggerer is optional
// and may be nil (site webhooks will return 404).
func NewHandler(st *store.Store, deployer DeployTriggerer, sites SiteSyncTriggerer) *Handler {
return &Handler{store: st, deployer: deployer, sites: sites}
ctx, cancel := context.WithCancel(context.Background())
return &Handler{
store: st,
deployer: deployer,
sites: sites,
siteSyncCtx: ctx,
siteSyncCancel: cancel,
siteSyncSem: make(chan struct{}, maxSiteConcurrentSyncs),
}
}
// SetSiteSyncTriggerer injects the static-site manager after construction.
@@ -130,6 +158,13 @@ func (h *Handler) SetSiteSyncTriggerer(s SiteSyncTriggerer) {
h.sites = s
}
// Drain cancels in-flight site syncs and waits for their goroutines to exit.
// Safe to call from a graceful-shutdown path.
func (h *Handler) Drain() {
h.siteSyncCancel()
h.siteSyncWG.Wait()
}
// Route returns a chi router with the webhook endpoints mounted.
//
// Routes:
@@ -183,7 +218,8 @@ func (h *Handler) handleWebhook(w http.ResponseWriter, r *http.Request) {
}
var payload Payload
if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
dec := json.NewDecoder(io.LimitReader(r.Body, maxWebhookBodyBytes))
if err := dec.Decode(&payload); err != nil {
respondWebhookError(w, http.StatusBadRequest, "invalid JSON payload")
return
}
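Review note: `io.LimitReader` truncates silently instead of returning an error, so an oversized body is rejected only because the truncated JSON fails to parse. The sketch below shows that behaviour together with the "empty body is fine, malformed body is not" rule used for site webhooks. `payload`, `decodeBounded` and `decodeBoundedString` are hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/json"
	"io"
	"strings"
)

// payload is a stand-in for the webhook payload type; only Ref is modelled.
type payload struct {
	Ref string `json:"ref"`
}

// decodeBounded reads at most limit bytes from r. An empty body yields a zero
// payload and no error; a non-empty body must be valid JSON. A body longer
// than limit is cut off by io.LimitReader and then fails to parse.
func decodeBounded(r io.Reader, limit int64) (payload, error) {
	var p payload
	body, err := io.ReadAll(io.LimitReader(r, limit))
	if err != nil {
		return p, err
	}
	if len(body) == 0 {
		return p, nil
	}
	if err := json.Unmarshal(body, &p); err != nil {
		return payload{}, err
	}
	return p, nil
}

// decodeBoundedString is a convenience wrapper for trying the function out.
func decodeBoundedString(s string, limit int64) (payload, error) {
	return decodeBounded(strings.NewReader(s), limit)
}
```

If a distinct "body too large" response is wanted, `http.MaxBytesReader` is the standard-library alternative: it returns an error once the limit is exceeded rather than truncating.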
@@ -302,10 +338,20 @@ func (h *Handler) handleSiteWebhook(w http.ResponseWriter, r *http.Request) {
return
}
// Body is optional — decode best-effort.
// Body is optional. We attempt to decode but accept an empty body (no Ref
// filter); a malformed non-empty body is treated as bad-request to avoid
// silently bypassing the branch/tag filter.
var payload SitePayload
if r.ContentLength > 0 {
_ = json.NewDecoder(r.Body).Decode(&payload)
body, err := io.ReadAll(io.LimitReader(r.Body, maxWebhookBodyBytes))
if err != nil {
respondWebhookError(w, http.StatusBadRequest, "failed to read request body")
return
}
if len(body) > 0 {
if err := json.Unmarshal(body, &payload); err != nil {
respondWebhookError(w, http.StatusBadRequest, "invalid JSON payload")
return
}
}
if payload.Ref != "" && !siteRefMatches(site, payload.Ref) {
@@ -320,9 +366,20 @@ func (h *Handler) handleSiteWebhook(w http.ResponseWriter, r *http.Request) {
return
}
// Fire and forget — sync may take a while (git fetch + container rebuild).
// Cap concurrent syncs so a runaway CI cannot fan out unbounded
// git-clone goroutines.
select {
case h.siteSyncSem <- struct{}{}:
default:
respondWebhookError(w, http.StatusServiceUnavailable, "site sync queue full")
return
}
h.siteSyncWG.Add(1)
go func(siteID, siteName string) {
if err := h.sites.Deploy(context.Background(), siteID, false); err != nil {
defer h.siteSyncWG.Done()
defer func() { <-h.siteSyncSem }()
if err := h.sites.Deploy(h.siteSyncCtx, siteID, false); err != nil {
slog.Error("webhook: site sync failed", "site", siteName, "error", err)
}
}(site.ID, site.Name)
+17 -3
@@ -2,6 +2,7 @@ package webhook
import (
"fmt"
"log/slog"
"path"
"strings"
@@ -24,7 +25,8 @@ func matchStage(st *store.Store, projectID, tag string) (store.Stage, bool, erro
matched, err := path.Match(pattern, tag)
if err != nil {
// Invalid pattern: skip this stage.
slog.Warn("webhook: invalid tag pattern, skipping stage",
"project", projectID, "stage", stage.Name, "pattern", pattern, "error", err)
continue
}
if matched {
@@ -36,9 +38,21 @@ func matchStage(st *store.Store, projectID, tag string) (store.Stage, bool, erro
}
// imageMatches reports whether an incoming image reference matches the
// project's stored image. The comparison is case-sensitive and exact.
// project's stored image. The registry hostname (the segment before the
// first '/') is matched case-insensitively, since DNS names are
// case-insensitive (RFC 4343); the path/owner/name are matched exactly.
func imageMatches(projectImage, incomingImage string) bool {
return projectImage == incomingImage
if projectImage == incomingImage {
return true
}
pIdx := strings.IndexByte(projectImage, '/')
iIdx := strings.IndexByte(incomingImage, '/')
if pIdx <= 0 || iIdx <= 0 {
return false
}
pHost, pPath := projectImage[:pIdx], projectImage[pIdx:]
iHost, iPath := incomingImage[:iIdx], incomingImage[iIdx:]
return strings.EqualFold(pHost, iHost) && pPath == iPath
}
// siteRefMatches reports whether a Git ref (e.g. "refs/heads/main" or
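Review note: a few concrete inputs make the new `imageMatches` rule easier to check. The function body below follows the diff; `matchCases` is hypothetical sample data added only for illustration.

```go
package main

import "strings"

// imageMatches follows the comparison in the diff: the segment before the
// first '/' is treated as the registry host and compared case-insensitively;
// everything after it (including any tag) must match exactly.
func imageMatches(projectImage, incomingImage string) bool {
	if projectImage == incomingImage {
		return true
	}
	pIdx := strings.IndexByte(projectImage, '/')
	iIdx := strings.IndexByte(incomingImage, '/')
	if pIdx <= 0 || iIdx <= 0 {
		return false // no host segment to compare
	}
	return strings.EqualFold(projectImage[:pIdx], incomingImage[:iIdx]) &&
		projectImage[pIdx:] == incomingImage[iIdx:]
}

// matchCases lists sample inputs and the result the rule above gives.
var matchCases = []struct {
	project  string
	incoming string
	want     bool
}{
	{"ghcr.io/owner/app", "GHCR.IO/owner/app", true},  // host differs only in case
	{"ghcr.io/owner/app", "ghcr.io/Owner/app", false}, // path is case-sensitive
	{"nginx", "nginx", true},                          // exact match, no host
	{"nginx", "NGINX", false},                         // no '/' and not identical
	{"library/nginx", "Library/nginx", true},          // first segment treated as host
}
```

The last case shows a side effect of splitting on the first `/`: for references without a registry host, the namespace segment is compared case-insensitively. This is likely harmless because Docker repository names are lowercase, but it is worth knowing when reading the function's doc comment.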
+3 -2
@@ -4,6 +4,7 @@ import type {
ContainerStatsSample,
SystemStats,
SystemStatsSample,
TopContainerSample,
Deploy,
DeployLog,
DockerHealth,
@@ -708,8 +709,8 @@ export function fetchTopContainers(
by: 'cpu' | 'memory' = 'cpu',
limit = 5,
signal?: AbortSignal
): Promise<ContainerStatsSample[]> {
return get<ContainerStatsSample[]>(`/api/system/stats/top?by=${by}&limit=${limit}`, signal);
): Promise<TopContainerSample[]> {
return get<TopContainerSample[]>(`/api/system/stats/top?by=${by}&limit=${limit}`, signal);
}
export function fetchStaticSiteStats(id: string, signal?: AbortSignal): Promise<ContainerStats> {
+9 -14
@@ -7,6 +7,7 @@
import type { ContainerStats, ContainerStatsSample } from '$lib/types';
import * as api from '$lib/api';
import { t } from '$lib/i18n';
import { statsInterval } from '$lib/stores/statsInterval';
import ResourceChart from './ResourceChart.svelte';
import type { EChartsOption } from 'echarts';
@@ -74,24 +75,16 @@
};
});
function formatBytes(bytes: number): string {
if (bytes < 1024) return `${bytes} B`;
const kb = bytes / 1024;
if (kb < 1024) return `${kb.toFixed(0)} KB`;
const mb = kb / 1024;
if (mb < 1024) return `${mb.toFixed(1)} MB`;
const gb = mb / 1024;
return `${gb.toFixed(2)} GB`;
}
import { formatBytes } from '$lib/format/bytes';
const cpuColor = $derived(() => {
const cpuColor = $derived.by(() => {
if (!stats) return 'bg-gray-300';
if (stats.cpu_percent > 80) return 'bg-red-500';
if (stats.cpu_percent > 50) return 'bg-amber-500';
return 'bg-emerald-500';
});
const memColor = $derived(() => {
const memColor = $derived.by(() => {
if (!stats) return 'bg-gray-300';
if (stats.memory_percent > 80) return 'bg-red-500';
if (stats.memory_percent > 50) return 'bg-amber-500';
@@ -151,7 +144,7 @@
<span class="w-8 text-[10px] font-medium text-[var(--text-tertiary)]">{$t('stats.cpu')}</span>
<div class="relative h-1.5 flex-1 overflow-hidden rounded-full bg-[var(--surface-card-hover)]">
<div
class="absolute inset-y-0 left-0 rounded-full transition-all duration-500 {cpuColor()}"
class="absolute inset-y-0 left-0 rounded-full transition-all duration-500 {cpuColor}"
style="width: {Math.min(stats.cpu_percent, 100)}%"
></div>
</div>
@@ -164,7 +157,7 @@
<span class="w-8 text-[10px] font-medium text-[var(--text-tertiary)]">{$t('stats.mem')}</span>
<div class="relative h-1.5 flex-1 overflow-hidden rounded-full bg-[var(--surface-card-hover)]">
<div
class="absolute inset-y-0 left-0 rounded-full transition-all duration-500 {memColor()}"
class="absolute inset-y-0 left-0 rounded-full transition-all duration-500 {memColor}"
style="width: {Math.min(stats.memory_percent, 100)}%"
></div>
</div>
@@ -184,7 +177,9 @@
{#if expanded}
{#if history.length === 0}
<p class="mt-1 text-[10px] text-[var(--text-tertiary)]">
{$t('resources.noSamples', { interval: '15' })}
{$statsInterval > 0
? $t('resources.noSamples', { interval: String($statsInterval) })
: $t('resources.collectionDisabled')}
</p>
{:else}
<div class="mt-1 rounded-md border border-[var(--border-primary)] bg-[var(--surface-page)] p-2">
+2 -2
@@ -32,7 +32,7 @@
: instance.subdomain ? `https://${instance.subdomain}` : ''
);
const timeSinceCreated = $derived(() => $fmt.relative(instance.created_at));
const timeSinceCreated = $derived($fmt.relative(instance.created_at));
async function handleAction(action: 'stop' | 'start' | 'restart' | 'remove') {
loading = true;
@@ -90,7 +90,7 @@
<div class="mt-1.5 flex items-center gap-3 text-xs text-[var(--text-tertiary)]">
<span class="rounded bg-[var(--surface-card-hover)] px-1.5 py-0.5 font-mono">:{instance.port}</span>
<span>{timeSinceCreated()}</span>
<span>{timeSinceCreated}</span>
</div>
</div>
+2 -2
@@ -19,7 +19,7 @@
const failedCount = $derived(instances.filter((i) => i.status === 'failed').length);
const totalCount = $derived(instances.length);
const overallStatus = $derived<string>(() => {
const overallStatus = $derived.by<'failed' | 'running' | 'stopped'>(() => {
if (failedCount > 0) return 'failed';
if (runningCount > 0) return 'running';
if (stoppedCount > 0) return 'stopped';
@@ -41,7 +41,7 @@
</div>
<p class="mt-2 truncate font-mono text-xs text-[var(--text-tertiary)]">{project.image}</p>
</div>
<StatusBadge status={overallStatus()} size="sm" />
<StatusBadge status={overallStatus} size="sm" />
</div>
<!-- Instance count badges -->
@@ -36,12 +36,11 @@
totalContainers > 0 ? ((docker?.stopped ?? 0) / totalContainers) * 100 : 0
);
import { formatBytes as formatBytesShared } from '$lib/format/bytes';
function formatBytes(n: number | undefined): string {
if (!n || n <= 0) return '—';
const gb = n / 1024 ** 3;
if (gb >= 1) return `${gb.toFixed(1)} GB`;
const mb = n / 1024 ** 2;
return `${mb.toFixed(0)} MB`;
return formatBytesShared(n);
}
function formatMs(n: number | undefined): string {
@@ -95,7 +94,7 @@
</div>
{:else if !dockerConnected}
<div class="panel-error">
<code>{docker?.error ?? 'Docker daemon is not reachable.'}</code>
<code>{docker?.error ?? $t('daemons.dockerNotReachable')}</code>
<p>{$t('daemons.dockerHint')}</p>
</div>
{:else}
@@ -3,31 +3,23 @@
breakdown + top consumers. Drops into the dashboard as its own section.
-->
<script lang="ts">
import type { SystemStats, SystemStatsSample, ContainerStatsSample } from '$lib/types';
import type { SystemStats, SystemStatsSample, TopContainerSample } from '$lib/types';
import * as api from '$lib/api';
import ResourceChart from './ResourceChart.svelte';
import type { EChartsOption } from 'echarts';
import { t } from '$lib/i18n';
import { formatBytes } from '$lib/format/bytes';
import { statsInterval, ensureStatsIntervalLoaded } from '$lib/stores/statsInterval';
let current = $state<SystemStats | null>(null);
let history = $state<SystemStatsSample[]>([]);
let top = $state<ContainerStatsSample[]>([]);
let top = $state<TopContainerSample[]>([]);
let topBy = $state<'cpu' | 'memory'>('cpu');
let window = $state<'30m' | '2h' | '6h' | '24h'>('2h');
let historyWindow = $state<'30m' | '2h' | '6h' | '24h'>('2h');
let dockerDown = $state(false);
let otherError = $state('');
function formatBytes(bytes: number): string {
if (bytes < 1024) return `${bytes} B`;
const kb = bytes / 1024;
if (kb < 1024) return `${kb.toFixed(0)} KB`;
const mb = kb / 1024;
if (mb < 1024) return `${mb.toFixed(1)} MB`;
const gb = mb / 1024;
if (gb < 1024) return `${gb.toFixed(2)} GB`;
const tb = gb / 1024;
return `${tb.toFixed(2)} TB`;
}
ensureStatsIntervalLoaded();
async function load(signal?: AbortSignal) {
// Each request is handled independently so a 503 on `current` does
@@ -35,7 +27,7 @@
// which is available even when Docker is down).
const [currRes, histRes, topRes] = await Promise.allSettled([
api.fetchSystemStats(signal),
api.fetchSystemStatsHistory(window, signal),
api.fetchSystemStatsHistory(historyWindow, signal),
api.fetchTopContainers(topBy, 5, signal)
]);
@@ -75,15 +67,15 @@
}
$effect(() => {
// Read window/topBy so this effect re-runs when they change.
void window;
// Read historyWindow/topBy so this effect re-runs when they change.
void historyWindow;
void topBy;
const controller = new AbortController();
load(controller.signal);
const t = setInterval(() => load(controller.signal), 15_000);
const intervalId = setInterval(() => load(controller.signal), 15_000);
return () => {
controller.abort();
clearInterval(t);
clearInterval(intervalId);
};
});
@@ -180,7 +172,7 @@
<div class="mb-2 flex items-center justify-between gap-2">
<span class="text-xs font-medium text-[var(--text-secondary)]">{$t('resources.workloadUtilization')}</span>
<select
bind:value={window}
bind:value={historyWindow}
class="rounded border border-[var(--border-input)] bg-[var(--surface-input)] px-2 py-0.5 text-xs text-[var(--text-secondary)] focus:outline-none"
>
<option value="30m">{$t('resources.windowMinutes', { n: '30' })}</option>
@@ -191,7 +183,9 @@
</div>
{#if history.length === 0}
<p class="py-6 text-center text-xs text-[var(--text-tertiary)]">
{$t('resources.noSamples', { interval: '15' })}
{$statsInterval > 0
? $t('resources.noSamples', { interval: String($statsInterval) })
: $t('resources.collectionDisabled')}
</p>
{:else}
<ResourceChart option={chartOption} height="180px" ariaLabel={$t('resources.workloadUtilization')} />
@@ -253,8 +247,8 @@
<span class="rounded bg-[var(--surface-card-hover)] px-1.5 py-0.5 text-[10px] text-[var(--text-tertiary)]">
{s.owner_type === 'site' ? $t('resources.site') : $t('resources.instance')}
</span>
<span class="ml-2 font-mono text-[10px] text-[var(--text-tertiary)]">
{s.container_id.slice(0, 12)}
<span class="ml-2 truncate text-[var(--text-primary)]">
{s.owner_name || (s.container_id ? s.container_id.slice(0, 12) : '')}
</span>
</span>
<span class="tabular-nums text-[var(--text-primary)]">
+16
@@ -0,0 +1,16 @@
/**
* Formats a byte count using the closest binary unit (B / KB / MB / GB / TB).
* Returns "0 B" for zero, negative, or non-finite inputs so the UI never renders "NaN".
*/
export function formatBytes(bytes: number): string {
if (!Number.isFinite(bytes) || bytes <= 0) return '0 B';
if (bytes < 1024) return `${bytes} B`;
const kb = bytes / 1024;
if (kb < 1024) return `${kb.toFixed(0)} KB`;
const mb = kb / 1024;
if (mb < 1024) return `${mb.toFixed(1)} MB`;
const gb = mb / 1024;
if (gb < 1024) return `${gb.toFixed(2)} GB`;
const tb = gb / 1024;
return `${tb.toFixed(2)} TB`;
}
+9 -1
@@ -3,6 +3,9 @@
"name": "Tinyforge",
"version": "v0.1"
},
"layout": {
"serviceStatus": "Service status"
},
"health": {
"connected": "connected",
"disconnected": "disconnected",
@@ -56,6 +59,7 @@
"windowMinutes": "{n} minutes",
"windowHours": "{n} hours",
"noSamples": "No samples yet — the collector samples every {interval}s.",
"collectionDisabled": "Stats collection is disabled. Enable it in Settings to populate this chart.",
"diskImages": "Images",
"diskContainers": "Containers",
"diskVolumes": "Volumes",
@@ -949,7 +953,11 @@
"dockerHint": "Check that the Docker daemon is running and that the socket is reachable.",
"proxyHint": "Verify the proxy URL, credentials, and that the service is listening.",
"noProxyDesc": "No proxy provider is configured. Tinyforge can manage routes via Nginx Proxy Manager or Traefik.",
"configureProxy": "Configure in Settings"
"configureProxy": "Configure in Settings",
"dockerNotReachable": "Docker daemon is not reachable.",
"dockerUnreachable": "Docker unreachable",
"proxyUnreachable": "Proxy unreachable",
"reachable": "reachable"
},
"dns": {
"title": "DNS Records",
+9 -1
@@ -3,6 +3,9 @@
"name": "Tinyforge",
"version": "v0.1"
},
"layout": {
"serviceStatus": "Состояние служб"
},
"health": {
"connected": "подключён",
"disconnected": "отключён",
@@ -56,6 +59,7 @@
"windowMinutes": "{n} минут",
"windowHours": "{n} часов",
"noSamples": "Пока нет данных — сбор идёт каждые {interval}с.",
"collectionDisabled": "Сбор статистики отключён. Включите его в Настройках, чтобы заполнить график.",
"diskImages": "Образы",
"diskContainers": "Контейнеры",
"diskVolumes": "Тома",
@@ -949,7 +953,11 @@
"dockerHint": "Проверьте, что Docker-демон запущен и сокет доступен.",
"proxyHint": "Проверьте URL прокси, учётные данные и доступность сервиса.",
"noProxyDesc": "Провайдер прокси не настроен. Tinyforge поддерживает Nginx Proxy Manager или Traefik.",
"configureProxy": "Настроить в параметрах"
"configureProxy": "Настроить в параметрах",
"dockerNotReachable": "Docker-демон недоступен.",
"dockerUnreachable": "Docker недоступен",
"proxyUnreachable": "Прокси недоступен",
"reachable": "доступен"
},
"dns": {
"title": "DNS-записи",
+27
@@ -0,0 +1,27 @@
import { writable } from 'svelte/store';
import { getSettings } from '$lib/api';
/**
* Reactive view of the configured stats collection interval (seconds).
* Set to 0 when collection is disabled. Refreshed on mount and after the
* settings page saves. Components can subscribe with `$statsInterval` to
* render contextual messages ("samples every Ns", "collection disabled").
*/
export const statsInterval = writable<number>(15);
let loaded = false;
export async function refreshStatsInterval(): Promise<void> {
try {
const s = await getSettings();
const v = s?.stats_interval_seconds;
statsInterval.set(typeof v === 'number' ? v : 15);
loaded = true;
} catch {
// Leave the previous value if the request fails.
}
}
export function ensureStatsIntervalLoaded(): void {
if (!loaded) void refreshStatsInterval();
}
+11
@@ -120,6 +120,8 @@ export interface Settings {
wildcard_dns: boolean;
dns_provider: string;
has_cloudflare_api_token: boolean;
/** Sent on PUT to update the Cloudflare API token; never returned by GET. */
cloudflare_api_token?: string;
cloudflare_zone_id: string;
image_prune_threshold_mb: number;
proxy_provider: string;
@@ -492,6 +494,15 @@ export interface ContainerStatsSample {
block_write_bytes: number;
}
/**
* A container sample augmented with the human-readable owner name returned
* by the /system/stats/top endpoint. Container ID is empty for non-admin
* viewers to avoid leaking workload identifiers across access boundaries.
*/
export interface TopContainerSample extends ContainerStatsSample {
owner_name: string;
}
/** Host-level snapshot returned by /api/system/stats. */
export interface SystemStats {
timestamp: string;
+20 -12
@@ -14,6 +14,8 @@
import { t } from '$lib/i18n';
import { navCounts, startNavCountsPolling, stopNavCountsPolling, refreshNavCounts } from '$lib/stores/navCounts';
import { health, startHealthPolling, stopHealthPolling, refreshHealth } from '$lib/stores/health';
import { effectiveTimezone, formatOffsetLabel } from '$lib/stores/timezone';
import { fmt } from '$lib/format/datetime';
interface Props {
children: Snippet;
@@ -54,15 +56,17 @@
const proxyHealth = $derived($health.proxy);
const healthChecked = $derived($health.checked);
// Live UTC forge clock (refreshes every second). A small thing, but it makes
// Live forge clock (refreshes every second). A small thing, but it makes
// the sidebar feel alive and reinforces the "control room" aesthetic.
let nowUtc = $state('');
// Renders in the user's chosen timezone via the shared formatter.
let nowTick = $state(new Date());
let clockTimer: ReturnType<typeof setInterval> | null = null;
function tickClock() {
const d = new Date();
const pad = (n: number) => String(n).padStart(2, '0');
nowUtc = `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())}`;
nowTick = new Date();
}
const clockDisplay = $derived($fmt.clock(nowTick));
const clockOffset = $derived(formatOffsetLabel($effectiveTimezone, nowTick));
const clockTitle = $derived(`${$effectiveTimezone.replace(/_/g, ' ')} · ${clockOffset}`);
// Keyboard quick-nav: "g" then a letter jumps to a section (vim-style).
// g+d → dashboard, g+p → projects, g+s → sites, g+k → stacks, g+x → deploy,
@@ -194,14 +198,16 @@
</div>
<!-- Daemon health chips (Docker + proxy provider) -->
<div class="brand-rail" aria-label="Service status">
<div class="brand-rail" aria-label={$t('layout.serviceStatus')}>
{#if healthChecked}
<button
type="button"
class="chip"
class:chip-live={dockerConnected}
class:chip-down={!dockerConnected}
title={dockerConnected ? `Docker daemon · ${dockerHealth?.version ?? 'reachable'}` : dockerHealth?.error ?? 'Docker unreachable'}
title={dockerConnected
? `${$t('daemons.docker')} · ${dockerHealth?.version ?? $t('daemons.reachable')}`
: dockerHealth?.error ?? $t('daemons.dockerUnreachable')}
onclick={() => { if (!dockerConnected) hintsExpanded = !hintsExpanded; }}
>
<span class="chip-dot" aria-hidden="true"></span>
@@ -218,7 +224,9 @@
class="chip"
class:chip-live={proxyConnected}
class:chip-down={!proxyConnected}
title={proxyConnected ? `${proxyProviderName.toUpperCase()} · ${proxyHealth.latency_ms ?? '?'} ms` : proxyHealth.error ?? 'Proxy unreachable'}
title={proxyConnected
? `${proxyProviderName.toUpperCase()} · ${proxyHealth.latency_ms ?? '?'} ms`
: proxyHealth.error ?? $t('daemons.proxyUnreachable')}
onclick={() => { if (!proxyConnected) proxyHintsExpanded = !proxyHintsExpanded; }}
>
<span class="chip-dot" aria-hidden="true"></span>
@@ -323,10 +331,10 @@
</div>
<div class="forge-footline">
<span class="forge-footline-version">{$t('app.name')} {$t('app.version')}</span>
<span class="forge-footline-clock" title="UTC">
<span class="forge-footline-clock" title={clockTitle}>
<span class="clock-dot"></span>
<span class="clock-time">{nowUtc || '--:--:--'}</span>
<span class="clock-suffix">UTC</span>
<span class="clock-time">{clockDisplay}</span>
<span class="clock-suffix">{clockOffset}</span>
</span>
</div>
<p class="forge-nav-hint" title="Press 'g' then a letter to jump between sections">
@@ -599,7 +607,7 @@
color: var(--text-primary);
}
/* ── Sidebar footline (version + live UTC clock) ───────────── */
/* ── Sidebar footline (version + live timezone-aware clock) ───────────── */
.forge-footline {
display: flex;
align-items: center;
+1 -1
@@ -109,7 +109,7 @@
polling_interval: secondsToDuration(pollingInterval),
base_volume_path: baseVolumePath.trim(),
proxy_provider: proxyProvider
} as any);
});
toasts.success($t('settingsGeneral.saved'));
} catch (err) {
toasts.error(err instanceof Error ? err.message : $t('settingsGeneral.saveFailed'));
+1 -1
@@ -54,7 +54,7 @@
backup_enabled: backupEnabled,
backup_interval_hours: Math.max(1, parseInt(backupIntervalHours, 10) || 24),
backup_retention_count: Math.max(1, parseInt(backupRetentionCount, 10) || 10)
} as any);
});
toasts.success($t('settingsBackup.saved'));
} catch (err) {
toasts.error(err instanceof Error ? err.message : $t('settingsBackup.saveFailed'));
+7 -7
@@ -7,10 +7,11 @@
-->
<script lang="ts">
import { getSettings, updateSettings, testDnsConnection, listDnsZones } from '$lib/api';
import type { EntityPickerItem } from '$lib/types';
import type { EntityPickerItem, Settings } from '$lib/types';
import FormField from '$lib/components/FormField.svelte';
import EntityPicker from '$lib/components/EntityPicker.svelte';
import Skeleton from '$lib/components/Skeleton.svelte';
import ToggleSwitch from '$lib/components/ToggleSwitch.svelte';
import { toasts } from '$lib/stores/toast';
import { t } from '$lib/i18n';
import { IconLoader, IconX } from '$lib/components/icons';
@@ -50,13 +51,13 @@
async function handleSave() {
saving = true;
try {
const payload: Record<string, unknown> = {
const payload: Partial<Settings> = {
wildcard_dns: wildcardDns,
dns_provider: wildcardDns ? '' : dnsProvider,
cloudflare_zone_id: cloudflareZoneId
};
if (cloudflareApiToken) payload.cloudflare_api_token = cloudflareApiToken;
await updateSettings(payload as any);
await updateSettings(payload);
toasts.success($t('settingsGeneral.saved'));
cloudflareApiToken = '';
hasCloudflareApiToken = hasCloudflareApiToken || Boolean(payload.cloudflare_api_token);
@@ -144,14 +145,13 @@
<h2 class="mb-1 text-lg font-semibold text-[var(--text-primary)]">{$t('settingsDns.title')}</h2>
<p class="mb-4 text-sm text-[var(--text-secondary)]">{$t('settingsDns.description')}</p>
<label class="flex items-center gap-3 cursor-pointer">
<input type="checkbox" bind:checked={wildcardDns}
class="h-4 w-4 rounded border-[var(--border-primary)] text-[var(--color-brand-600)] focus:ring-[var(--color-brand-500)]" />
<div class="flex items-center gap-3">
<ToggleSwitch bind:checked={wildcardDns} label={$t('settingsGeneral.wildcardDns')} />
<div>
<span class="text-sm font-medium text-[var(--text-primary)]">{$t('settingsGeneral.wildcardDns')}</span>
<p class="text-xs text-[var(--text-tertiary)]">{$t('settingsGeneral.wildcardDnsHelp')}</p>
</div>
</label>
</div>
{#if !wildcardDns}
<div class="mt-4 space-y-4 rounded-lg border border-[var(--border-primary)] bg-[var(--surface-card-hover)] p-4">
@@ -42,7 +42,7 @@
if (urlErr) return;
saving = true;
try {
await updateSettings({ notification_url: notificationUrl.trim() } as any);
await updateSettings({ notification_url: notificationUrl.trim() });
toasts.success($t('settingsGeneral.saved'));
} catch (err) {
toasts.error(err instanceof Error ? err.message : $t('settingsGeneral.saveFailed'));
@@ -56,7 +56,7 @@
image_prune_threshold_mb: Math.max(0, parseInt(imagePruneThresholdMb, 10) || 0),
stats_interval_seconds: interval,
stats_retention_hours: retention
} as any);
});
toasts.success($t('settingsGeneral.saved'));
} catch (err) {
toasts.error(err instanceof Error ? err.message : $t('settingsGeneral.saveFailed'));
+2 -2
@@ -145,7 +145,7 @@
}
async function saveAccessList(id: number) {
try { await updateSettings({ npm_access_list_id: id } as any); toasts.success($t('settingsCredentials.saved')); }
try { await updateSettings({ npm_access_list_id: id }); toasts.success($t('settingsCredentials.saved')); }
catch (err) { toasts.error(err instanceof Error ? err.message : $t('settingsCredentials.saveFailed')); }
}
@@ -179,7 +179,7 @@
async function handleNpmRemoteChange() {
try {
await updateSettings({ npm_remote: npmRemote } as any);
await updateSettings({ npm_remote: npmRemote });
toasts.success($t('settingsCredentials.saved'));
} catch (err) {
toasts.error(err instanceof Error ? err.message : $t('settingsCredentials.saveFailed'));
+1 -1
@@ -35,7 +35,7 @@
traefik_cert_resolver: traefikCertResolver.trim(),
traefik_network: traefikNetwork.trim(),
traefik_api_url: traefikApiUrl.trim()
} as any);
});
toasts.success($t('settingsGeneral.saved'));
} catch (err) {
toasts.error(err instanceof Error ? err.message : $t('settingsGeneral.saveFailed'));