tiny-forge/internal/docker/system.go
alexei.dolgolyov a4362b842d
fix: harden security, fix concurrency bugs, and address review findings
Security:
- rate limit /api/webhook routes per-IP and cap concurrent site syncs
- global SSE connection cap (256) with new sse_gate
- validate ?tail= and cap JSON log responses at 4 MiB
- strip ANSI/CSI/OSC and control bytes from streamed log lines
- redact webhook secret from request log middleware
- scrub host details from /api/health for non-admin viewers
- drop container_id from /api/system/stats/top for non-admins
- generate webhook secrets via crypto/rand; require >=32 chars on insert
- verify iid path consistency in streamContainerLogs
- LimitReader on site webhook body; reject malformed non-empty bodies

Concurrency / correctness:
- stats collector: Stop() no longer hangs without Start(), semaphore
  acquired in parent loop so ctx cancellation short-circuits the queue,
  in-flight tick cancellable via shared base context, zero-ts guard
- webhook handler: replace fire-and-forget goroutine with WaitGroup-tracked
  workers + Drain() wired into graceful shutdown
- $derived(() => ...) mis-idiom fixed in ContainerStats / InstanceCard /
  ProjectCard (returned function instead of value)
- SystemResourcesCard: rename `window` and `t` locals to avoid shadowing
  globalThis.window and the i18n `t` import

Quality / performance:
- replace O(n^2) insertion sort with sort.Slice in stats top
- runMigrations only swallows duplicate-column / already-exists errors
- PruneStatsSamplesBefore wrapped in a transaction
- collapse N+1 in unusedImageStats / pruneImages to one ListAllInstances
  pass; surface DB errors instead of silently treating them as inactive
- run Docker Info + DiskUsage in parallel via errgroup
- container log SSE emits `: ping` heartbeat every 20 s
- imageMatches case-insensitive on registry host (RFC behaviour)
- log warning on invalid stage tag pattern instead of silent skip
- reject malformed non-empty site webhook payloads

Frontend / i18n:
- shared formatBytes utility replaces three local copies
- statsInterval store drives dynamic "no samples / collection disabled"
  copy across ContainerStats and SystemResourcesCard
- top consumers row now shows owner_name (project/stage or site name)
- drop seven `as any` casts on the Settings type; add cloudflare_api_token
  write-only field
- move "Service status", "Docker daemon", "Docker unreachable",
  "Proxy unreachable", "reachable", and "Docker daemon is not reachable."
  strings into en/ru i18n bundles
2026-05-07 00:56:14 +03:00


package docker

import (
	"context"
	"fmt"
	"log/slog"
	"time"

	"github.com/moby/moby/client"
	"golang.org/x/sync/errgroup"
)

// SystemStats is a host-level snapshot combining daemon capacity
// (NCPU, memory total) with container counts and disk usage broken down
// by category. Workload CPU/memory utilization is aggregated from
// per-container samples by the stats collector, not here.
type SystemStats struct {
	Timestamp time.Time `json:"timestamp"`

	// Capacity from Docker daemon.
	NCPU        int   `json:"ncpu"`
	MemoryTotal int64 `json:"memory_total"`

	// Container/image counts.
	Containers int `json:"containers"`
	Running    int `json:"running"`
	Paused     int `json:"paused"`
	Stopped    int `json:"stopped"`
	Images     int `json:"images"`

	// Disk usage by category (bytes).
	DiskImagesBytes     int64 `json:"disk_images_bytes"`
	DiskContainersBytes int64 `json:"disk_containers_bytes"`
	DiskVolumesBytes    int64 `json:"disk_volumes_bytes"`
	DiskBuildCacheBytes int64 `json:"disk_build_cache_bytes"`

	// Reclaimable disk space by category (bytes).
	DiskImagesReclaimable     int64 `json:"disk_images_reclaimable"`
	DiskContainersReclaimable int64 `json:"disk_containers_reclaimable"`
	DiskVolumesReclaimable    int64 `json:"disk_volumes_reclaimable"`
	DiskBuildCacheReclaimable int64 `json:"disk_build_cache_reclaimable"`

	// DiskTotalBytes is the sum of the category totals.
	DiskTotalBytes int64 `json:"disk_total_bytes"`
}

// GetSystemStats returns a one-shot host-level snapshot. Info and DiskUsage
// are issued in parallel because DiskUsage walks every layer/volume and is
// often the slowest call on a busy host (1-3 s); Info typically completes in
// ~10 ms. Disk usage failures do not fail the whole call; the result
// degrades gracefully with zero disk fields and a warning log.
func (c *Client) GetSystemStats(ctx context.Context) (SystemStats, error) {
	stats := SystemStats{Timestamp: time.Now().UTC()}

	g, gctx := errgroup.WithContext(ctx)

	g.Go(func() error {
		info, err := c.Info(gctx)
		if err != nil {
			return fmt.Errorf("system stats info: %w", err)
		}
		stats.NCPU = info.NCPU
		stats.MemoryTotal = info.MemoryTotal
		stats.Containers = info.Containers
		stats.Running = info.Running
		stats.Paused = info.Paused
		stats.Stopped = info.Stopped
		stats.Images = info.Images
		return nil
	})

	var du *client.DiskUsageResult
	g.Go(func() error {
		usage, err := c.api.DiskUsage(gctx, client.DiskUsageOptions{
			Containers: true,
			Images:     true,
			Volumes:    true,
			BuildCache: true,
		})
		if err != nil {
			// Disk usage is best-effort; swallow but log so the dashboard
			// shows zeroed disk fields rather than failing entirely.
			slog.Warn("system stats: disk usage failed", "error", err)
			return nil
		}
		du = &usage
		return nil
	})

	if err := g.Wait(); err != nil {
		return SystemStats{}, err
	}

	if du != nil {
		stats.DiskImagesBytes = du.Images.TotalSize
		stats.DiskContainersBytes = du.Containers.TotalSize
		stats.DiskVolumesBytes = du.Volumes.TotalSize
		stats.DiskBuildCacheBytes = du.BuildCache.TotalSize

		stats.DiskImagesReclaimable = du.Images.Reclaimable
		stats.DiskContainersReclaimable = du.Containers.Reclaimable
		stats.DiskVolumesReclaimable = du.Volumes.Reclaimable
		stats.DiskBuildCacheReclaimable = du.BuildCache.Reclaimable

		stats.DiskTotalBytes = stats.DiskImagesBytes +
			stats.DiskContainersBytes +
			stats.DiskVolumesBytes +
			stats.DiskBuildCacheBytes
	}

	return stats, nil
}