fix: harden security, fix concurrency bugs, and address review findings
Build / build (push) Successful in 11m42s

Security:
- rate limit /api/webhook routes per-IP and cap concurrent site syncs
- global SSE connection cap (256) with new sse_gate
- validate ?tail= and cap JSON log responses at 4 MiB
- strip ANSI/CSI/OSC and control bytes from streamed log lines
- redact webhook secret from request log middleware
- scrub host details from /api/health for non-admin viewers
- drop container_id from /api/system/stats/top for non-admins
- generate webhook secrets via crypto/rand; require >=32 chars on insert
- verify iid path consistency in streamContainerLogs
- LimitReader on site webhook body; reject malformed non-empty bodies

Concurrency / correctness:
- stats collector: Stop() no longer hangs without Start(), semaphore
  acquired in parent loop so ctx cancellation short-circuits the queue,
  in-flight tick cancellable via shared base context, zero-ts guard
- webhook handler: replace fire-and-forget goroutine with WaitGroup-tracked
  workers + Drain() wired into graceful shutdown
- $derived(() => ...) mis-idiom fixed in ContainerStats / InstanceCard /
  ProjectCard (returned function instead of value)
- SystemResourcesCard: rename `window` and `t` locals to avoid shadowing
  globalThis.window and the i18n `t` import

Quality / performance:
- replace O(n^2) insertion sort with sort.Slice in stats top
- runMigrations only swallows duplicate-column / already-exists errors
- PruneStatsSamplesBefore wrapped in a transaction
- collapse N+1 in unusedImageStats / pruneImages to one ListAllInstances
  pass; surface DB errors instead of silently treating them as inactive
- run Docker Info + DiskUsage in parallel via errgroup
- container log SSE emits `: ping` heartbeat every 20 s
- imageMatches case-insensitive on registry host (RFC behaviour)
- log warning on invalid stage tag pattern instead of silent skip
- reject malformed non-empty site webhook payloads

Frontend / i18n:
- shared formatBytes utility replaces three local copies
- statsInterval store drives dynamic "no samples / collection disabled"
  copy across ContainerStats and SystemResourcesCard
- top consumers row now shows owner_name (project/stage or site name)
- drop seven `as any` casts on the Settings type; add cloudflare_api_token
  write-only field
- move "Service status", "Docker daemon", "Docker unreachable",
  "Proxy unreachable", "reachable", and "Docker daemon is not reachable."
  strings into en/ru i18n bundles
This commit is contained in:
2026-05-07 00:56:14 +03:00
parent 05440a5f92
commit a4362b842d
39 changed files with 1249 additions and 213 deletions
+15 -5
View File
@@ -139,18 +139,28 @@ func (s *Store) ListSystemStatsSamples(sinceTS int64) ([]SystemStatsSample, erro
return out, rows.Err()
}
// PruneStatsSamplesBefore deletes all samples older than the given unix timestamp
// from both the container and system stats tables. Returns rows deleted across
// both tables.
// PruneStatsSamplesBefore deletes all samples older than the given unix
// timestamp from both the container and system stats tables in a single
// transaction so a crash between the two cannot leave one table pruned and
// the other not. Returns rows deleted across both tables.
func (s *Store) PruneStatsSamplesBefore(ts int64) (int64, error) {
r1, err := s.db.Exec(`DELETE FROM container_stats_samples WHERE ts < ?`, ts)
tx, err := s.db.Begin()
if err != nil {
return 0, fmt.Errorf("begin prune tx: %w", err)
}
defer tx.Rollback()
r1, err := tx.Exec(`DELETE FROM container_stats_samples WHERE ts < ?`, ts)
if err != nil {
return 0, fmt.Errorf("prune container stats samples: %w", err)
}
r2, err := s.db.Exec(`DELETE FROM system_stats_samples WHERE ts < ?`, ts)
r2, err := tx.Exec(`DELETE FROM system_stats_samples WHERE ts < ?`, ts)
if err != nil {
return 0, fmt.Errorf("prune system stats samples: %w", err)
}
if err := tx.Commit(); err != nil {
return 0, fmt.Errorf("commit prune tx: %w", err)
}
n1, _ := r1.RowsAffected()
n2, _ := r2.RowsAffected()
return n1 + n2, nil