Commit Graph

8 Commits

Author SHA1 Message Date
alexei.dolgolyov 8d6a527a2b refactor(workload): plugin architecture wave + apps UI + volume scopes
Completes the workload-first refactor's plugin layer:

- internal/workload/plugin/ — Source/Trigger plugin contract,
  registry, types (Workload, DeploymentIntent, InboundEvent,
  PublicFace). Self-registering init() pattern + blank-import
  in cmd/server/main.go.
- Source plugins: image (blue-green with multi-face proxy routing),
  compose, static. Trigger plugins: registry, git, manual.
- internal/deployer/dispatch.go — DispatchPlugin/Teardown/Reconcile
  seam routing the legacy deployer through plugins.
- internal/api/workload_*.go — REST surface: workloads, env,
  volumes, chain (parent/children), promote-from. hooks.go
  serves /api/hooks/kinds/{kind}/schema for the wizard.
- internal/store: workload_env (encrypt-at-rest secrets) and
  workload_volumes tables, keyed on workload_id.
- cmd/server/static_backend.go — phantom-row adapter delegating
  the static source plugin to the legacy staticsite.Manager
  (deleted at hard cutover once the static inline port lands).
- web/src/routes/apps/ — /apps list + /apps/new wizard +
  /apps/[id] detail with kind-aware compose / image / static
  forms (Advanced JSON toggle), env panel, volumes panel,
  webhook panel, chain panel, manual deploy.

Volume scope generalization (v2 resolver):

- internal/volume.ResolveWorkloadPath (workload-keyed, sits
  next to legacy ResolvePath). Honors all VolumeScope values:
  absolute, ephemeral, instance, stage, project, project_named,
  named. internal/workload/plugin/source/image/image.go
  computeMounts wires settings + imageTag through. Coverage in
  internal/volume/resolver_test.go (portable Linux/Windows via
  t.TempDir).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 22:17:41 +03:00
alexei.dolgolyov a4362b842d fix: harden security, fix concurrency bugs, and address review findings
Build / build (push) Successful in 11m42s
Security:
- rate limit /api/webhook routes per-IP and cap concurrent site syncs
- global SSE connection cap (256) with new sse_gate
- validate ?tail= and cap JSON log responses at 4 MiB
- strip ANSI/CSI/OSC and control bytes from streamed log lines
- redact webhook secret from request log middleware
- scrub host details from /api/health for non-admin viewers
- drop container_id from /api/system/stats/top for non-admins
- generate webhook secrets via crypto/rand; require >=32 chars on insert
- verify iid path consistency in streamContainerLogs
- LimitReader on site webhook body; reject malformed non-empty bodies

Concurrency / correctness:
- stats collector: Stop() no longer hangs without Start(), semaphore
  acquired in parent loop so ctx cancellation short-circuits the queue,
  in-flight tick cancellable via shared base context, zero-ts guard
- webhook handler: replace fire-and-forget goroutine with WaitGroup-tracked
  workers + Drain() wired into graceful shutdown
- $derived(() => ...) mis-idiom fixed in ContainerStats / InstanceCard /
  ProjectCard (returned function instead of value)
- SystemResourcesCard: rename `window` and `t` locals to avoid shadowing
  globalThis.window and the i18n `t` import

Quality / performance:
- replace O(n^2) insertion sort with sort.Slice in stats top
- runMigrations only swallows duplicate-column / already-exists errors
- PruneStatsSamplesBefore wrapped in a transaction
- collapse N+1 in unusedImageStats / pruneImages to one ListAllInstances
  pass; surface DB errors instead of silently treating them as inactive
- run Docker Info + DiskUsage in parallel via errgroup
- container log SSE emits `: ping` heartbeat every 20 s
- imageMatches case-insensitive on registry host (RFC behaviour)
- log warning on invalid stage tag pattern instead of silent skip
- reject malformed non-empty site webhook payloads

Frontend / i18n:
- shared formatBytes utility replaces three local copies
- statsInterval store drives dynamic "no samples / collection disabled"
  copy across ContainerStats and SystemResourcesCard
- top consumers row now shows owner_name (project/stage or site name)
- drop seven `as any` casts on the Settings type; add cloudflare_api_token
  write-only field
- move "Service status", "Docker daemon", "Docker unreachable",
  "Proxy unreachable", "reachable", and "Docker daemon is not reachable."
  strings into en/ru i18n bundles
2026-05-07 00:56:14 +03:00
alexei.dolgolyov 6667abf03c fix: quick deploy duplicate detection, logout UX, backup toggle, CSP, SSE guard, and migration
- Detect existing projects with same image on quick deploy; show conflict dialog with options
- Move logout button to sidebar header as icon-only
- Replace backup checkbox with ToggleSwitch component
- Allow unsafe-inline in CSP script-src for SvelteKit hydration
- Guard SSE connection behind isAuthenticated() check
- Add notification_url ALTER TABLE migration for existing databases
- Restore RegisterPersistentLogger on event bus
2026-04-04 14:40:59 +03:00
alexei.dolgolyov ff59d9f799 fix: security hardening for middleware, crypto, and backup handlers
- Remove CORS origin reflection (SEC-C1 CRITICAL)
- Add Content-Security-Policy header (SEC-H2)
- Fix rate limiter memory leak with periodic stale IP cleanup (SEC-H5)
- Enforce minimum 32-char ENCRYPTION_KEY (SEC-H4)
- Validate backup type against allowlist (SEC-M6)
- Fix backup download path traversal with path containment check (SEC-C2 CRITICAL)
2026-04-04 12:40:37 +03:00
alexei.dolgolyov be6ad15efc fix: comprehensive security, performance, and quality hardening
Security: apply AdminOnly middleware to mutating routes, require
ENCRYPTION_KEY and ADMIN_PASSWORD (no insecure defaults), restrict
CORS to same-origin, fix OIDC token delivery via cookie instead of
URL query param, add rate limiting on login, add MaxBytesReader,
validate volume paths against traversal, add security headers,
validate user roles, add Secure flag to OIDC cookie.

Performance: set SQLite MaxOpenConns(1) to prevent SQLITE_BUSY,
add FK indexes on 8 columns, track notifier goroutines with
WaitGroup for graceful shutdown, use GetRegistryByName instead of
GetAllRegistries in deployer, pass basePath param to avoid redundant
settings query, return empty slices from store to remove reflection.

Quality: refactor TriggerDeploy to delegate to runDeploy (~100 lines
removed), consolidate duplicated utilities (extractPort, boolToInt,
now, isTerminalStatus) into shared exports, migrate all log.Printf
to slog structured logging, use consistent webhook response envelope,
remove dead code (parseEnvVars, duplicate auth types).

UX: clean up NPM proxy on instance removal via API, add README with
quickstart guide, add .env.example, require ADMIN_PASSWORD in
docker-compose, document staging-net prerequisite.
2026-03-29 12:49:24 +03:00
alexei.dolgolyov 28abad27c6 fix: SSE flusher support, null-safe API responses
- Add Flush() to statusRecorder so SSE works through logging middleware
- Return empty array instead of null for empty project lists
- Fixes 500 on /api/events and null.length crash on dashboard
2026-03-28 13:48:20 +03:00
alexei.dolgolyov 32de5b26a8 feat(docker-watcher): phase 12 - hardening
Blue-green zero-downtime deploys, promote flow validation.
Dual auth: local (bcrypt + JWT) and OAuth2/OIDC (any provider).
Auth middleware, login page, auth settings UI.
Structured logging (slog JSON), config export to YAML.
Graceful shutdown with deploy draining.
Multi-stage Dockerfile and production docker-compose.yml.
Swap phase order: Volumes & Env before UI Polish.
2026-03-27 23:20:56 +03:00
alexei.dolgolyov bbcc4f55f0 feat(docker-watcher): phase 7 - deployer & health checker
Deploy orchestrator with full pipeline: pull → create container →
start → network → NPM proxy → health check. Rollback on failure,
multi-instance support, max_instances enforcement, webhook notifications.
Fix NPM auth in rollback and error logging in removeInstance.
2026-03-27 22:02:09 +03:00