Commit Graph

52 Commits

Author SHA1 Message Date
alexei.dolgolyov a830378c5b fix: replace access list ID field with EntityPicker, add deploy toggle, improve UX
- Replace raw NPM access list ID input with EntityPicker on project edit form
- Resolve access list name from NPM API when editing project
- Add "Deploy immediately" toggle to Quick Deploy (off by default)
- Fix stage form layout: all fields on same row with toggles
- Fix empty port default on project creation (placeholder instead of pre-filled)
- Improve inspect error message when Docker is unavailable
- Trigger proxy resync when NPM access list changes
- Resolve access list name on NPM settings page load
2026-04-05 13:07:09 +03:00
alexei.dolgolyov 7550fe9e32 feat: CPU/RAM limits per stage, NPM access list (global + per-project)
Resource limits:
- Add cpu_limit (cores) and memory_limit (MB) fields to Stage model
- Pass limits to Docker container via NanoCPUs and Memory in HostConfig
- Add CPU/Memory fields to stage creation form in project detail
- 0 = unlimited (default)

NPM access list:
- Add npm_access_list_id to Settings (global default) and Project (per-project override)
- Per-project overrides global when > 0
- NPM provider passes access_list_id when configuring proxy hosts
- Add GET /api/settings/npm-access-lists endpoint to list NPM access lists
- Add access list picker on NPM settings page (global)
- Add access list ID field on project edit form (per-project)
- DB migrations for all new columns
2026-04-05 12:44:26 +03:00
alexei.dolgolyov c6d20ca26e feat: NPM access list support (global default + per-project override) 2026-04-05 12:38:20 +03:00
alexei.dolgolyov 4ff8daafc4 fix: reconcile instance status with Docker on list, add IsContainerRunning 2026-04-05 02:42:31 +03:00
alexei.dolgolyov 12d78bec99 fix: instance link includes domain, project delete cleans up containers and proxies
- InstanceCard appends settings domain to subdomain link (stage-dev-app.example.com instead of just stage-dev-app)
- Project deletion now removes Docker containers and proxy routes before deleting DB records
- Pass domain from settings to InstanceCard via project detail page
2026-04-05 02:38:32 +03:00
alexei.dolgolyov b54481aff8 fix: NPM remote toggle auto-save, proxy resync on remote change, webhook URL as path
- Remote NPM toggle now auto-saves immediately when toggled
- Toggling npm_remote triggers proxy resync (re-creates routes with server_ip or container name)
- Webhook URL shows just the path (/api/webhook/{secret}) instead of full URL with wrong domain
- Fix tag dropdown: resolve registry ID from name before fetching tags
- Remove unused fmt import
2026-04-05 02:27:41 +03:00
alexei.dolgolyov 195ef3e7e5 feat: NPM remote mode for cross-machine deployments
- Add npm_remote setting: when enabled, proxy forwards to server_ip with
  published host ports instead of Docker container names
- Deployer looks up assigned host port via InspectContainerPort in remote mode
- Auto-remove stale containers with same name before creating new ones
- Add Remote NPM toggle with warning on NPM settings page
- DB migration + schema for npm_remote column
2026-04-05 02:18:06 +03:00
alexei.dolgolyov c26c41e6a1 feat: enable proxy toggle on quick deploy, event log clearing, and UX fixes
- Add enable_proxy toggle to Quick Deploy form (defaults to on)
- Add DELETE /api/events/log/{id} and DELETE /api/events/log endpoints
- Add Clear All button with confirmation on Events page
- Rename "NPM Proxy" to "Enable Proxy" on stage form (provider-agnostic)
- Fix polling interval validation (min 60s) and number input trim errors
- Fix domain field no longer required in settings
2026-04-05 01:50:19 +03:00
alexei.dolgolyov 61febefca9 feat: automatic proxy re-sync on settings change
When domain, SSL certificate, or proxy provider changes in settings:
- Delete old proxy routes from the previous provider
- Switch to None: clear all route IDs on instances
- Switch to NPM/Traefik: re-create routes with new settings
- Domain change: re-configure all routes with new FQDN
- SSL cert change: re-apply to all existing routes
- Provider created dynamically at runtime via createProxyProvider()
- Deployer and API server updated via SetProxyProvider callback
2026-04-05 01:39:01 +03:00
alexei.dolgolyov 187e302f4a feat: proxy routes page, OIDC login fix, NPM test connection, webhook URL fix, and UX improvements
- Add /proxies page showing deploy-managed proxy routes with project/stage links, search, and status
- Add GET /api/proxies endpoint joining instances with project/stage names
- Add POST /api/settings/npm/test endpoint for NPM connection validation
- Add GET /api/auth/mode public endpoint for auth mode detection
- Add NPM Test Connection button with validation on save
- Fix OIDC SSO button only shown when auth_mode is oidc
- Fix webhook URL showing empty when domain not set (fallback to request host)
- Fix quick deploy double-tag (image:latest:latest) by splitting tag from image URL
- Fix trim() errors on number inputs in deploy and settings forms
- Fix NPM client auto-append /api to base URL
- Sanitize NPM test error messages (no raw HTML)
- Remove healthcheck field from Quick Deploy form
- Fix env vars placeholder newline
- Make domain field optional in settings
- Set polling interval minimum to 60s
- Add Proxies and Events to sidebar navigation
- Fix SSL cert name flash on NPM settings page
- Fix empty state icon on proxies page
2026-04-05 01:27:54 +03:00
alexei.dolgolyov 308547a3d7 refactor: remove standalone proxies, add Traefik provider with Docker labels
Standalone proxy removal:
- Delete store, API handlers, proxy manager, health monitor, validator, hints
- Delete frontend pages (proxies list, create, edit) and components (ProxyCard, ProxyForm, ProxyFilter, ProxyGroup, ValidationChecklist)
- Remove proxy routes from router, nav items, dashboard references
- Clean up SystemHealthCard to remove proxy section

Traefik provider:
- Add TraefikProvider implementing proxy.Provider via Docker labels
- ContainerLabels() returns traefik.enable, router rule, entrypoints, service port, TLS cert resolver, docker network
- ConfigureRoute() returns router name (labels handle routing at container creation)
- DeleteRoute() is no-op (container removal auto-deregisters)
- Ping() checks Traefik API health (optional)
- Wire ContainerLabels into deployer (executeDeploy + blueGreenDeploy)
- Add Traefik settings: entrypoint, cert_resolver, network, api_url
- Add traefik option to proxy provider selector in settings UI
- Show conditional Traefik config fields
- Add i18n keys (EN + RU)
2026-04-04 22:54:31 +03:00
alexei.dolgolyov 7d6719da12 refactor: extract ProxyProvider interface with None and NPM implementations
Replace direct npm.Client usage throughout the codebase with the
proxy.Provider interface, enabling pluggable proxy backends. The
deployer, API layer, and proxy manager now use provider-agnostic
route management (ConfigureRoute/DeleteRoute) instead of NPM-specific
API calls. Adds ProxyRouteID (string) to Instance model and
ProxyProvider setting to Settings, with SQLite migrations for
backward compatibility.
2026-04-04 19:39:08 +03:00
alexei.dolgolyov 6667abf03c fix: quick deploy duplicate detection, logout UX, backup toggle, CSP, SSE guard, and migration
- Detect existing projects with same image on quick deploy; show conflict dialog with options
- Move logout button to sidebar header as icon-only
- Replace backup checkbox with ToggleSwitch component
- Allow unsafe-inline in CSP script-src for SvelteKit hydration
- Guard SSE connection behind isAuthenticated() check
- Add notification_url ALTER TABLE migration for existing databases
- Restore RegisterPersistentLogger on event bus
2026-04-04 14:40:59 +03:00
alexei.dolgolyov 91b49cb5ed feat: expanded health checks, deploy filtering, per-project notifications, error sanitization, and audit trail
- Expand health endpoint to check DB, Docker, and NPM connectivity (FUNC-M4)
- Add project_id, stage_id, offset query params to deploys endpoint (FUNC-M5, FUNC-M6)
- Add notification_url field to Stage model for per-project overrides (FUNC-M2)
- Add NPM Ping method for health checking
- Sanitize all internal error messages in API handlers (SEC-M4)
- Add audit trail events for admin actions (FUNC-M3)
- Add EventLog event type to event bus
2026-04-04 13:10:10 +03:00
alexei.dolgolyov fa62e9c20f Merge branch 'worktree-agent-a34a6a8b'
# Conflicts:
#	internal/api/router.go
2026-04-04 12:45:34 +03:00
alexei.dolgolyov 98ee2bcd9a feat: auth system hardening with token revocation, password management, and error sanitization
- Add token revocation with in-memory blacklist and periodic cleanup (SEC-M1)
- Add POST /api/auth/logout endpoint
- Fix OIDC auth_token cookie to HttpOnly with exchange endpoint (SEC-H3)
- Add password complexity validation (min 8 chars) (SEC-M2)
- Prevent admin self-deletion (SEC-M3)
- Add PUT /api/auth/users/{uid} for role/email updates (FUNC-M1)
- Add PUT /api/auth/users/{uid}/password for password changes (FUNC-H1)
- Sanitize error messages in auth handlers (SEC-M4)
2026-04-04 12:43:45 +03:00
alexei.dolgolyov ff59d9f799 fix: security hardening for middleware, crypto, and backup handlers
- Remove CORS origin reflection (SEC-C1 CRITICAL)
- Add Content-Security-Policy header (SEC-H2)
- Fix rate limiter memory leak with periodic stale IP cleanup (SEC-H5)
- Enforce minimum 32-char ENCRYPTION_KEY (SEC-H4)
- Validate backup type against allowlist (SEC-M6)
- Fix backup download path traversal with path containment check (SEC-C2 CRITICAL)
2026-04-04 12:40:37 +03:00
alexei.dolgolyov 3c9727162a fix: address review findings for backup management
- HIGH: Add sync.Mutex to backup Engine to prevent concurrent
  backup/restore operations
- HIGH: Restore uses io.Copy instead of ReadFile to avoid OOM on
  large databases
- HIGH: Send HTTP response before closing DB during restore, then
  perform destructive operations in a goroutine
- HIGH: Create pre-restore safety backup before overwriting database
- HIGH: Autobackup cron reschedules dynamically when settings change
  via callback pattern (same as DNS provider changes)
2026-04-02 15:39:54 +03:00
alexei.dolgolyov a9c7775bb7 feat: configuration backup management with manual and auto backup
Add backup/restore functionality for the SQLite database. Users can
trigger manual backups, configure automatic backups on an interval
with retention policies, list/download/delete backups, and restore
from any backup.

- Backup engine using VACUUM INTO (safe with WAL mode)
- Backup metadata tracked in DB, files stored in DATA_DIR/backups/
- Settings: backup_enabled, backup_interval_hours, backup_retention_count
- API: POST/GET/DELETE /api/backups, download, restore endpoints
- Autobackup via cron scheduler with configurable interval
- Retention: prune on startup, after each backup (manual and auto)
- Orphan cleanup: removes backup files without metadata on startup
- Restore: replaces DB and triggers graceful server shutdown
- Settings UI: /settings/backup with toggle, interval, retention config
- Backup list with download, delete, restore actions
- i18n: English and Russian translations
2026-04-02 15:32:15 +03:00
alexei.dolgolyov 6bb4781158 fix: address final review findings
- CRITICAL: Add binaries and .svelte-kit/ to .gitignore, remove from tracking
- HIGH: Return error from computeExpectedFQDNs to prevent mass DNS
  deletion on transient DB errors during sync
- MEDIUM: Log error in rollback DNS cleanup when GetSettings fails
2026-04-02 15:04:53 +03:00
alexei.dolgolyov 670948f113 fix: address code review findings for DNS management
- CRITICAL: Change DNS zones endpoint from GET to POST to avoid
  leaking API token in URL query parameters
- HIGH: Add sync.RWMutex to protect dnsProvider field in Server,
  Deployer, and proxy Manager against concurrent read/write races
- HIGH: Capture old DNS provider reference synchronously before
  launching background cleanup goroutine
- HIGH: Use getDNS()/getDNSProviderLocked() accessors instead of
  direct field reads in all DNS operations
2026-04-02 14:54:15 +03:00
alexei.dolgolyov c730cfaa45 feat: Cloudflare DNS management with automatic record sync
Add flexible DNS management to Docker Watcher. By default, wildcard DNS
is assumed (current behavior). When disabled, users can configure a
Cloudflare DNS provider with API token and zone selection. DNS A records
are automatically created/updated/deleted in sync with proxy consumers
(deployed instances and standalone proxies).

- Settings: wildcard_dns toggle, dns_provider, cloudflare credentials
- Cloudflare client: Provider interface with EnsureRecord/DeleteRecord/ListRecords
- DNS lifecycle hooks in deployer and proxy manager (best-effort)
- Settings UI: DNS config section with provider picker, zone selector, test button
- DNS Records page at /dns with filtering, sync status, reconciliation
- Records visible in both wildcard and managed modes
- Cleanup on provider change: removes old records when switching modes
2026-04-02 14:49:21 +03:00
alexei.dolgolyov 582e7e39e3 feat(volume-browser): absolute scope with allowlist security
- Add 'absolute' volume scope for direct host paths (NFS, external mounts)
- Allowlist in settings: allowed_volume_paths (JSON array of prefixes)
- Validation: absolute source must be under an allowed prefix
- Empty allowlist = absolute scope disabled entirely
- Settings API exposes/validates allowed_volume_paths
- Frontend type updated with absolute scope
2026-04-01 23:31:27 +03:00
alexei.dolgolyov 0491849f0f fix(volume-browser): address security review findings
Critical fixes:
- IDOR: verify volume belongs to project before resolving path
- Upload: override global 1MB body limit for upload endpoint (100MB)

High-priority fixes:
- Symlink escape: use filepath.EvalSymlinks in safePath validation
- Remove host filesystem path from browse API response
- Sanitize Content-Disposition filenames, force application/octet-stream
- Strip directory components from upload filenames
2026-04-01 23:17:35 +03:00
alexei.dolgolyov 4a0f223d61 feat(volume-browser): phase 1 - path resolver & file system API
- Extract volume path resolution into shared internal/volume/resolver.go
- File browser operations: ListDir, OpenFile, WriteZip, SaveFile
- Strict path traversal protection (double-validated)
- API endpoints: browse, download (file or zip), upload (multipart)
- Refactor deployer to use shared resolver
2026-04-01 22:59:02 +03:00
alexei.dolgolyov bb2729ad12 fix: address volume scopes review findings
- CRITICAL: validate volume Name against path traversal (safe regex)
- HIGH: log data migration errors instead of silently ignoring
- HIGH: reject empty source when switching from ephemeral scope
2026-03-31 23:31:27 +03:00
alexei.dolgolyov 8fb959f81f feat: volume scopes redesign — replace shared/isolated with 6 scopes
Replace confusing shared/isolated volume modes with explicit scopes:
- instance: per-deploy isolated directory
- stage: shared within a stage across deploys
- project: shared across all stages
- project_named: named group within a project
- named: global named volume across projects
- ephemeral: tmpfs in-memory mount

Includes schema migration (shared→project, isolated→instance),
backward-compatible deployer resolution, scope metadata API endpoint,
and redesigned volume editor UI with scope guide cards and hints.
2026-03-31 23:22:43 +03:00
alexei.dolgolyov 1a8dfefa77 feat: Docker diagnostic hints on disconnection
- Classify Docker errors into categories (socket_not_found, connection_refused,
  permission_denied, timeout, tls_error) with platform-specific hints
- Enrich GET /api/health with structured diagnostics (category, hints, platform)
- Expandable hints panel in sidebar when Docker is disconnected
- "Retry now" button for immediate re-check
- Collapsible raw error details for advanced users
2026-03-30 14:05:00 +03:00
alexei.dolgolyov 37cfa090ac feat: global Docker health indicator and graceful degradation
- GET /api/health endpoint returning Docker connectivity status
- Sidebar shows Docker connection dot (green=connected, red=disconnected)
- Stale scanner returns store-only results when Docker is unavailable
- Polls health every 30s
2026-03-30 13:43:33 +03:00
alexei.dolgolyov 71aeb615b3 fix(observability): router conflict, logout button, missing i18n
- Fix chi duplicate Route() panic by consolidating read/write routes
  into single Route blocks with nested admin Group
- Add logout button to sidebar with token cleanup
- Add missing settingsAuth.password i18n key
2026-03-30 12:26:22 +03:00
alexei.dolgolyov e0a648fb0c fix(observability): address final review findings
Critical fixes:
- Fix StaleContainer frontend type to match nested backend response shape
- Guard ContainerID[:12] slice against empty/short IDs in ListAllProxies

High-priority fixes:
- Support comma-separated severity/source in event log filtering (IN clause)
- Eliminate N+1 queries in ListAllProxies and FindStaleInstances (pre-load maps)
- Stop leaking internal error messages to API clients (use slog + generic msgs)
2026-03-30 11:47:16 +03:00
alexei.dolgolyov 7c57c740b4 feat(observability): phase 8 - container stats, notifications & dashboard
Add container monitoring and notification system:
- Docker Stats API: real-time CPU/memory for running containers
- Webhook notifications for errors (deploy failures, stale, proxy unhealthy)
- Event log auto-pruning (daily, 30-day retention)
- ContainerStats component with auto-polling progress bars
- SystemHealthCard dashboard widget with running/proxy/error counts
- Full EN/RU i18n for stats and system health
2026-03-30 11:37:25 +03:00
alexei.dolgolyov 7a85441b81 feat(observability): phase 3 - direct proxy creation with validation
Add standalone proxy management:
- Multi-step validation pipeline (DNS, TCP, HTTP) with diagnostic hints
- Proxy lifecycle: create/update/delete via NPM API with SSL auto-assign
- Periodic health monitoring (5min) with event log on status transitions
- Unified /api/proxies/all endpoint merging standalone + managed proxies
- Frontend types and API functions for downstream UI phases
2026-03-30 11:19:55 +03:00
alexei.dolgolyov aefecdffdf feat(observability): phase 2 - stale container detection
Add periodic scanner for stale containers:
- Cron-based scanner (hourly) detects non-running containers exceeding threshold
- last_alive_at tracking on instances, updated on deploy/start/restart
- API: GET /api/containers/stale, POST cleanup (single + bulk)
- Event log warnings emitted for newly stale containers
- Graceful handling of externally removed containers
2026-03-30 11:12:25 +03:00
alexei.dolgolyov c38b7d4c78 feat(observability): phase 1 - schema, models & event log backend
Add database foundation for observability features:
- event_log table with severity/source filtering and pagination
- standalone_proxies table for user-created reverse proxies
- stale_threshold_days setting (default 7 days)
- Auto-persist warn/error events from event bus to database
- SSE broadcast of persistent events for real-time UI updates
- Frontend types and API functions for downstream UI phases
2026-03-30 10:59:13 +03:00
alexei.dolgolyov f71c314262 feat: auto-reapply SSL cert to all managed proxies on change
When the SSL certificate is changed in settings, automatically
updates all existing NPM proxy hosts managed by Docker Watcher
in the background. Clears SSL if cert is removed.
2026-03-29 13:11:21 +03:00
alexei.dolgolyov 9f284932a1 feat: SSL wildcard certificate picker from NPM
- NPM client: ListCertificates endpoint
- API: GET /api/settings/npm-certificates (wildcard-only filter)
- Settings UI: EntityPicker for selecting wildcard certs
- Deployer: applies certificate_id + ssl_forced to proxy hosts
- Uses HTTPS subdomain URLs when SSL cert is configured
2026-03-29 13:07:58 +03:00
alexei.dolgolyov be6ad15efc fix: comprehensive security, performance, and quality hardening
Security: apply AdminOnly middleware to mutating routes, require
ENCRYPTION_KEY and ADMIN_PASSWORD (no insecure defaults), restrict
CORS to same-origin, fix OIDC token delivery via cookie instead of
URL query param, add rate limiting on login, add MaxBytesReader,
validate volume paths against traversal, add security headers,
validate user roles, add Secure flag to OIDC cookie.

Performance: set SQLite MaxOpenConns(1) to prevent SQLITE_BUSY,
add FK indexes on 8 columns, track notifier goroutines with
WaitGroup for graceful shutdown, use GetRegistryByName instead of
GetAllRegistries in deployer, pass basePath param to avoid redundant
settings query, return empty slices from store to remove reflection.

Quality: refactor TriggerDeploy to delegate to runDeploy (~100 lines
removed), consolidate duplicated utilities (extractPort, boolToInt,
now, isTerminalStatus) into shared exports, migrate all log.Printf
to slog structured logging, use consistent webhook response envelope,
remove dead code (parseEnvVars, duplicate auth types).

UX: clean up NPM proxy on instance removal via API, add README with
quickstart guide, add .env.example, require ADMIN_PASSWORD in
docker-compose, document staging-net prerequisite.
2026-03-29 12:49:24 +03:00
alexei.dolgolyov a47f703910 fix: nil slices return [] not null in all API responses 2026-03-28 14:33:36 +03:00
alexei.dolgolyov 4ba3673b96 feat: support multiple owners per registry (comma-separated) 2026-03-28 14:15:42 +03:00
alexei.dolgolyov 37e251da85 feat: auto-discover container images from registries
- Add ListImages() to registry interface, implement for Gitea
- Add owner field to registry config (needed for Gitea packages API)
- GET /api/registries/:id/images endpoint
- "Browse Images" button on Projects and Quick Deploy pages
- Image dropdown with registry grouping and search
- i18n support (EN/RU) for all new UI strings
2026-03-28 14:04:11 +03:00
alexei.dolgolyov 77251c540b fix: align webhook regenerate route with frontend path 2026-03-28 13:58:53 +03:00
alexei.dolgolyov 52eec11d16 fix: registry test endpoint accepts empty body for connectivity check 2026-03-28 13:54:05 +03:00
alexei.dolgolyov 28abad27c6 fix: SSE flusher support, null-safe API responses
- Add Flush() to statusRecorder so SSE works through logging middleware
- Return empty array instead of null for empty project lists
- Fixes 500 on /api/events and null.length crash on dashboard
2026-03-28 13:48:20 +03:00
alexei.dolgolyov 4d4e07eb2e fix: serve SvelteKit _app assets correctly
- Use all: prefix in go:embed to include _app/ directory (Go skips
  _-prefixed dirs by default)
- Move jsonContentType middleware to /api route group only
- Use http.ServeContent for proper MIME type detection
- Remove unused fileServer variable
2026-03-28 13:34:02 +03:00
alexei.dolgolyov 1f81ca9eb0 fix(docker-watcher): address final review findings
Security:
- Move config export behind auth middleware
- Validate OIDC callback token before storing in localStorage
- Use constant-time comparison for webhook secret
- Encrypt OIDC client secret at rest (like registry tokens)

Performance:
- Make TriggerDeploy async from HTTP handlers (return deploy ID
  immediately, run pipeline in background goroutine)

Robustness:
- Wrap api.ts res.json() in try/catch for non-JSON responses

i18n:
- Replace ~20 hardcoded English validation messages with $t() calls
- Localize ConfirmDialog cancel button, InstanceCard confirm titles,
  ProjectCard instance/instances pluralization
- Add validation keys to both en.json and ru.json
2026-03-28 00:14:53 +03:00
alexei.dolgolyov d4659146fc feat(docker-watcher): phase 13 - volumes & environment
Per-stage env var overrides with encryption for secrets.
Volume mounts with shared/isolated modes (isolated appends
/{stage}-{tag}/ to source path). Store CRUD, API endpoints,
and frontend editors for both. Env merge during deploy.
2026-03-27 23:28:59 +03:00
alexei.dolgolyov 32de5b26a8 feat(docker-watcher): phase 12 - hardening
Blue-green zero-downtime deploys, promote flow validation.
Dual auth: local (bcrypt + JWT) and OAuth2/OIDC (any provider).
Auth middleware, login page, auth settings UI.
Structured logging (slog JSON), config export to YAML.
Graceful shutdown with deploy draining.
Multi-stage Dockerfile and production docker-compose.yml.
Swap phase order: Volumes & Env before UI Polish.
2026-03-27 23:20:56 +03:00
alexei.dolgolyov 5558396bb7 feat(docker-watcher): phase 11 - frontend embed & SSE
Embed SvelteKit static build in Go binary via go:embed. Event bus
for pub/sub with deploy log, instance status, and deploy status events.
SSE endpoints for real-time streaming. Frontend SSE client with
exponential backoff reconnection. Makefile for build pipeline.
Update Phase 12 auth plan with OAuth2/OIDC support.
2026-03-27 22:30:25 +03:00
alexei.dolgolyov 757c72eea1 fix(docker-watcher): phase 8 security fixes
Remove webhook secret from logs and API response.
Add auth-pending note to router. Fix decrypt fallback that
would use ciphertext as auth token on decrypt failure.
2026-03-27 22:10:00 +03:00