tiny-forge

Author	SHA1	Message	Date
alexei.dolgolyov	1c47030854	feat(volsnap): volume snapshot restore (backlog #6) Restore a captured volume snapshot onto an image workload's live host-bind data volumes, then redeploy; this is the most destructive workload action, built to the adversarially-reviewed design (C1–C6) with all data-loss guards. - Engine.Restore (engine-owned): all-or-nothing pre-flight re-resolution from the workload's CURRENT config (never the tamperable manifest), per-filesystem disk pre-check, per-workload lock, container quiesce, extract-to-tmp, durable pre-restore snapshot, write-ahead journal, atomic rename swap, redeploy, and crash-recovery sweep (RecoverInterruptedRestores) wired before serving. - internal/keyedmutex: shared per-key lock; deployer now serializes every deploy entrypoint per workload via DispatchPlugin (+ LockWorkload/RedeployLocked for the restore re-dispatch, no deadlock). - Untrusted-archive extractor: zip-slip containment, type allow-list (reg/dir only), decompression-bomb cap, manifest-index bounds. - POST /api/workloads/{id}/snapshots/{sid}/restore: admin, X-Confirm-Restore header (CSRF), per-workload single-flight (409). - WebUI: Restore button + danger ConfirmDialog + busy state + i18n (en/ru). Scope: image-source only; scopes absolute/stage/project (driven off the same supportedScopes constant capture uses). Plan-reviewed before coding; per-phase go/security/ts reviews; final review READY TO MERGE. Security review caught + fixed a CRITICAL manifest-Source path traversal (re-derive target from current config + base containment). Plan: plans/volume-snapshot-restore/	2026-06-22 17:23:52 +03:00
alexei.dolgolyov	6b45ed62bb	feat(snapshots): capture app data-volume snapshots Add per-workload capture of host-bind data volumes as downloadable tar.gz archives: a new internal/volsnap engine (enumerate host-bind volumes via the computeMounts merge, archive with archive/tar+gzip skipping symlinks/special files, per-workload retention + startup orphan cleanup), a volume_snapshots table + store CRUD, admin-gated API (list/snapshotable/create/download/delete), and a Snapshots panel on /apps/[id] that shows coverage and which volumes are skipped (and why). Scope: image-source apps, host-bind scopes (absolute/stage/project); Docker named volumes, tmpfs, and instance scope are surfaced as not-yet-supported. Restore is a separate later phase. Download/FilePath are containment-checked; create returns a typed no-data error (400) vs generic 500. Covered by archiver unit tests + full API e2e.	2026-06-02 14:56:10 +03:00
alexei.dolgolyov	00503b4c0a	feat(cli): add tinyforge terminal client New zero-dependency Go CLI (cmd/cli) that drives the existing HTTP API: login/logout, apps list, deploy (synchronous, --timeout), logs (one-shot + -f SSE follow), and status. Caches a 24h JWT in ~/.tinyforge/config.json (0600, Chmod-enforced on overwrite); Bearer-header auth keeps the token out of server/proxy logs; no-echo password prompt (kernel32 on Windows, stty elsewhere). Server/token resolved via flags, TINYFORGE_URL/TINYFORGE_TOKEN env, or config. README CLI section + root-anchored .gitignore entries for the build output.	2026-06-02 13:34:42 +03:00
alexei.dolgolyov	cdb9fd57d1	feat(alerts): metric-threshold alerting (backend + API) Operators can define metric-threshold alert rules (cpu_percent, memory_percent, memory_bytes; gt/lt) per-workload or global via /api/metric-alert-rules. A periodic evaluator (internal/metricalert, 30s tick) checks the freshest container stats sample per container against enabled rules and, on breach (per-rule-per-workload cooldown), emits into the existing event_log + bus pipeline (source "metric_alert", workload_id set). Alerts therefore surface on the global events page, the per-app activity timeline, and any configured event-trigger webhook -- no new notification plumbing. Mirrors the log_scan_rules store/API/route patterns and the stats.Collector lifecycle. Rule CRUD reads are authed, mutations AdminOnly. Frontend rule-config UI is a follow-up phase. Reviewed: go APPROVE (0 CRITICAL/HIGH).	2026-05-29 14:06:23 +03:00
alexei.dolgolyov	410a131cec	feat(apps): stepped creation wizard, branch previews, and app-creation fixes This session (frontend focus): - Rebuild /apps/new as a 4-step wizard (Basics → Configure → Trigger → Review): WizardRail, SourceKindPicker card grid, AppManifest review, per-step validation, ConfirmDialog-based unsaved-changes guard. - Extract lib/workload/sourceForms.ts (single source of truth for source_config) + {Image,Compose,Static,Dockerfile}SourceForm + StaticDiscoveryWizard; fold the /apps/[id] edit form onto the same components (removes the duplication). Add vitest + sourceForms unit tests. - Branch preview environments UI: /chain is_preview/preview_branch + a Preview environments panel on /apps/[id] (per-branch URLs, ConfirmDialog teardown, armed state); RegistryImagePicker on the registry trigger and the image source. - Fixes: image-inspect 404 -> admin-gated POST /api/discovery/image/inspect; conflict-panel blur flicker; friendly localized discovery errors; CPU/Memory label hints; dashboard + /apps "Total workloads" count only source_kind workloads (drop stale trigger_kind gate); NPM cert/access-list name cache; EntityPicker empty-list guard. - Update CLAUDE.md frontend conventions + add a Build & Test section. Also captures pre-existing in-progress platform work (not from this session): workload notifications, Prometheus metrics export, store lockfile, health probes, backup hardening, and related store/webhook/scheduler changes.	2026-05-29 02:09:54 +03:00
alexei.dolgolyov	5e78f13e06	refactor(triggers): review follow-ups (fire-now, dedupe trigger pages, hardening) Follow-ups on commit `39e1e36` addressing review feedback from go-reviewer / security-reviewer / typescript-reviewer. Backend: - New POST /api/triggers/{id}/fire (AdminOnly, schedule-only): operator "Fire now" button that dispatches immediately without waiting for the next natural interval. Persists last_fired_at BEFORE dispatch, same ordering as the scheduler. Per-trigger in-flight guard (429 if a fire is already running) to defend against rapid double-clicks / runaway scripts. Refuses the request when AdminOnly claims are absent rather than logging an unattributable deploy. - SetTriggerLastFired now validates that the timestamp parses as RFC3339 before writing. Rejects the empty string explicitly; empty-clears semantics were dead (no caller) and would silently re-fire on the next tick if ever accidentally written. A future reset-cadence flow must add a dedicated ClearTriggerLastFired so the call site is grep-able and separately auditable. - Scheduler logs WARN on catch-up fires (now - lastFired > 2× interval) so the "surprise burst at restart" pattern shows up in audit logs. - BindingResult reason strings extracted to package consts (webhook.Reason*) so the scheduler and api fire-now classifications stay in sync without string-matching drift. - SECURITY NOTE on FanOutForTrigger documents that the WebhookRequireSignature gate is ingress-only by design. Frontend: - Refactored /triggers/new (770 LOC → 155 LOC) and /triggers/[id] (~350 LOC dropped) to use the shared TriggerKindForm. Eliminates the triplicated per-kind state + buildConfig + canSubmit + template that caused the d-unit regex drift in the prior commit. - New seedTriggerKindFormState helper on TriggerKindForm primes the form from a server-returned trigger config with defensive type guards; resets per-kind slots first so re-seeding across kinds doesn't inherit stale state.
- /triggers/[id] gains a Schedule status panel with Last Fired + Fire Now button (gated on binding_count > 0). Confirmation dialog, result flash, timer cleanup on unmount + new-fire (no stale-closure race). EN+RU i18n parity.	2026-05-16 12:16:47 +03:00
alexei.dolgolyov	39e1e36510	feat(triggers): add schedule trigger kind + internal scheduler Fourth trigger kind alongside registry/git/manual. Recurring time-interval fires driven by a new internal/scheduler tick loop (default 30s, clamped to 5m). Goes through the same webhook.Handler.FanOutForTrigger seam as inbound HTTP webhooks, so per-binding concurrency, outcome accounting, and config-merge semantics are identical. Schema: triggers.last_fired_at TEXT column (additive ALTER for existing DBs). Scheduler persists last_fired_at BEFORE dispatch so a panicking Match cannot wedge a tight loop; failed deploys wait one full interval before retry, which is the correct trade-off for a periodic refresh trigger. Frontend: TriggerKindForm + /triggers/new + /triggers/[id] gain the schedule kind (4-col card grid, preset chips Hourly/Daily/Weekly, custom interval input matched to Go time.ParseDuration syntax, optional pinned reference). /triggers/[id] surfaces "last fired" on schedule rows. EN+RU i18n in parity. Review fixes from go-reviewer / security-reviewer / typescript-reviewer: - Scheduler Start/Stop wrapped in sync.Once (no goroutine leak / double-cancel panic on shutdown re-entry). - shouldFire rejects sub-MinInterval as defense-in-depth against hand-inserted rows that bypassed Validate. - fire() asserts trigger Kind=="schedule" before dispatching. - Aligned isValidInterval regex across all three frontend sites; reject the unsupported "d" unit (Go time.ParseDuration doesn't accept it). - formatLastFired falls back to lastFiredNever on malformed timestamps rather than leaking raw bytes into the UI. - main.go scheduler closure logs per-fire deployed/errored counts.	2026-05-16 11:24:05 +03:00
alexei.dolgolyov	739b67856a	feat(cutover): hard legacy cutover (drop projects/stacks/sites/deploys) The clean-break delete that closes the workload-first refactor arc. Net diff: ~30 backend files deleted, ~20 modified, ~12k LOC removed on the Go side; entire /projects /stacks /sites /deploy frontend trees gone; ~6.7k LOC removed on the Svelte/TypeScript side. Backend - API handlers gone: internal/api/{projects,stages,stage_env,stacks,static_sites,deploys,instances,volume_browser}.go - Store CRUD + tests gone: internal/store/{projects,stages,stage_env,stacks,static_sites,static_site_secrets,deploys,poll_state,volumes,workload_sync}.go (+ _test.go siblings) - Legacy deployer pipeline gone: internal/deployer/{bluegreen,promote,rollback,subdomain,resolver_test}.go; deployer.go trimmed to just the dispatch surface used by the plugin pipeline - internal/staticsite/{manager,healthcheck}.go and internal/stack/manager.go gone (the rest of those packages stay as helpers imported by the static + compose plugins) - internal/registry/poller.go gone (legacy registry poller) - internal/volume.ResolvePath gone; ResolveWorkloadPath stays - internal/webhook: handleWebhook (project) + handleSiteWebhook (site) gone; only POST /api/webhook/triggers/{secret} remains - workload-side webhook URL handlers (getWorkloadWebhook + regenerateWorkloadWebhook + EnsureWorkloadWebhookSecret + SetWorkloadWebhookSecret + GetWorkloadByWebhookSecret) gone, since they minted URLs that would 404 against the new trigger-only ingress - cmd/server/main.go: dropped staticsite.Manager, stack.Manager, staticsite.HealthChecker, registry poller, SetSiteSyncTriggerer, SetStaticSiteManager, SetStackManager, wireStaticBackend - store/store.go: idempotent DROP TABLE IF EXISTS for every legacy table (projects, stages, stage_env, volumes, deploys, deploy_logs, poll_states, stacks, stack_revisions, stack_deploys, static_sites, static_site_secrets); FK order
children-then-parents - store/models.go: dropped Project, Stage, Deploy, DeployLog, StageEnv, Volume, StaticSite, StaticSiteSecret, Stack, StackRevision, StackDeploy types; kept WorkloadKind constants as documented strings - internal/store/helpers.go (new): BoolToInt, rowScanner, GenerateWebhookSecret extracted from deleted CRUD files - internal/api/secrets.go (new): forwards to store.GenerateWebhookSecret so api + store paths share one secret-generation impl (no panic-vs-UUID-fallback divergence) - internal/reconciler/reconciler.go: dropped legacy stack-by-compose + static-site label paths; only canonical tinyforge.workload.id dispatch remains - providers (gitea_content/github_provider/gitlab_provider) gained path-traversal rejection on every tree entry - internal/webhook ParsedImage / ParseImageRef demoted to package-private (no external callers) Frontend - /projects /stacks /sites /deploy routes deleted (entire trees) - ProjectCard / InstanceCard / StaleContainerCard components deleted - api.ts: dropped every project/stage/stack/site/deploy/instance helper + types (Project, Stage, Stack, StaticSite, Deploy, Instance, Volume, etc.); kept Workload, Container, App, Settings, Registry, EventTrigger, LogScanRule, webhook envelopes - WorkloadWebhook type + getWorkloadWebhook/regenerateWorkloadWebhook api functions gone (mirror of the backend deletion above) - web/src/routes/+layout.svelte: dropped /projects /sites /stacks /deploy nav entries, trimmed quick-nav keymap - web/src/routes/+page.svelte: dashboard rewrite; reads listWorkloads + listContainers only; 4-card stat grid (workloads/running/failed/stale) + recent workloads strip - navCounts.ts, SystemHealthCard.svelte, ContainerLogs.svelte, ContainerStats.svelte, StatusBadge.svelte, TagCombobox.svelte, proxies/+page.svelte, containers/+page.svelte all rewired to the workload-first surface - AbortController plumbing on dashboard, nav-counts, stale page, SystemHealthCard so navigation doesn't leave dangling fetches
- i18n: dropped projects., projectDetail., envEditor., volumeEditor., volumeBrowser., quickDeploy., sites., stacks., instance., confirm. namespaces; en/ru parity preserved (1042 keys each) Hardening from go-reviewer + security-reviewer + typescript-reviewer subagent passes (0 CRITICAL across all three; 1 HIGH + ~12 MEDIUM addressed inline before commit): - Sec H1: dead-end workload webhook URL handlers (would mint URLs that 404 the new trigger-only ingress) deleted across backend + frontend - Go M1: IsTerminalDeployStatus dropped (no production callers) - Go M2: ParsedImage/ParseImageRef lowercased (in-package only) - Go M6: generateWebhookSecret unified — api shim forwards to store.GenerateWebhookSecret - Doc/comment freshness: stage_id (no longer FK), ProxyRoute legacy field names, workloadIDRow rationale, webhook_deliveries.target_type enum, WebhookDeliveryLog component header Doc - WORKLOAD_REFACTOR_TODO: cutover marked DONE; all three Priority 1 items are now shipped. Next focus is Priority 3 polish (apps.* i18n + codemap entries) and Priority 4 tests. Behavioral notes for operators upgrading from a pre-cutover build - Existing rows in the dropped tables disappear on first boot. - Legacy webhook URLs at /api/webhook/{secret} and /api/webhook/sites/{secret} return 404; CI configs must repoint to /api/webhook/triggers/{secret} (the trigger-split boot backfill lifted any embedded workload secret onto a Trigger row, so the secret value itself carries over). - Frontend routes /projects /stacks /sites /deploy are gone; nav links replaced with /apps and /triggers.	2026-05-16 06:00:21 +03:00
alexei.dolgolyov	234c3c711e	feat(static): inline static-source plugin; drop phantom-row adapter Lift the static-site deploy pipeline from internal/staticsite/manager.go into internal/workload/plugin/source/static/ so plugin-native static workloads operate directly on plugin.Workload + the containers table + workload_env. The cmd/server/static_backend.go phantom-row adapter is gone; the legacy static_sites table is no longer touched on plugin deploys. Backend - new state.go: runtimeState (last_commit_sha, last_sync_at, last_error, status) persisted in containers.extra_json under the deterministic row id <workloadID>:site - per-workload sync.Mutex serializes saveState read-modify-write so parallel deploys for the same workload can't race container_id / proxy_route_id writes - extra_json round-trips through map[string]json.RawMessage so unknown keys survive; typed runtimeStateKeys are stripped before merge so clearing a typed field actually drops the key - new env.go reads workload_env (replaces static_site_secrets for plugin-native sites); a decrypt failure is logged and skips only that entry rather than failing the whole deploy - new build.go ports prepareDenoBuild + prepareStaticBuild + copyDir; copyDir uses filepath.WalkDir + Lstat to refuse symlinks and non-regular files - new deploy.go is the ~300-line core; intent.Reason gates force vs skip-if-no-changes; success-path saveState failure rolls back container + proxy route and writes "failed" state (no orphans) - new teardown.go combines Remove + Stop; idempotent on never-deployed workloads - new reconcile.go refreshes container state from Docker; flips runtimeState.Status to failed when the container is missing/crashed Hardening (from go-reviewer + security-reviewer subagent passes; 1 CRITICAL + 5 HIGH + 3 MEDIUM addressed before merge) - path-traversal defense in all 3 providers (gitea_content, github_provider, gitlab_provider): reject tree entries whose resolved local path
escapes destDir - verifyDownloadInsideRoot walks the build dir post-download as a second line of defense - sanitizeError redacts the access token, collapses to one line, and clamps to 240 bytes before persisting to extra_json or fanning out to the notification webhook - container/image/volume names suffixed with workload-id short prefix (workload name is not UNIQUE in schema) - primaryDomain reads settings.Domain to complete a bare subdomain face into a full FQDN (matches legacy Manager behavior) - ctx-aware health-check sleep - json.Marshal for event metadata (was fmt.Sprintf JSON template) - strings.HasPrefix for failed-status detection (was brittle slice expression) Wire-up - cmd/server/main.go: removed wireStaticBackend(...) call; existing blank import on _ ".../source/static" drives init() registration - cmd/server/static_backend.go deleted Doc - WORKLOAD_REFACTOR_TODO: static port marked DONE; next focus is the hard legacy cutover (drop /api/projects, /api/stacks, /api/sites, /api/stages + their tables, internal/stack + internal/staticsite packages, frontend /projects /stacks /sites) Behavior notes for operators - plugin-native static workloads no longer write to static_sites; legacy /api/sites/* still serves original rows unchanged - legacy tinyforge.static-site / .static-site-name container labels dropped on plugin deploys; canonical tinyforge.workload.id / .kind cover ownership - container/image/volume names gained an 8-char ID suffix (e.g. dw-site-mysite-a1b2c3d4); legacy-deployed sites keep the old shape until redeployed through the plugin path	2026-05-16 02:56:23 +03:00
alexei.dolgolyov	7a9ff7ad54	feat(observability): event triggers + log scanner backend Two paired backends sharing the events.Bus seam: Event triggers (consumer-side): - internal/store/event_triggers.go — CRUD with action_secret redaction on read (placeholder echo treated as "no change" on PATCH so secrets aren't accidentally wiped). - internal/events/dispatcher.go — bus subscriber, AND-composed filters (severity CSV, source CSV, message regex with memoized compile cache). Structural loop-prevention: never writes to event_log. Sends via notifier.SendPayload. - internal/notify: SendPayload + SendSyncForTestPayload methods, TierEventTrigger constant, doSendRaw shared with the legacy Event-shaped path. - internal/api/event_triggers.go — admin-gated CRUD + /test sending the real TriggerWebhookPayload shape. SSRF guard rejects loopback / link-local / unspecified targets. PATCH uses pointer-typed DTO for partial updates. Log scanner (producer-side): - internal/logscanner/ — engine (per-rule cooldown + per-container token bucket, atomic drop counters), tail (multiplexed docker frame demuxer with TTY fallback + 16 MiB payload cap + 1 MiB reassembly cap + RFC3339Nano-validated timestamp strip + UTF-8-safe message truncation), manager (5s container polling, atomic.Pointer[Snapshot] hot-reload, HitEmitter writes event_log + publishes EventLog so the trigger dispatcher picks them up immediately). - internal/docker/container.go — ContainerLogsOpts exposes stream selection for stderr-only / stdout-only rules. - internal/store: log_scan_rules table + CRUD with EffectiveLogScanRules resolver (globals minus per-workload overrides plus workload-only additions). Transactional cascade-delete of overrides when a global rule is removed. - internal/api/log_scan_rules.go — admin-gated CRUD + /test (sample_line → matched/captures) + /stats (drop counters + active tail count + last-snapshot compile errors) + GET /api/workloads/{id}/effective-rules. 
cmd/server/main.go wires both subsystems next to the existing RegisterPersistentLogger. Coverage spans engine cooldown / bucket counter tests, snapshot effective-set semantics, manager compile- error capture, dispatcher matching, store validation + cascade-delete, API URL validator + secret redaction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 22:18:11 +03:00
alexei.dolgolyov	8d6a527a2b	refactor(workload): plugin architecture wave + apps UI + volume scopes Completes the workload-first refactor's plugin layer: - internal/workload/plugin/ — Source/Trigger plugin contract, registry, types (Workload, DeploymentIntent, InboundEvent, PublicFace). Self-registering init() pattern + blank-import in cmd/server/main.go. - Source plugins: image (blue-green with multi-face proxy routing), compose, static. Trigger plugins: registry, git, manual. - internal/deployer/dispatch.go — DispatchPlugin/Teardown/Reconcile seam routing the legacy deployer through plugins. - internal/api/workload_*.go — REST surface: workloads, env, volumes, chain (parent/children), promote-from. hooks.go serves /api/hooks/kinds/{kind}/schema for the wizard. - internal/store: workload_env (encrypt-at-rest secrets) and workload_volumes tables, keyed on workload_id. - cmd/server/static_backend.go — phantom-row adapter delegating the static source plugin to the legacy staticsite.Manager (deleted at hard cutover once the static inline port lands). - web/src/routes/apps/ — /apps list + /apps/new wizard + /apps/[id] detail with kind-aware compose / image / static forms (Advanced JSON toggle), env panel, volumes panel, webhook panel, chain panel, manual deploy. Volume scope generalization (v2 resolver): - internal/volume.ResolveWorkloadPath (workload-keyed, sits next to legacy ResolvePath). Honors all VolumeScope values: absolute, ephemeral, instance, stage, project, project_named, named. internal/workload/plugin/source/image/image.go computeMounts wires settings + imageTag through. Coverage in internal/volume/resolver_test.go (portable Linux/Windows via t.TempDir). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 22:17:41 +03:00
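The "self-registering init() pattern + blank-import" works like this: each plugin package calls a shared `Register` from its `init()`, and the server blank-imports those packages so the `init` functions run. The single-file sketch below shows the shape. `Source`, `Register`, `Lookup` and `Kinds` are hypothetical, heavily reduced stand-ins for the real plugin contract. The `staticSource` type sits in the same file only for compactness.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Source is a hypothetical, reduced stand-in for the plugin contract.
type Source interface {
	Kind() string
}

var (
	registryMu sync.RWMutex
	sources    = map[string]Source{}
)

// Register is called from each plugin package's init(); a duplicate kind
// panics at startup, which surfaces wiring mistakes immediately.
func Register(s Source) {
	registryMu.Lock()
	defer registryMu.Unlock()
	if _, dup := sources[s.Kind()]; dup {
		panic(fmt.Sprintf("source plugin %q registered twice", s.Kind()))
	}
	sources[s.Kind()] = s
}

// Lookup returns the plugin registered for kind, if any.
func Lookup(kind string) (Source, bool) {
	registryMu.RLock()
	defer registryMu.RUnlock()
	s, ok := sources[kind]
	return s, ok
}

// Kinds lists registered kinds in sorted order (e.g. for a wizard).
func Kinds() []string {
	registryMu.RLock()
	defer registryMu.RUnlock()
	kinds := make([]string, 0, len(sources))
	for k := range sources {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds)
	return kinds
}

// In the real layout this type lives in its own package, and the server's
// blank import (_ ".../source/static") is what makes its init() run.
type staticSource struct{}

func (staticSource) Kind() string { return "static" }

func init() { Register(staticSource{}) }
```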
alexei.dolgolyov	cba2149aa9	refactor(workload): finalize containers index + post-review hardening Wraps up the workload refactor with the fixes that came out of the multi-agent code review (see docs/plans/workload-refactor.md "What actually shipped"). Backend: - store.ReconcileContainer: separate write path so the 30s reconciler tick no longer overwrites deployer-owned fields (subdomain, proxy_route_id, npm_proxy_id, image_tag). - Container.stage_id column + index; ListProxyRoutes / ListContainersByStageID join via stage_id (survives stage rename), with legacy fallback to (project_id, role=stage_name). - Reconciler: workload-existence check (rejects forged tinyforge.workload.id labels), skips inventing project-kind rows, child-context cancel before wg.Wait() on shutdown. - Transactional CRUD across projects / stacks / static_sites: parent UPDATE and workload sync land in one transaction so secret rotations are durable. - Webhook routing reads exclusively through workloads.webhook_secret; legacy GetProjectByWebhookSecret / GetStaticSiteByWebhookSecret fallback removed. - store.GetStackByComposeProjectName + indexed lookup (no more full-table stack scan per compose container per tick). - store.ListMissingSweepRows: filtered query for the missing-sweep. - /api/instances/* handlers verify (workload_id, role) match URL (project_id, stage_name) before mutating — closes the cross-project hijack the security review flagged. - extra_json no longer referenced from Go (column kept on disk for now). Frontend: - WorkloadContainers.svelte: generic detail-page panel reusable by stack and site detail pages. - Containers page polish: client-side kind/state filters over an unfiltered fetch, URL-synced filters, race-safe loads via sequence number, EN+RU i18n, sidebar counter via navCounts.containers. Misc: - scripts/dev-server.sh: tolerate empty netstat grep result. - .gitignore: ignore docker-watcher binaries, .claude/worktrees/, .facts-sync.json.	2026-05-09 15:44:41 +03:00
alexei.dolgolyov	af82be3fb8	feat(workload): container index reconciler Background worker that keeps the containers table in sync with docker ps. Runs one boot pass and ticks every 30s. Dispatch precedence per container: 1. tinyforge.workload.id label (canonical, new) 2. tinyforge.instance-id label (legacy project — joins via instances) 3. tinyforge.static-site label (legacy site) 4. com.docker.compose.project (stacks — joins via ComposeProjectName) Rows whose Docker container ID is no longer present are flipped to state='missing'. Placeholder rows (empty container_id, e.g. a deploy mid-flight) are left alone so a tick that races a deploy doesn't mark them as missing. DockerLister interface lets tests substitute a fake daemon — 6 unit tests cover the dispatch matrix, missing-sweep, and state normalization. Wired into cmd/server/main.go between docker.New and the existing startup chain. Boot pass populates the containers table from any pre-refactor running containers.	2026-05-09 13:45:13 +03:00
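The four-step dispatch precedence can be expressed as an ordered table in which the first label present wins. The sketch below covers only that classification step. `owner`, `dispatchOrder` and `classify` are hypothetical names. Joins against instances or ComposeProjectName and the missing-sweep are left out.

```go
package main

// owner identifies which code path claims a container.
type owner struct {
	Kind string // "workload", "instance", "static-site", "compose", or ""
	ID   string
}

// dispatchOrder encodes the precedence from the commit message: the first
// label present on the container wins.
var dispatchOrder = []struct {
	label string
	kind  string
}{
	{"tinyforge.workload.id", "workload"},
	{"tinyforge.instance-id", "instance"},
	{"tinyforge.static-site", "static-site"},
	{"com.docker.compose.project", "compose"},
}

// classify returns the owner for a container's labels; an empty Kind means
// the container is not managed and is ignored.
func classify(labels map[string]string) owner {
	for _, d := range dispatchOrder {
		if v := labels[d.label]; v != "" {
			return owner{Kind: d.kind, ID: v}
		}
	}
	return owner{}
}
```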
alexei.dolgolyov	db235c1412	feat(workload): write-through workload sync + boot-time backfill CRUD on Project / Stack / StaticSite now keeps a paired Workload row in sync. Secret setters (webhook secret, signing secret, require-signature toggle, notification secret) all re-sync after mutating the source-of-truth row so the workload row always reflects the canonical state. Delete cascades: DeleteProject/Stack/StaticSite now drop the matching workload row plus any container index entries owned by it, so global views don't show ghost rows. Boot-time BackfillWorkloads scans every project/stack/site and ensures each has a workload row. Idempotent — safe to run on every restart, recovers from a deleted/missing workload row. Behavior unchanged for existing call sites; the workloads table just starts being populated. Deployer / reconciler / consumer switchover land in the next commit.	2026-05-09 13:28:20 +03:00
alexei.dolgolyov	0f60a7a5db	feat(webhook): inbound delivery audit log Persists every inbound webhook hit (project + site) so users can debug "why didn't my deploy fire?" without grepping daemon logs. Surfaces a 14-day rolling history under the WebhookPanel on each project + site detail page; refreshes every 30s while open. Daily cron prunes records older than 14 days alongside the existing event log prune. Schema: - webhook_deliveries(id, target_type, target_id, target_name, received_at, source_ip, signature_state, status_code, outcome, detail, body_size) - indexes on (target_type,target_id,received_at) and (received_at) Backend: - store: WebhookDelivery model + Insert/List/Prune helpers - webhook/handler: deferred recordDelivery() captures the final outcome on every return path including HMAC rejects, image mismatch, no-stage, auto_deploy=false, and successful deploys; signatureStateFor() classifies "unconfigured" vs "missing" vs "invalid" vs "valid" - api: GET /api/{projects,sites}/{id}/webhook/deliveries with parseLimit() helper (default 50, max 200) - main: daily prune cron retains the last 14 days Frontend: - WebhookDeliveryLog.svelte: panel with refresh button, status code + outcome + signature badges, relative time with an on-hover tooltip showing the absolute time, source IP column - Mounted below WebhookPanel on project + site detail pages - en/ru i18n strings for outcome/signature enums and column labels	2026-05-07 02:40:39 +03:00
alexei.dolgolyov	8b886ddf2b	feat(backup): take Tinyforge DB snapshot before every deploy Adds an opt-in "auto_backup_before_deploy" setting that triggers a "pre-deploy" backup at the start of every project deploy via the deploy pipeline (covers both the async HTTP path and the sync poller/webhook path). Failures are logged to the deploy log but do not abort — missing a backup is preferable to refusing to ship a fix. - store: settings.auto_backup_before_deploy column + scan/update wiring - backup: accept "pre-deploy" as a valid backup_type - deployer: small PreDeployBackuper interface, hooked into runDeploy right after settings load and before any state-mutating work - api: settings request/response surface the new flag - web: ToggleSwitch on the backup settings page; "Pre-deploy" badge variant in the backup list (badge-warning so it stands out) - i18n: en/ru strings for the toggle, help text, and badge label	2026-05-07 02:14:26 +03:00
alexei.dolgolyov	0405ecd9ce	feat(notify): HMAC-signed outgoing webhooks with per-tier secrets and test sender Outgoing notifications were bare POSTs with no auth and no way to verify they came from Tinyforge. They also went out from one global URL only, even though stages had a notification_url field, and static-site sync emitted no events at all. Schema: add notification_url + notification_secret (lazy-generated) to settings, projects, stages and static_sites. Migrations are additive. Notifier: SendSigned computes HMAC-SHA256 over the exact body bytes and sends X-Hub-Signature-256 (GitHub-compatible: receivers built for GitHub/Gitea/Forgejo verify out of the box). Aux headers X-Tinyforge-Event/Delivery/Timestamp/Tier are advisory and not signed. Empty secret => unsigned send for back-compat. Resolution: deploys fall through stage > project > settings, sites fall through site > settings. The secret travels with the URL that sourced it, so any tier can sign even when its parents are unsigned. Site sync events now actually emit (site_sync_success / site_sync_failure). API: 12 new endpoints: {GET secret, POST regenerate, POST disable, POST test} for each of the 4 tiers. SendSyncForTest returns status_code/latency_ms/signature_sent/delivery_id/response_snippet so the UI surfaces receiver feedback inline. UI: shared OutgoingWebhookPanel.svelte fits the existing card aesthetic. Signing-state pill, secret reveal-on-demand, regenerate/disable behind ConfirmDialog modals (not inline strips, which are too easy to misclick), send-test result card with colour-coded status. Wired into Settings → Integrations, project edit form, per-stage edit, and per-site detail. EN + RU i18n. Tests: round-trip (sender signs, receiver verifies), tampered-body and wrong-secret rejection, unsigned-send omits header, send-test surfaces 4xx, concurrent fan-out via Drain. Resolver precedence locked for both deploy and site paths.
Docs: docs/webhooks.md with header reference, verifier snippets in Node/Python/Go, and a recipe for the service-to-notification-bridge generic webhook provider.	2026-05-07 02:03:32 +03:00
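X-Hub-Signature-256 follows GitHub's format: the literal prefix "sha256=" followed by the hex-encoded HMAC-SHA256 of the exact request body under the shared secret. A Go sketch of both the sender and receiver sides follows. `sign` and `verify` are hypothetical names, not the notifier's API. The receiver compares MACs with `hmac.Equal`, which runs in constant time.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// sign returns a GitHub-style X-Hub-Signature-256 value for body.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body) // hash.Hash.Write never returns an error
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC over the received bytes and compares it with
// the header value in constant time.
func verify(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}
```

The receiver must hash the raw body bytes as received. If it re-serializes parsed JSON before hashing, whitespace or key-order differences will change the bytes, and verification will fail.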
alexei.dolgolyov	a4362b842d	fix: harden security, fix concurrency bugs, and address review findings Security: - rate limit /api/webhook routes per-IP and cap concurrent site syncs - global SSE connection cap (256) with new sse_gate - validate ?tail= and cap JSON log responses at 4 MiB - strip ANSI/CSI/OSC and control bytes from streamed log lines - redact webhook secret from request log middleware - scrub host details from /api/health for non-admin viewers - drop container_id from /api/system/stats/top for non-admins - generate webhook secrets via crypto/rand; require >=32 chars on insert - verify iid path consistency in streamContainerLogs - LimitReader on site webhook body; reject malformed non-empty bodies Concurrency / correctness: - stats collector: Stop() no longer hangs without Start(), semaphore acquired in parent loop so ctx cancellation short-circuits the queue, in-flight tick cancellable via shared base context, zero-ts guard - webhook handler: replace fire-and-forget goroutine with WaitGroup-tracked workers + Drain() wired into graceful shutdown - $derived(() => ...)
mis-idiom fixed in ContainerStats / InstanceCard / ProjectCard (returned function instead of value) - SystemResourcesCard: rename `window` and `t` locals to avoid shadowing globalThis.window and the i18n `t` import Quality / performance: - replace O(n^2) insertion sort with sort.Slice in stats top - runMigrations only swallows duplicate-column / already-exists errors - PruneStatsSamplesBefore wrapped in a transaction - collapse N+1 in unusedImageStats / pruneImages to one ListAllInstances pass; surface DB errors instead of silently treating them as inactive - run Docker Info + DiskUsage in parallel via errgroup - container log SSE emits `: ping` heartbeat every 20 s - imageMatches case-insensitive on registry host (RFC behaviour) - log warning on invalid stage tag pattern instead of silent skip - reject malformed non-empty site webhook payloads Frontend / i18n: - shared formatBytes utility replaces three local copies - statsInterval store drives dynamic "no samples / collection disabled" copy across ContainerStats and SystemResourcesCard - top consumers row now shows owner_name (project/stage or site name) - drop seven `as any` casts on the Settings type; add cloudflare_api_token write-only field - move "Service status", "Docker daemon", "Docker unreachable", "Proxy unreachable", "reachable", and "Docker daemon is not reachable." strings into en/ru i18n bundles	2026-05-07 00:56:14 +03:00
alexei.dolgolyov	05440a5f92	feat(stats): resource metrics dashboard + sites logs/stats Background collector samples CPU/memory/network/block I/O for every instance and site on a configurable interval (default 15s, range 5-300s), persists samples to SQLite with a configurable retention window (default 2h, range 0-24h), and skips ticks gracefully when the Docker daemon is unreachable. Settings are reloadable without a restart; each tick re-reads them. New API endpoints: - GET /api/system/stats (host snapshot: info + df) - GET /api/system/stats/history - GET /api/system/stats/top?by=cpu|memory - GET /api/projects/{id}/stages/{s}/instances/{iid}/stats/history - GET /api/sites/{id}/stats[/history] - GET /api/sites/{id}/logs (SSE + JSON, reuses instance log streamer) Frontend: - ECharts added with tree-shaken imports (~180KB gzip) for future-proof time-series/gantt/graph visualizations - CollapsibleSection wraps all dashboard sections (system health, daemons, system resources, static sites, projects) with localStorage-persisted open state - SystemResourcesCard shows capacity tiles, workload utilization chart with 30m/2h/6h/24h window picker, disk breakdown with reclaimable callouts, and top 5 consumers - ContainerStats and ContainerLogs take a source discriminated union so sites reuse the same components as instances; sites detail page embeds both for Deno backend debugging - Settings › Maintenance exposes collection interval + retention - Docker-unavailable state returns 503 and renders an amber banner instead of a generic 500 Full i18n coverage (en + ru) for all new strings.	2026-04-24 15:02:43 +03:00
alexei.dolgolyov	0632f512e6	feat(webhook): per-project and per-site webhook URLs Replace the single global webhook secret with entity-scoped secrets stored on each project and static site. Webhook-driven project autocreate is removed; projects must exist before their URL can trigger deploys. Also wires static-site webhooks (sync_trigger=push|tag), turning the previously inert "push" trigger into a functional one: POST the site's webhook URL from a Git provider and Tinyforge re-syncs on matching refs. - Adds webhook_secret columns + unique indexes to projects and static_sites - Per-entity GET/regenerate endpoints under /api/projects/{id}/webhook and /api/sites/{id}/webhook (admin-only) - Removes /api/settings/webhook-url and the global webhook panel - Reusable WebhookPanel Svelte component on both detail pages, i18n in en/ru - Tests for matcher (siteRefMatches, ParseImageRef) and handler (project match/mismatch/404 and site push/manual/branch-skip)	2026-04-23 15:18:19 +03:00
alexei.dolgolyov	75424a5f25	feat: docker-compose stacks with Forge-themed UI Adds a new Stacks feature: upload/edit docker-compose YAML, deploy as atomic units, browse revisions, roll back, and stream logs. Backend in internal/stack + internal/api/stacks.go, persistent storage in internal/store/stacks.go. Stacks pages (list, new, detail) use a modern Forge aesthetic: Instrument Serif display type, JetBrains Mono for meta/code, indigo ember accents, dot-grid hero, registration marks on hover, terminal panel for logs. Palette is sourced from the app's existing design tokens so the feature remains consistent with the rest of Tinyforge. Fonts self-hosted via @fontsource/instrument-serif and @fontsource/jetbrains-mono to satisfy the strict CSP.	2026-04-16 03:48:37 +03:00
alexei.dolgolyov	791cd4d6af	feat: rename Docker Watcher to Tinyforge Rebrand the project as Tinyforge to reflect its evolution from a Docker container watcher into a self-hosted mini CI/deployment platform. Rename covers: Go module path, Docker labels, DB/config filenames, JWT issuer, Dockerfile binary, docker-compose, CI workflows, frontend i18n, README with static sites docs, and all code comments.	2026-04-12 21:30:39 +03:00
alexei.dolgolyov	8d2c5a063b	feat: static sites feature with Gitea/GitHub/GitLab support and Deno backend Deploy static content from Git repository folders with optional server-side API endpoints. Supports Gitea/Forgejo/Gogs, GitHub, and GitLab with provider autodetection. - New Sites entity with CRUD, encrypted secrets, and manual/push/tag sync triggers - Pluggable GitProvider interface with three implementations - Deno container mode: auto-generates router from API_{method}_{name} exports - Static container mode: nginx serving files with optional markdown rendering - Wizard UI with provider selector, repo picker, branch/folder tree pickers - Deploy pipeline builds fresh image, starts container, configures NPM proxy - Stop/Start buttons, force redeploy on manual trigger - Periodic health checker detects crashed containers - Proxy route existence check during auto-sync	2026-04-11 03:35:57 +03:00
alexei.dolgolyov	61febefca9	feat: automatic proxy re-sync on settings change When domain, SSL certificate, or proxy provider changes in settings: - Delete old proxy routes from the previous provider - Switch to None: clear all route IDs on instances - Switch to NPM/Traefik: re-create routes with new settings - Domain change: re-configure all routes with new FQDN - SSL cert change: re-apply to all existing routes - Provider created dynamically at runtime via createProxyProvider() - Deployer and API server updated via SetProxyProvider callback	2026-04-05 01:39:01 +03:00
alexei.dolgolyov	308547a3d7	refactor: remove standalone proxies, add Traefik provider with Docker labels Standalone proxy removal: - Delete store, API handlers, proxy manager, health monitor, validator, hints - Delete frontend pages (proxies list, create, edit) and components (ProxyCard, ProxyForm, ProxyFilter, ProxyGroup, ValidationChecklist) - Remove proxy routes from router, nav items, dashboard references - Clean up SystemHealthCard to remove proxy section Traefik provider: - Add TraefikProvider implementing proxy.Provider via Docker labels - ContainerLabels() returns traefik.enable, router rule, entrypoints, service port, TLS cert resolver, docker network - ConfigureRoute() returns router name (labels handle routing at container creation) - DeleteRoute() is no-op (container removal auto-deregisters) - Ping() checks Traefik API health (optional) - Wire ContainerLabels into deployer (executeDeploy + blueGreenDeploy) - Add Traefik settings: entrypoint, cert_resolver, network, api_url - Add traefik option to proxy provider selector in settings UI - Show conditional Traefik config fields - Add i18n keys (EN + RU)	2026-04-04 22:54:31 +03:00
alexei.dolgolyov	7d6719da12	refactor: extract ProxyProvider interface with None and NPM implementations Replace direct npm.Client usage throughout the codebase with the proxy.Provider interface, enabling pluggable proxy backends. The deployer, API layer, and proxy manager now use provider-agnostic route management (ConfigureRoute/DeleteRoute) instead of NPM-specific API calls. Adds ProxyRouteID (string) to Instance model and ProxyProvider setting to Settings, with SQLite migrations for backward compatibility.	2026-04-04 19:39:08 +03:00
alexei.dolgolyov	3c9727162a	fix: address review findings for backup management - HIGH: Add sync.Mutex to backup Engine to prevent concurrent backup/restore operations - HIGH: Restore uses io.Copy instead of ReadFile to avoid OOM on large databases - HIGH: Send HTTP response before closing DB during restore, then perform destructive operations in a goroutine - HIGH: Create pre-restore safety backup before overwriting database - HIGH: Autobackup cron reschedules dynamically when settings change via callback pattern (same as DNS provider changes)	2026-04-02 15:39:54 +03:00
alexei.dolgolyov	a9c7775bb7	feat: configuration backup management with manual and auto backup Add backup/restore functionality for the SQLite database. Users can trigger manual backups, configure automatic backups on an interval with retention policies, list/download/delete backups, and restore from any backup. - Backup engine using VACUUM INTO (safe with WAL mode) - Backup metadata tracked in DB, files stored in DATA_DIR/backups/ - Settings: backup_enabled, backup_interval_hours, backup_retention_count - API: POST/GET/DELETE /api/backups, download, restore endpoints - Autobackup via cron scheduler with configurable interval - Retention: prune on startup, after each backup (manual and auto) - Orphan cleanup: removes backup files without metadata on startup - Restore: replaces DB and triggers graceful server shutdown - Settings UI: /settings/backup with toggle, interval, retention config - Backup list with download, delete, restore actions - i18n: English and Russian translations	2026-04-02 15:32:15 +03:00
alexei.dolgolyov	c730cfaa45	feat: Cloudflare DNS management with automatic record sync Add flexible DNS management to Docker Watcher. By default, wildcard DNS is assumed (current behavior). When disabled, users can configure a Cloudflare DNS provider with API token and zone selection. DNS A records are automatically created/updated/deleted in sync with proxy consumers (deployed instances and standalone proxies). - Settings: wildcard_dns toggle, dns_provider, cloudflare credentials - Cloudflare client: Provider interface with EnsureRecord/DeleteRecord/ListRecords - DNS lifecycle hooks in deployer and proxy manager (best-effort) - Settings UI: DNS config section with provider picker, zone selector, test button - DNS Records page at /dns with filtering, sync status, reconciliation - Records visible in both wildcard and managed modes - Cleanup on provider change: removes old records when switching modes	2026-04-02 14:49:21 +03:00
alexei.dolgolyov	7c57c740b4	feat(observability): phase 8 - container stats, notifications & dashboard Add container monitoring and notification system: - Docker Stats API: real-time CPU/memory for running containers - Webhook notifications for errors (deploy failures, stale, proxy unhealthy) - Event log auto-pruning (daily, 30-day retention) - ContainerStats component with auto-polling progress bars - SystemHealthCard dashboard widget with running/proxy/error counts - Full EN/RU i18n for stats and system health	2026-03-30 11:37:25 +03:00
alexei.dolgolyov	7a85441b81	feat(observability): phase 3 - direct proxy creation with validation Add standalone proxy management: - Multi-step validation pipeline (DNS, TCP, HTTP) with diagnostic hints - Proxy lifecycle: create/update/delete via NPM API with SSL auto-assign - Periodic health monitoring (5min) with event log on status transitions - Unified /api/proxies/all endpoint merging standalone + managed proxies - Frontend types and API functions for downstream UI phases	2026-03-30 11:19:55 +03:00
alexei.dolgolyov	aefecdffdf	feat(observability): phase 2 - stale container detection Add periodic scanner for stale containers: - Cron-based scanner (hourly) detects non-running containers exceeding threshold - last_alive_at tracking on instances, updated on deploy/start/restart - API: GET /api/containers/stale, POST cleanup (single + bulk) - Event log warnings emitted for newly stale containers - Graceful handling of externally removed containers	2026-03-30 11:12:25 +03:00
alexei.dolgolyov	c38b7d4c78	feat(observability): phase 1 - schema, models & event log backend Add database foundation for observability features: - event_log table with severity/source filtering and pagination - standalone_proxies table for user-created reverse proxies - stale_threshold_days setting (default 7 days) - Auto-persist warn/error events from event bus to database - SSE broadcast of persistent events for real-time UI updates - Frontend types and API functions for downstream UI phases	2026-03-30 10:59:13 +03:00
alexei.dolgolyov	be6ad15efc	fix: comprehensive security, performance, and quality hardening Security: apply AdminOnly middleware to mutating routes, require ENCRYPTION_KEY and ADMIN_PASSWORD (no insecure defaults), restrict CORS to same-origin, fix OIDC token delivery via cookie instead of URL query param, add rate limiting on login, add MaxBytesReader, validate volume paths against traversal, add security headers, validate user roles, add Secure flag to OIDC cookie. Performance: set SQLite MaxOpenConns(1) to prevent SQLITE_BUSY, add FK indexes on 8 columns, track notifier goroutines with WaitGroup for graceful shutdown, use GetRegistryByName instead of GetAllRegistries in deployer, pass basePath param to avoid redundant settings query, return empty slices from store to remove reflection. Quality: refactor TriggerDeploy to delegate to runDeploy (~100 lines removed), consolidate duplicated utilities (extractPort, boolToInt, now, isTerminalStatus) into shared exports, migrate all log.Printf to slog structured logging, use consistent webhook response envelope, remove dead code (parseEnvVars, duplicate auth types). UX: clean up NPM proxy on instance removal via API, add README with quickstart guide, add .env.example, require ADMIN_PASSWORD in docker-compose, document staging-net prerequisite.	2026-03-29 12:49:24 +03:00
alexei.dolgolyov	32de5b26a8	feat(docker-watcher): phase 12 - hardening Blue-green zero-downtime deploys, promote flow validation. Dual auth: local (bcrypt + JWT) and OAuth2/OIDC (any provider). Auth middleware, login page, auth settings UI. Structured logging (slog JSON), config export to YAML. Graceful shutdown with deploy draining. Multi-stage Dockerfile and production docker-compose.yml. Swap phase order: Volumes & Env before UI Polish.	2026-03-27 23:20:56 +03:00
alexei.dolgolyov	5558396bb7	feat(docker-watcher): phase 11 - frontend embed & SSE Embed SvelteKit static build in Go binary via go:embed. Event bus for pub/sub with deploy log, instance status, and deploy status events. SSE endpoints for real-time streaming. Frontend SSE client with exponential backoff reconnection. Makefile for build pipeline. Update Phase 12 auth plan with OAuth2/OIDC support.	2026-03-27 22:30:25 +03:00
alexei.dolgolyov	757c72eea1	fix(docker-watcher): phase 8 security fixes Remove webhook secret from logs and API response. Add auth-pending note to router. Fix decrypt fallback that would use ciphertext as auth token on decrypt failure.	2026-03-27 22:10:00 +03:00
alexei.dolgolyov	97d4243cfe	feat(docker-watcher): phase 8 - REST API layer All REST endpoints wired with chi router: projects, stages, instances, deploys, registries, settings, quick deploy, webhook. Full main.go wiring with graceful shutdown. Consistent JSON envelope responses. Sensitive fields stripped from API responses.	2026-03-27 22:06:57 +03:00
alexei.dolgolyov	cdf21682d6	feat(docker-watcher): phase 2 - crypto & config seed loader AES-256-GCM encryption for credential storage, YAML seed config parser with validation, and transactional import into SQLite. Credentials (registry tokens, NPM password) encrypted before storage.	2026-03-27 21:01:16 +03:00
alexei.dolgolyov	d63c831d15	feat(docker-watcher): phase 1 - project scaffold & SQLite store Initialize Go module, directory structure, and full SQLite store layer: - 7-table schema (projects, stages, registries, settings, instances, deploys, deploy_logs) with auto-migration - CRUD operations for all entities with proper error handling - ErrNotFound sentinel for distinguishing 404 from 500 in handlers - WAL mode, foreign keys, busy timeout pragmas	2026-03-27 20:52:29 +03:00
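
Commit 0405ecd9ce says SendSigned computes HMAC-SHA256 over the exact body bytes and sends the result as X-Hub-Signature-256, GitHub-style. The Go sketch below shows what a sender and a receiver would do under that scheme. It assumes the GitHub header format: `sha256=` followed by the lowercase hex digest. `signBody` and `verifySignature` are hypothetical names for illustration, not Tinyforge's actual functions.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// signBody (hypothetical) returns an X-Hub-Signature-256 value for body:
// "sha256=" plus the hex-encoded HMAC-SHA256 of the exact body bytes.
// An empty secret means "send unsigned", so it returns "" in that case.
func signBody(secret, body []byte) string {
	if len(secret) == 0 {
		return ""
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verifySignature (hypothetical) recomputes the HMAC on the receiver side and
// compares it in constant time. A changed body or a wrong secret yields false.
func verifySignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"event":"deploy_success"}`)
	sig := signBody(secret, body)
	fmt.Println(verifySignature(secret, body, sig))                           // true
	fmt.Println(verifySignature(secret, []byte(`{"event":"tampered"}`), sig)) // false
	fmt.Println(verifySignature([]byte("wrong"), body, sig))                  // false
}
```

Receivers should verify the raw request bytes before any JSON decoding. Re-serializing a parsed payload can reorder keys or change whitespace and break the signature. `hmac.Equal` keeps the comparison constant-time.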

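Commit 0405ecd9ce also describes tier resolution. Deploys use stage > project > settings, sites use site > settings, and the secret travels with the URL that sourced it. That rule amounts to a first-match walk over the tiers. The sketch below assumes a tier "sources" the URL when its URL field is non-empty. The `tier`, `target`, and `resolve` names are hypothetical.

```go
package main

import "fmt"

// tier is a hypothetical view of one notification tier (stage, project,
// settings, or site): its URL and signing secret, either of which may be empty.
type tier struct {
	Name   string
	URL    string
	Secret string
}

// target is the resolved destination plus the secret to sign with.
type target struct {
	Source string // which tier supplied the URL
	URL    string
	Secret string // empty => send unsigned
}

// resolve walks tiers from most to least specific and returns the first one
// with a non-empty URL. The secret comes from that same tier and is never
// borrowed from another one.
func resolve(tiers ...tier) (target, bool) {
	for _, t := range tiers {
		if t.URL != "" {
			return target{Source: t.Name, URL: t.URL, Secret: t.Secret}, true
		}
	}
	return target{}, false
}

func main() {
	stage := tier{Name: "stage", URL: "https://hooks.example/stage", Secret: "s3cret"}
	project := tier{Name: "project", URL: "https://hooks.example/project"}
	settings := tier{Name: "settings"}

	// Deploy path: stage > project > settings.
	t, ok := resolve(stage, project, settings)
	fmt.Println(ok, t.Source, t.Secret != "") // true stage true

	// A stage without a URL falls through to the unsigned project URL.
	t, ok = resolve(tier{Name: "stage"}, project, settings)
	fmt.Println(ok, t.Source, t.Secret != "") // true project false
}
```

Taking the secret from the same tier as the URL has two effects. A signed stage URL stays signed even under unsigned parents. A parent's secret is also never used to sign requests to a URL it was not configured for.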
40 Commits
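
Commit cdf21682d6 encrypts stored credentials with AES-256-GCM. Commit 757c72eea1 later removed a decrypt fallback that used the ciphertext itself as a token. The sketch below shows the seal/open pair using Go's standard library. It makes two assumptions: a 32-byte key, and a storage layout of nonce followed by ciphertext. How Tinyforge derives the key from ENCRYPTION_KEY and lays out stored values is not described here. `newGCM`, `encrypt`, and `decrypt` are hypothetical helpers.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD, rejecting keys that are not 32 bytes.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt seals plaintext under a fresh random nonce and returns
// nonce || ciphertext || tag (this layout is an assumption).
func encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends to its first argument, so the nonce ends up as a prefix.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt splits off the nonce and opens the rest. GCM authenticates the
// data, so a wrong key or modified bytes return an error instead of garbage.
func decrypt(key, data []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // demo key only; a real deployment loads its own
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := encrypt(key, []byte("registry-token"))
	if err != nil {
		panic(err)
	}
	plain, err := decrypt(key, sealed)
	fmt.Println(string(plain), err) // registry-token <nil>
}
```

Because GCM authenticates as well as encrypts, a wrong key or a flipped byte makes `Open` fail. That error should be propagated to the caller rather than replaced with the stored bytes.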