Commit Graph

96 Commits

Author SHA1 Message Date
alexei.dolgolyov 2aff22f565 feat(triggers): first-class triggers + bindings with fan-out webhook
Build / build (push) Successful in 10m39s
Promote triggers from embedded workload fields to standalone records
joined to workloads via workload_trigger_bindings. One trigger (webhook,
registry watcher, git push, manual) now fans out to many workloads with
per-binding config overrides (top-level JSON merge, binding wins).
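The "top-level JSON merge, binding wins" rule can be sketched as below. This is a minimal illustration, not the project's MergeJSONConfig: the helper name mergeTopLevel and its signature are assumptions, and nested objects are replaced wholesale rather than merged.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeTopLevel is a hypothetical stand-in for the merge step, not the
// project's MergeJSONConfig. Keys from bindingCfg replace same-named keys in
// triggerCfg; nested objects are swapped wholesale, never merged recursively.
func mergeTopLevel(triggerCfg, bindingCfg []byte) ([]byte, error) {
	merged := map[string]json.RawMessage{}
	if len(triggerCfg) > 0 {
		if err := json.Unmarshal(triggerCfg, &merged); err != nil {
			return nil, fmt.Errorf("trigger config: %w", err)
		}
	}
	var override map[string]json.RawMessage
	if len(bindingCfg) > 0 {
		if err := json.Unmarshal(bindingCfg, &override); err != nil {
			return nil, fmt.Errorf("binding config: %w", err)
		}
	}
	if merged == nil { // a literal JSON null resets the map to nil
		merged = map[string]json.RawMessage{}
	}
	for key, value := range override {
		merged[key] = value // binding wins on collisions
	}
	// json.Marshal returns a new slice, so callers never share backing arrays.
	return json.Marshal(merged)
}

func main() {
	out, err := mergeTopLevel(
		[]byte(`{"branch":"main","image":"app","opts":{"a":1}}`),
		[]byte(`{"branch":"dev","opts":{"b":2}}`),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"branch":"dev","image":"app","opts":{"b":2}}
}
```

Because the merge is top-level only, the binding's "opts" object replaces the trigger's instead of being combined with it.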

Backend
- new triggers + workload_trigger_bindings tables with ON DELETE CASCADE
- boot-time backfill of embedded trigger config inside per-workload tx
- store.ErrUnique sentinel translates SQLite UNIQUE at store boundary
- /api/triggers CRUD + /api/triggers/{id}/{webhook,bindings}
- /api/bindings/{id} update/delete; /api/workloads/{id}/triggers list+bind
- bindTriggerToWorkload accepts trigger_id or inline {kind,name,config}
- inline-create uses CreateTriggerWithBindingTx (no orphan triggers)
- validateBindingConfig enforces 8 KiB cap + plugin Validate on merged config
- ListTriggersWithBindingCount + ListBindings*WithNames remove N+1 queries
- POST /api/webhook/triggers/{secret} resolves trigger then fans out
- bounded worker pool (4) per request; per-binding error isolation
- outcome accounting: deployed / skipped / no-match / errored
- legacy /api/webhook/workloads/{secret} route removed (clean break;
  backfill keeps secrets resolvable at the new /triggers/{secret} path)
- reconciler gate relaxed from (Source && Trigger) to Source only
- MergeJSONConfig returns freshly allocated slices (no fan-out aliasing)
- WithEffectiveTrigger lets existing Trigger.Match contract stay unchanged
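The bounded fan-out with per-binding error isolation and outcome accounting described in the list above might look roughly like this. It is a sketch under assumptions: fanOut, errDeploy, and the deploy callback are hypothetical names, and a buffered channel stands in for whatever pool the handler really uses.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type outcome string

const (
	deployed outcome = "deployed"
	skipped  outcome = "skipped"
	noMatch  outcome = "no-match"
	errored  outcome = "errored"
)

// errDeploy is a placeholder failure used by the demo below.
var errDeploy = errors.New("deploy failed")

// fanOut is a hypothetical sketch: it calls deploy once per binding ID with
// at most `workers` calls in flight (workers must be >= 1) and tallies the
// outcomes.
func fanOut(bindingIDs []string, workers int, deploy func(id string) (outcome, error)) map[outcome]int {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		counts = map[outcome]int{}
		slots  = make(chan struct{}, workers) // counting semaphore
	)
	for _, id := range bindingIDs {
		slots <- struct{}{} // blocks while all workers are busy
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			defer func() { <-slots }()
			result, err := deploy(id)
			if err != nil {
				result = errored // recorded; sibling bindings keep running
			}
			mu.Lock()
			counts[result]++
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return counts
}

func main() {
	counts := fanOut([]string{"a", "b", "c", "d", "e"}, 4, func(id string) (outcome, error) {
		switch id {
		case "b":
			return "", errDeploy
		case "c":
			return skipped, nil
		case "d":
			return noMatch, nil
		}
		return deployed, nil
	})
	fmt.Println(counts[deployed], counts[skipped], counts[noMatch], counts[errored]) // 2 1 1 1
}
```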

Frontend
- /triggers list, new wizard, [id] detail (bindings, webhook rotate)
- workload create wizard: NEW / PICK / SKIP trigger modes
- workload detail: bindings panel + Add-trigger modal (inline / pick)
- per-binding override editor with merged-preview + 8 KiB guard
- "OVERRIDES n FIELDS" row badge when binding_config is non-empty
- shared TriggerKindForm component (registry / git / manual + JSON)
- 3 raw <input type=checkbox> replaced with <ToggleSwitch>
- full EN + RU i18n: redeployTriggers.*, apps.detail.bindings.*,
  apps.new.triggers.*, nav.triggers; event-triggers nav disambiguated

Doc
- WORKLOAD_REFACTOR_TODO: trigger-split marked DONE; next focus is
  the static-source inline port + hard legacy cutover (Priority 1)
2026-05-16 02:24:31 +03:00
alexei.dolgolyov 7a9ff7ad54 feat(observability): event triggers + log scanner backend
Two paired backends sharing the events.Bus seam:

Event triggers (consumer-side):
- internal/store/event_triggers.go — CRUD with action_secret
  redaction on read (placeholder echo treated as "no change" on
  PATCH so secrets aren't accidentally wiped).
- internal/events/dispatcher.go — bus subscriber, AND-composed
  filters (severity CSV, source CSV, message regex with memoized
  compile cache). Structural loop-prevention: never writes to
  event_log. Sends via notifier.SendPayload.
- internal/notify: SendPayload + SendSyncForTestPayload methods,
  TierEventTrigger constant, doSendRaw shared with the legacy
  Event-shaped path.
- internal/api/event_triggers.go — admin-gated CRUD + /test
  sending the real TriggerWebhookPayload shape. SSRF guard
  rejects loopback / link-local / unspecified targets. PATCH
  uses pointer-typed DTO for partial updates.
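The SSRF guard in the list above rejects loopback, link-local, and unspecified targets. A minimal sketch of that check for literal IPs follows; the function names are hypothetical, and a real guard would presumably also resolve hostnames and check every returned address.

```go
package main

import (
	"fmt"
	"net"
)

// isBlockedIP reports whether ip is loopback, link-local, or unspecified,
// the three classes the commit message names.
func isBlockedIP(ip net.IP) bool {
	return ip.IsLoopback() ||
		ip.IsLinkLocalUnicast() ||
		ip.IsLinkLocalMulticast() ||
		ip.IsUnspecified()
}

// isBlockedLiteral is a hypothetical convenience wrapper for literal IPs.
// Anything that does not parse as an IP is rejected (fail closed); hostname
// resolution is out of scope for this sketch.
func isBlockedLiteral(host string) bool {
	ip := net.ParseIP(host)
	if ip == nil {
		return true
	}
	return isBlockedIP(ip)
}

func main() {
	for _, host := range []string{"127.0.0.1", "169.254.169.254", "0.0.0.0", "::1", "8.8.8.8"} {
		fmt.Printf("%-16s blocked=%v\n", host, isBlockedLiteral(host))
	}
}
```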

Log scanner (producer-side):
- internal/logscanner/ — engine (per-rule cooldown +
  per-container token bucket, atomic drop counters), tail
  (multiplexed docker frame demuxer with TTY fallback + 16 MiB
  payload cap + 1 MiB reassembly cap + RFC3339Nano-validated
  timestamp strip + UTF-8-safe message truncation), manager
  (5s container polling, atomic.Pointer[Snapshot] hot-reload,
  HitEmitter writes event_log + publishes EventLog so the
  trigger dispatcher picks them up immediately).
- internal/docker/container.go — ContainerLogsOpts exposes
  stream selection for stderr-only / stdout-only rules.
- internal/store: log_scan_rules table + CRUD with
  EffectiveLogScanRules resolver (globals minus per-workload
  overrides plus workload-only additions). Transactional
  cascade-delete of overrides when a global rule is removed.
- internal/api/log_scan_rules.go — admin-gated CRUD + /test
  (sample_line → matched/captures) + /stats (drop counters +
  active tail count + last-snapshot compile errors) +
  GET /api/workloads/{id}/effective-rules.
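The engine's two throttles (per-rule cooldown and per-container token bucket) can be illustrated as follows. Everything here is an assumption made for the sketch: the type and method names, the order of the two checks, and the plain int drop counter (the commit describes atomic counters; this version is single-goroutine only).

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a token bucket that refills continuously at rate tokens/second.
type bucket struct {
	capacity, tokens, rate float64
	last                   time.Time
}

func (b *bucket) take(now time.Time) bool {
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// engine is a hypothetical, single-goroutine sketch; names are illustrative.
type engine struct {
	cooldown time.Duration
	burst    float64
	rate     float64
	lastHit  map[string]time.Time // rule ID -> last admitted hit
	buckets  map[string]*bucket   // container ID -> bucket
	dropped  int
}

func newEngine(cooldownSec int, burst, ratePerSec float64) *engine {
	return &engine{
		cooldown: time.Duration(cooldownSec) * time.Second,
		burst:    burst,
		rate:     ratePerSec,
		lastHit:  map[string]time.Time{},
		buckets:  map[string]*bucket{},
	}
}

// admit applies the rule cooldown first, then the container's token bucket.
func (e *engine) admit(ruleID, containerID string, now time.Time) bool {
	if last, ok := e.lastHit[ruleID]; ok && now.Sub(last) < e.cooldown {
		e.dropped++
		return false
	}
	b, ok := e.buckets[containerID]
	if !ok {
		b = &bucket{capacity: e.burst, tokens: e.burst, rate: e.rate, last: now}
		e.buckets[containerID] = b
	}
	if !b.take(now) {
		e.dropped++
		return false
	}
	e.lastHit[ruleID] = now
	return true
}

// at converts a second offset into a timestamp so the demo is deterministic.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	e := newEngine(10, 2, 0) // 10s cooldown, burst of 2, no refill
	fmt.Println(e.admit("r1", "c1", at(0))) // true: first hit
	fmt.Println(e.admit("r1", "c1", at(5))) // false: rule still cooling down
	fmt.Println(e.admit("r2", "c1", at(5))) // true: uses the last token
	fmt.Println(e.admit("r3", "c1", at(6))) // false: bucket for c1 is empty
	fmt.Println("dropped:", e.dropped)      // dropped: 2
}
```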

cmd/server/main.go wires both subsystems next to the existing
RegisterPersistentLogger. Coverage spans engine cooldown / bucket
counter tests, snapshot effective-set semantics, manager compile-
error capture, dispatcher matching, store validation +
cascade-delete, API URL validator + secret redaction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 22:18:11 +03:00
alexei.dolgolyov 82d32181ba feat(webhook): vendor-specific event parsing (Gitea / GitHub / GitLab)
The /api/webhook/workloads/{secret} ingress now short-circuits on a
recognized X-*-Event header before falling back to the generic
simple-body parser. Vendor parsers populate fields the generic
parser cannot (image digest, GitEvent.Vendor, registry host).

internal/webhook/vendor_parsers.go covers:
- Gitea package events (X-Gitea-Event: package, container type)
- GitHub registry_package + package events (CONTAINER package_type)
- GitHub / Gitea push events with vendor stamping
- GitLab Push Hook + Tag Push Hook with path_with_namespace mapping

When a vendor parser claims a request (ok=true), it's authoritative
— a malformed Gitea package payload surfaces as an error rather
than silently re-parsing as generic. The generic {image} /
{ref + repository.full_name} fallback stays in place for legacy
CIs that send those shapes.
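The "vendor parser is authoritative once it claims the request" rule can be sketched like this. The header names are the vendors' real ones; everything else (function names, the inboundEvent shape, reducing payload parsing to a JSON validity check) is an assumption for illustration, since the actual parsers are not shown here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

type inboundEvent struct {
	Vendor string
	Image  string
}

var errMalformed = errors.New("malformed vendor payload")

// parseVendor claims the request (ok=true) when a vendor event header is set.
// Gitea is checked first on the assumption that it may also send
// GitHub-compatible headers.
func parseVendor(h http.Header, body []byte) (ev inboundEvent, ok bool, err error) {
	var vendor string
	switch {
	case h.Get("X-Gitea-Event") != "":
		vendor = "gitea"
	case h.Get("X-GitHub-Event") != "":
		vendor = "github"
	case h.Get("X-Gitlab-Event") != "":
		vendor = "gitlab"
	default:
		return inboundEvent{}, false, nil
	}
	if !json.Valid(body) {
		return inboundEvent{}, true, errMalformed
	}
	return inboundEvent{Vendor: vendor}, true, nil
}

// parseInbound falls back to the generic {"image": ...} shape only when no
// vendor parser claimed the request.
func parseInbound(h http.Header, body []byte) (inboundEvent, error) {
	if ev, ok, err := parseVendor(h, body); ok {
		return ev, err // authoritative, even on error
	}
	var generic struct {
		Image string `json:"image"`
	}
	if err := json.Unmarshal(body, &generic); err != nil {
		return inboundEvent{}, fmt.Errorf("generic payload: %w", err)
	}
	return inboundEvent{Vendor: "generic", Image: generic.Image}, nil
}

// parseWith is a small wrapper that builds the header set from a map.
func parseWith(headers map[string]string, body string) (inboundEvent, error) {
	h := http.Header{}
	for k, v := range headers {
		h.Set(k, v)
	}
	return parseInbound(h, []byte(body))
}

func main() {
	_, err := parseWith(map[string]string{"X-Gitea-Event": "package"}, `{"broken"`)
	fmt.Println("gitea, bad body:", err)
	ev, _ := parseWith(nil, `{"image":"registry.example/app:1"}`)
	fmt.Println("no header:", ev.Vendor, ev.Image)
}
```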

Coverage: internal/webhook/vendor_parsers_test.go +
inbound_event_test.go (round-trip through buildInboundEvent).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 22:17:53 +03:00
alexei.dolgolyov 8d6a527a2b refactor(workload): plugin architecture wave + apps UI + volume scopes
Completes the workload-first refactor's plugin layer:

- internal/workload/plugin/ — Source/Trigger plugin contract,
  registry, types (Workload, DeploymentIntent, InboundEvent,
  PublicFace). Self-registering init() pattern + blank-import
  in cmd/server/main.go.
- Source plugins: image (blue-green with multi-face proxy routing),
  compose, static. Trigger plugins: registry, git, manual.
- internal/deployer/dispatch.go — DispatchPlugin/Teardown/Reconcile
  seam routing the legacy deployer through plugins.
- internal/api/workload_*.go — REST surface: workloads, env,
  volumes, chain (parent/children), promote-from. hooks.go
  serves /api/hooks/kinds/{kind}/schema for the wizard.
- internal/store: workload_env (encrypt-at-rest secrets) and
  workload_volumes tables, keyed on workload_id.
- cmd/server/static_backend.go — phantom-row adapter delegating
  the static source plugin to the legacy staticsite.Manager
  (deleted at hard cutover once the static inline port lands).
- web/src/routes/apps/ — /apps list + /apps/new wizard +
  /apps/[id] detail with kind-aware compose / image / static
  forms (Advanced JSON toggle), env panel, volumes panel,
  webhook panel, chain panel, manual deploy.

Volume scope generalization (v2 resolver):

- internal/volume.ResolveWorkloadPath (workload-keyed, sits
  next to legacy ResolvePath). Honors all VolumeScope values:
  absolute, ephemeral, instance, stage, project, project_named,
  named. internal/workload/plugin/source/image/image.go
  computeMounts wires settings + imageTag through. Coverage in
  internal/volume/resolver_test.go (portable Linux/Windows via
  t.TempDir).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 22:17:41 +03:00
alexei.dolgolyov cba2149aa9 refactor(workload): finalize containers index + post-review hardening
Wraps up the workload refactor with the fixes that came out of the multi-agent
code review (see docs/plans/workload-refactor.md "What actually shipped").

Backend:
- store.ReconcileContainer: separate write path so the 30s reconciler tick no
  longer overwrites deployer-owned fields (subdomain, proxy_route_id,
  npm_proxy_id, image_tag).
- Container.stage_id column + index; ListProxyRoutes / ListContainersByStageID
  join via stage_id (survives stage rename), with legacy fallback to
  (project_id, role=stage_name).
- Reconciler: workload-existence check (rejects forged tinyforge.workload.id
  labels), skips inventing project-kind rows, child-context cancel before
  wg.Wait() on shutdown.
- Transactional CRUD across projects / stacks / static_sites: parent UPDATE
  and workload sync land in one transaction so secret rotations are durable.
- Webhook routing reads exclusively through workloads.webhook_secret; legacy
  GetProjectByWebhookSecret / GetStaticSiteByWebhookSecret fallback removed.
- store.GetStackByComposeProjectName + indexed lookup (no more full-table
  stack scan per compose container per tick).
- store.ListMissingSweepRows: filtered query for the missing-sweep.
- /api/instances/* handlers verify (workload_id, role) match URL
  (project_id, stage_name) before mutating — closes the cross-project
  hijack the security review flagged.
- extra_json no longer referenced from Go (column kept on disk for now).

Frontend:
- WorkloadContainers.svelte: generic detail-page panel reusable by stack and
  site detail pages.
- Containers page polish: client-side kind/state filters over an unfiltered
  fetch, URL-synced filters, race-safe loads via sequence number, EN+RU i18n,
  sidebar counter via navCounts.containers.

Misc:
- scripts/dev-server.sh: tolerate empty netstat grep result.
- .gitignore: ignore docker-watcher binaries, .claude/worktrees/, .facts-sync.json.
2026-05-09 15:44:41 +03:00
alexei.dolgolyov d8ab22876f refactor(workload): extract Instance entirely; Container is canonical
Build / build (push) Successful in 10m41s
End-to-end extraction of the Instance concept. After this commit:

  * internal/store/instances.go — DELETED
  * internal/store/models.go — Instance struct gone, ProxyRoute moved here
  * containers table is the single source of truth for project/stack/site
    container state. instances table is dropped via DROP TABLE migration
    (idempotent; re-runnable on every boot).
  * Legacy tinyforge.project / tinyforge.stage / tinyforge.instance-id
    Docker labels are no longer emitted; only tinyforge.workload.{id,kind},
    tinyforge.role, and tinyforge.managed are stamped on new containers.

Backend rewrites:
  - internal/deployer:        executeDeploy + blueGreenDeploy + rollback +
                              promote use store.Container natively. New
                              removeContainer() replaces removeInstance().
                              enforceMaxInstances reads via
                              ListContainersByStageID.
  - internal/reconciler:      legacy tinyforge.instance-id dispatch removed;
                              upsertByWorkloadLabel now finds existing rows
                              by docker container ID first and falls back to
                              the deterministic workloadID:role key.
  - internal/stale/scanner:   Scan + new FindStaleContainers walk the
                              containers table; emit StaleContainer JSON.
  - internal/stats/collector: ListContainers replaces ListAllInstances.
  - internal/webhook/handler: workload-secret lookup tried first; falls back
                              to project / static_site secret column.
  - internal/api: instances.go, stale.go, stats.go, stats_history.go,
                  projects.go, settings.go, docker.go, dns.go all read /
                  write through Container.

Docker layer:
  - ManagedContainer exposes WorkloadID/Kind/Role from the canonical labels.
  - ListContainers filters by tinyforge.managed=true.
  - Network creation uses LabelManaged instead of LabelProject.

Frontend:
  - Instance type is now a Container alias; .status → .state,
    .last_alive_at → .last_seen_at.
  - InstanceCard takes stageId as a prop (no longer derived from Instance).
  - StaleContainer JSON shape rewritten: { container, workload_name, role,
    days_stale }. StaleContainerCard + /containers/stale page updated.
  - ProjectCard / homepage / SystemHealthCard filter by .state.

The migration loop now tolerates "no such table" alongside "duplicate
column" / "already exists" so obsolete ALTER TABLE entries targeting the
dropped instances table no-op cleanly on first boot.
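A tolerant migration loop of the kind described above could be sketched as follows. The helper names and the substring matching are assumptions; the three fragments are the ones quoted in the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// sqlErr is a stand-in error type so the demo needs no database driver.
type sqlErr string

func (e sqlErr) Error() string { return string(e) }

// benignFragments lists the three message fragments the commit names.
var benignFragments = []string{"duplicate column", "already exists", "no such table"}

// isBenign reports whether err only signals an already-applied or obsolete
// migration step.
func isBenign(err error) bool {
	if err == nil {
		return false
	}
	msg := strings.ToLower(err.Error())
	for _, fragment := range benignFragments {
		if strings.Contains(msg, fragment) {
			return true
		}
	}
	return false
}

// runMigrations is a hypothetical loop: exec runs one statement, benign
// errors are swallowed, and anything else aborts with context.
func runMigrations(stmts []string, exec func(stmt string) error) error {
	for i, stmt := range stmts {
		if err := exec(stmt); err != nil && !isBenign(err) {
			return fmt.Errorf("migration %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	err := runMigrations(
		[]string{"ALTER TABLE instances ADD COLUMN x TEXT", "DROP TABLE IF EXISTS instances"},
		func(stmt string) error {
			if stmt == "ALTER TABLE instances ADD COLUMN x TEXT" {
				return sqlErr("no such table: instances")
			}
			return nil
		},
	)
	fmt.Println("result:", err) // result: <nil>
}
```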

Tests: store + deployer + reconciler + webhook + staticsite + notify all
still pass. Frontend svelte-check: zero errors.
2026-05-09 14:43:12 +03:00
alexei.dolgolyov d516462750 feat(workload): switch ListProxyRoutes to containers index
The Proxies page consumer (and the secondary callers in
internal/api/health.go and internal/api/settings.go) now read
from the normalized containers index instead of the instances
table. Stage ID is recovered through a (project_id, role=stage_name)
join — uniquely-indexed via the existing UNIQUE(project_id, name)
constraint on stages.

Source field stays "instance" for back-compat with the Proxies
page filter (the frontend keys off the literal string).

Three new tests pin the join shape, verify the npm_proxy_id-only
WHERE branch survives, and check that an orphan-role row falls
out of the join cleanly (catches a regression to LEFT JOIN).
2026-05-09 14:05:19 +03:00
alexei.dolgolyov 0acbcda084 feat(workload): /api/workloads /api/containers /api/apps endpoints
Adds the read API surface that the global Containers view (and
the per-workload container panel on project/stack/site detail
pages) consume.

- GET /api/workloads (?kind=)              → workload list
- GET /api/workloads/{id}                  → single workload
- GET /api/workloads/{id}/containers       → workload's containers
- PATCH /api/workloads/{id}/app            → assign/clear app_id (admin)

- GET /api/containers (?workload_id=&kind=&state=&app_id=)
                                           → global index, decorated
                                             with workload + app name
                                             so the table renders
                                             without N+1 fetches
- GET /api/containers/{id}                 → single container row

- GET  /api/apps                           → list
- GET  /api/apps/{id}                      → single
- POST /api/apps                           → create   (admin)
- PUT  /api/apps/{id}                      → update   (admin)
- DELETE /api/apps/{id}                    → delete   (admin) — clears
                                             app_id on member workloads,
                                             leaving them unassigned

Mutations on projects/stacks/sites still go through the existing
kind-specific endpoints; the new surface is read-only at the
workload layer.
2026-05-09 13:52:31 +03:00
alexei.dolgolyov 7f2d1bdae1 feat(workload): switch buildActiveImagesSet to containers index
First consumer migration off the instances table. The image
prune logic now walks the normalized containers.image_ref
column directly — one DB pass against a single table instead
of joining instances against projects to reconstruct the full
"image:tag" string. Demonstrates the consumer-switch pattern
the remaining read sites (proxies, stale scanner, webhook
matcher) will follow.

The legacy `projects []store.Project` parameter is kept on the
function signature for now so call sites don't change in this
commit; the underscore-discard in the body makes it explicit
that it's no longer load-bearing.
2026-05-09 13:47:20 +03:00
alexei.dolgolyov af82be3fb8 feat(workload): container index reconciler
Background worker that keeps the containers table in sync with
docker ps. Runs one boot pass and ticks every 30s.

Dispatch precedence per container:
  1. tinyforge.workload.id label   (canonical, new)
  2. tinyforge.instance-id label   (legacy project — joins via instances)
  3. tinyforge.static-site label   (legacy site)
  4. com.docker.compose.project    (stacks — joins via ComposeProjectName)

Rows whose Docker container ID is no longer present are flipped
to state='missing'. Placeholder rows (empty container_id, e.g.
a deploy mid-flight) are left alone so a tick that races a
deploy doesn't mark them as missing.
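The dispatch precedence and the missing-sweep rule above can be expressed compactly. The label keys are taken from the list; the function names, the row type, and the returned branch names are hypothetical.

```go
package main

import "fmt"

// classify returns which dispatch branch would handle a container, checking
// labels in the precedence order given in the commit message.
func classify(labels map[string]string) string {
	switch {
	case labels["tinyforge.workload.id"] != "":
		return "workload"
	case labels["tinyforge.instance-id"] != "":
		return "legacy-instance"
	case labels["tinyforge.static-site"] != "":
		return "legacy-site"
	case labels["com.docker.compose.project"] != "":
		return "compose"
	}
	return "unmanaged"
}

// row is a hypothetical slice of the containers table.
type row struct {
	ID          string
	ContainerID string
	State       string
}

// sweepMissing flips rows whose Docker container ID is no longer live to
// "missing". Placeholder rows (empty ContainerID) are left untouched so a
// tick racing a deploy does not mark them.
func sweepMissing(rows []row, live map[string]bool) {
	for i := range rows {
		if rows[i].ContainerID == "" {
			continue
		}
		if !live[rows[i].ContainerID] {
			rows[i].State = "missing"
		}
	}
}

func main() {
	fmt.Println(classify(map[string]string{
		"tinyforge.workload.id": "w1",
		"tinyforge.instance-id": "i1",
	})) // workload

	rows := []row{
		{ID: "a", ContainerID: "c1", State: "running"},
		{ID: "b", ContainerID: "", State: "pending"},
		{ID: "c", ContainerID: "c9", State: "running"},
	}
	sweepMissing(rows, map[string]bool{"c1": true})
	fmt.Println(rows[0].State, rows[1].State, rows[2].State) // running pending missing
}
```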

DockerLister interface lets tests substitute a fake daemon —
6 unit tests cover the dispatch matrix, missing-sweep, and
state normalization.

Wired into cmd/server/main.go between docker.New and the
existing startup chain. Boot pass populates the containers
table from any pre-refactor running containers.
2026-05-09 13:45:13 +03:00
alexei.dolgolyov b6f20599d7 feat(workload): wire stack + static-site into containers index
Stack manager now upserts a Container row per compose service
after every deploy (deterministic ID = workloadID + service so
re-deploys update in place). Stop/Start bulk-flip the state
field. Compose containers don't yet carry the new tinyforge.*
labels — the reconciler will join via com.docker.compose.project
when it lands.

Static site manager passes WorkloadID/Kind to ContainerConfig
so the new labels are stamped, and upserts a single Container
row per site (deterministic ID = workloadID + ":site"). Stop/
Start flip state. Delete cascades through the store layer.

Now every Tinyforge-managed container — project, stack service,
or static site — has a row in the containers index, ready for
the reconciler + global view in the next batches.
2026-05-09 13:41:03 +03:00
alexei.dolgolyov abb1da903f feat(workload): emit workload labels + dual-write containers from deployer
Project deploys (both standard and blue-green) now stamp the new
workload labels on every container and dual-write a row into the
containers index alongside the existing instances row. The legacy
project/stage/instance-id labels stay for now so operator runbooks
don't break — they will be removed after the migration soaks.

New labels:
- tinyforge.managed       (every Tinyforge container)
- tinyforge.workload.id   (workload row primary key)
- tinyforge.workload.kind ('project' | 'stack' | 'site')
- tinyforge.role          (stage name for projects)

ContainerConfig grows WorkloadID/WorkloadKind/Role fields. The
deployer resolves the project's workload row (guaranteed to exist
by boot-time backfill) and passes the IDs through. Container row
ID matches instance ID by construction so removeInstance can drop
both records together.

Stack and static-site managers still need the same treatment;
those land in the next commit.
2026-05-09 13:37:19 +03:00
alexei.dolgolyov db235c1412 feat(workload): write-through workload sync + boot-time backfill
CRUD on Project / Stack / StaticSite now keeps a paired Workload
row in sync. Secret setters (webhook secret, signing secret,
require-signature toggle, notification secret) all re-sync after
mutating the source-of-truth row so the workload row always
reflects the canonical state.

Delete cascades: DeleteProject/Stack/StaticSite now drop the
matching workload row plus any container index entries owned by
it, so global views don't show ghost rows.

Boot-time BackfillWorkloads scans every project/stack/site and
ensures each has a workload row. Idempotent — safe to run on
every restart, recovers from a deleted/missing workload row.

Behavior unchanged for existing call sites; the workloads table
just starts being populated. Deployer / reconciler / consumer
switchover land in the next commit.
2026-05-09 13:28:20 +03:00
alexei.dolgolyov f54a6ecee3 feat(workload): add Workload/Container/App store foundation
Introduces the data layer for the Workload refactor (see
docs/plans/workload-refactor.md): three new tables and store
methods, no behavior changes elsewhere yet.

- workloads: unifying primitive over Project/Stack/StaticSite,
  paired via UNIQUE(kind, ref_id). Notification + webhook config
  hosted here so it lives in one place across kinds.
- containers: normalized index of every Tinyforge-managed
  container with first-class subdomain/proxy_route_id/npm_proxy_id
  columns (heavily queried by ListProxyRoutes / stale detection).
- apps: optional grouping of workloads; schema only, no UI in v1.

Foundation only — deployer surgery, reconciler, and consumer
switchover land in the next commit.
2026-05-09 13:22:25 +03:00
alexei.dolgolyov 0f60a7a5db feat(webhook): inbound delivery audit log
Build / build (push) Successful in 10m35s
Persists every inbound webhook hit (project + site) so users can debug
"why didn't my deploy fire?" without grepping daemon logs. Surfaces a
14-day rolling history under the WebhookPanel on each project + site
detail page; refreshes every 30s while open. Daily cron prunes records
older than 14 days alongside the existing event log prune.

Schema:
- webhook_deliveries(id, target_type, target_id, target_name, received_at,
  source_ip, signature_state, status_code, outcome, detail, body_size)
- indexes on (target_type,target_id,received_at) and (received_at)

Backend:
- store: WebhookDelivery model + Insert/List/Prune helpers
- webhook/handler: deferred recordDelivery() captures the final outcome
  on every return path including HMAC rejects, image mismatch, no-stage,
  auto_deploy=false, and successful deploys; signatureStateFor()
  classifies "unconfigured" vs "missing" vs "invalid" vs "valid"
- api: GET /api/{projects,sites}/{id}/webhook/deliveries with
  parseLimit() helper (default 50, max 200)
- main: daily prune cron retains the last 14 days
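The parseLimit() helper mentioned in the list (default 50, max 200) might behave like the sketch below. Its real signature is not shown in the log, so the string parameter and the choice to clamp rather than reject oversized values are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLimit is a sketch of the ?limit= handling: missing, non-numeric, or
// non-positive values fall back to the default, and large values are clamped.
func parseLimit(raw string) int {
	const (
		defaultLimit = 50
		maxLimit     = 200
	)
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return defaultLimit
	}
	if n > maxLimit {
		return maxLimit
	}
	return n
}

func main() {
	for _, raw := range []string{"", "10", "999", "-3", "abc"} {
		fmt.Printf("%q -> %d\n", raw, parseLimit(raw))
	}
}
```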

Frontend:
- WebhookDeliveryLog.svelte: panel with refresh button, status code +
  outcome + signature badges, relative time tooltip-on-hover for
  absolute time, source IP column
- Mounted below WebhookPanel on project + site detail pages
- en/ru i18n strings for outcome/signature enums and column labels
2026-05-07 02:40:39 +03:00
alexei.dolgolyov 831b5c1a43 feat(webhook): HMAC-SHA256 signature verification on inbound webhooks
Adds an opt-in inbound HMAC scheme so a leaked URL alone is not enough
to forge deploy/sync requests — the caller must also know a separate
signing secret. Header format is X-Hub-Signature-256, matching the
Gitea/GitHub/GitLab convention so existing CI integrations work without
custom code.

Behaviour:
- per-project / per-site signing_secret is independent of the URL secret
- require_signature flag does a hard 401 on missing/invalid signatures
- even when require_signature is off, an *invalid* submitted signature
  returns 401 — surfaces CI misconfiguration instead of silently passing
- comparison uses hmac.Equal (constant time, backed by crypto/subtle)
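The behaviour rules above translate into a small decision function. This is a sketch, not the handler's verifyHMAC: the names sign and verify are hypothetical, the "sha256=" hex prefix follows the GitHub X-Hub-Signature-256 convention, and treating an unconfigured secret as "accept" is an assumption.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign computes the X-Hub-Signature-256 value for body under secret.
func sign(secret string, body []byte) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verify returns the HTTP status this sketch would answer with.
func verify(secret string, required bool, header string, body []byte) int {
	switch {
	case secret == "":
		return 200 // signing not configured (assumption: nothing to check)
	case header == "":
		if required {
			return 401 // signature demanded but missing
		}
		return 200 // optional and unsigned
	case hmac.Equal([]byte(header), []byte(sign(secret, body))):
		return 200 // constant-time match
	default:
		return 401 // a submitted but wrong signature always fails
	}
}

func main() {
	body := []byte(`{"ref":"refs/heads/main"}`)
	good := sign("s3cret", body)
	fmt.Println(verify("s3cret", true, good, body))               // 200
	fmt.Println(verify("s3cret", true, "", body))                 // 401
	fmt.Println(verify("s3cret", false, "sha256=deadbeef", body)) // 401
	fmt.Println(verify("", false, "", body))                      // 200
}
```

The four calls in main mirror the four test cases named in the commit's Tests section.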

Backend:
- store: webhook_signing_secret + webhook_require_signature columns on
  projects + static_sites; scanProject helper, scan helpers updated; new
  Set* helpers for both fields
- webhook/handler: verifyHMAC helper, body read once, integrated into
  both project and site handlers
- api: per-entity signing-secret rotate / disable / require-toggle
  endpoints under /api/{projects,sites}/{id}/webhook/...

Frontend:
- WebhookPanel gains optional signing handlers (no breaking change for
  existing callers; signing UI hides when handlers aren't wired)
- one-shot reveal of the issued secret with copy + dismiss
- ToggleSwitch for require-signature, disabled until a secret is issued
- en/ru i18n strings

Tests:
- HMACRequiredAndValid (200 + deploy fires)
- HMACRequiredButMissing (401, no deploy)
- HMACPresentButWrong (401 even when require_signature=false)
- HMACOptionalUnsignedAccepted (200 when neither configured)
2026-05-07 02:34:40 +03:00
alexei.dolgolyov 8b886ddf2b feat(backup): take Tinyforge DB snapshot before every deploy
Adds an opt-in "auto_backup_before_deploy" setting that triggers a
"pre-deploy" backup at the start of every project deploy via the deploy
pipeline (covers both the async HTTP path and the sync poller/webhook
path). Failures are logged to the deploy log but do not abort — missing
a backup is preferable to refusing to ship a fix.

- store: settings.auto_backup_before_deploy column + scan/update wiring
- backup: accept "pre-deploy" as a valid backup_type
- deployer: small PreDeployBackuper interface, hooked into runDeploy
  right after settings load and before any state-mutating work
- api: settings request/response surface the new flag
- web: ToggleSwitch on the backup settings page; "Pre-deploy" badge
  variant in the backup list (badge-warning so it stands out)
- i18n: en/ru strings for the toggle, help text, and badge label
2026-05-07 02:14:26 +03:00
alexei.dolgolyov 0405ecd9ce feat(notify): HMAC-signed outgoing webhooks with per-tier secrets and test sender
Build / build (push) Successful in 10m36s
Outgoing notifications were bare POSTs with no auth and no way to verify
they came from Tinyforge. They also went out from one global URL only,
even though stages had a notification_url field, and static-site sync
emitted no events at all.

Schema: add notification_url + notification_secret (lazy-generated) to
settings, projects, stages and static_sites. Migrations are additive.

Notifier: SendSigned computes HMAC-SHA256 over the exact body bytes and
sends X-Hub-Signature-256 (GitHub-compatible — receivers built for
GitHub/Gitea/Forgejo verify out of the box). Aux headers
X-Tinyforge-Event/Delivery/Timestamp/Tier are advisory and not signed.
Empty secret => unsigned send for back-compat.

Resolution: deploys fall through stage > project > settings, sites fall
through site > settings. The secret travels with the URL that sourced
it, so any tier can sign even when its parents are unsigned. Site sync
events now actually emit (site_sync_success / site_sync_failure).
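The fall-through described above (stage > project > settings, with the secret travelling alongside the URL that supplied it) can be sketched as a first-non-empty lookup. The target type and resolve function are hypothetical names.

```go
package main

import "fmt"

// target pairs a notification URL with the secret configured on the same tier.
type target struct {
	URL    string
	Secret string
}

// resolve returns the first tier that has a URL. The secret is taken from
// that same tier only; an empty secret means the send goes out unsigned even
// if a lower-priority tier has a secret.
func resolve(tiers ...target) (target, bool) {
	for _, tier := range tiers {
		if tier.URL != "" {
			return tier, true
		}
	}
	return target{}, false
}

func main() {
	stage := target{}
	project := target{URL: "https://hooks.example/project"}
	settings := target{URL: "https://hooks.example/global", Secret: "global-secret"}

	got, ok := resolve(stage, project, settings)
	fmt.Println(ok, got.URL, got.Secret == "") // true https://hooks.example/project true
}
```

For site sync events the same function would be called as resolve(site, settings).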

API: 12 new endpoints — {GET secret, POST regenerate, POST disable,
POST test} for each of the 4 tiers. SendSyncForTest returns
status_code/latency_ms/signature_sent/delivery_id/response_snippet so
the UI surfaces receiver feedback inline.

UI: shared OutgoingWebhookPanel.svelte fits the existing card aesthetic.
Signing-state pill, secret reveal-on-demand, regenerate/disable behind
ConfirmDialog modals (not inline strips — too easy to misclick), send-
test result card with colour-coded status. Wired into Settings →
Integrations, project edit form, per-stage edit, and per-site detail.
EN + RU i18n.

Tests: round-trip (sender signs, receiver verifies), tampered-body and
wrong-secret rejection, unsigned-send omits header, send-test surfaces
4xx, concurrent fan-out via Drain. Resolver precedence locked for both
deploy and site paths.

Docs: docs/webhooks.md with header reference, verifier snippets in
Node/Python/Go, and a recipe for the service-to-notification-bridge
generic webhook provider.
2026-05-07 02:03:32 +03:00
alexei.dolgolyov a4362b842d fix: harden security, fix concurrency bugs, and address review findings
Build / build (push) Successful in 11m42s
Security:
- rate limit /api/webhook routes per-IP and cap concurrent site syncs
- global SSE connection cap (256) with new sse_gate
- validate ?tail= and cap JSON log responses at 4 MiB
- strip ANSI/CSI/OSC and control bytes from streamed log lines
- redact webhook secret from request log middleware
- scrub host details from /api/health for non-admin viewers
- drop container_id from /api/system/stats/top for non-admins
- generate webhook secrets via crypto/rand; require >=32 chars on insert
- verify iid path consistency in streamContainerLogs
- LimitReader on site webhook body; reject malformed non-empty bodies
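The secret-generation item in the list above (crypto/rand, at least 32 characters on insert) can be illustrated as below. The function names and the hex encoding are assumptions; only the use of crypto/rand and the 32-character floor come from the commit message.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

const minSecretLen = 32

var errSecretTooShort = errors.New("webhook secret shorter than 32 characters")

// newSecret returns 2*nBytes hex characters drawn from the OS CSPRNG.
func newSecret(nBytes int) (string, error) {
	buf := make([]byte, nBytes)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("generate secret: %w", err)
	}
	return hex.EncodeToString(buf), nil
}

// validateSecret enforces the minimum length before a secret is stored.
func validateSecret(secret string) error {
	if len(secret) < minSecretLen {
		return errSecretTooShort
	}
	return nil
}

func main() {
	secret, err := newSecret(32)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(secret), validateSecret(secret) == nil) // 64 true
	fmt.Println(validateSecret("short"))
}
```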

Concurrency / correctness:
- stats collector: Stop() no longer hangs without Start(), semaphore
  acquired in parent loop so ctx cancellation short-circuits the queue,
  in-flight tick cancellable via shared base context, zero-ts guard
- webhook handler: replace fire-and-forget goroutine with WaitGroup-tracked
  workers + Drain() wired into graceful shutdown
- $derived(() => ...) mis-idiom fixed in ContainerStats / InstanceCard /
  ProjectCard (returned function instead of value)
- SystemResourcesCard: rename `window` and `t` locals to avoid shadowing
  globalThis.window and the i18n `t` import

Quality / performance:
- replace O(n^2) insertion sort with sort.Slice in stats top
- runMigrations only swallows duplicate-column / already-exists errors
- PruneStatsSamplesBefore wrapped in a transaction
- collapse N+1 in unusedImageStats / pruneImages to one ListAllInstances
  pass; surface DB errors instead of silently treating them as inactive
- run Docker Info + DiskUsage in parallel via errgroup
- container log SSE emits `: ping` heartbeat every 20 s
- imageMatches case-insensitive on registry host (RFC behaviour)
- log warning on invalid stage tag pattern instead of silent skip
- reject malformed non-empty site webhook payloads

Frontend / i18n:
- shared formatBytes utility replaces three local copies
- statsInterval store drives dynamic "no samples / collection disabled"
  copy across ContainerStats and SystemResourcesCard
- top consumers row now shows owner_name (project/stage or site name)
- drop seven `as any` casts on the Settings type; add cloudflare_api_token
  write-only field
- move "Service status", "Docker daemon", "Docker unreachable",
  "Proxy unreachable", "reachable", and "Docker daemon is not reachable."
  strings into en/ru i18n bundles
2026-05-07 00:56:14 +03:00
alexei.dolgolyov 05440a5f92 feat(stats): resource metrics dashboard + sites logs/stats
Build / build (push) Successful in 10m50s
Background collector samples CPU/memory/network/block I/O for every
instance and site on a configurable interval (default 15s, range
5-300s), persists samples to SQLite with a configurable retention
window (default 2h, range 0-24h), and skips ticks gracefully when
the Docker daemon is unreachable. Settings are reloadable without
a restart — each tick re-reads them.

New API endpoints:
- GET /api/system/stats (host snapshot: info + df)
- GET /api/system/stats/history
- GET /api/system/stats/top?by=cpu|memory
- GET /api/projects/{id}/stages/{s}/instances/{iid}/stats/history
- GET /api/sites/{id}/stats[/history]
- GET /api/sites/{id}/logs (SSE + JSON, reuses instance log streamer)

Frontend:
- ECharts added with tree-shaken imports (~180KB gzip) for
  future-proof time-series/gantt/graph visualizations
- CollapsibleSection wraps all dashboard sections (system health,
  daemons, system resources, static sites, projects) with
  localStorage-persisted open state
- SystemResourcesCard shows capacity tiles, workload utilization
  chart with 30m/2h/6h/24h window picker, disk breakdown with
  reclaimable callouts, and top 5 consumers
- ContainerStats and ContainerLogs take a source discriminated union
  so sites reuse the same components as instances; sites detail page
  embeds both for Deno backend debugging
- Settings › Maintenance exposes collection interval + retention
- Docker-unavailable state returns 503 and renders an amber banner
  instead of a generic 500

Full i18n coverage (en + ru) for all new strings.
2026-04-24 15:02:43 +03:00
alexei.dolgolyov 0632f512e6 feat(webhook): per-project and per-site webhook URLs
Build / build (push) Successful in 10m25s
Replace the single global webhook secret with entity-scoped secrets stored
on each project and static site. Webhook-driven project autocreate is
removed — projects must exist before their URL can trigger deploys.

Also wires static-site webhooks (sync_trigger=push|tag), turning the
previously inert "push" trigger into a functional one: POST the site's
webhook URL from a Git provider and Tinyforge re-syncs on matching refs.

- Adds webhook_secret columns + unique indexes to projects and static_sites
- Per-entity GET/regenerate endpoints under /api/projects/{id}/webhook
  and /api/sites/{id}/webhook (admin-only)
- Removes /api/settings/webhook-url and the global webhook panel
- Reusable WebhookPanel Svelte component on both detail pages, i18n in en/ru
- Tests for matcher (siteRefMatches, ParseImageRef) and handler (project
  match/mismatch/404 and site push/manual/branch-skip)
2026-04-23 15:18:19 +03:00
alexei.dolgolyov 90e6e59d9e feat: daemon health panel, brand-rail status chips, user timezone selector
Build / build (push) Successful in 10m35s
- Health API now surfaces Docker /info + /version (version, platform,
  kernel, container/image counts, storage driver, memory, latency) and
  NPM aggregates (proxy host total, managed-by-Tinyforge count, access
  lists, certificates, endpoint URL).
- Docker/NPM indicators moved out of the sidebar footer and into a
  compact mono-styled rail directly under the Tinyforge brand title,
  with pulse/fault animations and click-to-expand error hints.
- New SystemDaemonsCard on the dashboard: two terminal-styled panels
  (Docker Engine + Proxy) with a running/paused/stopped stacked bar,
  key-value diagnostics, and a total-vs-managed proportion meter on
  the proxy-hosts tile.
- Shared health store so the sidebar and dashboard share a single
  30 s poll instead of duplicating traffic.
- User-facing timezone preference with auto-detect fallback; all
  dates across projects, sites, stacks, settings, backup, event log
  and stale containers now render through $fmt.date / $fmt.datetime.
- en/ru translations for both features.
2026-04-23 14:32:30 +03:00
alexei.dolgolyov a182a93950 feat: nav counter badges, login backdrop, events i18n + misc fixes
Build / build (push) Successful in 10m29s
Nav & UI polish
- Sidebar nav items show monospace count badges (projects, sites, stacks,
  proxies). Events badge shows error count only, styled red as actionable
- New $lib/stores/navCounts.ts polls all counts in parallel every 60s and
  refreshes on route change so badges track mutations
- Login page gets a dynamic forge backdrop: rotating conic glow, drifting
  embers, dot-grid texture, vignette — all pure CSS, reduced-motion safe
- main element gets scrollbar-gutter: stable so Settings tab switching no
  longer shifts horizontally when content heights differ

Events i18n
- events.source.* dictionary rewritten to match actually-emitted backend
  sources (deploy, static_site, stale_scanner, stale_cleanup, admin);
  dead keys (container, proxy, system) removed
- EventLogFilter.allSources + /events default sources state updated to match
- Localize "{N} total" via events.totalCount in the page hero toolbar

Backend
- Stage API accepts enable_proxy on create/update (defaults to true) so
  proxy registration can be opted out per stage

Concurrency
- api.ts: queued request waiters no longer double-increment the inflight
  counter; releasing a slot hands it off directly

Reactive effects
- project detail / env / volumes pages wrap side-effect calls in untrack()
  to prevent $effect feedback loops when their loaders mutate tracked state
2026-04-22 18:30:34 +03:00
alexei.dolgolyov ef0669d5dd feat: unified THE FORGE // SECTION headers and merged proxy routes
Build / build (push) Successful in 10m37s
UI consistency
- ForgeHero now supports backHref, mono kicker, stats snippet, staggered
  entrance animation, and a registration-tick divider
- Every route now opens with the same "THE FORGE // SECTION" eyebrow: projects,
  sites, stacks, proxies, events, dns, deploy, settings, stale containers,
  site/project detail + env/volumes/browse, new site wizard
- Stacks list/detail/new moved to the shared hero and brand-anchor eyebrow
- Toolbars migrated from bespoke buttons to the shared .forge-btn utilities
- Sidebar footline adds a live UTC "forge clock" and a vim-style g-prefix
  quick-nav hint (g d/p/s/k/x/r/e/c jumps to each section)

Proxies page
- Server-side: merge static site proxy routes with instance routes and sort
  by domain (internal/api/proxies.go, internal/store/static_sites.go)
- ProxyRoute gains a Source field ("instance" | "static_site")
- Frontend adds source filter tabs and per-source labels/badges
2026-04-22 16:27:55 +03:00
alexei.dolgolyov 75424a5f25 feat: docker-compose stacks with Forge-themed UI
Build / build (push) Successful in 10m42s
Adds a new Stacks feature: upload/edit docker-compose YAML,
deploy as atomic units, browse revisions, roll back, and
stream logs. Backend in internal/stack + internal/api/stacks.go,
persistent storage in internal/store/stacks.go.

Stacks pages (list, new, detail) use a modern Forge aesthetic —
Instrument Serif display type, JetBrains Mono for meta/code,
indigo ember accents, dot-grid hero, registration marks on
hover, terminal panel for logs. Palette is sourced from the
app's existing design tokens so the feature remains consistent
with the rest of Tinyforge.

Fonts self-hosted via @fontsource/instrument-serif and
@fontsource/jetbrains-mono to satisfy the strict CSP.
2026-04-16 03:48:37 +03:00
alexei.dolgolyov b622384774 feat: persistent storage for Deno static sites
Build / build (push) Successful in 10m21s
- Add storage_enabled and storage_limit_mb columns to static_sites.
- Create/attach Docker volumes (tinyforge-site-{name}-data) for Deno
  sites with storage enabled, mounted at /app/data.
- Grant --allow-write=/app/data in Deno container CMD.
- Add storage usage API endpoint (GET /api/sites/{id}/storage).
- Show storage section in site detail page with usage bar.
- Add storage toggle and limit field to new site wizard.
- Use ConfirmDialog for secret deletion instead of inline delete.
2026-04-13 00:12:51 +03:00
alexei.dolgolyov 96fd910603 fix: resolve ERR_INSUFFICIENT_RESOURCES connection exhaustion
- Add concurrency limiter (max 4 GET requests) to API layer, leaving
  slots for SSE and health checks. Write ops bypass the limiter.
- Add AbortController to ContainerStats, project detail page, and
  dashboard to cancel in-flight requests on navigation/unmount.
- Move global SSE connection from layout to events page (only consumer).
- Add 30s heartbeat to SSE endpoint to detect zombie connections.
- Serialize dashboard project fetches to avoid parallel burst.
- Rebuild frontend in dev-server.sh so go:embed stays in sync.
2026-04-13 00:12:14 +03:00
alexei.dolgolyov 791cd4d6af feat: rename Docker Watcher to Tinyforge
Build / build (push) Successful in 12m20s
Rebrand the project as Tinyforge to reflect its evolution from a Docker
container watcher into a self-hosted mini CI/deployment platform.

Rename covers: Go module path, Docker labels, DB/config filenames,
JWT issuer, Dockerfile binary, docker-compose, CI workflows, frontend
i18n, README with static sites docs, and all code comments.
2026-04-12 21:30:39 +03:00
alexei.dolgolyov 8d2c5a063b feat: static sites feature with Gitea/GitHub/GitLab support and Deno backend
Deploy static content from Git repository folders with optional server-side
API endpoints. Supports Gitea/Forgejo/Gogs, GitHub, and GitLab with provider
autodetection.

- New Sites entity with CRUD, encrypted secrets, and manual/push/tag sync triggers
- Pluggable GitProvider interface with three implementations
- Deno container mode: auto-generates router from API_{method}_{name} exports
- Static container mode: nginx serving files with optional markdown rendering
- Wizard UI with provider selector, repo picker, branch/folder tree pickers
- Deploy pipeline builds fresh image, starts container, configures NPM proxy
- Stop/Start buttons, force redeploy on manual trigger
- Periodic health checker detects crashed containers
- Proxy route existence check during auto-sync
2026-04-11 03:35:57 +03:00
alexei.dolgolyov b0816502bf feat: configurable unused images threshold with dashboard warning
- Add image_prune_threshold_mb setting (default 1024 MB)
- Add GET /api/docker/unused-images endpoint returning unused image count, size, and threshold status
- Dashboard shows amber warning banner when unused project images exceed threshold
- Banner links to settings page for pruning, shows count and human-readable size (MB/GB)
- Threshold configurable in Docker Image Cleanup section of settings
- DB migration + schema for image_prune_threshold_mb
2026-04-05 14:34:48 +03:00
alexei.dolgolyov 21ffef2ee2 feat: separate Public IP for DNS records from Server IP, improve settings help texts
- Add public_ip field to Settings for DNS A records (proxy/load balancer IP)
- DNS records now use public_ip, falling back to server_ip if empty
- Server IP renamed to "Server IP (Docker Host)" for clarity
- Public IP labeled "Public IP (DNS Target)"
- Updated help texts for domain, server IP, public IP, and Docker network
- DB migration + schema for public_ip column
2026-04-05 14:12:53 +03:00
alexei.dolgolyov d03cc3c811 feat: container logs viewer with SSE streaming and line limiter
- Add GET /api/projects/{id}/stages/{stage}/instances/{iid}/logs endpoint
- Supports JSON mode (returns array of lines) and SSE mode (streams in real-time)
- Docker log stream header (8-byte prefix) stripped automatically
- ContainerLogs component with:
  - Tail line selector (50/200/500/1000)
  - Follow button for real-time streaming via SSE
  - Auto-scroll to bottom
  - Dark terminal-style display
  - Close button
- Logs button (events icon) on each instance card
- i18n keys in EN and RU
2026-04-05 14:04:45 +03:00
alexei.dolgolyov ac3132d172 feat: show local Docker images on project detail page
- Add GET /api/projects/{id}/images endpoint returning local images matching the project
- Add ListImagesByRef with tag, size, and created timestamp to Docker client
- Display images table on project page with tag, ID (truncated), size (MB), and created date
- Only shown when Docker is available and images exist locally
2026-04-05 13:56:55 +03:00
alexei.dolgolyov 5577851f22 feat: project-scoped Docker image prune, conflict fix, deploy toggle, access list picker
- Image prune only removes images matching project image refs, skips active instances
- Add ListImagesByRef and RemoveImage to Docker client
- Fix 409 conflict: use listProjects instead of duplicate POST
- Add "Deploy immediately" toggle to Quick Deploy (off by default)
- Replace raw access list ID with EntityPicker on project edit form
- Trigger proxy resync on access list change
- Fix stage form layout: single responsive row
- Fix empty port default on project creation
- Improve inspect error message for remote Docker
2026-04-05 13:49:20 +03:00
alexei.dolgolyov a830378c5b fix: replace access list ID field with EntityPicker, add deploy toggle, improve UX
- Replace raw NPM access list ID input with EntityPicker on project edit form
- Resolve access list name from NPM API when editing project
- Add "Deploy immediately" toggle to Quick Deploy (off by default)
- Fix stage form layout: all fields on same row with toggles
- Fix empty port default on project creation (placeholder instead of pre-filled)
- Improve inspect error message when Docker is unavailable
- Trigger proxy resync when NPM access list changes
- Resolve access list name on NPM settings page load
2026-04-05 13:07:09 +03:00
alexei.dolgolyov 7550fe9e32 feat: CPU/RAM limits per stage, NPM access list (global + per-project)
Resource limits:
- Add cpu_limit (cores) and memory_limit (MB) fields to Stage model
- Pass limits to Docker container via NanoCPUs and Memory in HostConfig
- Add CPU/Memory fields to stage creation form in project detail
- 0 = unlimited (default)

NPM access list:
- Add npm_access_list_id to Settings (global default) and Project (per-project override)
- Per-project overrides global when > 0
- NPM provider passes access_list_id when configuring proxy hosts
- Add GET /api/settings/npm-access-lists endpoint to list NPM access lists
- Add access list picker on NPM settings page (global)
- Add access list ID field on project edit form (per-project)
- DB migrations for all new columns
2026-04-05 12:44:26 +03:00
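The resource-limit mapping in 7550fe9e32 comes down to two unit conversions: Docker's HostConfig expects NanoCPUs in billionths of a core and Memory in bytes. A rough sketch of that arithmetic is below. The struct and function names are hypothetical, the struct stands in for the Docker SDK type so the example stays self-contained, and reading "MB" as 1024 * 1024 bytes is an assumption.

```go
package main

import "fmt"

// resourceLimits mirrors the two HostConfig fields named in the commit.
// Hypothetical stand-in type, not the Docker SDK struct.
type resourceLimits struct {
	NanoCPUs int64 // CPU quota in units of 1e-9 cores
	Memory   int64 // memory limit in bytes
}

// toResourceLimits converts stage settings (cores, MB) into Docker units.
// A value of 0 stays 0, which Docker treats as "no limit".
func toResourceLimits(cpuCores float64, memoryMB int64) resourceLimits {
	return resourceLimits{
		NanoCPUs: int64(cpuCores * 1e9),
		Memory:   memoryMB * 1024 * 1024, // assumption: MB means MiB here
	}
}

func main() {
	fmt.Printf("%+v\n", toResourceLimits(0.5, 256))
	fmt.Printf("%+v\n", toResourceLimits(0, 0))
}
```

Because zero maps to zero in both fields, the "0 = unlimited (default)" rule from the commit needs no special case in the conversion.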
alexei.dolgolyov c6d20ca26e feat: NPM access list support (global default + per-project override)
2026-04-05 12:38:20 +03:00
alexei.dolgolyov 4ff8daafc4 fix: reconcile instance status with Docker on list, add IsContainerRunning
2026-04-05 02:42:31 +03:00
alexei.dolgolyov 12d78bec99 fix: instance link includes domain, project delete cleans up containers and proxies
- InstanceCard appends settings domain to subdomain link (stage-dev-app.example.com instead of just stage-dev-app)
- Project deletion now removes Docker containers and proxy routes before deleting DB records
- Pass domain from settings to InstanceCard via project detail page
2026-04-05 02:38:32 +03:00
alexei.dolgolyov b54481aff8 fix: NPM remote toggle auto-save, proxy resync on remote change, webhook URL as path
- Remote NPM toggle now auto-saves immediately when toggled
- Toggling npm_remote triggers proxy resync (re-creates routes with server_ip or container name)
- Webhook URL shows just the path (/api/webhook/{secret}) instead of full URL with wrong domain
- Fix tag dropdown: resolve registry ID from name before fetching tags
- Remove unused fmt import
2026-04-05 02:27:41 +03:00
alexei.dolgolyov 195ef3e7e5 feat: NPM remote mode for cross-machine deployments
- Add npm_remote setting: when enabled, proxy forwards to server_ip with
  published host ports instead of Docker container names
- Deployer looks up assigned host port via InspectContainerPort in remote mode
- Auto-remove stale containers with same name before creating new ones
- Add Remote NPM toggle with warning on NPM settings page
- DB migration + schema for npm_remote column
2026-04-05 02:18:06 +03:00
alexei.dolgolyov f71f2275a2 fix: per-event delete button, Docker network default, polling interval duration parsing
- Add per-event delete button (trash icon on hover) in event log entries
- Set Docker network default to 'docker-watcher' in DB schema + migration for existing DBs
- Parse Go duration strings (5m, 1h) to seconds in settings UI, convert back on save
- Clear error when network is empty in deployer instead of hidden fallback
2026-04-05 02:02:03 +03:00
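The duration round trip described in f71f2275a2 (a Go duration string such as 5m or 1h shown as seconds, then converted back on save) can be sketched as follows. The commit places this conversion in the settings UI, so the Go version here only illustrates the arithmetic. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// durationToSeconds parses a Go duration string ("5m", "1h") into whole seconds.
func durationToSeconds(s string) (int64, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, fmt.Errorf("invalid duration %q: %w", s, err)
	}
	return int64(d / time.Second), nil
}

// secondsToDuration converts seconds back into a Go duration string.
// time.Duration.String yields a normalized form such as "5m0s".
func secondsToDuration(sec int64) string {
	return (time.Duration(sec) * time.Second).String()
}

func main() {
	for _, in := range []string{"5m", "1h", "90s"} {
		sec, err := durationToSeconds(in)
		if err != nil {
			panic(err)
		}
		fmt.Println(in, "->", sec, "->", secondsToDuration(sec))
	}
}
```

The string produced on the way back is normalized ("5m0s" rather than "5m"), but it parses to the same duration, so the stored value is unchanged.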
alexei.dolgolyov c26c41e6a1 feat: enable proxy toggle on quick deploy, event log clearing, and UX fixes
- Add enable_proxy toggle to Quick Deploy form (defaults to on)
- Add DELETE /api/events/log/{id} and DELETE /api/events/log endpoints
- Add Clear All button with confirmation on Events page
- Rename "NPM Proxy" to "Enable Proxy" on stage form (provider-agnostic)
- Fix polling interval validation (min 60s) and number input trim errors
- Fix domain field no longer required in settings
2026-04-05 01:50:19 +03:00
alexei.dolgolyov 61febefca9 feat: automatic proxy re-sync on settings change
When domain, SSL certificate, or proxy provider changes in settings:
- Delete old proxy routes from the previous provider
- Switch to None: clear all route IDs on instances
- Switch to NPM/Traefik: re-create routes with new settings
- Domain change: re-configure all routes with new FQDN
- SSL cert change: re-apply to all existing routes
- Provider created dynamically at runtime via createProxyProvider()
- Deployer and API server updated via SetProxyProvider callback
2026-04-05 01:39:01 +03:00
alexei.dolgolyov 187e302f4a feat: proxy routes page, OIDC login fix, NPM test connection, webhook URL fix, and UX improvements
- Add /proxies page showing deploy-managed proxy routes with project/stage links, search, and status
- Add GET /api/proxies endpoint joining instances with project/stage names
- Add POST /api/settings/npm/test endpoint for NPM connection validation
- Add GET /api/auth/mode public endpoint for auth mode detection
- Add NPM Test Connection button with validation on save
- Fix OIDC SSO button only shown when auth_mode is oidc
- Fix webhook URL showing empty when domain not set (fallback to request host)
- Fix quick deploy double-tag (image:latest:latest) by splitting tag from image URL
- Fix trim() errors on number inputs in deploy and settings forms
- Fix NPM client auto-append /api to base URL
- Sanitize NPM test error messages (no raw HTML)
- Remove healthcheck field from Quick Deploy form
- Fix env vars placeholder newline
- Make domain field optional in settings
- Set polling interval minimum to 60s
- Add Proxies and Events to sidebar navigation
- Fix SSL cert name flash on NPM settings page
- Fix empty state icon on proxies page
2026-04-05 01:27:54 +03:00
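The double-tag bug fixed in 187e302f4a (image:latest:latest) is the kind of problem that appears when a tag is appended to a reference that already carries one. A minimal sketch of splitting the tag off first is shown below. The helper names are hypothetical and the actual Tinyforge parser may differ; digest references (name@sha256:...) are deliberately out of scope.

```go
package main

import (
	"fmt"
	"strings"
)

// splitImageTag separates "repo:tag" into its parts. A colon counts as a tag
// separator only if it comes after the last slash, so a registry port such as
// "registry.example.com:5000/app" is not mistaken for a tag.
func splitImageTag(ref string) (image, tag string) {
	lastSlash := strings.LastIndex(ref, "/")
	lastColon := strings.LastIndex(ref, ":")
	if lastColon > lastSlash {
		return ref[:lastColon], ref[lastColon+1:]
	}
	return ref, ""
}

// withTag returns ref with exactly one tag: the one already embedded in the
// reference if present, otherwise the supplied fallback.
func withTag(ref, fallback string) string {
	image, tag := splitImageTag(ref)
	if tag == "" {
		tag = fallback
	}
	return image + ":" + tag
}

func main() {
	fmt.Println(withTag("nginx:latest", "latest"))
	fmt.Println(withTag("registry.example.com:5000/app", "latest"))
}
```

Letting an embedded tag win over the fallback is a design choice made for this sketch; a form with a separate tag field might prefer the explicit field instead.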
alexei.dolgolyov 308547a3d7 refactor: remove standalone proxies, add Traefik provider with Docker labels
Standalone proxy removal:
- Delete store, API handlers, proxy manager, health monitor, validator, hints
- Delete frontend pages (proxies list, create, edit) and components (ProxyCard, ProxyForm, ProxyFilter, ProxyGroup, ValidationChecklist)
- Remove proxy routes from router, nav items, dashboard references
- Clean up SystemHealthCard to remove proxy section

Traefik provider:
- Add TraefikProvider implementing proxy.Provider via Docker labels
- ContainerLabels() returns traefik.enable, router rule, entrypoints, service port, TLS cert resolver, docker network
- ConfigureRoute() returns router name (labels handle routing at container creation)
- DeleteRoute() is no-op (container removal auto-deregisters)
- Ping() checks Traefik API health (optional)
- Wire ContainerLabels into deployer (executeDeploy + blueGreenDeploy)
- Add Traefik settings: entrypoint, cert_resolver, network, api_url
- Add traefik option to proxy provider selector in settings UI
- Show conditional Traefik config fields
- Add i18n keys (EN + RU)
2026-04-04 22:54:31 +03:00
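The label set that 308547a3d7 attributes to ContainerLabels() corresponds to Traefik's documented Docker-provider labels. A rough sketch of building such a map is below. The routeConfig type, its field names, and the sample values are assumptions made for illustration, and the real method signature is not shown in the commit.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// routeConfig is a hypothetical input type for this sketch.
type routeConfig struct {
	Router       string // router/service name, e.g. derived from the instance
	Domain       string // fully qualified domain for the Host() rule
	Entrypoint   string // Traefik entrypoint name
	CertResolver string // optional TLS cert resolver
	Network      string // optional Docker network Traefik should use
	Port         int    // container port to route to
}

// containerLabels builds Traefik v2-style Docker labels for one container.
func containerLabels(c routeConfig) map[string]string {
	router := "traefik.http.routers." + c.Router
	labels := map[string]string{
		"traefik.enable":        "true",
		router + ".rule":        fmt.Sprintf("Host(`%s`)", c.Domain),
		router + ".entrypoints": c.Entrypoint,
		"traefik.http.services." + c.Router + ".loadbalancer.server.port": strconv.Itoa(c.Port),
	}
	if c.CertResolver != "" {
		labels[router+".tls.certresolver"] = c.CertResolver
	}
	if c.Network != "" {
		labels["traefik.docker.network"] = c.Network
	}
	return labels
}

func main() {
	labels := containerLabels(routeConfig{
		Router: "app", Domain: "app.example.com", Entrypoint: "websecure",
		CertResolver: "letsencrypt", Network: "tinyforge", Port: 8080,
	})
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order for display
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, labels[k])
	}
}
```

Because Traefik reads these labels straight from the container, removing the container removes the route, which fits the commit's note that DeleteRoute() is a no-op.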
alexei.dolgolyov 7d6719da12 refactor: extract ProxyProvider interface with None and NPM implementations
Replace direct npm.Client usage throughout the codebase with the
proxy.Provider interface, enabling pluggable proxy backends. The
deployer, API layer, and proxy manager now use provider-agnostic
route management (ConfigureRoute/DeleteRoute) instead of NPM-specific
API calls. Adds ProxyRouteID (string) to Instance model and
ProxyProvider setting to Settings, with SQLite migrations for
backward compatibility.
2026-04-04 19:39:08 +03:00
alexei.dolgolyov 6667abf03c fix: quick deploy duplicate detection, logout UX, backup toggle, CSP, SSE guard, and migration
- Detect existing projects with same image on quick deploy; show conflict dialog with options
- Move logout button to sidebar header as icon-only
- Replace backup checkbox with ToggleSwitch component
- Allow unsafe-inline in CSP script-src for SvelteKit hydration
- Guard SSE connection behind isAuthenticated() check
- Add notification_url ALTER TABLE migration for existing databases
- Restore RegisterPersistentLogger on event bus
2026-04-04 14:40:59 +03:00
alexei.dolgolyov 205a5a36c6 test: add core test suite for crypto, auth, and store packages
- 8 crypto tests: key derivation, encrypt/decrypt round-trip, wrong key, nonce uniqueness
- 6 auth tests: password hash/verify, JWT generate/validate, token revocation
- 14 store tests: project CRUD, user CRUD, stage/deploy lifecycle, pagination, cascade deletes
- Fix stages CREATE TABLE schema to include notification_url column
- Total: 28 tests, all passing
2026-04-04 14:13:05 +03:00
alexei.dolgolyov 91b49cb5ed feat: expanded health checks, deploy filtering, per-project notifications, error sanitization, and audit trail
- Expand health endpoint to check DB, Docker, and NPM connectivity (FUNC-M4)
- Add project_id, stage_id, offset query params to deploys endpoint (FUNC-M5, FUNC-M6)
- Add notification_url field to Stage model for per-project overrides (FUNC-M2)
- Add NPM Ping method for health checking
- Sanitize all internal error messages in API handlers (SEC-M4)
- Add audit trail events for admin actions (FUNC-M3)
- Add EventLog event type to event bus
2026-04-04 13:10:10 +03:00