12 Commits

Author SHA1 Message Date
alexei.dolgolyov e3d140c57a feat(deployer): configurable per-workload deploy strategy (blue-green for built sources)
Add a deploy_strategy field to each source's config blob — "" (default),
"recreate", or "blue-green" — validated in each source's Validate and read on
the deploy path. No new DB column, no migration: the field rides inside the
existing SourceConfig JSON and every existing workload decodes "" to its
historical behavior (image -> blue-green, others -> recreate).

The real gap this closes: dockerfile and static stopped the old container
before creating the new one on every redeploy — a downtime window image never
had. Their blue-green branch now:
- names the new "green" container with a unique suffix so it coexists with the
  still-serving blue (plumbed into both the container name AND the proxy
  forwardHost);
- skips the collision teardown that destroyed blue early;
- gates green — an HTTP readiness probe (deps.Health.Check) when a healthcheck
  is configured, else the existing liveness window;
- swaps the route via a pure upsert (no pre-DeleteRoute) so NPM repoints in
  place with no gap;
- persists green into the single runtime-state row BEFORE reaping blue, so a
  crash mid-swap can never orphan green or leave the row pointing at a removed
  container (state.go/teardown.go/reconcile.go stay untouched).

image honors explicit "recreate" (reap existing containers after pull, before
cutover); its default blue-green path is unchanged. compose stays
stack-managed and rejects "blue-green" at Validate so the contract is honest.
static forces recreate for storage-backed deno sites — blue-green would mount
the same RW volume into both containers at once.

Shared helper internal/workload/plugin/strategy.go (ValidateStrategy +
BuildGreenName). Backend-only (phase 1); the field is usable today via the
app's advanced-JSON editor — a friendly toggle + i18n follow in phase 2.
Tests: ValidateStrategy matrix, per-source Validate (incl. the empty-key
backward-compat lock), and effectiveStrategy defaults + the deno gate. Design
+ adversarial review: docs/plans/DEPLOY_STRATEGY_PLAN.md.
2026-06-19 16:51:20 +03:00
alexei.dolgolyov c8e71a0c34 refactor(plugin): centralize workload conversion + container cleanup
Three packages (api, reconciler, webhook) each carried a private 30-line
toPluginWorkload() copy that had drifted — only the api version logged
malformed public_faces JSON; the others swallowed it. Hoist the single
implementation to plugin.WorkloadFromStore() (convert.go); store is
already a plugin dependency so no new import edge or cycle forms.

Likewise the dockerfile and static sources each had a private
removeContainerByName() that disagreed (remove-all vs stop-at-first).
Docker enforces unique container names, so the two were equivalent for
every reachable state; converge on plugin.RemoveContainerByName()
(container.go, stop-at-first) with a note on why remove-all was moot.

Callers migrated; old copies removed. Adds convert_test.go pinning the
field-by-field contract and JSON edge cases.
2026-06-19 16:21:54 +03:00
alexei.dolgolyov fa6d5bd3ba feat(secrets): scoped shared secrets — backend + API (Phase 1)
Secrets defined once and applied to many workloads by scope (global or
per-app), encrypted at rest and resolved into container env as a
low-precedence default layer: global-shared < app-shared < image cfg.Env
< workload_env. A workload with no applicable shared secrets is
byte-identical to the prior workload_env-only behavior.

- store: shared_secrets table + CRUD + ListApplicableSharedSecrets
  (enabled global + app, global-first), UNIQUE(scope,app_id,name).
- plugin.ResolveSharedSecrets + integration into BuildWorkloadEnv
  (static/dockerfile) and image buildEnv; best-effort — a shared-secret
  store/decrypt error never fails a deploy, and values are never logged.
- REST CRUD at /api/shared-secrets (reads authed, mutations AdminOnly);
  values encrypted at the boundary via crypto.Encrypt and never returned
  (only a has_value flag), mirroring workload_env. UNIQUE collisions 409.

Compose is out of scope (YAML-defined env). Frontend rule UI is Phase 2.
Reviewed: go + security APPROVE (0 CRITICAL/HIGH); two MEDIUMs fixed
(translateSQLError -> 409, no driver-message leak). Deferred defense-in-
depth: json:"-" on the model value + a description length cap.
2026-05-29 15:26:09 +03:00
alexei.dolgolyov bd7a11d4e7 refactor(source): dedup shared helpers across static + dockerfile plugins
Extract the verbatim-duplicated helpers into shared homes:
- buildEnv -> plugin.BuildWorkloadEnv (base plugin pkg; a sourceName param
  preserves each plugin's slog prefix / log-scraper text)
- idShort -> plugin.IDShort
- commitStatusReporter -> staticsite.CommitStatusReporter, re-parameterized
  on primitives (owner/repo/sha/targetURL/enabled) so staticsite needs no
  dependency on the plugin package; reporter tests ported to staticsite
  (plus a new nil-provider case)

containerNameFor/imageTagFor are intentionally left per-plugin: their
prefixes differ (dw-site- vs tf-build-) and name real Docker resources,
so merging them would risk mis-routing. Behavior-preserving; the
static/dockerfile test suites pass unchanged.

Reviewed: go APPROVE (0 CRITICAL/HIGH).
2026-05-29 14:57:30 +03:00
alexei.dolgolyov 5c17885197 perf(reconciler): batch workloads per tick, drop redundant image inspect
Load every workload once per tick into a map instead of a per-container
GetWorkloadByID (N+1) in the upsert loop plus a second ListWorkloads in
the plugin pass: one query per tick, zero GetWorkloadByID. The
ListWorkloads error path returns before the missing-sweep so a failed
load can't flip live container rows to 'missing'.

image.Reconcile is now a no-op: the generic upsert+markMissing pass
already syncs every labeled container's state from the single
ListAllForReconciler (docker ps -a) snapshot earlier in the same tick,
so the former per-container IsContainerRunning loop was N redundant
Docker calls/tick. (Its no-op body sits in image.go, which landed with
the preceding commit; the tests are here.) compose/static reconcile do
non-redundant work and are intentionally untouched.

Reviewed: go APPROVE.
2026-05-29 13:51:27 +03:00
alexei.dolgolyov 93b6911b34 feat(apps): per-app deploy/activity timeline
Every deploy across all four source kinds now writes a workload-scoped
event via a shared plugin.EmitDeployEvent helper (replacing the inline
emit duplicated in static/dockerfile, standardizing static's metadata
key site_id->workload_id, and adding emission to image+compose which
were silent). New indexed event_log.workload_id column, EventLogFilter
.WorkloadID, and GET /api/workloads/{id}/events (id pinned from path).

Frontend: a forge "Activity" panel on /apps/[id] reusing EventLogEntry,
live SSE prepend filtered by workload_id, load-more pagination, an
All/Errors severity filter, and a shared toEventLogEntry mapper. en/ru
i18n parity.

Security: compose's failure status emits a generic reason instead of raw
`docker compose up` output, which can echo app secrets and egresses to
operator webhooks (NotificationURL + event-trigger actions); full detail
stays only in the returned error. Rune-safe 256-rune status cap.

Reviewed: go + typescript APPROVE; security HIGH fixed.
2026-05-29 13:51:17 +03:00
alexei.dolgolyov 3071cda512 feat(deploy): commit-status reporting to Git providers
Report deploy status back to the Git provider as a commit status
(pending/success/failure) for git-sourced workloads (static + dockerfile).

- GitProvider.SetCommitStatus on gitea/github/gitlab over the existing
  SSRF-safe client; fixed "tinyforge" context so redeploys update one row.
  postJSON returns status-code-only errors (never echoes the upstream body,
  which a hostile provider could use to reflect the auth token into the
  best-effort log line).
- Best-effort deploy hook: pending on deploy start, success/failure on
  outcome, gated on a per-workload report_commit_status flag. Never fails or
  blocks a deploy; emits nothing on the unchanged-SHA short-circuit.
- UI ToggleSwitch (create + edit) + reportCommitStatus in sourceForms.ts
  + en/ru i18n.
- Tests: per-provider state mapping + request shape; reporter gating
  (enabled/disabled/empty-SHA/nil/error-swallow).

Reviewed via go-reviewer + security-reviewer (0 CRITICAL/HIGH; one MEDIUM
body-echo log-leak fixed).
2026-05-29 11:37:56 +03:00
alexei.dolgolyov 410a131cec feat(apps): stepped creation wizard, branch previews, and app-creation fixes
This session (frontend focus):
- Rebuild /apps/new as a 4-step wizard (Basics → Configure → Trigger → Review):
  WizardRail, SourceKindPicker card grid, AppManifest review, per-step validation,
  ConfirmDialog-based unsaved-changes guard.
- Extract lib/workload/sourceForms.ts (single source of truth for source_config)
  + {Image,Compose,Static,Dockerfile}SourceForm + StaticDiscoveryWizard; fold the
  /apps/[id] edit form onto the same components (removes the duplication). Add
  vitest + sourceForms unit tests.
- Branch preview environments UI: /chain is_preview/preview_branch + a Preview
  environments panel on /apps/[id] (per-branch URLs, ConfirmDialog teardown, armed
  state); RegistryImagePicker on the registry trigger and the image source.
- Fixes: image-inspect 404 -> admin-gated POST /api/discovery/image/inspect;
  conflict-panel blur flicker; friendly localized discovery errors; CPU/Memory
  label hints; dashboard + /apps "Total workloads" count only source_kind workloads
  (drop stale trigger_kind gate); NPM cert/access-list name cache; EntityPicker
  empty-list guard.
- Update CLAUDE.md frontend conventions + add a Build & Test section.

Also captures pre-existing in-progress platform work (not from this session):
workload notifications, Prometheus metrics export, store lockfile, health probes,
backup hardening, and related store/webhook/scheduler changes.
2026-05-29 02:09:54 +03:00
alexei.dolgolyov ef62a41fc0 test(static-plugin): cover pure helpers, build helpers, and state/env paths
Build / build (push) Successful in 11m3s
Bring the previously-untested internal/workload/plugin/source/static/
package from 0% to 23.6% coverage with three new test files:

helpers_test.go (20 cases) - idShort/containerNameFor/imageTagFor/
siteVolumeKey shape + same-name-workload collision avoidance;
sanitizeError newline collapse, empty-token no-op, 240-byte cap, and
multi-byte UTF-8 validity at the cap; containerRowID determinism;
lockFor map semantics (same lock for same workload, distinct locks
for different workloads, real serialization under contention, safe
concurrent insertion); runtimeStateKeys exactly equals the JSON-tag
key set.

build_test.go (8 cases) - copyDir copies files + subdirs and
preserves modes on Unix; verifyDownloadInsideRoot accepts clean
trees and surfaces ErrNotExist for missing roots; both functions
reject symlinks (skipped cleanly on Windows non-admin where the
SeCreateSymbolicLink privilege is absent); prepareStaticBuild
writes the Dockerfile even for an empty source.

state_integration_test.go (12 cases) - loadState/saveState round-
trip on an in-memory SQLite store, including: unknown extra_json
keys (future writers) survive a save; clearing a typed field drops
the key; malformed extra_json is recovered from rather than
panicked on; concurrent writers exercise the per-workload mutex by
accumulating into state.LastError - the test verified to fail
loudly (15+ lost markers) when the mutex is disabled. buildEnv
returns plain values, decrypts encrypted ones, skips rows that
fail to decrypt without leaking ciphertext, and returns empty on
store failure without panicking.

Review followups from go-reviewer pass applied inline: H1 rewrite
to exercise actual lost-update race (verified against disabled
mutex), H2 workload-ID scoping by t.Name() so the package-global
saveLocks map cannot bleed across tests or -count=N runs, set-
based env-assertions, JSON tag-set equality check, multi-byte
truncation case, valid-JSON-on-recovery assertion, unique-keys
in concurrent map test, double-close cleanup.
2026-05-16 18:30:37 +03:00
alexei.dolgolyov e3c7b13d58 chore(workload): close the workload-first arc — apps i18n + codemap + tests
Build / build (push) Successful in 10m36s
Closes the workload-first refactor by landing the Priority 3 polish
items and the Priority 4 test gap. Net: ~2,400 lines added,
~350 lines modified across 13 files.

Priority 3 — polish
- apps.* i18n namespace: 276 new keys across apps.list.* (27),
  apps.new.* (91, sibling of existing apps.new.triggers.*), and
  apps.detail.* (158, sibling of existing apps.detail.bindings.*).
  EN+RU at 1314 keys each, perfectly in sync. /apps, /apps/new,
  /apps/[id] now render entirely from i18n.
- New codemap docs/CODEMAPS/workload-plugin.md (238 lines):
  Source × Trigger contract, dispatch seam, webhook fan-out path,
  recipes for adding a new Source or Trigger kind. Plus
  docs/CODEMAPS/INDEX.md gateway.

Priority 4 — tests
- internal/api/workloads_test.go (new, ~30 subtests): /api/workloads
  CRUD + deploy + delete + env + volumes + chain + promote-from +
  triggers list/inline-bind + auth gating + standalone /api/triggers
  CRUD (create / dup-409 / kind filter / delete). Uses real
  POST handlers via httptest.NewServer + a fake plugin source
  registered under "testfakesource".
- internal/deployer/dispatch_test.go (new, 11 tests):
  DispatchPlugin / DispatchTeardown / DispatchReconcile happy +
  unknown-kind + propagated-error each; PluginDeps wiring; a real
  2s-bounded RWMutex deadlock probe on PluginDeps vs SetDNSProvider.
- internal/workload/plugin/source/compose/compose_test.go (new,
  ~26 subtests): composeProjectName sanitization,
  writeYAML / writeYAMLIfChanged hash short-circuit, Validate happy
  + bad inputs, Kind / SchemaSample.

Coverage delta on the workload-plugin path:
- internal/api: 1.1% → 16.0%
- internal/deployer: 0% → 54.1%
- internal/workload/plugin/source/compose: 0% → 38.5%
- Trigger plugins already at 87-95% from the trigger-split work.

Production fix surfaced by the tests
- store.CreateWorkload now self-references RefID = ID when caller
  leaves RefID empty (the typical plugin-native path). The api
  layer's broken backfill loop (called UpdateWorkload, which
  deliberately omits ref_id) is gone. Multiple sibling plugin
  workloads can now coexist under the UNIQUE(kind, ref_id) constraint.

Review fixes addressed before commit
- CRITICAL: deadlock-detect test gained a real 2s time.After (was
  selecting on context.Background().Done() which never fires).
- HIGH: happy-path test now hard-asserts RefID = ID (was a t.Logf
  that would silently pass after a production fix).
- HIGH: standalone /api/triggers CRUD coverage added (was bypassed
  by the workload-side bind flow).
- HIGH: seedWorkload bypass deleted; tests now go through the
  real POST /api/workloads handler.
- MEDIUM: withTempDir restore is a no-op (t.Setenv auto-restores);
  dead `old := os.Getenv(...)` capture removed.
- MEDIUM: list-workloads test now asserts ID membership, not just
  count.

Doc
- WORKLOAD_REFACTOR_TODO: all three Priority 1 items, Priority 3
  polish, and Priority 4 tests marked DONE. The workload-first arc
  is closed.
2026-05-16 06:42:43 +03:00
alexei.dolgolyov 234c3c711e feat(static): inline static-source plugin; drop phantom-row adapter
Build / build (push) Successful in 10m43s
Lift the static-site deploy pipeline from internal/staticsite/manager.go
into internal/workload/plugin/source/static/ so plugin-native static
workloads operate directly on plugin.Workload + the containers table +
workload_env. The cmd/server/static_backend.go phantom-row adapter is
gone; the legacy static_sites table is no longer touched on plugin
deploys.

Backend
- new state.go: runtimeState (last_commit_sha, last_sync_at,
  last_error, status) persisted in containers.extra_json under the
  deterministic row id <workloadID>:site
- per-workload sync.Mutex serializes saveState read-modify-write so
  parallel deploys for the same workload can't race container_id /
  proxy_route_id writes
- extra_json round-trips through map[string]json.RawMessage so
  unknown keys survive — typed runtimeStateKeys are stripped before
  merge so clearing a typed field actually drops the key
- new env.go reads workload_env (replaces static_site_secrets for
  plugin-native sites); decrypt-failure logs and skips one entry
  rather than failing the whole deploy
- new build.go ports prepareDenoBuild + prepareStaticBuild + copyDir;
  copyDir uses filepath.WalkDir + Lstat to refuse symlinks and
  non-regular files
- new deploy.go is the ~300-line core; intent.Reason gates force vs
  skip-if-no-changes; success-path saveState failure rolls back
  container + proxy route and writes "failed" state (no orphans)
- new teardown.go combines Remove + Stop; idempotent on
  never-deployed workloads
- new reconcile.go refreshes container state from Docker; flips
  runtimeState.Status to failed when the container is missing/crashed

Hardening (from go-reviewer + security-reviewer subagent passes;
1 CRITICAL + 5 HIGH + 3 MEDIUM addressed before merge)
- path-traversal defense in all 3 providers (gitea_content,
  github_provider, gitlab_provider): reject tree entries whose
  resolved local path escapes destDir
- verifyDownloadInsideRoot walks the build dir post-download as a
  second line of defense
- sanitizeError redacts the access token, collapses to one line, and
  clamps to 240 bytes before persisting to extra_json or fanning out
  to the notification webhook
- container/image/volume names suffixed with workload-id short prefix
  (workload name is not UNIQUE in schema)
- primaryDomain reads settings.Domain to complete a bare subdomain
  face into a full FQDN (matches legacy Manager behavior)
- ctx-aware health-check sleep
- json.Marshal for event metadata (was fmt.Sprintf JSON template)
- strings.HasPrefix for failed-status detection (was brittle slice
  expression)

Wire-up
- cmd/server/main.go: removed wireStaticBackend(...) call; existing
  blank import on _ ".../source/static" drives init() registration
- cmd/server/static_backend.go deleted

Doc
- WORKLOAD_REFACTOR_TODO: static port marked DONE; next focus is
  the hard legacy cutover (drop /api/projects, /api/stacks,
  /api/sites, /api/stages + their tables, internal/stack +
  internal/staticsite packages, frontend /projects /stacks /sites)

Behavior notes for operators
- plugin-native static workloads no longer write to static_sites;
  legacy /api/sites/* still serves original rows unchanged
- legacy tinyforge.static-site / .static-site-name container labels
  dropped on plugin deploys; canonical tinyforge.workload.id / .kind
  cover ownership
- container/image/volume names gained an 8-char ID suffix
  (e.g. dw-site-mysite-a1b2c3d4); legacy-deployed sites keep the
  old shape until redeployed through the plugin path
2026-05-16 02:56:23 +03:00
alexei.dolgolyov 8d6a527a2b refactor(workload): plugin architecture wave + apps UI + volume scopes
Completes the workload-first refactor's plugin layer:

- internal/workload/plugin/ — Source/Trigger plugin contract,
  registry, types (Workload, DeploymentIntent, InboundEvent,
  PublicFace). Self-registering init() pattern + blank-import
  in cmd/server/main.go.
- Source plugins: image (blue-green with multi-face proxy routing),
  compose, static. Trigger plugins: registry, git, manual.
- internal/deployer/dispatch.go — DispatchPlugin/Teardown/Reconcile
  seam routing the legacy deployer through plugins.
- internal/api/workload_*.go — REST surface: workloads, env,
  volumes, chain (parent/children), promote-from. hooks.go
  serves /api/hooks/kinds/{kind}/schema for the wizard.
- internal/store: workload_env (encrypt-at-rest secrets) and
  workload_volumes tables, keyed on workload_id.
- cmd/server/static_backend.go — phantom-row adapter delegating
  the static source plugin to the legacy staticsite.Manager
  (deleted at hard cutover once the static inline port lands).
- web/src/routes/apps/ — /apps list + /apps/new wizard +
  /apps/[id] detail with kind-aware compose / image / static
  forms (Advanced JSON toggle), env panel, volumes panel,
  webhook panel, chain panel, manual deploy.

Volume scope generalization (v2 resolver):

- internal/volume.ResolveWorkloadPath (workload-keyed, sits
  next to legacy ResolvePath). Honors all VolumeScope values:
  absolute, ephemeral, instance, stage, project, project_named,
  named. internal/workload/plugin/source/image/image.go
  computeMounts wires settings + imageTag through. Coverage in
  internal/volume/resolver_test.go (portable Linux/Windows via
  t.TempDir).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 22:17:41 +03:00