feat(discovery+runtime): restore static-site wizard discovery + close /sites/[id] feature parity

Two-stage feature arc closing the gaps left by the hard legacy cutover.
The static-site creation wizard regains its auto-discovery + connection-test
flow; /apps/[id] grows the runtime/storage/lifecycle surface the legacy
/sites/[id] page used to expose.

Backend (Go)
- internal/api/discovery.go: six admin-gated endpoints. Five wrap
  staticsite.GitProvider (POST /api/discovery/git/{detect-provider,
  test-connection,repos,branches,tree}); the sixth is
  GET /api/discovery/image/conflicts.
  Identifier validation (validateGitIdent / validateGitBranch) at the
  boundary so provider URL interpolation cannot be hijacked via `..`.
  Upstream errors scrubbed: detailed slog on the server, generic 502 to
  the client (mitigates token-reflection-in-error-page).
- internal/api/workload_runtime.go: four endpoints —
  GET /api/workloads/{id}/runtime-state decodes containers.extra_json for
  static workloads; GET /api/workloads/{id}/storage execs `du -sb /app/data`
  with a 30s in-process cache (storageProbeCache) so polling can't turn
  into per-request execs; POST /api/workloads/{id}/{stop,start} iterate
  ListContainersByWorkload and call docker.StopContainer / StartContainer,
  returning 200 / 409 (nothing to act on) / 502 (all failed).
- internal/staticsite/safehttp.go: NewSafeHTTPClient + ValidateBaseURL +
  blockReason. DialContext re-resolves hostnames and refuses loopback /
  link-local / multicast / unspecified addresses. RFC1918 + ULA explicitly
  allowed (self-hosted Gitea on LAN is the dominant deployment).
  Replaced four raw &http.Client{} constructions in the provider files.
- internal/staticsite/gitlab_provider.go: url.PathEscape each segment in
  the raw-file URL builder for parity with projectPath().
- Test coverage: 26 cases in discovery_test.go (image-tag stripping,
  source-config decoding, conflict scenarios, validator boundaries,
  scheme rejection), 14 in workload_runtime_test.go (404 / 409 / nil-docker
  / probe-cache), 16 in safehttp_test.go (URL validation + block-reason
  policy matrix + live dial against loopback + AWS metadata literals).

Frontend (Svelte 5 + runes)
- web/src/lib/api.ts: typed wrappers for every endpoint, AbortSignal
  threaded through post(); ApiError exported so callers can narrow on
  e.status; new DetectedGitProvider narrow union.
- web/src/routes/apps/new/+page.svelte: static-form discovery controls
  (auto-detect provider, test connection, repo / branch / folder
  EntityPickers, Deno auto-detect); image-form conflict panel with
  debounced lookup + double-click submit guard ("Forge anyway") + Inspect
  button that pre-fills port/healthcheck; English error fallbacks routed
  through apps.new.errors.* (en + ru).
- web/src/routes/apps/[id]/+page.svelte: runtime-state panel + storage
  panel + Stop / Start / Open-site toolbar; universal live-state badge
  in the hero lede for image/compose/static (RUNNING / TRANSITIONING /
  STOPPED / NOT DEPLOYED / MIXED · n/m RUNNING); ContainerStats panel
  per row (auto-collapsing native <details> when N > 2); read-only
  webhook bindings summary card; responsive toolbar overflow with native
  <details> at <640px (z-index 100 above sticky nav).
- web/src/app.css: project-wide .forge-btn-ghost:focus-visible outline.

Hardening from go-reviewer + security-reviewer + typescript-reviewer +
frontend-design UI/UX subagents (0 CRITICAL, all HIGH/BLOCKER addressed
inline, IMPORTANT applied before commit):
- AbortController + per-call sequence tokens on every long-running
  fetch (loadRuntimeState / loadStorage / loadTriggerMeta / inspectImage /
  listImageConflicts) plus onDestroy cleanup so late resolves cannot
  mutate dead component state.
- doStop / doStart snapshot and restore `error` across the finally-block
  reload so a load()-cleared message doesn't hide a real failure.
- triggersById refreshed after inline trigger creation so the webhook
  card doesn't silently exclude the just-created trigger.
- Live-state badge wraps in role=status / aria-live=polite (no redundant
  aria-label).
- Webhook row has a single click target (was two pointing at the same URL).
- Empty webhook section hides entirely.
- Dropped role=menu / role=menuitem from the overflow menu (they would
  promise arrow-key nav we don't wire; native Tab + ESC carry it).

Doc
- docs/CODEMAPS/INDEX.md + new docs/CODEMAPS/discovery-and-runtime.md
  map the endpoint surface, security posture, frontend integration
  patterns, and an "add a new probe" recipe.

Verification
- svelte-check: 0 errors, 3 pre-existing a11y warnings.
- go build + go vet + go test ./...: all green.
- i18n parity: en + ru at 1413 keys each.
- Live smoke against :8090: 404 / 409 / 502 envelopes correct, discovery
  sanity passes, ProbeError surfaces on no-container path.
2026-05-16 21:35:51 +03:00
parent ef62a41fc0
commit ea55d31177
19 changed files with 4333 additions and 81 deletions
+2 -3
@@ -6,9 +6,8 @@ This directory contains architectural maps of key Tinyforge subsystems. Each cod
## Codemaps
| Area | File | Focus |
|------|------|-------|
| **Workload Plugin** | [`workload-plugin.md`](./workload-plugin.md) | Source × Trigger plugin contracts; registry lookups; webhook fan-out; how to add new kinds |
- **[Workload Plugin](./workload-plugin.md)** — Source × Trigger plugin contracts; registry lookups; webhook fan-out; how to add new kinds.
- **[Discovery & Runtime API](./discovery-and-runtime.md)** — `/api/discovery/*` helpers (Git provider probe, repo/branch/tree pickers, image conflicts); `/api/workloads/{id}/runtime-state` + `/storage` + `/stop` + `/start`; SSRF-safe HTTP client in `internal/staticsite`.
## Cross-References
+88
@@ -0,0 +1,88 @@
# Discovery & Runtime API — Codemap
**Last Updated:** 2026-05-16
Surfaces added during the static-site discovery restoration + workload runtime panel work. All endpoints sit inside the existing `/api` group (auth-middleware enforced); admin-gated routes are noted per-endpoint.
## Files
### Backend
- [`internal/api/discovery.go`](../../internal/api/discovery.go) — six admin-gated handlers: five wrapping `staticsite.GitProvider` plus an image-source conflict scanner.
- [`internal/api/workload_runtime.go`](../../internal/api/workload_runtime.go) — runtime-state read, storage-usage probe (with 30s in-memory cache), and stop/start mutation handlers.
- [`internal/staticsite/safehttp.go`](../../internal/staticsite/safehttp.go) — `NewSafeHTTPClient` + `ValidateBaseURL` + `blockReason` (loopback / link-local / multicast / unspecified blocked at dial time; RFC1918 / ULA explicitly allowed).
- [`internal/api/discovery_test.go`](../../internal/api/discovery_test.go) — 26 table cases (image-tag stripping, source-config decoding, conflict scenarios, validator boundaries, scheme rejection).
- [`internal/api/workload_runtime_test.go`](../../internal/api/workload_runtime_test.go) — 14 cases (404, source-kind branching, never-deployed path, malformed extra-json, nil-docker-client 503, probe cache short-circuit).
- [`internal/staticsite/safehttp_test.go`](../../internal/staticsite/safehttp_test.go) — 16 cases (URL validation matrix, block-reason policy matrix, live dial against loopback + AWS metadata literals).
### Frontend
- [`web/src/lib/api.ts`](../../web/src/lib/api.ts) — typed wrappers for every endpoint, signal-aware (`AbortSignal` threaded through `post()`); `ApiError` exported so callers can narrow on `e.status`.
- [`web/src/routes/apps/new/+page.svelte`](../../web/src/routes/apps/new/+page.svelte) — static-form discovery controls (auto-detect provider, test connection, repo / branch / folder pickers, Deno auto-detect); image-form conflict panel + Inspect button.
- [`web/src/routes/apps/[id]/+page.svelte`](../../web/src/routes/apps/[id]/+page.svelte) — runtime-state panel, storage panel, Stop / Start / Open-site toolbar; live-state badge in hero; ContainerStats panel; webhook bindings card; responsive toolbar overflow.
## Endpoint reference
### Discovery (admin-only)
| Method | Path | Returns |
| ------ | ------------------------------------------ | -------------------------------- |
| POST | `/api/discovery/git/detect-provider` | `{provider: DetectedGitProvider}`|
| POST | `/api/discovery/git/test-connection` | `{status: "ok"}` or 502 |
| POST | `/api/discovery/git/repos` | `RepoInfo[]` |
| POST | `/api/discovery/git/branches` | `string[]` |
| POST | `/api/discovery/git/tree` | `FolderEntry[]` |
| GET | `/api/discovery/image/conflicts?image=...` | `ImageConflict[]` |
All Git endpoints accept the shared `gitProviderRequest` shape: `{provider, base_url, access_token, repo_owner, repo_name, branch, query}`. The token travels in plaintext in the request body (HTTPS assumed) and is never persisted server-side. `provider` may be empty to trigger `staticsite.DetectProviderWithProbe`.
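A minimal sketch of how the shared request shape and the boundary validation could fit together. The JSON field names come from the shape above and the identifier pattern is the one quoted under Security posture; the Go field names, the `validate` method, and the empty-branch handling are assumptions for illustration, not the code in `internal/api/discovery.go`.

```go
package main

import (
	"errors"
	"regexp"
	"strings"
)

// gitProviderRequest mirrors the documented JSON shape.
// The Go field names are assumed; only the JSON keys come from the docs.
type gitProviderRequest struct {
	Provider    string `json:"provider"` // may be empty to trigger auto-detection
	BaseURL     string `json:"base_url"`
	AccessToken string `json:"access_token"` // per-call only, never persisted
	RepoOwner   string `json:"repo_owner"`
	RepoName    string `json:"repo_name"`
	Branch      string `json:"branch"`
	Query       string `json:"query"`
}

// gitIdentRe is the identifier pattern quoted in the Security posture section.
var gitIdentRe = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9._-]*$`)

// validateGitIdent accepts a single owner or repo name segment.
func validateGitIdent(s string) error {
	if !gitIdentRe.MatchString(s) {
		return errors.New("invalid identifier")
	}
	return nil
}

// validateGitBranch allows "/" but rejects "..".
// Rejecting the empty string is an assumption of this sketch.
func validateGitBranch(s string) error {
	if s == "" || strings.Contains(s, "..") {
		return errors.New("invalid branch")
	}
	return nil
}

// validate is a hypothetical helper: it checks only the fields that are set,
// since not every discovery endpoint needs every field.
func (r gitProviderRequest) validate() error {
	for _, ident := range []string{r.RepoOwner, r.RepoName} {
		if ident == "" {
			continue
		}
		if err := validateGitIdent(ident); err != nil {
			return err
		}
	}
	if r.Branch != "" {
		return validateGitBranch(r.Branch)
	}
	return nil
}
```

A handler would decode the body into `gitProviderRequest`, call `validate()`, and respond with a 4xx before any provider URL is built.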
### Workload runtime
| Method | Path | Auth | Returns |
| ------ | ------------------------------------- | ------------ | ------------------------------- |
| GET | `/api/workloads/{id}/runtime-state` | Any auth | `WorkloadRuntimeState` |
| GET | `/api/workloads/{id}/storage` | Any auth | `WorkloadStorageUsage` |
| POST | `/api/workloads/{id}/stop` | Admin | `{touched, failed}` / 409 / 502 |
| POST | `/api/workloads/{id}/start` | Admin | `{touched, failed}` / 409 / 502 |
`runtime-state` decodes `containers.extra_json` for `<workloadID>:site` (the deterministic container row the static plugin maintains). Returns `{source_kind, has_state: false}` for non-static workloads or never-deployed static workloads.
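The branching described above can be sketched as a pure function. This is an illustration under assumptions: the `"static"` source-kind literal, the `state` JSON key, and the function names are hypothetical; only `source_kind`, `has_state`, and the `<workloadID>:site` row convention come from the text.

```go
package main

import (
	"encoding/json"
	"errors"
)

// workloadRuntimeState carries the two documented fields plus an assumed
// "state" key holding the decoded extra_json payload.
type workloadRuntimeState struct {
	SourceKind string          `json:"source_kind"`
	HasState   bool            `json:"has_state"`
	State      json.RawMessage `json:"state,omitempty"` // assumed field name
}

// siteContainerID builds the deterministic row ID "<workloadID>:site".
func siteContainerID(workloadID string) string {
	return workloadID + ":site"
}

// buildRuntimeState returns has_state=false for non-static workloads and for
// static workloads with no stored row (empty extraJSON), and an error when
// the stored JSON is malformed.
func buildRuntimeState(sourceKind, extraJSON string) (workloadRuntimeState, error) {
	out := workloadRuntimeState{SourceKind: sourceKind}
	if sourceKind != "static" || extraJSON == "" {
		return out, nil
	}
	if !json.Valid([]byte(extraJSON)) {
		return out, errors.New("malformed extra_json")
	}
	out.HasState = true
	out.State = json.RawMessage(extraJSON)
	return out, nil
}
```

Keeping the decision logic free of HTTP and store types makes the never-deployed and malformed-JSON paths easy to cover in table tests.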
`storage` returns `{enabled: false}` for non-static or storage-disabled workloads. When enabled, execs `du -sb /app/data` (15s budget) via `docker.InspectSiteStorageUsage`. Results memoized for 30s in the `storageProbeCache` package-level map.
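A minimal sketch of a TTL-memoized probe, to show why polling does not translate into one exec per request. The type and method names are hypothetical and this is not the actual `storageProbeCache`; it also assumes that failed probes are not cached.

```go
package main

import (
	"sync"
	"time"
)

// cachedUsage is one memoized probe result.
type cachedUsage struct {
	bytes   int64
	fetched time.Time
}

// probeCache memoizes per-workload probe results for a fixed TTL.
type probeCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cachedUsage
}

func newProbeCache(ttl time.Duration) *probeCache {
	return &probeCache{ttl: ttl, entries: make(map[string]cachedUsage)}
}

// get returns the cached value while it is younger than the TTL; otherwise it
// runs probe and stores the result on success.
func (c *probeCache) get(workloadID string, probe func() (int64, error)) (int64, error) {
	c.mu.Lock()
	if e, ok := c.entries[workloadID]; ok && time.Since(e.fetched) < c.ttl {
		c.mu.Unlock()
		return e.bytes, nil
	}
	c.mu.Unlock()

	// Run the probe outside the lock so a slow exec for one workload does not
	// block lookups for others.
	n, err := probe()
	if err != nil {
		return 0, err // errors are not cached in this sketch
	}

	c.mu.Lock()
	c.entries[workloadID] = cachedUsage{bytes: n, fetched: time.Now()}
	c.mu.Unlock()
	return n, nil
}
```

With `newProbeCache(30 * time.Second)`, repeated polls for the same workload inside the window reuse one result. Two concurrent misses for the same workload can still both run the probe in this sketch; collapsing them would need a per-key in-flight guard.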
`stop` / `start` iterate `store.ListContainersByWorkload` and call `docker.StopContainer(ctx, id, 10)` / `StartContainer`. Returns 409 when no container row exists ("nothing to act on"), 502 when every container failed, 200 with `{touched, failed}` counts otherwise.
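The 200 / 409 / 502 mapping above can be sketched as follows. This assumes `touched` counts containers acted on successfully and `failed` counts those that returned an error; the helper names are hypothetical.

```go
package main

import "net/http"

// lifecycleResult mirrors the documented {touched, failed} response body.
type lifecycleResult struct {
	Touched int `json:"touched"`
	Failed  int `json:"failed"`
}

// applyToAll runs act (a stop or start call) against every container ID and
// tallies the outcome. One failure does not abort the loop.
func applyToAll(ids []string, act func(id string) error) lifecycleResult {
	var res lifecycleResult
	for _, id := range ids {
		if err := act(id); err != nil {
			res.Failed++
			continue
		}
		res.Touched++
	}
	return res
}

// lifecycleStatus picks the HTTP status: 409 when there were no container
// rows, 502 when every attempt failed, 200 otherwise (including partial
// success).
func lifecycleStatus(rows int, res lifecycleResult) int {
	switch {
	case rows == 0:
		return http.StatusConflict
	case res.Touched == 0 && res.Failed > 0:
		return http.StatusBadGateway
	default:
		return http.StatusOK
	}
}
```

Partial success stays a 200 so the client can read the counts and decide how to present a mixed outcome.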
## Security posture
- **SSRF defense** — every outbound HTTP call from `staticsite/{gitea,github,gitlab}_provider.go` and the discovery probe uses `NewSafeHTTPClient`. The `DialContext` re-resolves the host and refuses loopback / link-local / multicast / unspecified addresses. RFC1918 + ULA are intentionally allowed (self-hosted Gitea on LAN is the dominant deployment pattern).
- **Identifier validation** — `validateGitIdent` (regex `^[A-Za-z0-9][A-Za-z0-9._-]*$`) and `validateGitBranch` (allows `/`, rejects `..`) run at the API boundary so provider URL interpolation cannot be hijacked.
- **Error scrubbing** — upstream Git provider errors are never echoed verbatim. `upstreamError(w, op, err)` logs the detail server-side and returns a generic 502 to the client (mitigates token-reflection-in-error-page).
- **Token handling** — tokens are plaintext in request bodies (HTTPS assumed) and never persisted. Discovery endpoints accept them per-call; nothing is stored.
- **Auth model** — read endpoints (`runtime-state`, `storage`) are open to any authenticated user; mutation endpoints (`stop`, `start`) and every `/discovery/*` endpoint (POST and GET) are admin-only.
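The dial-time policy in the SSRF bullet can be sketched with the standard `net.IP` classification methods. This is an illustration of the stated policy, not the code in `safehttp.go`: `blockReasonForLiteral`, `safeDialContext`, and `sketchSafeHTTPClient` are hypothetical names, and refusing the whole dial when any resolved address is blocked is an assumption of this sketch.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// blockReason returns a non-empty reason when ip must not be dialed.
// RFC1918 and ULA addresses fall through and are allowed.
func blockReason(ip net.IP) string {
	switch {
	case ip.IsLoopback():
		return "loopback"
	case ip.IsLinkLocalUnicast(), ip.IsLinkLocalMulticast():
		return "link-local"
	case ip.IsMulticast():
		return "multicast"
	case ip.IsUnspecified():
		return "unspecified"
	}
	return ""
}

// blockReasonForLiteral is a convenience wrapper for IP literals.
func blockReasonForLiteral(s string) string {
	ip := net.ParseIP(s)
	if ip == nil {
		return "invalid"
	}
	return blockReason(ip)
}

// safeDialContext resolves the host itself, vets every address, and then
// dials a vetted IP literal so a later DNS answer cannot swap the target.
func safeDialContext(ctx context.Context, network, addr string) (net.Conn, error) {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return nil, err
	}
	addrs, err := net.DefaultResolver.LookupIPAddr(ctx, host)
	if err != nil {
		return nil, err
	}
	for _, a := range addrs {
		if reason := blockReason(a.IP); reason != "" {
			return nil, fmt.Errorf("refusing to dial %s: %s address", a.IP, reason)
		}
	}
	var d net.Dialer
	lastErr := fmt.Errorf("no addresses for %s", host)
	for _, a := range addrs {
		conn, err := d.DialContext(ctx, network, net.JoinHostPort(a.IP.String(), port))
		if err == nil {
			return conn, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

// sketchSafeHTTPClient wires the vetted dialer into an http.Client.
func sketchSafeHTTPClient(timeout time.Duration) *http.Client {
	return &http.Client{
		Timeout:   timeout,
		Transport: &http.Transport{DialContext: safeDialContext},
	}
}
```

Because the check sits in `DialContext`, it also applies to connections opened while following redirects. The cloud metadata literal `169.254.169.254` is refused as link-local, while `10.0.0.5` and `fd00::1` pass.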
## Frontend integration patterns
- All long-running requests accept an optional `AbortSignal`. Each call creates its own `AbortController`, which is aborted in `onDestroy`, and carries a sequence token (`reqSeq`) so a slow earlier response cannot overwrite a faster later one. Mirror this pattern when adding new probes; see `loadRuntimeState` / `loadStorage` / `inspectImageRef` for the canonical shape.
- The wizard's English error fallbacks live under `apps.new.errors.*` in en + ru. Parity is maintained at 1413 keys; verify with the inline `node -e ...` script in the repo root (or `npm run check`).
- `ApiError` narrowing (`e instanceof api.ApiError && e.status === N`) replaces the older regex-over-`Error.message` pattern.
## Recipes
### Add a new probe endpoint
1. Handler in `internal/api/workload_runtime.go` following the established 404-vs-409-vs-502 pattern. Log detail server-side, return generic messages.
2. Route registration in [`internal/api/router.go`](../../internal/api/router.go) under the `/workloads/{id}` group.
3. Typed wrapper in `web/src/lib/api.ts` with `signal?: AbortSignal` parameter.
4. UI consumer mirrors the `loadRuntimeState` pattern: per-call seq token + AbortController stored in module scope + cancelled in `onDestroy`.
5. Tests: table-driven with `newAPITestEnv` from [`internal/api/workloads_test.go`](../../internal/api/workloads_test.go).
### Extend Git discovery to a new provider
1. Add a new `staticsite.GitProvider` implementation (see `gitea_content.go` for the smallest reference). Use `NewSafeHTTPClient(60 * time.Second)` for outbound calls — do not introduce a raw `&http.Client{}`.
2. Register in `staticsite.NewGitProvider` switch.
3. Apply `url.PathEscape` to every interpolated `{owner}/{repo}/{branch}` segment in URL construction.
4. Update `DetectProviderWithProbe` if the new provider has a known API signature worth probing for unknown hosts.
5. Update `DetectedGitProvider` union in `web/src/lib/api.ts`.
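Step 3 of the recipe above can be sketched as a URL builder that escapes each segment separately. The `/raw/branch/` route layout and the `rawFileURL` name are hypothetical; a real provider would use its own API paths.

```go
package main

import (
	"net/url"
	"strings"
)

// rawFileURL builds a raw-file URL for a hypothetical provider layout:
//   {base}/{owner}/{repo}/raw/branch/{branch}/{path...}
// Each interpolated value is escaped as a single path segment, so characters
// such as "/", "?" and "#" cannot change the URL structure.
func rawFileURL(baseURL, owner, repo, branch, filePath string) string {
	segs := []string{
		url.PathEscape(owner),
		url.PathEscape(repo),
		"raw",
		"branch",
		url.PathEscape(branch),
	}
	// The file path is split first so its own "/" separators survive while
	// every individual segment is still escaped.
	for _, p := range strings.Split(filePath, "/") {
		if p != "" {
			segs = append(segs, url.PathEscape(p))
		}
	}
	return strings.TrimRight(baseURL, "/") + "/" + strings.Join(segs, "/")
}
```

Note that `url.PathEscape` leaves `..` unchanged because dots are unreserved characters, so escaping complements the boundary validators rather than replacing them.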
## Cross-references
- **Memory** — Project memory under `[[project_discovery_restoration]]` tracks what shipped vs deferred.
- **Workload Plugin** — [`workload-plugin.md`](./workload-plugin.md) — Source × Trigger contracts that the runtime endpoints read from.
- **Webhook Documentation** — [`docs/webhooks.md`](../webhooks.md) — Outgoing webhook events the static plugin fires (`site_sync_success`, `site_sync_failure`).