tiny-forge/internal/api/health.go
alexei.dolgolyov 739b67856a
feat(cutover): hard legacy cutover — drop projects/stacks/sites/deploys
The clean-break delete that closes the workload-first refactor arc.
Net diff: ~30 backend files deleted, ~20 modified, ~12k LOC removed
on the Go side; entire /projects /stacks /sites /deploy frontend
trees gone; ~6.7k LOC removed on the Svelte/TypeScript side.

Backend
- API handlers gone: internal/api/{projects,stages,stage_env,stacks,
  static_sites,deploys,instances,volume_browser}.go
- Store CRUD + tests gone: internal/store/{projects,stages,stage_env,
  stacks,static_sites,static_site_secrets,deploys,poll_state,volumes,
  workload_sync}.go (+ _test.go siblings)
- Legacy deployer pipeline gone: internal/deployer/{bluegreen,promote,
  rollback,subdomain,resolver_test}.go; deployer.go trimmed to just the
  dispatch surface used by the plugin pipeline
- internal/staticsite/{manager,healthcheck}.go and
  internal/stack/manager.go gone (the rest of those packages stay as
  helpers imported by the static + compose plugins)
- internal/registry/poller.go gone (legacy registry poller)
- internal/volume.ResolvePath gone; ResolveWorkloadPath stays
- internal/webhook: handleWebhook (project) + handleSiteWebhook (site)
  gone; only POST /api/webhook/triggers/{secret} remains
- workload-side webhook URL handlers (getWorkloadWebhook +
  regenerateWorkloadWebhook + EnsureWorkloadWebhookSecret +
  SetWorkloadWebhookSecret + GetWorkloadByWebhookSecret) gone — they
  minted URLs that would 404 against the new trigger-only ingress
- cmd/server/main.go: dropped staticsite.Manager, stack.Manager,
  staticsite.HealthChecker, registry poller, SetSiteSyncTriggerer,
  SetStaticSiteManager, SetStackManager, wireStaticBackend
- store/store.go: idempotent DROP TABLE IF EXISTS for every legacy
  table (projects, stages, stage_env, volumes, deploys, deploy_logs,
  poll_states, stacks, stack_revisions, stack_deploys, static_sites,
  static_site_secrets); FK order children-then-parents
- store/models.go: dropped Project, Stage, Deploy, DeployLog, StageEnv,
  Volume, StaticSite, StaticSiteSecret, Stack, StackRevision,
  StackDeploy types; kept WorkloadKind constants as documented strings
- internal/store/helpers.go (new): BoolToInt, rowScanner,
  GenerateWebhookSecret extracted from deleted CRUD files
- internal/api/secrets.go (new): forwards to store.GenerateWebhookSecret
  so api + store paths share one secret-generation impl (no
  panic-vs-UUID-fallback divergence)
- internal/reconciler/reconciler.go: dropped legacy stack-by-compose
  + static-site label paths; only canonical tinyforge.workload.id
  dispatch remains
- providers (gitea_content/github_provider/gitlab_provider) gained
  path-traversal rejection on every tree entry
- internal/webhook ParsedImage / ParseImageRef demoted to package-
  private (no external callers)
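The path-traversal rejection added to the providers can be pictured roughly as follows. This is a minimal sketch, not the code in gitea_content/github_provider/gitlab_provider: the helper name `safeTreePath` and the exact rules (reject empty and absolute paths, backslashes, and any `..` segment) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// safeTreePath is a hypothetical helper (not from the repository). It reports
// whether a tree entry path returned by a Git provider is safe to join under
// a local root directory.
func safeTreePath(p string) bool {
	// Empty, absolute, and backslash-containing paths are rejected up front.
	if p == "" || strings.HasPrefix(p, "/") || strings.Contains(p, "\\") {
		return false
	}
	// Reject any ".." segment outright instead of relying on Clean alone.
	for _, seg := range strings.Split(p, "/") {
		if seg == ".." {
			return false
		}
	}
	// After cleaning, the path must still name something below the root.
	c := path.Clean(p)
	return c != "." && c != ".." && !strings.HasPrefix(c, "../")
}

func main() {
	for _, p := range []string{"docs/index.html", "../etc/passwd", "a/../../b", "/abs"} {
		fmt.Printf("%q safe=%v\n", p, safeTreePath(p))
	}
}
```

Only the first path in `main` is accepted. Checking every entry before any file is written keeps a hostile repository tree from escaping the checkout directory.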

Frontend
- /projects /stacks /sites /deploy routes deleted (entire trees)
- ProjectCard / InstanceCard / StaleContainerCard components deleted
- api.ts: dropped every project/stage/stack/site/deploy/instance
  helper + types (Project, Stage, Stack, StaticSite, Deploy,
  Instance, Volume, etc.); kept Workload, Container, App, Settings,
  Registry, EventTrigger, LogScanRule, webhook envelopes
- WorkloadWebhook type + getWorkloadWebhook/regenerateWorkloadWebhook
  api functions gone (mirror of the backend deletion above)
- web/src/routes/+layout.svelte: dropped /projects /sites /stacks
  /deploy nav entries, trimmed quick-nav keymap
- web/src/routes/+page.svelte: dashboard rewrite — reads
  listWorkloads + listContainers only; 4-card stat grid
  (workloads/running/failed/stale) + recent workloads strip
- navCounts.ts, SystemHealthCard.svelte, ContainerLogs.svelte,
  ContainerStats.svelte, StatusBadge.svelte, TagCombobox.svelte,
  proxies/+page.svelte, containers/+page.svelte all rewired to the
  workload-first surface
- AbortController plumbing on dashboard, nav-counts, stale page,
  SystemHealthCard so navigation doesn't leave dangling fetches
- i18n: dropped projects.*, projectDetail.*, envEditor.*,
  volumeEditor.*, volumeBrowser.*, quickDeploy.*, sites.*, stacks.*,
  instance.*, confirm.* namespaces; en/ru parity preserved (1042
  keys each)

Hardening from go-reviewer + security-reviewer + typescript-reviewer
subagent passes (0 CRITICAL across all three; 1 HIGH + ~12 MEDIUM
addressed inline before commit):

- Sec H1: dead-end workload webhook URL handlers (would mint URLs
  that 404 against the new trigger-only ingress) deleted across
  backend + frontend
- Go M1: IsTerminalDeployStatus dropped (no production callers)
- Go M2: ParsedImage/ParseImageRef lowercased (in-package only)
- Go M6: generateWebhookSecret unified — api shim forwards to
  store.GenerateWebhookSecret
- Doc/comment freshness: stage_id (no longer FK), ProxyRoute legacy
  field names, workloadIDRow rationale, webhook_deliveries.target_type
  enum, WebhookDeliveryLog component header
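The Go M6 item (one secret generator, with the api side forwarding to the store side) might look like the sketch below. Both function names are stand-ins for store.GenerateWebhookSecret and the api shim. The 32-byte length, hex encoding, and error return are assumptions, since the commit does not show the real implementation.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// generateWebhookSecret stands in for store.GenerateWebhookSecret.
// Assumption: 32 random bytes, hex-encoded; the real format is not shown
// in the commit.
func generateWebhookSecret() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("generate webhook secret: %w", err)
	}
	return hex.EncodeToString(buf), nil
}

// apiGenerateWebhookSecret stands in for the api-side shim. It only
// forwards, so both call paths share one failure behaviour.
func apiGenerateWebhookSecret() (string, error) {
	return generateWebhookSecret()
}

func main() {
	secret, err := apiGenerateWebhookSecret()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(secret)) // 64 hex characters for 32 random bytes
}
```

Keeping the shim as a pure forwarder means there is a single place to change if the secret format or the failure policy changes.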

Doc
- WORKLOAD_REFACTOR_TODO: cutover marked DONE; all three Priority 1
  items are now shipped. Next focus is Priority 3 polish (apps.* i18n
  + codemap entries) and Priority 4 tests.

Behavioral notes for operators upgrading from a pre-cutover build
- Existing rows in the dropped tables disappear on first boot.
- Legacy webhook URLs at /api/webhook/{secret} and
  /api/webhook/sites/{secret} return 404; CI configs must repoint to
  /api/webhook/triggers/{secret} (the trigger-split boot backfill
  lifted any embedded workload secret onto a Trigger row, so the
  secret value itself carries over).
- Frontend routes /projects /stacks /sites /deploy are gone; nav
  links replaced with /apps and /triggers.
2026-05-16 06:00:21 +03:00
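
For operators repointing CI configs, the URL change in the behavioral notes above comes down to a path rewrite that keeps the secret. The helper below is a hypothetical sketch (the name `repointWebhookPath` is not part of Tinyforge). It assumes secrets never contain a slash.

```go
package main

import (
	"fmt"
	"strings"
)

// repointWebhookPath is a hypothetical migration helper. It maps a legacy
// webhook path onto the trigger-only ingress and keeps the secret unchanged.
// The second return value is false when the path is not a legacy webhook
// path (for example, when it already points at /api/webhook/triggers/).
func repointWebhookPath(p string) (string, bool) {
	const base = "/api/webhook/"
	if !strings.HasPrefix(p, base) {
		return "", false
	}
	rest := strings.TrimPrefix(p, base)
	if strings.HasPrefix(rest, "triggers/") {
		return "", false // already on the new ingress
	}
	// Legacy site webhooks carried an extra "sites/" segment.
	secret := strings.TrimPrefix(rest, "sites/")
	if secret == "" || strings.Contains(secret, "/") {
		return "", false
	}
	return base + "triggers/" + secret, true
}

func main() {
	for _, p := range []string{"/api/webhook/abc123", "/api/webhook/sites/abc123"} {
		if np, ok := repointWebhookPath(p); ok {
			fmt.Println(p, "->", np)
		}
	}
}
```

Both legacy forms in `main` map to /api/webhook/triggers/abc123, which matches the note that the secret value itself carries over.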


package api

import (
	"context"
	"net/http"
	"time"

	"github.com/alexei/tinyforge/internal/auth"
	"github.com/alexei/tinyforge/internal/proxy"
)

// healthProbeTimeout caps a single health probe so a stuck dependency does
// not hold the polling endpoint open. The UI polls every 30 s, so 8 s leaves
// headroom for the ping + Info + NPM list calls.
const healthProbeTimeout = 8 * time.Second

// nonAdminDockerFields enumerates the fields any authenticated user is
// allowed to see: connectivity, version, container and image counts, and
// coarse capacity (ncpu, memory_total). Host-detail fields (kernel, root_dir,
// hostname, OS, arch, storage driver) are admin-only to avoid recon
// information leaks.
var nonAdminDockerFields = map[string]bool{
	"connected":    true,
	"latency_ms":   true,
	"error":        true,
	"version":      true,
	"api_version":  true,
	"containers":   true,
	"running":      true,
	"paused":       true,
	"stopped":      true,
	"images":       true,
	"ncpu":         true,
	"memory_total": true,
}

// nonAdminProxyFields are the proxy fields safe to share with non-admins.
// Configured URLs and aggregate counts of internal lists/certs are stripped.
var nonAdminProxyFields = map[string]bool{
	"provider":            true,
	"connected":           true,
	"latency_ms":          true,
	"error":               true,
	"proxy_hosts_managed": true,
}

// getHealth handles GET /api/health.
//
// Returns the connectivity state of the database, plus connectivity and
// (when connected) diagnostics for the Docker daemon and the active proxy
// provider. Detailed host information (kernel, root_dir, internal NPM URL, …)
// is stripped for non-admin users to avoid leaking infrastructure details to
// read-only viewers.
func (s *Server) getHealth(w http.ResponseWriter, r *http.Request) {
	ctx, cancel := context.WithTimeout(r.Context(), healthProbeTimeout)
	defer cancel()

	claims, _ := auth.ClaimsFromContext(r.Context())
	isAdmin := claims.Role == "admin"

	now := time.Now().UTC().Format(time.RFC3339)
	result := map[string]any{
		"checked_at": now,
	}

	// ── Database ─────────────────────────────────────────────────────
	if err := s.store.DB().PingContext(ctx); err != nil {
		result["database"] = map[string]any{"connected": false, "error": "database unreachable"}
	} else {
		result["database"] = map[string]any{"connected": true}
	}

	// ── Docker daemon ────────────────────────────────────────────────
	docker := s.dockerHealth(ctx)
	if !isAdmin {
		docker = filterFields(docker, nonAdminDockerFields)
	}
	result["docker"] = docker

	// ── Proxy provider ───────────────────────────────────────────────
	if s.proxyProvider != nil {
		proxyInfo := s.proxyHealth(ctx)
		if !isAdmin {
			proxyInfo = filterFields(proxyInfo, nonAdminProxyFields)
		}
		result["proxy"] = proxyInfo
	}

	respondJSON(w, http.StatusOK, result)
}

// filterFields returns a copy of m containing only the keys present in allow.
func filterFields(m map[string]any, allow map[string]bool) map[string]any {
	out := make(map[string]any, len(allow))
	for k, v := range m {
		if allow[k] {
			out[k] = v
		}
	}
	return out
}

// dockerHealth probes the Docker daemon and, if reachable, attaches a full
// DaemonInfo snapshot. The caller does not need to error-check the Info()
// call: if it fails, the connected flag remains true (ping succeeded) but
// the detail fields are simply omitted.
func (s *Server) dockerHealth(ctx context.Context) map[string]any {
	if s.docker == nil {
		return map[string]any{
			"connected": false,
			"error":     "docker client not initialized",
		}
	}

	start := time.Now()
	if err := s.docker.Ping(ctx); err != nil {
		return map[string]any{
			"connected":  false,
			"error":      err.Error(),
			"latency_ms": time.Since(start).Milliseconds(),
		}
	}

	out := map[string]any{
		"connected":  true,
		"latency_ms": time.Since(start).Milliseconds(),
	}

	// Info enriches the payload; failures are non-fatal.
	info, err := s.docker.Info(ctx)
	if err == nil {
		if info.Version != "" {
			out["version"] = info.Version
		}
		if info.APIVersion != "" {
			out["api_version"] = info.APIVersion
		}
		if info.OS != "" {
			out["os"] = info.OS
		}
		if info.Arch != "" {
			out["arch"] = info.Arch
		}
		if info.Kernel != "" {
			out["kernel"] = info.Kernel
		}
		if info.OperatingSystem != "" {
			out["operating_system"] = info.OperatingSystem
		}
		if info.StorageDriver != "" {
			out["storage_driver"] = info.StorageDriver
		}
		if info.RootDir != "" {
			out["root_dir"] = info.RootDir
		}
		if info.Name != "" {
			out["name"] = info.Name
		}
		if info.NCPU > 0 {
			out["ncpu"] = info.NCPU
		}
		if info.MemoryTotal > 0 {
			out["memory_total"] = info.MemoryTotal
		}
		out["containers"] = info.Containers
		out["running"] = info.Running
		out["paused"] = info.Paused
		out["stopped"] = info.Stopped
		out["images"] = info.Images
	}
	return out
}

// proxyHealth probes the configured proxy provider. For NPM, attaches
// aggregate counts (proxy hosts, access lists, certificates) which the
// dashboard surfaces alongside the connection indicator.
func (s *Server) proxyHealth(ctx context.Context) map[string]any {
	providerName := s.proxyProvider.Name()

	start := time.Now()
	err := s.proxyProvider.Ping(ctx)
	latency := time.Since(start).Milliseconds()
	if err != nil {
		return map[string]any{
			"provider":   providerName,
			"connected":  false,
			"error":      providerName + " unreachable: " + err.Error(),
			"latency_ms": latency,
		}
	}

	out := map[string]any{
		"provider":   providerName,
		"connected":  true,
		"latency_ms": latency,
	}

	// Attach configured URL from settings for both NPM and Traefik.
	if settings, serr := s.store.GetSettings(); serr == nil {
		switch providerName {
		case "npm":
			if settings.NpmURL != "" {
				out["url"] = settings.NpmURL
			}
		case "traefik":
			if settings.TraefikAPIURL != "" {
				out["url"] = settings.TraefikAPIURL
			}
		}
	}

	// NPM-specific aggregates: a quick glance at route/list/cert counts.
	// These calls require an authenticated NPM session, so we trigger the
	// provider's auth step first (it's cheap: cached JWT is reused for 1h).
	if providerName == "npm" && s.npm != nil {
		if np, ok := s.proxyProvider.(*proxy.NpmProvider); ok {
			if err := np.Authenticate(ctx); err == nil {
				if hosts, herr := s.npm.ListProxyHosts(ctx); herr == nil {
					out["proxy_hosts"] = len(hosts)
				}
				if lists, lerr := s.npm.ListAccessLists(ctx); lerr == nil {
					out["access_lists"] = len(lists)
				}
				if certs, cerr := s.npm.ListCertificates(ctx); cerr == nil {
					out["certificates"] = len(certs)
				}
			}
		}
	}

	// Managed-route count: how many of the proxy's routes were deployed
	// by Tinyforge itself, counting both Docker instances and static sites.
	// This works for every provider (NPM, Traefik, …) because it reads from
	// our own store, not the external proxy API.
	if managed, merr := s.managedRouteCount(); merr == nil {
		out["proxy_hosts_managed"] = managed
	}

	return out
}

// managedRouteCount returns the number of proxy routes Tinyforge manages,
// reading from the unified containers index. The domain argument doesn't
// affect the count so we pass an empty string to skip FQDN rendering.
func (s *Server) managedRouteCount() (int, error) {
	routes, err := s.store.ListProxyRoutes("")
	if err != nil {
		return 0, err
	}
	return len(routes), nil
}
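
The allowlist filtering that getHealth applies for non-admin users can be tried in isolation. The sketch below copies filterFields from the file above into a standalone program. The sample payload values (version string, kernel, root_dir) are made up for illustration, and the two-key allowlist is a cut-down stand-in for nonAdminDockerFields.

```go
package main

import (
	"fmt"
	"sort"
)

// filterFields mirrors the helper in health.go: it returns a copy of m
// containing only the keys present in allow.
func filterFields(m map[string]any, allow map[string]bool) map[string]any {
	out := make(map[string]any, len(allow))
	for k, v := range m {
		if allow[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Illustrative payload; these values are not real probe output.
	full := map[string]any{
		"connected": true,
		"version":   "27.0.0",
		"kernel":    "6.1.0",
		"root_dir":  "/var/lib/docker",
	}
	// Cut-down stand-in for nonAdminDockerFields.
	allow := map[string]bool{"connected": true, "version": true}

	filtered := filterFields(full, allow)

	// Sort the keys so the printed order is deterministic.
	keys := make([]string, 0, len(filtered))
	for k := range filtered {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // prints [connected version]
}
```

Because the filter is an allowlist rather than a denylist, any field later added to dockerHealth stays admin-only until it is explicitly listed.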