739b67856a
Build / build (push) Successful in 10m39s
The clean-break delete that closes the workload-first refactor arc.
Net diff: ~30 backend files deleted, ~20 modified, ~12k LOC removed
on the Go side; entire /projects /stacks /sites /deploy frontend
trees gone; ~6.7k LOC removed on the Svelte/TypeScript side.
Backend
- API handlers gone: internal/api/{projects,stages,stage_env,stacks,
static_sites,deploys,instances,volume_browser}.go
- Store CRUD + tests gone: internal/store/{projects,stages,stage_env,
stacks,static_sites,static_site_secrets,deploys,poll_state,volumes,
workload_sync}.go (+ _test.go siblings)
- Legacy deployer pipeline gone: internal/deployer/{bluegreen,promote,
rollback,subdomain,resolver_test}.go; deployer.go trimmed to just the
dispatch surface used by the plugin pipeline
- internal/staticsite/{manager,healthcheck}.go and
internal/stack/manager.go gone (the rest of those packages stay as
helpers imported by the static + compose plugins)
- internal/registry/poller.go gone (legacy registry poller)
- internal/volume.ResolvePath gone; ResolveWorkloadPath stays
- internal/webhook: handleWebhook (project) + handleSiteWebhook (site)
gone; only POST /api/webhook/triggers/{secret} remains
- workload-side webhook URL handlers (getWorkloadWebhook +
regenerateWorkloadWebhook + EnsureWorkloadWebhookSecret +
SetWorkloadWebhookSecret + GetWorkloadByWebhookSecret) gone — they
minted URLs that would 404 against the new trigger-only ingress
- cmd/server/main.go: dropped staticsite.Manager, stack.Manager,
staticsite.HealthChecker, registry poller, SetSiteSyncTriggerer,
SetStaticSiteManager, SetStackManager, wireStaticBackend
- store/store.go: idempotent DROP TABLE IF EXISTS for every legacy
table (projects, stages, stage_env, volumes, deploys, deploy_logs,
poll_states, stacks, stack_revisions, stack_deploys, static_sites,
static_site_secrets); FK order children-then-parents
- store/models.go: dropped Project, Stage, Deploy, DeployLog, StageEnv,
Volume, StaticSite, StaticSiteSecret, Stack, StackRevision,
StackDeploy types; kept WorkloadKind constants as documented strings
- internal/store/helpers.go (new): BoolToInt, rowScanner,
GenerateWebhookSecret extracted from deleted CRUD files
- internal/api/secrets.go (new): forwards to store.GenerateWebhookSecret
so api + store paths share one secret-generation impl (no
panic-vs-UUID-fallback divergence)
- internal/reconciler/reconciler.go: dropped legacy stack-by-compose
+ static-site label paths; only canonical tinyforge.workload.id
dispatch remains
- providers (gitea_content/github_provider/gitlab_provider) gained
path-traversal rejection on every tree entry
- internal/webhook ParsedImage / ParseImageRef demoted to package-
private (no external callers)
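
The path-traversal rejection added to the providers could look roughly like the sketch below. The name safeTreePath, the sentinel error, and the exact rule set are assumptions for illustration; the commit message does not show the providers' actual check.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// errUnsafePath is a hypothetical sentinel; the real providers may report
// rejected entries differently.
var errUnsafePath = errors.New("unsafe tree entry path")

// safeTreePath validates one repository tree entry path (forward-slash
// separated) and returns its cleaned form. It rejects empty paths, absolute
// paths, backslashes, and any ".." segment.
func safeTreePath(entry string) (string, error) {
	if entry == "" || strings.HasPrefix(entry, "/") || strings.Contains(entry, "\\") {
		return "", errUnsafePath
	}
	for _, seg := range strings.Split(entry, "/") {
		if seg == ".." {
			return "", errUnsafePath
		}
	}
	cleaned := path.Clean(entry)
	if cleaned == "." {
		// The entry named only the root directory, not a file under it.
		return "", errUnsafePath
	}
	return cleaned, nil
}

func main() {
	for _, p := range []string{"public/index.html", "../etc/passwd", "a/../../b", "/abs"} {
		_, err := safeTreePath(p)
		fmt.Printf("%q rejected=%v\n", p, err != nil)
	}
}
```

Rejecting any ".." segment outright, rather than cleaning the path and checking the result, is the stricter choice here: it also refuses entries such as a/../b that would resolve inside the tree.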
Frontend
- /projects /stacks /sites /deploy routes deleted (entire trees)
- ProjectCard / InstanceCard / StaleContainerCard components deleted
- api.ts: dropped every project/stage/stack/site/deploy/instance
helper + types (Project, Stage, Stack, StaticSite, Deploy,
Instance, Volume, etc.); kept Workload, Container, App, Settings,
Registry, EventTrigger, LogScanRule, webhook envelopes
- WorkloadWebhook type + getWorkloadWebhook/regenerateWorkloadWebhook
api functions gone (mirror of the backend deletion above)
- web/src/routes/+layout.svelte: dropped /projects /sites /stacks
/deploy nav entries, trimmed quick-nav keymap
- web/src/routes/+page.svelte: dashboard rewrite — reads
listWorkloads + listContainers only; 4-card stat grid
(workloads/running/failed/stale) + recent workloads strip
- navCounts.ts, SystemHealthCard.svelte, ContainerLogs.svelte,
ContainerStats.svelte, StatusBadge.svelte, TagCombobox.svelte,
proxies/+page.svelte, containers/+page.svelte all rewired to the
workload-first surface
- AbortController plumbing on dashboard, nav-counts, stale page,
SystemHealthCard so navigation doesn't leave dangling fetches
- i18n: dropped projects.*, projectDetail.*, envEditor.*,
volumeEditor.*, volumeBrowser.*, quickDeploy.*, sites.*, stacks.*,
instance.*, confirm.* namespaces; en/ru parity preserved (1042
keys each)
Hardening from go-reviewer + security-reviewer + typescript-reviewer
subagent passes (0 CRITICAL across all three; 1 HIGH + ~12 MEDIUM
addressed inline before commit):
- Sec H1: dead-end workload webhook URL handlers (would mint URLs
that 404 against the new trigger-only ingress) deleted across
backend + frontend
- Go M1: IsTerminalDeployStatus dropped (no production callers)
- Go M2: ParsedImage/ParseImageRef lowercased (in-package only)
- Go M6: generateWebhookSecret unified — api shim forwards to
store.GenerateWebhookSecret
- Doc/comment freshness: stage_id (no longer FK), ProxyRoute legacy
field names, workloadIDRow rationale, webhook_deliveries.target_type
enum, WebhookDeliveryLog component header
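
The Go M6 item (one secret generator with the api side forwarding to it) can be sketched as follows. This is a single-file illustration, not the project's code: the 32-byte length, hex encoding, and the (string, error) signature are all assumptions, and in the real tree the two functions live in separate packages.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// GenerateWebhookSecret stands in for store.GenerateWebhookSecret.
// Assumption: 32 random bytes, hex-encoded, with the error returned to the
// caller instead of panicking or falling back to another scheme.
func GenerateWebhookSecret() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("generate webhook secret: %w", err)
	}
	return hex.EncodeToString(buf), nil
}

// generateWebhookSecret stands in for the api-side shim. It only forwards,
// so both call paths share one failure mode and one output format.
func generateWebhookSecret() (string, error) {
	return GenerateWebhookSecret()
}

func main() {
	secret, err := generateWebhookSecret()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(secret), "hex characters")
}
```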
Doc
- WORKLOAD_REFACTOR_TODO: cutover marked DONE; all three Priority 1
items are now shipped. Next focus is Priority 3 polish (apps.* i18n
+ codemap entries) and Priority 4 tests.
Behavioral notes for operators upgrading from a pre-cutover build
- Existing rows in the dropped tables are deleted on first boot, when
the DROP TABLE IF EXISTS migration runs.
- Legacy webhook URLs at /api/webhook/{secret} and
/api/webhook/sites/{secret} return 404; CI configs must repoint to
/api/webhook/triggers/{secret} (the trigger-split boot backfill
lifted any embedded workload secret onto a Trigger row, so the
secret value itself carries over).
- Frontend routes /projects /stacks /sites /deploy are gone; nav
links replaced with /apps and /triggers.
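
For operators who want to see what the first-boot cleanup amounts to, here is a minimal sketch of an idempotent, children-then-parents drop. Only the table names come from the commit message; the ordering below reflects assumed foreign-key relationships, and dropStatements is a hypothetical helper.

```go
package main

import "fmt"

// legacyTables lists the twelve legacy tables, ordered so that assumed child
// tables come before the tables they reference. The FK relationships are an
// assumption; the real order lives in store/store.go.
var legacyTables = []string{
	"deploy_logs", "deploys", "stage_env", "volumes", "poll_states", "stages", "projects",
	"stack_deploys", "stack_revisions", "stacks",
	"static_site_secrets", "static_sites",
}

// dropStatements renders one DROP TABLE IF EXISTS statement per table,
// preserving the given order. IF EXISTS makes re-running the list harmless.
func dropStatements(tables []string) []string {
	stmts := make([]string, 0, len(tables))
	for _, t := range tables {
		stmts = append(stmts, "DROP TABLE IF EXISTS "+t)
	}
	return stmts
}

func main() {
	for _, stmt := range dropStatements(legacyTables) {
		fmt.Println(stmt)
	}
}
```

With database/sql, each statement would be passed to db.ExecContext in order. Because the statements are idempotent, a second boot finds nothing to drop and succeeds.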
252 lines
7.4 KiB
Go
package api

import (
	"context"
	"net/http"
	"time"

	"github.com/alexei/tinyforge/internal/auth"
	"github.com/alexei/tinyforge/internal/proxy"
)

// healthProbeTimeout caps a single health probe so a stuck dependency does
// not hold the polling endpoint open. The UI polls every 30 s, so 8 s leaves
// headroom for the ping + Info + NPM list calls.
const healthProbeTimeout = 8 * time.Second

// nonAdminDockerFields enumerates the fields any authenticated user is
// allowed to see — version + connectivity + container counts. Host-detail
// fields (kernel, root_dir, hostname, OS, storage driver) are admin-only to
// avoid recon information leaks.
var nonAdminDockerFields = map[string]bool{
	"connected":    true,
	"latency_ms":   true,
	"error":        true,
	"version":      true,
	"api_version":  true,
	"containers":   true,
	"running":      true,
	"paused":       true,
	"stopped":      true,
	"images":       true,
	"ncpu":         true,
	"memory_total": true,
}

// nonAdminProxyFields are the proxy fields safe to share with non-admins.
// Configured URLs and aggregate counts of internal lists/certs are stripped.
var nonAdminProxyFields = map[string]bool{
	"provider":            true,
	"connected":           true,
	"latency_ms":          true,
	"error":               true,
	"proxy_hosts_managed": true,
}

// getHealth handles GET /api/health.
//
// Returns the connectivity state and (when connected) diagnostics for the
// Docker daemon and the active proxy provider. Detailed host information
// (kernel, root_dir, internal NPM URL, …) is stripped for non-admin users to
// avoid leaking infrastructure details to read-only viewers.
func (s *Server) getHealth(w http.ResponseWriter, r *http.Request) {
	ctx, cancel := context.WithTimeout(r.Context(), healthProbeTimeout)
	defer cancel()

	claims, _ := auth.ClaimsFromContext(r.Context())
	isAdmin := claims.Role == "admin"

	now := time.Now().UTC().Format(time.RFC3339)
	result := map[string]any{
		"checked_at": now,
	}

	// ── Database ─────────────────────────────────────────────────────
	if err := s.store.DB().PingContext(ctx); err != nil {
		result["database"] = map[string]any{"connected": false, "error": "database unreachable"}
	} else {
		result["database"] = map[string]any{"connected": true}
	}

	// ── Docker daemon ────────────────────────────────────────────────
	docker := s.dockerHealth(ctx)
	if !isAdmin {
		docker = filterFields(docker, nonAdminDockerFields)
	}
	result["docker"] = docker

	// ── Proxy provider ───────────────────────────────────────────────
	if s.proxyProvider != nil {
		proxyInfo := s.proxyHealth(ctx)
		if !isAdmin {
			proxyInfo = filterFields(proxyInfo, nonAdminProxyFields)
		}
		result["proxy"] = proxyInfo
	}

	respondJSON(w, http.StatusOK, result)
}

// filterFields returns a copy of m containing only the keys present in allow.
func filterFields(m map[string]any, allow map[string]bool) map[string]any {
	out := make(map[string]any, len(allow))
	for k, v := range m {
		if allow[k] {
			out[k] = v
		}
	}
	return out
}

// dockerHealth probes the Docker daemon and, if reachable, attaches a full
// DaemonInfo snapshot. The caller does not need to error-check the Info()
// call — if it fails, the connected flag remains true (ping succeeded) but
// the detail fields are simply omitted.
func (s *Server) dockerHealth(ctx context.Context) map[string]any {
	if s.docker == nil {
		return map[string]any{
			"connected": false,
			"error":     "docker client not initialized",
		}
	}

	start := time.Now()
	if err := s.docker.Ping(ctx); err != nil {
		return map[string]any{
			"connected":  false,
			"error":      err.Error(),
			"latency_ms": time.Since(start).Milliseconds(),
		}
	}

	out := map[string]any{
		"connected":  true,
		"latency_ms": time.Since(start).Milliseconds(),
	}

	// Info enriches the payload; failures are non-fatal.
	info, err := s.docker.Info(ctx)
	if err == nil {
		if info.Version != "" {
			out["version"] = info.Version
		}
		if info.APIVersion != "" {
			out["api_version"] = info.APIVersion
		}
		if info.OS != "" {
			out["os"] = info.OS
		}
		if info.Arch != "" {
			out["arch"] = info.Arch
		}
		if info.Kernel != "" {
			out["kernel"] = info.Kernel
		}
		if info.OperatingSystem != "" {
			out["operating_system"] = info.OperatingSystem
		}
		if info.StorageDriver != "" {
			out["storage_driver"] = info.StorageDriver
		}
		if info.RootDir != "" {
			out["root_dir"] = info.RootDir
		}
		if info.Name != "" {
			out["name"] = info.Name
		}
		if info.NCPU > 0 {
			out["ncpu"] = info.NCPU
		}
		if info.MemoryTotal > 0 {
			out["memory_total"] = info.MemoryTotal
		}
		out["containers"] = info.Containers
		out["running"] = info.Running
		out["paused"] = info.Paused
		out["stopped"] = info.Stopped
		out["images"] = info.Images
	}

	return out
}

// proxyHealth probes the configured proxy provider. For NPM, attaches
// aggregate counts (proxy hosts, access lists, certificates) which the
// dashboard surfaces alongside the connection indicator.
func (s *Server) proxyHealth(ctx context.Context) map[string]any {
	providerName := s.proxyProvider.Name()

	start := time.Now()
	err := s.proxyProvider.Ping(ctx)
	latency := time.Since(start).Milliseconds()

	if err != nil {
		return map[string]any{
			"provider":   providerName,
			"connected":  false,
			"error":      providerName + " unreachable: " + err.Error(),
			"latency_ms": latency,
		}
	}

	out := map[string]any{
		"provider":   providerName,
		"connected":  true,
		"latency_ms": latency,
	}

	// Attach configured URL from settings for both NPM and Traefik.
	if settings, serr := s.store.GetSettings(); serr == nil {
		switch providerName {
		case "npm":
			if settings.NpmURL != "" {
				out["url"] = settings.NpmURL
			}
		case "traefik":
			if settings.TraefikAPIURL != "" {
				out["url"] = settings.TraefikAPIURL
			}
		}
	}

	// NPM-specific aggregates — a quick glance at route/list/cert counts.
	// These calls require an authenticated NPM session, so we trigger the
	// provider's auth step first (it's cheap: cached JWT is reused for 1h).
	if providerName == "npm" && s.npm != nil {
		if np, ok := s.proxyProvider.(*proxy.NpmProvider); ok {
			if err := np.Authenticate(ctx); err == nil {
				if hosts, herr := s.npm.ListProxyHosts(ctx); herr == nil {
					out["proxy_hosts"] = len(hosts)
				}
				if lists, lerr := s.npm.ListAccessLists(ctx); lerr == nil {
					out["access_lists"] = len(lists)
				}
				if certs, cerr := s.npm.ListCertificates(ctx); cerr == nil {
					out["certificates"] = len(certs)
				}
			}
		}
	}

	// Managed-route count — how many of the proxy's routes were deployed
	// by Tinyforge itself, counting both Docker instances and static sites.
	// This works for every provider (NPM, Traefik, …) because it reads from
	// our own store, not the external proxy API.
	if managed, merr := s.managedRouteCount(); merr == nil {
		out["proxy_hosts_managed"] = managed
	}

	return out
}

// managedRouteCount returns the number of proxy routes Tinyforge manages,
// reading from the unified containers index. The domain argument doesn't
// affect the count so we pass an empty string to skip FQDN rendering.
func (s *Server) managedRouteCount() (int, error) {
	routes, err := s.store.ListProxyRoutes("")
	if err != nil {
		return 0, err
	}
	return len(routes), nil
}
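
The allow-list filtering that getHealth applies for non-admin users can be tried in isolation. The sketch below copies filterFields from the file above into a standalone program; the sample payload values (version string, kernel, root_dir) are made up for illustration.

```go
package main

import "fmt"

// filterFields returns a copy of m containing only the keys present in allow
// (same logic as the function in the file above).
func filterFields(m map[string]any, allow map[string]bool) map[string]any {
	out := make(map[string]any, len(allow))
	for k, v := range m {
		if allow[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	// A trimmed-down allow-list in the spirit of nonAdminDockerFields.
	allow := map[string]bool{"connected": true, "version": true}

	// Hypothetical full payload as an admin would see it.
	full := map[string]any{
		"connected": true,
		"version":   "27.0.1",
		"kernel":    "6.8.0",
		"root_dir":  "/var/lib/docker",
	}

	// Non-admin view: host-detail keys are dropped, the input map is untouched.
	fmt.Println(filterFields(full, allow))
	fmt.Println(len(full), "keys remain in the original map")
}
```

Because the filter is an allow-list, any field later added to dockerHealth or proxyHealth stays admin-only until it is explicitly listed.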