alexei.dolgolyov c38b7d4c78 feat(observability): phase 1 - schema, models & event log backend
Add database foundation for observability features:
- event_log table with severity/source filtering and pagination
- standalone_proxies table for user-created reverse proxies
- stale_threshold_days setting (default 7 days)
- Auto-persist warn/error events from event bus to database
- SSE broadcast of persistent events for real-time UI updates
- Frontend types and API functions for downstream UI phases
2026-03-30 10:59:13 +03:00


# Phase 8: Container Stats & Notifications
**Status:** ⬜ Not Started

**Parent plan:** [PLAN.md](./PLAN.md)

**Domain:** fullstack
## Objective
Add container resource monitoring (CPU/memory), notification triggers for operational events, and a system health dashboard summary.
## Tasks
- [ ] Task 1: Create `internal/docker/stats.go` — wrapper around Docker Stats API to get CPU %, memory usage/limit for a container
- [ ] Task 2: Add API endpoint: `GET /api/projects/{id}/stages/{stage}/instances/{iid}/stats` — returns current CPU/memory for an instance
- [ ] Task 3: Create SSE event type `container_stats` — periodically broadcast stats for running containers (every 30s)
- [ ] Task 4: Extend notification stub (`internal/notify/`) — implement webhook sender for events:
  - Stale container detected
  - Proxy health failure
  - Deploy failure/rollback
  - Format: JSON payload with event type, details, timestamp
- [ ] Task 5: Add notification settings UI — enable/disable per event type in settings page
- [ ] Task 6: Update instance cards in frontend — show CPU % bar and memory usage badge
- [ ] Task 7: Create ContainerStats component — mini CPU/memory visualization (progress bars)
- [ ] Task 8: Dashboard system health summary card — total containers (running/stopped), healthy/unhealthy proxies, recent error count (last 24h)
- [ ] Task 9: Wire notification sender to event bus — subscribe to relevant event types, fire notifications
- [ ] Task 10: Add event log pruning cron job — delete events older than 30 days (configurable)
- [ ] Task 11: Add i18n keys for stats and notifications
## Files to Modify/Create
- `internal/docker/stats.go` — NEW: Docker Stats API wrapper
- `internal/api/stats.go` — NEW: Stats HTTP handler
- `internal/api/router.go` — Mount stats endpoint
- `internal/notify/sender.go` — Implement webhook notification sender
- `internal/notify/types.go` — NEW: Notification event types and payloads
- `cmd/server/main.go` — Wire notification subscriber and event pruning cron
- `web/src/lib/types.ts` — Add ContainerStats, NotificationSettings types
- `web/src/lib/api.ts` — Add fetchContainerStats function
- `web/src/lib/components/ContainerStats.svelte` — NEW: CPU/memory display
- `web/src/lib/components/SystemHealthCard.svelte` — NEW: Dashboard summary
- `web/src/routes/+page.svelte` — Add system health card to dashboard
- `web/src/routes/settings/+page.svelte` — Add notification settings section
- `web/src/lib/sse.ts` — Add container_stats SSE handler
## Acceptance Criteria
- Container stats (CPU/memory) visible on instance cards
- Stats update in real-time via SSE
- Webhook notifications fire for configured event types
- Dashboard shows system health summary
- Event log auto-prunes old entries
- Settings page allows configuring notification preferences
- Build passes, existing tests pass
## Notes
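- Docker Stats API streams by default; pass `stream=false` to get a single snapshot and close the connection instead of holding it open
- CPU calculation: (container CPU delta / system CPU delta) * `online_cpus` * 100, which matches `docker stats` (up to N * 100% on N cores); drop the `online_cpus` factor to get a share of total host capacity instead. The deltas need two samples, but one `stream=false` response already carries both (`cpu_stats` and `precpu_stats`) unless `one-shot=true` is set
- Memory: used_bytes / limit_bytes * 100 for percentage, where used_bytes is `usage` minus inactive page cache (`inactive_file` on cgroup v2, `total_inactive_file` on cgroup v1), as `docker stats` computes it
- Notification webhook format should be compatible with common receivers (Slack webhook, Discord webhook, generic HTTP). Slack incoming webhooks read a `text` field and Discord webhooks a `content` field, so a per-receiver formatter over the generic JSON payload is likely needed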
- System health card: consider caching aggregated stats to avoid N+1 queries on dashboard load
## Review Checklist
- [ ] All tasks completed
- [ ] Code follows project conventions
- [ ] No unintended side effects
- [ ] Build passes
- [ ] Tests pass (new + existing)
## Handoff to Next Phase
<!-- Filled in by the implementation agent after completing this phase. -->