ledgrab/plans/activity-log/phase-3-instrumentation.md
alexei.dolgolyov 25c613c5cb feat(activity-log): phase 3 - event instrumentation (4 categories)
- entity CRUD via fire_entity_event choke point (name resolved/sanitized; deletes pass name explicitly)
- auth: failures + WS session establishment (no tokens logged); per-IP audit-record throttle
- device: online/offline (health), discovered/lost (zeroconf), ADB connect/disconnect
- capture/system: target start-stop, scenes, playlists, automations, backup/restore, update, restart, calibration, settings
- security hardening: sanitize_display strips control/NUL/ANSI/newlines from untrusted strings; malformed-IPv6 origin guard
- 129 instrumentation tests (incl. secret-leak, log-injection, throttle, best-effort) + autouse throttle-reset fixture
2026-06-09 19:20:57 +03:00

# Phase 3: Event instrumentation (4 categories)
**Status:** ✅ Done
**Parent plan:** [PLAN.md](./PLAN.md)
**Domain:** backend · 🔒 security-sensitive (security reviewer triggers)
## Objective
Emit audit records at the real call sites for all four categories, using the Phase 2 recorder.
Maximize coverage via the central `fire_entity_event` choke point; add explicit
`recorder.record(...)` calls for non-entity events. Never log secrets.
## Tasks
### Entity CRUD (via the choke point)
- [x] In `api/dependencies.py`, extend `fire_entity_event` to ALSO record an audit entry:
  - Signature gains an optional `entity_name: str | None = None`.
  - For `created`/`updated`: if `entity_name` is not supplied, best-effort resolve it from the
    matching store in `_deps` keyed by `entity_type` (the entity is still present). For `deleted`:
    **do not** resolve post-hoc; rely on the explicit `entity_name` passed by the handler
    (deletes are the most important; a name-less delete entry is unacceptable).
  - Map `action` → severity (`info`), category `entity`. Build a human-readable message
    (e.g. `"Target 'Desk' updated"`). Read the actor from the ContextVar.
  - Recording is best-effort (never break the entity operation).
- [x] Update entity **delete** handlers to pass `entity_name` into `fire_entity_event`
  (the entity object is already loaded for the 404 check). Cover the representative/most-used
  entities at minimum: output targets, sync clocks, devices, picture/audio/color-strip
  sources, automations, scene presets/playlists, templates, gradients. (Create/update can rely
  on hook resolution but should pass the name where trivially available.)
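
The choke-point behavior described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `ListRecorder`, `current_actor`, `_resolve_name`, the contents of `_deps`, and the parameter order of `fire_entity_event` are all assumptions made for the example.

```python
from __future__ import annotations

import logging
from contextvars import ContextVar
from typing import Any

logger = logging.getLogger(__name__)

# Hypothetical stand-ins for the Phase 2 actor ContextVar and the entity stores.
current_actor: ContextVar[str] = ContextVar("current_actor", default="system")
_deps: dict[str, dict[str, Any]] = {"target": {"t1": {"name": "Desk"}}}


class ListRecorder:
    """Toy recorder that keeps entries in memory (assumed interface)."""

    def __init__(self) -> None:
        self.entries: list[dict[str, Any]] = []

    def record(self, **entry: Any) -> None:
        self.entries.append(entry)


recorder = ListRecorder()


def _resolve_name(entity_type: str, entity_id: str) -> str | None:
    """Best-effort lookup; only meaningful while the entity is still stored."""
    entity = _deps.get(entity_type, {}).get(entity_id)
    return entity.get("name") if entity else None


def fire_entity_event(
    entity_type: str,
    action: str,
    entity_id: str,
    entity_name: str | None = None,
) -> None:
    # ... the existing event broadcast would run here, outside the try block ...
    try:
        # Only created/updated are resolved from the store; deletes rely on
        # the name the handler passed in before removing the entity.
        if entity_name is None and action in ("created", "updated"):
            entity_name = _resolve_name(entity_type, entity_id)
        label = f"'{entity_name}'" if entity_name else entity_id
        recorder.record(
            category="entity",
            action=f"entity.{action}",
            severity="info",
            actor=current_actor.get(),
            entity_type=entity_type,
            entity_id=entity_id,
            entity_name=entity_name,
            message=f"{entity_type.capitalize()} {label} {action}",
        )
    except Exception:
        # Best-effort: a recorder failure must never break the entity operation.
        logger.exception("audit recording failed; entity operation continues")
```

Keeping the recording inside a single `try`/`except` means a recorder failure is logged and swallowed while the audited operation proceeds.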
### Authentication (DESCOPED: no key create/rotate/revoke — those routes don't exist)
- [x] In `api/auth.py`, record:
  - auth **failures**: missing/invalid Bearer token (HTTP), rejected LAN-without-keys, rejected
    WS origin (4403), WS auth handshake failure (4401). Category `auth`, severity `warning`.
    Include the caller IP/label and the reason in `metadata`, but **never** the attempted token.
  - WS **session establishment** (successful `accept_and_authenticate_ws`): category `auth`,
    severity `info`, actor = authenticated label.
  - (Do NOT record per-request HTTP auth *success*: too frequent.)
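
One way to keep the attempted token out of the record is to build the entry from a function that never receives it, and to throttle failure records per client so a flood of bad requests cannot fill the log. The sketch below assumes hypothetical names (`FailureThrottle`, `build_auth_rejection`), a hypothetical 5-second interval, and example reason strings; it is not the code in `api/auth.py`.

```python
from __future__ import annotations

import time
from typing import Any, Callable


class FailureThrottle:
    """Allow at most one audit record per client within min_interval_s."""

    def __init__(
        self,
        min_interval_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_recorded: dict[str, float] = {}

    def allow(self, client: str) -> bool:
        now = self._clock()
        last = self._last_recorded.get(client)
        if last is not None and now - last < self._min_interval_s:
            return False
        self._last_recorded[client] = now
        return True

    def reset(self) -> None:
        """Clear state, e.g. from a test fixture between test cases."""
        self._last_recorded.clear()


def build_auth_rejection(client: str, reason: str) -> dict[str, Any]:
    """Build an auth-failure entry; the token is not a parameter, so it cannot leak."""
    return {
        "category": "auth",
        "action": "auth.rejected",
        "severity": "warning",
        "message": f"Authentication rejected for {client}: {reason}",
        "metadata": {"reason": reason, "client": client},
    }
```

A caller would check `throttle.allow(client)` first and only then pass `build_auth_rejection(client, reason)` to the recorder.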
### Device connect/disconnect (use existing discrete seams)
- [x] Hook `device_health_changed` (`core/processing/device_health.py`, fired only on
`online != prev_online`) → record online/offline transition. Category `device`,
severity `info` (online) / `warning` (offline).
- [x] Hook `device_discovered` / `device_lost` (`core/devices/discovery_watcher.py`, **runs on
the zeroconf thread** → recorder must marshal to the loop, which Phase 2 handles). Category
`device`.
- [x] ADB connect/disconnect (`api/routes/system_settings.py:adb_connect/adb_disconnect`).
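
The thread-to-loop marshalling that the discovery hooks rely on can be illustrated with `loop.call_soon_threadsafe`. This is a sketch under assumptions: `LoopBoundRecorder` is a hypothetical stand-in for the Phase 2 recorder, a plain `threading.Thread` plays the role of the zeroconf thread, and the metadata values are made-up examples.

```python
import asyncio
import threading
from typing import Any


class LoopBoundRecorder:
    """Recorder whose state is only ever mutated on its owning event loop."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self._loop = loop
        self.entries: list[dict[str, Any]] = []

    def _append(self, entry: dict[str, Any]) -> None:
        self.entries.append(entry)

    def record(self, **entry: Any) -> None:
        # Callable from any thread: the append itself is scheduled on the loop.
        self._loop.call_soon_threadsafe(self._append, entry)


async def main() -> list[dict[str, Any]]:
    recorder = LoopBoundRecorder(asyncio.get_running_loop())

    def on_device_discovered() -> None:
        # Runs on a foreign thread, like a zeroconf callback would.
        recorder.record(
            category="device",
            action="device.discovered",
            severity="info",
            actor="system",
            metadata={"url": "http://192.0.2.10", "device_type": "example"},
        )

    worker = threading.Thread(target=on_device_discovered)
    worker.start()
    await asyncio.to_thread(worker.join)  # wait without blocking the loop
    await asyncio.sleep(0)  # yield once so any pending callback runs
    return recorder.entries
```

Running `asyncio.run(main())` returns the single entry appended on the loop thread, even though `record` was called from the worker thread.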
### Capture & system events (explicit record calls)
- [x] Target processing start/stop + bulk (`api/routes/output_targets_control.py`).
- [x] Scene activation (`scene_presets.py:activate_scene_preset`), playlist start/stop
(`scene_playlists.py`), automation activate/deactivate (`automation_engine.py`).
- [x] System: backup create/restore/delete (`backup.py`), update apply/dismiss (`update.py`),
restart/shutdown (`backup.py`), calibration start/stop/cancel (`calibration.py`).
- [x] Settings changes: scope to high-value settings only (auto-backup, update, shutdown
action). **Exclude the activity-log's own `"activity_log"` settings key** to avoid
self-referential churn.
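
The settings scoping above amounts to an allowlist check. The sketch below uses the three `setting_key` values listed in the handoff table; the names `AUDITED_SETTING_KEYS` and `should_audit_setting` are hypothetical.

```python
# Allowlist of high-value settings; anything else, including the activity
# log's own "activity_log" key, is never audited.
AUDITED_SETTING_KEYS = frozenset({"auto_backup", "update", "shutdown_action"})


def should_audit_setting(setting_key: str) -> bool:
    """Return True only for settings that warrant a settings.changed record."""
    return setting_key in AUDITED_SETTING_KEYS
```

An allowlist is used here rather than a single exclusion so that newly added settings stay unaudited until someone opts them in.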
### Tests
- [x] `server/tests/test_activity_instrumentation.py` (or per-area):
  - representative entity create/update/delete produces a record with correct category/actor/
    name (incl. a delete carrying its name);
  - an auth failure produces a `warning` record and the token never appears in any field;
  - a device health transition and a discovery event produce records;
  - a capture start and a backup/restore produce records.
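
The "token never appears in any field" check can be written as a helper that serializes each record and searches the whole blob, so nested metadata is covered as well. `find_secret_leaks` is a hypothetical helper for illustration, not a function from the test suite.

```python
import json
from typing import Any


def find_secret_leaks(entries: list[dict[str, Any]], secret: str) -> list[int]:
    """Return the indexes of entries whose serialized form contains the secret.

    Assumes the secret has no characters that JSON escapes (quotes,
    backslashes), which holds for typical alphanumeric API tokens.
    """
    leaks = []
    for index, entry in enumerate(entries):
        serialized = json.dumps(entry, default=str, ensure_ascii=False)
        if secret in serialized:
            leaks.append(index)
    return leaks
```

A test would trigger an auth failure with a known token, collect the recorded entries, and assert that `find_secret_leaks(entries, token)` is empty.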
## Files to Modify/Create
- `server/src/ledgrab/api/dependencies.py` — modify: `fire_entity_event` records + `entity_name`
- entity **delete** route handlers under `api/routes/` — modify: pass `entity_name`
- `server/src/ledgrab/api/auth.py` — modify: auth-failure + WS-session records
- `server/src/ledgrab/core/processing/device_health.py` — modify: online/offline record
- `server/src/ledgrab/core/devices/discovery_watcher.py` — modify: discovered/lost record
- `server/src/ledgrab/api/routes/system_settings.py` — modify: ADB + settings records
- `server/src/ledgrab/api/routes/output_targets_control.py` — modify: start/stop records
- `server/src/ledgrab/api/routes/{scene_presets,scene_playlists,backup,update,calibration}.py` — modify
- `server/src/ledgrab/core/automations/automation_engine.py` — modify: activate/deactivate records
- `server/tests/test_activity_instrumentation.py` — new
## Acceptance Criteria
- All four categories emit records at the named sites; entity deletes carry the entity name.
- API-key tokens / secrets never appear in any audit field (test-enforced).
- Recording never breaks the audited action (best-effort; failures swallowed + logged).
- Actor is the authenticated label for request-originated events, `"system"` for engine/thread
events.
- New + existing tests green; `ruff` clean.
## Notes
- Get the recorder via the Phase 2 DI getter; for engine/thread sites that lack DI, use the
module singleton/accessor Phase 2 exposes.
- Keep messages human-readable and locale-agnostic (English source strings; the frontend
renders structured fields rather than translating server messages, so the message is a fallback/summary).
- This is the security-sensitive phase — the security reviewer runs here AND at final review.
## Review Checklist
- [x] All tasks completed
- [x] Code follows project conventions
- [x] No unintended side effects (audited actions still succeed on recorder failure)
- [x] No secrets logged (token never recorded) — explicitly verified
- [x] Build passes (ruff + pytest)
- [x] Tests pass (new + existing)
## Handoff to Next Phase
Phase 3 is complete. The following (category, action) pairs are now emitted, along with their
metadata keys, for Phase 4 to expose via query/filter and for Phase 5 quick-filter presets.
### `entity` category
| Action | Severity | Metadata keys | Notes |
|--------|----------|---------------|-------|
| `entity.created` | info | — | All entity types via `fire_entity_event` choke-point |
| `entity.updated` | info | — | All entity types; name resolved from store when not passed |
| `entity.deleted` | info | — | Name passed explicitly by delete handler before deletion |
### `auth` category
| Action | Severity | Metadata keys | Notes |
|--------|----------|---------------|-------|
| `auth.rejected` | warning | `reason` (str), `client` (str/IP) | Missing Bearer, invalid Bearer, LAN-no-keys, WS origin, WS auth timeout, invalid WS token |
| `auth.ws_connected` | info | `client` (str/IP) | Successful WS session established |
### `device` category
| Action | Severity | Metadata keys | Notes |
|--------|----------|---------------|-------|
| `device.online` | info | `latency_ms` (float) | Health monitor, transition only |
| `device.offline` | warning | `latency_ms` (float) | Health monitor, transition only |
| `device.discovered` | info | `url` (str), `device_type` (str) | Zeroconf discovery thread; recorder marshals to loop |
| `device.lost` | warning | `url` (str), `device_type` (str) | Zeroconf discovery thread |
| `device.adb_connected` | info | `address` (str) | ADB route success |
| `device.adb_disconnected` | info | `address` (str) | ADB route success |
### `capture` category
| Action | Severity | Metadata keys | Notes |
|--------|----------|---------------|-------|
| `capture.started` | info | — | Per target (individual + bulk) |
| `capture.stopped` | info | — | Per target (individual + bulk) |
| `scene.activated` | info | — | `scene_presets.py:activate_scene_preset` |
| `playlist.started` | info | — | `scene_playlists.py:start_scene_playlist` |
| `playlist.stopped` | info | — | `scene_playlists.py:stop_scene_playlist` |
| `automation.activated` | info | — | `automation_engine.py:_activate_automation`; actor="system" |
| `automation.deactivated` | info | — | `automation_engine.py:_deactivate_automation`; actor="system" |
### `system` category
| Action | Severity | Metadata keys | Notes |
|--------|----------|---------------|-------|
| `backup.created` | info | `filename` (str) | `backup.py:backup_config` |
| `backup.restored` | info | — | `backup.py:restore_config` |
| `backup.deleted` | info | `filename` (str) | `backup.py:delete_saved_backup` |
| `server.restarting` | info | — | `backup.py:restart_server` |
| `server.shutdown_requested` | info | — | `backup.py:shutdown_server` |
| `update.dismissed` | info | `version` (str) | `update.py:dismiss_update` |
| `update.applied` | info | `version` (str) | `update.py:apply_update` |
| `settings.changed` | info | `setting_key` (str) + setting-specific keys | `setting_key` values: `"auto_backup"`, `"update"`, `"shutdown_action"`. Activity-log own key excluded. |
| `calibration.started` | info | — | `calibration.py`; entity_type="device", entity_id=device_id |
| `calibration.stopped` | info | — | `calibration.py` |
| `calibration.cancelled` | info | — | `calibration.py` |
### Implementation notes for Phase 4
- The `metadata` field is a JSON `TEXT` column. All keys above are scalars (str, float).
- Phase 4 filter `metadata_key` / `metadata_value` lookup, if added, can target `setting_key`
for settings-change filtering.
- `entity_type` is populated for entity CRUD and `calibration.started`. For auth/system/capture
events `entity_type` may be None.
- `entity_name` is always populated for `entity.deleted`; populated for CRUD create/update
when resolved; populated for most capture/system events where a name is meaningful.
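
Because `metadata` is stored as JSON text, a Phase 4 `metadata_key` / `metadata_value` filter can decode each row and compare scalars. The sketch below is an assumption-heavy illustration: the `activity_log` table name, its two columns, and `filter_by_metadata` are hypothetical, and a real implementation might push the comparison into SQL instead.

```python
import json
import sqlite3
from typing import Any

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real table has more columns.
conn.execute("CREATE TABLE activity_log (action TEXT NOT NULL, metadata TEXT)")

sample_rows = [
    ("settings.changed", {"setting_key": "auto_backup"}),
    ("settings.changed", {"setting_key": "update"}),
    ("backup.restored", None),  # events without metadata store NULL
]
conn.executemany(
    "INSERT INTO activity_log (action, metadata) VALUES (?, ?)",
    [(action, json.dumps(meta) if meta is not None else None)
     for action, meta in sample_rows],
)


def filter_by_metadata(
    db: sqlite3.Connection, key: str, value: Any
) -> list[tuple[str, dict[str, Any]]]:
    """Return (action, metadata) pairs whose decoded metadata[key] equals value."""
    matches = []
    for action, raw in db.execute("SELECT action, metadata FROM activity_log"):
        metadata = json.loads(raw) if raw else {}
        if metadata.get(key) == value:
            matches.append((action, metadata))
    return matches
```

For example, `filter_by_metadata(conn, "setting_key", "auto_backup")` selects only the auto-backup settings change from the sample rows.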