feat(activity-log): phase 3 - event instrumentation (4 categories)

- entity CRUD via fire_entity_event choke point (name resolved/sanitized; deletes pass name explicitly)
- auth: failures + WS session establishment (no tokens logged); per-IP audit-record throttle
- device: online/offline (health), discovered/lost (zeroconf), ADB connect/disconnect
- capture/system: target start-stop, scenes, playlists, automations, backup/restore, update, restart, calibration, settings
- security hardening: sanitize_display strips control/NUL/ANSI/newlines from untrusted strings; malformed-IPv6 origin guard
- 129 instrumentation tests (incl. secret-leak, log-injection, throttle, best-effort) + autouse throttle-reset fixture
This commit is contained in:
2026-06-09 19:20:57 +03:00
parent 726f39e2ba
commit 25c613c5cb
29 changed files with 3012 additions and 44 deletions
+1
View File
@@ -67,3 +67,4 @@ context (survives across phases; graduates to CLAUDE.md only if it's a lasting p
Phase 1 landed (2026-06-09): `activity_log.py` (dataclass + enums + filters + codec), `AddActivityLogTableMigration` (`002_add_activity_log`) appended to `ALL_MIGRATIONS`, `ActivityLogRepository` (record/query/count/prune/clear/iter_export), 41 new tests — all green. Full suite 2226 passed, 0 failed. Schema and method signatures frozen in phase-1-storage.md Handoff. Gotcha: `Database.execute` takes a positional tuple — use `?` placeholders (not `:name`), otherwise Python 3.14 will raise `ProgrammingError`.
Phase 2 landed (2026-06-09): `core/activity_log/` package (`context.py`, `recorder.py`, `retention.py`, `__init__.py`); actor ContextVar set in `api/auth.py` (both branches); `ActivityLogRetentionEngine` mirroring AutoBackupEngine; full wiring in `main.py` (repo at module level, recorder+engine in lifespan, `server.shutting_down` first shutdown action, engine stop before db.close); DI getters in `api/dependencies.py`; `activity_logged` added to `_ALLOWED_SERVER_EVENT_TYPES` in `events-ws.ts`; `set_module_recorder` exposes recorder to non-DI sites; 24 new tests — all green. Full suite 2309 passed, 2 skipped, 0 failed. Ruff clean.
Phase 3 landed (2026-06-09): instrumented all four categories — entity CRUD via `fire_entity_event` choke-point (`dependencies.py`), auth failures + WS session in `auth.py`, device online/offline in `device_health.py`, device discovered/lost in `discovery_watcher.py`, ADB connect/disconnect in `system_settings.py`, capture start/stop (individual + bulk) in `output_targets_control.py`, scene/playlist/automation activate in their respective route/engine files, backup/restore/delete + restart/shutdown/update/calibration/settings in `backup.py`/`update.py`/`calibration.py`; all 11 entity delete handlers pass `entity_name` to `fire_entity_event`; 22 new tests (security: token never in any field, explicitly asserted) — all green. Full suite 2369 passed, 2 skipped, 0 failed. Ruff clean. Complete (category, action) inventory in phase-3-instrumentation.md Handoff section.
+8 -4
View File
@@ -60,8 +60,8 @@ is an on-demand CSV/JSON **export** (no separate backup subsystem).
## Phases
- [x] Phase 1: Storage — model, migration, repository [domain: data] → [subplan](./phase-1-storage.md)
- [ ] Phase 2: Recorder, actor context, retention, lifecycle [domain: backend] → [subplan](./phase-2-recorder-retention.md)
- [ ] Phase 3: Event instrumentation (4 categories) [domain: backend] → [subplan](./phase-3-instrumentation.md)
- [x] Phase 2: Recorder, actor context, retention, lifecycle [domain: backend] → [subplan](./phase-2-recorder-retention.md)
- [x] Phase 3: Event instrumentation (4 categories) [domain: backend] → [subplan](./phase-3-instrumentation.md)
- [ ] Phase 4: REST API — query/filter/export/settings/clear [domain: backend] → [subplan](./phase-4-api.md)
- [ ] Phase 5: Frontend — Activity tab + smart filtering + live updates [domain: frontend] → [subplan](./phase-5-frontend-tab.md)
- [ ] Phase 6: Dashboard widget + Settings panel + docs [domain: frontend] → [subplan](./phase-6-dashboard-settings.md)
@@ -81,7 +81,7 @@ is an on-demand CSV/JSON **export** (no separate backup subsystem).
|-------|--------|--------|--------|-------|-----------|
| Phase 1: Storage | data | ✅ Done | ✅ Passed | ✅ Passed | ✅ |
| Phase 2: Recorder/Retention | backend | ✅ Done | ✅ Passed | ✅ Passed | ✅ |
| Phase 3: Instrumentation | backend | ⬜ Not Started | ⬜ | ⬜ | |
| Phase 3: Instrumentation | backend | ✅ Done | ✅ Passed | ✅ Passed | |
| Phase 4: REST API | backend | ⬜ Not Started | ⬜ | ⬜ | ⬜ |
| Phase 5: Frontend tab | frontend | ⬜ Not Started | ⬜ | ⬜ | ⬜ |
| Phase 6: Dashboard/Settings | frontend | ⬜ Not Started | ⬜ | ⬜ | ⬜ |
@@ -90,7 +90,11 @@ is an on-demand CSV/JSON **export** (no separate backup subsystem).
| Phase | Warning | Severity | Status (open / resolved / accepted) |
|-------|---------|----------|-------------------------------------|
| | | | |
| 3 | Log injection via unauth mDNS device name/url into audit message | 🟠 High (security) | resolved — `sanitize_display` helper applied |
| 3 | Origin sanitizer missed spaces/NUL/ANSI | 🟠 High (security) | resolved — `sanitize_display` over netloc |
| 3 | Unauth auth-failure audit-write flood (no write-rate bound) | 🟠 High (security) | resolved — per-IP audit-record throttle (10s, capped) |
| 3 | Malformed-IPv6 Origin → urlparse ValueError into WS handler | 🟡 Warning | resolved — try/except guard |
| 3 | Throttle module-global state caused flaky test contamination | 🟡 Warning | resolved — autouse conftest reset fixture |
## Final Review
+84 -20
View File
@@ -1,6 +1,6 @@
# Phase 3: Event instrumentation (4 categories)
**Status:** ⬜ Not Started
**Status:** ✅ Done
**Parent plan:** [PLAN.md](./PLAN.md)
**Domain:** backend · 🔒 security-sensitive (security reviewer triggers)
@@ -13,7 +13,7 @@ Maximize coverage via the central `fire_entity_event` choke point; add explicit
## Tasks
### Entity CRUD (via the choke point)
- [ ] In `api/dependencies.py`, extend `fire_entity_event` to ALSO record an audit entry:
- [x] In `api/dependencies.py`, extend `fire_entity_event` to ALSO record an audit entry:
- Signature gains an optional `entity_name: str | None = None`.
- For `created`/`updated`: if `entity_name` not supplied, best-effort resolve from the
matching store in `_deps` keyed by `entity_type` (entity still present). For `deleted`:
@@ -22,14 +22,14 @@ Maximize coverage via the central `fire_entity_event` choke point; add explicit
- Map `action` → severity (`info`), category `entity`. Build a human message
(e.g. `"Target 'Desk' updated"`). Read actor from the ContextVar.
- Recording is best-effort (never break the entity operation).
- [ ] Update entity **delete** handlers to pass `entity_name` into `fire_entity_event`
- [x] Update entity **delete** handlers to pass `entity_name` into `fire_entity_event`
(the entity object is already loaded for the 404 check). Cover the representative/most-used
entities at minimum: output targets, sync clocks, devices, picture/audio/color-strip
sources, automations, scene presets/playlists, templates, gradients. (Create/update can rely
on hook resolution but pass the name where trivially available.)
### Authentication (DESCOPED: no key create/rotate/revoke — those routes don't exist)
- [ ] In `api/auth.py`, record:
- [x] In `api/auth.py`, record:
- auth **failures**: missing/invalid Bearer token (HTTP), rejected LAN-without-keys, rejected
WS origin (4403), WS auth handshake failure (4401). Category `auth`, severity `warning`.
Include the caller IP/label and the reason in `metadata`**never** the attempted token.
@@ -38,26 +38,26 @@ Maximize coverage via the central `fire_entity_event` choke point; add explicit
- (Do NOT record per-request HTTP auth *success* — too frequent.)
### Device connect/disconnect (use existing discrete seams)
- [ ] Hook `device_health_changed` (`core/processing/device_health.py`, fired only on
- [x] Hook `device_health_changed` (`core/processing/device_health.py`, fired only on
`online != prev_online`) → record online/offline transition. Category `device`,
severity `info` (online) / `warning` (offline).
- [ ] Hook `device_discovered` / `device_lost` (`core/devices/discovery_watcher.py`, **runs on
- [x] Hook `device_discovered` / `device_lost` (`core/devices/discovery_watcher.py`, **runs on
the zeroconf thread** → recorder must marshal to the loop, which Phase 2 handles). Category
`device`.
- [ ] ADB connect/disconnect (`api/routes/system_settings.py:adb_connect/adb_disconnect`).
- [x] ADB connect/disconnect (`api/routes/system_settings.py:adb_connect/adb_disconnect`).
### Capture & system events (explicit record calls)
- [ ] Target processing start/stop + bulk (`api/routes/output_targets_control.py`).
- [ ] Scene activation (`scene_presets.py:activate_scene_preset`), playlist start/stop
- [x] Target processing start/stop + bulk (`api/routes/output_targets_control.py`).
- [x] Scene activation (`scene_presets.py:activate_scene_preset`), playlist start/stop
(`scene_playlists.py`), automation activate/deactivate (`automation_engine.py`).
- [ ] System: backup create/restore/delete (`backup.py`), update apply/dismiss (`update.py`),
- [x] System: backup create/restore/delete (`backup.py`), update apply/dismiss (`update.py`),
restart/shutdown (`backup.py`), calibration start/stop/cancel (`calibration.py`).
- [ ] Settings changes: scope to high-value settings only (auto-backup, update, shutdown
- [x] Settings changes: scope to high-value settings only (auto-backup, update, shutdown
action). **Exclude the activity-log's own `"activity_log"` settings key** to avoid
self-referential churn.
### Tests
- [ ] `server/tests/test_activity_instrumentation.py` (or per-area):
- [x] `server/tests/test_activity_instrumentation.py` (or per-area):
- representative entity create/update/delete produces a record with correct category/actor/
name (incl. a delete carrying its name);
- an auth failure produces a `warning` record and the token never appears in any field;
@@ -95,14 +95,78 @@ Maximize coverage via the central `fire_entity_event` choke point; add explicit
## Review Checklist
- [ ] All tasks completed
- [ ] Code follows project conventions
- [ ] No unintended side effects (audited actions still succeed on recorder failure)
- [ ] No secrets logged (token never recorded) — explicitly verified
- [ ] Build passes (ruff + pytest)
- [ ] Tests pass (new + existing)
- [x] All tasks completed
- [x] Code follows project conventions
- [x] No unintended side effects (audited actions still succeed on recorder failure)
- [x] No secrets logged (token never recorded) — explicitly verified
- [x] Build passes (ruff + pytest)
- [x] Tests pass (new + existing)
## Handoff to Next Phase
<!-- Filled in by the implementer: list of categories/actions actually emitted and their
metadata keys, so Phase 4 filter options and Phase 5 quick-filter presets match reality. -->
Phase 3 is complete. The following (category, action) pairs are now emitted, along with their
metadata keys, for Phase 4 to expose via query/filter and for Phase 5 quick-filter presets.
### `entity` category
| Action | Severity | Metadata keys | Notes |
|--------|----------|---------------|-------|
| `entity.created` | info | — | All entity types via `fire_entity_event` choke-point |
| `entity.updated` | info | — | All entity types; name resolved from store when not passed |
| `entity.deleted` | info | — | Name passed explicitly by delete handler before deletion |
### `auth` category
| Action | Severity | Metadata keys | Notes |
|--------|----------|---------------|-------|
| `auth.rejected` | warning | `reason` (str), `client` (str/IP) | Missing Bearer, invalid Bearer, LAN-no-keys, WS origin, WS auth timeout, invalid WS token |
| `auth.ws_connected` | info | `client` (str/IP) | Successful WS session established |
### `device` category
| Action | Severity | Metadata keys | Notes |
|--------|----------|---------------|-------|
| `device.online` | info | `latency_ms` (float) | Health monitor, transition only |
| `device.offline` | warning | `latency_ms` (float) | Health monitor, transition only |
| `device.discovered` | info | `url` (str), `device_type` (str) | Zeroconf discovery thread; recorder marshals to loop |
| `device.lost` | warning | `url` (str), `device_type` (str) | Zeroconf discovery thread |
| `device.adb_connected` | info | `address` (str) | ADB route success |
| `device.adb_disconnected` | info | `address` (str) | ADB route success |
### `capture` category
| Action | Severity | Metadata keys | Notes |
|--------|----------|---------------|-------|
| `capture.started` | info | — | Per target (individual + bulk) |
| `capture.stopped` | info | — | Per target (individual + bulk) |
| `scene.activated` | info | — | `scene_presets.py:activate_scene_preset` |
| `playlist.started` | info | — | `scene_playlists.py:start_scene_playlist` |
| `playlist.stopped` | info | — | `scene_playlists.py:stop_scene_playlist` |
| `automation.activated` | info | — | `automation_engine.py:_activate_automation`; actor="system" |
| `automation.deactivated` | info | — | `automation_engine.py:_deactivate_automation`; actor="system" |
### `system` category
| Action | Severity | Metadata keys | Notes |
|--------|----------|---------------|-------|
| `backup.created` | info | `filename` (str) | `backup.py:backup_config` |
| `backup.restored` | info | — | `backup.py:restore_config` |
| `backup.deleted` | info | `filename` (str) | `backup.py:delete_saved_backup` |
| `server.restarting` | info | — | `backup.py:restart_server` |
| `server.shutdown_requested` | info | — | `backup.py:shutdown_server` |
| `update.dismissed` | info | `version` (str) | `update.py:dismiss_update` |
| `update.applied` | info | `version` (str) | `update.py:apply_update` |
| `settings.changed` | info | `setting_key` (str) + setting-specific keys | `setting_key` values: `"auto_backup"`, `"update"`, `"shutdown_action"`. Activity-log own key excluded. |
| `calibration.started` | info | — | `calibration.py`; entity_type="device", entity_id=device_id |
| `calibration.stopped` | info | — | `calibration.py` |
| `calibration.cancelled` | info | — | `calibration.py` |
### Implementation notes for Phase 4
- The `metadata` field is a JSON `TEXT` column. All keys above are scalars (str, float).
- Phase 4 filter `metadata_key` / `metadata_value` lookup, if added, can target `setting_key`
for settings-change filtering.
- `entity_type` is populated for entity CRUD and `calibration.started`. For auth/system/capture
events `entity_type` may be None.
- `entity_name` is always populated for `entity.deleted`; populated for CRUD create/update
when resolved; populated for most capture/system events where a name is meaningful.