ledgrab/plans/activity-log/PLAN.md

# Feature: Activity / Audit Log

**Branch:** `feature/activity-log`
**Base branch:** `master` (merge target)
**Branch point:** `17dd2e02bab4d00a93479eb6af1a8c6ddc0c7224` (use for clean review diffs)
**Created:** 2026-06-09
**Status:** 🟡 In Progress
**Strategy:** Incremental
**Mode:** Automated
**Execution:** Orchestrator
**Remote:** origin → https://git.dolgolyov-family.by/alexei.dolgolyov/ledgrab.git

## Summary

A persistent, queryable audit log of meaningful LedGrab actions, surfaced in the WebUI.
Captures four categories — entity CRUD, authentication, device connect/disconnect, and
capture & system events — as **action-metadata-only** records (who/what/when + entity
type/name/id + a human-readable message + small structured metadata; **no before/after
diffs**). Surfaced as a dedicated top-level **Activity** tab with smart filtering + live
updates, a compact **Recent Activity** widget on the Dashboard, and a **Settings** panel
for retention. Durability rides on the existing whole-DB `ledgrab.db` backup; portability
is an on-demand CSV/JSON **export** (no separate backup subsystem).

## Design pillars (the load-bearing decisions)

1. **Dedicated `activity_log` table + repository — NOT `BaseSqliteStore`.** That base loads
   every row into an in-memory cache and uses a generic `id/name/data` blob — wrong for an
   append-heavy, unbounded log. We use a purpose-built indexed table with query-on-demand
   keyset pagination.
2. **Central choke-point instrumentation.** `fire_entity_event()` (`api/dependencies.py:202`)
   is already called by every entity route on create/update/delete and has `_deps` access to
   resolve names. The recorder hooks there for all entity CRUD. Non-entity events get explicit
   `recorder.record(...)` calls.
3. **Actor via `ContextVar`.** Set inside `verify_api_key` (next to `request.state.auth_label`),
   default `"system"`, reset per-request. The recorder reads it without threading actor
   through every call.
4. **Direct synchronous write on the event-loop thread** (no separate buffered-writer
   subsystem — simpler, and the request already did a `synchronous=FULL` entity write).
   Cross-thread callers (zeroconf discovery thread) marshal via `loop.call_soon_threadsafe`,
   mirroring `utils/log_broadcaster.py`. The "server shutting down" event is recorded as the
   FIRST action in the lifespan shutdown block, before any teardown.
5. **Reuse the existing realtime bus.** A new `activity_logged` event over `/api/v1/events/ws`
   (one `events-ws.ts` allowlist entry + `test_events_ws_parity.py` update). No new socket.
6. **Never log secrets.** API-key *tokens* are never stored — only labels/ids.
7. **Differentiate from the existing Log Viewer.** `utils/log_broadcaster.py` is an ephemeral
   500-line debug-log tail. The audit log is persistent, structured, semantic. Cross-link in
   the Settings panel; never duplicate.

## Build & Test Commands

- **Build (frontend):** `cd server && npm run build`
- **Type-check (TS):** `cd server && npx tsc --noEmit`
- **Test:** `cd server && py -3.13 -m pytest tests/ --no-cov -q`
- **Lint (Python):** `cd server && ruff check src/ tests/ --fix`
- **Events parity test (load-bearing for P2):** included in pytest (`tests/test_events_ws_parity.py`)

> Scope checks to the files actually edited: backend phases run ruff + pytest; frontend-only
> phases run `tsc --noEmit` + `npm run build` (no pytest/ruff). Phases touching both run both.

## Phases

- [x] Phase 1: Storage — model, migration, repository [domain: data] → [subplan](./phase-1-storage.md)
- [x] Phase 2: Recorder, actor context, retention, lifecycle [domain: backend] → [subplan](./phase-2-recorder-retention.md)
- [x] Phase 3: Event instrumentation (4 categories) [domain: backend] → [subplan](./phase-3-instrumentation.md)
- [ ] Phase 4: REST API — query/filter/export/settings/clear [domain: backend] → [subplan](./phase-4-api.md)
- [ ] Phase 5: Frontend — Activity tab + smart filtering + live updates [domain: frontend] → [subplan](./phase-5-frontend-tab.md)
- [ ] Phase 6: Dashboard widget + Settings panel + docs [domain: frontend] → [subplan](./phase-6-dashboard-settings.md)

## Parallelizable Phase Groups (Orchestrator mode only)

- Phases 3 and 4 are **parallelizable only after the schema is frozen at end of Phase 2**.
  Both depend on the P2 recorder/schema; P4 registers the router in `api/__init__.py`, P3
  edits `dependencies.py`/`auth.py` (different files, shared schema contract). To keep the
  Automated run simple and low-risk, they run **sequentially** (3 → 4) unless time pressure
  warrants worktree-isolated parallelism. Phases 5 → 6 are sequential (6 reuses P5 formatters
  and the feature module).

## Phase Progress Log

| Phase | Domain | Status | Review | Build | Committed |
|-------|--------|--------|--------|-------|-----------|
| Phase 1: Storage | data | ✅ Done | ✅ Passed | ✅ Passed | ✅ |
| Phase 2: Recorder/Retention | backend | ✅ Done | ✅ Passed | ✅ Passed | ✅ |
| Phase 3: Instrumentation | backend | ✅ Done | ✅ Passed | ✅ Passed | ✅ |
| Phase 4: REST API | backend | ✅ Done | ✅ Passed | ✅ Passed | ✅ |
| Phase 5: Frontend tab | frontend | ⬜ Not Started | ⬜ | ⬜ | ⬜ |
| Phase 6: Dashboard/Settings | frontend | ⬜ Not Started | ⬜ | ⬜ | ⬜ |

## Outstanding Warnings

| Phase | Warning | Severity | Status (open / resolved / accepted) |
|-------|---------|----------|-------------------------------------|
| 3 | Log injection via unauth mDNS device name/url into audit message | 🟠 High (security) | resolved — `sanitize_display` helper applied |
| 3 | Origin sanitizer missed spaces/NUL/ANSI | 🟠 High (security) | resolved — `sanitize_display` over netloc |
| 3 | Unauth auth-failure audit-write flood (no write-rate bound) | 🟠 High (security) | resolved — per-IP audit-record throttle (10s, capped) |
| 3 | Malformed-IPv6 Origin → urlparse ValueError into WS handler | 🟡 Warning | resolved — try/except guard |
| 3 | Throttle module-global state caused flaky test contamination | 🟡 Warning | resolved — autouse conftest reset fixture |
| 4 | Export held global DB write-lock across the stream (slow-client DoS) | 🟠 High (security) | resolved — chunked keyset export releases lock per batch |
| 4 | PUT /settings only AuthRequired → anon could disable auditing/prune trail | 🟠 High (security) | resolved — `require_authenticated` on settings PUT |
| 4 | CSV formula-injection missed leading TAB/CR | 🟡 Medium (security) | resolved — added `\t`/`\r` to guard |
| 4 | `total` count full-scans on every list request | 🔵 Low (perf) | accepted — bounded by retention; read-only; optional opt-in deferred |

## Final Review

- [ ] Comprehensive code review
- [ ] Security review (auth/PII-in-logs/secrets/log-injection — triggered)
- [ ] All Outstanding Warnings resolved or consciously accepted
- [ ] Full build passes (`npm run build` + `tsc --noEmit`)
- [ ] Full test suite passes (`pytest`)
- [ ] Merged to `master`

## Amendment Log

_(Filled in if the plan is amended mid-implementation.)_

- 2026-06-09: Plan reviewer (pre-implementation) → ⚠️ with 3 Critical Gaps, all resolved
  before Phase 1: (G1) descoped non-existent API-key mutation events; (G2) dropped the
  buffered-writer subsystem for a direct synchronous-on-loop write with `call_soon_threadsafe`
  marshaling for thread-origin events; (G3) record "server shutting down" first in the shutdown
  block (no buffer to flush). Concerns folded in: device events via `device_health_changed` +
  `device_discovered/_lost`; actor ContextVar in `verify_api_key`; name-on-delete passed
  explicitly; settings-audit scoped + self-key excluded; parity allowlist before emit site.
  Adopted suggestions: rowid keyset tiebreaker; log the disable action. Deferred: setup-scaffold
  noise suppression.