ledgrab/plans/activity-log/PLAN.md
alexei.dolgolyov 4a0927521a feat(activity-log): phase 4 - REST API (list/export/settings/clear)
- GET /activity-log: filtered, keyset-paginated list (categories/severities/actor/entity/date/q)
- GET /activity-log/export: streaming CSV/JSON, chunked keyset (releases DB lock per batch), CSV formula-injection guard
- GET/PUT /activity-log/settings: retention config (PUT require_authenticated)
- DELETE /activity-log: clear (require_authenticated, self-audited)
- security: export DoS fix, settings-PUT auth gate, CSV \t/\r guard, metadata-as-JSON
- 122 API tests (auth posture, CSV injection, pagination integrity, filters, settings bounds, clear-audited)
2026-06-09 20:09:46 +03:00

Feature: Activity / Audit Log

Branch: feature/activity-log
Base branch: master (merge target)
Branch point: 17dd2e02bab4d00a93479eb6af1a8c6ddc0c7224 (use for clean review diffs)
Created: 2026-06-09
Status: 🟡 In Progress
Strategy: Incremental
Mode: Automated
Execution: Orchestrator
Remote: origin → https://git.dolgolyov-family.by/alexei.dolgolyov/ledgrab.git

Summary

A persistent, queryable audit log of meaningful LedGrab actions, surfaced in the WebUI. Captures four categories — entity CRUD, authentication, device connect/disconnect, and capture & system events — as action-metadata-only records (who/what/when + entity type/name/id + a human-readable message + small structured metadata; no before/after diffs). Surfaced as a dedicated top-level Activity tab with smart filtering + live updates, a compact Recent Activity widget on the Dashboard, and a Settings panel for retention. Durability rides on the existing whole-DB ledgrab.db backup; portability is an on-demand CSV/JSON export (no separate backup subsystem).

Design pillars (the load-bearing decisions)

  1. Dedicated activity_log table + repository — NOT BaseSqliteStore. That base loads every row into an in-memory cache and uses a generic id/name/data blob — wrong for an append-heavy, unbounded log. We use a purpose-built indexed table with query-on-demand keyset pagination.
  2. Central choke-point instrumentation. fire_entity_event() (api/dependencies.py:202) is already called by every entity route on create/update/delete and has _deps access to resolve names. The recorder hooks there for all entity CRUD. Non-entity events get explicit recorder.record(...) calls.
  3. Actor via ContextVar. Set inside verify_api_key (next to request.state.auth_label), default "system", reset per-request. The recorder reads it without threading actor through every call.
  4. Direct synchronous write on the event-loop thread (no separate buffered-writer subsystem — simpler, and the request already did a synchronous=FULL entity write). Cross-thread callers (zeroconf discovery thread) marshal via loop.call_soon_threadsafe, mirroring utils/log_broadcaster.py. The "server shutting down" event is recorded as the FIRST action in the lifespan shutdown block, before any teardown.
  5. Reuse the existing realtime bus. A new activity_logged event over /api/v1/events/ws (one events-ws.ts allowlist entry + test_events_ws_parity.py update). No new socket.
  6. Never log secrets. API-key tokens are never stored — only labels/ids.
  7. Differentiate from the existing Log Viewer. utils/log_broadcaster.py is an ephemeral 500-line debug-log tail. The audit log is persistent, structured, semantic. Cross-link in the Settings panel; never duplicate.
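
The actor-via-ContextVar pillar can be illustrated with a minimal sketch. It is not the project's code: `current_actor`, `record`, and `handle_request` are hypothetical names, and `handle_request` only stands in for the point where `verify_api_key` would set the actor. The sketch assumes each request runs in its own asyncio task, so a value set for one request is invisible to the others and falls back to "system" outside any request.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical module-level context variable; "system" is the default actor
# for events that do not originate from an authenticated request.
current_actor: ContextVar[str] = ContextVar("current_actor", default="system")

# In-memory stand-in for the audit table, used only for this illustration.
recorded: list[tuple[str, str]] = []


def record(message: str) -> None:
    """Append an audit entry, reading the actor from the current context."""
    recorded.append((current_actor.get(), message))


async def handle_request(auth_label: str, message: str) -> None:
    # Stand-in for the auth dependency: bind the actor for this request only.
    token = current_actor.set(auth_label)
    try:
        await asyncio.sleep(0)  # yield so concurrent requests interleave
        record(message)  # no actor argument is threaded through the call
    finally:
        current_actor.reset(token)  # restore the previous value


async def main() -> None:
    # gather() runs each coroutine in its own task with a copied context.
    await asyncio.gather(
        handle_request("dashboard-key", "created device"),
        handle_request("cli-key", "deleted scene"),
    )
    # Outside any request the default actor applies.
    record("server shutting down")


asyncio.run(main())
```

Because each task works on its own copy of the context, the two concurrent requests never see each other's actor, and the final call is attributed to "system".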

Build & Test Commands

  • Build (frontend): cd server && npm run build
  • Type-check (TS): cd server && npx tsc --noEmit
  • Test: cd server && py -3.13 -m pytest tests/ --no-cov -q
  • Lint (Python): cd server && ruff check src/ tests/ --fix
  • Events parity test (load-bearing for P2): included in pytest (tests/test_events_ws_parity.py)

Scope checks to the files actually edited: backend phases run ruff + pytest; frontend-only phases run tsc --noEmit + npm run build (no pytest/ruff). Phases touching both run both.

Phases

  • Phase 1: Storage — model, migration, repository [domain: data] → subplan
  • Phase 2: Recorder, actor context, retention, lifecycle [domain: backend] → subplan
  • Phase 3: Event instrumentation (4 categories) [domain: backend] → subplan
  • Phase 4: REST API — query/filter/export/settings/clear [domain: backend] → subplan
  • Phase 5: Frontend — Activity tab + smart filtering + live updates [domain: frontend] → subplan
  • Phase 6: Dashboard widget + Settings panel + docs [domain: frontend] → subplan

Parallelizable Phase Groups (Orchestrator mode only)

  • Phases 3 and 4 are parallelizable only after the schema is frozen at end of Phase 2. Both depend on the P2 recorder/schema; P4 registers the router in api/__init__.py, P3 edits dependencies.py/auth.py (different files, shared schema contract). To keep the Automated run simple and low-risk, they run sequentially (3 → 4) unless time pressure warrants worktree-isolated parallelism. Phases 5 → 6 are sequential (6 reuses P5 formatters and the feature module).

Phase Progress Log

| Phase | Domain | Status | Review | Build | Committed |
|---|---|---|---|---|---|
| Phase 1: Storage | data | Done | Passed | Passed | |
| Phase 2: Recorder/Retention | backend | Done | Passed | Passed | |
| Phase 3: Instrumentation | backend | Done | Passed | Passed | |
| Phase 4: REST API | backend | Done | Passed | Passed | |
| Phase 5: Frontend tab | frontend | Not Started | | | |
| Phase 6: Dashboard/Settings | frontend | Not Started | | | |

Outstanding Warnings

| Phase | Warning | Severity | Status (open / resolved / accepted) |
|---|---|---|---|
| 3 | Log injection via unauth mDNS device name/url into audit message | 🟠 High (security) | resolved: sanitize_display helper applied |
| 3 | Origin sanitizer missed spaces/NUL/ANSI | 🟠 High (security) | resolved: sanitize_display over netloc |
| 3 | Unauth auth-failure audit-write flood (no write-rate bound) | 🟠 High (security) | resolved: per-IP audit-record throttle (10s, capped) |
| 3 | Malformed-IPv6 Origin → urlparse ValueError into WS handler | 🟡 Warning | resolved: try/except guard |
| 3 | Throttle module-global state caused flaky test contamination | 🟡 Warning | resolved: autouse conftest reset fixture |
| 4 | Export held global DB write-lock across the stream (slow-client DoS) | 🟠 High (security) | resolved: chunked keyset export releases lock per batch |
| 4 | PUT /settings only AuthRequired → anon could disable auditing/prune trail | 🟠 High (security) | resolved: require_authenticated on settings PUT |
| 4 | CSV formula-injection missed leading TAB/CR | 🟡 Medium (security) | resolved: added \t/\r to guard |
| 4 | total count full-scans on every list request | 🔵 Low (perf) | accepted: bounded by retention; read-only; optional opt-in deferred |
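
The CSV formula-injection warning from Phase 4 can be made concrete with a short sketch. This is an assumption about how such a guard might look, not the project's implementation: `guard_cell` and `rows_to_csv` are hypothetical names, and the prefix set combines the commonly cited formula triggers with the tab and carriage-return characters named in the warning.

```python
import csv
import io
from collections.abc import Iterable, Sequence

# Leading characters that spreadsheet applications may interpret as the start
# of a formula. "\t" and "\r" are the two the Phase 4 warning says were missed.
_FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")


def guard_cell(value: object) -> str:
    """Return a CSV-safe string, neutralising formula-like leading characters."""
    text = "" if value is None else str(value)
    if text.startswith(_FORMULA_PREFIXES):
        # A leading apostrophe makes spreadsheets treat the cell as plain text.
        return "'" + text
    return text


def rows_to_csv(rows: Iterable[Sequence[object]]) -> str:
    """Serialise rows to CSV text, guarding every cell."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([guard_cell(cell) for cell in row])
    return buf.getvalue()
```

One trade-off of this approach is that legitimate values starting with a minus sign, such as negative numbers, are also prefixed; a real export would need to decide whether that is acceptable for its columns.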

Final Review

  • Comprehensive code review
  • Security review (auth/PII-in-logs/secrets/log-injection — triggered)
  • All Outstanding Warnings resolved or consciously accepted
  • Full build passes (npm run build + tsc --noEmit)
  • Full test suite passes (pytest)
  • Merged to master

Amendment Log

(Filled in if the plan is amended mid-implementation.)

  • 2026-06-09: Plan reviewer (pre-implementation) → ⚠️ with 3 Critical Gaps, all resolved before Phase 1: (G1) descoped non-existent API-key mutation events; (G2) dropped the buffered-writer subsystem for a direct synchronous-on-loop write with call_soon_threadsafe marshaling for thread-origin events; (G3) record "server shutting down" first in the shutdown block (no buffer to flush). Concerns folded in: device events via device_health_changed + device_discovered/_lost; actor ContextVar in verify_api_key; name-on-delete passed explicitly; settings-audit scoped + self-key excluded; parity allowlist before emit site. Adopted suggestions: rowid keyset tiebreaker; log the disable action. Deferred: setup-scaffold noise suppression.
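
The rowid keyset tiebreaker adopted above can be sketched against an in-memory SQLite table. The schema, the `ts` column, and the `fetch_page` and `iter_all` helpers are assumptions for illustration only; the real activity_log table and repository may differ. The point is that paging on the timestamp alone would skip or repeat rows sharing a timestamp, while adding rowid to the ordering and the cursor makes every page boundary unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the implicit rowid serves as the tiebreaker.
conn.execute("CREATE TABLE activity_log (ts INTEGER NOT NULL, message TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_activity_ts ON activity_log (ts)")
conn.executemany(
    "INSERT INTO activity_log (ts, message) VALUES (?, ?)",
    [(100, "a"), (100, "b"), (100, "c"), (200, "d"), (200, "e")],
)


def fetch_page(conn, limit, cursor=None):
    """Return (rows, next_cursor), newest first; cursor is a (ts, rowid) pair."""
    if cursor is None:
        sql = (
            "SELECT rowid, ts, message FROM activity_log "
            "ORDER BY ts DESC, rowid DESC LIMIT ?"
        )
        params = (limit,)
    else:
        ts, rowid = cursor
        # Strictly "older than the cursor": earlier timestamp, or the same
        # timestamp with a smaller rowid.
        sql = (
            "SELECT rowid, ts, message FROM activity_log "
            "WHERE ts < ? OR (ts = ? AND rowid < ?) "
            "ORDER BY ts DESC, rowid DESC LIMIT ?"
        )
        params = (ts, ts, rowid, limit)
    rows = conn.execute(sql, params).fetchall()
    # A short page means the log is exhausted, so no further cursor is issued.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor


def iter_all(conn, limit):
    """Walk the whole log one page at a time, as a chunked export might."""
    cursor = None
    while True:
        rows, cursor = fetch_page(conn, limit, cursor)
        yield from rows
        if cursor is None:
            break
```

Each page is an independent query, so a caller can release the database between batches, which is the same property the chunked export relies on.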