ledgrab/REVIEW_TODO.md

# Production Review — Remaining Items

Output of the multi-agent production review (security / Python / TypeScript /
performance / architecture / code-quality). Each entry below is something
the original audit flagged and the autonomous hardening pass deliberately
did **not** address — either because it needs design input, profiling
validation, or a multi-day refactor that should land in its own session.

The hardening pass landed everything else: see git log between `master` and
the head of the review branch for the applied changes (URL-scheme +
malicious-input rejection, IconSelect XSS escape, MiniSelect for forbidden
plain `<select>`s, WebSocket Origin allow-list, /docs auth-gate, security
headers middleware, streaming upload size caps, fire-and-forget task
tracking + drain resilience in MQTT runtime, discovery_watcher task
tracking, asyncio.gather return_exceptions, secret_box encryption for MQTT
/ Hue / Govee credentials with auto-migration, SSRF-validated update
redirects, single source of truth for IP classification in
`utils/net_classify.py`, allowlist + parity test for inbound WS events,
typed `Window` globals, and more).

---

## Architecture refactors (multi-day — own session)

- [ ] **Split `core/processing/value_stream.py`** (1856 LOC, 14 stream classes)
      into a `value_streams/` package. Each value-stream type gets its own
      file ≤300 LOC; `manager.py` holds `ValueStreamManager`.
- [ ] **Split `storage/color_strip_source.py`** (1841 LOC, 18 source kinds)
      into a `color_strip_sources/` package mirroring `value_streams/`.
- [ ] **Frontend file splits** — `graph-editor.ts` (2707), `streams.ts`
      (2335), `value-sources.ts` (1889), `types.ts` (1062). Highest-churn
      modules; mixed UI / state / network responsibilities.
- [ ] **Layering reversal**: introduce a neutral `domain/` package and move
      shared DTOs (`FilterInstance`, `CalibrationConfig`, etc.) into it so
      `storage/` no longer imports `core/`. Eliminates 7+ layering
      violations and the lazy-import hacks used to break the resulting
      circulars.
- [ ] **`main.py` boot refactor** — extract import-time side effects into
      `bootstrap.py` + `create_app()` factory. `lifespan()` becomes the
      single place that wires stores and managers.
- [ ] **DI consolidation** — replace `api/dependencies.py` getter sprawl
      (30+ `get_*()` functions reading a process-global `_deps` dict) with
      a single typed `get_container()` dependency. Makes test-overrides
      trivial; ban direct getter calls in handler bodies.
- [ ] **Exception hierarchy** — define `ledgrab/errors.py` (`LedGrabError`,
      `NotFoundError`, `ValidationError`, `RemoteUnavailableError`,
      `SSRFBlockedError`). Move HTTP translation into a FastAPI exception
      handler. Stop raising `HTTPException` from `utils/safe_source.py`.
- [ ] **Lazy-import audit** — 289 in-function `from ledgrab.*` imports.
      Specifically `core/processing/daylight_settings.py` imports
      `api.dependencies` (core → api inversion). Pass the database in via
      the constructor instead of service-locator lookup.

## Performance (profile before applying)

- [ ] **`composite_stream.py` blend modes** — pre-allocate scratch buffers
      in `_blend_override / overlay / hard_light / soft_light / difference
      / exclusion`. Each currently allocates per frame (`mul`, `scr`,
      `blended`, `np.where(...)`). At 100 LEDs × 30 fps × N layers this
      adds up.
- [ ] **`mapped_stream` / `composite_stream` zone resize** — replace the
      per-channel `np.interp` calls with a cached `floor/ceil/frac` LUT
      (same trick as `wled_target_processor._fit_to_device`) or a single
      `cv2.resize` call on the (N,3) array. `np.interp` allocates a new
      `float64` array per channel per frame even on cache-hit.
- [ ] **`processed_stream._processing_loop`** — add ping-pong output
      buffers and pass them as `out=` to filter `process_strip()` calls.
      Today every filter that returns a fresh allocation costs us a copy
      per frame. Also: the loop uses `time.sleep` instead of an
      event-driven wait on the input stream — input updates faster than
      30 fps see up to `frame_time` of latency.
- [ ] **`mqtt_client.py` `send_pixels`** — add a binary publish path (or
      at minimum cache the outer dict skeleton). Today every frame
      `pixels.tolist()` + `json.dumps` for ~300 LEDs × 30 fps × N devices.
- [ ] **Frontend `static/js/features/color-strips/test.ts`** — cache
      `ImageData` per canvas (`canvas._imageData`); only re-create on
      dimension change; use a `Uint32Array` view to copy pixels in one
      loop instead of the per-pixel JS loop. Border-overlay rebuild on
      every frame should also be debounced to dimension changes only.
- [ ] **`ws_stream.py` composite branch** — pre-allocate a `bytearray`
      sized to the largest frame and write into slices instead of
      `b"".join(tobytes()) per layer` every iteration. Same anti-pattern
      in `wled_target_processor._broadcast_led_preview`.
- [ ] **Preview broadcast slow-client guard** — `asyncio.gather` over
      preview clients waits for the slowest. Move to `asyncio.wait` with a
      timeout and drop slow clients, or fire-and-forget with a
      `ws.application_state` filter.

## Security (deferred — non-trivial or design-sensitive)

- [ ] **Content-Security-Policy header** — would need careful tuning
      because the UI uses inline event handlers / Jinja templates.
      Mis-set CSP would break the app silently. Defer until templates can
      move to event-delegated handlers, then add a strict policy.
- [ ] **`api/auth.py` exception specificity** — 9 `except Exception:`
      sites. Most are intentional best-effort `websocket.send_json`
      swallows (the WS is already closed or about to be), but the auth
      decision path itself could be tightened to specific types
      (`jwt.InvalidTokenError`, `OSError`) + `logger.exception` for
      observability.
- [ ] **Hue bridge cert pinning** — `httpx.AsyncClient(verify=False)` for
      Hue bridge (self-signed cert by design). Should record the
      certificate fingerprint at pairing time and pin it on subsequent
      requests; otherwise an on-path attacker can MITM the bridge.

## Mechanical / code-quality (low risk, high line-count)

- [ ] **i18n parity** — **328** keys missing in `ru.json`, **325** missing
      in `zh.json`. Examples: `section.hide`, `filters.hsl_shift`,
      `filters.contrast`, `filters.temporal_blur`,
      `filters.audio_filter_template.desc`. Russian and Chinese users
      currently see raw keys for these. This is translation work, not
      code work.
- [ ] **`Optional[T]` → `T | None`** (PEP 604) — large mechanical refactor
      across the codebase. Can be auto-fixed via `ruff check --fix
      --select UP007`. Worth doing once the file splits land.
- [ ] **Hot-path `logger.error(f"...")` → `logger.error("... %s", e)`**
      lazy-eval — mostly cosmetic; ~200 sites. The f-string still builds
      the message even when DEBUG is off.
- [ ] **Remaining `(window as any)` sites** — typed `global-types.d.ts`
      is in place and new code uses `window.foo` directly, but ~80
      existing sites still have the cast. Per-site mechanical cleanup.
      Add `eslint`-equivalent guard (TS rule) to prevent new ones.
- [ ] **Magic numbers → named constants** in processing hot paths —
      `_FILTER_RECHECK_EVERY_N_FRAMES = 30` in
      `core/processing/processed_stream.py:159`; `5 ms` / `5 s` /
      `30 iterations` literals in `wled_target_processor.py:890,893,915`.
- [ ] **Standardise `from __future__ import annotations`** across the
      codebase. Some modules use the future-annotation form, others stick
      with `Optional[...]`. Enforce one via ruff `FA` rules.

## Test gaps

- [ ] **Route-level integration test** for the WLED scheme inference —
      POST `/api/v1/devices` with `{"url": "192.168.1.42",
      "device_type": "wled"}` and assert the stored device has
      `url == "http://192.168.1.42"`. The helper is exhaustively
      unit-tested but no integration test exercises the create/update
      flow end-to-end.
- [ ] **IPv6 public address regression** — extend `test_url_scheme.py`
      with explicit assertions for `2001:db8::1` and similar public IPv6
      literals (the bare-label fallback used to misclassify these). The
      helper does the right thing today via the IPv6 probe added during
      the hardening pass, but no test pins it.

## Pre-existing issues surfaced during the audit (not in our diff)

These were flagged by the auditors but predate the review session — kept
here as a future-work backlog:

- [ ] **`icon-select.ts:_buildGrid` `item.icon` is interpolated raw** —
      documented as "trusted SVG by design". If callers ever feed
      user-supplied icon strings, that's an XSS sink. Audit every caller
      that builds `IconSelectItem.icon` from non-constant data and
      reject HTML there.
- [ ] **`devices.py:461` `manager.update_device_info(device_url=update_data.url)`**
      receives `None` when a PATCH omits `url` (rename / icon-only edit).
      The processor never re-syncs in that case. Should pass
      `existing.url` (after normalization) or skip the call.
- [ ] **`asyncio.gather` over uncapped client lists** in preview broadcasts
      — slow clients block the loop. Already noted under Performance
      above; pre-existing.