48dbdb90e9
Mark devices.py PATCH fix, WLED route-level test, IPv6 regression test, IconSelect XSS audit, PEP-604 sweep, magic-number constants, api/auth except specificity, and the (window as any) static-access cleanup as done. Defer items are unchanged: performance items keep their "profile first" caveat, Hue cert pinning + CSP keep the design- sensitive note, architecture refactors keep the multi-day banner, and i18n parity is now annotated with the exact missing-key counts (328 ru / 325 zh) so the next translator pass has a clear scope.
181 lines
11 KiB
Markdown
181 lines
11 KiB
Markdown
# Production Review — Remaining Items
|
||
|
||
Output of the multi-agent production review (security / Python / TypeScript /
|
||
performance / architecture / code-quality). Each entry below is something
|
||
the original audit flagged and the autonomous hardening pass deliberately
|
||
did **not** address — either because it needs design input, profiling
|
||
validation, or a multi-day refactor that should land in its own session.
|
||
|
||
The hardening pass landed everything else: see git log between `master` and
|
||
the head of the review branch for the applied changes (URL-scheme +
|
||
malicious-input rejection, IconSelect XSS escape, MiniSelect for forbidden
|
||
plain `<select>`s, WebSocket Origin allow-list, /docs auth-gate, security
|
||
headers middleware, streaming upload size caps, fire-and-forget task
|
||
tracking + drain resilience in MQTT runtime, discovery_watcher task
|
||
tracking, asyncio.gather return_exceptions, secret_box encryption for MQTT
|
||
/ Hue / Govee credentials with auto-migration, SSRF-validated update
|
||
redirects, single source of truth for IP classification in
|
||
`utils/net_classify.py`, allowlist + parity test for inbound WS events,
|
||
typed `Window` globals, and more).
|
||
|
||
## Items completed in the follow-up autonomous pass (2026-05-23)
|
||
|
||
- [x] **devices.py PATCH-without-url processor desync** — `update_device`
|
||
now falls back to `existing.url` so a rename / icon-only edit
|
||
always tells the processor the current address.
|
||
- [x] **WLED scheme integration test** on `/api/v1/devices` — covers
|
||
bare IPv4 (`http://`), public hostname (`https://`), and trailing-slash
|
||
normalisation; lives in `tests/api/routes/test_devices_routes.py`.
|
||
- [x] **IPv6 regression test** — `tests/test_url_scheme.py` now pins
|
||
public IPv6 → `https://`, ULA → `http://`, and documents the
|
||
Python-`ipaddress` documentation-prefix classification quirk.
|
||
- [x] **IconSelect XSS audit + defence-in-depth** — every caller
|
||
audited (all feed `icon` from constants or lookup tables); added
|
||
`sanitiseIcon` that rejects `<script>`, `javascript:`, `on*=`,
|
||
`<iframe>`, `<embed>`, `<object>` and warns to the console.
|
||
- [x] **`Optional[T]` → `T | None` (PEP 604)** — 55 sites cleaned via
|
||
`ruff --fix UP007`. The remaining `Union[…]` aliases for
|
||
pixel/colour/device-config typing converted by hand. `UP007` now
|
||
lives in `pyproject.toml` so the rule fires on new code.
|
||
- [x] **Hot-path magic numbers → named constants** — `processed_stream`
|
||
gains `_FILTER_RECHECK_EVERY_N_FRAMES`; `wled_target_processor`
|
||
gains `_SKIP_REPOLL_SLEEP_SECONDS`, `_DIAGNOSTICS_REPORT_INTERVAL_SECONDS`,
|
||
`_CSPT_RECHECK_EVERY_N_ITERATIONS`.
|
||
- [x] **`api/auth.py` `except Exception` tightening** — every WS send /
|
||
close site is now `except _WS_SEND_BENIGN_EXC` (a narrow tuple of
|
||
WebSocketDisconnect / RuntimeError / ConnectionError / OSError).
|
||
The auth-receive path catches the same set plus a final
|
||
`logger.exception` catch-all for observability on truly unexpected
|
||
shapes.
|
||
- [x] **`(window as any)` cleanup** — 59 static-property accesses
|
||
migrated to typed `window.<name>` against `global-types.d.ts`. The
|
||
remaining 7 sites use dynamic string indexing (`window[fnName]`)
|
||
and intentionally keep the cast (documented in the typedef file).
|
||
|
||
---
|
||
|
||
## Architecture refactors (multi-day — own session)
|
||
|
||
- [ ] **Split `core/processing/value_stream.py`** (1856 LOC, 14 stream classes)
|
||
into a `value_streams/` package. Each value-stream type gets its own
|
||
file ≤300 LOC; `manager.py` holds `ValueStreamManager`.
|
||
- [ ] **Split `storage/color_strip_source.py`** (1841 LOC, 18 source kinds)
|
||
into a `color_strip_sources/` package mirroring `value_streams/`.
|
||
- [ ] **Frontend file splits** — `graph-editor.ts` (2707), `streams.ts`
|
||
(2335), `value-sources.ts` (1889), `types.ts` (1062). Highest-churn
|
||
modules; mixed UI / state / network responsibilities.
|
||
- [ ] **Layering reversal**: introduce a neutral `domain/` package and move
|
||
shared DTOs (`FilterInstance`, `CalibrationConfig`, etc.) into it so
|
||
`storage/` no longer imports `core/`. Eliminates 7+ layering
|
||
violations and the lazy-import hacks used to break the resulting
|
||
circulars.
|
||
- [ ] **`main.py` boot refactor** — extract import-time side effects into
|
||
`bootstrap.py` + `create_app()` factory. `lifespan()` becomes the
|
||
single place that wires stores and managers.
|
||
- [ ] **DI consolidation** — replace `api/dependencies.py` getter sprawl
|
||
(30+ `get_*()` functions reading a process-global `_deps` dict) with
|
||
a single typed `get_container()` dependency. Makes test-overrides
|
||
trivial; ban direct getter calls in handler bodies.
|
||
- [ ] **Exception hierarchy** — define `ledgrab/errors.py` (`LedGrabError`,
|
||
`NotFoundError`, `ValidationError`, `RemoteUnavailableError`,
|
||
`SSRFBlockedError`). Move HTTP translation into a FastAPI exception
|
||
handler. Stop raising `HTTPException` from `utils/safe_source.py`.
|
||
- [ ] **Lazy-import audit** — 289 in-function `from ledgrab.*` imports.
|
||
Specifically `core/processing/daylight_settings.py` imports
|
||
`api.dependencies` (core → api inversion). Pass the database in via
|
||
the constructor instead of service-locator lookup.
|
||
|
||
## Performance (profile before applying)
|
||
|
||
- [ ] **`composite_stream.py` blend modes** — pre-allocate scratch buffers
|
||
in `_blend_override / overlay / hard_light / soft_light / difference
|
||
/ exclusion`. Each currently allocates per frame (`mul`, `scr`,
|
||
`blended`, `np.where(...)`). At 100 LEDs × 30 fps × N layers this
|
||
adds up.
|
||
- [ ] **`mapped_stream` / `composite_stream` zone resize** — replace the
|
||
per-channel `np.interp` calls with a cached `floor/ceil/frac` LUT
|
||
(same trick as `wled_target_processor._fit_to_device`) or a single
|
||
`cv2.resize` call on the (N,3) array. `np.interp` allocates a new
|
||
`float64` array per channel per frame even on cache-hit.
|
||
- [ ] **`processed_stream._processing_loop`** — add ping-pong output
|
||
buffers and pass them as `out=` to filter `process_strip()` calls.
|
||
Today every filter that returns a fresh allocation costs us a copy
|
||
per frame. Also: the loop uses `time.sleep` instead of an
|
||
event-driven wait on the input stream — input updates faster than
|
||
30 fps see up to `frame_time` of latency.
|
||
- [ ] **`mqtt_client.py` `send_pixels`** — add a binary publish path (or
|
||
at minimum cache the outer dict skeleton). Today every frame
|
||
`pixels.tolist()` + `json.dumps` for ~300 LEDs × 30 fps × N devices.
|
||
- [ ] **Frontend `static/js/features/color-strips/test.ts`** — cache
|
||
`ImageData` per canvas (`canvas._imageData`); only re-create on
|
||
dimension change; use a `Uint32Array` view to copy pixels in one
|
||
loop instead of the per-pixel JS loop. Border-overlay rebuild on
|
||
every frame should also be debounced to dimension changes only.
|
||
- [ ] **`ws_stream.py` composite branch** — pre-allocate a `bytearray`
|
||
sized to the largest frame and write into slices instead of
|
||
`b"".join(tobytes()) per layer` every iteration. Same anti-pattern
|
||
in `wled_target_processor._broadcast_led_preview`.
|
||
- [ ] **Preview broadcast slow-client guard** — `asyncio.gather` over
|
||
preview clients waits for the slowest. Move to `asyncio.wait` with a
|
||
timeout and drop slow clients, or fire-and-forget with a
|
||
`ws.application_state` filter.
|
||
|
||
## Security (deferred — non-trivial or design-sensitive)
|
||
|
||
- [ ] **Content-Security-Policy header** — would need careful tuning
|
||
because the UI uses inline event handlers / Jinja templates.
|
||
Mis-set CSP would break the app silently. Defer until templates can
|
||
move to event-delegated handlers, then add a strict policy.
|
||
- [x] **`api/auth.py` exception specificity** — done in the 2026-05-23
|
||
pass; see top of file.
|
||
- [ ] **Hue bridge cert pinning** — `httpx.AsyncClient(verify=False)` for
|
||
Hue bridge (self-signed cert by design). Should record the
|
||
certificate fingerprint at pairing time and pin it on subsequent
|
||
requests; otherwise an on-path attacker can MITM the bridge.
|
||
|
||
## Mechanical / code-quality (low risk, high line-count)
|
||
|
||
- [ ] **i18n parity** — confirmed **328** keys missing in `ru.json` and
|
||
**325** missing in `zh.json` against the canonical English file.
|
||
Translation work — needs a native speaker, not a machine-translation
|
||
pass. Run `py scripts/diff_locale_keys.py` (or copy the diff block
|
||
out of the 2026-05-23 pass log) to get the exact key list.
|
||
- [x] **`Optional[T]` → `T | None`** — done; `UP007` now enforced via
|
||
`pyproject.toml` so the rule prevents regressions.
|
||
- [ ] **Hot-path `logger.error(f"...")` → `logger.error("... %s", e)`**
|
||
lazy-eval — 658 sites flagged by `ruff --select G004`. Deferred
|
||
because it is genuinely cosmetic at ERROR level (always emitted)
|
||
and the cumulative cost is negligible. Worth doing if/when ruff
|
||
gains a safe autofix, or as a Codemod in a dedicated session.
|
||
- [x] **Remaining `(window as any)` sites** — 59 migrated to typed
|
||
`window.<name>` access; the 7 surviving sites use dynamic string
|
||
indexing and are documented as the legitimate exception.
|
||
- [x] **Magic numbers → named constants** — done; see `processed_stream`
|
||
and `wled_target_processor` constants at the top of each module.
|
||
- [ ] **Standardise `from __future__ import annotations`** — partially
|
||
mooted by the UP007 cleanup. Files that previously relied on
|
||
`Optional`/`Union` no longer need the future import; the few that
|
||
already use `__future__` keep it for forward-reference convenience.
|
||
A blanket policy would still help — leave as a stylistic followup.
|
||
|
||
## Test gaps
|
||
|
||
- [x] **Route-level integration test** for the WLED scheme inference —
|
||
done; covers create + update in `tests/api/routes/test_devices_routes.py::TestWLEDSchemeInference`.
|
||
- [x] **IPv6 public address regression** — done; pinned in
|
||
`tests/test_url_scheme.py` for both bracketless and bracketed forms.
|
||
|
||
## Pre-existing issues surfaced during the audit (not in our diff)
|
||
|
||
These were flagged by the auditors but predate the review session — kept
|
||
here as a future-work backlog:
|
||
|
||
- [x] **`icon-select.ts:_buildGrid` `item.icon` is interpolated raw** —
|
||
audited; all callers pass project-owned literals or table-lookup
|
||
results. Added a runtime sanitiser as defence-in-depth.
|
||
- [x] **`devices.py` `manager.update_device_info(device_url=update_data.url)`**
|
||
None-on-PATCH path — fixed; now falls back to `existing.url`.
|
||
- [ ] **`asyncio.gather` over uncapped client lists** in preview broadcasts
|
||
— slow clients block the loop. Already noted under Performance
|
||
above; pre-existing.
|