Files
ledgrab/REVIEW_TODO.md
alexei.dolgolyov 48dbdb90e9 docs(review-todo): check off items addressed in 2026-05-23 autonomous pass
Mark devices.py PATCH fix, WLED route-level test, IPv6 regression
test, IconSelect XSS audit, PEP-604 sweep, magic-number constants,
api/auth except specificity, and the (window as any) static-access
cleanup as done. Defer items are unchanged: performance items keep
their "profile first" caveat, Hue cert pinning + CSP keep the design-
sensitive note, architecture refactors keep the multi-day banner,
and i18n parity is now annotated with the exact missing-key counts
(328 ru / 325 zh) so the next translator pass has a clear scope.
2026-05-23 01:22:41 +03:00

11 KiB
Raw Permalink Blame History

Production Review — Remaining Items

Output of the multi-agent production review (security / Python / TypeScript / performance / architecture / code-quality). Each entry below is something the original audit flagged and the autonomous hardening pass deliberately did not address — either because it needs design input, profiling validation, or a multi-day refactor that should land in its own session.

The hardening pass landed everything else: see git log between master and the head of the review branch for the applied changes (URL-scheme + malicious-input rejection, IconSelect XSS escape, MiniSelect for forbidden plain <select>s, WebSocket Origin allow-list, /docs auth-gate, security headers middleware, streaming upload size caps, fire-and-forget task tracking + drain resilience in MQTT runtime, discovery_watcher task tracking, asyncio.gather return_exceptions, secret_box encryption for MQTT / Hue / Govee credentials with auto-migration, SSRF-validated update redirects, single source of truth for IP classification in utils/net_classify.py, allowlist + parity test for inbound WS events, typed Window globals, and more).

Items completed in the follow-up autonomous pass (2026-05-23)

  • devices.py PATCH-without-url processor desyncupdate_device now falls back to existing.url so a rename / icon-only edit always tells the processor the current address.
  • WLED scheme integration test on /api/v1/devices — covers bare IPv4 (http://), public hostname (https://), and trailing-slash normalisation; lives in tests/api/routes/test_devices_routes.py.
  • IPv6 regression testtests/test_url_scheme.py now pins public IPv6 → https://, ULA → http://, and documents the Python-ipaddress documentation-prefix classification quirk.
  • IconSelect XSS audit + defence-in-depth — every caller audited (all feed icon from constants or lookup tables); added sanitiseIcon that rejects <script>, javascript:, on*=, <iframe>, <embed>, <object> and warns to the console.
  • Optional[T]T | None (PEP 604) — 55 sites cleaned via ruff --fix UP007. The remaining Union[…] aliases for pixel/colour/device-config typing converted by hand. UP007 now lives in pyproject.toml so the rule fires on new code.
  • Hot-path magic numbers → named constantsprocessed_stream gains _FILTER_RECHECK_EVERY_N_FRAMES; wled_target_processor gains _SKIP_REPOLL_SLEEP_SECONDS, _DIAGNOSTICS_REPORT_INTERVAL_SECONDS, _CSPT_RECHECK_EVERY_N_ITERATIONS.
  • api/auth.py except Exception tightening — every WS send / close site is now except _WS_SEND_BENIGN_EXC (a narrow tuple of WebSocketDisconnect / RuntimeError / ConnectionError / OSError). The auth-receive path catches the same set plus a final logger.exception catch-all for observability on truly unexpected shapes.
  • (window as any) cleanup — 59 static-property accesses migrated to typed window.<name> against global-types.d.ts. The remaining 7 sites use dynamic string indexing (window[fnName]) and intentionally keep the cast (documented in the typedef file).

Architecture refactors (multi-day — own session)

  • Split core/processing/value_stream.py (1856 LOC, 14 stream classes) into a value_streams/ package. Each value-stream type gets its own file ≤300 LOC; manager.py holds ValueStreamManager.
  • Split storage/color_strip_source.py (1841 LOC, 18 source kinds) into a color_strip_sources/ package mirroring value_streams/.
  • Frontend file splitsgraph-editor.ts (2707), streams.ts (2335), value-sources.ts (1889), types.ts (1062). Highest-churn modules; mixed UI / state / network responsibilities.
  • Layering reversal: introduce a neutral domain/ package and move shared DTOs (FilterInstance, CalibrationConfig, etc.) into it so storage/ no longer imports core/. Eliminates 7+ layering violations and the lazy-import hacks used to break the resulting circulars.
  • main.py boot refactor — extract import-time side effects into bootstrap.py + create_app() factory. lifespan() becomes the single place that wires stores and managers.
  • DI consolidation — replace api/dependencies.py getter sprawl (30+ get_*() functions reading a process-global _deps dict) with a single typed get_container() dependency. Makes test-overrides trivial; ban direct getter calls in handler bodies.
  • Exception hierarchy — define ledgrab/errors.py (LedGrabError, NotFoundError, ValidationError, RemoteUnavailableError, SSRFBlockedError). Move HTTP translation into a FastAPI exception handler. Stop raising HTTPException from utils/safe_source.py.
  • Lazy-import audit — 289 in-function from ledgrab.* imports. Specifically core/processing/daylight_settings.py imports api.dependencies (core → api inversion). Pass the database in via the constructor instead of service-locator lookup.

Performance (profile before applying)

  • composite_stream.py blend modes — pre-allocate scratch buffers in _blend_override / overlay / hard_light / soft_light / difference / exclusion. Each currently allocates per frame (mul, scr, blended, np.where(...)). At 100 LEDs × 30 fps × N layers this adds up.
  • mapped_stream / composite_stream zone resize — replace the per-channel np.interp calls with a cached floor/ceil/frac LUT (same trick as wled_target_processor._fit_to_device) or a single cv2.resize call on the (N,3) array. np.interp allocates a new float64 array per channel per frame even on cache-hit.
  • processed_stream._processing_loop — add ping-pong output buffers and pass them as out= to filter process_strip() calls. Today every filter that returns a fresh allocation costs us a copy per frame. Also: the loop uses time.sleep instead of an event-driven wait on the input stream — input updates faster than 30 fps see up to frame_time of latency.
  • mqtt_client.py send_pixels — add a binary publish path (or at minimum cache the outer dict skeleton). Today every frame pixels.tolist() + json.dumps for ~300 LEDs × 30 fps × N devices.
  • Frontend static/js/features/color-strips/test.ts — cache ImageData per canvas (canvas._imageData); only re-create on dimension change; use a Uint32Array view to copy pixels in one loop instead of the per-pixel JS loop. Border-overlay rebuild on every frame should also be debounced to dimension changes only.
  • ws_stream.py composite branch — pre-allocate a bytearray sized to the largest frame and write into slices instead of b"".join(tobytes()) per layer every iteration. Same anti-pattern in wled_target_processor._broadcast_led_preview.
  • Preview broadcast slow-client guardasyncio.gather over preview clients waits for the slowest. Move to asyncio.wait with a timeout and drop slow clients, or fire-and-forget with a ws.application_state filter.

Security (deferred — non-trivial or design-sensitive)

  • Content-Security-Policy header — would need careful tuning because the UI uses inline event handlers / Jinja templates. Mis-set CSP would break the app silently. Defer until templates can move to event-delegated handlers, then add a strict policy.
  • api/auth.py exception specificity — done in the 2026-05-23 pass; see top of file.
  • Hue bridge cert pinninghttpx.AsyncClient(verify=False) for Hue bridge (self-signed cert by design). Should record the certificate fingerprint at pairing time and pin it on subsequent requests; otherwise an on-path attacker can MITM the bridge.

Mechanical / code-quality (low risk, high line-count)

  • i18n parity — confirmed 328 keys missing in ru.json and 325 missing in zh.json against the canonical English file. Translation work — needs a native speaker, not a machine-translation pass. Run py scripts/diff_locale_keys.py (or copy the diff block out of the 2026-05-23 pass log) to get the exact key list.
  • Optional[T]T | None — done; UP007 now enforced via pyproject.toml so the rule prevents regressions.
  • Hot-path logger.error(f"...")logger.error("... %s", e) lazy-eval — 658 sites flagged by ruff --select G004. Deferred because it is genuinely cosmetic at ERROR level (always emitted) and the cumulative cost is negligible. Worth doing if/when ruff gains a safe autofix, or as a Codemod in a dedicated session.
  • Remaining (window as any) sites — 59 migrated to typed window.<name> access; the 7 surviving sites use dynamic string indexing and are documented as the legitimate exception.
  • Magic numbers → named constants — done; see processed_stream and wled_target_processor constants at the top of each module.
  • Standardise from __future__ import annotations — partially mooted by the UP007 cleanup. Files that previously relied on Optional/Union no longer need the future import; the few that already use __future__ keep it for forward-reference convenience. A blanket policy would still help — leave as a stylistic followup.

Test gaps

  • Route-level integration test for the WLED scheme inference — done; covers create + update in tests/api/routes/test_devices_routes.py::TestWLEDSchemeInference.
  • IPv6 public address regression — done; pinned in tests/test_url_scheme.py for both bracketless and bracketed forms.

Pre-existing issues surfaced during the audit (not in our diff)

These were flagged by the auditors but predate the review session — kept here as a future-work backlog:

  • icon-select.ts:_buildGrid item.icon is interpolated raw — audited; all callers pass project-owned literals or table-lookup results. Added a runtime sanitiser as defence-in-depth.
  • devices.py manager.update_device_info(device_url=update_data.url) None-on-PATCH path — fixed; now falls back to existing.url.
  • asyncio.gather over uncapped client lists in preview broadcasts — slow clients block the loop. Already noted under Performance above; pre-existing.