Files
ledgrab/REVIEW_TODO.md
T
alexei.dolgolyov 06273ba2bc chore(tooling): vex semantic-search config + REVIEW_TODO backlog
Add .vex.toml so `vex` is the project's primary code-search backend with
auto-update + semantic embeddings enabled. Ignore the .fastembed_cache/
directory that vex creates on first --semantic run. REVIEW_TODO.md
captures items flagged by the multi-agent production review that were
deliberately deferred (multi-day refactors, profile-first perf, and
design-sensitive security work).
2026-05-23 00:46:44 +03:00

9.3 KiB
Raw Blame History

Production Review — Remaining Items

Output of the multi-agent production review (security / Python / TypeScript / performance / architecture / code-quality). Each entry below is something the original audit flagged and the autonomous hardening pass deliberately did not address — either because it needs design input, profiling validation, or a multi-day refactor that should land in its own session.

The hardening pass landed everything else: see git log between master and the head of the review branch for the applied changes (URL-scheme + malicious-input rejection, IconSelect XSS escape, MiniSelect for forbidden plain <select>s, WebSocket Origin allow-list, /docs auth-gate, security headers middleware, streaming upload size caps, fire-and-forget task tracking + drain resilience in MQTT runtime, discovery_watcher task tracking, asyncio.gather return_exceptions, secret_box encryption for MQTT / Hue / Govee credentials with auto-migration, SSRF-validated update redirects, single source of truth for IP classification in utils/net_classify.py, allowlist + parity test for inbound WS events, typed Window globals, and more).


Architecture refactors (multi-day — own session)

  • Split core/processing/value_stream.py (1856 LOC, 14 stream classes) into a value_streams/ package. Each value-stream type gets its own file ≤300 LOC; manager.py holds ValueStreamManager.
  • Split storage/color_strip_source.py (1841 LOC, 18 source kinds) into a color_strip_sources/ package mirroring value_streams/.
  • Frontend file splitsgraph-editor.ts (2707), streams.ts (2335), value-sources.ts (1889), types.ts (1062). Highest-churn modules; mixed UI / state / network responsibilities.
  • Layering reversal: introduce a neutral domain/ package and move shared DTOs (FilterInstance, CalibrationConfig, etc.) into it so storage/ no longer imports core/. Eliminates 7+ layering violations and the lazy-import hacks used to break the resulting circulars.
  • main.py boot refactor — extract import-time side effects into bootstrap.py + create_app() factory. lifespan() becomes the single place that wires stores and managers.
  • DI consolidation — replace api/dependencies.py getter sprawl (30+ get_*() functions reading a process-global _deps dict) with a single typed get_container() dependency. Makes test-overrides trivial; ban direct getter calls in handler bodies.
  • Exception hierarchy — define ledgrab/errors.py (LedGrabError, NotFoundError, ValidationError, RemoteUnavailableError, SSRFBlockedError). Move HTTP translation into a FastAPI exception handler. Stop raising HTTPException from utils/safe_source.py.
  • Lazy-import audit — 289 in-function from ledgrab.* imports. Specifically core/processing/daylight_settings.py imports api.dependencies (core → api inversion). Pass the database in via the constructor instead of service-locator lookup.

Performance (profile before applying)

  • composite_stream.py blend modes — pre-allocate scratch buffers in _blend_override / overlay / hard_light / soft_light / difference / exclusion. Each currently allocates per frame (mul, scr, blended, np.where(...)). At 100 LEDs × 30 fps × N layers this adds up.
  • mapped_stream / composite_stream zone resize — replace the per-channel np.interp calls with a cached floor/ceil/frac LUT (same trick as wled_target_processor._fit_to_device) or a single cv2.resize call on the (N,3) array. np.interp allocates a new float64 array per channel per frame even on cache-hit.
  • processed_stream._processing_loop — add ping-pong output buffers and pass them as out= to filter process_strip() calls. Today every filter that returns a fresh allocation costs us a copy per frame. Also: the loop uses time.sleep instead of an event-driven wait on the input stream — input updates faster than 30 fps see up to frame_time of latency.
  • mqtt_client.py send_pixels — add a binary publish path (or at minimum cache the outer dict skeleton). Today every frame pixels.tolist() + json.dumps for ~300 LEDs × 30 fps × N devices.
  • Frontend static/js/features/color-strips/test.ts — cache ImageData per canvas (canvas._imageData); only re-create on dimension change; use a Uint32Array view to copy pixels in one loop instead of the per-pixel JS loop. Border-overlay rebuild on every frame should also be debounced to dimension changes only.
  • ws_stream.py composite branch — pre-allocate a bytearray sized to the largest frame and write into slices instead of b"".join(tobytes()) per layer every iteration. Same anti-pattern in wled_target_processor._broadcast_led_preview.
  • Preview broadcast slow-client guardasyncio.gather over preview clients waits for the slowest. Move to asyncio.wait with a timeout and drop slow clients, or fire-and-forget with a ws.application_state filter.

Security (deferred — non-trivial or design-sensitive)

  • Content-Security-Policy header — would need careful tuning because the UI uses inline event handlers / Jinja templates. Mis-set CSP would break the app silently. Defer until templates can move to event-delegated handlers, then add a strict policy.
  • api/auth.py exception specificity — 9 except Exception: sites. Most are intentional best-effort websocket.send_json swallows (the WS is already closed or about to be), but the auth decision path itself could be tightened to specific types (jwt.InvalidTokenError, OSError) + logger.exception for observability.
  • Hue bridge cert pinninghttpx.AsyncClient(verify=False) for Hue bridge (self-signed cert by design). Should record the certificate fingerprint at pairing time and pin it on subsequent requests; otherwise an on-path attacker can MITM the bridge.

Mechanical / code-quality (low risk, high line-count)

  • i18n parity328 keys missing in ru.json, 325 missing in zh.json. Examples: section.hide, filters.hsl_shift, filters.contrast, filters.temporal_blur, filters.audio_filter_template.desc. Russian and Chinese users currently see raw keys for these. This is translation work, not code work.
  • Optional[T]T | None (PEP 604) — large mechanical refactor across the codebase. Can be auto-fixed via ruff check --fix --select UP007. Worth doing once the file splits land.
  • Hot-path logger.error(f"...")logger.error("... %s", e) lazy-eval — mostly cosmetic; ~200 sites. The f-string still builds the message even when DEBUG is off.
  • Remaining (window as any) sites — typed global-types.d.ts is in place and new code uses window.foo directly, but ~80 existing sites still have the cast. Per-site mechanical cleanup. Add eslint-equivalent guard (TS rule) to prevent new ones.
  • Magic numbers → named constants in processing hot paths — _FILTER_RECHECK_EVERY_N_FRAMES = 30 in core/processing/processed_stream.py:159; 5 ms / 5 s / 30 iterations literals in wled_target_processor.py:890,893,915.
  • Standardise from __future__ import annotations across the codebase. Some modules use the future-annotation form, others stick with Optional[...]. Enforce one via ruff FA rules.

Test gaps

  • Route-level integration test for the WLED scheme inference — POST /api/v1/devices with {"url": "192.168.1.42", "device_type": "wled"} and assert the stored device has url == "http://192.168.1.42". The helper is exhaustively unit-tested but no integration test exercises the create/update flow end-to-end.
  • IPv6 public address regression — extend test_url_scheme.py with explicit assertions for 2001:db8::1 and similar public IPv6 literals (the bare-label fallback used to misclassify these). The helper does the right thing today via the IPv6 probe added during the hardening pass, but no test pins it.

Pre-existing issues surfaced during the audit (not in our diff)

These were flagged by the auditors but predate the review session — kept here as a future-work backlog:

  • icon-select.ts:_buildGrid item.icon is interpolated raw — documented as "trusted SVG by design". If callers ever feed user-supplied icon strings, that's an XSS sink. Audit every caller that builds IconSelectItem.icon from non-constant data and reject HTML there.
  • devices.py:461 manager.update_device_info(device_url=update_data.url) receives None when a PATCH omits url (rename / icon-only edit). The processor never re-syncs in that case. Should pass existing.url (after normalization) or skip the call.
  • asyncio.gather over uncapped client lists in preview broadcasts — slow clients block the loop. Already noted under Performance above; pre-existing.