Add .vex.toml so `vex` is the project's primary code-search backend with auto-update + semantic embeddings enabled. Ignore the .fastembed_cache/ directory that vex creates on first --semantic run. REVIEW_TODO.md captures items flagged by the multi-agent production review that were deliberately deferred (multi-day refactors, profile-first perf, and design-sensitive security work).
Production Review — Remaining Items
Output of the multi-agent production review (security / Python / TypeScript / performance / architecture / code-quality). Each entry below is something the original audit flagged and the autonomous hardening pass deliberately did not address — either because it needs design input, profiling validation, or a multi-day refactor that should land in its own session.
The hardening pass landed everything else: see git log between master and
the head of the review branch for the applied changes (URL-scheme +
malicious-input rejection, IconSelect XSS escape, MiniSelect for forbidden
plain <select>s, WebSocket Origin allow-list, /docs auth-gate, security
headers middleware, streaming upload size caps, fire-and-forget task
tracking + drain resilience in MQTT runtime, discovery_watcher task
tracking, asyncio.gather return_exceptions, secret_box encryption for MQTT
/ Hue / Govee credentials with auto-migration, SSRF-validated update
redirects, single source of truth for IP classification in
utils/net_classify.py, allowlist + parity test for inbound WS events,
typed Window globals, and more).
Architecture refactors (multi-day — own session)
- Split `core/processing/value_stream.py` (1856 LOC, 14 stream classes) into a `value_streams/` package. Each value-stream type gets its own file ≤300 LOC; `manager.py` holds `ValueStreamManager`.
- Split `storage/color_strip_source.py` (1841 LOC, 18 source kinds) into a `color_strip_sources/` package mirroring `value_streams/`.
- Frontend file splits: `graph-editor.ts` (2707), `streams.ts` (2335), `value-sources.ts` (1889), `types.ts` (1062). Highest-churn modules; mixed UI / state / network responsibilities.
- Layering reversal: introduce a neutral `domain/` package and move shared DTOs (`FilterInstance`, `CalibrationConfig`, etc.) into it so `storage/` no longer imports `core/`. Eliminates 7+ layering violations and the lazy-import hacks used to break the resulting circulars.
- `main.py` boot refactor: extract import-time side effects into `bootstrap.py` + `create_app()` factory. `lifespan()` becomes the single place that wires stores and managers.
- DI consolidation: replace `api/dependencies.py` getter sprawl (30+ `get_*()` functions reading a process-global `_deps` dict) with a single typed `get_container()` dependency. Makes test-overrides trivial; ban direct getter calls in handler bodies.
- Exception hierarchy: define `ledgrab/errors.py` (`LedGrabError`, `NotFoundError`, `ValidationError`, `RemoteUnavailableError`, `SSRFBlockedError`). Move HTTP translation into a FastAPI exception handler. Stop raising `HTTPException` from `utils/safe_source.py`.
- Lazy-import audit: 289 in-function `from ledgrab.*` imports. Specifically, `core/processing/daylight_settings.py` imports `api.dependencies` (core → api inversion). Pass the database in via the constructor instead of service-locator lookup.
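As a rough illustration of the exception-hierarchy item, the sketch below defines the five error classes named above and keeps the HTTP translation in one lookup function. The status codes chosen for each class and the `status_for` helper are assumptions made for illustration; the review does not specify them.

```python
from http import HTTPStatus


class LedGrabError(Exception):
    """Base class for domain errors; it carries no HTTP knowledge itself."""


class NotFoundError(LedGrabError):
    pass


class ValidationError(LedGrabError):
    pass


class RemoteUnavailableError(LedGrabError):
    pass


class SSRFBlockedError(LedGrabError):
    pass


# Assumed mapping: the actual status codes are a design decision for the project.
_STATUS_BY_ERROR = {
    NotFoundError: HTTPStatus.NOT_FOUND,
    ValidationError: HTTPStatus.UNPROCESSABLE_ENTITY,
    RemoteUnavailableError: HTTPStatus.BAD_GATEWAY,
    SSRFBlockedError: HTTPStatus.FORBIDDEN,
}


def status_for(exc: LedGrabError) -> int:
    """Hypothetical helper: walk the MRO so subclasses inherit their parent's status."""
    for cls in type(exc).__mro__:
        status = _STATUS_BY_ERROR.get(cls)
        if status is not None:
            return int(status)
    # Unmapped domain errors fall back to a generic server error.
    return int(HTTPStatus.INTERNAL_SERVER_ERROR)
```

A single handler registered through FastAPI's `app.add_exception_handler(LedGrabError, ...)` could then build its response from `status_for(exc)`, which would let modules such as `utils/safe_source.py` raise plain domain errors.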
Performance (profile before applying)
- `composite_stream.py` blend modes: pre-allocate scratch buffers in `_blend_override / overlay / hard_light / soft_light / difference / exclusion`. Each currently allocates per frame (`mul`, `scr`, `blended`, `np.where(...)`). At 100 LEDs × 30 fps × N layers this adds up.
- `mapped_stream` / `composite_stream` zone resize: replace the per-channel `np.interp` calls with a cached `floor/ceil/frac` LUT (same trick as `wled_target_processor._fit_to_device`) or a single `cv2.resize` call on the (N,3) array. `np.interp` allocates a new `float64` array per channel per frame even on cache-hit.
- `processed_stream._processing_loop`: add ping-pong output buffers and pass them as `out=` to filter `process_strip()` calls. Today every filter that returns a fresh allocation costs us a copy per frame. Also, the loop uses `time.sleep` instead of an event-driven wait on the input stream, so input updates arriving faster than 30 fps see up to `frame_time` of latency.
- `mqtt_client.py` `send_pixels`: add a binary publish path (or at minimum cache the outer dict skeleton). Today every frame runs `pixels.tolist()` + `json.dumps` for ~300 LEDs × 30 fps × N devices.
- Frontend `static/js/features/color-strips/test.ts`: cache `ImageData` per canvas (`canvas._imageData`); only re-create on dimension change; use a `Uint32Array` view to copy pixels in one loop instead of the per-pixel JS loop. The border overlay is currently rebuilt on every frame; it should also be rebuilt only on dimension changes.
- `ws_stream.py` composite branch: pre-allocate a `bytearray` sized to the largest frame and write into slices instead of `b"".join(tobytes()) per layer` every iteration. Same anti-pattern in `wled_target_processor._broadcast_led_preview`.
- Preview broadcast slow-client guard: `asyncio.gather` over preview clients waits for the slowest. Move to `asyncio.wait` with a timeout and drop slow clients, or fire-and-forget with a `ws.application_state` filter.
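The slow-client guard in the last item could look roughly like the sketch below, which uses `asyncio.wait` with a timeout and reports which clients should be dropped. The function name `broadcast_with_deadline`, the `clients` mapping of id to send callable, and the decision to treat failed sends as drop candidates are all assumptions; the real preview broadcast code is not shown in this document.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical shape: each client is represented by an async "send bytes" callable.
Send = Callable[[bytes], Awaitable[None]]


async def broadcast_with_deadline(
    clients: dict[str, Send], frame: bytes, timeout: float
) -> set[str]:
    """Send `frame` to every client; return ids that missed the deadline or failed."""
    if not clients:
        return set()  # asyncio.wait() raises ValueError on an empty set

    tasks = {asyncio.ensure_future(send(frame)): cid for cid, send in clients.items()}
    done, pending = await asyncio.wait(tasks.keys(), timeout=timeout)

    dropped: set[str] = set()
    for task in pending:
        # Too slow: cancel the send so it cannot hold up the next frame.
        task.cancel()
        dropped.add(tasks[task])
    if pending:
        # Let the cancelled tasks finish unwinding before returning.
        await asyncio.gather(*pending, return_exceptions=True)

    for task in done:
        # Retrieving the exception also avoids "exception was never retrieved" warnings.
        if task.cancelled() or task.exception() is not None:
            dropped.add(tasks[task])
    return dropped
```

The caller would remove the returned ids from its client list, so one stalled WebSocket costs at most `timeout` seconds per frame rather than blocking every other client.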
Security (deferred — non-trivial or design-sensitive)
- Content-Security-Policy header: would need careful tuning because the UI uses inline event handlers / Jinja templates. A mis-set CSP would break the app silently. Defer until templates can move to event-delegated handlers, then add a strict policy.
- `api/auth.py` exception specificity: 9 `except Exception:` sites. Most are intentional best-effort `websocket.send_json` swallows (the WS is already closed or about to be), but the auth decision path itself could be tightened to specific types (`jwt.InvalidTokenError`, `OSError`) + `logger.exception` for observability.
- Hue bridge cert pinning: `httpx.AsyncClient(verify=False)` for Hue bridge (self-signed cert by design). Should record the certificate fingerprint at pairing time and pin it on subsequent requests; otherwise an on-path attacker can MITM the bridge.
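For the Hue pinning item, the fingerprint bookkeeping might be sketched as follows using only the standard library. The helper names (`fingerprint_sha256`, `matches_pin`, `fetch_der_cert`) are hypothetical, and the choice of SHA-256 over the DER-encoded certificate is an assumption rather than something the review prescribes.

```python
import hashlib
import hmac
import ssl


def fingerprint_sha256(der_cert: bytes) -> str:
    """Hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_hex: str) -> bool:
    """Constant-time comparison of a presented certificate against the stored pin."""
    return hmac.compare_digest(fingerprint_sha256(der_cert), pinned_hex.lower())


def fetch_der_cert(host: str, port: int = 443) -> bytes:
    """Fetch the bridge certificate without chain validation (it is self-signed)."""
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)
```

At pairing time the application would store `fingerprint_sha256(fetch_der_cert(host))`. Note that a check made on a separate connection is only a partial measure: proper pinning has to verify the certificate presented on the same TLS connection that carries the request, which this sketch does not attempt.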
Mechanical / code-quality (low risk, high line-count)
- i18n parity: 328 keys missing in `ru.json`, 325 missing in `zh.json`. Examples: `section.hide`, `filters.hsl_shift`, `filters.contrast`, `filters.temporal_blur`, `filters.audio_filter_template.desc`. Russian and Chinese users currently see raw keys for these. This is translation work, not code work.
- `Optional[T]` → `T | None` (PEP 604): large mechanical refactor across the codebase. Can be auto-fixed via `ruff check --fix --select UP007`. Worth doing once the file splits land.
- Hot-path `logger.error(f"...")` → `logger.error("... %s", e)` lazy-eval: mostly cosmetic; ~200 sites. An f-string is formatted eagerly, before the logger can check whether the record's level is enabled.
- Remaining `(window as any)` sites: typed `global-types.d.ts` is in place and new code uses `window.foo` directly, but ~80 existing sites still have the cast. Per-site mechanical cleanup. Add an `eslint`-equivalent guard (TS rule) to prevent new ones.
- Magic numbers → named constants in processing hot paths: `_FILTER_RECHECK_EVERY_N_FRAMES = 30` in `core/processing/processed_stream.py:159`; `5 ms` / `5 s` / `30 iterations` literals in `wled_target_processor.py:890,893,915`.
- Standardise `from __future__ import annotations` across the codebase. Some modules use the future-annotation form, others stick with `Optional[...]`. Enforce one via ruff `FA` rules.
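The difference behind the lazy-eval item can be seen in a small self-contained demonstration. The `CountingRepr` class and the `lazy_demo` logger name are made up for this sketch; it uses `logger.debug` on a logger set to `ERROR` so that the record is guaranteed to be filtered out.

```python
import logging


class CountingRepr:
    """Counts how often it is converted to a string."""

    def __init__(self) -> None:
        self.calls = 0

    def __str__(self) -> str:
        self.calls += 1
        return "payload"


logger = logging.getLogger("lazy_demo")
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.ERROR)  # DEBUG records are filtered out
logger.propagate = False

eager = CountingRepr()
# The f-string is evaluated before logger.debug() is even called.
logger.debug(f"frame dropped: {eager}")

lazy = CountingRepr()
# With %-style arguments, formatting is skipped because DEBUG is disabled.
logger.debug("frame dropped: %s", lazy)

print(eager.calls, lazy.calls)
```

The eager object is stringified once even though nothing is logged, while the lazy one is never touched. For `logger.error` calls that are almost always enabled the saving is small, which matches the "mostly cosmetic" assessment above.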
Test gaps
- Route-level integration test for the WLED scheme inference: `POST /api/v1/devices` with `{"url": "192.168.1.42", "device_type": "wled"}` and assert the stored device has `url == "http://192.168.1.42"`. The helper is exhaustively unit-tested but no integration test exercises the create/update flow end-to-end.
- IPv6 public address regression: extend `test_url_scheme.py` with explicit assertions for `2001:db8::1` and similar public IPv6 literals (the bare-label fallback used to misclassify these). The helper does the right thing today via the IPv6 probe added during the hardening pass, but no test pins it.
Pre-existing issues surfaced during the audit (not in our diff)
These were flagged by the auditors but predate the review session — kept here as a future-work backlog:
- `icon-select.ts:_buildGrid`: `item.icon` is interpolated raw, documented as "trusted SVG by design". If callers ever feed user-supplied icon strings, that's an XSS sink. Audit every caller that builds `IconSelectItem.icon` from non-constant data and reject HTML there.
- `devices.py:461`: `manager.update_device_info(device_url=update_data.url)` receives `None` when a PATCH omits `url` (rename / icon-only edit). The processor never re-syncs in that case. Should pass `existing.url` (after normalization) or skip the call.
- `asyncio.gather` over uncapped client lists in preview broadcasts: slow clients block the loop. Already noted under Performance above; pre-existing.
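The `devices.py:461` item amounts to choosing a fallback when the PATCH body has no `url`. A minimal sketch of that choice, using stand-in `Device` and `DeviceUpdate` dataclasses and a hypothetical `url_for_resync` helper (none of which are taken from the real code), might be:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Device:
    """Stand-in for the stored device record."""
    name: str
    url: str


@dataclass
class DeviceUpdate:
    """Stand-in for a PATCH body; omitted fields stay None."""
    name: str | None = None
    url: str | None = None


def url_for_resync(existing: Device, update: DeviceUpdate) -> str:
    """Prefer the URL from the PATCH; fall back to the stored one when it is omitted."""
    return update.url if update.url is not None else existing.url


stored = Device(name="Desk strip", url="http://192.168.1.42")
rename_only = DeviceUpdate(name="Shelf strip")
print(url_for_resync(stored, rename_only))
```

With this fallback a rename or icon-only edit passes the stored URL to `manager.update_device_info` instead of `None`. Any normalization step mentioned above would still need to be applied by the real code.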