Files

T

alexei.dolgolyov 06273ba2bc chore(tooling): vex semantic-search config + REVIEW_TODO backlog

Add .vex.toml so `vex` is the project's primary code-search backend with
auto-update + semantic embeddings enabled. Ignore the .fastembed_cache/
directory that vex creates on first --semantic run. REVIEW_TODO.md
captures items flagged by the multi-agent production review that were
deliberately deferred (multi-day refactors, profile-first perf, and
design-sensitive security work).

2026-05-23 00:46:44 +03:00

9.3 KiB

Raw Blame History

Production Review — Remaining Items

Output of the multi-agent production review (security / Python / TypeScript / performance / architecture / code-quality). Each entry below is something the original audit flagged and the autonomous hardening pass deliberately did not address — either because it needs design input, profiling validation, or a multi-day refactor that should land in its own session.

The hardening pass landed everything else: see git log between master and the head of the review branch for the applied changes (URL-scheme + malicious-input rejection, IconSelect XSS escape, MiniSelect for forbidden plain <select>s, WebSocket Origin allow-list, /docs auth-gate, security headers middleware, streaming upload size caps, fire-and-forget task tracking + drain resilience in MQTT runtime, discovery_watcher task tracking, asyncio.gather return_exceptions, secret_box encryption for MQTT / Hue / Govee credentials with auto-migration, SSRF-validated update redirects, single source of truth for IP classification in utils/net_classify.py, allowlist + parity test for inbound WS events, typed Window globals, and more).

Architecture refactors (multi-day — own session)

Split core/processing/value_stream.py (1856 LOC, 14 stream classes) into a value_streams/ package. Each value-stream type gets its own file ≤300 LOC; manager.py holds ValueStreamManager.
Split storage/color_strip_source.py (1841 LOC, 18 source kinds) into a color_strip_sources/ package mirroring value_streams/.
Frontend file splits — graph-editor.ts (2707), streams.ts (2335), value-sources.ts (1889), types.ts (1062). Highest-churn modules; mixed UI / state / network responsibilities.
Layering reversal: introduce a neutral domain/ package and move shared DTOs (FilterInstance, CalibrationConfig, etc.) into it so storage/ no longer imports core/. Eliminates 7+ layering violations and the lazy-import hacks used to break the resulting circulars.
main.py boot refactor — extract import-time side effects into bootstrap.py + create_app() factory. lifespan() becomes the single place that wires stores and managers.
DI consolidation — replace api/dependencies.py getter sprawl (30+ get_*() functions reading a process-global _deps dict) with a single typed get_container() dependency. Makes test-overrides trivial; ban direct getter calls in handler bodies.
Exception hierarchy — define ledgrab/errors.py (LedGrabError, NotFoundError, ValidationError, RemoteUnavailableError, SSRFBlockedError). Move HTTP translation into a FastAPI exception handler. Stop raising HTTPException from utils/safe_source.py.
Lazy-import audit — 289 in-function from ledgrab.* imports. Specifically core/processing/daylight_settings.py imports api.dependencies (core → api inversion). Pass the database in via the constructor instead of service-locator lookup.

Performance (profile before applying)

composite_stream.py blend modes — pre-allocate scratch buffers in _blend_override / overlay / hard_light / soft_light / difference / exclusion. Each currently allocates per frame (mul, scr, blended, np.where(...)). At 100 LEDs × 30 fps × N layers this adds up.
mapped_stream / composite_stream zone resize — replace the per-channel np.interp calls with a cached floor/ceil/frac LUT (same trick as wled_target_processor._fit_to_device) or a single cv2.resize call on the (N,3) array. np.interp allocates a new float64 array per channel per frame even on cache-hit.
processed_stream._processing_loop — add ping-pong output buffers and pass them as out= to filter process_strip() calls. Today every filter that returns a fresh allocation costs us a copy per frame. Also: the loop uses time.sleep instead of an event-driven wait on the input stream — input updates faster than 30 fps see up to frame_time of latency.
mqtt_client.py send_pixels — add a binary publish path (or at minimum cache the outer dict skeleton). Today every frame pixels.tolist() + json.dumps for ~300 LEDs × 30 fps × N devices.
Frontend static/js/features/color-strips/test.ts — cache ImageData per canvas (canvas._imageData); only re-create on dimension change; use a Uint32Array view to copy pixels in one loop instead of the per-pixel JS loop. Border-overlay rebuild on every frame should also be debounced to dimension changes only.
ws_stream.py composite branch — pre-allocate a bytearray sized to the largest frame and write into slices instead of b"".join(tobytes()) per layer every iteration. Same anti-pattern in wled_target_processor._broadcast_led_preview.
Preview broadcast slow-client guard — asyncio.gather over preview clients waits for the slowest. Move to asyncio.wait with a timeout and drop slow clients, or fire-and-forget with a ws.application_state filter.

Security (deferred — non-trivial or design-sensitive)

Content-Security-Policy header — would need careful tuning because the UI uses inline event handlers / Jinja templates. Mis-set CSP would break the app silently. Defer until templates can move to event-delegated handlers, then add a strict policy.
api/auth.py exception specificity — 9 except Exception: sites. Most are intentional best-effort websocket.send_json swallows (the WS is already closed or about to be), but the auth decision path itself could be tightened to specific types (jwt.InvalidTokenError, OSError) + logger.exception for observability.
Hue bridge cert pinning — httpx.AsyncClient(verify=False) for Hue bridge (self-signed cert by design). Should record the certificate fingerprint at pairing time and pin it on subsequent requests; otherwise an on-path attacker can MITM the bridge.

Mechanical / code-quality (low risk, high line-count)

i18n parity — 328 keys missing in ru.json, 325 missing in zh.json. Examples: section.hide, filters.hsl_shift, filters.contrast, filters.temporal_blur, filters.audio_filter_template.desc. Russian and Chinese users currently see raw keys for these. This is translation work, not code work.
Optional[T] → T | None (PEP 604) — large mechanical refactor across the codebase. Can be auto-fixed via ruff check --fix --select UP007. Worth doing once the file splits land.
Hot-path logger.error(f"...") → logger.error("... %s", e) lazy-eval — mostly cosmetic; ~200 sites. The f-string still builds the message even when DEBUG is off.
Remaining (window as any) sites — typed global-types.d.ts is in place and new code uses window.foo directly, but ~80 existing sites still have the cast. Per-site mechanical cleanup. Add eslint-equivalent guard (TS rule) to prevent new ones.
Magic numbers → named constants in processing hot paths — _FILTER_RECHECK_EVERY_N_FRAMES = 30 in core/processing/processed_stream.py:159; 5 ms / 5 s / 30 iterations literals in wled_target_processor.py:890,893,915.
Standardise from __future__ import annotations across the codebase. Some modules use the future-annotation form, others stick with Optional[...]. Enforce one via ruff FA rules.

Test gaps

Route-level integration test for the WLED scheme inference — POST /api/v1/devices with {"url": "192.168.1.42", "device_type": "wled"} and assert the stored device has url == "http://192.168.1.42". The helper is exhaustively unit-tested but no integration test exercises the create/update flow end-to-end.
IPv6 public address regression — extend test_url_scheme.py with explicit assertions for 2001:db8::1 and similar public IPv6 literals (the bare-label fallback used to misclassify these). The helper does the right thing today via the IPv6 probe added during the hardening pass, but no test pins it.

Pre-existing issues surfaced during the audit (not in our diff)

These were flagged by the auditors but predate the review session — kept here as a future-work backlog:

icon-select.ts:_buildGrid item.icon is interpolated raw — documented as "trusted SVG by design". If callers ever feed user-supplied icon strings, that's an XSS sink. Audit every caller that builds IconSelectItem.icon from non-constant data and reject HTML there.
devices.py:461 manager.update_device_info(device_url=update_data.url) receives None when a PATCH omits url (rename / icon-only edit). The processor never re-syncs in that case. Should pass existing.url (after normalization) or skip the call.
asyncio.gather over uncapped client lists in preview broadcasts — slow clients block the loop. Already noted under Performance above; pre-existing.

9.3 KiB Raw Blame History Unescape Escape