chore(tooling): vex semantic-search config + REVIEW_TODO backlog

Add .vex.toml so `vex` is the project's primary code-search backend with auto-update + semantic embeddings enabled. Ignore the .fastembed_cache/ directory that vex creates on first --semantic run. REVIEW_TODO.md captures items flagged by the multi-agent production review that were deliberately deferred (multi-day refactors, profile-first perf, and design-sensitive security work).
2026-05-23 00:46:44 +03:00
parent 628c6b2f0d
commit 06273ba2bc
3 changed files with 189 additions and 0 deletions
@@ -97,3 +97,6 @@ Thumbs.db
 .DS_Store
 # Added by code-review-graph
 .code-review-graph/
+
+# vex semantic-search embedding cache (auto-downloaded on first --semantic run)
+.fastembed_cache/
@@ -0,0 +1,24 @@
+# vex configuration — https://github.com/tenatarika/vex
+#
+# Place this file in your project root as .vex.toml
+
+# Glob patterns to exclude from indexing (gitignore syntax, on top of .gitignore)
+# exclude = [
+#     "vendor/**",
+#     "node_modules/**",
+#     "*.generated.go",
+#     "dist/**",
+# ]
+
+# Default output format: "text", "json", or "compact"
+# format = "text"
+
+# Enable semantic embeddings by default (slower indexing, enables meaning-based search)
+semantic = true
+
+# Automatically run `vex update` before search if the index is stale
+auto_update = true
+
+# Embedder used for semantic indexing. Known IDs: minilm-l6-v2 (default).
+# Changing the embedder requires a full reindex.
+# embedder = "minilm-l6-v2"
@@ -0,0 +1,162 @@
+# Production Review — Remaining Items
+
+Output of the multi-agent production review (security / Python / TypeScript /
+performance / architecture / code-quality). Each entry below is something
+the original audit flagged and the autonomous hardening pass deliberately
+did **not** address — either because it needs design input, profiling
+validation, or a multi-day refactor that should land in its own session.
+
+The hardening pass landed everything else: see git log between `master` and
+the head of the review branch for the applied changes (URL-scheme +
+malicious-input rejection, IconSelect XSS escape, MiniSelect for forbidden
+plain `<select>`s, WebSocket Origin allow-list, /docs auth-gate, security
+headers middleware, streaming upload size caps, fire-and-forget task
+tracking + drain resilience in MQTT runtime, discovery_watcher task
+tracking, asyncio.gather return_exceptions, secret_box encryption for MQTT
+/ Hue / Govee credentials with auto-migration, SSRF-validated update
+redirects, single source of truth for IP classification in
+`utils/net_classify.py`, allowlist + parity test for inbound WS events,
+typed `Window` globals, and more).
+
+---
+
+## Architecture refactors (multi-day — own session)
+
+- [ ] **Split `core/processing/value_stream.py`** (1856 LOC, 14 stream classes)
+      into a `value_streams/` package. Each value-stream type gets its own
+      file ≤300 LOC; `manager.py` holds `ValueStreamManager`.
+- [ ] **Split `storage/color_strip_source.py`** (1841 LOC, 18 source kinds)
+      into a `color_strip_sources/` package mirroring `value_streams/`.
+- [ ] **Frontend file splits** — `graph-editor.ts` (2707), `streams.ts`
+      (2335), `value-sources.ts` (1889), `types.ts` (1062). Highest-churn
+      modules; mixed UI / state / network responsibilities.
+- [ ] **Layering reversal**: introduce a neutral `domain/` package and move
+      shared DTOs (`FilterInstance`, `CalibrationConfig`, etc.) into it so
+      `storage/` no longer imports `core/`. Eliminates 7+ layering
+      violations and the lazy-import hacks used to break the resulting
+      circular imports.
+- [ ] **`main.py` boot refactor** — extract import-time side effects into
+      `bootstrap.py` + `create_app()` factory. `lifespan()` becomes the
+      single place that wires stores and managers.
+- [ ] **DI consolidation** — replace `api/dependencies.py` getter sprawl
+      (30+ `get_*()` functions reading a process-global `_deps` dict) with
+      a single typed `get_container()` dependency. Makes test-overrides
+      trivial; ban direct getter calls in handler bodies.
+- [ ] **Exception hierarchy** — define `ledgrab/errors.py` (`LedGrabError`,
+      `NotFoundError`, `ValidationError`, `RemoteUnavailableError`,
+      `SSRFBlockedError`). Move HTTP translation into a FastAPI exception
+      handler. Stop raising `HTTPException` from `utils/safe_source.py`.
+- [ ] **Lazy-import audit** — 289 in-function `from ledgrab.*` imports.
+      Specifically `core/processing/daylight_settings.py` imports
+      `api.dependencies` (core → api inversion). Pass the database in via
+      the constructor instead of service-locator lookup.
+
+## Performance (profile before applying)
+
+- [ ] **`composite_stream.py` blend modes** — pre-allocate scratch buffers
+      in `_blend_override / overlay / hard_light / soft_light / difference
+      / exclusion`. Each currently allocates per frame (`mul`, `scr`,
+      `blended`, `np.where(...)`). At 100 LEDs × 30 fps × N layers this
+      adds up.
+- [ ] **`mapped_stream` / `composite_stream` zone resize** — replace the
+      per-channel `np.interp` calls with a cached `floor/ceil/frac` LUT
+      (same trick as `wled_target_processor._fit_to_device`) or a single
+      `cv2.resize` call on the (N,3) array. `np.interp` allocates a new
+      `float64` array per channel per frame even on cache-hit.
+- [ ] **`processed_stream._processing_loop`** — add ping-pong output
+      buffers and pass them as `out=` to filter `process_strip()` calls.
+      Today every filter that returns a fresh allocation costs us a copy
+      per frame. Also: the loop uses `time.sleep` instead of an
+      event-driven wait on the input stream — input updates faster than
+      30 fps see up to `frame_time` of latency.
+- [ ] **`mqtt_client.py` `send_pixels`** — add a binary publish path (or
+      at minimum cache the outer dict skeleton). Today every frame runs
+      `pixels.tolist()` + `json.dumps` for ~300 LEDs × 30 fps × N devices.
+- [ ] **Frontend `static/js/features/color-strips/test.ts`** — cache
+      `ImageData` per canvas (`canvas._imageData`); only re-create on
+      dimension change; use a `Uint32Array` view to copy pixels in one
+      loop instead of the per-pixel JS loop. Border-overlay rebuild on
+      every frame should also be debounced to dimension changes only.
+- [ ] **`ws_stream.py` composite branch** — pre-allocate a `bytearray`
+      sized to the largest frame and write into slices instead of
+      `b"".join(tobytes())` per layer every iteration. Same anti-pattern
+      in `wled_target_processor._broadcast_led_preview`.
+- [ ] **Preview broadcast slow-client guard** — `asyncio.gather` over
+      preview clients waits for the slowest. Move to `asyncio.wait` with a
+      timeout and drop slow clients, or fire-and-forget with a
+      `ws.application_state` filter.
+
+## Security (deferred — non-trivial or design-sensitive)
+
+- [ ] **Content-Security-Policy header** — would need careful tuning
+      because the UI uses inline event handlers / Jinja templates.
+      A mis-set CSP would break the app silently. Defer until templates can
+      move to event-delegated handlers, then add a strict policy.
+- [ ] **`api/auth.py` exception specificity** — 9 `except Exception:`
+      sites. Most are intentional best-effort `websocket.send_json`
+      swallows (the WS is already closed or about to be), but the auth
+      decision path itself could be tightened to specific types
+      (`jwt.InvalidTokenError`, `OSError`) + `logger.exception` for
+      observability.
+- [ ] **Hue bridge cert pinning** — `httpx.AsyncClient(verify=False)` for
+      Hue bridge (self-signed cert by design). Should record the
+      certificate fingerprint at pairing time and pin it on subsequent
+      requests; otherwise an on-path attacker can MITM the bridge.
+
+## Mechanical / code-quality (low risk, high line-count)
+
+- [ ] **i18n parity** — **328** keys missing in `ru.json`, **325** missing
+      in `zh.json`. Examples: `section.hide`, `filters.hsl_shift`,
+      `filters.contrast`, `filters.temporal_blur`,
+      `filters.audio_filter_template.desc`. Russian and Chinese users
+      currently see raw keys for these. This is translation work, not
+      code work.
+- [ ] **`Optional[T]` → `T | None`** (PEP 604) — large mechanical refactor
+      across the codebase. Auto-fixable via `ruff check --fix --select
+      UP007` (`UP045` on recent ruff). Worth doing after the file splits.
+- [ ] **Hot-path `logger.error(f"...")` → `logger.error("... %s", e)`**
+      lazy-eval — mostly cosmetic; ~200 sites. The f-string still builds
+      the message even when the log level filters the record out.
+- [ ] **Remaining `(window as any)` sites** — typed `global-types.d.ts`
+      is in place and new code uses `window.foo` directly, but ~80
+      existing sites still have the cast. Per-site mechanical cleanup.
+      Add `eslint`-equivalent guard (TS rule) to prevent new ones.
+- [ ] **Magic numbers → named constants** in processing hot paths —
+      `_FILTER_RECHECK_EVERY_N_FRAMES = 30` in
+      `core/processing/processed_stream.py:159`; `5 ms` / `5 s` /
+      `30 iterations` literals in `wled_target_processor.py:890,893,915`.
+- [ ] **Standardise `from __future__ import annotations`** across the
+      codebase. Some modules use the future-annotation form, others stick
+      with `Optional[...]`. Enforce one via ruff `FA` rules.
+
+## Test gaps
+
+- [ ] **Route-level integration test** for the WLED scheme inference —
+      POST `/api/v1/devices` with `{"url": "192.168.1.42",
+      "device_type": "wled"}` and assert the stored device has
+      `url == "http://192.168.1.42"`. The helper is exhaustively
+      unit-tested but no integration test exercises the create/update
+      flow end-to-end.
+- [ ] **IPv6 public address regression** — extend `test_url_scheme.py`
+      with explicit assertions for `2001:db8::1` (RFC 3849 documentation
+      prefix) plus a globally routable IPv6 literal (the bare-label
+      fallback used to misclassify these). The helper does the right thing
+      today via the hardening-pass IPv6 probe, but no test pins it.
+
+## Pre-existing issues surfaced during the audit (not in our diff)
+
+These were flagged by the auditors but predate the review session — kept
+here as a future-work backlog:
+
+- [ ] **`icon-select.ts:_buildGrid` `item.icon` is interpolated raw** —
+      documented as "trusted SVG by design". If callers ever feed
+      user-supplied icon strings, that's an XSS sink. Audit every caller
+      that builds `IconSelectItem.icon` from non-constant data and
+      reject HTML there.
+- [ ] **`devices.py:461` `manager.update_device_info(device_url=update_data.url)`**
+      receives `None` when a PATCH omits `url` (rename / icon-only edit).
+      The processor never re-syncs in that case. Should pass
+      `existing.url` (after normalization) or skip the call.
+- [ ] **`asyncio.gather` over uncapped client lists** in preview broadcasts
+      — slow clients block the loop. Already noted under Performance
+      above; pre-existing.
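
Note on the preview-broadcast items in REVIEW_TODO.md: the "`asyncio.wait` with a timeout and drop slow clients" option could look roughly like the sketch below. It is illustrative only and not taken from the codebase; `broadcast_with_deadline`, `FakeClient`, the `send_bytes` interface, and the default timeout are all assumptions made for the example.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for a preview WebSocket client."""

    def __init__(self, name, delay):
        self.name = name
        self.delay = delay  # simulated send latency in seconds
        self.received = []

    async def send_bytes(self, payload):
        await asyncio.sleep(self.delay)
        self.received.append(payload)


async def broadcast_with_deadline(clients, payload, timeout=0.05):
    """Send payload to every client; return clients that were slow or failed.

    Unlike a bare asyncio.gather, the call returns after at most `timeout`
    seconds, so one stalled client cannot hold up the whole broadcast.
    """
    # Map each send task back to its client so stragglers can be identified.
    tasks = {asyncio.create_task(c.send_bytes(payload)): c for c in clients}
    if not tasks:
        return []  # asyncio.wait() raises ValueError on an empty collection

    done, pending = await asyncio.wait(list(tasks), timeout=timeout)

    # Cancel sends that missed the deadline, then await them so the
    # cancellations are fully processed before returning.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    slow = [tasks[t] for t in pending]
    failed = [
        tasks[t]
        for t in done
        if not t.cancelled() and t.exception() is not None
    ]
    return slow + failed
```

The caller would then close and unregister whatever the function returns. Whether the deadline should be fixed or derived from the frame interval is a design choice this sketch leaves open.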