10 commits in this branch landed the data-safety bugs (C2, C11), the worst parallel-change problems (C1/C3/C4/C6/C7), and 5 of the 9 HIGH audit findings. The remainder splits cleanly into four follow-up sessions: a frontend sprint (C8/C9/C10/H6/H7/H8/M7-M11/L1), a Device redesign (H4), a BaseTargetProcessor ABC (C5/H5/M1/M2/L2), and polish (M3/M6/M12/L3-L5). This file documents what's left, the recommended ordering, and the three registry patterns already in the codebase that new contributors should reach for instead of writing fresh if/elif chains.
13 KiB
LedGrab Architecture Audit — Remaining Items
Roadmap for the architecture-audit refactor sprint that started 2026-05-22.
This file lists every audit finding that is not yet addressed; the ones
already landed in commits 563cbac..2f15fbb are summarised below for
context.
Already done (10 commits)
| Commit | Findings addressed |
|---|---|
563cbac |
C2, C11, C1 (parallel-change only), C3, C4, C6, C7-streams |
29bdacf |
C5 (HA/Z2M swap helper; full ABC deferred) |
97dae2c |
H1 |
5fec8db |
M4 |
98fb61d |
H2 |
9f3f346 |
M5 |
05f73ee |
H6 (bindable extraction only) |
3b8f00e + c1aa2eb |
C7 store-side |
2f15fbb |
H3 |
All commits have ≥1 code-review subagent pass with HIGH findings fixed before commit. Tests pass on each commit; ruff clean; tsc + bundle build clean for the frontend commit.
The two CRITICAL data-safety items (C2 silent CSS fallback, C11 string-replace JSON migration) are fixed. The two CRITICAL parallel-change problems for color-strip + value-source dispatch are fixed. The two HIGH dispatch problems (H1 effects, H2 rules) are fixed.
Remaining backend items
HIGH
H4 — Device.__init__ 40+ params mixing per-type fields
File: server/src/ledgrab/storage/device_store.py:46-150
The Device dataclass constructor accepts ~40 parameters that mix common
fields with DMX-only / DDP-only / Hue-only / Yeelight-only / Wiz-only /
LIFX-only / Govee-only / Nanoleaf-only / SPI-only / Chroma-only /
GameSense-only fields. Setting hue_username on a WLED device is
silently ignored.
Approach: introduce per-device-type config dataclasses
(DmxConfig, HueConfig, DdpConfig, …) and make Device.config a
discriminated union. Per-type validation moves to the config classes.
Wire migration: every existing device row needs to be re-parsed; use the
versioned MigrationRunner introduced in Phase 1.2.
Risk: medium-high. Touches:
storage/device_store.py— Device dataclass,from_dict,to_dict,create_device,update_deviceapi/schemas/devices.py— Pydantic schemasapi/routes/devices.py— request validationcore/devices/*— every provider reads device fields- A new migration to translate flat fields → nested
config
Estimated scope: ~1500 LOC diff, 1-2 dedicated sessions.
H5 — WledTargetProcessor god class (32 methods, 5 responsibilities)
File: server/src/ledgrab/core/processing/wled_target_processor.py (1238 LOC)
Conflates:
- Device connectivity (probe, liveness, reconnect)
- FPS negotiation (adaptive_fps, keepalive_interval, state_check_interval)
- LED resampling (
_fit_to_device— 60 lines of numpy) - Preview WebSocket fanout (
_preview_clients,_broadcast_led_preview) - Metrics emission (
get_state,get_metrics)
Approach: extract WledDeviceConnector, WledPixelSender,
TargetFitProcessor, TargetPreviewBroadcaster, TargetMetricsCollector.
WledTargetProcessor becomes an orchestrator that composes them.
Risk: HIGHEST in the audit. This class drives physical LED hardware in production. A regression caught at runtime (in the user's living room) is the expensive failure mode. Needs manual verification with at least one real WLED device after the refactor.
Coupled with: C5 (HA/Z2M shared the same shape; should extract a
common BaseTargetProcessor ABC at the same time so all three
processors share lifecycle / preview / metrics code).
Estimated scope: ~2000 LOC diff, 2-3 dedicated sessions, with manual device testing after each.
H7 — device-discovery.ts 1745 LOC
Frontend mirror of H4. The onDeviceTypeChanged handler has a giant
switch with 15+ device kinds and 15+ _showXxxFields / _buildXxxItems
helpers. Adding a device type requires editing 5 separate frontend hooks.
Approach: mirror the H4 backend redesign — once the storage layer has per-type config objects, the frontend can have a per-type field-set registry. Best done after H4 lands so the schemas drive the registry.
Estimated scope: 1-2 sessions; coupled to H4.
H8 — automations.ts 1410 LOC
Frontend mirror of H2 (rule polymorphism). Already addressed on the
backend in 98fb61d; the frontend dispatch on RuleType is still
hand-rolled.
Approach: introduce a rule-type registry on the frontend matching
the backend's _RULE_HANDLERS shape.
Estimated scope: half a session.
MEDIUM
M1 — ProcessorManager.add_target shotgun (11 args, WLED-leak)
File: server/src/ledgrab/core/processing/processor_manager.py:396
Method is named generically (add_target) but accepts protocol="ddp"
and keepalive_interval — WLED-only fields. HA and Z2M have sibling
methods with their own bespoke params.
Approach: extract a TargetFactory (per-kind builders, similar to
value_source_factories.py from Phase 7). Couple with H5/C5 work.
M2 — TargetContext god-bag
File: server/src/ledgrab/core/processing/processor_manager.py
@dataclass TargetContext exposes ~8 attributes (device_store,
color_strip_stream_manager, value_stream_manager, metrics_history,
mqtt_manager, ha_manager, …). Processors silently depend on whichever
fields they read. Tests have to construct a huge mock context.
Approach: make per-processor explicit dependency injection. Couple with H5 work.
M3 — Validation duplicated across layers
Field-level constraints (composite nesting depth, name uniqueness, span ranges) are enforced in route + schema + store. Adding a new constraint means editing 3 places.
Approach: move all validation to the model/schema layer (Pydantic
validators + dataclass __post_init__). Routes trust the schema; store
trusts the model.
Risk: moderate — cross-cutting; needs careful review of which layer currently owns which constraint.
M6 — ws_stream.py mixed concerns (699 LOC)
File: server/src/ledgrab/api/routes/color_strip_sources/ws_stream.py
The worst part (stream-creation dispatch) was fixed in Phase 2.1 — it
now calls color_strip_kinds.build_stream(source, deps). The remaining
699 lines mix config parsing + WebSocket lifecycle + frame loop. Could
extract the frame loop into a separate PreviewFrameLoop class.
Estimated scope: half a session. Low impact since the parallel-change problem is already fixed.
M7 — No shared frontend API client
File: every static/js/features/*.ts
fetchWithAuth(...) + bespoke error-unwrapping is copy-pasted in every
feature's save / load function. ~25 files.
Approach: introduce static/js/core/api-client.ts with typed
methods (get, post, put, delete) that handle auth, JSON parsing,
error normalisation. Replace fetchWithAuth calls across features.
M8 — Global _cached* let vars
Mutable module-level state mutated from multiple feature modules. No
subscription model — features manually invalidate() after CRUD.
Approach: introduce a reactive cache (EventEmitter pattern or a tiny store like Nano Stores). Couple with M7 (the API client can drive cache invalidation on write).
M9 — dashboard.ts 1421 LOC
Frontend god-module orchestrating + rendering device / target / CSS cards. Couple with C8/C9/C10 frontend split work.
M10 — Duplicate frontend modal classes
ValueSourceModal, StreamEditorModal, TargetEditorModal,
AddDeviceModal, etc. each reimplement pristine-check / undo / focus
management.
Approach: introduce a FormModal<T> base class.
M11 — Hardcoded _getSectionForSource / _getTabForSource
Routing tables duplicated across multiple feature files (streams.ts, value-sources.ts). Adding a new stream type requires hunting strings.
Approach: single routing registry keyed by source_type.
M12 — Late imports masking cycles
Partially addressed by the kind registries (Phase 2.1, 2.2). Some
late-imports still exist in value_stream.py, audio_stream.py, the
target processors. Resolving them requires restructuring module layout
to break the circular dependencies.
Estimated scope: small follow-up after H5.
LOW
L1 — (src as any).field casts in value-sources.ts
Discriminated unions aren't narrowed properly. Couple with C8 frontend split.
L2 — Mutable state without locks
_preview_clients, _last_preview_data, _color_stream,
_css_stream are mutated from multiple async tasks without explicit
locks. Production has not exhibited issues but the contract is fragile.
Approach: add explicit asyncio.Lock per processor. Couple with H5.
L3 — Calibration.validate() raises instead of returning result
File: server/src/ledgrab/core/capture/calibration.py:164
All 4 call sites currently rely on the raise; converting to
ValidationResult would force every caller to check a return value
without adding safety. Recommendation: skip — current design is
appropriate.
L4 — _SOURCE_TYPE_MAP is module-private
No public GET /api/v1/source-types discovery endpoint. Frontend
hardcodes the list of source types in types.ts.
Approach: add a discovery route + matching frontend fetch. Couple
with H6 frontend split (since types.ts is involved).
L5 — AudioValueStream implicit state machine
File: server/src/ledgrab/core/processing/value_stream.py:169-383
get_value() can be called before start(); transitions are implicit.
Approach: explicit State pattern. Low value (production callers
always start before reading).
Remaining frontend items (all)
CRITICAL
- C8 —
value-sources.ts1972 LOC (4 god-functions, type-dispatch ladders) - C9 —
graph-editor.ts2707 LOC (layout + interaction + state + WS sync + …) - C10 —
streams.ts2341 LOC (picture / audio / template kitchen-sink)
Other frontend (severity in main list above)
- H6 rest — split remaining ~1100 LOC of
types.tsinto per-entity files - H7 —
device-discovery.ts1745 LOC (couple with H4) - H8 —
automations.ts1410 LOC (mirror H2) - M7 — shared API client
- M8 — reactive cache
- M9 —
dashboard.ts1421 LOC - M10 —
FormModal<T>base - M11 — routing registry
- L1 — narrowing the discriminated unions
The frontend remainder is multi-day work even when broken up by finding. Recommended approach: a dedicated frontend sprint with the typescript-reviewer agent + manual UI testing for each god-module split. Order:
- Finish
types.tssplit (H6) — pure organisation, low risk, unblocks the rest - Introduce API client (M7) — every feature file gains a cleaner shape
- Split
value-sources.ts(C8) — uses the API client + per-type registry pattern - Split
streams.ts(C10) - Split
graph-editor.ts(C9) — needs the most care; the file owns the entire visual editor - Polish:
dashboard.ts(M9),device-discovery.ts(H7),automations.ts(H8),FormModal(M10), routing registry (M11), reactive cache (M8), narrowing (L1)
Recommended ordering for future sessions
Session A — Frontend sprint (multi-day)
Address H6-rest, C8, C9, C10, H7, H8, M7-M11, L1. See order above. Critical to have typescript-reviewer feedback + manual UI testing after each split.
Session B — Device redesign (1-2 sessions)
Address H4 alone. Touches device storage + provider classes; needs a data migration. Once H4 lands, H7 frontend mirror can follow.
Session C — BaseTargetProcessor ABC (2-3 sessions)
Address C5 (full) + H5 + M1 + M2 + L2 together. Highest risk in the audit because it drives physical LED hardware. Each step needs manual verification with a real device.
Session D — Polish (half a session)
Address M3, M6 (remainder), M12 (remainder), L3 (decision: skip), L4, L5.
Pattern reference for new contributors
Three registry-pattern templates that already exist in the codebase and should be the model for the remaining dispatch ladders:
-
Class-level handler dict + import-time coverage assertion
core/processing/effect_stream.py::_RENDERERS(@_effect_rendererdecorator +@_collect_effect_renderersclass decorator)core/automations/automation_engine.py::AutomationEngine._RULE_HANDLERS(module-level binding after class definition)api/routes/output_targets.py::_TARGET_RESPONSE_BUILDERS(response-shape dispatch keyed by storage class)
-
Per-type free functions + dependency-bag dataclass
core/processing/color_strip_kinds.py(StreamDeps+STREAM_BUILDERS)core/processing/value_kinds.py(ValueStreamDeps+STREAM_BUILDERS)storage/value_source_factories.py(CREATE_BUILDERS+UPDATE_APPLIERS)
-
Versioned migration runner
storage/data_migrations.py(MigrationRunner+DataMigrationABC)- Used for any storage rename / field-shape change in the future.
- Audit-table contract: atomic transaction covers applied-check + apply + record, so partial-failure cannot leave data rewritten but unrecorded.
Adding a new feature that touches dispatch should reach for one of these three patterns before writing a fresh if/elif chain.