docs: dashboard innerHTML reconciliation review notes

Design doc capturing the architectural pattern behind the perf-card
flicker (every innerHTML rewrite tears down the subtree, restarting
CSS animations / dropping focus / wasting layout) and the two
drivers — poll timers and server:* push events. Inventories every
remaining latent site in perf-charts.ts / dashboard.ts /
entity-events.ts / game-integration.ts, walks the decision ladder
from a setInnerHtmlIfChanged helper through hand-rolled cell
components to a Lit migration, and lands on Lit for polling-heavy
modules with entity-events.ts tab reconciliation sequenced ahead of
the dashboard cards because of its higher blast radius.

Planning artifact only — no implementation here.
This commit is contained in:
2026-05-28 17:26:56 +03:00
parent 66b85b0175
commit 10eb24b2ce
+249
View File
@@ -0,0 +1,249 @@
# Dashboard Reconciliation — Review Notes
*Captured 2026-05-26. Session focused on dashboard + perf-card flicker and per-poll re-rendering.*
*Updated 2026-05-27 — widened the audit beyond the two poll timers and found a **second driver** (server push) plus the **highest-blast-radius site** (`entity-events.ts`). Added §3.5, corrected the "out of scope" reasoning in §5, and confirmed the decision: **commit to the Lit migration**. Implementation deferred — this is still a planning doc, not a spec.*
This is a thinking-aloud document for whoever picks up reconciliation work next (likely me). It captures the bug class, what's already shipped, what's still latent, the decision ladder we walked through, and the recommendation we landed on. It is **not** a spec — treat any code shown as illustrative.
---
## 1. The bug class in one sentence
> Every place a data-driven render — a poll timer **or** a server-pushed `server:*` event — writes `el.innerHTML = ...`, the existing DOM is torn down — even when the new HTML equals the old — which restarts CSS animations, drops focus, skips transitions, and burns wasted DOM mutation cycles.
The symptom only becomes visually loud when the destroyed subtree contains a CSS keyframe animation (e.g. the pulsing `.perf-patches-empty-dot`). Everywhere else the cost is silent: lost transitions, broken focus, wasted layout work. The bug is **load-bearing in the architecture**, not in any single call site — that's why we keep coming back to it.
---
## 2. What landed in commit `f6486f9` (this session)
Tactical work — solves the worst cases, does not change the architecture.
### `server/src/ledgrab/static/js/features/dashboard.ts`
- Collapsed the two fast-path branches into one. Fast path runs when `structureUnchanged && !forceFullRender` regardless of `running.length`. Previously, **zero running targets meant every poll rebuilt the entire dashboard** even when nothing changed.
- `_lastSyncClockIds` no longer fingerprints `is_running` — pausing/resuming a clock no longer tears down every card. `_updateSyncClocksInPlace` already handles the toggle.
- `_updateAutomationsInPlace` now called from the unified fast path. Automation badges were silently going stale on the fast path.
- `_initFpsCharts` rewritten diff-based: only destroy charts for ids that left or whose canvas was detached by a DOM swap; only create for new ids; only fetch `/api/metrics/history` when there are genuinely new ids needing seed data.
- Sync-clock pause/resume/reset callers + `server:automation_state_changed` SSE handler now use `loadDashboard()` (no force) — `forceFullRender` is now actually load-bearing, meaning "settings changed, full rebuild required."
### `server/src/ledgrab/static/js/features/perf-charts.ts`
- `_renderChartSvg` no longer rewrites `innerHTML` per poll. The SVG skeleton (ref line + sys area/line + app line) is built once via `_ensureSparkNodes` and mutated thereafter. WeakMap cache (`_sparkNodeCache`) keyed by host element avoids the per-tick `querySelector` cost.
- Hidden cards (env-disabled GPU/Temp) skip render entirely.
- `_fetchPerformance` switched to `fetchWithAuth`.
- Hardcoded English strings replaced with `t()` calls. New keys: `perf.no_captures`, `perf.captures_count.{one,few,many,other}`, `perf.ratio_of_requested`, `perf.total_count`, `perf.skipped_per_sec`, `perf.tip.now`, `perf.tip.ago` (en/ru/zh).
- Tooltip reads `dashboardPollInterval` per mousemove tick (was captured at bind time).
- Dead `<defs><linearGradient>` block removed.
- `updateTotalCaptureFpsActual` now delegates to `_paintCaptureFpsActualValue` — single code path.
- `updateActivePatches` / `updateDevices` skip the `innerHTML` write when content signature hasn't changed. This is the direct fix for the "READY TO LAUNCH flickers every update" report — the empty-state dot's CSS pulse no longer resets.
- Two missing semicolons in `_seedAggregateHistories` (ASI was saving us).
### Reviewer findings addressed (typescript-reviewer pass)
- **HIGH:** `_metricLabel` was looking up `dashboard.perf.${key}` but the FPS family uses `dashboard.perf.total_fps`, `total_capture_fps`, `total_capture_fps_actual`. Tooltip would have shouted `FPS` / `CAPTURE_FPS` / `CAPTURE_FPS_ACTUAL`. Fixed via explicit `METRIC_LABEL_KEYS` map.
- **HIGH:** `_ensureSparkNodes` silently coerced `null` children to non-null when the SVG existed but a child was missing. Hardened to validate all four children and rebuild if any are missing.
---
## 3. Hot spots still latent
These are the call sites where `innerHTML` is still written every poll. None are flickering today (no CSS animations on their inner elements), but every one is the same bug shape and will bite the next time someone adds a keyframe / transition / focus target inside.
### `perf-charts.ts`
| Line | Site | Fires per poll? | Notes |
|------|------|-----------------|-------|
| 462 | `updateActivePatches``listEl.innerHTML` | yes | guarded by signature compare (✓) |
| 493 | `updateTotalFps``valEl.innerHTML` | yes | FPS value, no inner animation |
| 526 | `updateTotalCaptureFps``valEl.innerHTML` | yes | same |
| 638 | `_paintNetworkValue``valEl.innerHTML` | yes | bytes/s value |
| 655 | `_paintDeviceLatencyValue``valEl.innerHTML` (no-devices hint) | yes | hint span |
| 657 | `_paintDeviceLatencyValue``valEl.innerHTML` (offline hint) | yes | hint span |
| 660 | `_paintDeviceLatencyValue``valEl.innerHTML` (ms value) | yes | value |
| 676 | `_paintSendTimingValue``valEl.innerHTML` (idle hint) | yes | hint span |
| 679 | `_paintSendTimingValue``valEl.innerHTML` (ms value) | yes | value |
| 738 | `_paintErrorsValue``valEl.innerHTML` | yes | rate value |
| 806 | `updateDevices``dotsEl.innerHTML` | yes | guarded by signature compare (✓) |
| 1086 | `_renderValuePair``mainEl.innerHTML = appVal` | yes | dual sys/app value |
| 1088 | `_renderValuePair``mainEl.innerHTML = sysVal` | yes | dual sys/app value |
| 1094 | `_renderValuePair``tagEl.innerHTML` (App tag) | mode='both' only | App tag in `both` mode |
| 1181 | `_applyPerfDataToDom` temp hint | only when cpu_temp_hint_key changes | rare |
| 1449 | `_paintFpsValue` | seed only | once per init |
| 1456 | `_paintCaptureFpsValue` | seed only | once per init |
| 1463 | `_paintCaptureFpsActualValue` (no-captures hint) | yes via live updater | now goes through painter |
| 1469 | `_paintCaptureFpsActualValue` (value) | yes via live updater | same |
| 1499 | `_paintErrorsValue` (duplicate of 738) | seed only | once per init |
| 1823 | tooltip `tip.innerHTML` | per mousemove | rate-limited by hover only |
### `dashboard.ts`
| Line | Site | Fires per poll? | Notes |
|------|------|-----------------|-------|
| 275 | `_updateRunningMetrics``fpsEl.innerHTML` | per running target | live FPS pill — visible churn |
| 293 | `_updateRunningMetrics``labelEl.innerHTML` (errors label) | per running target | rebuilt each poll |
| 340 | `_updateAutomationsInPlace``btn.innerHTML` | only on enable/disable change | low frequency |
| 366 | `_updateSyncClocksInPlace``btn.innerHTML` | per poll for every clock | wasteful |
| 975 | `loadDashboard` first-load → `container.innerHTML` | once per init | fine |
| 989 | `loadDashboard` slow path → `dynamic.innerHTML = dynamicHtml` | only when slow path fires | the **big** swap, scoped already |
| 1010 | `loadDashboard` error path | rare | fine |
| 1416 | `subscribeDashboardLayout` clear | rare | fine |
### What this list tells us
- The remaining innerHTML writes are **per-cell value updates** that paint formatted spans (`{value}<span class="perf-fps-unit">fps</span>`). Each rewrite destroys two text nodes + a span every poll across ~10 cells. Not flickering today; will flicker the moment anyone adds an animation to `.perf-fps-unit` or `.perf-fps-ceiling`.
- The pattern can be killed without architectural change by splitting these into a stable structure (number text node + static unit span) and only updating `textContent` of the number. That's what L3 / Lit would force naturally.
---
## 3.5 Beyond dashboard/perf — push-driven reconciliation
*Added 2026-05-27. The §3 audit was scoped to the two poll timers we were debugging. Widening the `\.innerHTML\s*=` search showed the bug class has a **second driver** and lives outside dashboard/perf too.*
### Two drivers, not one
The teardown is triggered by anything that re-renders **without user intent**:
- **Poll timers** (`setInterval`) — what §2/§3 covered (`dashboard.ts` `_uptimeTimer` + main refresh, `perf-charts.ts` `_pollTimer`).
- **Server-pushed `server:*` events** — `core/events-ws.ts` turns each WS message into a `server:*` CustomEvent; feature modules listen and re-render through the *same* `innerHTML` paths.
So the one-line bug class in §1 reads "poll- **or** push-driven," not just poll.
### Genuinely-affected sites outside dashboard/perf
| Site | Driver | Shape | Notes |
| ---- | ------ | ----- | ----- |
| `core/entity-events.ts` `_invalidateAndReload` | push (`server:entity_changed`, `server:device_health_changed`) | full-**tab** rebuild via `loadTargetsTab` / `loadPictureSources` / `loadAutomations` / `loadIntegrations` | **highest blast radius.** A single pushed entity change tears down and rebuilds an entire tab — losing scroll, focus, open inline editors, restarting card-enter animations. |
| `features/game-integration.ts` event feed (`_eventMonitorTimer`) | poll (2 s) | `feed.innerHTML = events.slice(0,20).map(...)` | full 20-item list rebuild every 2 s while the panel is open. |
| `features/game-integration.ts` connection test (`_connectionTestTimer`) | poll | `panel.innerHTML = …` per tick | transient, low frequency. |
`entity-events.ts` already has the **L1 floor applied by hand**: a 600 ms debounce plus a diff check (`oldData === newData`, then length + `id` + `updated_at` compare) that skips the reload when nothing changed. That kills the *no-op* case — but a **real** change still does the full-tab teardown. This is exactly the §4-L1 limitation ("still tears down when content *does* differ"), live across the whole app.
### Counter-examples that already do it right
Two poll loops never flicker because they mutate `textContent` on a **stable structure** instead of rewriting `innerHTML`:
- `core/api.ts` `loadServerInfo` (connection-check poll) — `versionEl.textContent` / `statusEl.textContent`.
- `features/color-strips/test.ts` FPS sampler (1 s) — `valueEl.textContent` / `avgEl.textContent`.
These are live proof that "stable structure + mutate text node" is the fix — i.e. what L3 / Lit force by construction.
### What this changes about the plan
The §4 ladder was reasoned entirely around **per-cell** rendering, because that was the visible flicker. The push-driven finding surfaces a second, qualitatively different problem:
- **Problem A — cell value churn:** every poll, one value span. Loud only with animations. *Mostly fixed in `f6486f9`.* → wants `setText` / skip-if-unchanged.
- **Problem B — list/tab teardown:** on change/push, an entire list or tab. Loses scroll/focus/open editors. *Unaddressed.* `entity-events.ts` and the game feed are Problem B. → wants **keyed list reconciliation**.
Problem B is a **list-level** concern, not a cell-level one. In Lit terms it maps to a keyed `repeat()` directive over the tab/list body — the dashboard-card work in Phase 2 already needs this, but `entity-events.ts` needs it for tabs that §5 used to list as "out of scope." This does **not** change the chosen direction (Lit); it adds `entity-events.ts` as a first-class, high-priority target.
---
## 4. Decision ladder
Walked through with the user 2026-05-26. Captured here so we don't re-litigate.
### L1 — drop-in `setInnerHtmlIfChanged` helper
- **Shape:** `WeakMap<Element, string>` cache; replace every `el.innerHTML = x` with `setInnerHtmlIfChanged(el, x)`.
- **Wins:** stops the no-change rewrites globally; zero behavior risk; ~30 call-site changes.
- **Misses:** still tears down DOM when content *does* differ (e.g. FPS row values change every tick); doesn't preserve focus/transition state inside a list.
- **Verdict:** floor, not ceiling. Worth doing for cells that don't get migrated to L3/Lit.
### L2 — lint guard
- **Shape:** pre-commit script greps `\.innerHTML\s*=` in `static/js/` outside an allowlist, fails the commit.
- **Wins:** keeps the discipline; cheap.
- **Misses:** only useful as a pair with L1+; bare guard with no helper makes contributors angry.
- **Verdict:** pair with whatever helper we land on.
### L3 — hand-rolled cell-component pattern
- **Shape:** `defineCell({ html, refs, mount, update, unmount })` + `reconcileList(host, items, binding)` + `setText/setClass/setAttr` mutators. ~150300 lines of runtime.
- **Wins:** correct by construction; no dependencies; explicit about what mutates; composes with existing customize panel / color picker.
- **Misses:** we own the abstraction — it grows over time as we need transitions, async data, focus, devtools, error boundaries. Death by a thousand features.
- **Verdict:** second-best. Strong contender if zero-deps is a hard constraint.
### Lit migration of polling modules — **recommended**
- **Shape:** convert each perf cell + each dashboard card cell to a Lit web component. Use `html\`<span>${value}</span>\`` tagged-template + targeted diff. ~5KB gzip added to bundle, no new build step (esbuild handles it).
- **Wins:** solves the bug class by design; maintained by Google + community; web-components-based so no framework lock-in; composes with vanilla DOM trivially; mental model is close to current template-string idiom; non-polling code can stay vanilla forever.
- **Misses:** introduces a dependency; contributors learn one more thing; rare edge cases (`@html`-equivalent exists and reintroduces the bug if misused).
- **Verdict:** best ceiling-to-cost ratio for a small team. Recommended.
### Full framework rewrite (React / Vue / Solid)
- **Verdict:** overkill. The bug class lives in polling paths; the rest of the app is fine. Spending the migration budget on rebuilding IconSelect / EntitySelect / modals / customize panel / graph editor — none of which are broken — is a bad trade.
---
## 5. Recommendation
**Lit for the polling-heavy modules.**
Migration plan:
### Phase 0 — spike (2-hour time-box)
- Convert `patches` cell to a Lit component, end to end.
- Verify it plays nicely with: color picker integration, customize panel layout reorder, `rerenderPerfGrid` reconciliation, `setPerfMode` toggle, hidden-by-env state, the spark tooltip handler.
- If any of those break in an unfixable way → pivot to L3.
- If they work → commit to the migration.
### Phase 1 — perf-charts cells
1. `patches` (already spiked)
2. `devices`
3. `fps` / `capture_fps` / `capture_fps_actual` (share a sparkline base class)
4. `cpu` / `ram` / `gpu` / `temp` (share `_sparkCardHtml` template family)
5. `network` / `device_latency` / `send_timing` / `errors`
Each is its own PR, dashboard stays working at every step. `renderPerfSection` becomes a registry of Lit components; `rerenderPerfGrid` becomes "reorder existing elements in the grid" (which it mostly already does).
### Phase 2 — dashboard card cells
6. Output target cards (running variant — biggest payoff, has live FPS + uptime + errors)
7. Output target cards (stopped variant)
8. Sync clock cards
9. Automation cards
10. Integration (HA / MQTT) cards
These get bigger wins from the migration because they have nested mutable state (FPS pill, errors cell, health dot, action button) that's currently rebuilt per poll via the `_updateRunningMetrics` path.
### Highest-impact: `entity-events.ts` tab reconciliation (sequence early)
`entity-events.ts` (§3.5) is the single highest-blast-radius site and is **not** on the dashboard — it re-renders the Targets / Integrations / Automations tabs on server push. Whether or not those tabs' cells become Lit components, the loader path (`loadTargetsTab` / `loadIntegrations` / `loadAutomations`) should switch from a full `innerHTML` rebuild to a **keyed list reconcile** (a Lit `repeat()` over the tab body). This preserves scroll / focus / open inline editors across pushes. If the goal is "biggest UX win first" rather than "lowest-risk first," sequence this ahead of Phase 2.
### Phase 3 — stopgap helper for the rest
Add `setInnerHtmlIfChanged` and apply to any remaining vanilla polling sites we don't plan to migrate. Add the L2 lint guard at this point — by now everything that polls is either Lit-managed or uses the helper.
### Out of scope (deliberately) — with one correction (2026-05-27)
- Targets tab, automations editor, integrations, scene presets — these render on-demand, **but they are ALSO re-rendered on server push** via `entity-events.ts` (see §3.5). The original claim that "the bug class doesn't bite them" was **wrong**: a pushed `server:entity_changed` does a full-tab `innerHTML` teardown. The *editor / on-demand views* can stay vanilla, but the **list/tab render that entity-events triggers needs reconciliation** (a keyed list diff) regardless of whether those cells become Lit components. Treat the entity-events reload path as **in-scope** — it is the highest-blast-radius Problem B site.
- Color strips editor, graph editor, settings — genuinely on-demand, no push re-render path, stay vanilla.
- Transport bar cells (CPU/Mem chip in the top bar) — read from the same perf payload, can be migrated opportunistically but not urgent.
---
## 6. Open questions to settle before committing
These came up during the discussion and weren't resolved:
1. **Bundle-size budget.** Is +5KB acceptable? Current bundle is 2.7MB so this is noise — but worth confirming there isn't a strict cap (e.g. for slow networks / Android Chaquopy embed).
2. **Contributor model.** If the project will grow to multiple contributors, Lit's smaller community vs React's is a recruiting tradeoff. Currently solo-ish, so probably moot.
3. **Android TV target.** Chaquopy embed serves the same bundle. Lit works fine in any modern browser — Android TV WebView is Chromium-based. Should be a no-op but verify in Phase 0 spike.
4. **Long-term framework intent.** If there's a chance we ever migrate to React/Vue/Solid for the rest of the app, doing Lit now is *not* lock-in (web components are standard), but it does add a second mental model. Probably fine; just naming the tradeoff.
5. **Customize panel.** The drag-reorder code in `dashboard-customize.ts` mutates `.dashboard-section` DOM directly. Lit components reorder cleanly via `moveBefore` / `insertBefore` since they're just elements, but the dnd library needs to treat them as opaque drag handles. Phase 0 spike should confirm.
---
## 7. Pointers
- Source files most relevant:
- `server/src/ledgrab/static/js/features/dashboard.ts`
- `server/src/ledgrab/static/js/features/perf-charts.ts`
- `server/src/ledgrab/static/js/features/dashboard-layout.ts` (cell ordering + visibility)
- `server/src/ledgrab/static/js/features/dashboard-customize.ts` (drag-reorder UI)
- `server/src/ledgrab/static/js/core/card-modes.ts` (mode toggle that hangs off section headers)
- `server/src/ledgrab/static/js/core/entity-events.ts` (push-driven tab reloads — §3.5, highest blast radius)
- `server/src/ledgrab/static/js/core/events-ws.ts` (WS → `server:*` CustomEvent dispatch)
- `server/src/ledgrab/static/js/features/game-integration.ts` (2 s event-feed list rebuild — §3.5)
- Most recent reconciliation commit: `f6486f9`.
- Related skill files in `~/.claude/skills/`: `frontend-patterns`, `documentation-lookup` (for Lit docs via Context7).
- Locale convention: `perf.*` for cross-card primitives, `dashboard.perf.*` for cell titles.
---
## 8. If this doc gets stale
If you read this and the perf cells are already Lit components — delete this file. If you read this and there's a new flicker / focus / transition bug nobody can explain — search for `\.innerHTML\s*=` in `static/js/features/` **and `static/js/core/`** (`entity-events` lives in core) and you've probably found it. For *state loss on a server event* (scroll jump, focus drop, an inline editor closing itself), look at the `server:*` listeners in `core/entity-events.ts` first.