Files
ledgrab/REVIEW_RECONCILE_NOTES.md
alexei.dolgolyov 10eb24b2ce docs: dashboard innerHTML reconciliation review notes
Design doc capturing the architectural pattern behind the perf-card
flicker (every innerHTML rewrite tears down the subtree, restarting
CSS animations / dropping focus / wasting layout) and the two
drivers — poll timers and server:* push events. Inventories every
remaining latent site in perf-charts.ts / dashboard.ts /
entity-events.ts / game-integration.ts, walks the decision ladder
from a setInnerHtmlIfChanged helper through hand-rolled cell
components to a Lit migration, and lands on Lit for polling-heavy
modules with entity-events.ts tab reconciliation sequenced ahead of
the dashboard cards because of its higher blast radius.

Planning artifact only — no implementation here.
2026-05-28 17:26:56 +03:00

20 KiB
Raw Permalink Blame History

Dashboard Reconciliation — Review Notes

Captured 2026-05-26. Session focused on dashboard + perf-card flicker and per-poll re-rendering.

Updated 2026-05-27 — widened the audit beyond the two poll timers and found a second driver (server push) plus the highest-blast-radius site (entity-events.ts). Added §3.5, corrected the "out of scope" reasoning in §5, and confirmed the decision: commit to the Lit migration. Implementation deferred — this is still a planning doc, not a spec.

This is a thinking-aloud document for whoever picks up reconciliation work next (likely me). It captures the bug class, what's already shipped, what's still latent, the decision ladder we walked through, and the recommendation we landed on. It is not a spec — treat any code shown as illustrative.


1. The bug class in one sentence

Every place a data-driven render — a poll timer or a server-pushed server:* event — writes el.innerHTML = ..., the existing DOM is torn down — even when the new HTML equals the old — which restarts CSS animations, drops focus, skips transitions, and burns wasted DOM mutation cycles.

The symptom only becomes visually loud when the destroyed subtree contains a CSS keyframe animation (e.g. the pulsing .perf-patches-empty-dot). Everywhere else the cost is silent: lost transitions, broken focus, wasted layout work. The bug is load-bearing in the architecture, not in any single call site — that's why we keep coming back to it.


2. What landed in commit f6486f9 (this session)

Tactical work — solves the worst cases, does not change the architecture.

server/src/ledgrab/static/js/features/dashboard.ts

  • Collapsed the two fast-path branches into one. Fast path runs when structureUnchanged && !forceFullRender regardless of running.length. Previously, zero running targets meant every poll rebuilt the entire dashboard even when nothing changed.
  • _lastSyncClockIds no longer fingerprints is_running — pausing/resuming a clock no longer tears down every card. _updateSyncClocksInPlace already handles the toggle.
  • _updateAutomationsInPlace now called from the unified fast path. Automation badges were silently going stale on the fast path.
  • _initFpsCharts rewritten diff-based: only destroy charts for ids that left or whose canvas was detached by a DOM swap; only create for new ids; only fetch /api/metrics/history when there are genuinely new ids needing seed data.
  • Sync-clock pause/resume/reset callers + server:automation_state_changed SSE handler now use loadDashboard() (no force) — forceFullRender is now actually load-bearing, meaning "settings changed, full rebuild required."

server/src/ledgrab/static/js/features/perf-charts.ts

  • _renderChartSvg no longer rewrites innerHTML per poll. The SVG skeleton (ref line + sys area/line + app line) is built once via _ensureSparkNodes and mutated thereafter. WeakMap cache (_sparkNodeCache) keyed by host element avoids the per-tick querySelector cost.
  • Hidden cards (env-disabled GPU/Temp) skip render entirely.
  • _fetchPerformance switched to fetchWithAuth.
  • Hardcoded English strings replaced with t() calls. New keys: perf.no_captures, perf.captures_count.{one,few,many,other}, perf.ratio_of_requested, perf.total_count, perf.skipped_per_sec, perf.tip.now, perf.tip.ago (en/ru/zh).
  • Tooltip reads dashboardPollInterval per mousemove tick (was captured at bind time).
  • Dead <defs><linearGradient> block removed.
  • updateTotalCaptureFpsActual now delegates to _paintCaptureFpsActualValue — single code path.
  • updateActivePatches / updateDevices skip the innerHTML write when content signature hasn't changed. This is the direct fix for the "READY TO LAUNCH flickers every update" report — the empty-state dot's CSS pulse no longer resets.
  • Two missing semicolons in _seedAggregateHistories (ASI was saving us).

Reviewer findings addressed (typescript-reviewer pass)

  • HIGH: _metricLabel was looking up dashboard.perf.${key} but the FPS family uses dashboard.perf.total_fps, total_capture_fps, total_capture_fps_actual. Tooltip would have shouted FPS / CAPTURE_FPS / CAPTURE_FPS_ACTUAL. Fixed via explicit METRIC_LABEL_KEYS map.
  • HIGH: _ensureSparkNodes silently coerced null children to non-null when the SVG existed but a child was missing. Hardened to validate all four children and rebuild if any are missing.

3. Hot spots still latent

These are the call sites where innerHTML is still written every poll. None are flickering today (no CSS animations on their inner elements), but every one is the same bug shape and will bite the next time someone adds a keyframe / transition / focus target inside.

perf-charts.ts

Line Site Fires per poll? Notes
462 updateActivePatcheslistEl.innerHTML yes guarded by signature compare (✓)
493 updateTotalFpsvalEl.innerHTML yes FPS value, no inner animation
526 updateTotalCaptureFpsvalEl.innerHTML yes same
638 _paintNetworkValuevalEl.innerHTML yes bytes/s value
655 _paintDeviceLatencyValuevalEl.innerHTML (no-devices hint) yes hint span
657 _paintDeviceLatencyValuevalEl.innerHTML (offline hint) yes hint span
660 _paintDeviceLatencyValuevalEl.innerHTML (ms value) yes value
676 _paintSendTimingValuevalEl.innerHTML (idle hint) yes hint span
679 _paintSendTimingValuevalEl.innerHTML (ms value) yes value
738 _paintErrorsValuevalEl.innerHTML yes rate value
806 updateDevicesdotsEl.innerHTML yes guarded by signature compare (✓)
1086 _renderValuePairmainEl.innerHTML = appVal yes dual sys/app value
1088 _renderValuePairmainEl.innerHTML = sysVal yes dual sys/app value
1094 _renderValuePairtagEl.innerHTML (App tag) mode='both' only App tag in both mode
1181 _applyPerfDataToDom temp hint only when cpu_temp_hint_key changes rare
1449 _paintFpsValue seed only once per init
1456 _paintCaptureFpsValue seed only once per init
1463 _paintCaptureFpsActualValue (no-captures hint) yes via live updater now goes through painter
1469 _paintCaptureFpsActualValue (value) yes via live updater same
1499 _paintErrorsValue (duplicate of 738) seed only once per init
1823 tooltip tip.innerHTML per mousemove rate-limited by hover only

dashboard.ts

Line Site Fires per poll? Notes
275 _updateRunningMetricsfpsEl.innerHTML per running target live FPS pill — visible churn
293 _updateRunningMetricslabelEl.innerHTML (errors label) per running target rebuilt each poll
340 _updateAutomationsInPlacebtn.innerHTML only on enable/disable change low frequency
366 _updateSyncClocksInPlacebtn.innerHTML per poll for every clock wasteful
975 loadDashboard first-load → container.innerHTML once per init fine
989 loadDashboard slow path → dynamic.innerHTML = dynamicHtml only when slow path fires the big swap, scoped already
1010 loadDashboard error path rare fine
1416 subscribeDashboardLayout clear rare fine

What this list tells us

  • The remaining innerHTML writes are per-cell value updates that paint formatted spans ({value}<span class="perf-fps-unit">fps</span>). Each rewrite destroys two text nodes + a span every poll across ~10 cells. Not flickering today; will flicker the moment anyone adds an animation to .perf-fps-unit or .perf-fps-ceiling.
  • The pattern can be killed without architectural change by splitting these into a stable structure (number text node + static unit span) and only updating textContent of the number. That's what L3 / Lit would force naturally.

3.5 Beyond dashboard/perf — push-driven reconciliation

Added 2026-05-27. The §3 audit was scoped to the two poll timers we were debugging. Widening the \.innerHTML\s*= search showed the bug class has a second driver and lives outside dashboard/perf too.

Two drivers, not one

The teardown is triggered by anything that re-renders without user intent:

  • Poll timers (setInterval) — what §2/§3 covered (dashboard.ts _uptimeTimer + main refresh, perf-charts.ts _pollTimer).
  • Server-pushed server:* eventscore/events-ws.ts turns each WS message into a server:* CustomEvent; feature modules listen and re-render through the same innerHTML paths.

So the one-line bug class in §1 reads "poll- or push-driven," not just poll.

Genuinely-affected sites outside dashboard/perf

Site Driver Shape Notes
core/entity-events.ts _invalidateAndReload push (server:entity_changed, server:device_health_changed) full-tab rebuild via loadTargetsTab / loadPictureSources / loadAutomations / loadIntegrations highest blast radius. A single pushed entity change tears down and rebuilds an entire tab — losing scroll, focus, open inline editors, restarting card-enter animations.
features/game-integration.ts event feed (_eventMonitorTimer) poll (2 s) feed.innerHTML = events.slice(0,20).map(...) full 20-item list rebuild every 2 s while the panel is open.
features/game-integration.ts connection test (_connectionTestTimer) poll panel.innerHTML = … per tick transient, low frequency.

entity-events.ts already has the L1 floor applied by hand: a 600 ms debounce plus a diff check (oldData === newData, then length + id + updated_at compare) that skips the reload when nothing changed. That kills the no-op case — but a real change still does the full-tab teardown. This is exactly the §4-L1 limitation ("still tears down when content does differ"), live across the whole app.

Counter-examples that already do it right

Two poll loops never flicker because they mutate textContent on a stable structure instead of rewriting innerHTML:

  • core/api.ts loadServerInfo (connection-check poll) — versionEl.textContent / statusEl.textContent.
  • features/color-strips/test.ts FPS sampler (1 s) — valueEl.textContent / avgEl.textContent.

These are live proof that "stable structure + mutate text node" is the fix — i.e. what L3 / Lit force by construction.

What this changes about the plan

The §4 ladder was reasoned entirely around per-cell rendering, because that was the visible flicker. The push-driven finding surfaces a second, qualitatively different problem:

  • Problem A — cell value churn: every poll, one value span. Loud only with animations. Mostly fixed in f6486f9. → wants setText / skip-if-unchanged.
  • Problem B — list/tab teardown: on change/push, an entire list or tab. Loses scroll/focus/open editors. Unaddressed. entity-events.ts and the game feed are Problem B. → wants keyed list reconciliation.

Problem B is a list-level concern, not a cell-level one. In Lit terms it maps to a keyed repeat() directive over the tab/list body — the dashboard-card work in Phase 2 already needs this, but entity-events.ts needs it for tabs that §5 used to list as "out of scope." This does not change the chosen direction (Lit); it adds entity-events.ts as a first-class, high-priority target.


4. Decision ladder

Walked through with the user 2026-05-26. Captured here so we don't re-litigate.

L1 — drop-in setInnerHtmlIfChanged helper

  • Shape: WeakMap<Element, string> cache; replace every el.innerHTML = x with setInnerHtmlIfChanged(el, x).
  • Wins: stops the no-change rewrites globally; zero behavior risk; ~30 call-site changes.
  • Misses: still tears down DOM when content does differ (e.g. FPS row values change every tick); doesn't preserve focus/transition state inside a list.
  • Verdict: floor, not ceiling. Worth doing for cells that don't get migrated to L3/Lit.

L2 — lint guard

  • Shape: pre-commit script greps \.innerHTML\s*= in static/js/ outside an allowlist, fails the commit.
  • Wins: keeps the discipline; cheap.
  • Misses: only useful as a pair with L1+; bare guard with no helper makes contributors angry.
  • Verdict: pair with whatever helper we land on.

L3 — hand-rolled cell-component pattern

  • Shape: defineCell({ html, refs, mount, update, unmount }) + reconcileList(host, items, binding) + setText/setClass/setAttr mutators. ~150300 lines of runtime.
  • Wins: correct by construction; no dependencies; explicit about what mutates; composes with existing customize panel / color picker.
  • Misses: we own the abstraction — it grows over time as we need transitions, async data, focus, devtools, error boundaries. Death by a thousand features.
  • Verdict: second-best. Strong contender if zero-deps is a hard constraint.
  • Shape: convert each perf cell + each dashboard card cell to a Lit web component. Use html\${value}`` tagged-template + targeted diff. ~5KB gzip added to bundle, no new build step (esbuild handles it).
  • Wins: solves the bug class by design; maintained by Google + community; web-components-based so no framework lock-in; composes with vanilla DOM trivially; mental model is close to current template-string idiom; non-polling code can stay vanilla forever.
  • Misses: introduces a dependency; contributors learn one more thing; rare edge cases (@html-equivalent exists and reintroduces the bug if misused).
  • Verdict: best ceiling-to-cost ratio for a small team. Recommended.

Full framework rewrite (React / Vue / Solid)

  • Verdict: overkill. The bug class lives in polling paths; the rest of the app is fine. Spending the migration budget on rebuilding IconSelect / EntitySelect / modals / customize panel / graph editor — none of which are broken — is a bad trade.

5. Recommendation

Lit for the polling-heavy modules.

Migration plan:

Phase 0 — spike (2-hour time-box)

  • Convert patches cell to a Lit component, end to end.
  • Verify it plays nicely with: color picker integration, customize panel layout reorder, rerenderPerfGrid reconciliation, setPerfMode toggle, hidden-by-env state, the spark tooltip handler.
  • If any of those break in an unfixable way → pivot to L3.
  • If they work → commit to the migration.

Phase 1 — perf-charts cells

  1. patches (already spiked)
  2. devices
  3. fps / capture_fps / capture_fps_actual (share a sparkline base class)
  4. cpu / ram / gpu / temp (share _sparkCardHtml template family)
  5. network / device_latency / send_timing / errors

Each is its own PR, dashboard stays working at every step. renderPerfSection becomes a registry of Lit components; rerenderPerfGrid becomes "reorder existing elements in the grid" (which it mostly already does).

Phase 2 — dashboard card cells

  1. Output target cards (running variant — biggest payoff, has live FPS + uptime + errors)
  2. Output target cards (stopped variant)
  3. Sync clock cards
  4. Automation cards
  5. Integration (HA / MQTT) cards

These get bigger wins from the migration because they have nested mutable state (FPS pill, errors cell, health dot, action button) that's currently rebuilt per poll via the _updateRunningMetrics path.

Highest-impact: entity-events.ts tab reconciliation (sequence early)

entity-events.ts (§3.5) is the single highest-blast-radius site and is not on the dashboard — it re-renders the Targets / Integrations / Automations tabs on server push. Whether or not those tabs' cells become Lit components, the loader path (loadTargetsTab / loadIntegrations / loadAutomations) should switch from a full innerHTML rebuild to a keyed list reconcile (a Lit repeat() over the tab body). This preserves scroll / focus / open inline editors across pushes. If the goal is "biggest UX win first" rather than "lowest-risk first," sequence this ahead of Phase 2.

Phase 3 — stopgap helper for the rest

Add setInnerHtmlIfChanged and apply to any remaining vanilla polling sites we don't plan to migrate. Add the L2 lint guard at this point — by now everything that polls is either Lit-managed or uses the helper.

Out of scope (deliberately) — with one correction (2026-05-27)

  • Targets tab, automations editor, integrations, scene presets — these render on-demand, but they are ALSO re-rendered on server push via entity-events.ts (see §3.5). The original claim that "the bug class doesn't bite them" was wrong: a pushed server:entity_changed does a full-tab innerHTML teardown. The editor / on-demand views can stay vanilla, but the list/tab render that entity-events triggers needs reconciliation (a keyed list diff) regardless of whether those cells become Lit components. Treat the entity-events reload path as in-scope — it is the highest-blast-radius Problem B site.
  • Color strips editor, graph editor, settings — genuinely on-demand, no push re-render path, stay vanilla.
  • Transport bar cells (CPU/Mem chip in the top bar) — read from the same perf payload, can be migrated opportunistically but not urgent.

6. Open questions to settle before committing

These came up during the discussion and weren't resolved:

  1. Bundle-size budget. Is +5KB acceptable? Current bundle is 2.7MB so this is noise — but worth confirming there isn't a strict cap (e.g. for slow networks / Android Chaquopy embed).
  2. Contributor model. If the project will grow to multiple contributors, Lit's smaller community vs React's is a recruiting tradeoff. Currently solo-ish, so probably moot.
  3. Android TV target. Chaquopy embed serves the same bundle. Lit works fine in any modern browser — Android TV WebView is Chromium-based. Should be a no-op but verify in Phase 0 spike.
  4. Long-term framework intent. If there's a chance we ever migrate to React/Vue/Solid for the rest of the app, doing Lit now is not lock-in (web components are standard), but it does add a second mental model. Probably fine; just naming the tradeoff.
  5. Customize panel. The drag-reorder code in dashboard-customize.ts mutates .dashboard-section DOM directly. Lit components reorder cleanly via moveBefore / insertBefore since they're just elements, but the dnd library needs to treat them as opaque drag handles. Phase 0 spike should confirm.

7. Pointers

  • Source files most relevant:
    • server/src/ledgrab/static/js/features/dashboard.ts
    • server/src/ledgrab/static/js/features/perf-charts.ts
    • server/src/ledgrab/static/js/features/dashboard-layout.ts (cell ordering + visibility)
    • server/src/ledgrab/static/js/features/dashboard-customize.ts (drag-reorder UI)
    • server/src/ledgrab/static/js/core/card-modes.ts (mode toggle that hangs off section headers)
    • server/src/ledgrab/static/js/core/entity-events.ts (push-driven tab reloads — §3.5, highest blast radius)
    • server/src/ledgrab/static/js/core/events-ws.ts (WS → server:* CustomEvent dispatch)
    • server/src/ledgrab/static/js/features/game-integration.ts (2 s event-feed list rebuild — §3.5)
  • Most recent reconciliation commit: f6486f9.
  • Related skill files in ~/.claude/skills/: frontend-patterns, documentation-lookup (for Lit docs via Context7).
  • Locale convention: perf.* for cross-card primitives, dashboard.perf.* for cell titles.

8. If this doc gets stale

If you read this and the perf cells are already Lit components — delete this file. If you read this and there's a new flicker / focus / transition bug nobody can explain — search for \.innerHTML\s*= in static/js/features/ and static/js/core/ (entity-events lives in core) and you've probably found it. For state loss on a server event (scroll jump, focus drop, an inline editor closing itself), look at the server:* listeners in core/entity-events.ts first.