# Production-Readiness Review — service-to-notification-bridge v0.8.1 **Date:** 2026-05-22 **Scope:** entire codebase (~70k LOC, 312 files) **Branch:** master @ a20635a **Reviewers:** 6 parallel specialised agents ## Verdict **Ship-readiness: nearly there.** The product is in materially better shape than a typical pre-1.0 — every security baseline is in place (sandboxed Jinja2, bcrypt+JWT, SSRF guard with DNS-rebinding mitigation, secret masking, signed webhooks, non-root Docker, owner-scoped queries) and the feature set is mature (deferred dispatch, quiet hours, fan-out caps, 429 backoff, Prometheus metrics). No CRITICAL security findings exist. The work that *should* block shipping to wider users is concentrated in **three buckets**: (1) a handful of correctness defects that surface only under load or restart (duplicate-send class), (2) two secret-handling gaps (HA token returned cleartext, bot tokens/SMTP passwords unencrypted at rest), and (3) the schema-management story (`create_all` on boot + 1880-line hand-rolled migration script with no Alembic). ## Reports | Axis | File | Findings | Top hit | |---|---|---|---| | Backend (Python) | [backend-review.md](backend-review.md) | 5C / 15H / 18M / 10L | `asyncio.create_task` GC in HA status logger | | Frontend (TS/Svelte) | [frontend-review.md](frontend-review.md) | 2C / 10H / 19M / 7L | JWT access+refresh in `localStorage` | | Security | [security-review.md](security-review.md) | 0C / 2H / multiple M | HA `access_token` not masked on `GET /providers/{id}` | | Performance + DB | [performance-db-review.md](performance-db-review.md) | 3C / 7H / 10M / 10L | `SQLModel.metadata.create_all` on every boot | | Bugs + features | [bugs-features-review.md](bugs-features-review.md) | 3C / 13H / 12M / 3L + 25 features | Webhook redelivery has no idempotency | | UI/UX | [ui-ux-review.md](ui-ux-review.md) | ~33 across 13 axes | Five overlapping glass-card abstractions | ## Ship blockers (must fix before wider rollout) Cross-cutting top 12 — verified across all six reviews: 1. **HA `access_token` returned in plaintext** on `GET /api/providers/{id}` — not in mask list. *(Security H-1, [providers.py:399-405](packages/server/src/notify_bridge_server/api/providers.py#L399))* 2. **Secrets unencrypted at rest** — Telegram bot tokens, SMTP passwords, HA tokens, webhook secrets stored as plain text in SQLite. Disk/snapshot/backup theft = full credential set. *(Security H-2)* 3. **Frontend JWT access + refresh in `localStorage`** — any future XSS exfiltrates the session in one call. Move to httpOnly cookie. *(Frontend C-1)* 4. **`asyncio.create_task` fire-and-forget** in `ha_subscription._on_status_change` — task may be GC'd before completion. *(Backend C-1, [ha_subscription.py:249](packages/server/src/notify_bridge_server/services/ha_subscription.py#L249))* 5. **Pre-auth 1 MiB body read** on Gitea + generic webhooks — DoS amplifier. Verify `X-Hub-Signature` before reading body. *(Backend C-3, [webhooks.py:167](packages/server/src/notify_bridge_server/api/webhooks.py#L167) + 449)* 6. **No webhook idempotency** — Gitea/Planka/generic don't dedupe by `X-Gitea-Delivery` / equivalent. Replays = duplicate sends. *(Bugs C-1)* 7. **Deferred-dispatch crash window** — `dispatch()` returns before `session.commit()`; restart re-fires. Wrap in idempotent "claim → send → ack" with a unique constraint. *(Bugs C-2)* 8. **Telegram `_last_update_id` in-memory only** — restart can replay or skip commands. Persist watermark. *(Bugs C-3)* 9. **`init_db` calls `SQLModel.metadata.create_all` on every boot** — causes schema drift between fresh and upgraded installs. Adopt Alembic. *(Perf C-1)* 10. **Template-preview endpoints bypass sandbox timeout** — authenticated user can wedge a worker with `{% for i in range(10**8) %}`. *(Security M-1)* 11. **Telegram webhook handler missing `session.rollback()`** in catch-all — leaves uncommitted writes. *(Backend C-2, [commands/webhook.py:162](packages/server/src/notify_bridge_server/commands/webhook.py#L162))* 12. **CLAUDE.md rule-8 violation** — `if (provider.type !== 'immich')` in `RuleEditor.svelte` silently disables people/album picker for other providers. *(Frontend C-2, [RuleEditor.svelte:57](frontend/src/routes/actions/RuleEditor.svelte#L57))* ## Next-tier priorities (HIGH — fix in the same release where practical) 13. Audit `backup_schema.PROVIDER_SECRET_FIELDS` so `webhook_secret`, `password`, `client_secret`, `refresh_token` are scrubbed on export. *(Backend C-5)* 14. Add `asyncio.Lock` around `bridge_self` failure-counter dicts. *(Backend C-4)* 15. Login rate-limit is per-IP only — slow rotated-source brute force succeeds. Add per-account lockout + raise password floor. *(Security M-2)* 16. Three frontend CRUD pages copy cache items into local `$state`, breaking the shared-cache invariant and forcing a full refetch per mutation. *(Frontend H-1/H-2)* 17. Uncancelled `setTimeout` chain in backup restart flow can `window.location.reload()` after navigation. *(Frontend H-5)* 18. Refresh-token race against `logout()` produces spurious "Unauthorized" toasts. *(Frontend H-6/H-7)* 19. Dashboard per-provider GROUP-BY aggregate runs unbounded on every refresh, no caching, no covering index. *(Perf H-1/H-2)* 20. Truncation/parse-mode escaping for Telegram (HTML-aware truncate, `_extract_retry_after` fractional seconds, forum `message_thread_id` routing, 403 "bot blocked" auto-disable). *(Bugs H-various)* 21. Five overlapping glass-card abstractions + radius drift (22/18/14/12 px) + ~71 legacy `rounded-md text-sm bg-…` form inputs that bypass the global Aurora `input{}` rule. *(UI/UX H-CONSIST-01..04)* 22. Hardcoded hex colors (`#059669`, `#ef4444`) in Snackbar/ConfirmModal/actions — bypasses theming. *(UI/UX H-CONSIST-03)* 23. Snackbar has no `aria-live`; nav lacks `aria-current="page"` — invisible to screen readers. *(UI/UX H-NAV-01, A11y)* 24. DST handling in overnight quiet-hours windows. *(Bugs H)* ## What's working well — keep doing this - **Sandboxed Jinja2 everywhere** (security agent verified every `Environment()` instantiation is `SandboxedEnvironment`). - **`PinnedResolver` SSRF defence** — handles CGNAT, IPv4-mapped IPv6, DNS rebinding. - **JWT with `token_version` revocation** — bcrypt offloaded to worker thread, constant-time username probe. - **Hardened Docker** — non-root, read-only root FS, `cap_drop: ALL`. - **Aurora/Glass design identity** — distinctive (conic-gradient orb, Newsreader italic display serif, lavender/orchid palette, "signal stream"/"on watch"/"wires"/"pulse" editorial labels). Not generic AI admin work. - **Frontend type discipline** — `svelte-check` clean, EN/RU exactly 1466 keys each, no `eval`/`innerHTML`/`var`/`==` anywhere. - **Most SQL hot paths already batched** — `load_link_data` is fully fan-in/fan-out; partial unique indexes on deferred-dispatch are thoughtful. - **Most v0.8.1 production-readiness items shipped** — fan-out caps, 429 backoff, parse_mode fallback, scheduler misfire grace, Prometheus, deep healthcheck, per-receiver render cache. ## Top missing features worth adding next Pulled from the bugs-features report — full pitches in [bugs-features-review.md](bugs-features-review.md): - **Template playground** — "send test against last event" + dry-run with sample payload. - **Template versioning + rollback** with audit log. - **Bulk operations** on targets/templates (currently row-by-row). - **User-side snooze/mute via bot command** ("/mute 2h", "/snooze tonight"). - **Auto-disable receiver on Telegram 403 ("bot blocked")** with admin notification. - **Rate-limit per target** (separate from global fan-out cap). - **Weekly digest + per-target stats + per-provider error rate**. - **Generic webhook provider** and **email / Discord / ntfy.sh / Matrix** channels. - **Message dedup window** (kills duplicate sends from redelivery and scheduler misfires). - **First-run "Getting Started" checklist** on empty dashboard (UI/UX). ## How to consume this review Each report has clickable `file:line` markdown links. Recommended sequence: 1. Read this `README.md`. 2. Skim each report's Executive Summary (top 5-7 bullets). 3. Triage the **Ship blockers (1-12)** above into the next release branch as individual issues. 4. Schedule the **HIGH list (13-24)** for the release after. 5. Treat the feature ideas as a refresh of `.claude/docs/feature-backlog.md`.