Operability:
- Correlation IDs end-to-end: shared dispatch_id between log lines and EventLog rows (event/watcher/scheduled/deferred/action/HA/command paths) and a new X-Request-Id middleware that normalizes inbound ids and binds request_id into log context.
- dispatch_summary block merged into EventLog.details: per-target success/failure counts plus Telegram media delivered/skipped/failed and truncated error lists, so partial outcomes surface in the UI.
- Diagnostic mode: admin can flip one module to DEBUG for a bounded window with auto-revert (in-memory only; setup_logging() resets on boot, lifespan reverts on shutdown). New /diagnostic-mode endpoints plus DiagnosticsCassette UI on the settings page.

Telegram:
- Per-receiver options: disable_notification (silent send) and message_thread_id (forum-topic routing), wired through the dispatcher via a ContextVar so all four send sites (sendMessage / sendPhoto-Video-Document / sendMediaGroup / cache-hit POST) pick them up.
- send_large_videos_as_documents target setting: bypass the 50 MB sendVideo cap by falling back to sendDocument for oversized videos.
- sendMediaGroup byte-budget enforcement (TELEGRAM_MAX_GROUP_TOTAL_BYTES, 45 MB) with per-item fallback on chunk failure so a stale file_id no longer silently drops a cached asset.

Tests:
- New: diagnostic_mode, dispatch_summary, request_correlation, telegram_media_group_partial, telegram_per_send_options.

Docs:
- .claude/reviews/: six-axis production-readiness review of v0.8.1.
- .claude/docs/functional-review-2026-05-28.md: focused review of Telegram/Immich/logging subsystems.
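
The sendMediaGroup byte-budget item above can be pictured as greedy chunking. The sketch below is illustrative only and is not taken from the codebase: `chunk_by_budget` and the `(name, size)` item shape are hypothetical, and the 45 MB budget is assumed to mean 45 * 1024 * 1024 bytes. The 10-item ceiling is Telegram's documented limit for one sendMediaGroup call.

```python
from typing import List, Tuple

# Budget named in the commit message; the exact byte count is an assumption.
TELEGRAM_MAX_GROUP_TOTAL_BYTES = 45 * 1024 * 1024
# Telegram's sendMediaGroup accepts at most 10 items per call.
MAX_GROUP_ITEMS = 10

Item = Tuple[str, int]  # (hypothetical asset name, size in bytes)


def chunk_by_budget(
    items: List[Item],
    budget: int = TELEGRAM_MAX_GROUP_TOTAL_BYTES,
    max_items: int = MAX_GROUP_ITEMS,
) -> List[List[Item]]:
    """Greedily pack items into chunks that respect both limits.

    An item larger than the budget still gets a chunk of its own, so the
    caller can route it through a single-item send path.
    """
    chunks: List[List[Item]] = []
    current: List[Item] = []
    current_bytes = 0
    for name, size in items:
        over_budget = current_bytes + size > budget
        full = len(current) >= max_items
        # Close the running chunk before it would break either limit.
        if current and (over_budget or full):
            chunks.append(current)
            current, current_bytes = [], 0
        current.append((name, size))
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks
```

Chunks that end up with a single item cannot go out as a media group (the API requires at least two), so a real dispatcher would presumably send those individually, which is also where a per-item fallback would hook in.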
Production-Readiness Review — service-to-notification-bridge v0.8.1
Date: 2026-05-22
Scope: entire codebase (~70k LOC, 312 files)
Branch: master @ a20635a
Reviewers: 6 parallel specialised agents
Verdict
Ship-readiness: nearly there. The product is in materially better shape than a typical pre-1.0 — every security baseline is in place (sandboxed Jinja2, bcrypt+JWT, SSRF guard with DNS-rebinding mitigation, secret masking, signed webhooks, non-root Docker, owner-scoped queries) and the feature set is mature (deferred dispatch, quiet hours, fan-out caps, 429 backoff, Prometheus metrics). No CRITICAL security findings exist.
The work that should block shipping to wider users is concentrated in three buckets: (1) a handful of correctness defects that surface only under load or restart (duplicate-send class), (2) two secret-handling gaps (HA token returned cleartext, bot tokens/SMTP passwords unencrypted at rest), and (3) the schema-management story (create_all on boot + 1880-line hand-rolled migration script with no Alembic).
Reports
| Axis | File | Findings | Top hit |
|---|---|---|---|
| Backend (Python) | backend-review.md | 5C / 15H / 18M / 10L | asyncio.create_task GC in HA status logger |
| Frontend (TS/Svelte) | frontend-review.md | 2C / 10H / 19M / 7L | JWT access+refresh in localStorage |
| Security | security-review.md | 0C / 2H / multiple M | HA access_token not masked on GET /providers/{id} |
| Performance + DB | performance-db-review.md | 3C / 7H / 10M / 10L | SQLModel.metadata.create_all on every boot |
| Bugs + features | bugs-features-review.md | 3C / 13H / 12M / 3L + 25 features | Webhook redelivery has no idempotency |
| UI/UX | ui-ux-review.md | ~33 across 13 axes | Five overlapping glass-card abstractions |
Ship blockers (must fix before wider rollout)
Cross-cutting top 12 — verified across all six reviews:
1. HA `access_token` returned in plaintext on `GET /api/providers/{id}`: not in mask list. (Security H-1, providers.py:399-405)
2. Secrets unencrypted at rest: Telegram bot tokens, SMTP passwords, HA tokens, webhook secrets stored as plain text in SQLite. Disk/snapshot/backup theft = full credential set. (Security H-2)
3. Frontend JWT access + refresh in `localStorage`: any future XSS exfiltrates the session in one call. Move to httpOnly cookie. (Frontend C-1)
4. `asyncio.create_task` fire-and-forget in `ha_subscription._on_status_change`: task may be GC'd before completion. (Backend C-1, ha_subscription.py:249)
5. Pre-auth 1 MiB body read on Gitea + generic webhooks: DoS amplifier. Verify `X-Hub-Signature` before reading body. (Backend C-3, webhooks.py:167 + 449)
6. No webhook idempotency: Gitea/Planka/generic don't dedupe by `X-Gitea-Delivery` / equivalent. Replays = duplicate sends. (Bugs C-1)
7. Deferred-dispatch crash window: `dispatch()` returns before `session.commit()`; restart re-fires. Wrap in idempotent "claim → send → ack" with a unique constraint. (Bugs C-2)
8. Telegram `_last_update_id` in-memory only: restart can replay or skip commands. Persist watermark. (Bugs C-3)
9. `init_db` calls `SQLModel.metadata.create_all` on every boot: causes schema drift between fresh and upgraded installs. Adopt Alembic. (Perf C-1)
10. Template-preview endpoints bypass sandbox timeout: authenticated user can wedge a worker with `{% for i in range(10**8) %}`. (Security M-1)
11. Telegram webhook handler missing `session.rollback()` in catch-all: leaves uncommitted writes. (Backend C-2, commands/webhook.py:162)
12. CLAUDE.md rule-8 violation: `if (provider.type !== 'immich')` in `RuleEditor.svelte` silently disables people/album picker for other providers. (Frontend C-2, RuleEditor.svelte:57)
Next-tier priorities (HIGH — fix in the same release where practical)
13. Audit `backup_schema.PROVIDER_SECRET_FIELDS` so `webhook_secret`, `password`, `client_secret`, `refresh_token` are scrubbed on export. (Backend C-5)
14. Add `asyncio.Lock` around `bridge_self` failure-counter dicts. (Backend C-4)
15. Login rate-limit is per-IP only: slow rotated-source brute force succeeds. Add per-account lockout + raise password floor. (Security M-2)
16. Three frontend CRUD pages copy cache items into local `$state`, breaking the shared-cache invariant and forcing a full refetch per mutation. (Frontend H-1/H-2)
17. Uncancelled `setTimeout` chain in backup restart flow can `window.location.reload()` after navigation. (Frontend H-5)
18. Refresh-token race against `logout()` produces spurious "Unauthorized" toasts. (Frontend H-6/H-7)
19. Dashboard per-provider GROUP-BY aggregate runs unbounded on every refresh, no caching, no covering index. (Perf H-1/H-2)
20. Truncation/parse-mode escaping for Telegram (HTML-aware truncate, `_extract_retry_after` fractional seconds, forum `message_thread_id` routing, 403 "bot blocked" auto-disable). (Bugs H-various)
21. Five overlapping glass-card abstractions + radius drift (22/18/14/12 px) + ~71 legacy `rounded-md text-sm bg-…` form inputs that bypass the global Aurora `input{}` rule. (UI/UX H-CONSIST-01..04)
22. Hardcoded hex colors (`#059669`, `#ef4444`) in Snackbar/ConfirmModal/actions: bypasses theming. (UI/UX H-CONSIST-03)
23. Snackbar has no `aria-live`; nav lacks `aria-current="page"`: invisible to screen readers. (UI/UX H-NAV-01, A11y)
24. DST handling in overnight quiet-hours windows. (Bugs H)
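
The DST item that closes the list above does not spell out the defect, so the following is only one plausible shape for a fix: evaluate the quiet-hours window against local wall-clock time in the user's zone, so the window keeps its local boundaries across a DST change. `in_quiet_hours` and its parameters are hypothetical and not taken from the codebase.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo


def in_quiet_hours(now_utc: datetime, start: time, end: time, tz_name: str) -> bool:
    """Return True when local wall-clock time falls inside [start, end).

    `now_utc` must be timezone-aware. Comparing local times (not fixed
    UTC offsets) keeps the window aligned with the clock the user sees.
    """
    local = now_utc.astimezone(ZoneInfo(tz_name)).time()
    if start <= end:
        # Same-day window, e.g. 13:00 -> 15:00 (empty when start == end).
        return start <= local < end
    # Overnight window, e.g. 22:00 -> 07:00, wraps past midnight.
    return local >= start or local < end


# Europe/Berlin left DST on 2025-10-26, so the same UTC instant maps to
# 07:30 local the day before the change and 06:30 local the day after.
before = datetime(2025, 10, 25, 5, 30, tzinfo=timezone.utc)
after = datetime(2025, 10, 27, 5, 30, tzinfo=timezone.utc)
outside = in_quiet_hours(before, time(22, 0), time(7, 0), "Europe/Berlin")
inside = in_quiet_hours(after, time(22, 0), time(7, 0), "Europe/Berlin")
```

A window stored as UTC offsets would give the same answer for both instants, which is the kind of drift a wall-clock comparison avoids.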
What's working well — keep doing this
- Sandboxed Jinja2 everywhere (security agent verified every `Environment()` instantiation is `SandboxedEnvironment`).
- `PinnedResolver` SSRF defence: handles CGNAT, IPv4-mapped IPv6, DNS rebinding.
- JWT with `token_version` revocation: bcrypt offloaded to worker thread, constant-time username probe.
- Hardened Docker: non-root, read-only root FS, `cap_drop: ALL`.
- Aurora/Glass design identity: distinctive (conic-gradient orb, Newsreader italic display serif, lavender/orchid palette, "signal stream"/"on watch"/"wires"/"pulse" editorial labels). Not generic AI admin work.
- Frontend type discipline: `svelte-check` clean, EN/RU exactly 1466 keys each, no `eval`/`innerHTML`/`var`/`==` anywhere.
- Most SQL hot paths already batched: `load_link_data` is fully fan-in/fan-out; partial unique indexes on deferred-dispatch are thoughtful.
- Most v0.8.1 production-readiness items shipped: fan-out caps, 429 backoff, parse_mode fallback, scheduler misfire grace, Prometheus, deep healthcheck, per-receiver render cache.
Top missing features worth adding next
Pulled from the bugs-features report — full pitches in bugs-features-review.md:
- Template playground — "send test against last event" + dry-run with sample payload.
- Template versioning + rollback with audit log.
- Bulk operations on targets/templates (currently row-by-row).
- User-side snooze/mute via bot command ("/mute 2h", "/snooze tonight").
- Auto-disable receiver on Telegram 403 ("bot blocked") with admin notification.
- Rate-limit per target (separate from global fan-out cap).
- Weekly digest + per-target stats + per-provider error rate.
- Generic webhook provider and email / Discord / ntfy.sh / Matrix channels.
- Message dedup window (kills duplicate sends from redelivery and scheduler misfires).
- First-run "Getting Started" checklist on empty dashboard (UI/UX).
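
The message dedup window from the list above could start as small as a TTL map keyed by a delivery id or content hash. This is a minimal in-memory sketch under that assumption; `DedupWindow` and `first_seen` are hypothetical names, and because the state is lost on restart, a durable version would more likely lean on a database unique constraint, as the review suggests for deferred dispatch.

```python
import time
from typing import Callable, Dict


class DedupWindow:
    """Remember keys for `ttl_seconds` and report repeats inside that window."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._seen: Dict[str, float] = {}

    def first_seen(self, key: str) -> bool:
        """Return True if `key` has not been seen within the window."""
        now = self._clock()
        # Drop entries whose window has elapsed before checking membership.
        expired = [k for k, seen_at in self._seen.items() if now - seen_at >= self._ttl]
        for k in expired:
            del self._seen[k]
        if key in self._seen:
            return False
        self._seen[key] = now
        return True


# Usage with a fake clock: the repeat inside 60 s is rejected, the one after is not.
fake_now = [0.0]
window = DedupWindow(ttl_seconds=60, clock=lambda: fake_now[0])
first = window.first_seen("delivery-123")
repeat = window.first_seen("delivery-123")
fake_now[0] = 61.0
after_expiry = window.first_seen("delivery-123")
```

The sweep on every call is linear in the number of remembered keys, which is fine for a sketch but would want a bounded structure under sustained webhook traffic.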
How to consume this review
Each report has clickable file:line markdown links. Recommended sequence:
1. Read this `README.md`.
2. Skim each report's Executive Summary (top 5-7 bullets).
3. Triage the Ship blockers (1-12) above into the next release branch as individual issues.
4. Schedule the HIGH list (13-24) for the release after.
5. Treat the feature ideas as a refresh of `.claude/docs/feature-backlog.md`.