alexei.dolgolyov 6a8f374678 feat: observability, per-receiver Telegram options, oversized-video fallback
Operability:
- Correlation IDs end-to-end: shared dispatch_id between log lines and
  EventLog rows (event/watcher/scheduled/deferred/action/HA/command paths)
  and a new X-Request-Id middleware that normalizes inbound ids and binds
  request_id into log context.
- dispatch_summary block merged into EventLog.details: per-target
  success/failure counts plus Telegram media delivered/skipped/failed and
  truncated error lists, so partial outcomes surface in the UI.
- Diagnostic mode: admin can flip one module to DEBUG for a bounded
  window with auto-revert (in-memory only; setup_logging() resets on
  boot, lifespan reverts on shutdown). New /diagnostic-mode endpoints
  plus DiagnosticsCassette UI on the settings page.

Telegram:
- Per-receiver options: disable_notification (silent send) and
  message_thread_id (forum-topic routing), wired through the dispatcher
  via a ContextVar so all four send sites (sendMessage / sendPhoto-Video-
  Document / sendMediaGroup / cache-hit POST) pick them up.
- send_large_videos_as_documents target setting: bypass the 50 MB
  sendVideo cap by falling back to sendDocument for oversized videos.
- sendMediaGroup byte-budget enforcement (TELEGRAM_MAX_GROUP_TOTAL_BYTES,
  45 MB) with per-item fallback on chunk failure so a stale file_id no
  longer silently drops a cached asset.

Tests:
- New: diagnostic_mode, dispatch_summary, request_correlation,
  telegram_media_group_partial, telegram_per_send_options.

Docs:
- .claude/reviews/: six-axis production-readiness review of v0.8.1.
- .claude/docs/functional-review-2026-05-28.md: focused review of
  Telegram/Immich/logging subsystems.
2026-05-28 15:19:31 +03:00

Production-Readiness Review — service-to-notification-bridge v0.8.1

Date: 2026-05-22
Scope: entire codebase (~70k LOC, 312 files)
Branch: master @ a20635a
Reviewers: 6 parallel specialised agents

Verdict

Ship-readiness: nearly there. The product is in materially better shape than a typical pre-1.0 release: every security baseline is in place (sandboxed Jinja2, bcrypt+JWT, SSRF guard with DNS-rebinding mitigation, secret masking, signed webhooks, non-root Docker, owner-scoped queries) and the feature set is mature (deferred dispatch, quiet hours, fan-out caps, 429 backoff, Prometheus metrics). The security review raised no CRITICAL findings.

The work that should block shipping to wider users is concentrated in three buckets: (1) a handful of correctness defects that surface only under load or restart (duplicate-send class), (2) two secret-handling gaps (HA token returned cleartext, bot tokens/SMTP passwords unencrypted at rest), and (3) the schema-management story (create_all on boot + 1880-line hand-rolled migration script with no Alembic).

Reports

| Axis | File | Findings | Top hit |
|---|---|---|---|
| Backend (Python) | backend-review.md | 5C / 15H / 18M / 10L | asyncio.create_task GC in HA status logger |
| Frontend (TS/Svelte) | frontend-review.md | 2C / 10H / 19M / 7L | JWT access+refresh in localStorage |
| Security | security-review.md | 0C / 2H / multiple M | HA access_token not masked on GET /providers/{id} |
| Performance + DB | performance-db-review.md | 3C / 7H / 10M / 10L | SQLModel.metadata.create_all on every boot |
| Bugs + features | bugs-features-review.md | 3C / 13H / 12M / 3L + 25 features | Webhook redelivery has no idempotency |
| UI/UX | ui-ux-review.md | ~33 across 13 axes | Five overlapping glass-card abstractions |

Ship blockers (must fix before wider rollout)

Cross-cutting top 12 — verified across all six reviews:

  1. HA access_token returned in plaintext on GET /api/providers/{id} — not in mask list. (Security H-1, providers.py:399-405)
  2. Secrets unencrypted at rest — Telegram bot tokens, SMTP passwords, HA tokens, webhook secrets stored as plain text in SQLite. Disk/snapshot/backup theft = full credential set. (Security H-2)
  3. Frontend JWT access + refresh in localStorage — any future XSS exfiltrates the session in one call. Move to httpOnly cookie. (Frontend C-1)
  4. asyncio.create_task fire-and-forget in ha_subscription._on_status_change — task may be GC'd before completion. (Backend C-1, ha_subscription.py:249)
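  5. Pre-auth 1 MiB body read on Gitea + generic webhooks: a DoS amplifier. Reject requests with a missing or malformed X-Hub-Signature header before reading the body; the HMAC itself can only be checked once the body has been read. (Backend C-3, webhooks.py:167 + 449)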
  6. No webhook idempotency — Gitea/Planka/generic don't dedupe by X-Gitea-Delivery / equivalent. Replays = duplicate sends. (Bugs C-1)
  7. Deferred-dispatch crash window: dispatch() returns before session.commit(), so a restart re-fires the send. Wrap it in an idempotent "claim → send → ack" sequence backed by a unique constraint. (Bugs C-2)
  8. Telegram _last_update_id in-memory only — restart can replay or skip commands. Persist watermark. (Bugs C-3)
  9. init_db calls SQLModel.metadata.create_all on every boot — causes schema drift between fresh and upgraded installs. Adopt Alembic. (Perf C-1)
  10. Template-preview endpoints bypass sandbox timeout — authenticated user can wedge a worker with {% for i in range(10**8) %}. (Security M-1)
  11. Telegram webhook handler missing session.rollback() in catch-all — leaves uncommitted writes. (Backend C-2, commands/webhook.py:162)
  12. CLAUDE.md rule-8 violation: if (provider.type !== 'immich') in RuleEditor.svelte silently disables people/album picker for other providers. (Frontend C-2, RuleEditor.svelte:57)

Next-tier priorities (HIGH — fix in the same release where practical)

  1. Audit backup_schema.PROVIDER_SECRET_FIELDS so webhook_secret, password, client_secret, refresh_token are scrubbed on export. (Backend C-5)
  2. Add asyncio.Lock around bridge_self failure-counter dicts. (Backend C-4)
  3. Login rate-limit is per-IP only — slow rotated-source brute force succeeds. Add per-account lockout + raise password floor. (Security M-2)
  4. Three frontend CRUD pages copy cache items into local $state, breaking the shared-cache invariant and forcing a full refetch per mutation. (Frontend H-1/H-2)
  5. Uncancelled setTimeout chain in backup restart flow can window.location.reload() after navigation. (Frontend H-5)
  6. Refresh-token race against logout() produces spurious "Unauthorized" toasts. (Frontend H-6/H-7)
  7. Dashboard per-provider GROUP-BY aggregate runs unbounded on every refresh, no caching, no covering index. (Perf H-1/H-2)
  8. Truncation/parse-mode escaping for Telegram (HTML-aware truncate, _extract_retry_after fractional seconds, forum message_thread_id routing, 403 "bot blocked" auto-disable). (Bugs H-various)
  9. Five overlapping glass-card abstractions + radius drift (22/18/14/12 px) + ~71 legacy rounded-md text-sm bg-… form inputs that bypass the global Aurora input{} rule. (UI/UX H-CONSIST-01..04)
  10. Hardcoded hex colors (#059669, #ef4444) in Snackbar/ConfirmModal/actions — bypasses theming. (UI/UX H-CONSIST-03)
  11. Snackbar has no aria-live; nav lacks aria-current="page" — invisible to screen readers. (UI/UX H-NAV-01, A11y)
  12. DST handling in overnight quiet-hours windows. (Bugs H)

What's working well — keep doing this

  • Sandboxed Jinja2 everywhere (security agent verified every Environment() instantiation is SandboxedEnvironment).
  • PinnedResolver SSRF defence — handles CGNAT, IPv4-mapped IPv6, DNS rebinding.
  • JWT with token_version revocation — bcrypt offloaded to worker thread, constant-time username probe.
  • Hardened Docker — non-root, read-only root FS, cap_drop: ALL.
  • Aurora/Glass design identity — distinctive (conic-gradient orb, Newsreader italic display serif, lavender/orchid palette, "signal stream"/"on watch"/"wires"/"pulse" editorial labels). Not generic AI admin work.
  • Frontend type discipline: svelte-check clean, EN/RU exactly 1466 keys each, no eval/innerHTML/var/== anywhere.
  • Most SQL hot paths already batched: load_link_data is fully fan-in/fan-out; partial unique indexes on deferred-dispatch are thoughtful.
  • Most v0.8.1 production-readiness items shipped — fan-out caps, 429 backoff, parse_mode fallback, scheduler misfire grace, Prometheus, deep healthcheck, per-receiver render cache.
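
The address classes named in the SSRF bullet (CGNAT, IPv4-mapped IPv6) can be checked with the standard `ipaddress` module. The sketch below is illustrative only: PinnedResolver's real implementation is not shown in this review, and `is_forbidden_target` is a hypothetical name. It assumes the caller has already resolved the hostname and will connect to the checked IP, which is what closes the DNS-rebinding gap.

```python
import ipaddress

# Carrier-grade NAT range (RFC 6598), listed explicitly rather than relying on is_private.
CGNAT = ipaddress.ip_network("100.64.0.0/10")


def is_forbidden_target(raw_ip: str) -> bool:
    """Return True if an already-resolved IP address must not be contacted."""
    ip = ipaddress.ip_address(raw_ip)
    # Unwrap ::ffff:a.b.c.d so the IPv4 rules apply to IPv4-mapped IPv6 addresses.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    if isinstance(ip, ipaddress.IPv4Address) and ip in CGNAT:
        return True
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )
```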

Top missing features worth adding next

Pulled from the bugs-features report — full pitches in bugs-features-review.md:

  • Template playground — "send test against last event" + dry-run with sample payload.
  • Template versioning + rollback with audit log.
  • Bulk operations on targets/templates (currently row-by-row).
  • User-side snooze/mute via bot command ("/mute 2h", "/snooze tonight").
  • Auto-disable receiver on Telegram 403 ("bot blocked") with admin notification.
  • Rate-limit per target (separate from global fan-out cap).
  • Weekly digest + per-target stats + per-provider error rate.
  • Generic webhook provider and email / Discord / ntfy.sh / Matrix channels.
  • Message dedup window (kills duplicate sends from redelivery and scheduler misfires).
  • First-run "Getting Started" checklist on empty dashboard (UI/UX).
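
The message dedup window could be sketched as a small claim table with a primary-key constraint, which would also cover the webhook-redelivery blocker when the key is a delivery id such as the X-Gitea-Delivery header value. The table name `delivery_claims` and the functions `open_store` and `claim` are assumptions made for illustration, not existing project code.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS delivery_claims (
    delivery_id TEXT PRIMARY KEY,
    claimed_at  REAL NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def claim(conn: sqlite3.Connection, delivery_id: str, now: float,
          window_seconds: float = 300.0) -> bool:
    """Return True if this delivery is new within the window and should be sent."""
    with conn:  # commits on success, rolls back on error
        # Forget claims that have aged out of the dedup window.
        conn.execute(
            "DELETE FROM delivery_claims WHERE claimed_at < ?",
            (now - window_seconds,),
        )
        # The primary key makes a repeated claim a no-op instead of an error.
        cur = conn.execute(
            "INSERT OR IGNORE INTO delivery_claims (delivery_id, claimed_at) VALUES (?, ?)",
            (delivery_id, now),
        )
        return cur.rowcount == 1
```

A handler would call `claim(...)` before dispatching and skip the send when it returns False. Passing `now` explicitly keeps the function easy to test.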

How to consume this review

Each report has clickable file:line markdown links. Recommended sequence:

  1. Read this README.md.
  2. Skim each report's Executive Summary (top 5-7 bullets).
  3. Triage the Ship blockers (1-12) above into the next release branch as individual issues.
  4. Schedule the Next-tier priorities (HIGH) list above for the release after.
  5. Treat the feature ideas as a refresh of .claude/docs/feature-backlog.md.