feat: observability, per-receiver Telegram options, oversized-video fallback

Operability:
- Correlation IDs end-to-end: shared dispatch_id between log lines and
  EventLog rows (event/watcher/scheduled/deferred/action/HA/command
  paths) and a new X-Request-Id middleware that normalizes inbound ids
  and binds request_id into log context.
- dispatch_summary block merged into EventLog.details: per-target
  success/failure counts plus Telegram media delivered/skipped/failed
  and truncated error lists, so partial outcomes surface in the UI.
- Diagnostic mode: admin can flip one module to DEBUG for a bounded
  window with auto-revert (in-memory only; setup_logging() resets on
  boot, lifespan reverts on shutdown). New /diagnostic-mode endpoints
  plus DiagnosticsCassette UI on the settings page.

Telegram:
- Per-receiver options: disable_notification (silent send) and
  message_thread_id (forum-topic routing), wired through the dispatcher
  via a ContextVar so all four send sites (sendMessage /
  sendPhoto/Video/Document / sendMediaGroup / cache-hit POST) pick
  them up.
- send_large_videos_as_documents target setting: bypass the 50 MB
  sendVideo cap by falling back to sendDocument for oversized videos.
- sendMediaGroup byte-budget enforcement
  (TELEGRAM_MAX_GROUP_TOTAL_BYTES, 45 MB) with per-item fallback on
  chunk failure so a stale file_id no longer silently drops a cached
  asset.

Tests:
- New: diagnostic_mode, dispatch_summary, request_correlation,
  telegram_media_group_partial, telegram_per_send_options.

Docs:
- .claude/reviews/: six-axis production-readiness review of v0.8.1.
- .claude/docs/functional-review-2026-05-28.md: focused review of
  Telegram/Immich/logging subsystems.
2026-05-28 15:19:31 +03:00
parent 85a8f1e71c
commit 6a8f374678
39 changed files with 7239 additions and 142 deletions
@@ -0,0 +1,435 @@
+# Functional Review — Telegram, Immich, Logging (2026-05-28)
+
+Snapshot review of three subsystems, with prioritised improvement candidates.
+Pairs with [feature-backlog.md](feature-backlog.md) — items here are
+infrastructure that unlocks several backlog features.
+
+All citations are from the working tree at commit `85a8f1e` (master). Two
+files (`packages/core/src/notify_bridge_core/notifications/telegram/client.py`,
+`media.py`) had uncommitted changes at review time — see Telegram §
+"In-flight work".
+
+---
+
+## 1. Telegram infrastructure
+
+### Telegram — what works well
+
+- Single chokepoint `TelegramClient`
+  ([packages/core/src/notify_bridge_core/notifications/telegram/client.py](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py))
+  covers text/photo/video/document/media-group, with 429-aware retry,
+  parse-error retry, file_id cache, multi-bot per-token instances,
+  polling + webhook modes, and bot-command registration.
+- CLAUDE.md rule #6 satisfied for the production paths.
+- Caption length, group sizing, parse-mode fallback all enforced.
+
+### In-flight work
+
+Byte-budget sub-chunking for media groups
+(`TELEGRAM_MAX_GROUP_TOTAL_BYTES` in
+[media.py](../../packages/core/src/notify_bridge_core/notifications/telegram/media.py))
+with per-item fallback inside `_send_media_group`. Logic is coherent;
+before commit, verify `_build_media_items` callers still match the new
+signature (caption no longer injected at fetch time).
+
+### Gaps, ranked by user-visible value
+
+1. **No inline keyboards / `callback_query` handlers** — zero infra for
+   "Favorite / Archive / Dismiss" buttons on Immich notifications.
+   Biggest UX unlock; prerequisite for several Immich smart actions.
+2. **No edit-in-place** (`editMessageText` not wrapped). Pairs naturally
+   with deferred dispatch / quiet hours coalescing — 5 separate
+   "asset added" messages become 1 edited message.
+3. **`disable_notification` (silent send) not exposed** — already a
+   Telegram primitive; slots into the quiet-hours `silent` mode the
+   backlog already mentions.
+4. **`message_thread_id` (forum topics)** — single field per receiver;
+   unblocks supergroup-with-topics users.
+5. **Direct `TelegramClient(...)` constructions** in
+   [api/telegram_bots.py:314,394,404,412](../../packages/server/src/notify_bridge_server/api/telegram_bots.py)
+   bypass `get_telegram_client()` — violates CLAUDE.md rule #6 and
+   skips the shared file_id cache.
+6. **Per-command authorization** — `commands_enabled` is all-or-nothing
+   per chat; no per-command allowlist or admin gate.
+7. **Long-message splitting** — `send_message` silently truncates at
+   4096 ([client.py:492](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py)).
+8. **No parse-mode per target** — HTML hardcoded.
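
Gap 7 above (silent truncation at 4096) could be closed by a splitter placed in front of `send_message`. The following is only a minimal sketch, not the project's code: `split_message` and `TELEGRAM_MAX_TEXT` are hypothetical names, the limit is counted in Python `str` characters, and the splitter is not HTML-aware, so with the hardcoded HTML parse mode a real version would also have to avoid cutting inside a tag.

```python
TELEGRAM_MAX_TEXT = 4096  # sendMessage text limit cited in the gap list


def split_message(text: str, limit: int = TELEGRAM_MAX_TEXT) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers the last newline inside the window, then the last space, and
    hard-cuts only when no natural break exists.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: list[str] = []
    rest = text
    while len(rest) > limit:
        window = rest[:limit]
        cut = window.rfind("\n")
        if cut <= 0:
            cut = window.rfind(" ")
        if cut > 0:
            # Natural break: emit up to it and drop the separator(s).
            chunks.append(rest[:cut])
            rest = rest[cut:].lstrip("\n ")
        else:
            # No break inside the window: hard cut at the limit.
            chunks.append(window)
            rest = rest[limit:]
    if rest:
        chunks.append(rest)
    return chunks
```

Each chunk would then go out as its own `sendMessage` call, in order, instead of the tail being dropped.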
+
+---
+
+## 2. Immich
+
+### Immich — what works well
+
+- Mature polling pipeline: incremental delta-fetch via `updatedAfter`,
+  pending-asset tracking, fingerprint fast-path skip, fallback to full
+  fetch on count-decrease
+  ([providers/immich/provider.py](../../packages/core/src/notify_bridge_core/providers/immich/provider.py)).
+- Rich bot commands (status / albums / events / people / search / latest
+  / random / favorites / summary / memory) with full asset context
+  (CLAUDE.md rule #10 satisfied).
+- `auto_organize` action is well-shaped: AND person + smart-query union,
+  exclusions, type/date/favorite filters, 500-asset batched add,
+  idempotent diff against album asset_ids, dry-run, `ActionExecution`
+  log.
+- Three scheduled features wired: periodic summaries, scheduled-asset
+  delivery, Memory/On-This-Day (with native Immich memory API + fallback).
+
+### Highest-leverage candidates
+
+1. **Webhook ingestion** — `webhook_based=False` at
+   [capabilities.py:46](../../packages/core/src/notify_bridge_core/providers/capabilities.py).
+   Sub-second latency vs the current 5-min poll. New
+   `/api/webhooks/immich/{secret}` route + parser + capability flip.
+2. **Share-link expiry monitoring + auto-rotate action** — links
+   silently break today; data is already fetched per event
+   ([provider.py:541-569](../../packages/core/src/notify_bridge_core/providers/immich/provider.py)).
+3. **Duplicate cluster digest** — Immich >= 1.100 `/api/duplicates` is
+   unused; pairs with inline buttons for "merge / ignore 30d".
+4. **Auto-favorite by person** (already in backlog) — smallest delta on
+   the existing `auto_organize` executor.
+5. **Per-person notification subscription** — tracker-config filter,
+   reuses existing `asset.people` data.
+6. **Album auto-curation from Inbox** — date-based target album name,
+   move (not copy); needs the Immich move endpoint (currently we only
+   `add_assets_to_album`).
+7. **Storage / job-queue alerts** — `/api/server/stats` and `/api/jobs`
+   unused; lightweight poll + threshold = "disk full" / "transcoding
+   stalled" notifications.
+8. **Smart-action infra polish** — descriptors are reusable, but the
+   rule editor is JSON-shaped, action-run statistics aren't aggregated,
+   and dry-run shows counts not the asset list. Address before adding 5
+   more action types.
+
+---
+
+## 3. Logging
+
+### What's already in place
+
+In [logging_setup.py](../../packages/server/src/notify_bridge_server/logging_setup.py):
+
+- `dictConfig` with `JsonFormatter` (line-delimited JSON) toggleable via
+  `NOTIFY_BRIDGE_LOG_FORMAT=json`.
+- `SecretMaskingFilter` redacts Telegram bot tokens + Authorization /
+  api_key / password / refresh_token across `msg`, `exc_text`,
+  `stack_info`.
+- ContextVar-driven record factory injects `request_id`, `command`,
+  `chat_id`, `bot_id`, `dispatch_id` on every record. Text format:
+  `[req=- cmd=- bot=- chat=- disp=-]`.
+- Per-module overrides via `NOTIFY_BRIDGE_LOG_LEVELS` env or DB
+  `AppSetting`. Live runtime patch via `apply_log_levels()` — no
+  restart.
+- Noisy libs pre-quieted (sqlalchemy, aiohttp, apscheduler, urllib3,
+  asyncio, httpx, httpcore, PIL, uvicorn.access).
+
+Plus:
+
+- `EventLog` table with structured rows (event_type, status,
+  assets_count, details JSON, FKs to tracker/provider/action/
+  command_tracker/bot), `event_log_retention_days=30` default, daily
+  APScheduler cleanup `_cleanup_old_events`
+  ([scheduler.py:332](../../packages/server/src/notify_bridge_server/services/scheduler.py)).
+- Prometheus counter `notify_bridge_event_log_total{status,event_type}`.
+- Frontend viewer with filters at
+  [api/status.py](../../packages/server/src/notify_bridge_server/api/status.py).
+- `bind_log_context` actually used in: dispatcher (dispatch_id),
+  telegram_poller (bot/chat/command/request_id), webhook commands.
+
+### Gaps, ordered by debug-pain payoff
+
+1. **No FastAPI request-ID middleware.** `request_id_var` is set only
+   in webhook + Telegram poller paths. Every REST call from the SPA
+   logs as `req=-`. Tiny middleware (read `X-Request-Id` or
+   `uuid4()`, bind context, echo header) closes this whole-app blind
+   spot.
+2. **`dispatch_id` is in log lines but NOT persisted on the `EventLog`
+   row.** Means you can find the failed row in the UI but can't grep
+   stderr for the matching `disp=...`. Stash it in `details.dispatch_id`
+   (no migration needed) — biggest cross-surface correlation win.
+3. **HTTP access log is uvicorn default**
+   (`access_log=not _cfg.debug` at
+   [main.py:419](../../packages/server/src/notify_bridge_server/main.py)).
+   Doesn't include `request_id`, latency, user, status as structured
+   fields. Replace with a small `RequestLoggerMiddleware` that emits
+   `method`, `path`, `status`, `latency_ms`, `request_id`.
+4. **Telegram media-group failures log richly but aren't linked to the
+   resulting `EventLog` row.** The dispatcher result-aggregation work
+   in flight is the right place to dump `errors[]` into
+   `EventLog.details.errors`.
+5. **In-browser log access is missing.** EventLog rows are visible, but
+   raw logger output requires container/SSH access. A bounded
+   in-memory ring-buffer endpoint (admin-only, last N lines, filtered
+   by context fields) would mean ~90% of triage stays in the UI.
+6. **No "diagnostic mode" UI.** The runtime `apply_log_levels()` is
+   great but only reachable through the app-settings JSON editor.
+   A "Debug for 15 minutes: `notify_bridge_core.notifications.telegram.client`"
+   button with auto-revert is a few-hours job.
+7. **`EventLog.details` is freeform.** Frontend already destructures
+   `dispatch_status`, `deferred_until`, `deferred_for_seconds`,
+   `original_event_log_id`
+   ([types.ts:238-261](../../frontend/src/lib/types.ts)). Define a
+   typed `EventLogDetails` per `event_type` (Pydantic at the boundary)
+   — prevents drift between providers.
+8. **No log rotation** — `StreamHandler(sys.stderr)` only. Fine in
+   containers, brittle on bare-metal. Optional `RotatingFileHandler`
+   opt-in via env.
+9. **No slow-query / outbound-HTTP timing logs.**
+   `sqlalchemy.engine=WARNING` by default; no per-query duration log.
+   Same for outbound calls to Immich / Telegram. A
+   "duration_ms >= N" threshold logger would surface "why is this
+   dispatch slow" without flipping global DEBUG.
+10. **Action dry-run output is logger-only.** Could be streamed into
+    the action editor.
+11. **Poll-result not persisted.** Webhook payloads are logged
+    ([api/webhook_logs.py](../../packages/server/src/notify_bridge_server/api/webhook_logs.py)),
+    but Immich/Google-Photos poll cycles emit no
+    "last poll: 0 changes / 245ms" row. A lightweight
+    `provider_poll_log` (small table or ring buffer) would answer
+    "is the poller actually running" without reading stderr.
+
+---
+
+## Recommended sequencing
+
+| # | Item | Status | Why first |
+| --- | --- | --- | --- |
+| 1 | Request-ID middleware + persist `dispatch_id` on `EventLog` | **SHIPPED 2026-05-28** | Unlocks the rest of the debug story; ~2 hours combined |
+| 2 | Finish in-flight Telegram byte-budget chunking + write `errors[]` into `EventLog.details` | **SHIPPED 2026-05-28** | Already half-done; aligns with #1 |
+| 3 | Telegram inline keyboards + `callback_query` handler | not started | Prereq for several Immich smart actions |
+| 4 | Telegram `disable_notification` + `message_thread_id` per target | **SHIPPED 2026-05-28** | Small, also feeds the open Quiet Hours v1 backlog item |
+| 5 | Immich webhook ingestion | **SKIPPED 2026-05-28** (see decision log) | 5-min → sub-second; biggest user-facing latency win |
+| 6 | Immich share-link expiry + auto-rotate (using #3) | not started | Real silent-breakage today |
+| 7 | Diagnostic-mode UI (live log-level toggle with auto-revert) | **SHIPPED 2026-05-28** | Shifts triage to the browser |
+| 8 | Immich duplicate digest + auto-favorite by person | not started | Both ride on #3 |
+
+Items 1–4 are infrastructure that unlocks 5–8. Items 1, 2, and 4 also
+smooth the path for the Quiet Hours v1 / target-level windows feature
+at the top of the backlog; they are worth landing before that feature
+so quiet hours can dispatch through edited messages and silent sends
+from day one.
+
+---
+
+## Decision log
+
+- **2026-05-28** — Review completed. Starting work on item #1
+  (request-id middleware + persist `dispatch_id` on `EventLog`).
+- **2026-05-28** — Item #1 shipped. Summary of the change:
+  - New helpers in
+    [packages/core/src/notify_bridge_core/log_context.py](../../packages/core/src/notify_bridge_core/log_context.py):
+    `ensure_dispatch_id()` (reuse existing or mint a new
+    `disp:<12 hex>`) and `enrich_details_with_correlation(details)`
+    (shallow-copy a details dict and merge active `dispatch_id` /
+    `request_id` from the ContextVar snapshot).
+  - New `RequestContextMiddleware` in
+    [packages/server/src/notify_bridge_server/main.py](../../packages/server/src/notify_bridge_server/main.py)
+    that reads inbound `X-Request-Id` (charset/length validated, `:`
+    excluded so a client can't masquerade as a server-minted id),
+    falls back to `req:<12 hex>`, binds the value via
+    `bind_log_context`, and echoes it back as the response header.
+    Added LAST so it's the outermost middleware.
+  - Outer entry points now bind a `dispatch_id` via a thin wrapper
+    function (`check_tracker`, `dispatch_provider_event`,
+    `dispatch_scheduled_for_tracker`, `_process_row`, `run_action`).
+    All 10 `EventLog(...)` creation sites wrap their `details=`
+    payload in `enrich_details_with_correlation(...)`.
+  - Switched `NotificationDispatcher.dispatch` to use
+    `ensure_dispatch_id()` instead of inline `uuid.uuid4()`.
+  - New tests in
+    [packages/server/tests/test_request_correlation.py](../../packages/server/tests/test_request_correlation.py)
+    (12 tests) covering header echo, charset validation, prefix-
+    masquerade rejection, helper merge semantics. All 239 server
+    tests green.
+  - Reviewed by `python-reviewer` subagent (no CRITICAL/HIGH; 3 MEDIUM
+    and 1 LOW addressed: PEP 8 imports moved to top of main.py;
+    `RequestResponseEndpoint` type added to `dispatch`; `:` dropped
+    from the request-id charset; shallow-copy caveat documented).
+  - Live smoke verified: generated id `req:a9b9821f5aab` on plain
+    request; safe inbound `my-trace-abc123` echoed unchanged;
+    `disp:fake12345678` correctly replaced; watcher tick log lines now
+    show distinct `disp=disp:<hex>` per tracker check.
+- **2026-05-28** — Item #2 shipped. Summary of the change:
+  - Confirmed the in-flight Telegram byte-budget media-group chunking
+    in
+    [telegram/client.py](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py)
+    is complete (15/15 media-group tests pass). Deleted the now-unused
+    `split_media_by_upload_size()` from
+    [telegram/media.py](../../packages/core/src/notify_bridge_core/notifications/telegram/media.py).
+  - New module
+    [services/dispatch_summary.py](../../packages/server/src/notify_bridge_server/services/dispatch_summary.py)
+    with `summarize_dispatch_results()` (aggregator),
+    `attach_summary_in_place()` (in-session) and
+    `record_dispatch_summary_async()` (post-commit). Captures
+    `targets_attempted/succeeded/failed`, per-target `errors`,
+    media-group `media{delivered,skipped,failed}` counts and
+    `media_errors[]` from the new
+    `TelegramClient._send_media_group` partial-failure path.
+    Bounded: 20 errors / 20 media errors / 500-char message cap with
+    explicit `…[truncated]` marker.
+  - Wired at 4 dispatch sites:
+    - `event_dispatch.py`: accumulates per-target results across all
+      tracking-config groups, attaches summary in-session before
+      commit.
+    - `deferred_dispatch.py`: inlines summary into the new EventLog
+      row's `details` for both `delivered_after_quiet_hours` and
+      `deferred_then_failed` paths.
+    - `scheduled_dispatch.py`: inlines summary into the cron-fire
+      EventLog row's `details`.
+    - `watcher.py`: follow-up `record_dispatch_summary_async` in a
+      fresh session because the EventLog row was committed before
+      dispatch.
+  - Frontend type drift fixed:
+    [types.ts](../../frontend/src/lib/types.ts) gets new
+    `DispatchSummary`, `DispatchSummaryError`,
+    `DispatchSummaryMediaError` interfaces plus `dispatch_id` /
+    `request_id` / `dispatch_summary` keys on `EventLog.details`.
+  - New tests in
+    [tests/test_dispatch_summary.py](../../packages/server/tests/test_dispatch_summary.py)
+    (10 tests): empty/all-success/mixed/media-counts/sub-errors/
+    truncation/long-message-trim/in-place attach/no-results no-op/
+    malformed sub-error. All 249 server tests green.
+  - Reviewed by `python-reviewer` subagent (no CRITICAL; 2 HIGH + 3
+    MEDIUM addressed: `asyncio.CancelledError` re-raise in the
+    best-effort catch; late `from .dispatch_summary import …` calls
+    hoisted to top of each file; empty-results contract changed from
+    "zero-count summary attached" to "no key written"; truncation
+    marker upgraded to `…[truncated]` for operator clarity;
+    `flag_modified` comment tightened).
+  - Live smoke: backend restarts cleanly, watcher tick log lines
+    continue showing `disp=disp:<hex>` correlation, no startup
+    errors.
+- **2026-05-28** — Item #4 shipped. Summary of the change:
+  - `TelegramReceiver` dataclass in
+    [receiver.py](../../packages/core/src/notify_bridge_core/notifications/receiver.py)
+    gains `disable_notification: bool = False` and
+    `message_thread_id: int | None = None`. New
+    `_coerce_telegram_thread_id` helper collapses Telegram's "general
+    topic" sentinels (`0`, negatives, blanks, bools) to `None` so the
+    Bot API just omits the field — matches the frontend's `<= 0 → unset`
+    behaviour.
+  - `TelegramClient`
+    ([client.py](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py))
+    gets a frozen `_SendOptions` + `_send_options_var` `ContextVar`
+    pattern for the deep media paths (`_upload_media`,
+    `_post_media_group`, `_send_from_cache`) that can't easily plumb
+    kwargs through. `send_notification` binds the var; the 3 deep
+    builders read it via `_apply_send_opts_to_payload` /
+    `_apply_send_opts_to_form`. `send_message` is a leaf and just
+    inlines its kwargs into the JSON body directly (no ContextVar
+    needed there).
+  - Dispatcher
+    ([dispatcher.py](../../packages/core/src/notify_bridge_core/notifications/dispatcher.py))
+    passes `receiver.disable_notification` / `receiver.message_thread_id`
+    into `client.send_message(...)` and `client.send_notification(...)`.
+  - Frontend: new inline per-Telegram-receiver options panel in
+    [ReceiverSection.svelte](../../frontend/src/routes/targets/ReceiverSection.svelte)
+    triggered by a cog icon. Silent + thread-id indicators (bell-off
+    icon, `#N` badge) on the row when set. `+page.svelte` handlers
+    PUT the merged config to `/api/targets/{id}/receivers/{rid}`.
+    5 new i18n keys in `en.json` / `ru.json`.
+  - New tests in
+    [test_telegram_per_send_options.py](../../packages/server/tests/test_telegram_per_send_options.py)
+    — 19 tests: factory + thread-id coercion table (including bool
+    rejection and `0`/negative collapse), payload/form helper merge
+    semantics, bind/reset under exceptions, concurrent-task isolation
+    via `asyncio.gather`, end-to-end `send_message` payload assertions.
+    All 270 server tests green.
+  - Reviewed by `python-reviewer` subagent (no CRITICAL; 2 HIGH + 1
+    MEDIUM + 1 LOW addressed: dead ContextVar bind in `send_message`
+    removed in favor of inline kwarg injection; re-entrant bind from
+    `send_notification → send_message` auto-resolved by the same fix;
+    `message_thread_id=0` collapse aligns backend with frontend;
+    `_coerce_telegram_thread_id` rejects `bool` input).
+  - Live smoke: backend restarts cleanly, no errors in startup log.
+- **2026-05-28** — Holistic `code-reviewer` pass over the full session
+  diff (Features 1+2+4+7) caught a real HIGH that the per-feature
+  Python-narrow reviews missed. Summary of the findings:
+  - HIGH: `summarize_dispatch_results` in Feature 2 was reading the
+    wrong dict shape. The dispatcher's `_aggregate_results` wraps
+    per-receiver dicts under `result["results"]` and renames the
+    Telegram media counts to `media_delivered_count` /
+    `media_skipped_count` / `media_failed_count`. The summarizer was
+    reading the top-level `delivered_count`, which is always absent in
+    production aggregated output, so the `dispatch_summary.media` block
+    was silently zero / missing for every real dispatch and the
+    `media_errors` list never populated.
+  - The unit tests passed because they hand-constructed leaf-shaped
+    dicts that masked the wrong-shape read.
+  - Fixed in
+    [dispatch_summary.py](../../packages/server/src/notify_bridge_server/services/dispatch_summary.py)
+    by drilling into `result["results"]` per-receiver leaves and
+    preferring `media_*_count` field names with fallback to the
+    top-level names. Receiver index added to `media_errors` entries
+    when drilling.
+  - New integration tests in
+    [test_dispatch_summary.py](../../packages/server/tests/test_dispatch_summary.py)
+    use the real dispatcher envelope so a future shape regression fails
+    loudly.
+  - MEDIUM findings also addressed: `attach_summary_in_place` /
+    `record_dispatch_summary_async` now skip when a caller has pre-set
+    `dispatch_summary` (mirrors the "caller wins" rule in
+    `enrich_details_with_correlation`); `ReceiverSection.svelte` props
+    for the Telegram options panel are now optional + gated internally
+    so the component stays portable; TS type for
+    `editingReceiverOptions.message_thread_id` is `number | ''` with
+    proper coercion in `openEditReceiver`.
+  - 294/294 server tests green; backend restarts clean.
+- **2026-05-28** — Item #5 NOT shipped. Reason: Immich has no
+  outbound webhook feature. The closest thing is `POST /sync/stream`
+  (a server-streaming sync API designed for first-party Immich
+  clients), and adopting it would (a) take 1-2 days of new
+  subscription-manager infrastructure, (b) couple us to an API with no
+  third-party stability contract, and (c) deliver 5-min → sub-second
+  latency on photo notifications, which is rarely critical. If
+  someone later actually needs lower latency, lowering the default
+  `scan_interval` is a five-minute change that gets 80% of the
+  win for 1% of the cost. Skipped in favour of #7.
+- **2026-05-28** — Item #7 shipped. Summary of the change:
+  - New service module
+    [services/diagnostic_mode.py](../../packages/server/src/notify_bridge_server/services/diagnostic_mode.py)
+    with `set_diagnostic` / `revert_diagnostic` / `revert_all` /
+    `list_active`. State is in-memory only — restart wipes overrides
+    (`setup_logging` re-applies the DB baseline at boot). Modules go
+    through an allowlist (`notify_bridge_*`, `sqlalchemy`, `aiohttp`,
+    `apscheduler`, `urllib3`, `httpx`, `httpcore`, `asyncio`, `PIL`,
+    `uvicorn`, `starlette`, `fastapi`) so a button press can't flip
+    root. Duration clamped to `[1, 240]` minutes. Baseline derivation
+    walks the dotted parents so
+    `sqlalchemy.engine.Engine` correctly inherits `sqlalchemy.engine`
+    → WARNING rather than falling through to root.
+  - 3 new admin-only endpoints under `/api/settings/diagnostic-mode`
+    in
+    [api/app_settings.py](../../packages/server/src/notify_bridge_server/api/app_settings.py):
+    `GET` (list active), `POST` (activate, 400 on invalid input),
+    `DELETE /{module:path}` (manual revert, 404 if not active).
+  - Auto-revert uses APScheduler's date trigger with `misfire_grace_time=60`,
+    falling back to a strongly-referenced asyncio task (stored in a
+    module-level set with `add_done_callback(discard)`) when the
+    scheduler isn't running. `_expire_callback` re-reads `log_levels`
+    from the DB at fire time, so an admin who edits overrides mid-window
+    sees the new baseline restored — not a stale snapshot.
+  - `revert_all` is wired into the FastAPI lifespan shutdown in
+    [main.py](../../packages/server/src/notify_bridge_server/main.py)
+    so a clean stop / hot-reload leaves the world tidy.
+  - New frontend
+    [DiagnosticsCassette.svelte](../../frontend/src/routes/settings/DiagnosticsCassette.svelte)
+    sits below `LoggingCassette` in the settings page. Quick-pick
+    module dropdown + custom-text fallback, duration chip group (5m /
+    15m / 30m / 1h / 2h), Activate button. Active list with countdown
+    updated by a 1s ticker; resyncs from the backend every 30s based
+    on elapsed time (not modulo-of-now, which the prior version had
+    wrong). Manual revert via undo-icon button on each row.
+  - 15 new i18n keys in `en.json` / `ru.json`.
+  - 20 new tests in
+    [test_diagnostic_mode.py](../../packages/server/tests/test_diagnostic_mode.py)
+    — service-module unit tests + 4 FastAPI smoke tests via
+    `dependency_overrides[require_admin]` exercising the router /
+    path converter / HTTPException paths. All 290 server tests green.
+  - Reviewed by `python-reviewer` subagent (no CRITICAL; 3 HIGH +
+    3 MEDIUM addressed: fallback task retention in a module-level set
+    to prevent GC; prefix-walk for `_baseline_for` so sub-loggers
+    inherit parent defaults; `revert_all` wired into lifespan
+    shutdown; `list_active` now sweeps expired entries; DB
+    `log_levels` re-read at revert time instead of snapshot at
+    activation; frontend resync uses elapsed time. LOW items
+    addressed: scheduler-unavailable paths log at DEBUG instead of
+    silently passing; test cleanup of dead `_MIN_DURATION_MINUTES`
+    mutation).
+  - Live smoke: backend restarts cleanly, no errors in startup log.
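
The baseline derivation described for item #7 (walking the dotted parents so that `sqlalchemy.engine.Engine` inherits the `sqlalchemy.engine` level) can be sketched as follows. This is an illustration only: `baseline_for` is a hypothetical standalone stand-in for `_baseline_for`, and both the shape of the `overrides` mapping and the `INFO` root default are assumptions.

```python
def baseline_for(module: str, overrides: dict[str, str],
                 root_level: str = "INFO") -> str:
    """Return the level `module` should revert to after a diagnostic window.

    Walks from the full dotted name up through each parent logger name and
    returns the first configured override; falls back to the root level.
    """
    name = module
    while name:
        if name in overrides:
            return overrides[name]
        # "a.b.c" -> "a.b" -> "a" -> "" (loop ends)
        name = name.rpartition(".")[0]
    return root_level
```

Looking up only the exact module name would send `sqlalchemy.engine.Engine` straight to the root level, which is the bug the prefix walk avoids.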
@@ -0,0 +1,89 @@
+# Production-Readiness Review — service-to-notification-bridge v0.8.1
+
+**Date:** 2026-05-22  **Scope:** entire codebase (~70k LOC, 312 files)
+**Branch:** master @ a20635a  **Reviewers:** 6 parallel specialised agents
+
+## Verdict
+
+**Ship-readiness: nearly there.** The product is in materially better shape than a typical pre-1.0 project: every security baseline is in place (sandboxed Jinja2, bcrypt+JWT, SSRF guard with DNS-rebinding mitigation, secret masking, signed webhooks, non-root Docker, owner-scoped queries) and the feature set is mature (deferred dispatch, quiet hours, fan-out caps, 429 backoff, Prometheus metrics). No CRITICAL security findings exist.
+
+The work that *should* block shipping to wider users is concentrated in **three buckets**: (1) a handful of correctness defects that surface only under load or restart (duplicate-send class), (2) two secret-handling gaps (HA token returned cleartext, bot tokens/SMTP passwords unencrypted at rest), and (3) the schema-management story (`create_all` on boot + 1880-line hand-rolled migration script with no Alembic).
+
+## Reports
+
+| Axis | File | Findings | Top hit |
+|---|---|---|---|
+| Backend (Python) | [backend-review.md](backend-review.md) | 5C / 15H / 18M / 10L | `asyncio.create_task` GC in HA status logger |
+| Frontend (TS/Svelte) | [frontend-review.md](frontend-review.md) | 2C / 10H / 19M / 7L | JWT access+refresh in `localStorage` |
+| Security | [security-review.md](security-review.md) | 0C / 2H / multiple M | HA `access_token` not masked on `GET /providers/{id}` |
+| Performance + DB | [performance-db-review.md](performance-db-review.md) | 3C / 7H / 10M / 10L | `SQLModel.metadata.create_all` on every boot |
+| Bugs + features | [bugs-features-review.md](bugs-features-review.md) | 3C / 13H / 12M / 3L  + 25 features | Webhook redelivery has no idempotency |
+| UI/UX | [ui-ux-review.md](ui-ux-review.md) | ~33 across 13 axes | Five overlapping glass-card abstractions |
+
+## Ship blockers (must fix before wider rollout)
+
+Cross-cutting top 12 — verified across all six reviews:
+
+1. **HA `access_token` returned in plaintext** on `GET /api/providers/{id}` — not in mask list. *(Security H-1, [providers.py:399-405](packages/server/src/notify_bridge_server/api/providers.py#L399))*
+2. **Secrets unencrypted at rest** — Telegram bot tokens, SMTP passwords, HA tokens, webhook secrets stored as plain text in SQLite. Disk/snapshot/backup theft = full credential set. *(Security H-2)*
+3. **Frontend JWT access + refresh in `localStorage`** — any future XSS exfiltrates the session in one call. Move to httpOnly cookie. *(Frontend C-1)*
+4. **`asyncio.create_task` fire-and-forget** in `ha_subscription._on_status_change` — task may be GC'd before completion. *(Backend C-1, [ha_subscription.py:249](packages/server/src/notify_bridge_server/services/ha_subscription.py#L249))*
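+5. **Pre-auth 1 MiB body read** on Gitea + generic webhooks: a DoS amplifier. Reject requests that lack an `X-Hub-Signature` header before reading the body; the HMAC itself can only be checked once the body has been read, so the header check is the cheap early gate. *(Backend C-3, [webhooks.py:167](packages/server/src/notify_bridge_server/api/webhooks.py#L167) + 449)*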
+6. **No webhook idempotency** — Gitea/Planka/generic don't dedupe by `X-Gitea-Delivery` / equivalent. Replays = duplicate sends. *(Bugs C-1)*
+7. **Deferred-dispatch crash window** — `dispatch()` returns before `session.commit()`; restart re-fires. Wrap in idempotent "claim → send → ack" with a unique constraint. *(Bugs C-2)*
+8. **Telegram `_last_update_id` in-memory only** — restart can replay or skip commands. Persist watermark. *(Bugs C-3)*
+9. **`init_db` calls `SQLModel.metadata.create_all` on every boot** — causes schema drift between fresh and upgraded installs. Adopt Alembic. *(Perf C-1)*
+10. **Template-preview endpoints bypass sandbox timeout** — authenticated user can wedge a worker with `{% for i in range(10**8) %}`. *(Security M-1)*
+11. **Telegram webhook handler missing `session.rollback()`** in catch-all — leaves uncommitted writes. *(Backend C-2, [commands/webhook.py:162](packages/server/src/notify_bridge_server/commands/webhook.py#L162))*
+12. **CLAUDE.md rule-8 violation** — `if (provider.type !== 'immich')` in `RuleEditor.svelte` silently disables people/album picker for other providers. *(Frontend C-2, [RuleEditor.svelte:57](frontend/src/routes/actions/RuleEditor.svelte#L57))*
+
+## Next-tier priorities (HIGH — fix in the same release where practical)
+
+13. Audit `backup_schema.PROVIDER_SECRET_FIELDS` so `webhook_secret`, `password`, `client_secret`, `refresh_token` are scrubbed on export. *(Backend C-5)*
+14. Add `asyncio.Lock` around `bridge_self` failure-counter dicts. *(Backend C-4)*
+15. Login rate-limit is per-IP only — slow rotated-source brute force succeeds. Add per-account lockout + raise password floor. *(Security M-2)*
+16. Three frontend CRUD pages copy cache items into local `$state`, breaking the shared-cache invariant and forcing a full refetch per mutation. *(Frontend H-1/H-2)*
+17. Uncancelled `setTimeout` chain in backup restart flow can `window.location.reload()` after navigation. *(Frontend H-5)*
+18. Refresh-token race against `logout()` produces spurious "Unauthorized" toasts. *(Frontend H-6/H-7)*
+19. Dashboard per-provider GROUP-BY aggregate runs unbounded on every refresh, no caching, no covering index. *(Perf H-1/H-2)*
+20. Truncation/parse-mode escaping for Telegram (HTML-aware truncate, `_extract_retry_after` fractional seconds, forum `message_thread_id` routing, 403 "bot blocked" auto-disable). *(Bugs H-various)*
+21. Five overlapping glass-card abstractions + radius drift (22/18/14/12 px) + ~71 legacy `rounded-md text-sm bg-…` form inputs that bypass the global Aurora `input{}` rule. *(UI/UX H-CONSIST-01..04)*
+22. Hardcoded hex colors (`#059669`, `#ef4444`) in Snackbar/ConfirmModal/actions — bypasses theming. *(UI/UX H-CONSIST-03)*
+23. Snackbar has no `aria-live`; nav lacks `aria-current="page"` — invisible to screen readers. *(UI/UX H-NAV-01, A11y)*
+24. DST handling in overnight quiet-hours windows. *(Bugs H)*
+
+## What's working well — keep doing this
+
+- **Sandboxed Jinja2 everywhere** (security agent verified every `Environment()` instantiation is `SandboxedEnvironment`).
+- **`PinnedResolver` SSRF defence** — handles CGNAT, IPv4-mapped IPv6, DNS rebinding.
+- **JWT with `token_version` revocation** — bcrypt offloaded to worker thread, constant-time username probe.
+- **Hardened Docker** — non-root, read-only root FS, `cap_drop: ALL`.
+- **Aurora/Glass design identity** — distinctive (conic-gradient orb, Newsreader italic display serif, lavender/orchid palette, "signal stream"/"on watch"/"wires"/"pulse" editorial labels). Not generic AI admin work.
+- **Frontend type discipline** — `svelte-check` clean, EN/RU exactly 1466 keys each, no `eval`/`innerHTML`/`var`/`==` anywhere.
+- **Most SQL hot paths already batched** — `load_link_data` is fully fan-in/fan-out; partial unique indexes on deferred-dispatch are thoughtful.
+- **Most v0.8.1 production-readiness items shipped** — fan-out caps, 429 backoff, parse_mode fallback, scheduler misfire grace, Prometheus, deep healthcheck, per-receiver render cache.
+
+## Top missing features worth adding next
+
+Pulled from the bugs-features report — full pitches in [bugs-features-review.md](bugs-features-review.md):
+
+- **Template playground** — "send test against last event" + dry-run with sample payload.
+- **Template versioning + rollback** with audit log.
+- **Bulk operations** on targets/templates (currently row-by-row).
+- **User-side snooze/mute via bot command** ("/mute 2h", "/snooze tonight").
+- **Auto-disable receiver on Telegram 403 ("bot blocked")** with admin notification.
+- **Rate-limit per target** (separate from global fan-out cap).
+- **Weekly digest + per-target stats + per-provider error rate**.
+- **Generic webhook provider** and **email / Discord / ntfy.sh / Matrix** channels.
+- **Message dedup window** (kills duplicate sends from redelivery and scheduler misfires).
+- **First-run "Getting Started" checklist** on empty dashboard (UI/UX).
+
+## How to consume this review
+
+Each report has clickable `file:line` markdown links. Recommended sequence:
+
+1. Read this `README.md`.
+2. Skim each report's Executive Summary (top 5-7 bullets).
+3. Triage the **Ship blockers (1-12)** above into the next release branch as individual issues.
+4. Schedule the **HIGH list (13-24)** for the release after.
+5. Treat the feature ideas as a refresh of `.claude/docs/feature-backlog.md`.
@@ -0,0 +1,342 @@
+# Backend Production-Readiness Review
+
+Scope: packages/server/src/notify_bridge_server/ and packages/core/src/notify_bridge_core/ (~44k LOC, Python 3.11, FastAPI + SQLModel async + APScheduler + aiohttp).
+
+## Executive Summary
+
+- **Overall quality is high.** The Jinja2 sandbox is consistently applied (every Environment instantiation is SandboxedEnvironment), JWT auth uses bcrypt offloaded to a worker thread, SSRF guard exists with DNS-rebinding mitigation, secrets are masked in logs via a dedicated filter, and most async/SQL patterns show production-aware design (per-tracker sessions, batched IN-queries, partial unique indexes).
+- **Top correctness risk: a fire-and-forget asyncio.create_task in ha_subscription._on_status_change** (no reference stored, GC can drop the task) plus in-memory counters in bridge_self that are not async-safe. Both bite on chatty HA installs.
+- **Module-level dict caches shared across the event loop have small read-modify-write windows** in services/scheduler.py (adaptive state), services/bridge_self.py (failure counters), commands/handler.py (TTLCache rate limits), and command_sync._dirty_bots. Currently functional under low concurrency; risky under load.
+- **Very large hot-path functions** — services/watcher.py:check_tracker (381 lines), services/dispatch_helpers.py:load_link_data (208 lines), the 1880-line database/migrations.py, and the 1365-line services/scheduler.py — concentrate too much logic in one place.
+- **Provider-type hardcoding** persists in api/providers.py, services/__init__.py, services/action_runner.py, and services/manual_dispatch.py (chains of if provider.type == "immich"). The watcher's _POLL_FACTORIES registry is the right model; extend it.
+- **Webhook handlers read the request body BEFORE authenticating** in the Gitea and generic-webhook routes. The Planka route gets it right. Net impact: a peer that knows the URL but not the secret can drive a 1 MiB read per request.
+- **autoescape is inconsistent**: True for runtime templates (renderer.py, commands/handler.py), False for preview / sample-context renders in api/template_configs.py, api/slot_helpers.py, and services/notifier.send_test_template_notification. Lower risk (admin-authored input) but mismatch invites surprise.
+
+---
+
+## CRITICAL
+
+### [C-1] _on_status_change schedules an unstored task (GC + drop risk)
+
+File: [packages/server/src/notify_bridge_server/services/ha_subscription.py:240-260](../../packages/server/src/notify_bridge_server/services/ha_subscription.py#L240)
+
+The task created by asyncio.create_task(_record_ha_status(...)) at line 249 is not held anywhere. Python may garbage-collect a task whose only reference is the create_task return value before it completes (Python docs explicitly warn: save a reference to the result). Result: an HA disconnect/reconnect EventLog row silently disappears under memory pressure.
+
+**Fix:** Module-level set[asyncio.Task], add the new task, remove via task.add_done_callback. ha_subscription.start_all already does this correctly (line 315-320); the pattern is already in-house.
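+
+A minimal sketch of that pattern, assuming a hypothetical helper `spawn_tracked` and registry `_background_tasks` (neither name is taken from the codebase); `record_status` stands in for `_record_ha_status(...)`:
+
+```python
+import asyncio
+
+# Hypothetical module-level registry; holds strong references to running tasks.
+_background_tasks: set[asyncio.Task] = set()
+
+
+def spawn_tracked(coro) -> asyncio.Task:
+    """Schedule coro and keep a strong reference until it finishes."""
+    task = asyncio.create_task(coro)
+    _background_tasks.add(task)
+    # Drop the reference once the task is done so the set does not grow.
+    task.add_done_callback(_background_tasks.discard)
+    return task
+
+
+async def _demo() -> int:
+    async def record_status() -> None:  # stand-in for _record_ha_status(...)
+        await asyncio.sleep(0)
+
+    task = spawn_tracked(record_status())
+    await task
+    await asyncio.sleep(0)  # give the done-callback a chance to run
+    return len(_background_tasks)
+```
+
+After the task completes, the registry is empty again, so the set only ever holds in-flight work.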
+
+### [C-2] Telegram-webhook handler returns 200 OK on uncommitted writes
+
+File: [packages/server/src/notify_bridge_server/commands/webhook.py:130-169](../../packages/server/src/notify_bridge_server/commands/webhook.py#L130)
+
+The catch-all at line 162 swallows handle_command exceptions and returns OK to Telegram. The request already called await session.commit() at line 96 (after save_chat_from_webhook), and any subsequent writes via the dispatcher use NEW sessions inside the command path. If a downstream session inside handle_command partially commits before raising, the dependency get_session does NOT roll back automatically — the context manager only closes.
+
+**Fix:** Either explicitly session.rollback() in the except block, or wrap the per-request mutations in async with session.begin(): so the implicit transaction guarantees rollback on exception.
+
+### [C-3] Gitea/generic webhook reads body BEFORE verifying secret is configured
+
+File: [packages/server/src/notify_bridge_server/api/webhooks.py:167-178](../../packages/server/src/notify_bridge_server/api/webhooks.py#L167) and line 449-454
+
+The sequence is: read 1 MiB raw_body, then check if webhook_secret is empty. A peer that learned the URL but has no secret drives a 1 MiB body read per request. Planka's handler at line 232+ validates the bearer token BEFORE the body read; that is the correct pattern.
+
+**Fix:** Hoist the "if not webhook_secret" check (Gitea) and the "if auth_mode == none" short-circuit (generic) above _read_bounded_body. Gitea HMAC verification still needs the body, but rejecting a request whose provider has no secret configured costs nothing and can happen first.
+
+### [C-4] bridge_self in-memory counters are not async-safe
+
+File: [packages/server/src/notify_bridge_server/services/bridge_self.py:186-230](../../packages/server/src/notify_bridge_server/services/bridge_self.py#L186)
+
+record_poll_failure does _poll_failure_counts[tracker_id] = _poll_failure_counts.get(tracker_id, 0) + 1. These dicts are accessed concurrently from poll loop, HA push, webhook ingest, and dispatcher target-failure recording. Individual dict ops are atomic, but get + 1 + set is not when interleaved with another coroutine that touches the same key. Symptoms: missed threshold crossings, occasional double-emission. Same pattern in _target_failure_counts and _backlog_above_threshold.
+
+**Fix:** Wrap mutating ops in an asyncio.Lock. The reset-and-re-arm semantics already assume serial access — make it explicit.
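+
+One way to make the serial-access assumption explicit, sketched with a hypothetical `FailureCounter` class (the real module uses plain dicts, and the threshold value here is illustrative):
+
+```python
+import asyncio
+
+
+class FailureCounter:
+    """Per-key failure counter whose read-modify-write runs under a lock."""
+
+    def __init__(self, threshold: int) -> None:
+        self._threshold = threshold
+        self._counts: dict[int, int] = {}
+        self._lock = asyncio.Lock()
+
+    async def record_failure(self, key: int) -> bool:
+        """Increment; return True only on the call that reaches the threshold."""
+        async with self._lock:
+            count = self._counts.get(key, 0) + 1
+            self._counts[key] = count
+            return count == self._threshold
+
+    async def reset(self, key: int) -> None:
+        async with self._lock:
+            self._counts.pop(key, None)
+
+
+async def _demo() -> list[bool]:
+    counter = FailureCounter(threshold=3)
+    # Five concurrent failures for the same tracker id.
+    return await asyncio.gather(*(counter.record_failure(7) for _ in range(5)))
+```
+
+Because the threshold test happens inside the same critical section as the increment, exactly one caller observes the crossing.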
+
+### [C-5] PROVIDER_SECRET_FIELDS audit needed for backup exports
+
+File: [packages/server/src/notify_bridge_server/api/providers.py:617-625](../../packages/server/src/notify_bridge_server/api/providers.py#L617) and [services/backup_service.py:84-93](../../packages/server/src/notify_bridge_server/services/backup_service.py#L84)
+
+_apply_secrets_provider redacts only fields named in PROVIDER_SECRET_FIELDS. The webhook flow uses a field called webhook_secret (Gitea, Planka, generic) — verify this is in PROVIDER_SECRET_FIELDS (defined in backup_schema.py). A backup export with secrets_mode=INCLUDE that misses webhook_secret leaks a token that grants webhook-forgery rights.
+
+**Action:** Audit PROVIDER_SECRET_FIELDS. Specifically check it includes: api_key, api_token, access_token, webhook_secret, password, client_secret, refresh_token. The _provider_response mask list at api/providers.py:620 is a good cross-reference — both should be the same constant.
+
+---
+
+## HIGH
+
+### [H-1] _compile_template lru_cache competes across tenants
+
+File: [packages/server/src/notify_bridge_server/commands/handler.py:99-103](../../packages/server/src/notify_bridge_server/commands/handler.py#L99)
+
+lru_cache(maxsize=256) keyed by raw template string. Edited templates remain cached. On a multi-tenant install one tenant's 256 distinct templates can evict another's. No invalidation on template-edit.
+
+**Fix:** Drop the cache (Jinja compile is sub-ms) OR add an invalidation call from the template-edit endpoints. The notification renderer (renderer.py:31) uses 512 slots — same problem; consistent fix.
+
+### [H-2] check_tracker is 381 lines with deep coupling
+
+File: [packages/server/src/notify_bridge_server/services/watcher.py:263-644](../../packages/server/src/notify_bridge_server/services/watcher.py#L263)
+
+Loads tracker, polls, writes state, persists EventLog, evaluates gates, defers, dispatches, records bridge_self, all in one function. Refactor candidates: _poll_phase, _persist_state_and_events, _dispatch_phase. This is the watcher's hot path; bugs here affect every tracker tick.
+
+### [H-3] load_link_data returns untyped dict[str, Any]
+
+File: [packages/server/src/notify_bridge_server/services/dispatch_helpers.py:539-747](../../packages/server/src/notify_bridge_server/services/dispatch_helpers.py#L539)
+
+Five call sites consume ld["target_type"], ld.get("link_id"), etc. — no static guarantee against key typos.
+
+**Fix:** Introduce a frozen @dataclass class LinkData. Same for per-receiver entries.
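+
+A sketch of what such a type could look like. Only `target_type` and `link_id` come from the call sites mentioned above; `ReceiverEntry` and every other field are assumptions for illustration:
+
+```python
+from dataclasses import FrozenInstanceError, dataclass
+
+
+@dataclass(frozen=True)
+class ReceiverEntry:
+    # Hypothetical per-receiver fields.
+    receiver_id: int
+    locale: str | None = None
+
+
+@dataclass(frozen=True)
+class LinkData:
+    target_type: str
+    link_id: int | None = None
+    receivers: tuple[ReceiverEntry, ...] = ()
+
+
+def try_mutate(ld: LinkData) -> bool:
+    """Return True if the instance rejected mutation, as a frozen dataclass should."""
+    try:
+        ld.target_type = "email"  # type: ignore[misc]
+    except FrozenInstanceError:
+        return True
+    return False
+```
+
+A typo such as `ld.target_typ` then fails at type-check time (or with AttributeError at runtime) instead of silently returning None from `.get()`.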
+
+### [H-4] N+1 in _resolve_command_context template-slot loop
+
+File: [packages/server/src/notify_bridge_server/commands/handler.py:200-215](../../packages/server/src/notify_bridge_server/commands/handler.py#L200)
+
+One SELECT per distinct command_template_config_id. Already batched for trackers/configs/providers — finish the job. Single WHERE config_id IN (...) query + Python pivot.
+
+### [H-5] N+1 in backup_service.export_backup receiver loop
+
+File: [packages/server/src/notify_bridge_server/services/backup_service.py:187-189](../../packages/server/src/notify_bridge_server/services/backup_service.py#L187)
+
+50 targets = 51 SELECTs. Batch with WHERE target_id IN (...). Audit other sections of this 941-line file for the same pattern (templates -> slots, command configs -> slots).
+
+### [H-6] _dirty_bots mutated from request and scheduler without a lock
+
+File: [packages/server/src/notify_bridge_server/services/command_sync.py:25-95](../../packages/server/src/notify_bridge_server/services/command_sync.py#L25)
+
+mark_bot_dirty runs in request handlers, _flush_dirty_bots on the scheduler executor. Currently safe (snapshot via ready = [...]) but fragile.
+
+**Fix:** Snapshot under lock, or move to a thread-safe primitive.
+
+### [H-7] HA reconnect cycle has no way for CRUD to short-circuit a stale supervisor
+
+File: [packages/server/src/notify_bridge_server/services/ha_subscription.py:163-175](../../packages/server/src/notify_bridge_server/services/ha_subscription.py#L163)
+
+Config is reloaded only on reconnect, so an HA provider disabled through CRUD keeps its supervisor running on the 30s/300s cadence until the next reconnect attempt picks up the change. CRUD endpoints should call reload_provider (defined at line 339); verify the wiring.
+
+### [H-8] Cached expunged ORM instances are footguns
+
+File: [packages/server/src/notify_bridge_server/services/event_dispatch.py:75-107](../../packages/server/src/notify_bridge_server/services/event_dispatch.py#L75)
+
+_load_trackers_cached returns expunged NotificationTracker rows. Future maintainer calling session.add(tracker) on a stale cached instance triggers DetachedInstance or silent re-INSERT. Document this strongly, ideally convert to a typed projection.
+
+### [H-9] Pending-restore at startup has no timeout
+
+File: [packages/server/src/notify_bridge_server/main.py:142-143](../../packages/server/src/notify_bridge_server/main.py#L142)
+
+apply_pending_restore_if_any runs in lifespan; a partially-corrupt restore could block startup indefinitely. Container liveness probes then fail after grace.
+
+**Fix:** asyncio.wait_for with a generous timeout, or kick off as background task while app starts.
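+
+A minimal sketch of the first option. The helper name `run_startup_step` and the timeout values are assumptions; `stuck_restore` stands in for a restore that never finishes:
+
+```python
+import asyncio
+
+
+async def run_startup_step(step, timeout_s: float) -> bool:
+    """Return True if step() finished in time, False if it was cancelled on timeout."""
+    try:
+        await asyncio.wait_for(step(), timeout=timeout_s)
+    except asyncio.TimeoutError:
+        return False
+    return True
+
+
+async def _demo() -> tuple[bool, bool]:
+    async def quick_restore() -> None:
+        await asyncio.sleep(0)
+
+    async def stuck_restore() -> None:
+        await asyncio.sleep(60)
+
+    return (
+        await run_startup_step(quick_restore, timeout_s=1.0),
+        await run_startup_step(stuck_restore, timeout_s=0.05),
+    )
+```
+
+The caller can then log the timeout and continue booting, or fail fast, rather than hanging until the liveness probe kills the container.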
+
+### [H-10] Jinja2 render watchdog uses daemon thread that can pin a CPU forever
+
+File: [packages/core/src/notify_bridge_core/templates/renderer.py:48-73](../../packages/core/src/notify_bridge_core/templates/renderer.py#L48)
+
+Comment acknowledges the trade-off. Multiple concurrent runaway renders can exhaust CPU cores while callers think they timed out. Add a process-level BoundedSemaphore capping concurrent in-flight renders.
+
+### [H-11] _aggregate drops all but the first error
+
+File: [packages/server/src/notify_bridge_server/services/notifier.py:326-335](../../packages/server/src/notify_bridge_server/services/notifier.py#L326)
+
+When all sends fail, only results[0] is returned. Distinct subsequent errors are lost.
+
+**Fix:** Aggregate all errors into a details field.
+
+### [H-12] Generic-webhook header dict materialised twice
+
+File: [packages/server/src/notify_bridge_server/api/webhooks.py:456](../../packages/server/src/notify_bridge_server/api/webhooks.py#L456) and line 475
+
+dict(request.headers) materialises full headers map, then _filter_headers and _redact_sensitive_body walk the payload. With a malicious peer sending many headers (Starlette default 100), bounded but wasteful.
+
+### [H-13] SSRF redirect-walk has no aggregate wall-clock budget
+
+File: [packages/core/src/notify_bridge_core/notifications/telegram/client.py:232-268](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py#L232)
+
+max_redirects = 3, each with 120s _DOWNLOAD_TIMEOUT. Worst case per request: 480s. _TARGET_TIMEOUT_S = 120s in the dispatcher caps the top-level case, but per-asset preloads inside media groups don't all share that cap.
+
+### [H-14] Backlog recovery logic flips latch for in-flight users
+
+File: [packages/server/src/notify_bridge_server/services/bridge_self.py:544-551](../../packages/server/src/notify_bridge_server/services/bridge_self.py#L544)
+
+Recovery loop iterates all known users and flips to False for any not in counts_by_user. If a user transiently has no user_id set on deferred rows (legacy / orphaned), they're excluded from the GROUP BY and incorrectly marked recovered.
+
+### [H-15] quiet_hours_status silently returns None on start == end
+
+File: [packages/server/src/notify_bridge_server/services/dispatch_helpers.py:110-111](../../packages/server/src/notify_bridge_server/services/dispatch_helpers.py#L110)
+
+The comment notes this is almost always a user mistake. Silent return means the user wonders why their notifications still arrive at all hours. Surface via WARNING log + UI hint.
+
+---
+
+## MEDIUM
+
+### [M-1] register_commands_with_telegram chat overrides loop is sequential
+
+File: [packages/server/src/notify_bridge_server/commands/handler.py:723-776](../../packages/server/src/notify_bridge_server/commands/handler.py#L723)
+
+50 chats with overrides = 50 sequential Telegram round-trips. Use asyncio.gather with a semaphore as in _refresh_telegram_chat_titles.
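+
+The bounded fan-out could be sketched as follows. `gather_bounded` is a hypothetical helper, and `fake_set_commands` stands in for the per-chat Telegram call:
+
+```python
+import asyncio
+
+
+async def gather_bounded(factories, limit: int) -> list:
+    """Run zero-argument coroutine factories with at most `limit` in flight."""
+    sem = asyncio.Semaphore(limit)
+
+    async def _run(factory):
+        async with sem:
+            return await factory()
+
+    # return_exceptions=True so one failing chat does not cancel the others.
+    return await asyncio.gather(*(_run(f) for f in factories), return_exceptions=True)
+
+
+async def _demo() -> tuple[int, int]:
+    in_flight = 0
+    peak = 0
+
+    async def fake_set_commands() -> str:
+        nonlocal in_flight, peak
+        in_flight += 1
+        peak = max(peak, in_flight)
+        await asyncio.sleep(0.01)
+        in_flight -= 1
+        return "ok"
+
+    results = await gather_bounded([fake_set_commands] * 10, limit=3)
+    return len(results), peak
+```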
+
+### [M-2] _run_provider exception backoff has no escalation
+
+File: [packages/server/src/notify_bridge_server/services/ha_subscription.py:278-283](../../packages/server/src/notify_bridge_server/services/ha_subscription.py#L278)
+
+Persistent bug in _emit reconnects every 30s forever. Add exponential backoff with cap and bridge_self alert after N failures.
+
+### [M-3] database/migrations.py is 1880 lines
+
+File: [packages/server/src/notify_bridge_server/database/migrations.py](../../packages/server/src/notify_bridge_server/database/migrations.py)
+
+Past the 800-line guideline. Split per-migration into database/migrations/<name>.py, list in main.py.
+
+### [M-4] Locale-resolution logic duplicated
+
+File: [packages/server/src/notify_bridge_server/services/dispatch_helpers.py:484-491](../../packages/server/src/notify_bridge_server/services/dispatch_helpers.py#L484) and [services/notifier.py:46](../../packages/server/src/notify_bridge_server/services/notifier.py#L46)
+
+Two implementations of locale priority. One source of truth.
+
+### [M-5] _normalize_locale duplicated across modules
+
+File: [packages/server/src/notify_bridge_server/commands/handler.py:632](../../packages/server/src/notify_bridge_server/commands/handler.py#L632)
+
+Five-line copy; move to commands/command_utils.py.
+
+### [M-6] Provider-type if-chain in _test_provider_connection
+
+File: [packages/server/src/notify_bridge_server/api/providers.py:203-250](../../packages/server/src/notify_bridge_server/api/providers.py#L203)
+
+Same chain in services/__init__.py:_make_collection_provider. Both candidates for a single registry.
+
+### [M-7] Secret masking exposes last 4 chars unconditionally
+
+File: [packages/server/src/notify_bridge_server/api/providers.py:624](../../packages/server/src/notify_bridge_server/api/providers.py#L624) and [services/backup_service.py:81](../../packages/server/src/notify_bridge_server/services/backup_service.py#L81)
+
+Fine for 32-char Immich keys. Returns half the value for short secrets. Use plain "***" for len(value) < 16.
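+
+A sketch of the length-aware mask. The function name and the exact mask format for long values are assumptions; only the 16-character cut-off and the last-4 suffix come from the finding:
+
+```python
+def mask_secret(value: str, min_len_for_suffix: int = 16) -> str:
+    """Mask a secret, revealing the last 4 characters only for long values."""
+    if not value:
+        return ""
+    if len(value) < min_len_for_suffix:
+        return "***"  # short secret: reveal nothing
+    return "***" + value[-4:]
+```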
+
+### [M-8] Deprecated validate_outbound_url still imported
+
+File: [packages/core/src/notify_bridge_core/providers/immich/client.py:14](../../packages/core/src/notify_bridge_core/providers/immich/client.py#L14)
+
+The sync version uses blocking socket.getaddrinfo on the event loop. Migrate to avalidate_outbound_url.
+
+### [M-9] Lazy cache init has confusing DCL comment
+
+File: [packages/server/src/notify_bridge_server/services/watcher.py:81-113](../../packages/server/src/notify_bridge_server/services/watcher.py#L81)
+
+The comment about "Double-check after acquiring lock" implies classic DCL. Under asyncio, the unlocked first check is safe because there's no thread context switch, but reword the comment to make that clear.
+
+### [M-10] Dispatcher concurrency cap is per-dispatch, not process-wide
+
+File: [packages/core/src/notify_bridge_core/notifications/dispatcher.py:58](../../packages/core/src/notify_bridge_core/notifications/dispatcher.py#L58)
+
+_DISPATCH_CONCURRENCY = 16 is INSIDE dispatch(). HA storm = N events x min(M, 16) sends with no outer cap. Add a process-level semaphore in event_dispatch.py.
+
+### [M-11] success=True returned for partial failures
+
+File: [packages/server/src/notify_bridge_server/services/notifier.py:329-335](../../packages/server/src/notify_bridge_server/services/notifier.py#L329)
+
+A test that fails on 1 of 3 receivers returns success=True with a partial_failures count. Introduce a status: "ok"|"partial"|"fail" field.
+
+### [M-12] Telegram command registration not retried on 429
+
+File: [packages/server/src/notify_bridge_server/commands/handler.py:671-693](../../packages/server/src/notify_bridge_server/commands/handler.py#L671)
+
+set_my_commands/delete_my_commands aren't retried. Adopt the retry-after handling that _upload_media has.
+
+### [M-13] event_log_id_by_event keyed on id(event)
+
+File: [packages/server/src/notify_bridge_server/services/watcher.py:417-464](../../packages/server/src/notify_bridge_server/services/watcher.py#L417)
+
+CPython object-address as key works because events are held alive in scope, but a typed key would be safer.
+
+### [M-14] Bcrypt-length error wording could be clearer
+
+File: [packages/server/src/notify_bridge_server/auth/routes.py:69-81](../../packages/server/src/notify_bridge_server/auth/routes.py#L69)
+
+A user typing 70 ASCII characters plus an emoji gets rejected and doesn't understand why. Clarify the byte-count language.
+
+### [M-15] CSP allows unsafe-inline for script-src
+
+File: [packages/server/src/notify_bridge_server/main.py:186-201](../../packages/server/src/notify_bridge_server/main.py#L186)
+
+Acknowledged. SvelteKit --csp build flag emits hashes; switching unblocks dropping unsafe-inline.
+
+### [M-16] Telegram-webhook body size not capped
+
+File: [packages/server/src/notify_bridge_server/commands/webhook.py:71](../../packages/server/src/notify_bridge_server/commands/webhook.py#L71)
+
+update = await request.json() reads with no cap. Add _read_bounded_body pattern.
+
+### [M-17] _log_command_event swallows DB failures invisibly
+
+File: [packages/server/src/notify_bridge_server/commands/handler.py:353-357](../../packages/server/src/notify_bridge_server/commands/handler.py#L353)
+
+Hard DB failure here is invisible. Add a metrics counter.
+
+### [M-18] apply_tracking_display_filters is a 60-line if-branched function
+
+File: [packages/server/src/notify_bridge_server/services/dispatch_helpers.py:350-405](../../packages/server/src/notify_bridge_server/services/dispatch_helpers.py#L350)
+
+Split into _filter_favorites, _apply_order_and_limit, _strip_details_and_tags.
+
+---
+
+## LOW
+
+### [L-1] from .database.models import * in main.py
+
+File: [packages/server/src/notify_bridge_server/main.py:26](../../packages/server/src/notify_bridge_server/main.py#L26)
+
+Comment is honest about purpose, but explicit imports or a single module import is clearer.
+
+### [L-2] None comparisons
+
+All comparisons verified to use is None via grep — no findings.
+
+### [L-3] Magic numbers
+
+Constants are well-named throughout (_TG_429_MAX_ATTEMPTS, _MAX_PENDING_PER_TRACKER, DEBOUNCE_SECONDS, etc.). Only nit: seconds=30 literal in scheduler.schedule_bot_polling could be promoted.
+
+### [L-4] noqa E712 repeated 8+ times for SQLModel boolean comparisons
+
+Switch to .is_(True) for SQLAlchemy idiom, or add E712 to project ruff config.
+
+### [L-5] _check_same_origin is best-effort by design
+
+Acceptable.
+
+### [L-6] _normalize_host strips IPv6 zone IDs silently
+
+File: [packages/core/src/notify_bridge_core/notifications/ssrf.py:105-106](../../packages/core/src/notify_bridge_core/notifications/ssrf.py#L105)
+
+Debug log when stripping changes the host would help diagnose.
+
+### [L-7] _compute_jitter cap of 30s might be tight on hourly polls
+
+File: [packages/server/src/notify_bridge_server/services/scheduler.py:91-105](../../packages/server/src/notify_bridge_server/services/scheduler.py#L91)
+
+Revisit if jitter-collision becomes a real-world issue.
+
+### [L-8] SmtpConfig repr may leak password
+
+File: [packages/server/src/notify_bridge_server/services/notifier.py:205-213](../../packages/server/src/notify_bridge_server/services/notifier.py#L205)
+
+If SmtpConfig is a vanilla dataclass, repr() will leak the password. Verify in notify_bridge_core.notifications.email.client — add field(repr=False) or a custom __repr__.
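+
+For illustration, assuming a plain dataclass (the class below is a hypothetical stand-in, not the real SmtpConfig, and its fields are guesses):
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class SmtpConfigSketch:
+    host: str
+    port: int
+    username: str
+    # repr=False keeps the value out of repr() and therefore out of most log lines.
+    password: str = field(repr=False)
+
+
+cfg = SmtpConfigSketch(host="mail.example.org", port=587, username="bridge", password="hunter2")
+```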
+
+### [L-9] noqa BLE001 count is high
+
+49 occurrences across 26 files. Each defensible; consider narrowing where possible.
+
+### [L-10] _normalize_for_json does not handle UUID/Decimal
+
+File: [packages/server/src/notify_bridge_server/services/deferred_dispatch.py:124-133](../../packages/server/src/notify_bridge_server/services/deferred_dispatch.py#L124)
+
+No current consumer emits these, but a fallback str() for unknown types would prevent future breakage.
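+
+A sketch of a normaliser with such a fallback. This is not the existing _normalize_for_json; the set of handled types is an assumption:
+
+```python
+import json
+from datetime import datetime
+from decimal import Decimal
+from uuid import UUID
+
+
+def normalize_for_json(value):
+    """Recursively convert a value into something json.dumps accepts."""
+    if value is None or isinstance(value, (str, int, float, bool)):
+        return value
+    if isinstance(value, datetime):
+        return value.isoformat()
+    if isinstance(value, dict):
+        return {str(k): normalize_for_json(v) for k, v in value.items()}
+    if isinstance(value, (list, tuple, set)):
+        return [normalize_for_json(v) for v in value]
+    # Fallback for UUID, Decimal, and any other unknown type.
+    return str(value)
+
+
+payload = normalize_for_json({"id": UUID(int=1), "amount": Decimal("1.5"), "tags": ("a", "b")})
+encoded = json.dumps(payload)
+```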
+
+---
+
+## Approval Verdict
+
+**Block** — CRITICAL findings (C-1 unstored task, C-2 missing rollback, C-3 unauthenticated body read, C-4 racy counters, C-5 secret-mask audit) must be fixed before declaring production-ready. Once those are addressed, the HIGH findings can land in a follow-up.
+
+## Quick Wins (low effort, high value)
+
+1. **Wrap every fire-and-forget asyncio.create_task in a module-level set** — search for asyncio.create_task( with no assignment. Definite hit: ha_subscription.py:249.
+2. **Move webhook-secret check before _read_bounded_body** in Gitea + generic webhook handlers — 5-line move per endpoint, eliminates pre-auth resource exhaustion.
+3. **Add an asyncio.Lock around _poll_failure_counts and _target_failure_counts** mutations — eliminates C-4.
+4. **Split migrations.py** — mechanical refactor, ~1 hour, improves blame/review.
+5. **Batch the receiver query in backup_service.export_backup** — single IN (...) query, ~10x faster.
+6. **Replace from .database.models import \*** with explicit imports — small clarity win.
@@ -0,0 +1,714 @@
+# Bugs + Missing Features — Production-Readiness Review
+
+Repo: `c:\Users\Alexei\Documents\service-to-notification-bridge` (v0.8.1 baseline)
+Date: 2026-05-22
+Scope: full repo (backend Python/FastAPI, Svelte 5 frontend, providers + dispatchers + bot commands)
+
+---
+
+## Executive summary
+
+- **The code is in much better shape than typical pre-1.0 code.** Quiet-hours,
+  SSRF, JWT, secret redaction, rate-limit fan-out caps, partition-by-media-kind,
+  parse_mode retry, scheduler misfire-grace, Prometheus metrics, deep
+  healthcheck, and per-receiver render cache are all already implemented and
+  well-tested.
+- **The single biggest shipping risk is webhook idempotency.** Gitea, Planka,
+  and the generic webhook endpoint all dispatch on every POST regardless of
+  redelivery — there is no `X-Gitea-Delivery` / `X-Hub-Delivery` dedup table.
+  An upstream retry storm sends the same notification N times.
+- **The deferred-dispatch drain has a duplicate-send window** if the process
+  dies between `dispatcher.dispatch()` returning and `session.commit()` —
+  the row stays `pending` and the periodic catch-up scan re-drains it.
+- **Telegram update offset (`_last_update_id`) is in-memory only** — on
+  restart, the bot replays already-handled updates or skips ones Telegram
+  has discarded. Combined with no per-update idempotency, this is a
+  duplicate-command surface.
+- **Several Telegram features are silently unsupported**: forum threads
+  (`message_thread_id`), bot-blocked-by-user detection (403 → keep retrying
+  forever), and inline-button callback queries. None blocks shipping today
+  but each is a near-term ask from any real user.
+- **No template versioning / dry-run / playground** — every template edit is
+  immediately live. There is no way to validate a new template against a
+  sample payload before flipping the switch, and no rollback path.
+- **Frontend lacks bulk operations and import/export of templates+targets.**
+  An operator with 30 trackers cannot bulk-toggle, bulk-edit, or move a
+  template across users.
+
+---
+
+## Part A — Bugs and reliability issues
+
+Severity legend: **CRITICAL** = data loss / duplicate user-visible messages /
+silent stop-shipping; **HIGH** = wrong behavior under realistic conditions;
+**MEDIUM** = degrades UX or operability; **LOW** = polish.
+
+### CRITICAL
+
+#### A1. Webhook redelivery causes duplicate notifications (no idempotency)
+
+**Location**: `packages/server/src/notify_bridge_server/api/webhooks.py:156`
+(`gitea_webhook`), `:225` (`planka_webhook`), `:427` (`generic_webhook`).
+**Scenario**: Gitea retries a webhook after 30s if the bridge returns 5xx,
+times out under load, or if the operator clicks "Test Delivery" twice. Every
+retry produces a fresh notification because the handlers never check
+`X-Gitea-Delivery` (Gitea's per-delivery UUID), nor do they record any
+event_id/hash for `parse_generic_webhook` events.
+**Fix**: Add a `webhook_delivery` table with `(provider_id, delivery_id)`
+unique constraint and `created_at`. Insert before dispatch (`INSERT OR IGNORE`
+on SQLite, `ON CONFLICT DO NOTHING` on Postgres); if the insert is a no-op,
+return `{"ok": true, "skipped": "duplicate"}`. For Gitea use the
+`X-Gitea-Delivery` header; for Planka use a hash of `event_type +
+payload.id + payload.createdAt`; for generic webhooks use a configurable
+JSONPath expression to derive an idempotency key, falling back to a SHA256 of
+the raw body. TTL prune older than 7 days.
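+
+The claim-before-dispatch step might look like this on SQLite. It is a sketch using the stdlib `sqlite3` module rather than the project's SQLModel session; `claim_delivery` and `delivery_key` are hypothetical helper names, and header lookup assumes a plain dict with the exact header casing:
+
+```python
+import hashlib
+import sqlite3
+
+
+def open_dedup_db() -> sqlite3.Connection:
+    conn = sqlite3.connect(":memory:")
+    conn.execute(
+        "CREATE TABLE webhook_delivery ("
+        " provider_id INTEGER NOT NULL,"
+        " delivery_id TEXT NOT NULL,"
+        " created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
+        " UNIQUE (provider_id, delivery_id))"
+    )
+    return conn
+
+
+def delivery_key(headers: dict[str, str], raw_body: bytes) -> str:
+    """Prefer the sender's delivery ID; otherwise hash the raw body."""
+    return headers.get("X-Gitea-Delivery") or hashlib.sha256(raw_body).hexdigest()
+
+
+def claim_delivery(conn: sqlite3.Connection, provider_id: int, key: str) -> bool:
+    """Return True if this delivery is new, False if it was already recorded."""
+    cur = conn.execute(
+        "INSERT OR IGNORE INTO webhook_delivery (provider_id, delivery_id) VALUES (?, ?)",
+        (provider_id, key),
+    )
+    conn.commit()
+    return cur.rowcount == 1  # 0 rows changed means the unique constraint hit
+```
+
+A handler would call `claim_delivery` first and return the "skipped: duplicate" response when it yields False.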
+
+#### A2. Deferred-dispatch drain can double-send on process crash
+
+**Location**: `packages/server/src/notify_bridge_server/services/deferred_dispatch.py:721-758`.
+**Scenario**: Inside `_process_row`, `dispatcher.dispatch()` actually
+delivers the Telegram message (HTTP 200 returned, user phone buzzes).
+The function then sets `row.status = "fired"` (line 734) but the surrounding
+`session.commit()` (line 577) hasn't run yet. Process is killed (OOM,
+SIGTERM during deploy, host reboot). On restart, `_run_deferred_drain_catchup`
+re-fetches the still-`pending` row and dispatches it again — **the user gets
+the same album twice**.
+**Fix**: Either (a) record an outbound dedup key per-row before dispatch
+(`row.dispatch_id = uuid4(); session.commit()` first), then ask the channel
+client to send-or-no-op based on that ID; or (b) flip the row to a
+`"in_flight"` state with a short timeout in a pre-dispatch transaction so a
+restart sees it as poisoned and aborts. Option (a) is more correct but
+needs per-channel cooperation; option (b) is the cheap fix.
+
+#### A3. Telegram update offset is in-memory only — restart replays or loses commands
+
+**Location**: `packages/server/src/notify_bridge_server/services/telegram_poller.py:31`
+(`_last_update_id: dict[int, int] = {}`).
+**Scenario**: A user types `/random Family`. Telegram delivers update_id=4711.
+The bridge processes the command, sends back the media, and crashes before
+APScheduler ticks again. On restart, `_last_update_id` is empty, so we call
+`getUpdates(offset=None)` → Telegram returns 4711 again → we send the user
+the same album a second time. Conversely, if Telegram's 24-hour retention
+expired during a long outage, we silently skip pending updates.
+**Fix**: Persist last_update_id in DB (`telegram_bot.last_update_id` column).
+Combine with A2-style command idempotency by inserting
+`(bot_id, update_id)` into a dedup table before processing.
+
+### HIGH
+
+#### A4. Telegram "bot blocked by user" / "chat not found" never short-circuits
+
+**Location**: `packages/core/src/notify_bridge_core/notifications/telegram/client.py`
+(`send_message`, `_upload_media`, etc.). Errors with
+`error_code == 403` (Forbidden, "Bot was blocked by the user") and 400
+"chat not found" / "user is deactivated" are returned as failures, but
+nothing records them in a way that removes or disables the receiver.
+**Scenario**: A user blocks the bot. Every scheduled "Good morning memory"
+fires a sendMessage that Telegram instantly 403s. Bridge logs an error,
+moves on, repeats forever. The bridge_self target-failure counter eventually
+fires but the underlying receiver is never disabled. With many such chats
+the operator has no easy cleanup path.
+**Fix**: In the dispatcher, on `error_code in (403, 400 with description
+matching "chat not found"/"user is deactivated")`, automatically set
+`TelegramChat.commands_enabled = False` and either flag the receiver as
+`disabled` with reason `blocked_by_user` or surface it via a new
+`/admin/blocked-chats` view. Also stop further retries in that round.
+
+#### A5. Telegram forum-thread (topic) routing not supported
+
+**Location**: telegram client never accepts/sends `message_thread_id`.
+**Scenario**: Operator points the bridge at a group's "Releases" forum
+topic. Today every message lands in the General topic instead — there is
+no way to specify the topic. This is a hard requirement for any non-trivial
+group install. Currently `reply_parameters` is the only thread-adjacent
+field used; `message_thread_id` is silently absent.
+**Fix**: Add an optional `message_thread_id` per-receiver (or per-target)
+config, pass through `send_message`, `_upload_media`, and `_post_media_group`.
+Auto-extract from incoming command updates' `message.message_thread_id` so
+the bot can reply into the same topic.
+
+#### A6. `bot.token` read after commit without refresh in webhook flow
+
+**Location**: `packages/server/src/notify_bridge_server/commands/webhook.py:92-97`.
+**Scenario**: The comment acknowledges "AsyncSession expires instances on
+commit" and snapshots `bot_id`/`bot_token` before commit, but `await
+session.refresh(bot)` is also called after the commit. If `session.refresh`
+fails (e.g. row was deleted by an admin concurrently — bot rotation), the
+exception is caught as a warning and the rest of the handler still runs
+using the stale local `bot_id`/`bot_token`. The window is small but real.
+**Fix**: Remove the `session.refresh(bot)` since the snapshot already
+covers everything the handler needs. The refresh adds risk for no gain.
+
+#### A7. Deferred-dispatch coalescing has a JSON-mutation bug under concurrent defers
+
+**Location**: `packages/server/src/notify_bridge_server/services/deferred_dispatch.py:307`
+(`_find_pending_asset_rows`).
+**Scenario**: Two near-simultaneous `assets_added` events for the same
+`(link_id, collection_id)` from two upstream pollers (HA chat-bus +
+periodic Immich). Both call `defer_event` concurrently. The two transactions
+both see "no pending row", both `session.add(new_row)`, and SQLite cheerfully
+inserts two rows. The drain then fires both, sending the same combined media
+twice. Note that the partial UNIQUE index from v0.8.1 protects only the
+`bridge_self` provider row, not the deferred queue.
+**Fix**: Add a partial UNIQUE index `UNIQUE(link_id, collection_id, event_type)
+WHERE status = 'pending'` on `deferred_dispatch`, then convert `defer_event`
+to `INSERT ... ON CONFLICT (link_id, collection_id, event_type) WHERE status =
+'pending' DO UPDATE` (SQLite only matches a partial index when the conflict
+target repeats the index's `WHERE` clause) and merge `event_payload` inside
+the SQL or in a re-read+retry loop.
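
A minimal sketch of the partial-index upsert, using the standard-library `sqlite3` module. The table layout, the `uq_deferred_pending` index name, and the shape of `event_payload` (a JSON list of asset ids) are assumptions for illustration, not the project's real schema. The merge is done in Python under `BEGIN IMMEDIATE`; the `ON CONFLICT` clause is the backstop for any writer that skips the lock.

```python
import json
import sqlite3

# Hypothetical, simplified schema; the real deferred_dispatch table has more columns.
SCHEMA = """
CREATE TABLE deferred_dispatch (
    id INTEGER PRIMARY KEY,
    link_id INTEGER NOT NULL,
    collection_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    event_payload TEXT NOT NULL
);
CREATE UNIQUE INDEX uq_deferred_pending
    ON deferred_dispatch (link_id, collection_id, event_type)
    WHERE status = 'pending';
"""

SELECT_PENDING = """
SELECT event_payload FROM deferred_dispatch
WHERE link_id = ? AND collection_id = ? AND event_type = ? AND status = 'pending'
"""

# The conflict target repeats the index's WHERE clause so SQLite can match
# the partial unique index.
UPSERT = """
INSERT INTO deferred_dispatch (link_id, collection_id, event_type, event_payload)
VALUES (?, ?, ?, ?)
ON CONFLICT (link_id, collection_id, event_type) WHERE status = 'pending'
DO UPDATE SET event_payload = excluded.event_payload
"""


def defer_event(conn, link_id, collection_id, event_type, asset_ids):
    """Insert a pending row, or merge asset ids into the existing pending row."""
    key = (link_id, collection_id, event_type)
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(SELECT_PENDING, key).fetchone()
        existing = json.loads(row[0]) if row else []
        merged = sorted(set(existing) | set(asset_ids))
        conn.execute(UPSERT, (*key, json.dumps(merged)))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return merged
```

Because the index only covers `status = 'pending'`, rows that have already fired do not block a new pending row for the same key.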
+
+#### A8. Quiet-hours overnight window + DST transition can produce wrong fire_at
+
+**Location**: `packages/server/src/notify_bridge_server/services/dispatch_helpers.py:121-128`.
+**Scenario**: A user in `Europe/Minsk` (UTC+3, no DST anymore) who sets
+quiet hours 22:00-06:00 is unaffected. For a user in a DST-observing zone
+(e.g. `America/New_York`), on the "spring forward" night where 2:00 → 3:00,
+an event arriving around the jump gets `end_today =
+now_local.replace(hour=6, minute=0)`. But `.replace()` only rewrites the
+wall-clock fields and makes no DST adjustment, so the resulting `datetime`
+may sit in the skipped hour or have ambiguous DST status. Two hours later,
+the dispatcher sees the quiet window as "still active" or "30 min ago"
+depending on the system.
+**Fix**: After `.replace(hour=t_end.hour, minute=t_end.minute, ...)`,
+normalize the result by round-tripping through UTC
+(`dt.astimezone(timezone.utc).astimezone(tz)`; `localize` is a pytz API and
+does not exist in `zoneinfo`) and explicitly handle the `fold=` parameter.
+Add tests using `zoneinfo.ZoneInfo("America/New_York")` and known DST
+transition dates.
+
+#### A9. Quiet-hours `start == end` returns None — silently no quiet hours
+
+**Location**: `packages/server/src/notify_bridge_server/services/dispatch_helpers.py:110-111`.
+**Scenario**: User UI submits `quiet_hours_start = "00:00"` and
+`quiet_hours_end = "00:00"`, thinking "all day quiet". The function returns
+`None` (no quiet window) — the user gets pinged at 3am even though the UI
+says "quiet hours enabled". Same code path eats malformed times silently.
+**Fix**: Bubble up `ValueError`/`malformed input` to the API validator on
+write so the user gets a 422 with a specific error message rather than
+silently broken behavior. Define `00:00-00:00` as "always quiet" or reject
+it explicitly with a clear error.
+
+#### A10. Telegram `_truncate` cuts mid-HTML-tag → parse_mode fallback then loses formatting
+
+**Location**: `packages/core/src/notify_bridge_core/notifications/telegram/client.py:144-149`
+(`_truncate`).
+**Scenario**: A template renders to slightly more than 4096 chars and an
+`<a href="https://...">...</a>` straddles the 4096-character boundary. The
+truncate function takes a flat string slice, so the final character may be
+inside a tag → Telegram returns 400 "can't parse entities" → the retry
+strips parse_mode → the user sees `<a href="...">` literally in their chat.
+**Fix**: Make `_truncate` HTML-aware: scan from the right and move the cut
+back to the start of any tag it would split, OR strip incomplete tags after
+truncating. A simpler intermediate fix: pop any unclosed `<a>`/`<b>`/`<i>`
+detected by a regex over the truncated string.
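
A minimal sketch of an HTML-aware truncate along those lines. `truncate_html` and both regexes are hypothetical names, not the existing `_truncate`. It assumes the input is already valid Telegram HTML, so a raw `<` always starts a tag and a raw `&` always starts an entity, and it handles only simple, properly nested tags.

```python
import re

# Matches an opening or closing tag; group 1 is "/" for closing tags.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9-]*)[^<>]*>")
# Matches an entity such as "&am" that the cut left without its ";".
PARTIAL_ENTITY_RE = re.compile(r"&[#\w]*$")


def truncate_html(text, limit, marker="…"):
    """Cut text to roughly `limit` chars without leaving a split or unclosed tag."""
    if len(text) <= limit:
        return text
    cut = text[: limit - len(marker)]

    # If the slice ends inside a tag, drop that half-written tag entirely.
    last_open = cut.rfind("<")
    if last_open > cut.rfind(">"):
        cut = cut[:last_open]
    cut = PARTIAL_ENTITY_RE.sub("", cut)

    # Track tags that are still open at the cut point.
    open_tags = []
    for match in TAG_RE.finditer(cut):
        is_closing, name = match.group(1), match.group(2).lower()
        if not is_closing:
            open_tags.append(name)
        elif open_tags and open_tags[-1] == name:
            open_tags.pop()

    # Close them in reverse order so nesting stays valid.
    closers = "".join(f"</{name}>" for name in reversed(open_tags))
    return cut + marker + closers
```

The appended closing tags can push the raw string a few characters past `limit`. That is tolerable only if the limit is measured after entity parsing, which is worth confirming against the Bot API documentation before relying on it.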
+
+#### A11. JSON-payload depth/size hardened in backup, not in webhooks
+
+**Location**: `packages/server/src/notify_bridge_server/api/webhooks.py:43-71`
+(`_read_bounded_body` only caps total bytes).
+**Scenario**: Generic webhook accepts a 999KB payload (under the 1MB cap)
+but with 50 levels of nesting. `json.loads` succeeds, then
+`parse_generic_webhook` evaluates JSONPath expressions in a loop and the CPU
+spends seconds chasing pointers. Multiple concurrent malicious requests can
+peg the event loop.
+**Fix**: Reuse the depth/node guards from
+`packages/server/src/notify_bridge_server/services/backup_service.py`
+(JSON depth cap 10, node count cap 100k). Either share the helper or
+re-implement around `json.loads(object_pairs_hook=...)`.
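
A minimal sketch of a post-parse shape guard using the caps named above (depth 10, 100k nodes). `check_json_shape` and `PayloadTooComplex` are hypothetical names; the real helper in `backup_service.py` may be structured differently. The walk is iterative so the guard cannot itself hit Python's recursion limit.

```python
import json


class PayloadTooComplex(ValueError):
    """Raised when a decoded JSON value is nested too deeply or has too many nodes."""


def check_json_shape(value, max_depth=10, max_nodes=100_000):
    """Walk a decoded JSON value and enforce depth and node-count caps."""
    nodes = 0
    stack = [(value, 1)]  # (node, depth); the top-level value is depth 1
    while stack:
        current, depth = stack.pop()
        nodes += 1
        if nodes > max_nodes:
            raise PayloadTooComplex(f"more than {max_nodes} nodes")
        if isinstance(current, dict):
            children = current.values()
        elif isinstance(current, list):
            children = current
        else:
            continue  # scalars have no children
        if depth > max_depth:
            raise PayloadTooComplex(f"nesting deeper than {max_depth}")
        stack.extend((child, depth + 1) for child in children)


def parse_bounded_json(raw: bytes):
    """Decode a body that was already size-capped, then validate its shape."""
    value = json.loads(raw)
    check_json_shape(value)
    return value
```

This runs after `json.loads`, so it bounds the later JSONPath work rather than the parse itself; the byte cap remains the first line of defence.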
+
+#### A12. Generic-webhook `auth_mode="none"` with `acknowledge_unauthenticated` is per-provider, not per-user
+
+**Location**: `packages/server/src/notify_bridge_server/api/webhooks.py:294-323`.
+**Scenario**: v0.8.1 added the `acknowledge_unauthenticated=true` opt-in,
+but it's only stored in `provider.config` JSON. For a multi-user install
+where one user accepts unauthenticated delivery and another doesn't, that
+per-provider scope would suffice on its own. But anyone with the webhook
+URL can also infer the token (URLs are not secret in real deployments:
+they end up in upstream config files, logs, build artifacts), so
+`auth_mode="none"` is dangerous beyond "explicit opt-in": an attacker who
+guesses the path can DoS the rate limiter by burning the 60/min budget.
+**Fix**: Refuse to even create a `webhook` provider with `auth_mode="none"`
+in production unless a separate environment guard
+`NOTIFY_BRIDGE_ALLOW_UNAUTHENTICATED_WEBHOOKS` is set; AND drop the rate
+limit to 10/min for `auth_mode="none"` providers.
+
+#### A13. `_extract_retry_after` returns int but Telegram `retry_after` is fractional
+
+**Location**: `packages/core/src/notify_bridge_core/notifications/telegram/client.py:59-78`.
+**Scenario**: Modern Telegram sometimes returns `retry_after` as a float
+(e.g. `1.5`). The current code does `int(group(1))` and `isinstance(ra,
+(int, float))`. Regex `\d+` only matches integers. So a `1.5s` retry-after
+becomes "no retry-after found" → fallback 1s sleep → retry too early → second
+429 → eventually the bounded retry budget runs out.
+**Fix**: Loosen the regex to `\d+(?:\.\d+)?` and `float(m.group(1))`,
+preserve fractional via `await asyncio.sleep(retry_after + 1)` with float.
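
A minimal sketch of the loosened parser. `extract_retry_after` is a hypothetical stand-in for `_extract_retry_after`, and the `"retry after N"` description pattern is an assumption about what the current regex targets. It prefers the structured `parameters.retry_after` field and falls back to the description text.

```python
import re
from typing import Optional

# Accepts "retry after 35" as well as "retry after 1.5".
_RETRY_AFTER_RE = re.compile(r"retry after (\d+(?:\.\d+)?)", re.IGNORECASE)


def extract_retry_after(payload: dict) -> Optional[float]:
    """Return the retry delay in seconds as a float, or None if absent."""
    params = payload.get("parameters") or {}
    value = params.get("retry_after")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    match = _RETRY_AFTER_RE.search(str(payload.get("description", "")))
    return float(match.group(1)) if match else None
```

The caller can then pass the float straight to `asyncio.sleep(retry_after + 1)` without rounding.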
+
+#### A14. APScheduler date-job collision when two windows end at the exact same second
+
+**Location**: `packages/server/src/notify_bridge_server/services/scheduler.py:1127-1132`
+(`_drain_job_id_for`). The job id is keyed on `YYYYMMDDHHMMSS`. Comment in
+code acknowledges "two trackers... seconds different ... would collide", but
+two windows ending at the exact same second still collide on a single job id
+— `replace_existing=True` silently drops the second.
+**Scenario**: 30 users with quiet_hours_end=`07:00`. All 30 windows end at
+the same wall-clock second. Only one drain job is scheduled. That single
+job fires `drain_deferred_due()` which scans all rows globally so all 30
+get drained — actually fine. **But** if the global drain function ever
+filters by user/tracker (a likely near-term change for multi-tenant), the
+collision becomes silent data loss.
+**Fix**: Either keep the global drain (and document the assumption) or
+add a tracker_id segment to the job_id and let APScheduler dedup naturally.
+
+#### A15. `_handle_webhook_conflict` reclaim races against a parallel admin action
+
+**Location**: `packages/server/src/notify_bridge_server/services/telegram_poller.py:163-218`.
+**Scenario**: Admin clicks "Switch to webhook mode" in the UI, which sets
+`update_mode=webhook` and calls `set_webhook(...)`. Concurrently, the next
+poll tick for the same bot hits the conflict, calls `delete_webhook` → the
+admin's webhook is wiped 1s after they set it. The poll tick checks
+`bot.update_mode != "polling"` *before* the conflict reclaim, but the
+reload is best-effort and the conflict reclaim path runs unconditionally
+once entered.
+**Fix**: Re-check `bot.update_mode == "polling"` inside
+`_handle_webhook_conflict` before calling `delete_webhook`; or take an
+advisory lock on the bot row for the duration of the mode flip.
+
+#### A16. Discord 2000-char split can cut through grapheme clusters
+
+**Location**: `packages/core/src/notify_bridge_core/notifications/discord/client.py:60-80`
+(`_split_message`).
+**Scenario**: A template renders to 2050 chars with emoji at position
+1998-1999 (each emoji is 2 surrogates / multi-byte UTF-8). The split uses
+`text.rfind("\n", 0, limit)` and falls back to character index `limit`,
+which is a Python str index → that part is OK in CPython 3, but if the
+content contains a grapheme cluster (emoji + zero-width-joiner + skin tone),
+slicing at `limit` mid-cluster renders as the broken emoji "□" in Discord.
+**Fix**: Use a grapheme-cluster boundary library (e.g. `regex` module with
+`\X`) or at minimum back off to the previous whitespace if `limit` is
+inside a likely cluster.
+
+### MEDIUM
+
+#### A17. Per-target failure counter does not distinguish receivers within a target
+
+**Location**: `packages/server/src/notify_bridge_server/services/event_dispatch.py:311-333`.
+**Scenario**: A target has 10 receivers. 1 chat is blocked, 9 work. Today
+`maybe_emit_target_failure` is called for the target — but the success
+counter (`record_target_success`) is also called for the same target on the
+other 9. Net counter behavior depends on call order. With the
+default threshold of 5, this oscillates.
+**Fix**: Track success/failure per receiver, not per target; or only call
+`maybe_emit_target_failure` when `all` receivers failed for the target.
+
+#### A18. `_cleanup_old_events` does not delete cancelled `DeferredDispatch` rows
+
+**Location**: `packages/server/src/notify_bridge_server/services/scheduler.py:332-364`.
+**Scenario**: The daily cleanup deletes `EventLog`, `WebhookPayloadLog`,
+`ActionExecution`. Cancelled / fired / dropped `DeferredDispatch` rows live
+forever in the DB. Active install with chatty providers accumulates millions
+of rows; eventually the `_load_pending_drain_jobs` query, `_trim_queue_if_needed`,
+and the catch-up scan all degrade.
+**Fix**: Add `delete(DeferredDispatch).where(status.in_(["fired", "dropped",
+"cancelled"]), fired_at < cutoff)` to the cleanup.
+
+#### A19. `random.shuffle(shuffled)` in `_sort_assets` uses non-deterministic seed
+
+**Location**: `packages/server/src/notify_bridge_server/services/dispatch_helpers.py:317-320`.
+**Scenario**: Two identical events arriving in close succession (deferred-
+dispatch merge, then drain re-renders) shuffle into different orders. With
+the deferred-dispatch coalescing logic, this produces a visual "they're not
+the same album" surprise in the chat history.
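+**Fix**: Seed a local `random.Random` with a stable per-event digest, e.g.
+`hashlib.sha256` over `event.event_type.value + event.collection_id +
+event.timestamp.isoformat()`. The built-in `hash()` is not suitable here:
+string hashes are salted per process, so the order would still change
+after a restart.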
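
A minimal sketch of a restart-stable shuffle. `stable_shuffle` is a hypothetical helper, and the key parts stand in for the event's type, collection id, and timestamp. It derives the seed from `hashlib.sha256` rather than the built-in `hash()`, whose string hashes are randomized per process, and it uses a private `random.Random` so the global generator is left alone.

```python
import hashlib
import random


def stable_shuffle(items, *key_parts):
    """Return a shuffled copy of items whose order depends only on key_parts."""
    material = "|".join(str(part) for part in key_parts).encode("utf-8")
    digest = hashlib.sha256(material).digest()
    seed = int.from_bytes(digest[:8], "big")  # 64 bits of the digest is plenty
    rng = random.Random(seed)  # local generator; global random state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled
```

Two renders of the same event then produce the same order, while different events still get different-looking albums.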
+
+#### A20. `_poll_tracker` swallows the exception and logs it with `_LOGGER.error`, not `_LOGGER.exception`
+
+**Location**: `packages/server/src/notify_bridge_server/services/scheduler.py:657-666`.
+**Scenario**: An exception in `check_tracker` is logged as `_LOGGER.error("Error
+polling tracker %d: %s", tracker_id, e)` — no traceback. Production debugging
+of "why is tracker 42 silently broken since yesterday" requires the stack.
+**Fix**: Change to `_LOGGER.exception("Error polling tracker %d", tracker_id)`.
+
+#### A21. Long bot commands → `/help` reply > 4096 chars truncates without warning
+
+**Location**: `packages/server/src/notify_bridge_server/commands/handler.py:521-532`,
+combined with `send_reply` → `send_telegram_message` → `_truncate` to 4096.
+**Scenario**: A user with 20 enabled commands runs `/help`. Each command +
+description (RU) crosses 250 chars → 5000 chars total → truncated mid-command.
+The user sees a half-list that suggests we forgot half the commands.
+**Fix**: Split `/help` over multiple messages by command category (provider).
+
+#### A22. `parse_command` truncates to 512 chars — long search queries lost
+
+**Location**: `packages/server/src/notify_bridge_server/commands/parser.py:15`.
+**Scenario**: `/search a very long query containing emoji 🎉 and more text that
+the user really meant to send because they pasted a long string from somewhere…`
+gets clipped to 512 chars silently. The trailing count parser then operates
+on the truncated text, possibly extracting a count from mid-query.
+**Fix**: Either reject `>512` with `parse_command` returning a sentinel
+"too_long" tuple, or just stop truncating — the Telegram limit is already
+4096 and we already truncate the response side.
+
+#### A23. Periodic catch-up scan can dispatch a stale event payload
+
+**Location**: `packages/server/src/notify_bridge_server/services/deferred_dispatch.py:628`
+(`_process_row`).
+**Scenario**: An `assets_added` event is deferred at 22:00. At 06:00 the
+quiet window ends, drain re-fetches `link_data`. The assets in `event_payload`
+include URLs and asset metadata. But the user has since deleted those photos
+from Immich. The dispatcher tries to download → 404. Notification shows
+"5 photos added to Album X" but the actual media fails to attach.
+**Fix**: For `assets_added`, re-validate asset existence against the
+provider before dispatch (one batched `getAssets` call). Drop missing IDs
+from the event, mark with "delivered_after_quiet_hours" + extra hint
+`"missing_count": N` in details. For deferred windows >12h this is the
+right behavior; for shorter windows the lookup is wasted work, so gate on
+`(now - deferred_at) >= timedelta(hours=6)`.
+
+#### A24. Watcher / scheduler restart can lose adaptive polling state
+
+**Location**: `packages/server/src/notify_bridge_server/services/scheduler.py:67-88`
+(`_adaptive_state: dict`).
+**Scenario**: Module-level dict resets on restart. A tracker that had ramped
+up to 1-in-4 ticks goes back to every-tick polling. Over a fleet of 50
+trackers in steady-state idle, this triggers a thundering herd of every-tick
+polls right after deploy. Combined with no DB-level rate limiting on the
+upstream Immich/Gitea API, it can rate-limit the operator out of their own
+services for ~5min.
+**Fix**: Either persist the adaptive state in `notification_tracker_state`
+(cheap on shutdown via `atexit`) or stagger the initial ticks via
+APScheduler's `next_run_time` instead of relying on the existing jitter.
+
+#### A25. `defer_event` `return "cancelled"` logic is incorrect in some merge paths
+
+**Location**: `packages/server/src/notify_bridge_server/services/deferred_dispatch.py:444`.
+**Scenario**: The `cancelled` return branch checks `upd_added is None or
+upd_added.status == "cancelled"` AND same for `upd_removed`. But if both
+`upd_added` and `upd_removed` are `None` (i.e. there were no pending rows
+to begin with), `fully_cancelled` is `False` → returns "merged". That's
+fine. But the more subtle issue: an "insert" action with one of the rows
+being cancelled returns "merged" — should be "inserted". The dashboard
+"merged" status confuses the operator looking at why no defer row exists.
+**Fix**: Rewrite as a clearer state machine: distinguish "inserted",
+"merged_into_existing", "fully_cancelled".
+
+#### A26. `_fetch_bytes` and `_safe_get` honor only 3 redirects with no Retry-After awareness
+
+**Location**: `packages/core/src/notify_bridge_core/notifications/telegram/client.py:217-268`.
+**Scenario**: Immich behind a CDN can chain `302 → 302 → 200`. Once a
+chain reaches 4 hops it falls through to "Too many redirects". A user
+complains "old photos suddenly missing in notifications".
+**Fix**: Bump to 5 redirects and surface the chain in the error string for
+easier debugging.
+
+#### A27. No structured event log filter UI for "show me all drops in the last hour"
+
+**Location**: `packages/server/src/notify_bridge_server/api/status.py` —
+`event_log` rows have `details.dispatch_status` field but no API filter
+exposes it. The frontend can fetch only via global filter on `event_type`.
+**Scenario**: An operator sees "messages are missing today". They want to
+filter event_log to `dispatch_status in (dropped_quiet_hours_nondeferrable,
+deferred_then_dropped, deferred_then_failed)`. Today they can't.
+**Fix**: Add `dispatch_status` and `dispatched=true|false` as first-class
+event_log columns (denormalized from `details`), plus API + UI filter.
+
+#### A28. `_render_cmd_template` falls back to `"[No template: X]"` user-visible text
+
+**Location**: `packages/server/src/notify_bridge_server/commands/handler.py:111-115`.
+**Scenario**: An operator removes a template slot by mistake. The next user
+who runs `/random` sees `[No template: response_random]` in chat. Not just
+ugly — it leaks internal slot names.
+**Fix**: Show a friendly "Sorry, something went wrong on our side" + log at
+error level. Better: refuse to disable the slot if it's referenced.
+
+### LOW
+
+#### A29. `_truncate`'s ellipsis can land inside a multi-byte char
+
+The marker `"…"` is one Unicode codepoint (3 bytes UTF-8) but the truncate
+counts characters, not bytes. Telegram counts UTF-16 code units, so for a
+4090-char message ending in emoji, the calculation is off by a small constant.
+Won't break sends but messages may end up slightly longer than `TELEGRAM_MAX_TEXT_LENGTH`
+allows. Re-measure in UTF-16 code units (`len(s.encode('utf-16-le')) // 2`).
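
A minimal sketch of measuring and truncating in UTF-16 code units, following the review's statement about how Telegram counts length. `utf16_len` and `truncate_utf16` are hypothetical helpers, not the existing `_truncate`.

```python
def utf16_len(text: str) -> int:
    """Length of text in UTF-16 code units (astral characters count as 2)."""
    return len(text.encode("utf-16-le")) // 2


def truncate_utf16(text: str, limit: int, marker: str = "…") -> str:
    """Truncate so that the result, marker included, fits in `limit` code units."""
    if utf16_len(text) <= limit:
        return text
    budget = limit - utf16_len(marker)
    kept = []
    used = 0
    for ch in text:
        units = 2 if ord(ch) > 0xFFFF else 1  # surrogate pair vs. single unit
        if used + units > budget:
            break
        kept.append(ch)
        used += units
    return "".join(kept) + marker
```

Iterating by code point means a surrogate pair is never split, though a multi-codepoint grapheme cluster still can be.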
+
+#### A30. `NotificationDispatcher._render_cache` set to fresh dict on every dispatch — comment says "reuse"
+
+The instance attribute `self._render_cache` is reset to `{}` at the start
+of every `_send_to_target` (line 245). The cache only helps across receivers
+within one target, not across targets. The comment at line 111-115 implies
+broader reuse. Either align comment with reality or actually share across
+targets within one `dispatch()` call.
+
+#### A31. Frontend `entity-cache.svelte.ts` doesn't propagate stale-cache errors
+
+The shared `$state`-based caches return stale data silently if the underlying
+fetch fails after a successful initial load. A user sees old target list
+during an outage and is confused why edits aren't sticking.
+
+---
+
+## Part B — Missing functionality and "cool feature" gaps
+
+Tier legend: **must-have** = blocks prod for any non-trivial install;
+**nice-to-have** = clear value, ship in next minor; **aspirational** = ship
+when v1.0+ slows down.
+Effort: **S** ≈ 1-2 days; **M** ≈ 1 week; **L** ≈ 2+ weeks.
+
+### Already in the backlog (post-v0.8.1 status check)
+
+#### B1. Target-level quiet hours (per-target DND, multi-window, days-of-week, silent mode)
+
+**Status**: Still missing in v0.8.1. The backlog item proposed a v1 cut
+(target-level windows + `silent` mode for Telegram = `disable_notification=True`).
+None of the proposed code paths exist:
+- `notification_target.quiet_hours_json` column — not present.
+- `disable_notification=True` plumbing through `TelegramClient.send_message`
+  — not present.
+- Days-of-week filter — not present.
+
+**Pitch**: Quiet hours bind to the *watcher* (tracking config); users want
+DND at the *destination*. "Don't ping my phone at night, regardless of
+which provider".
+**Who benefits**: Every user. Today they have to recreate per-link windows.
+**Effort**: **M** (1 week — backend dispatcher gate + frontend Aurora-style fieldset).
+**Tier**: **must-have for prod**.
+
+#### B2. Immich Smart Actions expansion (auto-favorite by person, auto-archive, share-link rotation)
+
+**Status**: Auto-Organize exists; no other action descriptors are shipped.
+**Pitch**: Reuse the existing action descriptor pipeline. Auto-favorite-by-person
+is the smallest cut.
+**Effort**: **M** per action (a few days each).
+**Tier**: nice-to-have.
+
+#### B3. Block-based template builder
+
+**Status**: Not started. `JinjaEditor` is unchanged.
+**Effort**: **L** — frontend-only but big.
+**Tier**: aspirational.
+
+### Newly identified — must-have for prod
+
+#### B4. Webhook delivery dedup table + "Test Delivery" replay
+
+**Pitch**: Add the dedup table from A1, plus a `/api/webhooks/{provider_id}/replay/{delivery_id}`
+endpoint that admin can hit to re-dispatch a stored payload without the upstream
+provider needing to resend. Combined with the existing `WebhookPayloadLog`,
+this is "click to retest" in the UI.
+**Who benefits**: Every webhook provider. Replay is invaluable for debugging
+template edits.
+**Effort**: **M**.
+**Tier**: **must-have for prod**.
+
+#### B5. "Send test message" / template playground
+
+**Pitch**: From the template editor, click "Try this template against the
+last received event" → render preview, optionally send to a sandbox chat.
+Bypass dispatch but exercise the full Jinja pipeline.
+**Who benefits**: Every template edit today is a leap of faith — the operator
+modifies the template, waits for the next real event, hopes nothing breaks.
+**Effort**: **S-M**. The preview infrastructure already exists
+(`services/sample_context.py`); add a "send to chat X" button.
+**Tier**: **must-have for prod**.
+
+#### B6. Template versioning + rollback
+
+**Pitch**: Auto-snapshot each template on save (last 10 revisions). UI shows
+diff between version N and N-1, "Restore" button. Same for command templates.
+**Who benefits**: An operator who tweaks a template at midnight and goofs
+the syntax needs an undo button.
+**Effort**: **M**. New `template_revision` table; new endpoints; UI button.
+**Tier**: **must-have for prod**.
+
+#### B7. Bulk operations on trackers / targets / links
+
+**Pitch**: Multi-select in lists → "disable selected", "delete selected",
+"export selected templates as JSON bundle", "move to user X".
+**Who benefits**: Operators with >10 trackers. A common pain point: deploying
+the bridge for a new family member requires N clicks per tracker.
+**Effort**: **M** (frontend-heavy).
+**Tier**: **must-have for prod**.
+
+#### B8. Bot blocked / chat-not-found auto-disable + dashboard
+
+**Pitch**: Detect Telegram 403 / 400 chat-related errors. Mark the receiver
+or `TelegramChat` as `disabled_by_remote`. Surface in a "Stale receivers"
+admin view with a "Try resending invite" / "Delete chat" button.
+**Who benefits**: Every Telegram user. Today the bridge silently sprays
+errors until a human looks.
+**Effort**: **S**.
+**Tier**: **must-have for prod**.
+
+#### B9. Forum-thread (topic) routing for Telegram
+
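+**Pitch**: Per-receiver `message_thread_id` field, auto-detected from incoming
+command messages. UI: when adding a chat that `getChat` reports as a forum
+(`is_forum`), show a topic selector. The Bot API has no method that lists a
+forum's topics (`getForumTopicIconStickers` only returns the stickers usable
+as topic icons), so the selector has to be populated from thread ids the
+bot has already seen in incoming updates.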
+**Who benefits**: Any group install where the user wants notifications in a
+dedicated topic.
+**Effort**: **M**.
+**Tier**: **must-have for prod**.
+
+#### B10. Telegram inline buttons + callback queries
+
+**Pitch**: Templates can declare `{% buttons %}` with action descriptors.
+Bridge listens for `callback_query` updates, dispatches to a registered
+action (e.g. "Mark album as favorite", "Snooze this tracker for 1h", "Run
+HA service light.turn_off").
+**Who benefits**: Power users. Foundation for several other features
+(Immich duplicate-cluster review, HA action button → service call, snooze).
+**Effort**: **L**.
+**Tier**: nice-to-have but unlocks the next 3 items.
+
+#### B11. User snooze / mute via bot command
+
+**Pitch**: `/snooze 1h` mutes the bot's outbound chat for 1h.
+`/mute provider gitea` mutes a whole provider for that chat. `/wake` undoes.
+Implemented as a per-receiver `snoozed_until` column.
+**Effort**: **S-M**.
+**Tier**: **must-have for prod** (user-side relief valve).
+
+### Newly identified — nice-to-have
+
+#### B12. Per-target / per-user rate limit (send-side)
+
+**Pitch**: Cap outbound messages per minute per receiver. Existing 429
+backoff handles Telegram's limit, but a runaway template / event-storm
+provider can still spray the user's phone with 200 messages.
+**Effort**: **S**. Token bucket per chat_id in `_send_telegram`.
+**Tier**: nice-to-have.
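
A minimal sketch of a per-chat token bucket. `TokenBucket` and `PerChatLimiter` are hypothetical classes; how they would hook into `_send_telegram` is not shown and would depend on the dispatcher. The clock is injectable so the behavior can be tested without sleeping.

```python
import time


class TokenBucket:
    """Allows `burst` sends at once, then refills at `rate_per_minute`."""

    def __init__(self, rate_per_minute, burst, clock=time.monotonic):
        self.rate = rate_per_minute / 60.0  # tokens per second
        self.capacity = float(burst)
        self.clock = clock
        self.tokens = float(burst)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerChatLimiter:
    """Keeps one independent bucket per chat_id."""

    def __init__(self, rate_per_minute, burst, clock=time.monotonic):
        self._args = (rate_per_minute, burst, clock)
        self._buckets = {}

    def allow(self, chat_id) -> bool:
        bucket = self._buckets.get(chat_id)
        if bucket is None:
            bucket = self._buckets[chat_id] = TokenBucket(*self._args)
        return bucket.allow()
```

What to do with a rejected send (drop, coalesce, or defer) is a separate policy decision that this sketch leaves open.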
+
+#### B13. Message dedup window (idempotency key per outbound message)
+
+**Pitch**: SHA256 of `(target_id, receiver_id, rendered_message,
+event_collection_id)`. If the same key was sent in the last 5min, skip.
+**Effort**: **S**.
+**Tier**: nice-to-have (lots of overlap with A1+A2 but addresses the
+end-of-pipeline dedup, after all coalescing).
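
A minimal sketch of the idempotency key and a 5-minute window. `dedup_key` and `DedupWindow` are hypothetical names, and the in-memory dict is an assumption; a real implementation might keep the keys in the database so they survive restarts. Each field is length-prefixed before hashing so that different field splits cannot produce the same digest.

```python
import hashlib
import time


def dedup_key(target_id, receiver_id, rendered_message, event_collection_id) -> str:
    """SHA-256 over the four fields, each length-prefixed to avoid ambiguity."""
    hasher = hashlib.sha256()
    for part in (target_id, receiver_id, rendered_message, event_collection_id):
        data = str(part).encode("utf-8")
        hasher.update(len(data).to_bytes(4, "big"))
        hasher.update(data)
    return hasher.hexdigest()


class DedupWindow:
    """Remembers keys for `ttl_seconds` and reports repeats inside that window."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._seen = {}

    def seen_recently(self, key: str) -> bool:
        now = self.clock()
        # Drop expired entries so the map does not grow without bound.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl}
        if key in self._seen:
            return True
        self._seen[key] = now
        return False
```

A send site would compute the key after rendering and skip the send when `seen_recently` returns `True`.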
+
+#### B14. Weekly digest / per-target stats / per-provider error rate
+
+**Pitch**: Cron-based weekly summary email/Telegram. "Top 5 noisy trackers",
+"Receivers with >X% failure rate", "Top 5 days of the week with the most
+activity". Operator preventive maintenance.
+**Effort**: **M**.
+**Tier**: nice-to-have.
+
+#### B15. Mobile-friendly minimal mode for the SPA
+
+**Pitch**: The Aurora redesign is a lot for mobile. A "manage from phone"
+minimal layout — list of trackers, click to toggle, click to mute. Stops
+operators from needing a desktop to silence a chatty tracker at 1am.
+**Effort**: **M**.
+**Tier**: nice-to-have.
+
+#### B16. Audit log of admin actions
+
+**Pitch**: New `audit_log` table. Every create/update/delete on
+`NotificationTracker`, `NotificationTarget`, `TemplateConfig`, `ServiceProvider`,
+`TelegramBot`, `User`, etc. writes a row with `(user_id, action,
+entity_type, entity_id, before_json, after_json, ip, ua)`. Admin UI tab.
+**Effort**: **M**. SQLAlchemy event listeners on the affected models.
+**Tier**: nice-to-have for multi-admin installs; must-have if any
+compliance requirement.
+
+#### B17. Health → not just /ready, but per-component status page
+
+**Pitch**: `/api/health/components` returns `{providers: [{id, last_ok_at,
+last_error}], targets: [{id, last_ok_at, last_error}], scheduler:
+{job_count, next_fires}}`. Frontend "Status" tab.
+**Effort**: **S-M**. The data is already in `EventLog` / scheduler API.
+**Tier**: nice-to-have.
+
+#### B18. Provider unreachable backoff + escalation
+
+**Pitch**: Today `bridge_self` emits `bridge_self_poll_failures` after N
+consecutive fails. Add (a) exponential backoff on the polling interval after
+M failures so we don't hammer a down host, and (b) recovery notification
+when the provider comes back.
+**Effort**: **S**.
+**Tier**: nice-to-have.
+
+#### B19. RSS provider
+
+**Pitch**: Generic RSS/Atom feed poller. One more provider, reuses event_dispatch.
+Long-tail value (operator wants "notify me when a blog publishes").
+**Effort**: **M**.
+**Tier**: nice-to-have.
+
+#### B20. Mobile push / FCM channel
+
+**Pitch**: A dedicated FCM "Receiver" type so the user can ship their own
+companion app. Today Telegram is the only realtime channel; email is too
+slow; webhook out is for plumbing.
+**Effort**: **L**.
+**Tier**: aspirational.
+
+### Newly identified — aspirational
+
+#### B21. Conversation threading per source (one notification thread per album / repo)
+
+**Pitch**: Use Telegram `reply_parameters` to chain all notifications about
+"Album X" as a single thread that grows over time. Today every notification
+is a top-level message. Threading turns the chat into a navigable history.
+**Effort**: **M**. Store `last_message_id` per `(target_id, collection_id)`,
+pass it as `reply_parameters.message_id`.
+**Tier**: aspirational but a clear differentiator.
+
+#### B22. A/B test variants for templates
+
+**Pitch**: A template config can carry 2 variants. The dispatcher
+hash-routes receivers to A or B; the dashboard shows "variant A's response
+time / click rate / receiver mute rate".
+**Effort**: **L**.
+**Tier**: aspirational.
+
+#### B23. Dark-launch a new template before enabling it
+
+**Pitch**: "Send-to-sandbox-chat-only" toggle on a template config. The new
+template renders against real events but only goes to one operator's chat
+for 1 week. Then promote to production.
+**Effort**: **M**. Builds on template versioning (B6).
+**Tier**: aspirational.
+
+#### B24. Scheduled template changes
+
+**Pitch**: "On 2026-12-25 at 09:00, switch template_config X to draft Y".
+Useful for holiday-themed greetings or batch migrations.
+**Effort**: **M**.
+**Tier**: aspirational.
+
+#### B25. HA service-call from a Telegram inline button
+
+**Pitch**: Building on B10. A template renders `{% button hass:light.turn_off
+target=living_room %}`. User clicks → bridge calls HA `light.turn_off`.
+**Effort**: **M** (after B10).
+**Tier**: aspirational.
+
+---
+
+## Ship-blocker checklist (do not widen user audience without)
+
+Order is rough priority (top first). Most are also called out in Part A.
+
+1. **A1** — Webhook idempotency table (Gitea/Planka/generic). Without this,
+   one upstream retry storm can double-/quadruple-spray every user.
+2. **A2** — Deferred-dispatch crash window. A redeploy mid-drain duplicates
+   every queued notification. Implement either the `dispatch_id`
+   pre-commit OR the `in_flight` state machine.
+3. **A3** — Persist Telegram update offset. Same root cause class as A1/A2;
+   matters less if A1+A2 are fixed but should land together.
+4. **A4 / B8** — Bot blocked / chat-not-found auto-disable. A user blocking
+   the bot must not generate infinite errors.
+5. **A11** — Webhook JSON depth/node cap (mirror the backup guard).
+6. **A9** — Quiet-hours `start == end` confirmation; either accept "always
+   quiet" semantics or reject in the API validator.
+7. **A8** — DST handling in quiet-hours overnight window. Verify with
+   tests that include known transition timestamps.
+8. **B5** — "Send test message" / template playground. Without this, every
+   template edit is a blind change against a live system.
+9. **B6** — Template versioning + rollback. Pair with B5.
+10. **A5 / B9** — Forum-thread (topic) routing. Any non-trivial Telegram
+    group install needs this.
+11. **B11** — User snooze / mute via bot command. Relief valve when the
+    bridge gets too chatty.
+12. **B7** — Bulk operations on trackers / targets / links. Operability
+    floor for any install with >10 trackers.
+
+Everything else in Part B is upside, not a blocker.
+
@@ -0,0 +1,682 @@
+# Frontend Production-Readiness Review
+
+Scope: `frontend/src/**` (~26k lines, Svelte 5 runes + SvelteKit). `npm run check`
+passes with exit code 0. The codebase is in good shape overall - i18n EN/RU keys
+are 1:1 in sync (1466 each), Modal/Snackbar overlays follow the
+`position:fixed` + `z-index:9999` convention, no `eval`, no `innerHTML`,
+no string-interpolated `setTimeout`, and the sanitizer (`lib/sanitize.ts`)
+is a sound DOMParser-based
+allowlist. The issues below are real production risks layered on top of an
+otherwise clean architecture.
+
+## Executive Summary
+
+- **Auth tokens live in `localStorage`** (`lib/api.ts`). Any XSS that bypasses
+  the (good) `sanitizePreview` allowlist - or sneaks past it via a future code
+  path - exfiltrates both access and refresh tokens. There is no httpOnly-cookie
+  alternative, no token rotation on refresh failure, and `redirectToLogin` only
+  fires once per session (a leaked refresh token can outlive that flag).
+- **One real provider-hardcoding violation** (`routes/actions/RuleEditor.svelte`)
+  breaks the "descriptors only" rule in CLAUDE.md item 8 and silently disables
+  the people/album picker for any non-Immich provider - every other page is
+  clean.
+- **Caches duplicated into local `$state`** on `notification-trackers`,
+  `command-trackers`, and `command-template-configs` pages - the cache is
+  populated but the page never re-reads it, so cross-page mutations (search
+  palette pre-warming) won't update the list and cache `invalidate()` becomes
+  useless. Convention #4 says "always use cache".
+- **Several CRUD pages refetch all entities after every mutation** (full
+  `await load()` after upsert/delete) instead of using `cache.upsert()`/
+  `remove()` - defeats the optimistic-cache design and produces visible flicker
+  on slow connections.
+- **Floating async work + N+1 patterns**: `providers/+page.svelte` fires N
+  parallel health checks without an AbortController (state writes continue
+  after navigation); `bots/TelegramBotTab.svelte` does a sequential
+  `for (const trk of trackers) { await api('/listeners') }` loop.
+- **`backup/+page.svelte` post-restart health poll** keeps recursing for up to
+  120s with no unmount guard - if the user navigates away mid-restart, the
+  recursive `setTimeout` chain keeps calling `fetch('/api/health')` until it
+  reloads the page out from under whatever route they're on.
+- **`api()` 30s timeout is per-request, hard-coded, with no observability** -
+  long-running provider operations (Immich bulk fetch, full backup export) hit
+  it silently and surface as `AbortError` with no telemetry.
+
+---
+
+## CRITICAL
+
+### C1. JWT tokens stored in `localStorage` - XSS-exfiltratable
+
+[lib/api.ts:78-91](frontend/src/lib/api.ts#L78-L91)
+
+```ts
+function getToken(): string | null {
+    return localStorage.getItem('access_token');
+}
+export function setTokens(access: string, refresh: string) {
+    localStorage.setItem('access_token', access);
+    localStorage.setItem('refresh_token', refresh);
+}
+```
+
+Both the short-lived access token and the long-lived refresh token sit in
+`localStorage`. Any successful XSS - including a future template-preview path
+that escapes `sanitizePreview`, a vulnerable third-party CodeMirror extension,
+or a Telegram bot username that ends up unescaped somewhere - reads both with a
+single `localStorage.getItem` call.
+
+**Fix:** Move to httpOnly + Secure + SameSite=Strict cookies set by the backend.
+If a cookie-based session is infeasible for the deployment model, at minimum
+move the refresh token to an httpOnly cookie and keep only the short-lived
+access token in memory (a module-level `let accessToken` is XSS-readable but
+not persistent across reloads, which limits the exfiltration window).
+
+### C2. Provider type hardcoded in `RuleEditor.svelte` (convention violation)
+
+[routes/actions/RuleEditor.svelte:55-67](frontend/src/routes/actions/RuleEditor.svelte#L55-L67)
+
+```ts
+async function loadProviderData() {
+    if (actionType !== 'auto_organize') return;
+    const provider = providersCache.items.find((p: any) => p.id === providerId);
+    if (!provider || provider.type !== 'immich') return;
+    ...
+```
+
+CLAUDE.md item 8 explicitly forbids `if (type === 'immich')` in components -
+this is the canonical example. As written, adding a second provider with
+auto-organize support (Google Photos, future SmugMug, etc.) is a silent no-op:
+the form renders with empty people/album lists and gives no error.
+
+**Fix:** Add an `actionTypes` / `peopleFilter` capability flag to
+`ProviderDescriptor`, or add a `supportsAutoOrganize: boolean` discriminator,
+then check `getDescriptor(provider.type)?.supportsAutoOrganize` instead of the
+literal string.
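+
+A minimal sketch of the descriptor check, assuming a `supportsAutoOrganize`
+flag is added as suggested. The descriptor shape, the registry contents, and
+the `canLoadOrganizeData` helper are illustrative and not taken from the
+codebase.
+
+```ts
+// Hypothetical descriptor shape; only `supportsAutoOrganize` comes from the fix above.
+interface ProviderDescriptor {
+    type: string;
+    supportsAutoOrganize: boolean;
+}
+
+// Illustrative registry; the real one lives wherever `getDescriptor` is defined.
+const descriptors = new Map<string, ProviderDescriptor>([
+    ['immich', { type: 'immich', supportsAutoOrganize: true }],
+    ['gitea', { type: 'gitea', supportsAutoOrganize: false }],
+]);
+
+function getDescriptor(type: string): ProviderDescriptor | undefined {
+    return descriptors.get(type);
+}
+
+// Capability check that would replace `provider.type !== 'immich'`.
+function canLoadOrganizeData(actionType: string, providerType: string | undefined): boolean {
+    if (actionType !== 'auto_organize' || providerType === undefined) return false;
+    return getDescriptor(providerType)?.supportsAutoOrganize ?? false;
+}
+```
+
+With this shape, a new provider opts in by setting the flag in its descriptor,
+and the component itself needs no provider-specific branch.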
+
+---
+
+## HIGH
+
+### H1. Caches imported but copied into local `$state` - invalidation no-op
+
+[routes/notification-trackers/+page.svelte:33](frontend/src/routes/notification-trackers/+page.svelte#L33)
+[routes/command-trackers/+page.svelte:27](frontend/src/routes/command-trackers/+page.svelte#L27)
+[routes/command-template-configs/+page.svelte:51](frontend/src/routes/command-template-configs/+page.svelte#L51)
+
+```ts
+// notification-trackers - line 33
+let allNotificationTrackers = $state<Tracker[]>([]);
+// ...
+[allNotificationTrackers] = await Promise.all([
+    api<Tracker[]>('/notification-trackers'),
+    ...
+]);
+```
+
+The cache modules expose `notificationTrackersCache`, `commandTrackersCache`,
+and `commandTemplateConfigsCache` - populated by `+layout.svelte` on mount and
+by the search palette - but these three pages don't read from them. They each
+issue their own `api(...)` call and store the result locally. Side effects:
+
+1. The cache shows stale data on every other page that reads it (dashboard nav
+   counts, search palette).
+2. `command-template-configs` calls `commandTemplateConfigsCache.fetch(true)` in
+   `load()` but copies the returned array into `allCmdTplConfigs` - the cache
+   itself is updated, but the page has no reactive link to it.
+3. `cache.upsert()` / `cache.remove()` after mutations would short-circuit a
+   full refetch - but with the local-state copy, every save triggers a full
+   `await load()` (see H2).
+
+**Fix:** Replace `let allX = $state([])` with `let allX = $derived(cache.items)`
+(see how `targets/+page.svelte:147` does it correctly) and remove the parallel
+`api()` call.
+
+### H2. Full refetch after every mutation - cache.upsert/remove not used
+
+[routes/providers/+page.svelte:238-250](frontend/src/routes/providers/+page.svelte#L238-L250)
+[routes/actions/+page.svelte:139](frontend/src/routes/actions/+page.svelte#L139)
+[routes/notification-trackers/+page.svelte:291](frontend/src/routes/notification-trackers/+page.svelte#L291)
+[routes/targets/+page.svelte:476](frontend/src/routes/targets/+page.svelte#L476)
+
+Every save/delete/toggle on these pages calls `cache.invalidate(); await load()`,
+which re-fetches the entire list from the server. The cache exposes
+`upsert(entity)` and `remove(id)` for exactly this case - the server already
+returned the new entity (or 204), so the round-trip is wasted bandwidth and
+produces a visible "list redraws" flash on slow links.
+
+**Fix:** On POST/PUT response, `cache.upsert(savedEntity)`. On DELETE,
+`cache.remove(id)`. Reserve `invalidate()` + `fetch()` for cases where the
+mutation may have changed *other* entities (e.g. broadcast target updates
+affect children).
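+
+The intended semantics of `upsert` / `remove` can be sketched as below. This is
+a simplified stand-in, not the project's cache module: the `EntityCache` class
+and the `Entity` shape are assumptions, and the real cache presumably keeps
+`items` in reactive state so that reassignments reach its readers.
+
+```ts
+// Hypothetical minimal cache; the project's cache modules expose more than this.
+interface Entity {
+    id: number;
+}
+
+class EntityCache<T extends Entity> {
+    items: T[] = [];
+
+    // Insert a new entity or replace the one with the same id (no in-place mutation).
+    upsert(entity: T): void {
+        const exists = this.items.some((item) => item.id === entity.id);
+        this.items = exists
+            ? this.items.map((item) => (item.id === entity.id ? entity : item))
+            : [...this.items, entity];
+    }
+
+    // Drop the entity with the given id, if present.
+    remove(id: number): void {
+        this.items = this.items.filter((item) => item.id !== id);
+    }
+}
+```
+
+A save handler would then call `cache.upsert(saved)` with the entity returned
+by the POST/PUT response instead of `cache.invalidate(); await load()`.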
+
+### H3. Provider health checks fire-and-forget - leak past navigation
+
+[routes/providers/+page.svelte:175-181](frontend/src/routes/providers/+page.svelte#L175-L181)
+
+```ts
+for (const p of allProviders) {
+    health = { ...health, [p.id]: null };
+    api(`/providers/${p.id}/test`, { method: 'POST' })
+        .then((r: any) => { health = { ...health, [p.id]: r.ok }; })
+        .catch(() => { health = { ...health, [p.id]: false }; });
+}
+```
+
+No `AbortController`, no unmount guard. If the user navigates away while N
+slow Immich/Gitea probes are inflight, every probe still resolves and tries to
+write to the (now-detached) `health` `$state`. With Svelte 5 runes this won't
+crash, but it does waste backend connections (Immich health checks call the
+real API) and may trigger duplicate probes on quick back/forward navigation.
+
+**Fix:** Pass `{ signal: controller.signal }` to `api()` (already supported -
+see `lib/api.ts:150`), abort in `onDestroy`. Or use `cache.probeAll()` driven
+from a single store so revisiting the page reuses the previous result.
+
+### H4. Sequential awaits for independent fetches - N+1 in TelegramBotTab
+
+[routes/bots/TelegramBotTab.svelte:215-223](frontend/src/routes/bots/TelegramBotTab.svelte#L215-L223)
+
+```ts
+const trackers = await api<CommandTrackerSummary[]>('/command-trackers');
+const matched: CommandTrackerSummary[] = [];
+for (const trk of trackers) {
+    try {
+        const listeners = await api<ListenerEntry[]>(`/command-trackers/${trk.id}/listeners`);
+        const hasBot = listeners.some(...);
+        if (hasBot) matched.push(trk);
+    } catch (e) { console.warn(...); }
+}
+```
+
+For a deployment with 20 command trackers, opening the listener section on a
+bot triggers 20 serial `GET /command-trackers/{id}/listeners` requests -
+visibly slow over a high-latency link.
+
+**Fix:** Either expose a single backend endpoint
+(`GET /command-trackers/listeners?bot_id=X`) or run the loop through
+`Promise.all(trackers.map(trk => api(...).catch(() => null)))` and filter
+afterwards.
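+
+A sketch of the `Promise.all` variant, under stated assumptions: the
+`fetchListeners` parameter stands in for the per-tracker `api(...)` call, and
+the `bot_id` field on a listener is a guess at the predicate elided in the
+snippet above. Unlike the original loop, this version drops failed lookups
+without logging.
+
+```ts
+interface TrackerSummary {
+    id: number;
+}
+interface ListenerEntry {
+    bot_id: number; // assumed field name
+}
+
+// Runs all listener lookups concurrently and keeps trackers that reference the bot.
+async function trackersForBot(
+    trackers: TrackerSummary[],
+    fetchListeners: (trackerId: number) => Promise<ListenerEntry[]>,
+    botId: number,
+): Promise<TrackerSummary[]> {
+    const results = await Promise.all(
+        trackers.map((trk) =>
+            fetchListeners(trk.id)
+                .then((listeners) => (listeners.some((l) => l.bot_id === botId) ? trk : null))
+                .catch((): null => null), // a failed lookup skips that tracker only
+        ),
+    );
+    return results.filter((trk): trk is TrackerSummary => trk !== null);
+}
+```
+
+This keeps the request count at N but removes the serial latency; the single
+backend endpoint remains the better option when N is large.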
+
+### H5. Post-restart health poll keeps running after unmount
+
+[routes/settings/backup/+page.svelte:117-139](frontend/src/routes/settings/backup/+page.svelte#L117-L139)
+
+```ts
+async function applyAndRestart(): Promise<void> {
+    await api('/backup/apply-restart', { method: 'POST' });
+    restartingOverlay = true;
+    const startedAt = Date.now();
+    let attempts = 0;
+    const poll = async (): Promise<void> => {
+        attempts += 1;
+        try {
+            const res = await fetch('/api/health');
+            if (res.ok && Date.now() - startedAt > 2000) {
+                window.location.reload();
+                return;
+            }
+        } catch { /* still down */ }
+        if (attempts < 120) setTimeout(poll, 1000);
+    };
+    setTimeout(poll, 1500);
+}
+```
+
+The recursive `setTimeout(poll, 1000)` chain has no cancellation. If the user
+navigates to another route between `apply-restart` and the next health probe,
+the chain keeps firing for up to 120s and eventually calls
+`window.location.reload()` from a route the user has since moved away from.
+Side effects:
+
+1. Unauthenticated `fetch('/api/health')` calls keep going while the user is
+   on `/login`.
+2. A user who hit "restart later" on a different tab will still get reloaded
+   from the original tab's poll.
+
+**Fix:** Create `controller = new AbortController()`, pass `controller.signal` to
+`fetch`, and call `controller.abort()` in `onDestroy`. Also store the timeout handle and
+`clearTimeout` it on destroy.
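+
+One way to structure a cancellable poll is sketched here. The `startHealthPoll`
+helper and its parameters are hypothetical: `probe` stands in for
+`fetch('/api/health', { signal })` and `onHealthy` for the reload. The
+2-second minimum-downtime guard from the original is left out for brevity.
+
+```ts
+interface PollHandle {
+    readonly signal: AbortSignal;
+    stop(): void;
+}
+
+function startHealthPoll(
+    probe: (signal: AbortSignal) => Promise<boolean>,
+    onHealthy: () => void,
+    intervalMs = 1000,
+    maxAttempts = 120,
+): PollHandle {
+    const controller = new AbortController();
+    let timer: ReturnType<typeof setTimeout> | undefined;
+    let attempts = 0;
+
+    const poll = async (): Promise<void> => {
+        if (controller.signal.aborted) return;
+        attempts += 1;
+        try {
+            if (await probe(controller.signal)) {
+                // Re-check after the await: stop() may have run while the probe was pending.
+                if (!controller.signal.aborted) onHealthy();
+                return;
+            }
+        } catch {
+            /* still down, or the probe was aborted */
+        }
+        if (!controller.signal.aborted && attempts < maxAttempts) {
+            timer = setTimeout(poll, intervalMs);
+        }
+    };
+
+    timer = setTimeout(poll, intervalMs);
+    return {
+        signal: controller.signal,
+        stop() {
+            controller.abort(); // cancels an in-flight probe that honours the signal
+            if (timer !== undefined) clearTimeout(timer); // cancels the next scheduled attempt
+        },
+    };
+}
+```
+
+In the page component, `onDestroy(() => handle.stop())` would then cancel both
+the pending timer and any in-flight probe.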
+
+### H6. Token refresh races with logout in an edge case
+
+[lib/api.ts:97-127](frontend/src/lib/api.ts#L97-L127)
+
+The dedupe via `refreshPromise` is correct *for the refresh itself*, but the
+outer `api()` reads `getToken()` before awaiting `refreshAccessToken()`. Three
+concurrent requests that all 401 will all queue on the same refresh promise,
+then *all* retry - fine. But if the refresh succeeds and an unrelated
+`clearTokens()` (from `logout()`) fires between the refresh resolving and the
+retry running, the retry uses an empty `Authorization: Bearer ` header. The
+result is "ApiError: HTTP 401" surfaced via snackbar even though the redirect
+to `/login` already happened.
+
+**Fix:** Either re-check `isAuthenticated()` immediately before the retry, or
+make `clearTokens()` cancel an inflight `refreshPromise`.
+
+### H7. `AuthRedirectError` is thrown but not consistently caught
+
+[lib/api.ts:165-170](frontend/src/lib/api.ts#L165-L170)
+
+Most pages use the pattern `catch (err: unknown) { snackError(errMsg(err)); }` -
+which catches `AuthRedirectError` too and shows "Unauthorized - redirecting
+to login" in a snackbar that the user sees *as* the route changes. The error
+class exists specifically to be distinguished, but only one or two call sites
+actually check `instanceof AuthRedirectError` before showing a snackbar.
+
+**Fix:** Make `errMsg()` (or a new helper) return `null` for `AuthRedirectError`
+and have snackbar helpers ignore null messages. Or filter in the snackbar
+store.
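+
+A minimal sketch of the first option, assuming `errMsg()` may return `null`.
+The `AuthRedirectError` class below is a stand-in for the one in `lib/api.ts`,
+and the `snackError` signature (taking the renderer as a parameter) is
+hypothetical.
+
+```ts
+// Stand-in for the class exported from lib/api.ts.
+class AuthRedirectError extends Error {
+    constructor(message = 'Unauthorized') {
+        super(message);
+        this.name = 'AuthRedirectError';
+        // Keeps `instanceof` working when compiled to older targets.
+        Object.setPrototypeOf(this, new.target.prototype);
+    }
+}
+
+// Returns null when the error should not be shown to the user.
+function errMsg(err: unknown): string | null {
+    if (err instanceof AuthRedirectError) return null;
+    if (err instanceof Error) return err.message;
+    return String(err);
+}
+
+// Hypothetical snackbar helper: `show` is whatever renders the message.
+function snackError(message: string | null, show: (text: string) => void): void {
+    if (message === null) return;
+    show(message);
+}
+```
+
+Existing `catch (err) { snackError(errMsg(err)); }` call sites would keep
+working unchanged while the redirect case goes quiet.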
+
+### H8. `api()` JSON-decode failure surfaces a raw parser error
+
+[lib/api.ts:189](frontend/src/lib/api.ts#L189)
+
+```ts
+return res.json();
+```
+
+When a response that passes the status check has a non-JSON body (for example
+an HTML error page that a misconfigured reverse proxy in front serves with
+`200 OK`), `res.json()` rejects with a `SyntaxError` such as
+`Unexpected token < in JSON at position 0`. The page shows the raw parser
+message in a snackbar, which is confusing UX.
+
+**Fix:** Wrap `res.json()` in try/catch and throw a typed `ApiError("Backend
+returned non-JSON response", 502)` so the UI can show a clean message.
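+
+The wrapper could look roughly like this. The `ApiError` class here is a
+stand-in with an assumed `(message, status)` constructor, and `readJson` is a
+hypothetical helper name.
+
+```ts
+// Stand-in for the project's ApiError; the real constructor may differ.
+class ApiError extends Error {
+    readonly status: number;
+
+    constructor(message: string, status: number) {
+        super(message);
+        this.name = 'ApiError';
+        this.status = status;
+        // Keeps `instanceof` working when compiled to older targets.
+        Object.setPrototypeOf(this, new.target.prototype);
+    }
+}
+
+// Accepts anything with a `json()` method, so a real `Response` works as-is.
+async function readJson<T>(res: { json(): Promise<unknown> }): Promise<T> {
+    try {
+        return (await res.json()) as T;
+    } catch {
+        // Replace the raw SyntaxError with a typed error the UI can render cleanly.
+        throw new ApiError('Backend returned non-JSON response', 502);
+    }
+}
+```
+
+`api()` would then end with `return readJson<T>(res)` rather than
+`return res.json()`.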
+
+### H9. Email/Matrix bot tabs strip secrets via `as any`
+
+[routes/bots/EmailBotTab.svelte:84](frontend/src/routes/bots/EmailBotTab.svelte#L84)
+[routes/bots/MatrixBotTab.svelte:79](frontend/src/routes/bots/MatrixBotTab.svelte#L79)
+
+```ts
+if (!body.smtp_password) delete (body as any).smtp_password;
+if (editingMatrix && !body.access_token) delete (body as any).access_token;
+```
+
+The `as any` bypass exists because the body type doesn't allow `delete` on a
+required field. The intent - "don't send a blank secret which would overwrite
+the stored one" - is correct, but the cast hides a real risk: if the field
+name ever changes (`smtp_password` -> `smtpPassword`), the `delete` is a no-op
+and the blank field is sent.
+
+**Fix:** Build `body` as `Partial<...>` from the start and only conditionally
+include the secret field.
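+
+A sketch of the conditional-include approach for the email case. The form and
+payload shapes are assumptions; only the `smtp_password` field name is taken
+from the snippet above.
+
+```ts
+// Hypothetical form shape; the real one has more fields.
+interface EmailBotForm {
+    name: string;
+    smtp_host: string;
+    smtp_password: string;
+}
+
+// Same fields, but the secret is optional in what gets sent.
+type EmailBotPayload = Omit<EmailBotForm, 'smtp_password'> & { smtp_password?: string };
+
+// Includes the secret only when the user actually typed one.
+function buildEmailBotPayload(form: EmailBotForm): EmailBotPayload {
+    const { smtp_password, ...rest } = form;
+    return smtp_password ? { ...rest, smtp_password } : rest;
+}
+```
+
+Because the field is destructured by name, renaming it in the form type becomes
+a compile error instead of a silent no-op.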
+
+### H10. `template-configs` hardcodes a slot name
+
+[routes/template-configs/+page.svelte:228](frontend/src/routes/template-configs/+page.svelte#L228)
+
+```ts
+.map(s => ({ key: s.name, label: ..., rows: s.name === 'message_assets_added' ? 10 : 3, isDateFormat: false }))
+```
+
+Special-casing one Immich slot name inside a provider-agnostic component is
+the same pattern CLAUDE.md item 8 forbids for components, scoped to template
+configs. Other providers' "large" slots (Gitea PR descriptions, Planka card
+content) would render in 3-row editors that the author probably didn't intend.
+
+**Fix:** Add a `rows?: number` field to the backend slot definition and read
+it via `notification_slots[].rows`.
+
+---
+
+## MEDIUM
+
+### M1. Three placeholder strings hardcoded English in shared components
+
+[lib/components/EntitySelect.svelte:18](frontend/src/lib/components/EntitySelect.svelte#L18)
+[lib/components/IconGridSelect.svelte:16](frontend/src/lib/components/IconGridSelect.svelte#L16)
+[lib/components/MultiEntitySelect.svelte:16](frontend/src/lib/components/MultiEntitySelect.svelte#L16)
+
+```ts
+placeholder = 'Select...',
+```
+
+These defaults render `Select...` in RU locale when a caller doesn't pass an
+explicit placeholder. The convention (CLAUDE.md item 5) prescribes plain text
+selectors but says nothing about translation - these still need to flow through
+`t()`.
+
+**Fix:** Move the default into the template: `placeholder = $props().placeholder
+?? t('common.selectPlaceholder')`, with `common.selectPlaceholder` added to
+both locales.
+
+### M2. `EntitySelect.noneLabel` defaults to a decorative em-dash literal
+
+[lib/components/EntitySelect.svelte:20](frontend/src/lib/components/EntitySelect.svelte#L20)
+
+```
+noneLabel = (em-dash literal),
+```
+
+CLAUDE.md item 5 calls out decorative dashes specifically. `LinkedTargetsSection`
+already overrides this with `t('common.noneDefault')` (good), but other
+consumers that do not override get the bare em-dash. It also fails the
+localizable smell test.
+
+**Fix:** Default to `t('common.none')`.
+
+### M3. `lib/auth.svelte.ts` logout does a full page reload, losing UX continuity
+
+[lib/auth.svelte.ts:54-61](frontend/src/lib/auth.svelte.ts#L54-L61)
+
+```ts
+export function logout() {
+    clearTokens();
+    clearAllCaches();
+    user = null;
+    if (typeof window !== 'undefined') {
+        window.location.href = '/login';
+    }
+}
+```
+
+`window.location.href` triggers a hard reload - the SvelteKit router exists
+specifically to avoid this. Side effects: any inflight requests get cancelled
+without proper cleanup, the splash-loader flashes between the two pages, and
+the search-palette / overlays do not get a chance to close gracefully.
+
+**Fix:** `goto('/login', { invalidateAll: true, replaceState: true })`.
+
+### M4. `+layout.svelte` auto-expand `$effect` writes during read
+
+[routes/+layout.svelte:336-342](frontend/src/routes/+layout.svelte#L336-L342)
+
+The effect reads `expandedGroups` (via `expandedGroups[entry.key]`) and writes
+to `expandedGroups`. Svelte 5 dedupes the write back to the same set of keys,
+but the pattern is fragile - adding any side effect that re-derives from
+`expandedGroups` here would loop. It also persists to localStorage in
+`toggleGroup` but not from this effect - so auto-expansion stays in memory only.
+
+**Fix:** Compute the next state in a single pass and write once; either
+include the localStorage save, or move the auto-expand into the initial
+hydration block.
+
+### M5. `commandTemplateConfigsCache.fetch(true)` result discarded; cache populated but unused
+
+[routes/command-template-configs/+page.svelte:208](frontend/src/routes/command-template-configs/+page.svelte#L208)
+
+The `Promise.all` destructures `cfgs` from `commandTemplateConfigsCache.fetch(true)`
+but then writes `allCmdTplConfigs = cfgs` instead of $derived-reading the cache.
+The cache is updated (good) but this page never reads it (bad - see H1).
+
+**Fix:** Same fix as H1 - use `$derived(commandTemplateConfigsCache.items)`.
+
+### M6. Dashboard search debounce timeout not cleared on filter change
+
+[routes/+page.svelte:268-272](frontend/src/routes/+page.svelte#L268-L272)
+
+If the user changes the type/provider filter (`applyFilters` runs synchronously
+from the `$effect` at line 249) while a search debounce is pending, the pending
+timeout still fires 300ms later and triggers an identical request. Not a leak,
+just a wasted call.
+
+**Fix:** Clear `searchTimeout` from `applyFilters()` as well.
+
+### M7. Dashboard `Promise.all` destructure uses empty middle slot
+
+[routes/+page.svelte:283-287](frontend/src/routes/+page.svelte#L283-L287)
+
+```ts
+const [statusRes, , chartRes] = await Promise.all([
+    api<DashboardStatus>(`/status?limit=${eventsLimit}`),
+    providersCache.fetch(),
+    api<{ days: ... }>('/status/chart'),
+]);
+```
+
+The empty middle slot is brittle - anyone reordering for readability silently
+swaps `statusRes` and `chartRes`. Trivially avoided.
+
+**Fix:** Either await `providersCache.fetch()` separately (it caches anyway),
+or `const [statusRes, _providers, chartRes] = ...` with an explicit `_providers`
+local.
+
+### M8. `actions/+page.svelte` derives `actionTypes` from a function-in-derived
+
+[routes/actions/+page.svelte:78-81](frontend/src/routes/actions/+page.svelte#L78-L81)
+
+```ts
+let actionTypes = $derived((() => {
+    const caps = capabilitiesCache.items[selectedProviderType];
+    return caps?.action_types || [];
+})());
+```
+
+The IIFE is unnecessary; `$derived` already runs the expression on every
+dependency change. Reads as a refactor leftover.
+
+**Fix:** `let actionTypes = $derived(capabilitiesCache.items[selectedProviderType]?.action_types ?? []);`
+
+### M9. `RuleEditor.svelte` mutates rule object in `toggleRule` then sends to API
+
+[routes/actions/RuleEditor.svelte:105-108](frontend/src/routes/actions/RuleEditor.svelte#L105-L108)
+
+```ts
+async function toggleRule(rule: ActionRule) {
+    rule.enabled = !rule.enabled;
+    await updateRule(rule);
+}
+```
+
+Direct mutation of the prop violates the immutability rule (coding-style.md).
+If the API call fails, the local state is already flipped - the UI shows the
+new value even though the server still has the old one.
+
+**Fix:** `await updateRule({ ...rule, enabled: !rule.enabled })`. After
+successful response, `await loadRules()` (already happens) re-syncs.
+
+### M10. `+layout.svelte` filter functions use `as any[]` four times
+
+[routes/+layout.svelte:145-151](frontend/src/routes/+layout.svelte#L145-L151)
+
+```ts
+notification_trackers: filterById(notificationTrackersCache.items as any[]).length,
+```
+
+The cast exists because `filterById<T extends { provider_id?: number }>` is
+narrower than the cache item types. The proper fix is a single base interface
+`{ provider_id?: number }` on the relevant types so the cast goes away.
+
+### M11. `setLocale` does not update `<html lang>` attr
+
+[lib/i18n/index.svelte.ts:31-36](frontend/src/lib/i18n/index.svelte.ts#L31-L36)
+
+Screen readers and browser translation extensions rely on `<html lang="en">`.
+The app never sets it, so switching to RU leaves accessibility tooling thinking
+the page is still English.
+
+**Fix:** `document.documentElement.lang = locale` in `setLocale`.
+
+### M12. `Modal.svelte` focus restore does not verify element still in DOM
+
+[lib/components/Modal.svelte:43-45](frontend/src/lib/components/Modal.svelte#L43-L45)
+
+If the previously focused element has been removed from the DOM between modal
+open and close (common with optimistic UI updates that rerender the source
+button), `.focus()` is a silent no-op on a detached node. Focus ends up on
+`<body>` and the next Tab restarts from the top of the page.
+
+**Fix:** `if (... && document.contains(previouslyFocused)) previouslyFocused.focus()`,
+else focus a sensible fallback (the trigger that opened the page).
+
+### M13. TimezoneSelector ticks at 1s - wakes the event loop forever
+
+[lib/components/TimezoneSelector.svelte:33-37](frontend/src/lib/components/TimezoneSelector.svelte#L33-L37)
+
+```ts
+let tickHandle: ReturnType<typeof setInterval> | null = null;
+onMount(() => {
+    tickHandle = setInterval(() => { now = new Date(); }, 1000);
+});
+```
+
+A 1Hz tick is fine for visible UI; the issue is it keeps running even when
+the selector dropdown is closed (the time display is only visible when the
+dropdown is open). Battery impact is non-trivial on mobile for what is
+essentially a hidden component.
+
+**Fix:** Start/stop the interval based on `open` state, or use
+`requestAnimationFrame` driven by `IntersectionObserver`.
+
+### M14. Backup file download builds blob from JSON without size guard
+
+[routes/settings/backup/+page.svelte:269-281](frontend/src/routes/settings/backup/+page.svelte#L269-L281)
+
+```ts
+const data = await api(`/backup/files/${filename}`);
+const blob = new Blob([JSON.stringify(data, null, 2)], { type: 'application/json' });
+```
+
+For a deployment with hundreds of providers/trackers, the JSON serialization
+of the entire backup happens in-memory in a single string before the Blob
+constructor - wasted memory peak and a frozen tab on slow machines. Worse,
+`api()` parses the JSON and then `JSON.stringify` re-serializes it.
+
+**Fix:** Use `fetchAuth()` for the download path and read the response body
+straight into a Blob (`await res.blob()`), skipping the parse and re-serialize
+round trip.
+
+### M15. Modal focus-trap query selector includes disabled inputs
+
+[lib/components/Modal.svelte:62-67](frontend/src/lib/components/Modal.svelte#L62-L67)
+
+Re-querying the DOM on every Tab keystroke is OK but means disabled inputs
+(common in long forms with submit-in-progress) are included in the trap and
+focus can land on them. The selector should add `:not([disabled])`.
+
+### M16. i18n resolve uses any for the recursion accumulator
+
+[lib/i18n/index.svelte.ts:55-62](frontend/src/lib/i18n/index.svelte.ts#L55-L62)
+
+```ts
+function resolve(obj: any, path: string): string | undefined {
+```
+
+`obj: unknown` plus a runtime check would let TS narrow `current` properly and
+catch the case where someone accidentally passes a `string` (returns undefined
+silently today).
+
+### M17. Tracker name auto-set string concat - English-only
+
+[routes/notification-trackers/+page.svelte:82-84](frontend/src/routes/notification-trackers/+page.svelte#L82-L84)
+[routes/command-trackers/+page.svelte:69-71](frontend/src/routes/command-trackers/+page.svelte#L69-L71)
+
+```ts
+form.name = provider ? `${provider.name} Tracker` : 'Tracker';
+form.name = provider ? `${provider.name} Commands` : 'Commands';
+```
+
+Both defaults ("Provider Name Tracker" / "Provider Name Commands") are
+English-only, so Russian users get an English suffix on the auto-generated
+name. This is inconsistent with the rest of the i18n discipline.
+
+**Fix:** Use `t('notificationTracker.defaultName').replace('{name}', provider.name)`.
+
+### M18. topbar-action store not cleared on auth state change
+
+[routes/providers/+page.svelte:160-167](frontend/src/routes/providers/+page.svelte#L160-L167)
+
+Each page sets a topbar CTA in `onMount` and clears it in `onDestroy`. If
+`logout()` is called from inside the page (via the search palette, etc.), the
+page is never destroyed cleanly and the topbar action carries over onto the
+login screen. A defensive `topbarAction.clear()` in `logout()` would plug this.
+
+### M19. Many `: any` and `as any` types in critical paths
+
+[routes/users/+page.svelte:62](frontend/src/routes/users/+page.svelte#L62)
+[routes/command-trackers/+page.svelte:27](frontend/src/routes/command-trackers/+page.svelte#L27)
+[routes/providers/+page.svelte:179](frontend/src/routes/providers/+page.svelte#L179)
+[lib/providers/types.ts:120](frontend/src/lib/providers/types.ts#L120)
+
+64 occurrences of `: any` / `as any` across 20 files. None are in
+security-sensitive paths, but they remove type safety in exactly the call
+sites that shape API requests (`body: any = { ... }`). Recommended cleanup
+task, not a blocker.
+
+---
+
+## LOW
+
+### L1. +page.svelte event types hardcoded in three parallel maps
+
+[routes/+page.svelte:475-512](frontend/src/routes/+page.svelte#L475-L512)
+
+`eventLabels`, `eventIcons`, and `eventGradients` are three parallel dicts
+keyed by the same set of strings. Adding a new event type requires editing
+three places (plus i18n). A single `EVENT_META` object would be more
+maintainable.
+
+### L2. TestMenu.svelte uses z-index 9998 instead of 9999
+
+[routes/notification-trackers/TestMenu.svelte:25](frontend/src/routes/notification-trackers/TestMenu.svelte#L25)
+
+```svelte
+<div style="position:fixed; top:0; left:0; right:0; bottom:0; z-index:9998;"
+```
+
+The convention says 9999 for overlays. Using 9998 was probably intentional
+(so the menu sits above the backdrop), but the cleaner pattern is to give the
+backdrop a slightly lower stacking context inside the same parent.
+
+### L3. console.warn left in production-bound code
+
+14 `console.warn`/`console.error` occurrences. Most are guarded by a
+"failed to load" + UI fallback - legitimate debug noise. Recommend wiring to
+a structured logger before public release; current state is acceptable for an
+internal tool but spam-prone in DevTools.
+
+### L4. Dashboard setTimeout(animateCount, 200) is uncancelled
+
+[routes/+page.svelte:290-299](frontend/src/routes/+page.svelte#L290-L299)
+
+The 200ms delay before triggering count animations is uncancelled. Navigating
+away during the first 200ms means the count animation `requestAnimationFrame`
+chain still runs against a stale `status` reference. Cosmetic only.
+
+### L5. app.html inline theme bootstrap reads localStorage without try/catch
+
+[src/app.html:12](frontend/src/app.html#L12)
+
+Theme is hydrated synchronously in `<head>` to avoid FOUC - fine - but if
+localStorage is blocked (Safari private mode, some enterprise policies) the
+inline script throws and the rest of the head bootstrap may be skipped.
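+
+A guarded read might look like the sketch below, written in TypeScript for
+illustration even though the inline bootstrap itself is plain JavaScript. The
+`'theme'` key, the `'light'` fallback, and the `readTheme` name are
+assumptions. Storage is passed as a getter because merely touching
+`window.localStorage` can throw when storage is blocked.
+
+```ts
+// Minimal storage shape so the helper can be exercised without a browser.
+interface StorageLike {
+    getItem(key: string): string | null;
+}
+
+// Returns the stored theme, or the fallback when storage is blocked or empty.
+function readTheme(getStorage: () => StorageLike, fallback = 'light'): string {
+    try {
+        return getStorage().getItem('theme') ?? fallback;
+    } catch {
+        return fallback;
+    }
+}
+```
+
+In the browser this would be called as `readTheme(() => localStorage)`.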
+
+### L6. EventChart computes activeTypes and hasData from same loop twice
+
+[lib/components/EventChart.svelte:46-49](frontend/src/lib/components/EventChart.svelte#L46-L49)
+
+`hasData` and `activeTypes` traverse the same data twice. Single-pass
+derivation would be cheaper for the rare "many days of events" case.
+
+### L7. Single-letter t shadowing in +layout.svelte
+
+`+layout.svelte:140` uses `for (const t of targets)` inside `navCounts`, which
+shadows the imported i18n function `t`. Svelte 5 does not flag it (inner scope
+wins), but it confuses search/grep and breaks IDE go-to-definition. Several
+other pages use single-letter `t` as iteration var (`actions/+page.svelte`,
+`command-trackers/+page.svelte`, `targets/+page.svelte`). Recommend `target` /
+`tracker` for legibility.
+
+---
+
+## Notes & non-findings
+
+- **Modal overlay convention** (CLAUDE.md #2): Modal.svelte, Snackbar,
+  IconPicker, IconGridSelect, MultiEntitySelect, EntitySelect, TimezoneSelector,
+  EventChart, Hint, SearchPalette, and TestMenu all use `position:fixed` with
+  `z-index: 9999` (or 9998 for the TestMenu backdrop - see L2). Convention
+  upheld.
+- **@html usage** - only three call sites, all pipe through `sanitizePreview`,
+  which is a DOMParser-based allowlist limited to `B`, `I`, `CODE`, `PRE`, `A`,
+  `BR` with `https?://` href validation. Safe.
+- **i18n parity**: EN and RU JSON have the exact same 1466 keys - no orphans.
+- **Selector placeholders**: `LinkedTargetsSection` correctly uses
+  `t('common.noneDefault')`, no em-dash leaks in user-facing flows (only
+  defaults inside shared components - see M1/M2).
+- **svelte-check passes** (exit 0) - no type errors at the strict level the
+  project compiles with.
+- **No eval, new Function, or string-setTimeout**: dynamic code execution
+  surface is clean.
+- **No var declarations**, no `==` (loose equality) outside generated CSS.
+- **AbortController usage**: present in `lib/api.ts` for the canonical fetch
+  wrapper - the rest of the codebase could lean on it more (see H3, H5).
@@ -0,0 +1,436 @@
+# Performance & Database Review — `service-to-notification-bridge`
+
+**Scope:** entire repo at `c:\Users\Alexei\Documents\service-to-notification-bridge`
+**Backend:** FastAPI + SQLAlchemy async + SQLModel on SQLite (Postgres-compatible URL, but only SQLite branch is exercised in code).
+**Frontend:** SvelteKit 5 (runes) static build served by the same FastAPI process.
+**Reviewer:** Claude Opus 4.7 (1M context)
+
+---
+
+## Executive summary
+
+1. **Indexing is in good shape.** FK columns and the dashboard/webhook hot paths have explicit composite indexes (`ix_event_log_user_created`, `ix_event_log_user_event_type_created`, `ix_deferred_dispatch_status_fire_at`, partial `ux_deferred_dispatch_pending`). The bulk of the "missing index" risk is already mitigated.
+2. **No real migration tool.** The project runs a hand-rolled, 1880-line, idempotent migration script on every boot. It works, but it's brittle, slow on cold start, has no down-migrations, and the table-rebuild branches lose indexes silently. Move to Alembic before the next major schema change.
+3. **`create_all` is still the source-of-truth for new schemas** (engine.py:63). That's an anti-pattern next to migration tooling: schema drift can silently appear between fresh installs and upgraded installs.
+4. **Two real N+1 risks remain.** `_tracker_response` (notification_trackers.py:286-291) calls `_tt_response` per link, and `_refresh_telegram_chat_titles` (scheduler.py:229) issues per-chat `getChat` calls without bot-level batching guards. The big one in `load_link_data` was already fixed (good).
+5. **SQLite PRAGMAs are mostly right but pool sizing is wrong.** WAL, `synchronous=NORMAL`, FK enforcement, busy_timeout, temp_store=MEMORY are all set. Missing: `cache_size`, `mmap_size`. The async engine uses SQLAlchemy's default pool with multiple writer connections — under WAL that still serializes, but it raises spurious BUSY pressure on long transactions (see #M3).
+6. **Event-log retention exists and is correct** (30-day default, cron at 03:00 UTC), but `retention_days=0` disables it silently and there is no archival, no per-tenant cap, no row-count metric exposed to operators.
+7. **Memory leak risk: `_dirty_bots`, `_last_update_id`, `_last_webhook_reclaim_at`, `_adaptive_state`, `_adaptive_max_skip`** in command_sync.py, telegram_poller.py, scheduler.py are unbounded module-level dicts. In a long-running process they grow without ever shrinking when entities are deleted.
+8. **Frontend has no virtualization on long lists** — dashboard event stream, tracker history, target list. On a tenant with thousands of events the dashboard `{#each status.recent_events}` (with `(event.id)` key) still renders the whole page-set into DOM and re-runs derivations on every refresh.
+
+---
+
+## CRITICAL
+
+### C1. `create_all` is the schema-of-record for new installs ([engine.py:60](packages/server/src/notify_bridge_server/database/engine.py))
+
+```python
+async def init_db() -> None:
+    engine = get_engine()
+    async with engine.begin() as conn:
+        await conn.run_sync(SQLModel.metadata.create_all)
+```
+
+**What's wrong:** `init_db()` runs unconditionally on every boot before the migration script. New installs get the *current* model's CREATE TABLE statements — including FK declarations like `ondelete=SET NULL` — while upgraded installs only get what the (one-way) `migrate_*` scripts manage to inject via `ALTER TABLE`. Several migrations explicitly admit "this only takes effect on freshly created tables" (e.g. `migrate_eventlog_provider_fk` is a documented no-op). That means **the schema drift between a fresh install and a 6-month-old install is real and undocumented.**
+
+**Impact:** stability — subtle bugs that reproduce only on upgraded installs (FK enforcement, cascade behavior, partial UNIQUE indexes); ops — restoring a backup from a fresh install onto an upgraded box, or vice-versa, can change observable behaviour.
+
+**Fix:**
+1. Adopt Alembic with autogenerate-from-models, lock the baseline migration to the current `SQLModel.metadata`, and stop calling `create_all` in production startup.
+2. Keep the hand-rolled `migrate_*` chain as legacy data-migrations only (idempotent, runs once, then removed).
+3. Add a CI check: spin up empty DB → run migrations → diff against `SQLModel.metadata` → fail if non-empty.
+
+---
+
+### C2. `migrate_schema` runs ~30+ idempotent `PRAGMA table_info` + ALTER probes on every cold start ([migrations.py:67-427](packages/server/src/notify_bridge_server/database/migrations.py))
+
+`_has_column` issues a `PRAGMA table_info('<table>')` per check; `migrate_schema` calls it dozens of times serially inside one transaction. On a cold start this is the dominant boot latency. Worse, it forces a write txn on every boot even when nothing changes (because each migration opens `engine.begin()`).
+
+**Impact:** startup cost — visible on Raspberry-Pi / NAS deployments; SQLite WAL checkpoint pressure on every boot when nothing changed; readiness probe grace window must accommodate this.
+
+**Fix:**
+1. Wire `schema_version` (already exists, `CURRENT_SCHEMA_VERSION=1`) as a real short-circuit — at the top of every `migrate_*`, return immediately if `schema_version >= N` for that migration.
+2. Cache `PRAGMA table_info` results within a single migration run.
+3. Better long-term: replace with Alembic; you already have the version table.
+
+---
+
+### C3. `_install_sqlite_pragmas` only fires on engine-pool `connect`, not when SQLAlchemy reuses pooled connections from a different event loop ([engine.py:18-38](packages/server/src/notify_bridge_server/database/engine.py))
+
+The `@event.listens_for(engine.sync_engine, "connect")` hook only runs at connection creation. The default `aiosqlite` pool reuses connections, which is fine, but `connect_args["timeout"]=30` conflicts with the in-PRAGMA `busy_timeout=10000` (10 s). Both settings control the same thing: the `timeout` argument of `sqlite3.connect` is the lock-wait (busy) timeout, not a connection-open timeout, and `PRAGMA busy_timeout` sets the same handler. Because the PRAGMA runs in the `connect` hook after the connection is opened, it overrides the 30 s value.
+
+**Impact:** stability under contention. Under sustained writer contention you get `SQLITE_BUSY` *much* sooner than the `connect_args` value suggests: users see "database is locked" errors after 10 s, not 30.
+
+**Fix:** standardize on busy_timeout (raise to 30 s to match `connect_args`, or drop one and keep the other). Document the chosen value in a constant. Also add:
+
+```python
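+cur.execute("PRAGMA cache_size=-65536")        # negative value is in KiB: 64 MiB page cache
+cur.execute("PRAGMA mmap_size=268435456")      # 256 MiB of memory-mapped I/O
+cur.execute("PRAGMA wal_autocheckpoint=1000")  # checkpoint every 1000 WAL pages (SQLite's default, made explicit)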
+```
+
+The 100k-asset album write pattern (`asset_ids` JSON blob) benefits significantly from a larger page cache and mmap; current defaults force a lot of SQLite-internal I/O.
+
+---
+
+## HIGH
+
+### H1. Frontend dashboard event-stream lacks virtualization & double-fetches on filter changes ([+page.svelte:739](frontend/src/routes/+page.svelte))
+
+`{#each status.recent_events as event, i (event.id)}` is keyed (good), but the page re-renders every event row with rich nested components (`EventDetailModal`, `MdiIcon`, etc.) on each page-back/forward. There's no row virtualization, and the same data fetches re-run on every filter mutation: the search input has a 300 ms debounce in `onSearchInput`, but `filterEventType`, `filterProviderId`, `filterSort`, and `refreshSeconds` have none.
+
+**Impact:** UX — choppy on tenants with 50+ events/page, perceptible filter-flicker; CPU — derivation cost on every status refresh.
+
+**Fix:**
+1. Wrap the events list in a tiny windowing component (svelte-virtual or a simple offset/limit windowed view — the API already supports it).
+2. Debounce the entire filter-change branch, not just the search input (`$effect(() => { if (settled) { reload() }})` with a 100 ms guard).
+3. The provider count map (`provider_event_counts`) is computed server-side for *all* matching events on every page request; cache it for `(user_id, filters)` in a 30-s in-memory dict server-side (see also #H2).
+
+---
+
+### H2. `provider_event_counts` aggregate query runs unbounded GROUP BY on every dashboard request ([status.py:84-103](packages/server/src/notify_bridge_server/api/status.py))
+
+```python
+provider_counts_query = (
+    select(
+        EventLog.provider_id,
+        EventLog.provider_name,
+        func.sum(func.coalesce(EventLog.assets_count, 1)).label("total"),
+    )
+    .where(EventLog.user_id == user.id)
+    .group_by(EventLog.provider_id, EventLog.provider_name)
+)
+```
+
+Every dashboard load (every 10–60 s by default — see `refreshIntervalItems`) runs `GROUP BY provider_id, provider_name` over *every* event the user ever owned. At 90 days × ~1 event/min/tracker this is hundreds of thousands of rows scanned per refresh per logged-in user.
+
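+**Impact:** latency: the only composite index is `(user_id, event_type, created_at DESC)`, so SQLite can at best use its leading `user_id` column to find the user's rows, and then still has to read every one of them and sort for the GROUP BY (on a single-user install that is effectively the whole table); cost: burns CPU on the bridge box for a metric that changes very slowly.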
+
+**Fix:**
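+1. Add `ix_event_log_user_provider (user_id, provider_id)` so the lookup and grouping can be driven by the index. For the query to be answered from the index alone (index-only), the index would also need to include `provider_name` and `assets_count`, because the query groups by and reads them.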
+2. Cache the result for `(user_id, filter_signature)` for 30 s in the same in-memory cache as #H1.
+3. Long-term: materialize per-provider counts into an `event_counter` table maintained by triggers or an APScheduler job. The dashboard then reads at most a dozen rows.
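+
+A minimal sketch of how the index suggestion could be checked, using Python's stdlib `sqlite3` against a cut-down, hypothetical `event_log` table that holds only the columns the aggregate touches. The index name `ix_event_log_user_provider_cover` and its two extra columns are assumptions made for illustration, not the project's schema; the helpers `provider_counts` and `query_plan` are likewise hypothetical.
+
+```python
+import sqlite3
+
+# Hypothetical, cut-down event_log: only the columns the aggregate reads.
+conn = sqlite3.connect(":memory:")
+conn.execute(
+    "CREATE TABLE event_log ("
+    " id INTEGER PRIMARY KEY, user_id INTEGER, provider_id INTEGER,"
+    " provider_name TEXT, assets_count INTEGER)"
+)
+conn.executemany(
+    "INSERT INTO event_log (user_id, provider_id, provider_name, assets_count)"
+    " VALUES (?, ?, ?, ?)",
+    [(1, 10, "immich", 3), (1, 10, "immich", None), (1, 11, "gitea", 2), (2, 10, "immich", 5)],
+)
+# Assumed index: filter column first, then GROUP BY columns, then the summed column.
+conn.execute(
+    "CREATE INDEX ix_event_log_user_provider_cover"
+    " ON event_log (user_id, provider_id, provider_name, assets_count)"
+)
+
+QUERY = (
+    "SELECT provider_id, provider_name, SUM(COALESCE(assets_count, 1)) AS total"
+    " FROM event_log WHERE user_id = ?"
+    " GROUP BY provider_id, provider_name"
+)
+
+
+def provider_counts(user_id: int) -> list:
+    """Run the per-provider aggregate for one user."""
+    return conn.execute(QUERY, (user_id,)).fetchall()
+
+
+def query_plan(user_id: int) -> list:
+    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
+    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (user_id,)).fetchall()
+    return [row[-1] for row in rows]
+
+
+print(provider_counts(1))
+print(query_plan(1))
+```
+
+Printing `query_plan(...)` before and after creating the index is a cheap way to confirm whether SQLite actually picks it up; the exact plan text varies between SQLite versions, so it is best read rather than asserted on.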
+
+---
+
+### H3. `_tracker_response` issues one query per tracker-target link ([notification_trackers.py:286-291](packages/server/src/notify_bridge_server/api/notification_trackers.py))
+
+```python
+async def _tracker_response(session: AsyncSession, t: NotificationTracker) -> dict:
+    result = await session.exec(
+        select(NotificationTrackerTarget).where(NotificationTrackerTarget.tracker_id == t.id)
+    )
+    tracker_targets = [await _tt_response(session, tt) for tt in result.all()]
+```
+
+`_tt_response` (in notification_tracker_targets.py:12 — has 12 distinct `select`/`session.get` references) issues per-link follow-up SELECTs. Called from `create`, `update`, `delete` and `trigger` for a single tracker, so the practical N is small — but `_tt_response` is also called inside the bulk `list_notification_trackers` loop's downstream consumers, and any future bulk endpoint will multiply this badly.
+
+**Impact:** latency on POST/PATCH responses; future regression risk.
+
+**Fix:** rewrite `_tt_response` to accept pre-fetched maps (mirror the pattern in `dispatch_helpers.load_link_data`). Or, simpler: write a single eager-load helper using `selectinload(NotificationTrackerTarget.target)` once `relationship()` mappers are declared on the models.
+
+---
+
+### H4. `load_link_data` does not eagerly load target.config related entities — relies on `dict(target.config)` snapshotting ([dispatch_helpers.py:539-747](packages/server/src/notify_bridge_server/services/dispatch_helpers.py))
+
+The function batch-loads receivers, telegram_chats, email_bots, matrix_bots up-front, but the broadcast-expansion branch in the active_links loop still issues `_resolve_target` per child target (line 715). That `_resolve_target` is called with all the pre-fetched maps, so it doesn't *query* per call — but it does build a fresh `target_config` dict per child. With a broadcast target containing 50 children fanning out 100 events/min this is constant garbage collection pressure.
+
+**Impact:** GC pressure under load; not a correctness problem.
+
+**Fix:** none required short-term. Long-term, add `selectinload` declarations on the relationship model so SQLAlchemy can co-fetch the chain. The code path is already well-batched.
+
+---
+
+### H5. `aiohttp.ClientSession` is constructed per-call inside `NotificationDispatcher._session_ctx` when no shared session is provided ([dispatcher.py:117-123](packages/core/src/notify_bridge_core/notifications/dispatcher.py))
+
+```python
+@contextlib.asynccontextmanager
+async def _session_ctx(self) -> AsyncIterator[aiohttp.ClientSession]:
+    if self._shared_session is not None and not self._shared_session.closed:
+        yield self._shared_session
+        return
+    async with _new_session() as session:
+        yield session
+```
+
+In server-side code paths (watcher, event_dispatch, deferred_dispatch) a shared session is always passed in, so this is harmless. But unit tests, the CLI, and any direct library user that instantiates `NotificationDispatcher` without a session pays the cost. Worse, the per-dispatch session creates a fresh TCP pool, fresh DNS resolver — defeating connection reuse to Telegram / Discord webhook hosts.
+
+**Impact:** test slowness; correctness if a non-server consumer ever ships.
+
+**Fix:** require the `session` parameter (`session: aiohttp.ClientSession` not `| None`). Or have the dispatcher lazily attach to a module-level `_default_session` cached by event loop id.
+
+---
+
+### H6. `WebhookPayloadLog` is pruned per-insert via a sub-select but the prune query has no UNIQUE/partial protection against duplicate inserts ([webhooks.py:404-418](packages/server/src/notify_bridge_server/api/webhooks.py))
+
+The "keep newest `max_count` per provider, delete the rest" pattern uses `select(...).order_by(created_at DESC).limit(max_count)` as a subquery. Under SQLite this materializes the top-N then negates it — fine when max_count is 20. But this runs on every inbound webhook. For a busy Gitea/HA installation that's 60+ writes/min, each with a delete-by-sub-select. The `ix_webhook_payload_log_provider_created` index makes the read cheap, but the DELETE still rewrites pages.
+
+**Impact:** write amplification on busy webhook tenants.
+
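+**Fix:** keep the prune but make it probabilistic: only run it when `random.random() < 0.1` (10% chance per insert). The cap then holds approximately rather than exactly (a provider can overshoot `max_count` by roughly ten rows on average before the next prune), while the per-write prune cost drops roughly 10×.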
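+
+A minimal sketch of the probabilistic prune, assuming a simplified `webhook_payload_log` table with only `id`, `provider_id`, and `body`. The function names, the injectable `rng` parameter, and ordering by `id` instead of `created_at` are illustrative assumptions, not the code in `webhooks.py`.
+
+```python
+import random
+import sqlite3
+
+PRUNE_PROBABILITY = 0.1  # assumed value, taken from the 10% suggestion above
+
+
+def insert_payload(conn, provider_id, body, max_count=20, rng=random.random):
+    """Insert one payload row and prune old rows only on a random subset of calls."""
+    conn.execute(
+        "INSERT INTO webhook_payload_log (provider_id, body) VALUES (?, ?)",
+        (provider_id, body),
+    )
+    if rng() < PRUNE_PROBABILITY:
+        # Keep the newest max_count rows for this provider, delete the rest.
+        conn.execute(
+            "DELETE FROM webhook_payload_log WHERE provider_id = ? AND id NOT IN ("
+            " SELECT id FROM webhook_payload_log WHERE provider_id = ?"
+            " ORDER BY id DESC LIMIT ?)",
+            (provider_id, provider_id, max_count),
+        )
+
+
+def row_count(conn, provider_id):
+    """Number of stored payload rows for one provider."""
+    return conn.execute(
+        "SELECT COUNT(*) FROM webhook_payload_log WHERE provider_id = ?",
+        (provider_id,),
+    ).fetchone()[0]
+
+
+conn = sqlite3.connect(":memory:")
+conn.execute(
+    "CREATE TABLE webhook_payload_log ("
+    " id INTEGER PRIMARY KEY AUTOINCREMENT, provider_id INTEGER, body TEXT)"
+)
+```
+
+Passing `rng` explicitly keeps the behaviour deterministic in tests: `rng=lambda: 1.0` never prunes, `rng=lambda: 0.0` always does.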
+
+---
+
+### H7. No retention/archival for `notification_tracker_state` and `deferred_dispatch` "fired"/"dropped" rows ([scheduler.py:332-364](packages/server/src/notify_bridge_server/services/scheduler.py))
+
+`_cleanup_old_events` deletes `event_log`, `webhook_payload_log`, `action_execution` older than retention days. `deferred_dispatch` rows with `status IN ('fired', 'dropped')` are never deleted. `notification_tracker_state.asset_ids` for an immich tracker watching a deleted collection is also never reaped.
+
+**Impact:** unbounded growth on long-running installs; `asset_ids` JSON blobs can be megabytes per collection.
+
+**Fix:** extend `_cleanup_old_events` to also delete `DeferredDispatch.status != 'pending' AND fired_at < cutoff`. Add a separate housekeeping job that prunes `NotificationTrackerState` rows whose `collection_id` is no longer in `NotificationTracker.collection_ids`.
+
+---
+
+## MEDIUM
+
+### M1. Sentinel value `bot_id=0` is a footgun ([models.py:69-73](packages/server/src/notify_bridge_server/database/models.py))
+
+```python
+# bot_id=0 is a sentinel meaning "Telegram has not yet returned a numeric
+# ID for this bot" (i.e. token never validated). Multiple unverified bots
+# may legitimately carry 0, so we only enforce uniqueness for non-sentinel
+# values via a partial index added in migrate_uniqueness_constraints.
+bot_id: int = Field(default=0, index=True)
+```
+
+Sentinel values on indexed columns hurt index selectivity (every unvalidated bot is the same row from the planner's perspective) and create maintenance burden. Worse, every code path that looks up by `bot_id` must remember to filter `bot_id != 0`.
+
+**Impact:** maintainability; latent bug surface (one missed `!= 0` filter and an unverified bot is silently re-used).
+
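+**Fix:** change the field to `bot_id: int | None` with a default of `None` and drop the sentinel. SQLite treats NULLs as distinct in a UNIQUE index, so multiple unverified bots can still coexist, and lookups by a real `bot_id` no longer need a `!= 0` guard.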
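+
+The NULL behaviour this fix relies on can be checked with a few lines of stdlib `sqlite3`. This is a sketch against a hypothetical two-column `telegram_bot` table, not the real model; `try_insert` is an illustrative helper.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+# Hypothetical, cut-down table: bot_id is nullable and plainly UNIQUE.
+conn.execute("CREATE TABLE telegram_bot (id INTEGER PRIMARY KEY, bot_id INTEGER UNIQUE)")
+
+# Two unverified bots (NULL) and one verified bot coexist without a partial index.
+conn.execute("INSERT INTO telegram_bot (bot_id) VALUES (NULL)")
+conn.execute("INSERT INTO telegram_bot (bot_id) VALUES (NULL)")
+conn.execute("INSERT INTO telegram_bot (bot_id) VALUES (123456)")
+
+
+def try_insert(bot_id):
+    """Return True if the insert succeeds, False on a uniqueness violation."""
+    try:
+        conn.execute("INSERT INTO telegram_bot (bot_id) VALUES (?)", (bot_id,))
+        return True
+    except sqlite3.IntegrityError:
+        return False
+```
+
+With this shape the partial unique index added in `migrate_uniqueness_constraints` would no longer be needed for the sentinel case, although that should be confirmed against the actual migration before removing it.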
+
+---
+
+### M2. No request-scoped cache for `user.id` lookups inside one request ([api/*.py, throughout](packages/server/src/notify_bridge_server/api/))
+
+The same `get_current_user` dependency runs JWT validation + a `session.get(User, id)` on every request. Many endpoints then do their *own* `user.id`-filtered SELECTs. There is no per-request memoization of the User row.
+
+**Impact:** one extra SELECT per request, mostly noise — but it's free to fix.
+
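+**Fix:** in `get_current_user`, cache the User on `request.state.user`. Routes that take `user: User = Depends(...)` are unchanged. (FastAPI already reuses a dependency's result within a single request, so the gain is limited to code that performs the lookup outside `Depends`.)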
+
+---
+
+### M3. SQLAlchemy async pool defaults serialize SQLite writers but the engine allows multiple connections ([engine.py:41-57](packages/server/src/notify_bridge_server/database/engine.py))
+
+`create_async_engine` does not get an explicit pool class here, so the pool depends on the SQLAlchemy version: for a file-based SQLite database, SQLAlchemy 1.4 defaults the `aiosqlite` dialect to `NullPool` (a fresh connection per checkout), while 2.0 defaults to `AsyncAdaptedQueuePool`; `StaticPool` is the default only for in-memory databases. Under WAL, multiple readers are fine but only one writer can hold the txn at a time, so a slow writer just makes other connections block on `busy_timeout`.
+
+**Impact:** unpredictable behaviour across SQLAlchemy versions; sporadic `SQLITE_BUSY` under load.
+
+**Fix:** explicitly configure the pool:
+
+```python
+from sqlalchemy.ext.asyncio import create_async_engine
+from sqlalchemy.pool import StaticPool, AsyncAdaptedQueuePool
+
+_engine = create_async_engine(
+    url,
+    echo=settings.debug,
+    pool_pre_ping=True,
+    connect_args=connect_args,
+    poolclass=AsyncAdaptedQueuePool,
+    pool_size=5,
+    max_overflow=10,
+    pool_recycle=3600,
+)
+```
+
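+For Postgres compatibility leave these as-is; for SQLite the suggested value is `StaticPool` + `connect_args={"check_same_thread": False}` to share one connection across the event loop. That removes writer contention between pooled connections, but every session then shares a single connection (and its transaction state), so verify it under concurrent requests before adopting it.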
+
+---
+
+### M4. `_refresh_telegram_chat_titles` issues per-chat HTTP without per-bot bucketing ([scheduler.py:229-329](packages/server/src/notify_bridge_server/services/scheduler.py))
+
+The job builds `tasks` as a flat list across all bots and runs them under a global `Semaphore(10)`. A bot with 50 chats and a slow Telegram response (rare but happens) can monopolize all 10 slots, starving every other bot. The semaphore should be per-bot.
+
+**Impact:** the daily refresh can take much longer than intended on a multi-bot install with one degraded bot.
+
+**Fix:** create one semaphore per bot:
+
+```python
+sems = {bot_id: asyncio.Semaphore(_CHAT_SYNC_CONCURRENCY) for bot_id in bot_tokens}
+```
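+
+The one-liner above can be expanded into a small runnable sketch. It assumes a `chats_by_bot` mapping and a `fetch_title` coroutine as stand-ins for the real job's data and its `getChat` call; `refresh_all` and the limit of 3 are illustrative, not the values in `scheduler.py`.
+
+```python
+import asyncio
+
+_CHAT_SYNC_CONCURRENCY = 3  # assumed per-bot limit; the real value lives in scheduler.py
+
+
+async def refresh_all(chats_by_bot, fetch_title):
+    """Run fetch_title(bot_id, chat_id) for every chat, throttled per bot."""
+    sems = {bot_id: asyncio.Semaphore(_CHAT_SYNC_CONCURRENCY) for bot_id in chats_by_bot}
+
+    async def guarded(bot_id, chat_id):
+        # A slow bot can only exhaust its own semaphore, not the others'.
+        async with sems[bot_id]:
+            return await fetch_title(bot_id, chat_id)
+
+    tasks = [guarded(b, c) for b, chats in chats_by_bot.items() for c in chats]
+    # return_exceptions=True keeps one failing chat from cancelling the rest.
+    return await asyncio.gather(*tasks, return_exceptions=True)
+```
+
+Total concurrency is then bounded by `number_of_bots * _CHAT_SYNC_CONCURRENCY` rather than a fixed 10, so a global cap may still be worth keeping alongside the per-bot one on installs with many bots.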
+
+---
+
+### M5. `event_log.collection_name.contains(search)` triggers full table scan on filter ([status.py:69-75](packages/server/src/notify_bridge_server/api/status.py))
+
+The dashboard search input runs four `.contains(search)` clauses ORed together — these become `LIKE '%search%'` and cannot use a regular B-tree index. With 100k+ event_log rows the dashboard search becomes a multi-second operation.
+
+**Impact:** UX — search feels broken on large installs; CPU on the bridge box.
+
+**Fix:**
+1. Limit the search to the most recent N days (e.g. retention/3) — most users only search recent events.
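+2. Add a SQLite FTS5 virtual table mirroring event_log's text columns, sync via triggers. Searches then use `MATCH 'foo'`, which is served from the full-text index instead of a row-by-row `LIKE` scan. Note that FTS5's default tokenizer matches whole tokens and prefixes rather than arbitrary substrings, so results can differ from `.contains(search)`.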
+
+---
+
+### M6. `DeferredDispatch.event_payload` JSON blob can grow unbounded per row ([models.py:639-659](packages/server/src/notify_bridge_server/database/models.py), [deferred_dispatch.py:188-298](packages/server/src/notify_bridge_server/services/deferred_dispatch.py))
+
+The asset-coalescing union path appends every new asset's full dict (filename, urls, tags, extra metadata) into `event_payload["added_assets"]`. A mass-import that adds 50k photos during a quiet window means one DeferredDispatch row with 50k asset entries.
+
+**Impact:** memory blow-up at drain time (the whole JSON is parsed via `deserialize_event` into a Python list of `MediaAsset` dataclasses); could trip the drain timeout (`_DRAIN_DISPATCH_TIMEOUT_SECONDS=120`) on legitimate workloads.
+
+**Fix:** cap the union at e.g. 500 assets per row; when crossed, emit a "more_truncated" sentinel into `payload["extra"]` so the rendered template can show "+45000 more". The `apply_tracking_display_filters` `max_assets_to_show` does cap it for delivery, but the *stored* payload is uncapped.
+
+---
+
+### M7. Per-tick `await get_app_timezone(session)` reads from the DB on every dispatch ([dispatch_helpers.py:146-150](packages/server/src/notify_bridge_server/services/dispatch_helpers.py))
+
+Each tracker tick, each webhook, each defer evaluation calls `get_app_timezone` which calls `get_setting(session, "timezone")` which is a SELECT. The timezone setting rarely changes (manual setting), but the SELECT runs constantly.
+
+**Impact:** noise on otherwise good caching.
+
+**Fix:** cache the timezone in a module-level `(value, expires_at)` tuple with 60-s TTL, invalidated by `reschedule_cron_jobs_for_timezone_change`.
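+
+A sketch of the `(value, expires_at)` cache described above. It is synchronous for brevity, whereas the real `get_app_timezone` is async; `get_cached_timezone`, `invalidate_timezone_cache`, and the injectable `load` and `now` callables are hypothetical names for illustration.
+
+```python
+import time
+
+_TZ_TTL_SECONDS = 60.0
+_tz_cache = None  # (value, expires_at), or None when empty
+
+
+def get_cached_timezone(load, now=time.monotonic):
+    """Return the cached timezone, calling load() only when the entry has expired."""
+    global _tz_cache
+    if _tz_cache is not None and now() < _tz_cache[1]:
+        return _tz_cache[0]
+    value = load()
+    _tz_cache = (value, now() + _TZ_TTL_SECONDS)
+    return value
+
+
+def invalidate_timezone_cache():
+    """Hypothetical hook, e.g. called when the timezone setting changes."""
+    global _tz_cache
+    _tz_cache = None
+```
+
+Using `time.monotonic` rather than wall-clock time keeps the TTL unaffected by system clock adjustments.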
+
+---
+
+### M8. Unbounded in-memory dictionaries with no TTL or capacity ([scheduler.py:67-72](packages/server/src/notify_bridge_server/services/scheduler.py), [telegram_poller.py:31-35](packages/server/src/notify_bridge_server/services/telegram_poller.py), [command_sync.py:25](packages/server/src/notify_bridge_server/services/command_sync.py))
+
+```python
+_adaptive_state: dict[int, dict[str, int]] = {}
+_adaptive_max_skip: dict[int, int] = {}
+_last_update_id: dict[int, int] = {}
+_last_webhook_reclaim_at: dict[int, float] = {}
+_dirty_bots: dict[int, float] = {}
+```
+
+Each is keyed by tracker_id / bot_id. When a tracker or bot is deleted, the cleanup paths (`unschedule_tracker`, etc.) do remove some entries — but not all. `_last_update_id`, `_last_webhook_reclaim_at` are never cleared on bot deletion.
+
+**Impact:** slow memory leak in long-running processes that create+delete trackers/bots frequently (e.g. test environments).
+
+**Fix:** on tracker/bot deletion, explicitly clear all module dicts that key by that id. Alternatively, switch each to a TTLCache. `weakref.WeakValueDictionary` is only an option once the values are real Python objects, since plain `int` and `float` values cannot be weakly referenced.
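+
+A sketch of the explicit-cleanup option. The five dicts are stand-ins for the ones listed above (which live in different modules in the real code), and the split into tracker-keyed and bot-keyed groups is an assumption that should be checked against the code; `forget_tracker` and `forget_bot` are hypothetical helpers.
+
+```python
+# Stand-ins for the module-level dicts listed above.
+_adaptive_state: dict[int, dict[str, int]] = {}
+_adaptive_max_skip: dict[int, int] = {}
+_last_update_id: dict[int, int] = {}
+_last_webhook_reclaim_at: dict[int, float] = {}
+_dirty_bots: dict[int, float] = {}
+
+# Assumed grouping by key type; verify against the real modules.
+_TRACKER_KEYED = (_adaptive_state, _adaptive_max_skip)
+_BOT_KEYED = (_last_update_id, _last_webhook_reclaim_at, _dirty_bots)
+
+
+def forget_tracker(tracker_id: int) -> None:
+    """Drop every per-tracker entry; safe to call for unknown ids."""
+    for store in _TRACKER_KEYED:
+        store.pop(tracker_id, None)
+
+
+def forget_bot(bot_id: int) -> None:
+    """Drop every per-bot entry; safe to call for unknown ids."""
+    for store in _BOT_KEYED:
+        store.pop(bot_id, None)
+```
+
+Keeping the groupings in one place means a newly added module dict only has to be registered once to be covered by both deletion paths.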
+
+---
+
+### M9. Bulk insert pattern in migrations uses one-statement-per-row ([migrations.py:566-588](packages/server/src/notify_bridge_server/database/migrations.py))
+
+`migrate_tracker_targets` issues `INSERT INTO ... VALUES (...)` per row in a Python for-loop. On a tenant with 10k+ legacy rows this is slow even inside a single transaction.
+
+**Impact:** one-shot, but rough on upgrade for big tenants.
+
+**Fix:** use `executemany` / batch INSERTs:
+
+```python
+await conn.execute(text("INSERT INTO ... VALUES (...)"), batch_params)
+```
+
+This is mostly historical (the migration is idempotent and skipped on subsequent runs), but worth fixing if you're touching the file.
+
+---
+
+### M10. Missing index on `notification_tracker_state(notification_tracker_id, collection_id)` ([models.py:454-478](packages/server/src/notify_bridge_server/database/models.py))
+
+`check_tracker` reads state per tracker; the existing `ix_notification_tracker_state.notification_tracker_id` index (declared via `index=True`) supports that. But every state read is `WHERE tracker_id = ? AND collection_id = ?` (implicitly via the resulting dict). A composite would help; SQLite can do index-only scans here.
+
+**Impact:** small. SQLite uses the single-column index to find a tracker's rows and filters `collection_id` on those; since one tracker typically has <20 collections, the composite is a minor win.
+
+**Fix:** add `(notification_tracker_id, collection_id)` composite index to the `_INDEXES` list.
+
+---
+
+## LOW
+
+### L1. `SELECT *` semantics from `select(Model)` ORM is unavoidable but verbose ([throughout services/, api/])
+
+SQLModel's `select(ModelClass)` is effectively `SELECT all columns`. For wide rows like `TrackingConfig` (~70 columns of boolean flags) that's a lot of bytes per dispatch evaluation. There are no API list endpoints that return `TrackingConfig` from a hot path, so this is mostly cosmetic — but for pages that only need a handful of columns (e.g. `status.py`'s `tracker_id, name` map) the explicit-column form is already used. Continue that pattern.
+
+---
+
+### L2. `EventLog.details` JSON dict is reconstructed on every dashboard read ([status.py:258](packages/server/src/notify_bridge_server/api/status.py))
+
+`details: e.details or {}` runs on every read. SQLAlchemy's JSON column type has already deserialized the stored text into a Python dict by the time the row is loaded, so the extra cost is low; just a note that this is a hot path.
+
+---
+
+### L3. `event_log.collection_id` and `details` have no indexes; some webhook commands filter on them ([commands/immich/events.py:43](packages/server/src/notify_bridge_server/commands/immich/events.py))
+
+The history-by-tracker endpoint uses the composite `ix_event_log_user_event_type_created` plus a hit on `notification_tracker_id` — fine. But `events.py`'s "last assets_added for this collection" queries (`event_type='assets_added' AND collection_id=?`) cannot use any current index optimally.
+
+**Fix:** add `(event_type, collection_id, created_at DESC)` if these queries are called by users frequently (Telegram `/assets <album>` etc.).
+
+---
+
+### L4. JSON column types not declared with `JSONB` semantics ([models.py: many](packages/server/src/notify_bridge_server/database/models.py))
+
+SQLite stores JSON as text. On Postgres you'd want `JSONB`. The codebase uses the generic `Column(JSON)`, which SQLAlchemy renders as `JSON`, not `JSONB`, on Postgres; getting `JSONB` there would take something like `JSON().with_variant(JSONB(), "postgresql")`. No action needed for SQLite deployments.
+
+---
+
+### L5. The `setup` lifespan runs migrations *inside* the FastAPI lifespan synchronously ([main.py:62-122](packages/server/src/notify_bridge_server/main.py))
+
+The migrations + seeds + scheduler boot all run before `_READY = True`. On a cold start with a big DB this can take 10+ s. `/api/ready` is designed to return 503 until the flag is set, but in practice neither it nor `/api/health` is reachable during that window, because uvicorn does not accept connections until the lifespan startup phase completes. For orchestrators that probe `/api/health`, this means startup-grace must be tuned.
+
+**Fix:** start the HTTP listener first, run migrations as a background task, expose readiness flag through `/api/ready` only.
+
+---
+
+### L6. `ServiceProvider.config`, `NotificationTarget.config`, `Tracker.filters` JSON columns store secrets unencrypted ([models.py:42, 349, 399](packages/server/src/notify_bridge_server/database/models.py))
+
+API keys, refresh tokens, webhook secrets, SMTP passwords all live in `config` JSON. Visible to anyone with DB read access. This is a known design trade-off (`backup_secrets_mode` controls export behaviour) but worth flagging.
+
+**Fix:** out of scope for this review; consider an at-rest encryption layer keyed off `secret_key` (Fernet) for `config["api_key"]`, `config["password"]`, `access_token`, etc. — but only if your threat model justifies the operational cost.
+
+---
+
+### L7. Frontend `caches.svelte.ts` has 30-s TTL but no cross-tab invalidation ([entity-cache.svelte.ts:14](frontend/src/lib/stores/entity-cache.svelte.ts))
+
+Two browser tabs editing the same entity will see stale data for up to 30 s in the other tab. No `BroadcastChannel` listener.
+
+**Fix:** add a `BroadcastChannel('notify-bridge-cache')` that calls `cache.invalidate()` on receipt. ~15 lines.
+
+---
+
+### L8. `providersCache.invalidate(); await load()` is two-step ([providers/+page.svelte:238, 250](frontend/src/routes/providers/+page.svelte))
+
+`invalidate()` + immediate `fetch(true)` race against any in-flight request; the deduplication map handles it, but the explicit `await load()` is essentially `fetch(true)` directly. Simpler:
+
+```typescript
+providersCache.set(updatedList);  // or fetch(true)
+```
+
+Cosmetic.
+
+---
+
+### L9. `details["dispatch_status"]` is a string enum but not declared as one ([deferred_dispatch.py:619-624](packages/server/src/notify_bridge_server/services/deferred_dispatch.py))
+
+`dispatch_status` takes values `"deferred"`, `"deferred_then_dropped"`, `"deferred_then_failed"`, `"delivered_after_quiet_hours"`, `"dropped_quiet_hours_nondeferrable"`. They're scattered as string literals. The dashboard renders them.
+
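+**Fix:** declare an `Enum` once on the server and derive the frontend type from it (for example a string-literal union generated from, or checked against, the API schema), so both sides share one list of values.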
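+
+A sketch of what the server-side declaration could look like, using the five values listed above. The class name `DispatchStatus` and the helper `is_known_status` are hypothetical; mixing in `str` keeps members comparable to the plain strings already stored in `details`.
+
+```python
+from enum import Enum
+
+
+class DispatchStatus(str, Enum):
+    """All values currently written to details["dispatch_status"]."""
+
+    DEFERRED = "deferred"
+    DEFERRED_THEN_DROPPED = "deferred_then_dropped"
+    DEFERRED_THEN_FAILED = "deferred_then_failed"
+    DELIVERED_AFTER_QUIET_HOURS = "delivered_after_quiet_hours"
+    DROPPED_QUIET_HOURS_NONDEFERRABLE = "dropped_quiet_hours_nondeferrable"
+
+
+def is_known_status(value: str) -> bool:
+    """True if value is one of the declared dispatch statuses."""
+    return value in {status.value for status in DispatchStatus}
+```
+
+Because existing rows already hold the raw strings, a `str`-based enum can be introduced without a data migration.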
+
+---
+
+### L10. No DB connection used by `/api/health` ([main.py:270-274](packages/server/src/notify_bridge_server/main.py))
+
+`/api/health` returns instantly without checking the DB. That's correct for a liveness probe but the comment doesn't match common practice ("liveness = process up"). Pair this with #L5: once the HTTP listener starts before migrations finish, orchestrators using `/api/health` for warm-up will mark the pod ready while migrations are still running.
+
+**Fix:** keep liveness lightweight, document the readiness probe as the warm-up gate.
+
+---
+
+## Notes on what's already good
+
+- Performance indexes (`_INDEXES` list) cover all the right hot paths.
+- Composite `(status, fire_at)` index on `deferred_dispatch` plus partial unique `(link_id, collection_id, event_type) WHERE status='pending'` prevents the worst races.
+- `load_link_data` is fully batched — the most complex hot path in the codebase looks clean.
+- Shared `aiohttp.ClientSession` with DNS-rebinding-safe `PinnedResolver` is production-grade.
+- Pre-migration `VACUUM INTO` snapshot is the right safety net for a hand-rolled migration chain.
+- APScheduler defaults (`coalesce=True`, `misfire_grace_time=300`, `max_instances=1`) are correct production settings.
+- Adaptive polling (skip-N-of-K when idle) with jitter is a thoughtful 4-tier scheduling design.
+- Tracker cache (5-s TTL with explicit invalidation) and rendered-message per-locale cache are good fan-out optimizations.
+- Migration idempotency is genuinely well-handled despite the rough tooling.
+- Frontend `entity-cache` deduplication of in-flight requests is the right pattern.
+
+---
+
+## Priority recommendations (next 30 days)
+
+1. **Adopt Alembic** (C1, C2) — eliminate `create_all` from prod, baseline the current schema, lock down new schema changes through autogenerate.
+2. **Fix the dashboard aggregate query** (H1, H2, M5) — add the missing composite index, server-side cache the per-provider aggregate, virtualize the event list. This is the single biggest user-visible perf win.
+3. **Cap `DeferredDispatch.event_payload` size + add retention for fired/dropped rows** (M6, H7) — closes off the worst-case memory and growth scenarios.
+4. **Clean up module-level dicts on entity deletion** (M8) — small fix, prevents a slow leak.
+5. **Standardize SQLite PRAGMAs and pool config** (C3, M3) — predictable behaviour, fewer spurious BUSY errors.
+
+---
+
+*Reviewed against codebase at HEAD (`a20635a`).*
@@ -0,0 +1,312 @@
+# Security Review — notify-bridge v0.8.1
+
+Reviewer: security-reviewer (Opus 4.7) — 2026-05-22
+Branch: master @ a20635a
+Scope: `packages/server`, `packages/core`, `frontend/src`, `Dockerfile`, `docker-compose.yml`, `.gitea/workflows/`, env handling.
+
+---
+
+## Executive Summary
+
+- **Overall posture is strong.** The project applies many non-obvious controls correctly: Jinja2 `SandboxedEnvironment` on every render path; `bcrypt` with a 72-byte length guard and constant-time login (dummy hash on missing user); JWT with `token_version` revocation; SSRF guard with CGNAT, IPv4-mapped-IPv6 unwrapping, and a `PinnedResolver` that defeats DNS rebinding; secret-masking log filter; path-traversal-safe backup file resolver; security headers + CSP; non-root Docker user; required `SECRET_KEY` >= 32 chars with a rejection list; non-default Telegram webhook secret enforced; HMAC signature checks on Gitea/Generic webhooks; provider-config secret masking on GET; ownership checks (`get_owned_entity`) on every parameterised route I sampled.
+- **HIGH — Home Assistant `access_token` is not masked.** It is stored in `provider.config`, never added to the mask list in `_provider_response`, never added to the placeholder-drop list in `update_provider`. Any logged-in user can `GET /api/providers/{id}` and read their HA token in cleartext, and a partial save will wipe it. Trivial fix.
+- **HIGH — Secrets at rest are plaintext.** Telegram bot tokens (`telegram_bot.token`), provider configs containing `api_key`/`api_token`/`webhook_secret`/`access_token`/SMTP passwords, and email-bot SMTP passwords are stored unencrypted in SQLite. Disk theft, an unrelated read primitive, or any backup leak exposes all credentials. The masking on the API is good UX, but the DB itself has no encryption-at-rest. The exported JSON backup respects a `secrets_mode` flag (good) but the live DB does not.
+- **MEDIUM — Template-preview endpoints bypass the timeout/size watchdog.** `template_configs.preview_config`, `template_configs.preview_raw`, `command_template_configs.preview_raw`, and `notifier.send_test_template_notification` construct fresh `SandboxedEnvironment(autoescape=False)` instances and call `.render(...)` directly. The hardened helper `render_template()` (timeout, source cap, output cap, autoescape) is bypassed. A logged-in user can wedge a worker thread with `{% for i in range(10**8) %}x{% endfor %}`. Single-tenant deployment limits the blast radius, but the renderer should be the single chokepoint.
+- **MEDIUM — Login rate limit is per-IP only.** `POST /api/auth/login @ 5/min` keys on `get_remote_address`. An attacker behind a proxy / NAT, or one that rotates source IPs (cheap on residential / cloud), trivially bypasses it. There is no per-username lockout, no exponential backoff, no captcha. Combined with no MFA, this leaves the admin account, protected only by a single password (8-char minimum, no complexity requirement), vulnerable to a slow online dictionary attack.
+- **LOW / INFO — Several smaller findings**: webhook payload logs persist source payload (now with key-level redaction, but the redactor is name-based and will miss high-entropy secret values in non-obvious keys); no replay protection on inbound webhooks (no nonce/timestamp window); the `/api/auth/setup` 3/min limit + JWT issuance race window is hardened with a transaction count guard (good), but the dummy bcrypt hash literal used for timing-equalisation is malformed and `bcrypt.checkpw` returns `False` via `ValueError` — the swallowed exception still equalises timing, but a maintainer could regress this; CSP allows `script-src 'unsafe-inline'` (necessary for SvelteKit hydration, acceptable risk acknowledged in code).
+
+---
+
+## Findings
+
+### CRITICAL
+
+_None found._
+
+---
+
+### HIGH
+
+#### H-1. Home Assistant access_token leaked in provider GET responses
+
+- CWE: CWE-522 (Insufficiently Protected Credentials), CWE-200 (Exposure of Sensitive Information)
+- Files:
+  - [`packages/server/src/notify_bridge_server/api/providers.py:616-624`](../../packages/server/src/notify_bridge_server/api/providers.py) — `_provider_response` masks `("api_key", "api_token", "webhook_secret", "password", "client_secret", "refresh_token")` but **not** `access_token`.
+  - [`packages/server/src/notify_bridge_server/api/providers.py:399-405`](../../packages/server/src/notify_bridge_server/api/providers.py) — `update_provider` also omits `access_token` from the placeholder-drop list. The two lists are consistent with each other today, so fix both together.
+- Scenario: Any user authenticated to the bridge (any role) calls `GET /api/providers/{id}` for an HA provider they own and the response includes `config.access_token` in cleartext. The HA long-lived token grants full control of the user's Home Assistant instance (lights, locks, cameras, scripts, devices). In a multi-user deployment, even within the same admin account, a stolen JWT exfiltrates the HA token; in a single-user deployment, any read primitive (XSS via a future template feature, an MITM on an HTTPS misconfiguration) gives the same result.
+- Remediation: Add `access_token` to both lists.
+
+```python
+# providers.py:_provider_response
+for secret_field in (
+    "api_key", "api_token", "webhook_secret", "password",
+    "client_secret", "refresh_token", "access_token",  # <-- add
+):
+    ...
+
+# providers.py:update_provider
+for secret_field in (
+    "api_key", "api_token", "webhook_secret", "password",
+    "client_secret", "refresh_token", "access_token",  # <-- add
+):
+    value = incoming.get(secret_field)
+    if isinstance(value, str) and value.startswith("***"):
+        incoming.pop(secret_field, None)
+```
+
+Better still: replace the hand-maintained tuples with a single module-level constant `_PROVIDER_SECRET_FIELDS` referenced from both call sites, plus a unit test that asserts every field declared on the per-provider Pydantic configs whose name contains a secret-like substring (`token`, `secret`, `password`, `key`, `credential`) is in the set. That prevents the next provider type from re-introducing the same gap.
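+
+A minimal sketch of the shared constant and the guard check follows. `_SECRET_NAME_HINTS` and `unmasked_secret_fields` are hypothetical names, and the field list at the bottom is an illustrative stand-in for the real per-provider Pydantic models.
+
+```python
+# Sketch only: the constant name comes from the suggestion above; the hint
+# tuple and the helper are hypothetical.
+_PROVIDER_SECRET_FIELDS = frozenset({
+    "api_key", "api_token", "webhook_secret", "password",
+    "client_secret", "refresh_token", "access_token",
+})
+
+# Substrings that mark a config field as secret-like.
+_SECRET_NAME_HINTS = ("token", "secret", "password", "key", "credential")
+
+
+def unmasked_secret_fields(field_names):
+    """Return secret-looking field names that are missing from the mask set."""
+    return sorted(
+        name
+        for name in field_names
+        if any(hint in name.lower() for hint in _SECRET_NAME_HINTS)
+        and name not in _PROVIDER_SECRET_FIELDS
+    )
+
+
+# Illustrative field names, not read from the real models.
+example_fields = ["url", "access_token", "verify_tls", "signing_key"]
+print(unmasked_secret_fields(example_fields))
+```
+
+In a real test the names would come from each config model (for instance `model_fields` under Pydantic v2, which is an assumption about the project's Pydantic version), and the assertion would be that the returned list is empty.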
+
+#### H-2. Secrets stored in plaintext at rest
+
+- CWE: CWE-312 (Cleartext Storage of Sensitive Information), CWE-256 (Plaintext Storage of a Password)
+- Files:
+  - [`packages/server/src/notify_bridge_server/database/models.py:54-84`](../../packages/server/src/notify_bridge_server/database/models.py) — `TelegramBot.token: str`
+  - [`packages/server/src/notify_bridge_server/database/models.py:87-100`](../../packages/server/src/notify_bridge_server/database/models.py) — `MatrixBot` (access_token in config)
+  - `ServiceProvider.config: dict[str, Any]` (JSON column) holds Immich `api_key`, Gitea `webhook_secret` + `api_token`, Google Photos `client_secret` + `refresh_token`, HA `access_token`, etc.
+  - `EmailBot.smtp_password: str` (per [`api/email_bots.py:142`](../../packages/server/src/notify_bridge_server/api/email_bots.py))
+- Scenario: An attacker who can read the SQLite file (compromised host, mis-permissioned backup volume, snapshot artifact in `data_dir/backups/`, leaked debug dump) gets every credential the bridge speaks: Telegram bot tokens (full bot control), Immich/Gitea/Planka API keys (read all photos / repos), Google Photos refresh tokens (long-lived, hard to revoke at scale), HA long-lived tokens (smart-home), SMTP passwords. The pre-migrate VACUUM-INTO snapshots (`packages/server/src/notify_bridge_server/database/snapshot.py`) inherit the same plaintext exposure and live alongside the active DB.
+- Remediation options, in order of effort:
+  1. **Short term**: document the threat in `OPERATIONS.md`, enforce file-system permissions on `/data` (the Dockerfile chowns to appuser already, but the host bind-mount must be `chmod 700`), and ensure backups are encrypted at the storage layer (S3 SSE / Borg / restic).
+  2. **Better**: column-level encryption with a key derived from `NOTIFY_BRIDGE_SECRET_KEY` (or a separate `NOTIFY_BRIDGE_DB_ENCRYPTION_KEY`). Use the `cryptography` library's `Fernet` for each sensitive column; envelope the secret JSON keys, not the whole row, so `WHERE` clauses and existing migrations keep working. Add a one-shot migration that re-encrypts existing rows.
+  3. **Best**: encrypt with a KMS-backed key (HashiCorp Vault Transit, AWS KMS) and rotate per-secret data keys. This is overkill for a homelab-style deployment but mandatory if the bridge is ever multi-tenant.
+- Skeleton for option 2:
+
+```python
+# new file packages/server/src/notify_bridge_server/security/secretbox.py
+import base64
+import hashlib
+
+from cryptography.fernet import Fernet
+# NOTE: adjust this import to wherever `settings` is actually defined.
+from .config import settings
+
+def _key() -> bytes:
+    # Derive a deterministic Fernet key from secret_key. Anyone with secret_key
+    # can decrypt (same threat model as JWT signing), but anyone with the DB
+    # alone cannot.
+    h = hashlib.sha256(settings.secret_key.encode()).digest()
+    return base64.urlsafe_b64encode(h)
+
+_fernet = Fernet(_key())
+
+def encrypt_secret(plaintext: str) -> str:
+    return _fernet.encrypt(plaintext.encode()).decode()
+
+def decrypt_secret(ciphertext: str) -> str:
+    # Raises cryptography.fernet.InvalidToken if the value was tampered with
+    # or was encrypted under a different key.
+    return _fernet.decrypt(ciphertext.encode()).decode()
+```
+
+Apply at write time in `update_provider` / `create_provider`, decrypt at read time inside `make_immich_provider`, `make_gitea_provider`, the Telegram client constructor, etc. Add a migration that scans every `ServiceProvider.config` JSON and re-encrypts the listed keys in place.
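+
+The in-place re-encryption step could look like the sketch below. `ENC_PREFIX` and `encrypt_config_secrets` are hypothetical names, the helper only touches top-level keys, and the string-reversing lambda is a stand-in for `encrypt_secret`, not a cipher.
+
+```python
+from typing import Any, Callable
+
+# Hypothetical marker that lets the migration tell encrypted values apart
+# from legacy plaintext, so re-running it is harmless.
+ENC_PREFIX = "enc:v1:"
+
+
+def encrypt_config_secrets(
+    config: dict[str, Any],
+    secret_keys: frozenset[str],
+    encrypt: Callable[[str], str],
+) -> dict[str, Any]:
+    """Return a copy of config with the listed string values encrypted."""
+    out = dict(config)
+    for key in secret_keys:
+        value = out.get(key)
+        if isinstance(value, str) and value and not value.startswith(ENC_PREFIX):
+            out[key] = ENC_PREFIX + encrypt(value)
+    return out
+
+
+# Stand-in "cipher" for the demo; a real run would pass encrypt_secret.
+demo = encrypt_config_secrets(
+    {"url": "https://ha.local", "access_token": "abc"},
+    frozenset({"access_token", "api_key"}),
+    encrypt=lambda s: s[::-1],
+)
+print(demo)
+```
+
+The prefix check is what makes the migration idempotent: a row that was already converted passes through unchanged, so an interrupted run can simply be restarted.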
+
+---
+
+### MEDIUM
+
+#### M-1. Template preview endpoints skip the renderer watchdog
+
+- CWE: CWE-400 (Uncontrolled Resource Consumption), CWE-1333 (Inefficient Regular Expression Complexity — analogous)
+- Files:
+  - [`packages/server/src/notify_bridge_server/api/template_configs.py:608-613`](../../packages/server/src/notify_bridge_server/api/template_configs.py) — `preview_config` calls `SandboxedEnvironment(autoescape=False).from_string(template_body).render(...)` directly.
+  - [`packages/server/src/notify_bridge_server/api/slot_helpers.py:72-90`](../../packages/server/src/notify_bridge_server/api/slot_helpers.py) — `render_template_preview` (used by `/preview-raw` for both notification and command templates).
+  - [`packages/server/src/notify_bridge_server/services/notifier.py:494-499`](../../packages/server/src/notify_bridge_server/services/notifier.py) — `send_test_template_notification`.
+  - The hardened helper [`packages/core/src/notify_bridge_core/templates/renderer.py:48-108`](../../packages/core/src/notify_bridge_core/templates/renderer.py) (with timeout, length caps, output cap) is **not** used here.
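+- Scenario: An authenticated admin submits `{% for i in range(100000) %}{% for j in range(100000) %}x{% endfor %}{% endfor %}` to `POST /api/template-configs/preview-raw`. Jinja2 has no built-in timeout, and the sandbox only caps a single `range()` at 100,000 items (a bare `range(10**8)` is rejected with `OverflowError`, but nested loops are not). The sandbox blocks unsafe attribute access, not CPU use. The request blocks a FastAPI executor thread until the worker is OOM-killed or the client times out. Repeat to DoS the API.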
+- Remediation: Route every render through a single, hardened helper.
+
+```python
+# Use the existing core helper consistently
+from notify_bridge_core.templates.renderer import render_template
+rendered = render_template(template_str, context)  # already has timeout + caps
+```
+
+For the strict-undefined two-pass validation in `render_template_preview`, fold the watchdog into the helper itself rather than skipping it.
+
+#### M-2. Login rate limit is per-IP only
+
+- CWE: CWE-307 (Improper Restriction of Excessive Authentication Attempts)
+- Files: [`packages/server/src/notify_bridge_server/auth/routes.py:140-157`](../../packages/server/src/notify_bridge_server/auth/routes.py).
+- Scenario: `@limiter.limit("5/minute")` keyed on `get_remote_address` gives 5 attempts per source IP per minute = ~7,200/day per IP. An attacker rotating across 10 IPs (cheap cloud, residential proxies, even a Tor exit pool) gets 72,000/day. With an 8-character minimum and no complexity requirement, a common 8-character password is reachable in days, not centuries. There is no per-username lockout, no captcha, no MFA.
+- Remediation:
+  1. Add a per-username sliding-window limiter on top of the per-IP one. Use a second `Limiter` whose `key_func` returns the lower-cased username from the body. Re-check after parsing the body.
+  2. Add an exponential lockout: after N consecutive failures for a username, require a cooldown (record in a `LoginFailure` table or in-memory TTLCache).
+  3. Document and recommend deploying behind a reverse proxy that adds CAPTCHA / WAF rate-limiting for login (Cloudflare Turnstile is cheap).
+  4. Track and log failed logins (auth-event audit trail) with src IP + username + timestamp.
+
+```python
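+# Sketch: per-username quota, checked after the body is parsed (assumes `cachetools`).
+from cachetools import TTLCache
+from fastapi import HTTPException
+
+# 10 attempts per username; an entry expires 15 minutes after its last write.
+_username_attempts = TTLCache(maxsize=10_000, ttl=15 * 60)
+
+async def _check_username_quota(username: str) -> None:
+    key = username.lower()
+    if _username_attempts.get(key, 0) >= 10:
+        raise HTTPException(429, "Too many attempts for this account")
+    _username_attempts[key] = _username_attempts.get(key, 0) + 1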
+```
+
+#### M-3. Webhook payload log redactor is keyword-based, misses value-based secrets
+
+- CWE: CWE-532 (Insertion of Sensitive Information into Log File)
+- Files: [`packages/server/src/notify_bridge_server/api/webhooks.py:326-358`](../../packages/server/src/notify_bridge_server/api/webhooks.py).
+- Scenario: `_redact_sensitive_body` walks the JSON and redacts values whose **keys** contain `token`, `auth`, `key`, `secret`, etc. A webhook provider that ships secrets under an innocent key (e.g. `"oauth_state": "ya29.a0..."`, `"continuation": "ABCDE..."`, `"x_state": "..."`) leaves the secret in the persisted payload log. The log row is admin-readable and exported in backups.
+- Remediation: Layer a high-entropy value detector on top of the key matcher (e.g. flag any value that matches `[A-Za-z0-9_\-+/=]{32,}` and has Shannon entropy ≥ 3.5 bits per character). At minimum, also redact values with known token prefixes (`ya29.`, `xoxb-`, `ghp_`, `glpat-`, `sk-`, `Bearer `).
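+
+A value-based detector along those lines might look like this sketch. The function names are hypothetical, the thresholds are the ones quoted in the remediation, and the prefix list is illustrative rather than exhaustive.
+
+```python
+import math
+import re
+from collections import Counter
+
+# Shape and threshold from the remediation above.
+_TOKEN_SHAPE = re.compile(r"[A-Za-z0-9_\-+/=]{32,}")
+_KNOWN_PREFIXES = ("ya29.", "xoxb-", "ghp_", "glpat-", "sk-", "Bearer ")
+_MIN_ENTROPY_BITS = 3.5
+
+
+def shannon_entropy(text: str) -> float:
+    """Shannon entropy of text, in bits per character."""
+    counts = Counter(text)
+    total = len(text)
+    return -sum((n / total) * math.log2(n / total) for n in counts.values())
+
+
+def looks_like_secret(value: str) -> bool:
+    """True for known token prefixes or long, high-entropy token-shaped strings."""
+    if value.startswith(_KNOWN_PREFIXES):
+        return True
+    if not _TOKEN_SHAPE.fullmatch(value):
+        return False
+    return shannon_entropy(value) >= _MIN_ENTROPY_BITS
+```
+
+Expect false positives: UUIDs and long base64 blobs are token-shaped and fairly high-entropy too. For a log redactor that is usually the acceptable direction to err in, but the threshold is worth tuning against real payloads.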
+
+#### M-4. Webhook ingestion has no replay protection
+
+- CWE: CWE-294 (Authentication Bypass by Capture-replay)
+- Files: [`packages/server/src/notify_bridge_server/api/webhooks.py`](../../packages/server/src/notify_bridge_server/api/webhooks.py) — Gitea/Planka/Generic.
+- Scenario: An attacker who once intercepts a signed Gitea push event (network downgrade, log leak from a proxy, exfil from the Gitea side) can replay it indefinitely. The HMAC stays valid; the bridge has no nonce / timestamp window / delivery-ID cache. With a webhook that fires `assets_added` it's just noise. With a webhook that triggers an action (planka card-created → `/api/actions/{id}/execute` chained logic), it could be more.
+- Remediation: For Gitea, store the last N `X-Gitea-Delivery` UUIDs per provider and reject duplicates (a small table with a partial unique index is enough). For the generic webhook, add an optional `replay_window_seconds` plus a timestamp-extracting JSONPath in the provider config. Keep signature checks on a constant-time string compare.
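+
+The duplicate check and the timestamp window can be sketched in memory as below. `DeliveryReplayGuard` is a hypothetical name, and a production version would persist the IDs in the table described above rather than in a process-local dict.
+
+```python
+import time
+from collections import OrderedDict
+
+
+class DeliveryReplayGuard:
+    """Remember the last N delivery IDs per provider and reject repeats."""
+
+    def __init__(self, max_ids=1000, window_seconds=300.0):
+        self._max_ids = max_ids
+        self._window = window_seconds
+        self._seen = {}  # provider_id -> OrderedDict of delivery IDs
+
+    def accept(self, provider_id, delivery_id, sent_at=None, now=None):
+        """Return True if the delivery is fresh, False if it looks replayed."""
+        now = time.time() if now is None else now
+        # Optional timestamp window (the generic-webhook case).
+        if sent_at is not None and abs(now - sent_at) > self._window:
+            return False
+        seen = self._seen.setdefault(provider_id, OrderedDict())
+        if delivery_id in seen:
+            return False
+        seen[delivery_id] = None
+        if len(seen) > self._max_ids:
+            seen.popitem(last=False)  # evict the oldest ID
+        return True
+```
+
+Bounding the cache means a very old delivery ID can be replayed once it has been evicted, which is why pairing the ID cache with a timestamp window is the stronger design where the provider signs a timestamp.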
+
+#### M-5. `bcrypt.checkpw` dummy-hash literal is malformed
+
+- CWE: CWE-208 (Observable Timing Discrepancy) — partial.
+- Files: [`packages/server/src/notify_bridge_server/auth/routes.py:147-152`](../../packages/server/src/notify_bridge_server/auth/routes.py).
+- Scenario: When the username doesn't exist, the code calls `_verify_password(body.password, "$2b$12$" + "a" * 53)`. That hash is not a real bcrypt hash; `bcrypt.checkpw` raises `ValueError`, which `_verify_password` swallows before returning `False`. The exception path is *faster* than a real bcrypt verify (no key schedule), so the timing of "user does not exist" differs from "user exists, wrong password", and a maintainer who later changes the swallow behaviour could regress it entirely.
+- Remediation: Cache one valid dummy bcrypt hash at module load time so the verify path actually runs the KDF.
+
+```python
+_DUMMY_BCRYPT_HASH = bcrypt.hashpw(b"x", bcrypt.gensalt()).decode()  # module load
+...
+password_ok = await _verify_password(
+    body.password,
+    user.hashed_password if user else _DUMMY_BCRYPT_HASH,
+)
+```
+
+#### M-6. Setup endpoint relies on `User.id != 0` filter — robust but a single typo breaks it
+
+- CWE: CWE-302 (Authentication Bypass by Assumed-Immutable Data), as defence-in-depth.
+- Files: [`packages/server/src/notify_bridge_server/auth/routes.py:97-119`](../../packages/server/src/notify_bridge_server/auth/routes.py).
+- Scenario: `POST /api/auth/setup` is gated by "no users with id != 0". The `__system__` sentinel is id=0. If a future migration changes the sentinel id, or the `WHERE` clause is dropped during a refactor, setup re-opens silently and an internet-reachable bridge would let an attacker claim the admin account.
+- Remediation: Add a defence-in-depth flag `AppSetting.setup_completed=true` set during the first successful setup, and require it to be unset (in addition to the count check). This bakes the invariant into a single boolean that's easier to audit.
+
+#### M-7. Anonymous Prometheus metrics endpoint leaks operational data
+
+- CWE: CWE-200 (Exposure of Sensitive Information to an Unauthorized Actor)
+- Files: [`packages/server/src/notify_bridge_server/api/metrics.py:138-159`](../../packages/server/src/notify_bridge_server/api/metrics.py).
+- Notes: This is **documented and gated** by `NOTIFY_BRIDGE_METRICS_ENABLED`, and the comment explicitly says scrapers don't authenticate. Acceptable when the API port is firewalled to the scraper. It is listed here so that an operator who exposes the API directly to the internet (e.g. via a reverse proxy without an ACL) doesn't accidentally expose dispatch rates, provider names, and queue depths.
+- Remediation: keep the env flag, but additionally allow `metrics_basic_auth_user` / `metrics_basic_auth_password` as a soft credential check on the endpoint so a "default enabled, default protected" mode is possible. Document the threat in `OPERATIONS.md` next to the env var.
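+
+The soft credential check could be as small as the sketch below. `metrics_auth_ok` is a hypothetical helper, and the expected user and password would come from the proposed `metrics_basic_auth_user` / `metrics_basic_auth_password` settings.
+
+```python
+import base64
+import secrets
+
+
+def metrics_auth_ok(authorization_header, expected_user, expected_password):
+    """Validate an HTTP Basic Authorization header value."""
+    if not authorization_header or not authorization_header.startswith("Basic "):
+        return False
+    try:
+        raw = base64.b64decode(authorization_header[len("Basic "):], validate=True)
+        decoded = raw.decode("utf-8")
+    except ValueError:  # covers malformed base64 and invalid UTF-8
+        return False
+    user, sep, password = decoded.partition(":")
+    if not sep:
+        return False
+    # Evaluate both comparisons so timing does not reveal which field failed.
+    user_ok = secrets.compare_digest(user.encode(), expected_user.encode())
+    pass_ok = secrets.compare_digest(password.encode(), expected_password.encode())
+    return user_ok and pass_ok
+```
+
+Prometheus scrape configs support Basic Auth natively, so this keeps the scraper setup simple while closing the anonymous path.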
+
+---
+
+### LOW
+
+#### L-1. CSP allows `'unsafe-inline'` for scripts
+
+- CWE: CWE-1021 (Improper Restriction of Rendered UI Layers or Frames) — adjacent.
+- File: [`packages/server/src/notify_bridge_server/main.py:186-201`](../../packages/server/src/notify_bridge_server/main.py).
+- Notes: Comment explicitly justifies it — SvelteKit static adapter emits an inline bootstrap. Acceptable, but `'strict-dynamic'` with a per-page nonce (or moving the bootstrap into a hashed external module) eliminates the gap entirely. Track as INFO unless future XSS-injection paths emerge.
+
+#### L-2. CSP `style-src 'unsafe-inline'` allows inline-style XSS payloads
+
+- CWE: CWE-79 (Cross-site Scripting) — defence-in-depth.
+- Same file as L-1. Inline styles are not directly executable, but they are a known vector for click-jacking and data-exfil via CSS selectors. Same remediation path: nonce-based CSP.
+
+#### L-3. `frame-ancestors 'none'` and `X-Frame-Options: DENY` are both set (no finding)
+
+- INFO only. Both `X-Frame-Options: DENY` and `frame-ancestors 'none'` are set; modern browsers honour CSP, legacy ones honour XFO. Good.
+
+#### L-4. Webhook `_filter_headers` allowlist accepts unknown `X-*` headers
+
+- CWE: CWE-532
+- File: [`packages/server/src/notify_bridge_server/api/webhooks.py:361-374`](../../packages/server/src/notify_bridge_server/api/webhooks.py).
+- Notes: The filter strips known sensitive headers, then accepts any `X-*`. A custom credential header whose name contains none of `auth`/`token`/`key`/`secret`/etc. (say, a hypothetical `X-Upstream-Ticket: <value>`) slips past the substring check. Low risk because the well-known providers we support don't ship such headers, but a misconfigured generic webhook will leave a credential in the log row.
+- Remediation: invert the policy and use an explicit allowlist of known-safe `X-*` headers. Even `X-Forwarded-For` is borderline, since it can carry PII.
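+
+An inverted, allowlist-only filter could look like this sketch. `SAFE_LOGGED_HEADERS` and `filter_headers_for_log` are hypothetical names, and the set shown is illustrative; the real one should be built from the headers the supported providers actually send.
+
+```python
+# Hypothetical allowlist of header names that are safe to persist.
+SAFE_LOGGED_HEADERS = frozenset({
+    "content-type",
+    "user-agent",
+    "x-gitea-event",
+    "x-gitea-delivery",
+})
+
+
+def filter_headers_for_log(headers):
+    """Keep only allowlisted headers; everything else is dropped, not masked."""
+    return {
+        name: value
+        for name, value in headers.items()
+        if name.lower() in SAFE_LOGGED_HEADERS
+    }
+```
+
+Dropping unknown headers outright, instead of masking their values, also keeps header names that might themselves be sensitive out of the log row.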
+
+#### L-5. `external_url` setting is not validated against an allow-list
+
+- CWE: CWE-918 (SSRF), CWE-79 (XSS in the rendered Telegram webhook URL).
+- File: [`packages/server/src/notify_bridge_server/api/app_settings.py:329-339`](../../packages/server/src/notify_bridge_server/api/app_settings.py) reads, [`packages/server/src/notify_bridge_server/api/telegram_bots.py:247`](../../packages/server/src/notify_bridge_server/api/telegram_bots.py) writes it into the registered Telegram webhook URL.
+- Notes: An admin can set `external_url` to anything. The value is used to build the URL passed to Telegram in `setWebhook`. Telegram itself enforces an HTTPS-only allow-list, so the actual risk is bounded. Still, validate the scheme and host, and reject values that include credentials or fragments.
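+
+A minimal validation sketch, using only the standard library. `validate_external_url` is a hypothetical helper, and requiring `https` is an assumption that matches the Telegram use but may be too strict for local development setups.
+
+```python
+from urllib.parse import urlsplit
+
+
+def validate_external_url(value: str) -> str:
+    """Return the trimmed URL, or raise ValueError if it is unsuitable."""
+    url = value.strip()
+    parts = urlsplit(url)
+    if parts.scheme != "https":
+        raise ValueError("external_url must use https")
+    if not parts.hostname:
+        raise ValueError("external_url must include a host")
+    if parts.username is not None or parts.password is not None:
+        raise ValueError("external_url must not embed credentials")
+    if parts.fragment:
+        raise ValueError("external_url must not include a fragment")
+    return url.rstrip("/")
+```
+
+Running this at write time in the settings endpoint means a bad value is rejected with a clear message instead of surfacing later as a failed `setWebhook` call.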
+
+#### L-6. Bot token GET endpoint is intentional but worth auditing
+
+- File: [`packages/server/src/notify_bridge_server/api/telegram_bots.py:148-156`](../../packages/server/src/notify_bridge_server/api/telegram_bots.py).
+- Notes: `GET /api/telegram-bots/{bot_id}/token` returns the full Telegram bot token to the owner. Used by the frontend to construct webhook URLs. Limiting to a single short-lived nonce per `register_bot_webhook` flow would be safer than exposing the token directly. Currently INFO; revisit if a multi-user role model lands.
+
+#### L-7. SQLite journal mode + backup snapshot file permissions
+
+- File: [`packages/server/src/notify_bridge_server/database/snapshot.py:60-95`](../../packages/server/src/notify_bridge_server/database/snapshot.py).
+- Notes: Snapshots are written via `VACUUM INTO 'path'`. They land in `data_dir/backups/` with default umask permissions. In the Docker image the dir is owned by `appuser` and only that user runs the process, so this is fine. On a host bind-mount, an operator who forgets to lock down `/data` exposes every credential in every snapshot to anyone with shell access. Document this in `OPERATIONS.md`.
+
+#### L-8. No CSRF token on state-changing endpoints
+
+- CWE: CWE-352
+- Notes: The API uses `Authorization: Bearer <jwt>` exclusively (no cookies). Browsers don't auto-attach `Authorization` headers cross-origin, so this is **not** classical CSRF-exploitable. Combined with strict CORS (`allow_credentials=True`, explicit origin allowlist, wildcard rejected on startup) and the `Origin`/`Referer` same-host check on the backup endpoints, the practical risk is essentially zero. INFO only.
+
+---
+
+### INFO / NEEDS VERIFICATION
+
+#### N-1. Jinja2 `SandboxedEnvironment` is the standard sandbox — confirm it covers your threat model
+
+- The sandbox blocks `__class__`, `__mro__`, etc., but it is well-known that Jinja2's sandbox is not a security boundary against a determined attacker who can author templates. The threat model here is "templates are admin-authored, so we trust them but use the sandbox as defence-in-depth"; that is reasonable. Document explicitly in `OPERATIONS.md` that anyone with template-edit permission has effective RCE on the worker thread (`{{ foo.__init__.__globals__... }}` style escapes have been published in the past; new ones surface periodically).
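+- Verification: run `bandit -r packages/` and `safety check` against the project's `jinja2>=3.1` pin. Relevant advisories: `CVE-2024-34064` (`xmlattr` filter attribute injection, fixed in 3.1.4) and the sandbox breakouts `CVE-2024-56326` (fixed in 3.1.5) and `CVE-2025-27516` (fixed in 3.1.6). Raise the floor to `jinja2>=3.1.6` and keep tracking later disclosures; this review is not aware of an unpatched sandbox escape in that release.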
+
+#### N-2. `apscheduler<4`
+
+- Notes: The pin `apscheduler>=3.10,<4` keeps the bridge on the 3.x line, which is in maintenance. No known CVEs as of this review. Track when 4.x stabilises and migrate.
+
+#### N-3. `python-multipart>=0.0.9`
+
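+- Notes: This package has had DoS advisories: `CVE-2024-24762` (Content-Type header ReDoS, fixed in 0.0.7) and `CVE-2024-53981` (multipart boundary handling, fixed in 0.0.18). The 0.0.9 floor covers the first but not the second; raise the minimum to `>=0.0.18`.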
+
+#### N-4. No signed-image / SBOM on the container
+
+- Notes: The `release.yml` workflow builds and pushes a multi-tag image but does not sign with cosign or emit an SBOM. For an internet-facing deployment, consider adding `cosign sign` against the image digest, and `syft packages` to emit an SBOM at release time. INFO only.
+
+#### N-5. Frontend dependencies are pinned via caret (`^`) ranges
+
+- Notes: `package.json` uses `^x.y.z`. CI runs `npm ci` against `package-lock.json`, so reproducibility is fine at build time. There is no `npm audit` step in `.gitea/workflows/build.yml`. Add `npm audit --audit-level=high` to the frontend build job.
+
+#### N-6. `NOTIFY_BRIDGE_ALLOW_PRIVATE_URLS=1` is a footgun
+
+- File: [`packages/core/src/notify_bridge_core/notifications/ssrf.py:39-52`](../../packages/core/src/notify_bridge_core/notifications/ssrf.py).
+- Notes: When set, the SSRF guard becomes a no-op. The warning at boot is the only mitigation. Acceptable for the documented homelab use-case; document that the env flag must NEVER be set on an internet-reachable instance, and consider refusing to enable it when `cors_allowed_origins` resolves to a non-loopback host (defence-in-depth interlock).
+
+#### N-7. Verify the auth flow at the WebSocket boundary
+
+- File: [`packages/core/src/notify_bridge_core/providers/home_assistant/client.py:54-83`](../../packages/core/src/notify_bridge_core/providers/home_assistant/client.py).
+- The `_ws_url_from_base` correctly strips userinfo before connecting and `_redact` defangs error messages — verify that `wss://` URLs go through SSRF validation (currently the HA URL is validated by `AnyHttpUrl` at config time but I did not find a call to `avalidate_outbound_url_full` on the HA WS connect path; the resolver would not pin a host the validator never saw).
+- Action: confirm by reading `ha_subscription.py` for explicit validation, or add a check that calls `avalidate_outbound_url_full` against the derived `ws_url` (treating `ws`/`wss` like `http`/`https` for the block-range check) before `ws_connect`.
+
+---
+
+## Prioritised Fix List (Top 10)
+
+1. **HIGH H-1** — Add `access_token` to the secret-mask list in `providers._provider_response` and the placeholder-drop list in `providers.update_provider`. Add a regression test that GETs an HA provider and asserts the response does not contain the cleartext token.
+2. **HIGH H-2** — Implement column-level encryption for `TelegramBot.token`, `MatrixBot` access tokens, `EmailBot.smtp_password`, and the sensitive keys inside `ServiceProvider.config`. Use Fernet with a key derived from `NOTIFY_BRIDGE_SECRET_KEY`. Write a one-shot migration.
+3. **MEDIUM M-1** — Replace the ad-hoc `SandboxedEnvironment(...).render()` calls in the four preview/test paths with the single hardened `render_template()` helper that already has timeout + size caps.
+4. **MEDIUM M-2** — Add per-username login lockout (TTL cache or DB-backed) on top of the per-IP `5/minute`. Log failed login attempts.
+5. **MEDIUM M-5** — Replace the malformed dummy bcrypt literal in `login()` with a real bcrypt hash computed once at module load so the timing-equalisation actually runs the KDF.
+6. **MEDIUM M-3** — Strengthen `_redact_sensitive_body` with a value-entropy heuristic and well-known token-prefix matching.
+7. **MEDIUM M-4** — Add replay protection on Gitea webhooks via the `X-Gitea-Delivery` header (small table + partial unique index).
+8. **MEDIUM M-7** — Make the metrics endpoint require either a flag or a Basic Auth credential; document in `OPERATIONS.md` that the API port should not be internet-exposed when metrics are on.
+9. **MEDIUM M-6** — Add a defence-in-depth `setup_completed` boolean in `app_setting` and check it in `/api/auth/setup` in addition to the count.
+10. **INFO N-5** — Add `npm audit --audit-level=high` to the frontend build job in `.gitea/workflows/build.yml` so dependency CVEs land in CI.
+
+---
+
+## What was confirmed safe (worth keeping)
+
+- JWT design: HS256 with `iss`/`aud`/`exp`/`type`/`sub`/`ver`; refresh/access split; `token_version` revocation on role change, username change, and password change.
+- bcrypt with 72-byte length guard; CPU-bound work run in a thread.
+- SSRF guard with: scheme allowlist, IPv6-mapped-v4 unwrap, CGNAT block, IDN normalisation, async resolver, `PinnedResolver` to defeat DNS rebinding.
+- SQL access goes through SQLModel/SQLAlchemy with bind parameters; the only `f"..."` SQL is in DDL (column adds, index creates, `VACUUM INTO`) using server-controlled identifiers — sampled and clean.
+- Sandbox is `SandboxedEnvironment` everywhere a user-controllable template is rendered (six locations checked).
+- Frontend `{@html}` is wrapped in `sanitizePreview()` everywhere (`tracking-configs`, `template-configs`, `command-template-configs`).
+- Provider config secrets are masked on GET (except H-1).
+- `_resolve_backup_file` rejects `..`, NUL, separators, and enforces `relative_to(base)`.
+- CORS rejects wildcard with credentials at startup; secret_key default values are rejected with a clear error.
+- Docker: non-root user, `read_only: true`, `tmpfs: /tmp`, `no-new-privileges`, `cap_drop: ALL`, resource limits, healthcheck on `/api/ready`.
+- Logging: `SecretMaskingFilter` masks Telegram bot tokens, `Authorization`, `x-api-key`, `password`, `secret`, `access_token`, `refresh_token` from formatted messages, exception text, and stack traces.
+- Telegram webhook: secret token mandatory, refused on missing config, opaque `webhook_path_id` separate from bot token.
+- Inbound generic webhook: refuses `auth_mode="none"` unless an explicit acknowledgment field is set; auto-generates a strong secret if missing for `bearer_token`/`hmac_sha256`.
+- Inbound payload size capped at 1 MiB with a streaming check that doesn't trust `Content-Length`.
+
+---
+
+## Methodology
+
+- Manual code review of every authentication, authorization, webhook ingestion, template rendering, secret-handling, and outbound HTTP path under `packages/`.
+- Cross-checked CORS / CSP / security headers and rate-limiter configuration in `main.py` + `auth/routes.py`.
+- Sampled API routes for ownership enforcement (`get_owned_entity` / `_get_user_provider` / `_get_user_bot`) — all sampled routes apply it; no IDOR found.
+- Grepped for `Environment(` / `jinja2.Environment` / `f"..."` SQL / `{@html}` / `subprocess` / `eval` / `os.system` / known-bad patterns.
+- Reviewed CI workflows for secret leakage in env blocks and image-signing posture.
+- Reviewed Dockerfile + docker-compose for least-privilege and read-only root.
+- No dynamic testing performed; static review only. Run `pytest` (already gated in CI) + `bandit -r packages/` + `npm audit` in CI to backstop this review.
@@ -0,0 +1,408 @@
+# UI / UX Design Review — Notify Bridge frontend
+
+**Reviewed**: 2026-05-22
+**Scope**: SvelteKit frontend at `frontend/`, "Aurora / Glass" aesthetic, en + ru locales.
+**Reviewer method**: Read `app.css`, `+layout.svelte`, dashboard, login, setup, providers, targets, users, settings (parent), settings/IdentityCassette, notification-trackers, template-configs, actions, bots, plus shared components (Card, Button, Modal, ConfirmModal, AuthLayout, PageHeader, EmptyState, Loading, Snackbar). Cross-cutting Grep passes for inputs, border-radius, ARIA, sort, hex colors.
+
+---
+
+## Executive summary
+
+- **Aurora design language is real and distinctive.** Newsreader display serif + Geist variable sans + Geist Mono, conic-gradient brand orb, animated radial-gradient aurora background (`body::before` 28s drift), gradient pill chips, glow-pulse dots, and the lavender/orchid/mint/citrus/coral/sky palette together give the product a clear visual identity. This is **not** generic admin-template AI slop — the dashboard hero, signal-stream rows, provider deck, and the `PageHeader` "subpage-hero" pattern all carry intentional character that the user will remember.
+- **Consistency is the weakest axis.** Five overlapping card container abstractions (`.hero-card`, `.panel`, `.glass`, `Card.svelte`, settings `.cassette`/`.identity`) re-implement the same frosted-glass recipe with diverging radius (22 / 18 / 14 / 12 px) and padding (1.25/1.4 vs 1.3/1.4 vs 2/2.4 rem). A `--radius: 1rem` token is declared but unused. Pick one card module + one radius scale (e.g. `--radius-card: 22px`, `--radius-input: 12px`, `--radius-pill: 999px`).
+- **Forms have not been migrated to Aurora.** ~71 occurrences across 17 files still use the legacy raw class string `border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]` instead of the global `input { ... }` rule already in `app.css` (which uses `--color-input-bg`, `--color-rule-strong`, 0.625rem radius, glow focus ring). Result: `rounded-md` (6px) fields sit next to 22px-radius cards, with solid opaque backgrounds inside frosted-glass cards. Removing the override class would auto-restyle every form to match. **HIGH** priority, mostly mechanical.
+- **Hardcoded hex colors leak through.** Snackbar uses `#059669` / `#ef4444` / `#3b82f6` / `#f59e0b` instead of `--color-mint/coral/sky/citrus`. ConfirmModal uses a raw `rgba(239, 68, 68, 0.3)` glow. Actions page uses `#059669` for the enabled dot. All bypass theming — they will look wrong in light theme.
+- **Snackbar is invisible to screen readers.** No `role="status"` / `aria-live="polite"` / `aria-live="assertive"` on the toast container. Critical confirmations (saved, deleted, error) are never announced. **HIGH** accessibility fix, one-line.
+- **No `aria-current="page"` anywhere in the nav.** Active state is conveyed only visually (border-radius bar + glow), so assistive technology cannot tell which page is current.
+- **No sortable columns, no multi-select bulk actions, anywhere in the app.** Lists rely entirely on `IconGridSelect` sort widgets (newest / oldest, etc.) and per-row icon buttons. For a notification routing system that may accumulate dozens of trackers / targets / configs, this scales poorly.
+- **Localization parity is solid string-for-string** (en.json = ru.json = 1577 lines). The Russian strings are all present, but several places (hero title, brand row with provider name, stat-card label/value flex) have no length guard for the longer Russian translations, so visible truncation or wrapping is likely.
+- **Onboarding is a single screen.** After `/setup`, you land on `/` with `0 providers` and a hero saying "all clear": the most important first-run moment shows nothing to do. No checklist, no empty-dashboard CTA panel, no tour.
+- **Power-user standouts**: the ⌘K SearchPalette (wired through the topbar), the global provider filter, and reduced-motion media-query support. These three deserve credit and should be more discoverable (there is no in-app hint that they exist).
+
+---
+
+## Findings by area
+
+### 1. Design quality vs generic AI aesthetic
+
+#### F-DESIGN-01 — Aurora identity is strong and self-consistent at the macro level [LOW / commendation]
+
+- **Files**: [`frontend/src/app.css`](frontend/src/app.css), [`frontend/src/routes/+layout.svelte`](frontend/src/routes/+layout.svelte), [`frontend/src/routes/+page.svelte`](frontend/src/routes/+page.svelte)
+- **State**: Newsreader display serif italic with linear-gradient text-clip is used in hero titles, panel titles, modal titles. Conic brand orb is unique. Aurora drift on body::before is a 28s slow loop that's never busy. The "signal" / "wires" / "on watch" / "pulse" / "stream" / "compose" semantic naming on the dashboard is editorial, not generic admin copy.
+- **Verdict**: Keep all of this. Lean *further* into it on the subpages — most list pages currently default back to plain "PageHeader + Card list" without inheriting the dashboard's editorial flavor.
+
+#### F-DESIGN-02 — Italic-serif emphasis loses impact on smaller subpage titles [LOW]
+
+- **Files**: [`frontend/src/lib/components/PageHeader.svelte`](frontend/src/lib/components/PageHeader.svelte) (lines 132–147)
+- **State**: `subpage-hero__title` is 2.15rem with italic emphasis on a gradient. At that size the gradient italic word is legible but loses the editorial drama it has at the 3rem dashboard hero. Russian translations (`em` words like *«операторы»*) sometimes look cramped because letter-spacing -0.025em is shared with the much larger dashboard hero.
+- **Suggestion**: Use a separate letter-spacing scale per font size step, or drop italic emphasis on titles below ~2rem and use color-only emphasis there.
+
+---
+
+### 2. Visual consistency
+
+#### F-CONSIST-01 — Five overlapping card abstractions [HIGH]
+
+- **Files**: [`frontend/src/app.css`](frontend/src/app.css) `.glass`, [`frontend/src/lib/components/Card.svelte`](frontend/src/lib/components/Card.svelte), [`frontend/src/lib/components/PageHeader.svelte`](frontend/src/lib/components/PageHeader.svelte) `.subpage-hero`, [`frontend/src/routes/+page.svelte`](frontend/src/routes/+page.svelte) `.hero-card` / `.panel` / `.stat-card`, [`frontend/src/routes/settings/IdentityCassette.svelte`](frontend/src/routes/settings/IdentityCassette.svelte) `.identity` + `.glass`
+- **State**: All of the places listed above re-declare the same recipe: `background: var(--color-glass); backdrop-filter: blur(28px) saturate(160%); border: 1px solid var(--color-border); border-radius: 22px; box-shadow: var(--shadow-card);` followed by an `::after` highlight overlay. Card.svelte even has its own 22px radius next to the global `.glass` 22px radius, so the two would diverge silently if either gets touched.
+- **Suggestion**: Consolidate into one `<GlassPanel>` component (or `.glass-card` utility) with variants `default | hero | panel | cassette` for padding/radius differences. Delete the duplicated `::after` overlays. The pattern is good — it's just *copy-pasted* 5+ times.
+
+#### F-CONSIST-02 — Border-radius drift, no scale [HIGH]
+
+- **Files**: [`frontend/src/routes/+layout.svelte`](frontend/src/routes/+layout.svelte), [`frontend/src/routes/+page.svelte`](frontend/src/routes/+page.svelte), [`frontend/src/app.css`](frontend/src/app.css)
+- **State**: Radii used: 22, 18, 14, 12, 11, 10, 9, 8, 7, 6, 3, 2 px + 0.3, 0.5, 0.625, 0.85, 1 rem + 9999px. `--radius: 1rem` is declared in the theme, but it is only ever re-declared; no component reads it.
+- **Suggestion**: Define and *use* `--radius-card: 22px; --radius-panel: 18px; --radius-pill: 999px; --radius-input: 12px; --radius-chip: 8px; --radius-tile: 6px;`. Refactor in passes — start with `Card.svelte`, `Button.svelte`, `Modal.svelte`, `ConfirmModal.svelte`.
+
+#### F-CONSIST-03 — Hardcoded hex colors bypass theming [HIGH]
+
+- **Files**:
+  - [`frontend/src/lib/components/Snackbar.svelte`](frontend/src/lib/components/Snackbar.svelte) lines 26–31: `#059669 / #ef4444 / #3b82f6 / #f59e0b`
+  - [`frontend/src/lib/components/ConfirmModal.svelte`](frontend/src/lib/components/ConfirmModal.svelte) line 70: `box-shadow: 0 0 16px rgba(239, 68, 68, 0.3)`
+  - [`frontend/src/routes/actions/+page.svelte`](frontend/src/routes/actions/+page.svelte) line 379: `style="background: {action.enabled ? '#059669' : 'var(--color-muted-foreground)'}"`
+  - 25 files in `frontend/src/routes/**` contain `#xxx` literals
+- **State**: These colors are NOT the Aurora palette — `#059669` is emerald-600, our mint is `#7ee8c4`. In light theme the user sees green-on-green that wasn't intended.
+- **Suggestion**: Replace all status hexes with `--color-mint/coral/sky/citrus/orchid`. Add a stylelint rule `color-no-hex` scoped to `src/**/*.svelte` to prevent regression.
+
+#### F-CONSIST-04 — Form input styling not migrated to Aurora [HIGH]
+
+- **Files**: 17 routes, ~71 occurrences. Examples: [`frontend/src/routes/users/+page.svelte`](frontend/src/routes/users/+page.svelte) lines 137, 141, 190, 207; [`frontend/src/routes/providers/+page.svelte`](frontend/src/routes/providers/+page.svelte) lines 303, 309, 323, 333; [`frontend/src/routes/notification-trackers/TrackerForm.svelte`](frontend/src/routes/notification-trackers/TrackerForm.svelte); [`frontend/src/routes/targets/TargetForm.svelte`](frontend/src/routes/targets/TargetForm.svelte).
+- **State**: `class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]"` is repeated 71+ times. This overrides the global `input { ... }` rule that *already* uses Aurora glass styling.
+- **Suggestion**: Delete the class string so the global rule applies and the forms pick up the Aurora styling. Spot-check one page first (e.g. `users/+page.svelte`): confirm it visually and verify that Tailwind's preflight isn't interfering. Then mass-delete the remaining occurrences via Grep/Edit.
+
+#### F-CONSIST-05 — ConfirmModal duplicates Button.svelte logic [MEDIUM]
+
+- **Files**: [`frontend/src/lib/components/ConfirmModal.svelte`](frontend/src/lib/components/ConfirmModal.svelte)
+- **State**: Its `.confirm-btn-cancel` and `.confirm-btn-delete` re-implement what `Button variant="secondary"` and `Button variant="danger"` already provide. The danger button even uses raw `rgba(239,68,68,...)` instead of `--color-error-fg`.
+- **Suggestion**: `<Button variant="secondary" onclick={oncancel}>{cancel}</Button>` and `<Button variant="danger" onclick={onconfirm}>{confirm}</Button>`. Removes ~35 lines of CSS.
+
+#### F-CONSIST-06 — AuthLayout uses a different glass recipe [MEDIUM]
+
+- **Files**: [`frontend/src/lib/components/AuthLayout.svelte`](frontend/src/lib/components/AuthLayout.svelte) (line 68 `.auth-card`)
+- **State**: `border-radius: 1rem`, `padding: 2rem`, `backdrop-filter: blur(8px)` (vs the 28px elsewhere), plus its own auth-bg gradient mesh + 32px-grid background that nothing else in the app uses. Has its own `.auth-input` / `.auth-submit` / `.auth-label` / `.auth-error` design language. As a result, login/setup ends up looking *more* like generic SaaS than the dashboard does. The brand orb from the sidebar isn't on the login screen; a small lavender mdi-lan icon in a square sits there instead.
+- **Suggestion**: Reuse the conic brand orb. Use the same glass recipe (28px blur, 22px radius) for `.auth-card`. Either drop the dot-grid `.auth-grid` (it reads as a generic "futuristic SaaS" template) or use it as a deliberate flair on the dashboard hero too.
+
+---
+
+### 3. Information hierarchy
+
+#### F-HIER-01 — Stat cards do triple duty (KPI + nav link + filter context) without ranking [MEDIUM]
+
+- **Files**: [`frontend/src/routes/+page.svelte`](frontend/src/routes/+page.svelte) lines 571–645
+- **State**: All four stat cards have the same visual weight, same accent intensity (`STAT_ACCENTS[idx]`), and rotate accents by index. When the global provider filter is active the first stat card morphs into a "literal value" card showing provider name (1rem font, very different visual). The accent rotation creates a rainbow row that doesn't carry meaning — events `total` has no semantic reason to be orchid vs. providers being lavender.
+- **Suggestion**: Tie accent color to entity type (providers=primary, trackers=mint, targets=sky, throughput=citrus) so the same accent recurs throughout the app for the same concept. Keep the morph behavior but design a distinct "filtered context" stat-card variant — a smaller, narrower chip — so it doesn't compete visually.
+
+#### F-HIER-02 — Hero title and meter compete for attention at desktop width [LOW]
+
+- **Files**: [`frontend/src/routes/+page.svelte`](frontend/src/routes/+page.svelte) lines 1047–1068, 1078–1086
+- **State**: Both the `.hero-title` and `.hero-meter-value` are 3rem 500-weight in two different fonts. Side-by-side they create two focal points.
+- **Suggestion**: Shrink `.hero-meter-value` to 2.4rem and use it as a *secondary* read; let the editorial title be the single dominant element.
+
+#### F-HIER-03 — Pulse chart panel rarely meaningful on first launch [LOW]
+
+- **Files**: [`frontend/src/routes/+page.svelte`](frontend/src/routes/+page.svelte) lines 909–927
+- **State**: On a fresh install the chart is an empty 0-events grid taking 250-400px vertical space. No empty-state copy inside `EventChart`.
+- **Suggestion**: When `chartDays` has all-zero values, replace with a small "No events recorded in the last 30 days — once a tracker fires, the pulse will appear here" inline empty state.
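+
+The all-zero check could be sketched roughly as follows. This assumes `chartDays` is an array of per-day buckets with a numeric `count` field; the real shape may differ, and `ChartDay` and `isChartEmpty` are hypothetical names.
+
+```typescript
+// Assumed shape of one chart bucket; adjust to the real `chartDays` type.
+interface ChartDay {
+  date: string;
+  count: number;
+}
+
+// True when there is nothing to plot: no buckets at all, or every bucket is zero.
+function isChartEmpty(chartDays: readonly ChartDay[]): boolean {
+  return chartDays.every((day) => day.count === 0);
+}
+```
+
+`EventChart` could then render the inline empty state when `isChartEmpty(chartDays)` is true and the chart otherwise.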
+
+---
+
+### 4. Navigation & wayfinding
+
+#### F-NAV-01 — No `aria-current="page"` on active nav links [HIGH a11y]
+
+- **Files**: [`frontend/src/routes/+layout.svelte`](frontend/src/routes/+layout.svelte) lines 498–533, 591–597, 632–658
+- **State**: Active state is conveyed via `.active` class + a gradient left-bar div. Screen readers cannot announce it. Grep for `aria-current` across the whole frontend: zero matches.
+- **Suggestion**: Add `aria-current={isActive(child.href) ? 'page' : undefined}` to every nav `<a>`.
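+
+A small sketch of the attribute logic, under stated assumptions: the real `isActive` in `+layout.svelte` takes only the href and may also consider query parameters, so the two-argument matcher below is a hypothetical stand-in, and `ariaCurrent` is an illustrative helper name.
+
+```typescript
+// Hypothetical matcher: a link is active when the pathname equals its href
+// or sits below it ("/targets" matches "/targets/42" but not "/targets-old").
+function isActive(href: string, pathname: string): boolean {
+  if (href === "/") return pathname === "/";
+  return pathname === href || pathname.startsWith(href + "/");
+}
+
+// Value for the aria-current attribute. Returning undefined lets the
+// template omit the attribute instead of emitting aria-current="false".
+function ariaCurrent(href: string, pathname: string): "page" | undefined {
+  return isActive(href, pathname) ? "page" : undefined;
+}
+```
+
+Keeping the decision in one helper means the desktop sidebar, the mobile bottom nav, and the "More" panel cannot drift apart on what counts as the current page.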
+
+#### F-NAV-02 — No breadcrumb on subpages [MEDIUM]
+
+- **Files**: [`frontend/src/lib/components/PageHeader.svelte`](frontend/src/lib/components/PageHeader.svelte)
+- **State**: The `crumb` prop only renders a single mono-uppercase tag (e.g. "ROUTING · AUTOMATION") — it's decorative, not navigational. There's no actual breadcrumb chain. For `/template-configs`, `/command-template-configs`, `/tracking-configs`, `/command-configs`, etc., a user landing via deep link has no parent-link to return to.
+- **Suggestion**: Make the crumb a real breadcrumb (≤3 levels: `Notifications → Templates` or `Commands → Configs`). Render the prior level as a clickable `<a>`.
+
+#### F-NAV-03 — Deep linking via `?type=<targetType>` and `?tab=<botType>` doesn't update page title [LOW]
+
+- **State**: `/targets?type=email` and `/bots?tab=matrix` change the active sidebar item but the `<PageHeader>` title for those pages is generic ("Targets" / "Bots").
+- **Suggestion**: When `activeType` is set, derive the title from it: "Email targets" / "Matrix bots". Improves browser tab titles and the in-page title.
+
+#### F-NAV-04 — Collapsed sidebar tooltip wraps for long Russian translations [LOW]
+
+- **State**: Tooltips for collapsed sidebar nav items use the browser-native `title=` attribute, which gives no glass-style chip. They will use the OS tooltip styling, which clashes with the Aurora aesthetic and clips long ru labels.
+- **Suggestion**: Build a small custom tooltip component (or use existing portal helper) for collapsed-sidebar nav. Keep `title` as fallback for `prefers-reduced-motion` users.
+
+---
+
+### 5. Form UX
+
+#### F-FORM-01 — No inline field-level validation, only post-submit error banners [MEDIUM]
+
+- **Files**: [`frontend/src/routes/providers/+page.svelte`](frontend/src/routes/providers/+page.svelte), [`frontend/src/routes/users/+page.svelte`](frontend/src/routes/users/+page.svelte), [`frontend/src/routes/targets/TargetForm.svelte`](frontend/src/routes/targets/TargetForm.svelte)
+- **State**: Forms rely on HTML5 `required` / `minlength` browser validation plus a single `ErrorBanner` shown after submit failure. Native browser validation tooltips are pale and don't match Aurora.
+- **Suggestion**: Add a per-field `<FieldError>` slot below labels for inline validation (URL syntax, email format, port range). The settings page already has a nice pattern (`url-field-valid` class on `IdentityCassette`) — generalize it.
+
+#### F-FORM-02 — Save feedback inconsistent across pages [MEDIUM]
+
+- **Files**: [`frontend/src/routes/settings/+page.svelte`](frontend/src/routes/settings/+page.svelte) lines 77–84, 208–214
+- **State**: Settings uses a sticky `SaveBar` with dirty tracking. Most other forms have inline Save buttons inside the card. Some show a snackbar success message ("snack.userCreated"), some don't.
+- **Suggestion**: Standardize: (a) inline "Save" inside the card *plus* (b) snackbar success message *plus* (c) optional sticky SaveBar for multi-field admin forms. Document the pattern in `.claude/docs/frontend-architecture.md`.
+
+#### F-FORM-03 — Forms auto-name from the descriptor but offer no way to return to auto-naming after a manual edit [LOW]
+
+- **Files**: [`frontend/src/routes/providers/+page.svelte`](frontend/src/routes/providers/+page.svelte) lines 136–141 + 303; [`frontend/src/routes/actions/+page.svelte`](frontend/src/routes/actions/+page.svelte) lines 50–56
+- **State**: Once user types in the Name field, `nameManuallyEdited` becomes true and the auto-fill stops permanently — no way to ask "go back to default name".
+- **Suggestion**: Add a tiny "↺ reset" link next to the name input when `nameManuallyEdited && form.name !== descriptor.defaultName`.
+
+#### F-FORM-04 — No optimistic UI; rows disappear / appear only after server roundtrip [LOW]
+
+- **State**: After delete/create, pages refetch via `cache.fetch(true)`. Visible 200-400ms blank state.
+- **Suggestion**: Optimistic insert/remove in the cache stores, with snackbar undo for destructive ops.
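+
+One way the optimistic removal could look, as a sketch only. The cache stores' real API is not shown in this review, so the `getItems` / `setItems` accessors, the `Identified` shape, and the `optimisticRemove` name are all hypothetical.
+
+```typescript
+// Assumed minimal shape of a cached row.
+interface Identified {
+  id: number;
+}
+
+// Removes the row from the local list immediately, then calls the server.
+// If the server call rejects, the previous list is restored and the error rethrown
+// so the caller can show an error snackbar.
+async function optimisticRemove<T extends Identified>(
+  getItems: () => T[],
+  setItems: (items: T[]) => void,
+  id: number,
+  deleteOnServer: (id: number) => Promise<void>
+): Promise<void> {
+  const previous = getItems();
+  setItems(previous.filter((item) => item.id !== id));
+  try {
+    await deleteOnServer(id);
+  } catch (error) {
+    setItems(previous); // roll back so the row reappears
+    throw error;
+  }
+}
+```
+
+Restoring the whole previous list is the simplest rollback, but it would overwrite any other change made while the request was in flight; a real implementation might re-insert only the removed row.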
+
+#### F-FORM-05 — Login form omits `autofocus` on username [LOW]
+
+- **Files**: [`frontend/src/routes/login/+page.svelte`](frontend/src/routes/login/+page.svelte) line 99
+- **Suggestion**: Add `autofocus` to the username input. Saves one keystroke on every login.
+
+---
+
+### 6. Modals & overlays
+
+#### F-MODAL-01 — Modal.svelte is well-built [LOW / commendation]
+
+- **Files**: [`frontend/src/lib/components/Modal.svelte`](frontend/src/lib/components/Modal.svelte)
+- **State**: Portal mount, focus trap, focus restoration, Escape, Tab cycling, `aria-modal="true"`, `aria-labelledby`, body scroll containment via `overscroll-behavior: contain`, transition (250ms in/out), 80vh max-height. This is the strongest single component in the codebase.
+- **Verdict**: Reuse as the foundation for every overlay. Currently `BlockedByModal`, `EventDetailModal`, `SharedLinkModal`, `ConfirmModal` all do — good.
+
+#### F-MODAL-02 — Modal backdrop has `role="button"` [LOW]
+
+- **Files**: [`frontend/src/lib/components/Modal.svelte`](frontend/src/lib/components/Modal.svelte) line 96
+- **State**: The backdrop is a `<div>` with `role="button"`, `tabindex="-1"`, and an onclick to close. That's a common pattern to silence Svelte's a11y warnings, but a screen reader announces "Close, button" twice (once for backdrop, once for the explicit X button).
+- **Suggestion**: Drop `role="button"` and `aria-label` from the backdrop; the explicit Close button is enough. Or use `<button class="modal-backdrop">` instead of a div.
+
+#### F-MODAL-03 — Modal panel uses solid `#131520` instead of glass [LOW]
+
+- **Files**: [`frontend/src/lib/components/Modal.svelte`](frontend/src/lib/components/Modal.svelte) lines 150–151
+- **State**: `--modal-solid-bg: #131520;` is a deliberate choice (probably for readability) but it breaks visual consistency with the rest of the app. The Aurora drift behind it is invisible.
+- **Suggestion**: Use `var(--color-glass-elev)` over the blurred backdrop. Or, if the solid choice was deliberate, document why so the next developer doesn't "fix" it.
+
+#### F-MODAL-04 — Confirm-modal "delete" hover uses raw rgba [MEDIUM]
+
+- **Files**: [`frontend/src/lib/components/ConfirmModal.svelte`](frontend/src/lib/components/ConfirmModal.svelte) line 70
+- **State**: `box-shadow: 0 0 16px rgba(239, 68, 68, 0.3);` — not themed.
+- **Suggestion**: Use `box-shadow: 0 0 16px color-mix(in srgb, var(--color-coral) 40%, transparent);`.
+
+---
+
+### 7. Empty / loading / error states
+
+#### F-STATE-01 — `Loading.svelte` is a single shimmer pattern [MEDIUM]
+
+- **Files**: [`frontend/src/lib/components/Loading.svelte`](frontend/src/lib/components/Loading.svelte)
+- **State**: Three or four 4rem shimmer bars. Used as `<Loading />` on virtually every page including hero pages. Doesn't match the actual layout the user will see — looks like a row list even on settings.
+- **Suggestion**: Add layout-aware variants: `<Loading shape="hero" />`, `<Loading shape="grid" cols={4} />`, `<Loading shape="list" rows={5} />`. Reduces layout shift on first paint.
+
+#### F-STATE-02 — `EmptyState.svelte` is plain and undifferentiated [MEDIUM]
+
+- **Files**: [`frontend/src/lib/components/EmptyState.svelte`](frontend/src/lib/components/EmptyState.svelte)
+- **State**: 10-line component: dimmed icon + message. No CTA, no illustration, no flavor. The dashboard's inline `.empty-state` (lines 1300–1319 of `+page.svelte`) is richer (has a CTA link) but isn't reused.
+- **Suggestion**: Extend `EmptyState` to accept a `cta` slot and a `tone` (with subtle gradient blob behind the icon). On `/providers` empty: "No providers yet — connect Immich, Nextcloud, or Home Assistant to start tracking events" with an "+ Add provider" CTA.
+
+#### F-STATE-03 — Many list pages have no error-recovery action [MEDIUM]
+
+- **State**: Throughout the app, most pages have a `loadError` state that renders `<Card><ErrorBanner /></Card>` but no "Retry" button.
+- **Suggestion**: `ErrorBanner` should accept an `onRetry` prop and surface a retry button. Standardize across pages.
+
+#### F-STATE-04 — `EventChart` has no empty state [LOW]
+
+- See F-HIER-03.
+
+---
+
+### 8. Accessibility
+
+#### F-A11Y-01 — Snackbar has no aria-live [HIGH]
+
+- **Files**: [`frontend/src/lib/components/Snackbar.svelte`](frontend/src/lib/components/Snackbar.svelte) lines 35–63
+- **State**: Snack container is a plain `<div use:portal>`. Success / error toasts never reach screen readers. Three other files have proper aria-live; this critical one doesn't.
+- **Fix**: `<div use:portal class="snackbar-container" role="region" aria-live="polite" aria-label={t('snackbar.region')}>`. Use `aria-live="assertive"` for `snack.type === 'error'`.
+
+#### F-A11Y-02 — No `aria-current="page"` on nav links [HIGH]
+
+- See F-NAV-01.
+
+#### F-A11Y-03 — Custom focus outlines partially overridden [MEDIUM]
+
+- **Files**: [`frontend/src/app.css`](frontend/src/app.css) lines 237–241 (global `button:focus-visible` outline 2px primary + offset 2px), [`frontend/src/routes/+layout.svelte`](frontend/src/routes/+layout.svelte) line 894 (`.nav-link { border-radius: 12px !important }`), [`frontend/src/routes/+page.svelte`](frontend/src/routes/+page.svelte) lines 1351–1354 (`.signal-row--clickable:focus-visible { outline-offset: -2px }`).
+- **State**: The negative offset (`-2px`) makes the focus ring sit *inside* the row, where it is nearly invisible against the glass-strong hover background at certain accent positions.
+- **Suggestion**: Use `outline-offset: 2px` consistently, and add a `box-shadow: 0 0 0 2px var(--color-glass)` backing ring if extra contrast is needed.
+
+#### F-A11Y-04 — `prefers-reduced-motion` is honored — commendation [LOW]
+
+- **Files**: [`frontend/src/app.css`](frontend/src/app.css) lines 484–507, [`frontend/src/routes/+layout.svelte`](frontend/src/routes/+layout.svelte) lines 837–840
+- **State**: Aurora drift, brand-version pulse, stagger entrances, signal-row hover transitions, paginator transitions all gated. Smooth scroll override too. Solid implementation.
+
+#### F-A11Y-05 — Color contrast risk on glass surfaces [MEDIUM]
+
+- **State**: `--color-muted-foreground: #b6b2d4` on `--color-glass: rgba(255,255,255,0.04)` over the aurora gradient. In the brightest hot-spot of the aurora background (where the `#b8a7ff` lavender peaks), `#b6b2d4` may fail WCAG AA (4.5:1 for body text). Hasn't been measured.
+- **Suggestion**: Run a contrast pass with `--color-muted-foreground` against the brightest part of the aurora background. Likely need to bump it to ~`#cfcae8` for dark theme.
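+
+The contrast pass could start from a small helper like the one below, which implements the WCAG 2.x relative-luminance and contrast-ratio formulas for opaque `#rrggbb` colors. The function names are illustrative. It is a simplification: it ignores the 4% white glass overlay and whatever opacity the aurora layer really has, so the background value should be a color sampled from an actual screenshot rather than the raw `#b8a7ff` token.
+
+```typescript
+// Parses "#rrggbb" into [r, g, b] channel values in 0..255.
+function parseHex(hex: string): [number, number, number] {
+  if (!/^#[0-9a-fA-F]{6}$/.test(hex)) throw new Error(`Expected #rrggbb, got ${hex}`);
+  return [
+    parseInt(hex.slice(1, 3), 16),
+    parseInt(hex.slice(3, 5), 16),
+    parseInt(hex.slice(5, 7), 16),
+  ];
+}
+
+// Converts one 0..255 sRGB channel to linear light (WCAG 2.x definition).
+function linearize(channel: number): number {
+  const s = channel / 255;
+  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
+}
+
+// WCAG relative luminance: 0 for black, 1 for white.
+function relativeLuminance(hex: string): number {
+  const [r, g, b] = parseHex(hex);
+  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
+}
+
+// Contrast ratio between two opaque colors, from 1 (identical) to 21 (black on white).
+function contrastRatio(foreground: string, background: string): number {
+  const a = relativeLuminance(foreground);
+  const b = relativeLuminance(background);
+  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
+}
+
+// WCAG AA threshold for normal-size body text.
+function passesAA(foreground: string, background: string): boolean {
+  return contrastRatio(foreground, background) >= 4.5;
+}
+```
+
+Running `passesAA` for both `#b6b2d4` and the proposed `#cfcae8` against the sampled hot-spot color would show whether the bump is enough or whether the background itself needs dimming behind text.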
+
+#### F-A11Y-06 — Toggle switch has no label association [LOW]
+
+- **Files**: [`frontend/src/app.css`](frontend/src/app.css) lines 513–556
+- **State**: `.toggle-switch` wraps an `<input type="checkbox">` and a visual `.toggle-track` `<span>`. There's no visible label text or `aria-label` requirement in the global utility. Callers may forget to pass one.
+- **Suggestion**: Lift into a `<Toggle>` component requiring a `label` prop.
+
+---
+
+### 9. Responsive design
+
+#### F-RESP-01 — Sidebar collapse breakpoint is fine; mobile bottom nav covers gracefully [LOW / commendation]
+
+- **Files**: [`frontend/src/routes/+layout.svelte`](frontend/src/routes/+layout.svelte) lines 589–668, 1136–1168
+- **State**: Below 767px the desktop sidebar hides and mobile bottom-nav appears with primary 4 keys + search + more. Mobile "More" panel mirrors the full desktop tree. Solid.
+
+#### F-RESP-02 — Hero meter wraps awkwardly between 720–880px [LOW]
+
+- **Files**: [`frontend/src/routes/+page.svelte`](frontend/src/routes/+page.svelte) lines 1119–1130
+- **State**: Below 880px the hero collapses to one column, but the meter row pills wrap to a third row on Russian translations of "providers/targets/armed".
+- **Suggestion**: Add an intermediate breakpoint (`max-width: 1024px`) where pill labels switch from `"5 providers"` to a tooltip-only count.
+
+#### F-RESP-03 — Stat-card grid stays 2-wide across the whole sm–lg range [MEDIUM]
+
+- **Files**: [`frontend/src/routes/+page.svelte`](frontend/src/routes/+page.svelte) line 590 `grid-cols-1 sm:grid-cols-2 lg:grid-cols-4`
+- **State**: Between 640–1024px stat cards are 2-wide. At tablet sizes the cards become huge and dilute the dashboard density.
+- **Suggestion**: Cap stat-card max-width at ~300px or switch to `auto-fit, minmax(200px, 1fr)` so they don't grow uncontrollably.
+
+#### F-RESP-04 — List rows don't gracefully truncate webhook URLs on mobile [LOW]
+
+- **Files**: [`frontend/src/routes/providers/+page.svelte`](frontend/src/routes/providers/+page.svelte) lines 392–410
+- **State**: Secondary text line shows full webhook URL with `break-all` which on very narrow viewports gives a 4-line wrap.
+- **Suggestion**: Use the `shortenUrl()` helper (already defined for the meta-tile path) on the narrow-screen secondary line too.
+
+---
+
+### 10. Onboarding
+
+#### F-ONBOARD-01 — Setup → empty dashboard with no guidance [HIGH]
+
+- **Files**: [`frontend/src/routes/setup/+page.svelte`](frontend/src/routes/setup/+page.svelte), [`frontend/src/routes/+page.svelte`](frontend/src/routes/+page.svelte)
+- **State**: After `/setup` the user lands on `/` with 0 providers, hero says *"all clear"* (literally "Nothing to do"). Wasted first impression.
+- **Suggestion**: First-run detection (`providersCache.items.length === 0 && targetsCache.items.length === 0`) replaces the dashboard hero with a 4-step "Getting started" checklist: (1) Add a provider · (2) Connect a bot · (3) Create a target · (4) Wire your first tracker. Each step is a CTA card. Persist completion to localStorage so it disappears once finished.
+
+#### F-ONBOARD-02 — No in-app discovery of ⌘K palette [MEDIUM]
+
+- **Files**: [`frontend/src/routes/+layout.svelte`](frontend/src/routes/+layout.svelte) lines 678–682
+- **State**: Topbar shows `⌘K` / `Ctrl K` chip but only that. No "Press ⌘K to jump to any page" hint anywhere.
+- **Suggestion**: First-visit toast: "Tip: Press ⌘K from anywhere to search providers, trackers, and pages". Dismissible.
+
+#### F-ONBOARD-03 — Login screen has no help / forgot-password / docs link [LOW]
+
+- **Files**: [`frontend/src/routes/login/+page.svelte`](frontend/src/routes/login/+page.svelte)
+- **State**: Plain username + password. For self-hosted users who lost the admin password, there's no link to the recovery docs.
+- **Suggestion**: Small "Need help?" link to docs (the `/docs` route exists).
+
+---
+
+### 11. Microcopy
+
+#### F-COPY-01 — Dashboard hero copy is editorial — commendation [LOW]
+
+- "Live · throughput 24h · armed · providers" reads more like a control-room dashboard than CRUD admin. Keep doing this on the rest of the app.
+
+#### F-COPY-02 — Many subpages use literal entity-name copy [MEDIUM]
+
+- E.g. "Add provider" / "Add target" / "Add tracker" / "Add user". Editorial would be "Connect a provider" / "Define a target" / "Wire a tracker" / "Invite a user". Lean into verbs that match the dashboard's "wires / signals / on watch" vocabulary.
+
+#### F-COPY-03 — Russian translations match en line-count but no length QA visible [LOW]
+
+- Line counts match exactly (1577 lines each). That's just structural parity, not visual parity. Russian tends to be 20-30% longer for the same concept; the places flagged elsewhere in this review (hero title em, stat-card values, sidebar nav labels) likely have layout issues.
+- **Suggestion**: Set up a Playwright snapshot test that switches locale=ru and screenshots dashboard + a representative list page to catch overflow visually.
+
+---
+
+### 12. Localization parity
+
+#### F-LOCALE-01 — "Notify Bridge" wordmark stays in English [LOW / correct]
+
+- Brand. Don't translate.
+
+#### F-LOCALE-02 — Provider type label not localized in list rows [LOW]
+
+- **Files**: [`frontend/src/routes/providers/+page.svelte`](frontend/src/routes/providers/+page.svelte) line 391
+- **State**: Type pill shows raw `provider.type` value (e.g. "immich", "nextcloud") — not localized.
+- **Suggestion**: Use `getDescriptor(type).defaultName` or `t(\`providers.type${PascalName}\`)` which exists per project conventions.
+
+#### F-LOCALE-03 — Mixed Cyrillic glitches in source [LOW]
+
+- **Files**: [`frontend/src/routes/login/+page.svelte`](frontend/src/routes/login/+page.svelte) line 42 (`вЂ”` instead of em-dash in a comment), [`frontend/src/routes/users/+page.svelte`](frontend/src/routes/users/+page.svelte) line 166 (`В·` instead of `·`)
+- **State**: Encoding-corrupt characters in source comments and one user-facing dot. Pre-existing — files were probably edited with the wrong encoding at some point.
+- **Suggestion**: Grep `вЂ` / `В·` across the repo and fix. Add a pre-commit hook that fails on non-UTF8 chars in `.svelte` / `.ts` / `.json`.
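+
+A sketch of the scan such a hook could run. The two markers are what UTF-8 bytes for a dash or a middle dot look like after being decoded as Windows-1251 and saved again; since the result is then valid UTF-8 that merely spells the wrong characters, a pure byte-validity check may not catch it, so matching the known sequences is a useful complement. `MOJIBAKE_MARKERS`, `MojibakeHit`, and `findMojibake` are hypothetical names, and the marker list is illustrative rather than exhaustive.
+
+```typescript
+// Sequences left behind by a UTF-8 -> Windows-1251 round trip:
+// "вЂ" starts a mangled dash or quote, "В·" is a mangled middle dot.
+const MOJIBAKE_MARKERS: readonly string[] = ["вЂ", "В·"];
+
+interface MojibakeHit {
+  line: number; // 1-based line number
+  column: number; // 1-based column where the marker starts
+  marker: string;
+}
+
+// Returns every marker occurrence in the given file contents.
+function findMojibake(source: string): MojibakeHit[] {
+  const hits: MojibakeHit[] = [];
+  source.split("\n").forEach((text, index) => {
+    for (const marker of MOJIBAKE_MARKERS) {
+      let from = text.indexOf(marker);
+      while (from !== -1) {
+        hits.push({ line: index + 1, column: from + 1, marker });
+        from = text.indexOf(marker, from + marker.length);
+      }
+    }
+  });
+  return hits;
+}
+```
+
+A hook would read each staged `.svelte` / `.ts` / `.json` file, call `findMojibake`, and fail when the result is non-empty. Files that quote the glitches on purpose, such as this review, would need an allowlist.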
+
+---
+
+### 13. Power-user features
+
+#### F-POWER-01 — No sortable columns anywhere [MEDIUM]
+
+- Confirmed by Grep: no `aria-sort` / `sortable` / `onSort` in the codebase. Lists are currently sorted only via the `IconGridSelect` widget (newest / oldest / name).
+- **Suggestion**: For long lists (trackers, targets), add column-header sort affordance. Even minimal: clicking the "Name" or "Provider" header re-sorts. Use cache state so sort persists across nav.
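+
+A minimal sketch of the sort state behind such headers. The `Row` shape with `name` and `provider` columns is an assumption for illustration, as are the helper names; `"ascending"`, `"descending"`, and `"none"` are standard `aria-sort` values.
+
+```typescript
+// Assumed row shape; real list rows carry more fields.
+interface Row {
+  name: string;
+  provider: string;
+}
+
+type SortKey = keyof Row;
+type SortDirection = "ascending" | "descending";
+
+interface SortState {
+  key: SortKey;
+  direction: SortDirection;
+}
+
+// Clicking the active column flips direction; clicking another column starts ascending.
+function nextSort(current: SortState, clicked: SortKey): SortState {
+  if (current.key !== clicked) return { key: clicked, direction: "ascending" };
+  return {
+    key: clicked,
+    direction: current.direction === "ascending" ? "descending" : "ascending",
+  };
+}
+
+// Value for a column header's aria-sort attribute.
+function ariaSort(state: SortState, column: SortKey): SortDirection | "none" {
+  return state.key === column ? state.direction : "none";
+}
+
+// Returns a sorted copy and leaves the input untouched.
+function sortRows(rows: readonly Row[], state: SortState): Row[] {
+  const sign = state.direction === "ascending" ? 1 : -1;
+  return rows.slice().sort((a, b) => sign * a[state.key].localeCompare(b[state.key]));
+}
+```
+
+Storing the `SortState` object alongside the cached list would let the chosen order survive navigation, and wiring `ariaSort` into the headers addresses the missing `aria-sort` noted above.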
+
+#### F-POWER-02 — No multi-select bulk actions [MEDIUM]
+
+- Grep for `bulkAction` / `selectAll`: only the locale files contain those strings (likely as i18n keys that are never used). No checkbox UI.
+- **Suggestion**: Add a checkbox column on `targets`, `notification-trackers`, `command-trackers`, `actions` pages. Bulk-enable / bulk-delete are the obvious ones.
+
+#### F-POWER-03 — ⌘K palette is the strongest power feature, under-promoted [MEDIUM]
+
+- See F-ONBOARD-02.
+
+#### F-POWER-04 — Sidebar group expand/collapse is persisted but no "expand all / collapse all" [LOW]
+
+- **Files**: [`frontend/src/routes/+layout.svelte`](frontend/src/routes/+layout.svelte) lines 263–269
+- **Suggestion**: Add a right-click menu on a group header, or a tiny "collapse all" icon at the bottom of the nav rail.
+
+#### F-POWER-05 — No keyboard shortcuts beyond ⌘K [LOW]
+
+- **Suggestion**: `n` for new, `g + p` for "go providers", `g + t` for trackers, `?` to show shortcut sheet. Document in the palette.
+
+---
+
+## Production polish checklist (top 15, prioritized)
+
+1. **[HIGH]** Add `role="status" aria-live="polite"` to Snackbar container; `assertive` for error toasts. (F-A11Y-01) — one-line fix.
+2. **[HIGH]** Add `aria-current="page"` to every nav link in `+layout.svelte`. (F-NAV-01, F-A11Y-02)
+3. **[HIGH]** Mass-replace the legacy form-input class (`border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]`) with nothing — let the global `input { ... }` style win. 17 files, ~71 occurrences. (F-CONSIST-04)
+4. **[HIGH]** Replace hardcoded hex colors (`#059669`, `#ef4444`, `#3b82f6`, `#f59e0b`, `rgba(239,68,68,...)`) with Aurora palette tokens in `Snackbar.svelte`, `ConfirmModal.svelte`, `actions/+page.svelte`, and any remaining sites. (F-CONSIST-03)
+5. **[HIGH]** First-run onboarding: when `providersCache.items.length === 0`, replace dashboard hero with a 4-step "Getting started" checklist. (F-ONBOARD-01)
+6. **[HIGH]** Consolidate the 5 glass-card abstractions into a single `<GlassPanel variant=...>` component; delete redundant `::after` overlays. (F-CONSIST-01)
+7. **[HIGH]** Introduce a radius scale (`--radius-card / panel / pill / input / chip / tile`) and refactor `Card.svelte`, `Button.svelte`, `Modal.svelte`, `ConfirmModal.svelte` to use it. (F-CONSIST-02)
+8. **[MEDIUM]** Rewrite `ConfirmModal.svelte` to use `<Button variant="secondary">` and `<Button variant="danger">` instead of its own buttons. (F-CONSIST-05)
+9. **[MEDIUM]** Add layout-aware `<Loading shape="hero|grid|list">` variants to reduce first-paint layout shift. (F-STATE-01)
+10. **[MEDIUM]** Extend `<EmptyState>` with `cta` slot and provider-/tracker-/target-specific copy + a contextual CTA. (F-STATE-02)
+11. **[MEDIUM]** Visual length-QA pass for Russian — at least dashboard hero, providers list, settings hero, stat-cards. Playwright screenshot test. (F-COPY-03, F-LOCALE-02)
+12. **[MEDIUM]** Implement column-header sort on `notification-trackers`, `targets`, `actions`. Persist in cache state. (F-POWER-01)
+13. **[MEDIUM]** Add multi-select bulk actions (enable/disable, delete) to `targets`, `notification-trackers`, `command-trackers`. (F-POWER-02)
+14. **[MEDIUM]** Audit contrast: `--color-muted-foreground` over brightest aurora peak; likely bump dark-theme value from `#b6b2d4` to ~`#cfcae8`. (F-A11Y-05)
+15. **[MEDIUM]** Replace inline browser-native `title=` tooltips on the collapsed sidebar with a custom Aurora-styled tooltip (using the existing portal helper). (F-NAV-04)
+
+### Quick wins (bonus, under an hour each)
+
+- Add `autofocus` to the username input on `/login`. (F-FORM-05)
+- Fix `вЂ"` / `В·` Cyrillic encoding glitches in `login/+page.svelte` and `users/+page.svelte`. (F-LOCALE-03)
+- Drop `role="button"` from Modal backdrop. (F-MODAL-02)
+- Replace `provider.type` raw label in provider list rows with localized descriptor name. (F-LOCALE-02)
+- Add inline empty-state copy to `EventChart` when all `chartDays` values are 0. (F-HIER-03)
+
+---
+
+## What's working — keep doing it
+
+- The conic-gradient brand orb, animated aurora background, Newsreader italic emphasis, gradient pill chips, glow-pulse dots — distinctive identity.
+- `Modal.svelte` (focus trap, restore, portal, escape, scroll containment).
+- `prefers-reduced-motion` honored across every animation surface.
+- Global ⌘K search palette, global provider filter, persisted sidebar state, persisted nav-group expansion.
+- Editorial copy on dashboard (`signal stream`, `on watch`, `pulse`, `wires`, `compose`).
+- Snackbar with detail-toggle expansion for error context.
+- Mobile "More" panel that mirrors the full desktop nav tree.
+- 6-file template-variable sync rule honored by project conventions.
+- `i18n` parity at 1577 lines for both locales.
+
+End of review.
@@ -1,6 +1,6 @@
 {
-  "last_commit": "cfdafa9c2b49ea64496e9355d92337dbbb70db93",
-  "last_sync": "2026-05-16T00:00:00Z",
+  "last_commit": "04fe8124fcc3f783038b9aaac393b6c62c68e22a",
+  "last_sync": "2026-05-16T20:04:00Z",
  "tracked_files": {
    "gitea-python-ci-cd.md": "sha256:9f1f57e1b0d909143e20cb3f21ac9c4d75b45f2992ec002645540f94c4920851",
    "gitea-release-workflow.md": "sha256:5eb64789fca062b2138ca7661b942c9fc9c304f63326844ff6f6724e7e05b08c"
@@ -480,6 +480,7 @@
 		"videoWarning": "Video size warning",
 		"disableUrlPreview": "Disable link previews",
 		"sendLargeAsDocuments": "Send large photos as documents",
+		"sendLargeVideosAsDocuments": "Send oversized videos as documents (bypass 50 MB limit)",
 		"chatAction": "Chat action",
 		"chatActionNone": "None (no action)",
 		"chatActionTyping": "Typing",
@@ -509,6 +510,11 @@
 		"confirmDeleteReceiver": "Delete this receiver?",
 		"receiverEnabled": "Receiver enabled",
 		"receiverDisabled": "Receiver disabled",
+		"telegramOptions": "Telegram options",
+		"telegramOptionsSaved": "Telegram options saved",
+		"telegramDisableNotification": "Send silently (no sound / vibration)",
+		"telegramThreadId": "Forum topic ID",
+		"telegramThreadIdPlaceholder": "Leave empty for general topic",
 		"groupNoBot": "No bot linked",
 		"groupDirect": "Direct delivery",
 		"groupBotMissing": "Unknown bot",
@@ -897,6 +903,22 @@
 		"identityHeadline": "How this instance presents itself to bots, webhooks, and recipients",
 		"telegramHeadline": "Webhook authentication and media cache tuning",
 		"loggingHeadline": "Verbosity, output format, and per-module overrides",
+		"diagnostics": "Diagnostics",
+		"diagnosticsHeadline": "Temporary DEBUG for one module, auto-reverted",
+		"diagnosticsHint": "Use to investigate a specific dispatch failure without flooding stderr. The chosen module flips to DEBUG immediately and reverts to its baseline (your per-module overrides or the noisy-library defaults) when the window ends. Restarts also reset.",
+		"diagModuleQuick": "Module (quick pick)",
+		"diagModuleCustom": "Or a custom module name",
+		"diagModuleCustomPlaceholder": "e.g. notify_bridge_server.services.deferred_dispatch",
+		"diagModuleRequired": "Pick a module first",
+		"diagDuration": "Duration",
+		"diagActivate": "Activate DEBUG",
+		"diagActivated": "Diagnostic mode activated",
+		"diagActivateFailed": "Failed to activate diagnostic mode",
+		"diagActive": "Active overrides",
+		"diagRevertsIn": "Reverts in",
+		"diagRevertNow": "Revert now",
+		"diagReverted": "Diagnostic mode reverted",
+		"diagRevertFailed": "Failed to revert diagnostic mode",
 		"heroNoUrl": "External URL not set",
 		"heroNoLocales": "no locales",
 		"copy": "Copy",
@@ -480,6 +480,7 @@
 		"videoWarning": "Предупреждение о размере видео",
 		"disableUrlPreview": "Отключить превью ссылок",
 		"sendLargeAsDocuments": "Отправлять большие фото как документы",
+		"sendLargeVideosAsDocuments": "Отправлять видео сверх лимита как документы (обход 50 МБ)",
 		"chatAction": "Действие в чате",
 		"chatActionNone": "Нет (без действия)",
 		"chatActionTyping": "Печатает",
@@ -509,6 +510,11 @@
 		"confirmDeleteReceiver": "Удалить этого получателя?",
 		"receiverEnabled": "Получатель включён",
 		"receiverDisabled": "Получатель отключён",
+		"telegramOptions": "Параметры Telegram",
+		"telegramOptionsSaved": "Параметры Telegram сохранены",
+		"telegramDisableNotification": "Отправлять без звука и вибрации",
+		"telegramThreadId": "ID темы форума",
+		"telegramThreadIdPlaceholder": "Оставьте пустым для общей темы",
 		"groupNoBot": "Без привязки к боту",
 		"groupDirect": "Прямая доставка",
 		"groupBotMissing": "Неизвестный бот",
@@ -897,6 +903,22 @@
 		"identityHeadline": "Как этот сервер представляется ботам, вебхукам и получателям",
 		"telegramHeadline": "Аутентификация вебхуков и настройка медиакэша",
 		"loggingHeadline": "Подробность, формат вывода и переопределения по модулям",
+		"diagnostics": "Диагностика",
+		"diagnosticsHeadline": "Временный DEBUG для одного модуля с авто-возвратом",
+		"diagnosticsHint": "Включите, чтобы разобраться в конкретной ошибке отправки без заливания stderr. Выбранный модуль немедленно переходит в DEBUG и возвращается к базовому уровню (вашим переопределениям или умолчаниям для шумных библиотек) по истечении окна. При перезапуске сервера всё сбрасывается.",
+		"diagModuleQuick": "Модуль (быстрый выбор)",
+		"diagModuleCustom": "Или произвольное имя модуля",
+		"diagModuleCustomPlaceholder": "напр. notify_bridge_server.services.deferred_dispatch",
+		"diagModuleRequired": "Сначала выберите модуль",
+		"diagDuration": "Длительность",
+		"diagActivate": "Включить DEBUG",
+		"diagActivated": "Режим диагностики включён",
+		"diagActivateFailed": "Не удалось включить режим диагностики",
+		"diagActive": "Активные переопределения",
+		"diagRevertsIn": "Вернётся через",
+		"diagRevertNow": "Вернуть сейчас",
+		"diagReverted": "Режим диагностики отменён",
+		"diagRevertFailed": "Не удалось отменить режим диагностики",
 		"heroNoUrl": "Внешний URL не задан",
 		"heroNoLocales": "нет локалей",
 		"copy": "Копировать",
@@ -235,6 +235,35 @@ export type DispatchStatus =
 	| 'deferred_then_failed'
 	| 'suppressed_quiet_hours_nondeferrable';

+export interface DispatchSummaryError {
+	index: number;
+	error: string;
+}
+
+export interface DispatchSummaryMediaError {
+	target_index: number;
+	kind?: string;
+	chunk?: number;
+	item_index?: number;
+	error?: string;
+	code?: number;
+}
+
+export interface DispatchSummary {
+	targets_attempted: number;
+	targets_succeeded: number;
+	targets_failed: number;
+	errors?: DispatchSummaryError[];
+	errors_truncated?: number;
+	media?: {
+		delivered: number;
+		skipped: number;
+		failed: number;
+	};
+	media_errors?: DispatchSummaryMediaError[];
+	media_errors_truncated?: number;
+}
+
 export interface EventLog {
 	id: number;
 	event_type: string;
@@ -256,6 +285,9 @@ export interface EventLog {
 		deferred_until?: string;
 		original_event_log_id?: number | null;
 		deferred_for_seconds?: number;
+		dispatch_id?: string;
+		request_id?: string;
+		dispatch_summary?: DispatchSummary;
 	};
 	created_at: string;
 }
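
Review note: the `DispatchSummary` shape above is meant to surface partial outcomes in the UI. As a minimal sketch of how a row label could be derived from it, assuming a hypothetical helper `describeDispatchSummary` that is not part of this commit, and a trimmed local copy of the interface:

```typescript
// Trimmed local copy of the DispatchSummary type added in this diff.
interface DispatchSummary {
	targets_attempted: number;
	targets_succeeded: number;
	targets_failed: number;
	media?: { delivered: number; skipped: number; failed: number };
}

// Hypothetical helper: true when some media was skipped or failed,
// mirroring the dispatcher's `skipped > 0 or failed > 0` check.
function hasPartialMediaLoss(s: DispatchSummary): boolean {
	return s.media != null && (s.media.skipped > 0 || s.media.failed > 0);
}

// Hypothetical helper: condense the summary into one short label.
function describeDispatchSummary(s: DispatchSummary): string {
	const parts = [`${s.targets_succeeded}/${s.targets_attempted} targets ok`];
	if (s.targets_failed > 0) parts.push(`${s.targets_failed} failed`);
	if (s.media) {
		const m = s.media;
		parts.push(`media ${m.delivered} delivered, ${m.skipped} skipped, ${m.failed} failed`);
	}
	return parts.join('; ');
}
```

The label format is illustrative only; the real UI may present these counts differently.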
@@ -14,6 +14,7 @@
 	import ReleaseCassette from './ReleaseCassette.svelte';
 	import CacheLedger from './CacheLedger.svelte';
 	import LoggingCassette from './LoggingCassette.svelte';
+	import DiagnosticsCassette from './DiagnosticsCassette.svelte';
 	import SaveBar from './SaveBar.svelte';

 	interface CacheBucketStats {
@@ -203,6 +204,8 @@
 			bind:logFormat={settings.log_format}
 			bind:logLevels={settings.log_levels}
 		/>
+
+		<DiagnosticsCassette />
 	</div>

 	<SaveBar
@@ -0,0 +1,424 @@
+<script lang="ts">
+	import { onMount, onDestroy } from 'svelte';
+	import { slide } from 'svelte/transition';
+	import { api } from '$lib/api';
+	import { t } from '$lib/i18n';
+	import MdiIcon from '$lib/components/MdiIcon.svelte';
+	import IconButton from '$lib/components/IconButton.svelte';
+	import IconGridSelect from '$lib/components/IconGridSelect.svelte';
+	import { snackSuccess, snackError } from '$lib/stores/snackbar.svelte';
+
+	interface ActiveOverride {
+		module: string;
+		baseline_level: string;
+		current_level: string;
+		activated_at: string;
+		expires_at: string;
+		remaining_seconds: number;
+	}
+
+	// Modules ship with shortcuts; users can also type a freeform name
+	// matching the backend allowlist (notify_bridge_*, sqlalchemy.*, etc.).
+	// Icons let the IconGridSelect render each entry as a visual chip
+	// instead of a bare text list — same pattern as the surrounding
+	// log-level / log-format selectors.
+	const QUICK_MODULES: { value: string; icon: string; label: string; desc?: string }[] = [
+		{ value: 'notify_bridge_core.notifications.telegram.client', icon: 'mdiSend', label: 'Telegram client' },
+		{ value: 'notify_bridge_core.notifications.dispatcher', icon: 'mdiCallSplit', label: 'Dispatcher' },
+		{ value: 'notify_bridge_core.providers.immich', icon: 'mdiImageMultiple', label: 'Immich provider' },
+		{ value: 'notify_bridge_server.services.watcher', icon: 'mdiEyeOutline', label: 'Watcher' },
+		{ value: 'notify_bridge_server.services.deferred_dispatch', icon: 'mdiClockOutline', label: 'Deferred dispatch' },
+		{ value: 'notify_bridge_server.services.scheduled_dispatch', icon: 'mdiCalendarClock', label: 'Scheduled dispatch' },
+		{ value: 'sqlalchemy.engine', icon: 'mdiDatabase', label: 'SQLAlchemy engine (SQL)' },
+		{ value: 'aiohttp.client', icon: 'mdiWeb', label: 'aiohttp client' },
+	];
+
+	const DURATION_PRESETS: { minutes: number; label: string }[] = [
+		{ minutes: 5, label: '5m' },
+		{ minutes: 15, label: '15m' },
+		{ minutes: 30, label: '30m' },
+		{ minutes: 60, label: '1h' },
+		{ minutes: 120, label: '2h' },
+	];
+
+	let active = $state<ActiveOverride[]>([]);
+	let pickedModule = $state(QUICK_MODULES[0].value);
+	let customModule = $state('');
+	let pickedMinutes = $state(30);
+	let submitting = $state(false);
+	let tickHandle: ReturnType<typeof setInterval> | null = null;
+	// Resync from the backend every N seconds so a server-side auto-revert
+	// is reflected even if we missed a tick. Tracked as elapsed-time so the
+	// 1s ticker can drift without breaking the cadence.
+	const RESYNC_EVERY_SECONDS = 30;
+	let lastResyncAt = Date.now();
+
+	async function refresh(): Promise<void> {
+		try {
+			const data = await api<{ active: ActiveOverride[] }>(
+				'/settings/diagnostic-mode',
+				{ method: 'GET' },
+			);
+			active = data.active || [];
+		} catch (err: unknown) {
+			// Deliberately swallowed: the settings page already shows a banner
+			// when the API is unreachable, and the next resync will retry.
+		}
+	}
+
+	function tick(): void {
+		// Cheap local countdown so the UI doesn't poll the server every second
+		// to render a clock. A full refresh runs every RESYNC_EVERY_SECONDS.
+		if (active.length === 0) return;
+		const now = Date.now();
+		active = active
+			.map(a => ({
+				...a,
+				remaining_seconds: Math.max(
+					0,
+					Math.floor((new Date(a.expires_at).getTime() - now) / 1000),
+				),
+			}))
+			.filter(a => a.remaining_seconds > 0);
+	}
+
+	function startTicker(): void {
+		if (tickHandle != null) return;
+		tickHandle = setInterval(() => {
+			tick();
+			const now = Date.now();
+			if (now - lastResyncAt >= RESYNC_EVERY_SECONDS * 1000) {
+				lastResyncAt = now;
+				void refresh();
+			}
+		}, 1000);
+	}
+
+	function stopTicker(): void {
+		if (tickHandle != null) {
+			clearInterval(tickHandle);
+			tickHandle = null;
+		}
+	}
+
+	onMount(() => {
+		lastResyncAt = Date.now();
+		void refresh();
+		startTicker();
+	});
+
+	onDestroy(() => {
+		stopTicker();
+	});
+
+	function effectiveModule(): string {
+		return (customModule.trim() || pickedModule).trim();
+	}
+
+	async function activate(): Promise<void> {
+		const mod = effectiveModule();
+		if (!mod) {
+			snackError(t('settings.diagModuleRequired'));
+			return;
+		}
+		submitting = true;
+		try {
+			const entry = await api<ActiveOverride>('/settings/diagnostic-mode', {
+				method: 'POST',
+				body: JSON.stringify({ module: mod, duration_minutes: pickedMinutes }),
+			});
+			// Replace any existing row for this module with the new schedule.
+			active = [
+				...active.filter(a => a.module !== entry.module),
+				entry,
+			];
+			customModule = '';
+			snackSuccess(t('settings.diagActivated'));
+		} catch (err: unknown) {
+			const msg = err instanceof Error ? err.message : String(err);
+			snackError(msg || t('settings.diagActivateFailed'));
+		} finally {
+			submitting = false;
+		}
+	}
+
+	async function revert(module: string): Promise<void> {
+		try {
+			await api(`/settings/diagnostic-mode/${encodeURIComponent(module)}`, {
+				method: 'DELETE',
+			});
+			active = active.filter(a => a.module !== module);
+			snackSuccess(t('settings.diagReverted'));
+		} catch (err: unknown) {
+			const msg = err instanceof Error ? err.message : String(err);
+			snackError(msg || t('settings.diagRevertFailed'));
+		}
+	}
+
+	function formatRemaining(seconds: number): string {
+		if (seconds <= 0) return '0s';
+		const mins = Math.floor(seconds / 60);
+		const secs = seconds % 60;
+		if (mins >= 60) {
+			const hours = Math.floor(mins / 60);
+			const remMins = mins % 60;
+			return `${hours}h ${remMins}m`;
+		}
+		if (mins > 0) return `${mins}m ${secs}s`;
+		return `${secs}s`;
+	}
+</script>
+
+<section class="diag glass">
+	<header class="diag-head">
+		<div class="diag-eyebrow">
+			<MdiIcon name="mdiBugOutline" size={12} />
+			<span>{t('settings.diagnostics')}</span>
+		</div>
+		<h3 class="diag-title">{t('settings.diagnosticsHeadline')}</h3>
+		<p class="diag-sub">{t('settings.diagnosticsHint')}</p>
+	</header>
+
+	<!-- Compose new override -->
+	<div class="diag-compose">
+		<div class="diag-label">
+			<span>{t('settings.diagModuleQuick')}</span>
+			<IconGridSelect items={QUICK_MODULES} bind:value={pickedModule} columns={2} compact />
+		</div>
+
+		<label class="diag-label">
+			<span>{t('settings.diagModuleCustom')}</span>
+			<input
+				bind:value={customModule}
+				type="text"
+				autocomplete="off"
+				spellcheck="false"
+				placeholder={t('settings.diagModuleCustomPlaceholder')}
+				class="diag-input"
+			/>
+		</label>
+
+		<div class="diag-label">
+			<span>{t('settings.diagDuration')}</span>
+			<div class="diag-duration-chips">
+				{#each DURATION_PRESETS as preset (preset.minutes)}
+					<button
+						type="button"
+						class="diag-chip"
+						class:diag-chip-active={pickedMinutes === preset.minutes}
+						onclick={() => (pickedMinutes = preset.minutes)}
+					>
+						{preset.label}
+					</button>
+				{/each}
+			</div>
+		</div>
+
+		<button
+			type="button"
+			onclick={activate}
+			disabled={submitting}
+			class="diag-activate"
+		>
+			<MdiIcon name="mdiPlay" size={14} />
+			<span>{submitting ? t('common.loading') : t('settings.diagActivate')}</span>
+		</button>
+	</div>
+
+	<!-- Active overrides list -->
+	{#if active.length > 0}
+		<div class="diag-active" in:slide={{ duration: 180 }}>
+			<div class="diag-active-head">
+				<MdiIcon name="mdiTimerSandComplete" size={12} />
+				<span>{t('settings.diagActive')}</span>
+			</div>
+			{#each active as ov (ov.module)}
+				<div class="diag-row">
+					<div class="diag-row-info">
+						<code class="diag-row-module">{ov.module}</code>
+						<span class="diag-row-meta">
+							{t('settings.diagRevertsIn')} <strong>{formatRemaining(ov.remaining_seconds)}</strong>
+							<span class="diag-row-baseline">→ {ov.baseline_level}</span>
+						</span>
+					</div>
+					<IconButton
+						icon="mdiUndoVariant"
+						title={t('settings.diagRevertNow')}
+						onclick={() => revert(ov.module)}
+						size={16}
+					/>
+				</div>
+			{/each}
+		</div>
+	{/if}
+</section>
+
+<style>
+	.diag {
+		padding: 1.5rem 1.6rem 1.4rem;
+		display: flex;
+		flex-direction: column;
+		gap: 1.15rem;
+	}
+	.diag-head {
+		position: relative;
+		z-index: 1;
+	}
+	.diag-eyebrow {
+		display: inline-flex;
+		align-items: center;
+		gap: 0.35rem;
+		font-family: var(--font-mono);
+		font-size: 0.62rem;
+		text-transform: uppercase;
+		letter-spacing: 0.18em;
+		color: var(--color-muted-foreground);
+		margin-bottom: 0.45rem;
+	}
+	.diag-title {
+		margin: 0;
+		font-family: var(--font-display);
+		font-style: italic;
+		font-weight: 400;
+		font-size: 1.15rem;
+		line-height: 1.35;
+		letter-spacing: -0.015em;
+		color: var(--color-foreground);
+		max-width: 38ch;
+	}
+	.diag-sub {
+		margin: 0.45rem 0 0 0;
+		font-size: 0.78rem;
+		color: var(--color-muted-foreground);
+		max-width: 56ch;
+	}
+	.diag-compose {
+		position: relative;
+		z-index: 1;
+		display: flex;
+		flex-direction: column;
+		gap: 0.7rem;
+		padding-top: 0.4rem;
+		border-top: 1px solid var(--color-border);
+	}
+	.diag-label {
+		display: flex;
+		flex-direction: column;
+		gap: 0.32rem;
+	}
+	.diag-label > span {
+		font-size: 0.74rem;
+		font-weight: 500;
+		color: var(--color-foreground);
+	}
+	.diag-input {
+		width: 100%;
+		font-family: var(--font-mono);
+		font-size: 0.78rem;
+		padding: 0.45rem 0.7rem;
+		border: 1px solid var(--color-border);
+		border-radius: 8px;
+		background: var(--color-glass);
+		color: var(--color-foreground);
+	}
+	.diag-duration-chips {
+		display: flex;
+		flex-wrap: wrap;
+		gap: 0.35rem;
+	}
+	.diag-chip {
+		padding: 0.32rem 0.75rem;
+		border-radius: 999px;
+		border: 1px solid var(--color-border);
+		background: transparent;
+		color: var(--color-muted-foreground);
+		font-family: var(--font-mono);
+		font-size: 0.72rem;
+		cursor: pointer;
+		transition: background 0.15s, color 0.15s, border-color 0.15s;
+	}
+	.diag-chip:hover {
+		background: var(--color-glass-strong);
+		color: var(--color-foreground);
+	}
+	.diag-chip-active {
+		background: color-mix(in srgb, var(--color-primary) 12%, transparent);
+		color: var(--color-primary);
+		border-color: color-mix(in srgb, var(--color-primary) 45%, var(--color-border));
+	}
+	.diag-activate {
+		display: inline-flex;
+		align-items: center;
+		justify-content: center;
+		gap: 0.4rem;
+		align-self: flex-start;
+		padding: 0.55rem 1.1rem;
+		border-radius: 10px;
+		border: 1px solid color-mix(in srgb, var(--color-primary) 45%, var(--color-border));
+		background: color-mix(in srgb, var(--color-primary) 12%, transparent);
+		color: var(--color-primary);
+		font-family: var(--font-display);
+		font-style: italic;
+		font-size: 0.85rem;
+		cursor: pointer;
+		transition: background 0.15s, color 0.15s, border-color 0.15s;
+	}
+	.diag-activate:hover {
+		background: color-mix(in srgb, var(--color-primary) 18%, transparent);
+		border-color: color-mix(in srgb, var(--color-primary) 65%, var(--color-border));
+	}
+	.diag-activate:disabled {
+		opacity: 0.5;
+		cursor: not-allowed;
+	}
+
+	.diag-active {
+		display: flex;
+		flex-direction: column;
+		gap: 0.4rem;
+		padding-top: 0.55rem;
+		border-top: 1px solid var(--color-border);
+	}
+	.diag-active-head {
+		display: inline-flex;
+		align-items: center;
+		gap: 0.3rem;
+		font-family: var(--font-mono);
+		font-size: 0.58rem;
+		text-transform: uppercase;
+		letter-spacing: 0.18em;
+		color: var(--color-muted-foreground);
+	}
+	.diag-row {
+		display: flex;
+		align-items: center;
+		justify-content: space-between;
+		gap: 0.6rem;
+		padding: 0.5rem 0.65rem;
+		border-radius: 10px;
+		border: 1px solid var(--color-border);
+		background: var(--color-glass-strong);
+	}
+	.diag-row-info {
+		display: flex;
+		flex-direction: column;
+		gap: 0.2rem;
+		min-width: 0;
+	}
+	.diag-row-module {
+		font-family: var(--font-mono);
+		font-size: 0.78rem;
+		color: var(--color-foreground);
+		word-break: break-all;
+	}
+	.diag-row-meta {
+		font-size: 0.72rem;
+		color: var(--color-muted-foreground);
+	}
+	.diag-row-baseline {
+		font-family: var(--font-mono);
+		font-size: 0.7rem;
+		margin-left: 0.4rem;
+		opacity: 0.7;
+	}
+</style>
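
Review note: the `tick()` function in this component recomputes `remaining_seconds` from `expires_at` instead of decrementing a counter, so a throttled or delayed timer cannot make the countdown drift. A minimal sketch of the same idea as a pure function, assuming a hypothetical name `recomputeRemaining` and a trimmed row type (neither is part of this commit):

```typescript
// Trimmed version of the component's ActiveOverride row.
interface OverrideRow {
	module: string;
	expires_at: string; // ISO 8601 timestamp from the backend
	remaining_seconds: number;
}

// Hypothetical pure variant of tick(): derive the remaining time from the
// absolute expiry on every call and drop rows that have already expired.
function recomputeRemaining(rows: OverrideRow[], nowMs: number): OverrideRow[] {
	return rows
		.map((r) => ({
			...r,
			remaining_seconds: Math.max(
				0,
				Math.floor((new Date(r.expires_at).getTime() - nowMs) / 1000),
			),
		}))
		.filter((r) => r.remaining_seconds > 0);
}
```

Passing `nowMs` in explicitly keeps the function deterministic, which makes the expiry edge cases easy to unit-test.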
@@ -166,7 +166,7 @@
 	const defaultForm = () => ({
 		name: '', icon: '', bot_id: 0, bot_token: '',
 		max_media_to_send: 50, max_media_per_group: 10, media_delay: 500, max_asset_size: 50,
-		disable_url_preview: true, send_large_photos_as_documents: false, ai_captions: false, chat_action: 'typing',
+		disable_url_preview: true, send_large_photos_as_documents: false, send_large_videos_as_documents: false, ai_captions: false, chat_action: 'typing',
 		// Discord/Slack shared settings
 		username: '',
 		// ntfy shared settings
@@ -407,7 +407,7 @@
 			bot_id: c.bot_id || 0, bot_token: '',
 			max_media_to_send: c.max_media_to_send ?? 50, max_media_per_group: c.max_media_per_group ?? 10,
 			media_delay: c.media_delay ?? 500, max_asset_size: c.max_asset_size ?? 50,
-			disable_url_preview: c.disable_url_preview ?? false, send_large_photos_as_documents: c.send_large_photos_as_documents ?? false,
+			disable_url_preview: c.disable_url_preview ?? false, send_large_photos_as_documents: c.send_large_photos_as_documents ?? false, send_large_videos_as_documents: c.send_large_videos_as_documents ?? false,
 			ai_captions: c.ai_captions ?? false, chat_action: tgt.chat_action ?? c.chat_action ?? 'typing',
 			// discord/slack
 			username: c.username || '',
@@ -448,6 +448,7 @@
 					max_media_to_send: form.max_media_to_send, max_media_per_group: form.max_media_per_group,
 					media_delay: form.media_delay, max_asset_size: form.max_asset_size,
 					disable_url_preview: form.disable_url_preview, send_large_photos_as_documents: form.send_large_photos_as_documents,
+					send_large_videos_as_documents: form.send_large_videos_as_documents,
 					ai_captions: form.ai_captions,
 				};
 			} else if (formType === 'webhook') {
@@ -603,6 +604,63 @@
 		} catch (err: unknown) { snackError(errMsg(err)); }
 	}

+	// Per-Telegram-receiver options panel: silent send + forum thread id.
+	// Edits the receiver's config dict in place via PUT.
+	let editingReceiverId = $state<number | null>(null);
+	// ``<input type="number">`` binds either a ``number`` or empty string
+	// when the field is blank — model both so TS strict mode and the save
+	// path's ``Number(raw)`` coercion agree.
+	let editingReceiverOptions = $state<{ disable_notification: boolean; message_thread_id: number | '' }>({
+		disable_notification: false,
+		message_thread_id: '',
+	});
+
+	function openEditReceiver(_targetId: number, receiver: TargetReceiver) {
+		editingReceiverId = receiver.id;
+		// Empty string maps to "no thread" — the form's <input type=number>
+		// produces '' for an empty field, which we normalize to null on save.
+		const raw = receiver.config?.message_thread_id;
+		const parsed = raw == null || raw === '' ? '' : Number(raw);
+		editingReceiverOptions = {
+			disable_notification: Boolean(receiver.config?.disable_notification),
+			message_thread_id: typeof parsed === 'number' && Number.isFinite(parsed) ? parsed : '',
+		};
+	}
+
+	function cancelEditReceiver() {
+		editingReceiverId = null;
+	}
+
+	async function saveEditReceiver(targetId: number, receiverId: number) {
+		const target = allTargets.find(t => t.id === targetId);
+		const receiver = target?.receivers?.find(r => r.id === receiverId);
+		if (!receiver) return;
+		// Merge new options into the existing config so we don't lose the chat_id
+		// or any other receiver-specific keys (language_code on Telegram).
+		const newConfig: Record<string, any> = { ...receiver.config };
+		newConfig.disable_notification = editingReceiverOptions.disable_notification;
+		const raw = editingReceiverOptions.message_thread_id;
+		if (raw === '' || raw == null) {
+			delete newConfig.message_thread_id;
+		} else {
+			const parsed = Math.trunc(Number(raw));
+			if (Number.isFinite(parsed) && parsed > 0) {
+				newConfig.message_thread_id = parsed;
+			} else {
+				delete newConfig.message_thread_id;
+			}
+		}
+		try {
+			await api(`/targets/${targetId}/receivers/${receiverId}`, {
+				method: 'PUT',
+				body: JSON.stringify({ config: newConfig }),
+			});
+			editingReceiverId = null;
+			await load();
+			snackSuccess(t('targets.telegramOptionsSaved'));
+		} catch (err: unknown) { snackError(errMsg(err)); }
+	}
+
 	async function toggleBroadcastChild(targetId: number, childId: number) {
 		const tgt = allTargets.find(t => t.id === targetId);
 		if (!tgt) return;
@@ -753,6 +811,8 @@
 										{receiverBotChats}
 										{receiverTesting}
 										{receiverLabel}
+										{editingReceiverId}
+										bind:editingReceiverOptions
 										onopenReceiverForm={openReceiverForm}
 										onsaveReceiver={saveReceiver}
 										oncancelReceiver={() => addingReceiverForTarget = null}
@@ -762,6 +822,9 @@
 										onloadBotChats={loadReceiverBotChats}
 										onchangeReceiverForm={(f) => receiverForm = f}
 										ontoggleBroadcastChild={toggleBroadcastChild}
+										onopenEditReceiver={openEditReceiver}
+										oncancelEditReceiver={cancelEditReceiver}
+										onsaveEditReceiver={saveEditReceiver}
 									/>
 								</div>
 							{/if}
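
Review note: `saveEditReceiver` has to turn whatever the number input holds into either a positive integer or "no thread". A minimal sketch of that normalization in isolation, assuming a hypothetical helper `normalizeThreadId` (not part of this commit) that truncates before the range check so a fractional value below 1 is rejected and never stored as 0:

```typescript
// Hypothetical helper: returns a positive integer thread id, or null when
// the field is blank, non-numeric, or not a positive number.
function normalizeThreadId(raw: unknown): number | null {
	if (raw == null || raw === '') return null;
	const parsed = Math.trunc(Number(raw));
	return Number.isFinite(parsed) && parsed > 0 ? parsed : null;
}
```

A caller could then delete `message_thread_id` from the receiver config when the result is `null` and assign it otherwise.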
@@ -16,6 +16,12 @@
 		receiverBotChats: Record<number, TelegramChat[]>;
 		receiverTesting: Record<number, boolean>;
 		receiverLabel: (target: NotificationTarget, recv: TargetReceiver) => string;
+		// Telegram-only editing state. Optional so a future caller that
+		// reuses this component for a non-Telegram target page doesn't have
+		// to pass dead props; the cog button only renders when both the
+		// target type matches AND the handlers are wired.
+		editingReceiverId?: number | null;
+		editingReceiverOptions?: Record<string, any>;
 		onopenReceiverForm: (targetId: number, targetType: string) => void;
 		onsaveReceiver: (targetId: number) => void;
 		oncancelReceiver: () => void;
@@ -25,6 +31,9 @@
 		onloadBotChats: (botId: number) => void;
 		onchangeReceiverForm: (form: Record<string, any>) => void;
 		ontoggleBroadcastChild?: (targetId: number, childId: number) => void;
+		onopenEditReceiver?: (targetId: number, receiver: TargetReceiver) => void;
+		oncancelEditReceiver?: () => void;
+		onsaveEditReceiver?: (targetId: number, receiverId: number) => void;
 	}

 	let {
@@ -37,6 +46,8 @@
 		receiverBotChats,
 		receiverTesting,
 		receiverLabel,
+		editingReceiverId,
+		editingReceiverOptions = $bindable(),
 		onopenReceiverForm,
 		onsaveReceiver,
 		oncancelReceiver,
@@ -46,6 +57,9 @@
 		onloadBotChats,
 		onchangeReceiverForm,
 		ontoggleBroadcastChild,
+		onopenEditReceiver,
+		oncancelEditReceiver,
+		onsaveEditReceiver,
 	}: Props = $props();
 </script>

@@ -92,11 +106,25 @@
 				{#if (recv as any).language_code || recv.config?.language_code}
 					<span class="text-xs px-1 py-0.5 rounded bg-[var(--color-muted)] text-[var(--color-muted-foreground)]">{((recv as any).language_code || recv.config.language_code).toUpperCase()}</span>
 				{/if}
+				{#if target.type === 'telegram' && recv.config?.disable_notification}
+					<MdiIcon name="mdiBellOff" size={12} />
+				{/if}
+				{#if target.type === 'telegram' && recv.config?.message_thread_id != null && recv.config?.message_thread_id !== ''}
+					<span class="text-xs px-1 py-0.5 rounded bg-[var(--color-muted)] text-[var(--color-muted-foreground)]" title={t('targets.telegramThreadId')}>#{recv.config.message_thread_id}</span>
+				{/if}
 			</div>
 			<div class="flex items-center gap-1">
 				<IconButton icon="mdiSend" title={t('targets.test')}
 					onclick={() => ontestReceiver(target.id, recv.id)}
 					disabled={receiverTesting[recv.id]} size={16} />
+				{#if target.type === 'telegram' && onopenEditReceiver != null}
+					<IconButton
+						icon="mdiCog"
+						title={t('targets.telegramOptions')}
+						onclick={() => onopenEditReceiver!(target.id, recv)}
+						size={16}
+					/>
+				{/if}
 				<IconButton
 					icon={recv.enabled ? 'mdiToggleSwitch' : 'mdiToggleSwitchOff'}
 					title={recv.enabled ? t('targets.receiverDisabled') : t('targets.receiverEnabled')}
@@ -112,6 +140,31 @@
 				/>
 			</div>
 		</div>
+		{#if target.type === 'telegram' && editingReceiverId === recv.id && editingReceiverOptions != null && onsaveEditReceiver != null && oncancelEditReceiver != null}
+			<div in:slide={{ duration: 150 }} class="mb-2 ml-6 mr-2 p-2 rounded-md border border-[var(--color-border)] bg-[var(--color-background)]">
+				<label class="flex items-center gap-2 text-sm mb-2 cursor-pointer">
+					<input type="checkbox" bind:checked={editingReceiverOptions.disable_notification} />
+					<span>{t('targets.telegramDisableNotification')}</span>
+				</label>
+				<label class="flex flex-col gap-1 text-sm mb-2">
+					<span>{t('targets.telegramThreadId')}</span>
+					<input type="number" min="1" inputmode="numeric"
+						bind:value={editingReceiverOptions.message_thread_id}
+						placeholder={t('targets.telegramThreadIdPlaceholder')}
+						class="w-full px-2 py-1 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]" />
+				</label>
+				<div class="flex gap-2">
+					<button type="button" onclick={() => onsaveEditReceiver!(target.id, recv.id)}
+						class="px-3 py-1 bg-[var(--color-primary)] text-[var(--color-primary-foreground)] rounded-md text-xs font-medium hover:opacity-90">
+						{t('common.save')}
+					</button>
+					<button type="button" onclick={oncancelEditReceiver}
+						class="px-3 py-1 border border-[var(--color-border)] rounded-md text-xs hover:bg-[var(--color-muted)]">
+						{t('targets.cancel')}
+					</button>
+				</div>
+			</div>
+		{/if}
 	{/each}

 	<!-- Telegram: chat picker palette opens directly from the "Add receiver" button — no inline section. -->
@@ -23,6 +23,7 @@
 			max_asset_size: number;
 			disable_url_preview: boolean;
 			send_large_photos_as_documents: boolean;
+			send_large_videos_as_documents: boolean;
 			ai_captions: boolean;
 			chat_action: string;
 			username: string;
@@ -131,6 +132,7 @@
 					</div>
 					<label class="flex items-center gap-2 text-sm col-span-2"><input type="checkbox" bind:checked={form.disable_url_preview} /> {t('targets.disableUrlPreview')}</label>
 					<label class="flex items-center gap-2 text-sm col-span-2"><input type="checkbox" bind:checked={form.send_large_photos_as_documents} /> {t('targets.sendLargeAsDocuments')}</label>
+					<label class="flex items-center gap-2 text-sm col-span-2"><input type="checkbox" bind:checked={form.send_large_videos_as_documents} /> {t('targets.sendLargeVideosAsDocuments')}</label>
 				</div>
 				{/if}
 			</div>
@@ -14,6 +14,7 @@ Kept in ``notify_bridge_core`` so core modules (``TelegramClient``,

 from __future__ import annotations

+import uuid
 from contextlib import contextmanager
 from contextvars import ContextVar, Token
 from typing import Any, Iterator
@@ -56,6 +57,22 @@ def bind_log_context(**kwargs: Any) -> Iterator[None]:
            var.reset(tok)


+def ensure_dispatch_id() -> str:
+    """Return the bound ``dispatch_id`` if one is active, else a new one.
+
+    Format matches :meth:`NotificationDispatcher.dispatch` (``disp:<12 hex>``)
+    so logs and ``EventLog.details.dispatch_id`` use a single shape. Callers
+    typically wrap a top-level handler with::
+
+        with bind_log_context(dispatch_id=ensure_dispatch_id()):
+            ...
+
+    so nested calls inherit the same id and any ``EventLog`` row written
+    inside the block can be correlated with the dispatcher's log lines.
+    """
+    return dispatch_id_var.get() or f"disp:{uuid.uuid4().hex[:12]}"
+
+
 def current_log_context() -> dict[str, Any]:
    """Return a snapshot of the currently-bound context values (non-None)."""
    snap: dict[str, Any] = {}
@@ -64,3 +81,43 @@ def current_log_context() -> dict[str, Any]:
        if val is not None:
            snap[key] = val
    return snap
+
+
+# Keys copied onto ``EventLog.details`` so an operator can grep stderr for
+# the matching ``disp=``/``req=`` log lines after spotting a row in the UI.
+# Kept narrow on purpose — ``chat_id``/``bot_id``/``command`` are already
+# represented by dedicated EventLog columns.
+_CORRELATION_KEYS = ("dispatch_id", "request_id")
+
+
+def enrich_details_with_correlation(
+    details: dict[str, Any] | None,
+) -> dict[str, Any]:
+    """Return a (shallow) copy of ``details`` with active correlation IDs merged in.
+
+    Use this when constructing an ``EventLog.details`` dict so the persisted
+    row carries the same ``dispatch_id`` / ``request_id`` as the stderr log
+    lines emitted during that dispatch. An operator can then jump from a
+    row in the dashboard to the matching log lines without any extra
+    server-side correlation step.
+
+    Existing keys in ``details`` are NOT overwritten — callers can pin a
+    specific value (e.g. a synthetic dispatch_id for a backfilled row) by
+    setting it themselves before calling.
+
+    The copy is shallow. Nested mutable values (lists, dicts) are shared with
+    the input — fine for the all-scalar dicts every current call site passes,
+    but callers that intend to mutate after this returns should ``deepcopy``
+    themselves.
+    """
+    result: dict[str, Any] = dict(details or {})
+    for key in _CORRELATION_KEYS:
+        if key in result:
+            continue
+        var = _VAR_MAP.get(key)
+        if var is None:
+            continue
+        val = var.get()
+        if val is not None:
+            result[key] = val
+    return result
@@ -5,13 +5,12 @@ from __future__ import annotations
 import asyncio
 import contextlib
 import logging
-import uuid
 from dataclasses import dataclass, field
 from typing import Any, AsyncIterator, Awaitable, Callable, Final

 import aiohttp

-from notify_bridge_core.log_context import bind_log_context, dispatch_id_var
+from notify_bridge_core.log_context import bind_log_context, ensure_dispatch_id
 from notify_bridge_core.models.events import ServiceEvent
 from notify_bridge_core.templates.context import build_template_context
 from notify_bridge_core.templates.renderer import render_template
@@ -132,7 +131,7 @@ class NotificationDispatcher:
        Returns one result per target. Per-target failures are isolated;
        a single bad target cannot poison the batch.
        """
-        new_id = dispatch_id_var.get() or f"disp:{uuid.uuid4().hex[:12]}"
+        new_id = ensure_dispatch_id()

        with bind_log_context(dispatch_id=new_id):
            _LOGGER.info(
@@ -341,6 +340,7 @@ class NotificationDispatcher:
        max_size_mb = target.config.get("max_asset_size")
        max_size_bytes = max_size_mb * 1024 * 1024 if max_size_mb else None
        send_large_as_docs = target.config.get("send_large_photos_as_documents", False)
+        send_large_videos_as_docs = target.config.get("send_large_videos_as_documents", False)

        if not bot_token:
            return {"success": False, "error": "Missing bot_token"}
@@ -392,6 +392,8 @@ class NotificationDispatcher:
                    chat_id=receiver.chat_id,
                    text=message,
                    disable_web_page_preview=bool(disable_preview),
+                    disable_notification=receiver.disable_notification,
+                    message_thread_id=receiver.message_thread_id,
                )
                if not text_result.get("success"):
                    _LOGGER.warning(
@@ -409,22 +411,45 @@ class NotificationDispatcher:
                        chunk_delay=chunk_delay,
                        max_asset_data_size=max_size_bytes,
                        send_large_photos_as_documents=send_large_as_docs,
+                        send_large_videos_as_documents=send_large_videos_as_docs,
                        chat_action=chat_action or None,
+                        disable_notification=receiver.disable_notification,
+                        message_thread_id=receiver.message_thread_id,
                    )
-                    if not media_result.get("success"):
+                    delivered = media_result.get("delivered_count", 0)
+                    skipped = media_result.get("skipped_count", 0)
+                    failed = media_result.get("failed_count", 0)
+                    media_success = media_result.get("success", False)
+                    has_partial_loss = skipped > 0 or failed > 0
+
+                    if not media_success:
                        _LOGGER.warning(
-                            "Text sent OK but media failed for chat %s: %s",
-                            receiver.chat_id, media_result.get("error"),
+                            "Text sent OK but media failed for chat %s "
+                            "(delivered=%d skipped=%d failed=%d): %s",
+                            receiver.chat_id, delivered, skipped, failed,
+                            media_result.get("error"),
                        )
+                    elif has_partial_loss:
+                        _LOGGER.warning(
+                            "Partial media delivery for chat %s "
+                            "(delivered=%d skipped=%d failed=%d)",
+                            receiver.chat_id, delivered, skipped, failed,
+                        )
+
+                    if not media_success or has_partial_loss:
                        # Preserve both outcomes — text succeeded, media
-                        # didn't. Operators losing media-failure detail
-                        # in the result dict made root-cause analysis
+                        # partially or fully didn't. Dropping the
+                        # media-failure detail would make root-cause analysis
                        # impossible.
                        return {
                            "success": True,
                            "message_id": text_result.get("message_id"),
                            "media_error": media_result.get("error"),
                            "media_failed_at_chunk": media_result.get("failed_at_chunk"),
+                            "media_delivered_count": delivered,
+                            "media_skipped_count": skipped,
+                            "media_failed_count": failed,
+                            "media_errors": media_result.get("errors"),
                        }
                return text_result

@@ -20,9 +20,21 @@ class Receiver:

@dataclass
 class TelegramReceiver(Receiver):
-    """Telegram chat receiver."""
+    """Telegram chat receiver.
+
+    ``disable_notification`` toggles Telegram's ``disable_notification=true``
+    flag — the message is delivered without an audible / vibration alert.
+    Useful for low-priority chats that the user reads but doesn't want to
+    be paged by.
+
+    ``message_thread_id`` routes the send into a specific forum topic on a
+    supergroup with topics enabled. ``None`` means "general topic" (default
+    Telegram behaviour).
+    """

    chat_id: str = ""
+    disable_notification: bool = False
+    message_thread_id: int | None = None


@dataclass
@@ -80,9 +92,30 @@ def _coerce_int(value: Any, default: int) -> int:
        return default


+def _coerce_telegram_thread_id(value: Any) -> int | None:
+    """Coerce a config value to a positive Telegram forum-topic id.
+
+    The Bot API treats omission, ``0``, and negative values all as
+    "general topic", so we collapse them to ``None`` for consistency
+    with the frontend (which rejects ``<= 0``). Booleans are explicitly
+    rejected so ``int(True) == 1`` doesn't silently route a misconfigured
+    chat into topic #1.
+    """
+    if value is None or value == "" or isinstance(value, bool):
+        return None
+    try:
+        n = int(value)
+    except (TypeError, ValueError):
+        return None
+    return n if n > 0 else None
+
+
 _RECEIVER_FACTORIES: dict[str, _ReceiverFactory] = {
    "telegram": lambda locale, config: TelegramReceiver(
-        locale=locale, config=config, chat_id=str(config.get("chat_id", "")),
+        locale=locale, config=config,
+        chat_id=str(config.get("chat_id", "")),
+        disable_notification=bool(config.get("disable_notification", False)),
+        message_thread_id=_coerce_telegram_thread_id(config.get("message_thread_id")),
    ),
    "webhook": lambda locale, config: WebhookReceiver(
        locale=locale, config=config,
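The coercion rule for `message_thread_id` in the hunk above is easiest to see on sample inputs. The following is a minimal standalone sketch, not the project's module: `coerce_thread_id` is a hypothetical copy of `_coerce_telegram_thread_id`, and the sample values are made up for illustration.

```python
from __future__ import annotations

from typing import Any


def coerce_thread_id(value: Any) -> int | None:
    """Hypothetical standalone copy of the coercion rule shown above."""
    # bool is a subclass of int, so it must be rejected before int() runs;
    # otherwise True would be coerced to topic id 1.
    if value is None or value == "" or isinstance(value, bool):
        return None
    try:
        n = int(value)
    except (TypeError, ValueError):
        return None
    # Zero and negative ids collapse to None ("no explicit topic").
    return n if n > 0 else None


samples = [None, "", True, False, 0, -5, "42", 42, "abc", 3.9, [1]]
results = [coerce_thread_id(v) for v in samples]
print(results)
```

One behaviour worth noticing: a float such as `3.9` is truncated to `3` by `int()` rather than rejected, so a config source that can produce floats may want stricter validation upstream.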
@@ -3,12 +3,14 @@
 from __future__ import annotations

 import asyncio
+import contextlib
 import json
 import logging
 import mimetypes
 import re
+from contextvars import ContextVar
 from dataclasses import dataclass, field
-from typing import Any, Callable, Final
+from typing import Any, Callable, Final, Iterator

 import aiohttp
 from aiohttp import FormData
@@ -19,6 +21,7 @@ from .cache import TelegramFileCache
 from .media import (
    TELEGRAM_API_BASE_URL,
    TELEGRAM_MAX_CAPTION_LENGTH,
+    TELEGRAM_MAX_GROUP_TOTAL_BYTES,
    TELEGRAM_MAX_PHOTO_SIZE,
    TELEGRAM_MAX_TEXT_LENGTH,
    TELEGRAM_MAX_VIDEO_SIZE,
@@ -27,7 +30,6 @@ from .media import (
    extract_asset_id_from_url,
    is_asset_cache_key,
    is_asset_id,
-    split_media_by_upload_size,
 )

 _LOGGER = logging.getLogger(__name__)
@@ -56,6 +58,68 @@ _UPLOAD_TIMEOUT: Final = aiohttp.ClientTimeout(total=120, connect=10)
 _DOWNLOAD_TIMEOUT: Final = aiohttp.ClientTimeout(total=120, connect=10)


+# ---------------------------------------------------------------------------
+# Per-send options (disable_notification, message_thread_id, …)
+# ---------------------------------------------------------------------------
+#
+# These are properties of a single send, not of the bot or the client, and
+# they fan out into the JSON / multipart payload at four different sites
+# (sendMessage, sendPhoto/Video/Document, sendMediaGroup, cache-hit POST).
+# Rather than threading the kwargs through every internal media helper, we
+# bind them on a ContextVar inside the public ``send_notification`` entry
+# point; the payload builders read the var when constructing the request
+# (``send_message`` is a leaf call and writes its kwargs directly). Each
+# asyncio task gets its own context copy, so concurrent ``asyncio.gather``
+# fan-outs in the dispatcher (one per receiver) see only their own values.
+
+
+@dataclass(frozen=True)
+class _SendOptions:
+    """Per-send Telegram flags applied to every API call within one send.
+
+    ``disable_notification`` maps to Bot API ``disable_notification=true``
+    — the chat receives the message silently. ``message_thread_id`` routes
+    the message into a specific forum-topic on supergroups with topics
+    enabled; ``None`` means "general topic" (Bot API omits the field).
+    """
+
+    disable_notification: bool = False
+    message_thread_id: int | None = None
+
+
+_send_options_var: ContextVar[_SendOptions] = ContextVar(
+    "_tg_send_options", default=_SendOptions(),
+)
+
+
+@contextlib.contextmanager
+def _bind_send_options(opts: _SendOptions) -> Iterator[None]:
+    """Bind per-send options for the duration of the ``with`` block."""
+    token = _send_options_var.set(opts)
+    try:
+        yield
+    finally:
+        _send_options_var.reset(token)
+
+
+def _apply_send_opts_to_payload(payload: dict[str, Any]) -> None:
+    """Merge the active per-send options into a JSON request body."""
+    opts = _send_options_var.get()
+    if opts.disable_notification:
+        payload["disable_notification"] = True
+    if opts.message_thread_id is not None:
+        payload["message_thread_id"] = opts.message_thread_id
+
+
+def _apply_send_opts_to_form(form: FormData) -> None:
+    """Merge the active per-send options into a multipart form payload."""
+    opts = _send_options_var.get()
+    if opts.disable_notification:
+        form.add_field("disable_notification", "true")
+    if opts.message_thread_id is not None:
+        form.add_field("message_thread_id", str(opts.message_thread_id))
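The isolation property the comment block relies on can be checked in a few lines. This is a hedged sketch under simplified assumptions: `SendOptions`, `bind_send_options`, `build_payload`, and `send` are illustrative stand-ins for the private helpers in the diff, and no network call is made.

```python
from __future__ import annotations

import asyncio
import contextlib
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass(frozen=True)
class SendOptions:
    disable_notification: bool = False
    message_thread_id: int | None = None


send_options_var: ContextVar[SendOptions] = ContextVar(
    "send_options", default=SendOptions(),
)


@contextlib.contextmanager
def bind_send_options(opts: SendOptions) -> Iterator[None]:
    """Bind options for the duration of the ``with`` block, then restore."""
    token = send_options_var.set(opts)
    try:
        yield
    finally:
        send_options_var.reset(token)


def build_payload(chat_id: str) -> dict[str, Any]:
    """Deep helper: reads the bound options instead of taking kwargs."""
    payload: dict[str, Any] = {"chat_id": chat_id}
    opts = send_options_var.get()
    if opts.disable_notification:
        payload["disable_notification"] = True
    if opts.message_thread_id is not None:
        payload["message_thread_id"] = opts.message_thread_id
    return payload


async def send(chat_id: str, opts: SendOptions) -> dict[str, Any]:
    with bind_send_options(opts):
        await asyncio.sleep(0)  # yield so the sibling task runs in between
        return build_payload(chat_id)


async def main() -> list[dict[str, Any]]:
    # gather() wraps each coroutine in its own task, and every task runs
    # in a copy of the current context, so the two bindings never mix.
    return await asyncio.gather(
        send("a", SendOptions(disable_notification=True)),
        send("b", SendOptions(message_thread_id=7)),
    )


payloads = asyncio.run(main())
print(payloads)
```

Each payload carries only the options bound by its own caller, even though both tasks interleave at the `await`.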
+
+
 def _extract_retry_after(result: dict[str, Any]) -> int | None:
    """Return the retry_after seconds from a Telegram error response.

@@ -135,10 +199,27 @@ class _MediaItem:
    keyed by position. Bundling these together prevents the
    ``media_json`` and ``cache_info`` lists from drifting out of
    alignment under future edits.
+
+    ``source_url`` and ``download_headers`` let the per-item fallback
+    re-download a cache-hit item if its ``file_id`` POST returns
+    transient errors — without them, a stale ``file_id`` would silently
+    lose a cached asset that the original single-item path would have
+    recovered.
    """
    media_json: dict[str, Any]
    cache_info: tuple[str, str, str | None, int] | None
    attachment: tuple[str, bytes, str, str] | None  # (name, data, filename, content_type)
+    source_url: str | None = None
+    download_headers: dict[str, str] | None = None
+
+    @property
+    def upload_bytes(self) -> int:
+        """Bytes this item contributes to a multipart sendMediaGroup payload.
+
+        Cached items (referenced by ``file_id``) contribute 0 since
+        Telegram serves them server-side without us re-uploading.
+        """
+        return len(self.attachment[1]) if self.attachment else 0


 def _truncate(text: str, limit: int, *, marker: str = "…") -> str:
@@ -302,6 +383,7 @@ class TelegramClient:
            payload["caption"] = _truncate(caption, TELEGRAM_MAX_CAPTION_LENGTH)
        if reply_to_message_id is not None:
            payload["reply_parameters"] = {"message_id": reply_to_message_id}
+        _apply_send_opts_to_payload(payload)
        try:
            async with self._session.post(
                self._api_url(kind.api_method), json=payload, timeout=_API_TIMEOUT,
@@ -351,6 +433,7 @@ class TelegramClient:
                f.add_field("caption", capped_caption)
            if reply_to_message_id is not None:
                f.add_field("reply_parameters", json.dumps({"message_id": reply_to_message_id}))
+            _apply_send_opts_to_form(f)
            return f

        for attempt in range(1, _TG_429_MAX_ATTEMPTS + 1):
@@ -415,18 +498,54 @@ class TelegramClient:
        chunk_delay: int = 0,
        max_asset_data_size: int | None = None,
        send_large_photos_as_documents: bool = False,
+        send_large_videos_as_documents: bool = False,
        chat_action: str | None = "typing",
+        *,
+        disable_notification: bool = False,
+        message_thread_id: int | None = None,
    ) -> NotificationResult:
        if not assets:
            return await self.send_message(
                chat_id, caption or "", reply_to_message_id,
                disable_web_page_preview, parse_mode,
+                disable_notification=disable_notification,
+                message_thread_id=message_thread_id,
            )

        keepalive: _KeepaliveHandle | None = None
        if chat_action:
            keepalive = self.start_chat_action_keepalive(chat_id, chat_action)

+        # Bind for the whole media-send fan-out — every internal helper
+        # (_send_photo / _send_video / _send_document / _send_media_group /
+        # _post_media_group / _send_from_cache / _upload_media) reads the
+        # current value when it constructs its request payload.
+        opts = _SendOptions(
+            disable_notification=disable_notification,
+            message_thread_id=message_thread_id,
+        )
+        with _bind_send_options(opts):
+            return await self._send_notification_body(
+                chat_id, assets, caption, reply_to_message_id, parse_mode,
+                max_group_size, chunk_delay, max_asset_data_size,
+                send_large_photos_as_documents, send_large_videos_as_documents,
+                keepalive,
+            )
+
+    async def _send_notification_body(
+        self,
+        chat_id: str,
+        assets: list[dict[str, Any]],
+        caption: str | None,
+        reply_to_message_id: int | None,
+        parse_mode: str,
+        max_group_size: int,
+        chunk_delay: int,
+        max_asset_data_size: int | None,
+        send_large_photos_as_documents: bool,
+        send_large_videos_as_documents: bool,
+        keepalive: _KeepaliveHandle | None,
+    ) -> NotificationResult:
        try:
            if len(assets) == 1 and assets[0].get("type") == "photo":
                return await self._send_photo(
@@ -443,6 +562,7 @@ class TelegramClient:
                    assets[0].get("content_type"), assets[0].get("cache_key"),
                    download_headers=assets[0].get("headers"),
                    preloaded_data=assets[0].get("data"),
+                    send_large_videos_as_documents=send_large_videos_as_documents,
                )
            if len(assets) == 1 and assets[0].get("type", "document") == "document":
                url = assets[0].get("url")
@@ -465,7 +585,7 @@ class TelegramClient:
            return await self._send_media_group(
                chat_id, assets, caption, reply_to_message_id, max_group_size,
                chunk_delay, parse_mode, max_asset_data_size,
-                send_large_photos_as_documents,
+                send_large_photos_as_documents, send_large_videos_as_documents,
            )
        finally:
            await self.stop_keepalive(keepalive)
@@ -477,6 +597,9 @@ class TelegramClient:
        reply_to_message_id: int | None = None,
        disable_web_page_preview: bool | None = None,
        parse_mode: str = "HTML",
+        *,
+        disable_notification: bool = False,
+        message_thread_id: int | None = None,
    ) -> NotificationResult:
        if not text:
            _LOGGER.warning("send_message called with empty text — using placeholder")
@@ -490,7 +613,19 @@ class TelegramClient:
            payload["reply_parameters"] = {"message_id": reply_to_message_id}
        if disable_web_page_preview:
            payload["link_preview_options"] = {"is_disabled": True}
+        # sendMessage is a leaf call — its kwargs go straight into the
+        # JSON body. The ContextVar pattern is reserved for the deeper
+        # media paths (``_upload_media`` / ``_post_media_group`` /
+        # ``_send_from_cache``) that can't easily plumb kwargs through.
+        if disable_notification:
+            payload["disable_notification"] = True
+        if message_thread_id is not None:
+            payload["message_thread_id"] = message_thread_id
+        return await self._post_send_message(payload)

+    async def _post_send_message(
+        self, payload: dict[str, Any],
+    ) -> NotificationResult:
        url = self._api_url("sendMessage")
        try:
            async with self._session.post(url, json=payload, timeout=_API_TIMEOUT) as response:
@@ -651,6 +786,7 @@ class TelegramClient:
        max_asset_data_size: int | None = None, content_type: str | None = None,
        cache_key: str | None = None, download_headers: dict[str, str] | None = None,
        preloaded_data: bytes | None = None,
+        send_large_videos_as_documents: bool = False,
    ) -> NotificationResult:
        if not url:
            return {"success": False, "error": "Missing 'url' for video"}
@@ -672,6 +808,18 @@ class TelegramClient:
        if max_asset_data_size is not None and len(data) > max_asset_data_size:
            return {"success": False, "error": "Video exceeds size limit", "skipped": True}
        if len(data) > TELEGRAM_MAX_VIDEO_SIZE:
+            # sendVideo hard-caps at 50 MB. When the operator opts in,
+            # send the bytes as a document instead of silently dropping
+            # the asset: no inline playback, but it is delivered. Note the
+            # 2 GB document limit needs a local Bot API server (cloud: 50 MB).
+            if send_large_videos_as_documents:
+                filename = url.split("/")[-1].split("?")[0] or "video.mp4"
+                if "." not in filename:
+                    filename = "video.mp4"
+                return await self._send_document(
+                    chat_id, data, filename, caption, reply_to_message_id,
+                    parse_mode, url, content_type, cache_key,
+                )
            return {
                "success": False,
                "error": f"Video exceeds Telegram's {TELEGRAM_MAX_VIDEO_SIZE // (1024*1024)} MB limit",
@@ -723,6 +871,7 @@ class TelegramClient:
        caption: str | None = None, reply_to_message_id: int | None = None,
        max_group_size: int = 10, chunk_delay: int = 0, parse_mode: str = "HTML",
        max_asset_data_size: int | None = None, send_large_photos_as_documents: bool = False,
+        send_large_videos_as_documents: bool = False,
    ) -> NotificationResult:
        # Telegram rejects mixed photo/video + document in a single
        # sendMediaGroup. Split before chunking so a malformed input
@@ -730,75 +879,293 @@ class TelegramClient:
        partitions = self._partition_media_by_kind(assets)

        all_message_ids: list[int] = []
-        first_chunk_overall = True
+        errors: list[dict[str, Any]] = []
+        delivered = 0
+        skipped = 0
+        failed = 0
+        first_send = True
+        # Oversized videos that the operator wants delivered as
+        # documents. Sent after all media-group chunks finish so
+        # they ride out on their own (Telegram refuses to mix
+        # documents with photo/video in one group).
+        deferred_documents: list[_MediaItem] = []
+        # Caption + reply_to are "spent" on the first send attempt,
+        # mirroring the prior contract. If that first attempt fails
+        # entirely, they're lost — same as before. Tracking these as
+        # standalone flags (rather than deriving from ``chunk_idx==0``)
+        # keeps the semantics right across multiple partitions.
+        caption_pending = bool(caption)
+        reply_pending = reply_to_message_id is not None
+
+        async def maybe_delay() -> None:
+            nonlocal first_send
+            if not first_send and chunk_delay > 0:
+                await asyncio.sleep(chunk_delay / 1000)
+            first_send = False
+
        for partition in partitions:
            chunks = [
                partition[i:i + max_group_size]
                for i in range(0, len(partition), max_group_size)
            ]
            for chunk_idx, chunk in enumerate(chunks):
-                if not first_chunk_overall and chunk_delay > 0:
-                    await asyncio.sleep(chunk_delay / 1000)
-
-                # Single-item chunk → use the simpler send_photo/video path.
-                if len(chunk) == 1:
-                    item = chunk[0]
-                    chunk_caption = caption if first_chunk_overall else None
-                    chunk_reply = reply_to_message_id if first_chunk_overall else None
-                    if item.get("type") == "photo":
-                        result = await self._send_photo(
-                            chat_id, item.get("url"), chunk_caption, chunk_reply, parse_mode,
-                            max_asset_data_size, send_large_photos_as_documents,
-                            item.get("content_type"), item.get("cache_key"),
-                            download_headers=item.get("headers"),
-                            preloaded_data=item.get("data"),
-                        )
-                    elif item.get("type") == "video":
-                        result = await self._send_video(
-                            chat_id, item.get("url"), chunk_caption, chunk_reply, parse_mode,
-                            max_asset_data_size,
-                            item.get("content_type"), item.get("cache_key"),
-                            download_headers=item.get("headers"),
-                            preloaded_data=item.get("data"),
-                        )
-                    else:
-                        first_chunk_overall = False
-                        continue
-                    first_chunk_overall = False
-                    if not result.get("success"):
-                        result["failed_at_chunk"] = chunk_idx + 1
-                        return result
-                    if result.get("message_id") is not None:
-                        all_message_ids.append(result["message_id"])
-                    continue
-
-                items = await self._build_media_items(
-                    chunk, max_asset_data_size, caption if first_chunk_overall else None,
-                    parse_mode,
+                # Fetch + filter the parent chunk. Skipped items
+                # (oversized, bad photo, failed download) never enter
+                # ``items`` — count them so the operator-facing result
+                # reflects what actually went out vs got dropped.
+                # Oversized videos opted into doc-fallback get
+                # deferred — they're delivered (eventually) so they
+                # don't count as skipped.
+                items, chunk_deferred = await self._build_media_items(
+                    chunk, max_asset_data_size, send_large_videos_as_documents,
                )
+                deferred_documents.extend(chunk_deferred)
+                skipped += len(chunk) - len(items) - len(chunk_deferred)
+
                if not items:
                    _LOGGER.warning(
-                        "sendMediaGroup skipped — chunk %d/%d had %d input items but 0 usable (all filtered/failed)",
+                        "sendMediaGroup: chunk %d/%d had %d input items but 0 usable",
                        chunk_idx + 1, len(chunks), len(chunk),
                    )
-                    first_chunk_overall = False
                    continue

-                chunk_msg_ids, chunk_err = await self._post_media_group(
-                    chat_id, items, reply_to_message_id if first_chunk_overall else None,
-                    chunk_idx, len(chunks),
+                # Split the chunk into sub-chunks that each fit under
+                # Telegram's per-request byte cap. Per-item filtering
+                # alone can't prevent 413s when several legal-sized
+                # items together bust the envelope.
+                sub_chunks = self._split_items_by_byte_budget(
+                    items, TELEGRAM_MAX_GROUP_TOTAL_BYTES,
                )
-                first_chunk_overall = False
-                if chunk_err is not None:
-                    return chunk_err
-                all_message_ids.extend(chunk_msg_ids)
+                if len(sub_chunks) > 1:
+                    _LOGGER.info(
+                        "sendMediaGroup: byte-budget split chunk %d/%d into %d sub-chunks",
+                        chunk_idx + 1, len(chunks), len(sub_chunks),
+                    )

-        if not all_message_ids:
-            _LOGGER.warning(
-                "sendMediaGroup completed with 0 message_ids — nothing was delivered",
+                for sub_items in sub_chunks:
+                    await maybe_delay()
+                    sub_caption = caption if caption_pending else None
+                    sub_reply = reply_to_message_id if reply_pending else None
+                    caption_pending = False
+                    reply_pending = False
+                    if sub_caption:
+                        self._attach_caption_to_first(
+                            sub_items, sub_caption, parse_mode,
+                        )
+
+                    msg_ids, err = await self._post_media_group(
+                        chat_id, sub_items, sub_reply, chunk_idx, len(chunks),
+                    )
+                    if err is None:
+                        all_message_ids.extend(msg_ids)
+                        delivered += len(sub_items)
+                        continue
+
+                    # Telegram rejected the sub-chunk after our
+                    # pre-flight passed (content / transient / rate).
+                    # Try each item as its own message so partial
+                    # delivery survives the chunk-level failure.
+                    # Record the chunk-level cause first so the
+                    # operator-visible ``errors`` list reads in
+                    # cause-then-consequence order.
+                    _LOGGER.warning(
+                        "sendMediaGroup chunk %d/%d failed (%s) — falling back to per-item",
+                        chunk_idx + 1, len(chunks), err.get("error"),
+                    )
+                    errors.append({
+                        "kind": "chunk",
+                        "chunk": chunk_idx + 1,
+                        "error": err.get("error", "unknown"),
+                        "code": err.get("error_code"),
+                    })
+                    for item_idx, item in enumerate(sub_items):
+                        item_caption = sub_caption if item_idx == 0 else None
+                        item_reply = sub_reply if item_idx == 0 else None
+                        # No ``maybe_delay()`` here: per-item retries
+                        # are a recovery path where added latency
+                        # only widens the outage window — the
+                        # individual sendPhoto/sendVideo calls have
+                        # their own 429 backoff in ``_upload_media``.
+                        item_result = await self._send_item_individually(
+                            chat_id, item, item_caption, item_reply, parse_mode,
+                        )
+                        if item_result.get("success"):
+                            delivered += 1
+                            mid = item_result.get("message_id")
+                            if mid is not None:
+                                all_message_ids.append(mid)
+                        else:
+                            failed += 1
+                            errors.append({
+                                "kind": "item",
+                                "chunk": chunk_idx + 1,
+                                "item_index": item_idx,
+                                "error": item_result.get("error", "unknown"),
+                            })
+
+        # Deferred oversized-videos-as-documents: send each on its own
+        # via sendDocument. They couldn't ride in the media group
+        # because Telegram refuses to mix documents with photo/video;
+        # sending one by one also keeps a failure from poisoning siblings.
+        for deferred in deferred_documents:
+            await maybe_delay()
+            d_caption = caption if caption_pending else None
+            d_reply = reply_to_message_id if reply_pending else None
+            caption_pending = False
+            reply_pending = False
+            d_result = await self._send_item_individually(
+                chat_id, deferred, d_caption, d_reply, parse_mode,
            )
-            return {"success": False, "error": "no_items_delivered"}
-        return {"success": True, "message_ids": all_message_ids}
+            if d_result.get("success"):
+                delivered += 1
+                mid = d_result.get("message_id")
+                if mid is not None:
+                    all_message_ids.append(mid)
+            else:
+                failed += 1
+                errors.append({
+                    "kind": "deferred_document",
+                    "error": d_result.get("error", "unknown"),
+                })
+
+        if delivered == 0:
+            if skipped > 0 and not errors:
+                msg = f"all {skipped} item(s) filtered before send"
+            elif errors:
+                msg = errors[0].get("error", "no_items_delivered")
+            else:
+                msg = "no_items_delivered"
+            _LOGGER.warning(
+                "sendMediaGroup delivered 0 items (skipped=%d failed=%d)",
+                skipped, failed,
+            )
+            return {
+                "success": False,
+                "error": msg,
+                "message_ids": [],
+                "delivered_count": 0,
+                "skipped_count": skipped,
+                "failed_count": failed,
+                "errors": errors or None,
+                "failed_at_chunk": errors[0].get("chunk") if errors else None,
+            }
+
+        return {
+            "success": True,
+            "message_ids": all_message_ids,
+            "delivered_count": delivered,
+            "skipped_count": skipped,
+            "failed_count": failed,
+            "errors": errors or None,
+        }
+
+    @staticmethod
+    def _split_items_by_byte_budget(
+        items: list[_MediaItem], max_bytes: int,
+    ) -> list[list[_MediaItem]]:
+        """Greedy-pack ``items`` into sub-chunks under ``max_bytes`` each.
+
+        Cached items (``upload_bytes == 0``) are free and never push an
+        in-budget group over the limit. A single item that on its own
+        exceeds the budget is placed alone, letting Telegram return a
+        precise error rather than dropping it silently. Order is
+        preserved so caption attachment stays deterministic.
+        """
+        if not items:
+            return []
+        groups: list[list[_MediaItem]] = []
+        current: list[_MediaItem] = []
+        current_size = 0
+        for item in items:
+            cost = item.upload_bytes
+            if current and current_size + cost > max_bytes:
+                groups.append(current)
+                current = []
+                current_size = 0
+            current.append(item)
+            current_size += cost
+        if current:
+            groups.append(current)
+        return groups
+
+    @staticmethod
+    def _attach_caption_to_first(
+        items: list[_MediaItem], caption: str, parse_mode: str,
+    ) -> None:
+        """Inject caption + parse_mode into the first item's media_json.
+
+        Telegram displays the caption of the first media-group item; the
+        rest are ignored. Idempotent — re-attaching simply overwrites.
+        """
+        if not items:
+            return
+        items[0].media_json["caption"] = _truncate(caption, TELEGRAM_MAX_CAPTION_LENGTH)
+        items[0].media_json["parse_mode"] = parse_mode
+
+    async def _send_item_individually(
+        self, chat_id: str, item: _MediaItem,
+        caption: str | None, reply_to_message_id: int | None,
+        parse_mode: str,
+    ) -> NotificationResult:
+        """Send one ``_MediaItem`` as a standalone sendPhoto/sendVideo/sendDocument.
+
+        Used as the per-item fallback when sendMediaGroup itself
+        rejects a sub-chunk after pre-flight passed. Reuses already-
+        fetched bytes for fresh items; for cache-hit items that fail
+        the file_id POST, re-downloads from ``source_url`` so a stale
+        ``file_id`` doesn't silently lose an asset — the original
+        single-item path does the same recovery.
+        """
+        media_type = item.media_json.get("type") or "photo"
+        if media_type == "photo":
+            kind = _PHOTO_KIND
+        elif media_type == "video":
+            kind = _VIDEO_KIND
+        else:
+            kind = _DOCUMENT_KIND
+
+        cache: TelegramFileCache | None = None
+        cache_key: str | None = None
+        thumbhash: str | None = None
+        if item.cache_info is not None:
+            ck, _ck_type, ck_thumb, _ck_size = item.cache_info
+            cache = self._get_cache_for_key(ck)
+            cache_key = ck
+            thumbhash = ck_thumb
+
+        # Cached items have no attachment bytes — POST the file_id
+        # reference first; if that fails transiently, re-download via
+        # source_url and upload fresh. This matches what _send_photo /
+        # _send_video do for their cache path.
+        if item.attachment is None:
+            file_id = item.media_json.get("media", "")
+            if file_id and not file_id.startswith("attach://"):
+                cached_result = await self._send_from_cache(
+                    kind, chat_id, file_id, caption, reply_to_message_id, parse_mode,
+                )
+                if cached_result is not None:
+                    return cached_result
+
+            if not item.source_url:
+                return {"success": False, "error": "Cached fallback send failed (no source URL)"}
+            data, err = await self._safe_get(
+                self._resolve_url(item.source_url), item.download_headers,
+            )
+            if data is None:
+                return {"success": False, "error": f"Re-download failed: {err}"}
+            return await self._upload_media(
+                kind, chat_id, data,
+                kind.default_filename, kind.default_content_type,
+                caption, reply_to_message_id, parse_mode,
+                cache, cache_key, thumbhash,
+            )
+
+        _, data, filename, content_type = item.attachment
+        return await self._upload_media(
+            kind, chat_id, data, filename, content_type,
+            caption, reply_to_message_id, parse_mode,
+            cache, cache_key, thumbhash,
+        )

    @staticmethod
    def _partition_media_by_kind(
@@ -830,23 +1197,40 @@ class TelegramClient:
        self,
        chunk: list[dict[str, Any]],
        max_asset_data_size: int | None,
-        first_caption: str | None,
-        parse_mode: str,
-    ) -> list[_MediaItem]:
+        send_large_videos_as_documents: bool = False,
+    ) -> tuple[list[_MediaItem], list[_MediaItem]]:
        """Fetch + filter a chunk and return aligned media-group items.

+        Returns ``(items, deferred_documents)`` — ``items`` go into
+        sendMediaGroup, ``deferred_documents`` are oversized videos
+        retagged as documents (when the caller opted in) that will be
+        sent individually via ``_send_item_individually`` *after* the
+        group sends. Telegram rejects mixing documents with photo/video
+        in one group, so they have to ride out separately.
+
        Concurrency is bounded by ``_MEDIA_FETCH_CONCURRENCY`` so peak
        memory stays predictable. Per-fetch exceptions are isolated via
        ``return_exceptions=True`` so a single failed download cannot
        cancel its peers.
+
+        Caption injection is intentionally NOT performed here — callers
+        attach the caption after byte-budget sub-splitting so it lands
+        on the first item of the first delivered sub-chunk.
        """
        sem = asyncio.Semaphore(_MEDIA_FETCH_CONCURRENCY)

-        async def fetch(idx: int, item: dict[str, Any]) -> tuple[int, dict | None, bytes | None]:
+        async def fetch(
+            idx: int, item: dict[str, Any],
+        ) -> tuple[int, dict | None, bytes | None, bool]:
+            """Returns ``(idx, cached_entry, data, defer_as_document)``.
+
+            ``defer_as_document=True`` signals "video bytes valid but
+            too big for sendVideo — caller should send as document".
+            """
            url = item.get("url")
            if not url:
                _LOGGER.warning("Media skipped: missing url (idx=%d type=%s)", idx, item.get("type"))
-                return idx, None, None
+                return idx, None, None, False
            media_type = item.get("type", "photo")
            custom_cache_key = item.get("cache_key")

@@ -860,7 +1244,7 @@ class TelegramClient:
            )
            cached = item_cache.get(ck, thumbhash=item_thumbhash) if item_cache else None
            if cached and cached.get("file_id"):
-                return idx, cached, None
+                return idx, cached, None, False

            preloaded = item.get("data")
            data: bytes | None
@@ -874,34 +1258,40 @@ class TelegramClient:
                        "Media skipped: download failed (idx=%d type=%s): %s",
                        idx, media_type, err,
                    )
-                    return idx, None, None
+                    return idx, None, None, False

            if max_asset_data_size and len(data) > max_asset_data_size:
                _LOGGER.warning(
                    "Media skipped: size %d exceeds max_asset_data_size %d (idx=%d type=%s)",
                    len(data), max_asset_data_size, idx, media_type,
                )
-                return idx, None, None
+                return idx, None, None, False
            if media_type == "video" and len(data) > TELEGRAM_MAX_VIDEO_SIZE:
+                if send_large_videos_as_documents:
+                    _LOGGER.info(
+                        "Video %d bytes over Telegram limit (idx=%d) — deferring as document",
+                        len(data), idx,
+                    )
+                    return idx, None, data, True
                _LOGGER.warning(
                    "Media skipped: video %d bytes exceeds Telegram limit %d (idx=%d)",
                    len(data), TELEGRAM_MAX_VIDEO_SIZE, idx,
                )
-                return idx, None, None
+                return idx, None, None, False
            if media_type == "photo":
                exceeds, reason, _, _ = check_photo_limits(data)
                if exceeds:
                    _LOGGER.warning(
                        "Media skipped: photo %s (idx=%d)", reason, idx,
                    )
-                    return idx, None, None
-            return idx, None, data
+                    return idx, None, None, False
+            return idx, None, data, False

        raw = await asyncio.gather(
            *(fetch(i, item) for i, item in enumerate(chunk)),
            return_exceptions=True,
        )
-        results: list[tuple[int, dict | None, bytes | None]] = []
+        results: list[tuple[int, dict | None, bytes | None, bool]] = []
        for entry in raw:
            if isinstance(entry, Exception):
                _LOGGER.warning("Media fetch raised: %s", redact_exc(entry))
@@ -909,8 +1299,9 @@ class TelegramClient:
            results.append(entry)

        items: list[_MediaItem] = []
+        deferred_documents: list[_MediaItem] = []
        upload_idx = 0
-        for idx, cached_entry, data in results:
+        for idx, cached_entry, data, defer_as_document in results:
            item = chunk[idx]
            url = item.get("url")
            if not url:
@@ -918,6 +1309,35 @@ class TelegramClient:
            media_type = item.get("type") or "photo"
            custom_cache_key = item.get("cache_key")

+            # Deferred videos-as-documents are NEVER cache hits (the
+            # cache lookup branch returns early before the size check),
+            # so we always have fresh bytes here. Retag the
+            # media_json so ``_send_item_individually`` routes via
+            # ``_DOCUMENT_KIND`` to /sendDocument.
+            if defer_as_document and data is not None:
+                ct = item.get("content_type") or "video/mp4"
+                # Best-effort filename preserves the original
+                # extension so Telegram clients give it a sensible
+                # icon and the recipient can re-open it.
+                fname = url.split("/")[-1].split("?")[0] or "video.mp4"
+                if "." not in fname:
+                    fname = "video.mp4"
+                ck = custom_cache_key or extract_asset_id_from_url(url) or url
+                ck_is_asset = is_asset_cache_key(ck)
+                bare_ck = asset_id_from_cache_key(ck) if ck_is_asset else ck
+                th = (
+                    self._thumbhash_resolver(bare_ck)
+                    if ck_is_asset and self._thumbhash_resolver else None
+                )
+                deferred_documents.append(_MediaItem(
+                    media_json={"type": "document", "media": "attach://deferred"},
+                    cache_info=(ck, "document", th, len(data)),
+                    attachment=("deferred", data, fname, ct),
+                    source_url=url,
+                    download_headers=item.get("headers"),
+                ))
+                continue
+
            if cached_entry and cached_entry.get("file_id"):
                mij: dict[str, Any] = {"type": media_type, "media": cached_entry["file_id"]}
                cache_info: tuple[str, str, str | None, int] | None = None
@@ -940,14 +1360,14 @@ class TelegramClient:
            else:
                continue

-            if first_caption and not items:
-                # Only the first usable item in the first chunk receives
-                # the caption, per Telegram's media-group semantics.
-                mij["caption"] = _truncate(first_caption, TELEGRAM_MAX_CAPTION_LENGTH)
-                mij["parse_mode"] = parse_mode
-
-            items.append(_MediaItem(media_json=mij, cache_info=cache_info, attachment=attachment))
-        return items
+            items.append(_MediaItem(
+                media_json=mij,
+                cache_info=cache_info,
+                attachment=attachment,
+                source_url=url,
+                download_headers=item.get("headers"),
+            ))
+        return items, deferred_documents

    async def _post_media_group(
        self,
@@ -973,6 +1393,7 @@ class TelegramClient:
            for name, payload, filename, ct in attachments:
                f.add_field(name, payload, filename=filename, content_type=ct)
            f.add_field("media", json.dumps(media_json))
+            _apply_send_opts_to_form(f)
            return f

        for attempt in range(1, _TG_429_MAX_ATTEMPTS + 1):
@@ -13,6 +13,11 @@ _LOGGER = logging.getLogger(__name__)
 TELEGRAM_API_BASE_URL: Final = "https://api.telegram.org/bot"
 TELEGRAM_MAX_PHOTO_SIZE: Final = 10 * 1024 * 1024  # 10 MB
 TELEGRAM_MAX_VIDEO_SIZE: Final = 50 * 1024 * 1024  # 50 MB
+# Telegram's sendMediaGroup envelope tops out near 50 MB total (multipart
+# bytes including form overhead). 45 MB keeps a safety margin so we don't
+# eat 413s when the per-item budget admits items that, summed, would
+# bust Telegram's request cap.
+TELEGRAM_MAX_GROUP_TOTAL_BYTES: Final = 45 * 1024 * 1024  # 45 MB
 TELEGRAM_MAX_DIMENSION_SUM: Final = 10000
 # Telegram message-text limit (sendMessage) and caption limit
 # (sendPhoto/sendVideo/sendDocument/first item of sendMediaGroup).
@@ -126,36 +131,6 @@ def build_telegram_asset_entry(
    return entry


-def split_media_by_upload_size(
-    media_items: list[tuple], max_upload_size: int
-) -> list[list[tuple]]:
-    """Split media items into sub-groups respecting upload size limit."""
-    if not media_items:
-        return []
-
-    groups: list[list[tuple]] = []
-    current_group: list[tuple] = []
-    current_size = 0
-
-    for item in media_items:
-        media_ref = item[1]
-        is_cached = item[4]
-        item_size = 0 if is_cached else (len(media_ref) if isinstance(media_ref, bytes) else 0)
-
-        if current_group and current_size + item_size > max_upload_size:
-            groups.append(current_group)
-            current_group = []
-            current_size = 0
-
-        current_group.append(item)
-        current_size += item_size
-
-    if current_group:
-        groups.append(current_group)
-
-    return groups
-
-
 def check_photo_limits(
    data: bytes,
 ) -> tuple[bool, str | None, int | None, int | None]:
@@ -315,6 +315,63 @@ async def clear_telegram_cache(
    return result


+class DiagnosticActivateBody(BaseModel):
+    module: str
+    duration_minutes: int = 30
+
+
+@router.get("/diagnostic-mode")
+async def list_diagnostic_overrides(
+    user: User = Depends(require_admin),
+):
+    """List currently-active temporary DEBUG overrides + their countdown.
+
+    Drives the dashboard panel that lets admins toggle a module to DEBUG
+    for a bounded window with auto-revert.
+    """
+    from ..services.diagnostic_mode import list_active
+    return {"active": list_active()}
+
+
+@router.post("/diagnostic-mode")
+async def activate_diagnostic_override(
+    body: DiagnosticActivateBody,
+    user: User = Depends(require_admin),
+):
+    """Flip ``module`` to DEBUG and schedule an auto-revert.
+
+    Re-activating an already-active module replaces the prior schedule.
+    Returns the new entry shape so the UI can render countdown without
+    a follow-up GET. The service module reads the current ``log_levels``
+    setting at activation and at revert so an admin who edits overrides
+    mid-window doesn't see a stale baseline restored.
+    """
+    from ..services.diagnostic_mode import set_diagnostic
+    try:
+        entry = await set_diagnostic(body.module, body.duration_minutes)
+    except ValueError as err:
+        raise HTTPException(status_code=400, detail=str(err)) from err
+    return entry
+
+
+@router.delete("/diagnostic-mode/{module:path}")
+async def revert_diagnostic_override(
+    module: str,
+    user: User = Depends(require_admin),
+):
+    """Manually revert a single module before its window ends.
+
+    Returns 404 when no override was active so the caller can fall through
+    to a friendly "nothing to revert" UX without parsing booleans.
+    """
+    from ..services.diagnostic_mode import revert_diagnostic
+    if not await revert_diagnostic(module):
+        raise HTTPException(
+            status_code=404, detail=f"No active override for {module!r}",
+        )
+    return {"reverted": module}
+
+
@router.get("/locales")
 async def get_supported_locales(
    user: User = Depends(get_current_user),
@@ -13,6 +13,7 @@ from jinja2.sandbox import SandboxedEnvironment
 from sqlmodel import select
 from sqlmodel.ext.asyncio.session import AsyncSession

+from notify_bridge_core.log_context import enrich_details_with_correlation
 from notify_bridge_core.notifications.telegram.client import TelegramClient
 from ..database.engine import get_engine
 from ..database.models import (
@@ -347,7 +348,7 @@ async def _log_command_event(
                collection_id=str(chat_id),
                collection_name=_format_command_subject(cmd, args),
                assets_count=media_total,
-                details=details,
+                details=enrich_details_with_correlation(details),
            ))
            await session.commit()
    except Exception:  # noqa: BLE001 — diagnostic only, never block reply
@@ -1,6 +1,7 @@
 """Notify Bridge Server — FastAPI application entry point."""

 import logging
+import uuid
 from contextlib import asynccontextmanager

 from fastapi import FastAPI
@@ -8,6 +9,11 @@ from fastapi.middleware.cors import CORSMiddleware
 from slowapi import _rate_limit_exceeded_handler
 from slowapi.errors import RateLimitExceeded
 from slowapi.middleware import SlowAPIMiddleware
+from starlette.middleware.base import BaseHTTPMiddleware, RequestResponseEndpoint
+from starlette.requests import Request as StarletteRequest
+from starlette.responses import Response as StarletteResponse
+
+from notify_bridge_core.log_context import bind_log_context

 from .config import settings as _log_cfg
 from .logging_setup import setup_logging
@@ -163,6 +169,16 @@ async def lifespan(app: FastAPI):
    _READY = False
    from .services.ha_subscription import stop_all as stop_ha_subscriptions
    await stop_ha_subscriptions()
+    # Restore the DB-configured baseline level for any temporary DEBUG
+    # overrides before the engine is disposed. Overrides live only in
+    # memory and setup_logging() resets levels at boot, so nothing can
+    # leak into the next process; reverting explicitly here just keeps
+    # in-place restarts (hot-reload dev loops) tidy.
+    from .services.diagnostic_mode import revert_all as revert_diagnostics
+    try:
+        await revert_diagnostics()
+    except Exception:  # pragma: no cover — never block shutdown on this.
+        _LOGGER.exception("Failed to revert diagnostic overrides during shutdown")
    scheduler = get_scheduler()
    if scheduler.running:
        scheduler.shutdown(wait=True)
@@ -178,9 +194,55 @@ _APP_VERSION = _resolve_version()
 app = FastAPI(title="Notify Bridge", version=_APP_VERSION, lifespan=lifespan)

 # --- Security headers ---
-from starlette.middleware.base import BaseHTTPMiddleware
-from starlette.requests import Request as StarletteRequest
-from starlette.responses import Response as StarletteResponse
+
+
+# Bounded character set for accepted inbound X-Request-Id values. Anything
+# outside this is replaced with a server-generated id so a malicious header
+# can't smuggle CR/LF into log lines or break grep-by-field parsing.
+# ``:`` is intentionally excluded so an inbound value can't masquerade as a
+# server-minted ``disp:<hex>`` / ``req:<hex>`` id and confuse operator greps.
+_REQUEST_ID_MAX_LEN = 64
+_REQUEST_ID_ALLOWED = set(
+    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
+)
+
+
+def _normalize_request_id(raw: str | None) -> str:
+    if not raw:
+        return f"req:{uuid.uuid4().hex[:12]}"
+    raw = raw.strip()
+    if not raw or len(raw) > _REQUEST_ID_MAX_LEN:
+        return f"req:{uuid.uuid4().hex[:12]}"
+    if not all(c in _REQUEST_ID_ALLOWED for c in raw):
+        return f"req:{uuid.uuid4().hex[:12]}"
+    return raw
+
+
+class RequestContextMiddleware(BaseHTTPMiddleware):
+    """Bind a per-request ``request_id`` ContextVar and echo it back.
+
+    Reads ``X-Request-Id`` from the inbound request (so an upstream proxy
+    with its own correlation system can propagate its id), falling back to
+    a short random ``req:<12 hex>`` value. Always sets the same id on the
+    response ``X-Request-Id`` header so the SPA can surface it for
+    operator-friendly bug reports.
+
+    Bound via :func:`bind_log_context` so the id appears on every log line
+    emitted during request handling (``[req=...]``) and is picked up by
+    :func:`notify_bridge_core.log_context.enrich_details_with_correlation`
+    when an ``EventLog`` row is written during the same request.
+    """
+
+    async def dispatch(
+        self,
+        request: StarletteRequest,
+        call_next: RequestResponseEndpoint,
+    ) -> StarletteResponse:
+        req_id = _normalize_request_id(request.headers.get("x-request-id"))
+        with bind_log_context(request_id=req_id):
+            response: StarletteResponse = await call_next(request)
+        response.headers["X-Request-Id"] = req_id
+        return response


 _CSP = (
@@ -238,6 +300,12 @@ app.add_middleware(
    allow_headers=["*"],
 )

+# Request-ID middleware is added LAST so it becomes the outermost wrapper —
+# every other middleware (CORS, rate limit, security headers) then logs with
+# the request_id already bound, and CORS preflight responses also carry the
+# X-Request-Id echo header.
+app.add_middleware(RequestContextMiddleware)
+
 # Register routes — static paths before parameterized
 app.include_router(auth_router)
 app.include_router(template_vars_router)
@@ -9,6 +9,11 @@ from typing import Any
 from sqlmodel import select
 from sqlmodel.ext.asyncio.session import AsyncSession

+from notify_bridge_core.log_context import (
+    bind_log_context,
+    ensure_dispatch_id,
+    enrich_details_with_correlation,
+)
 from notify_bridge_core.providers.action_executor import ActionResult

 from ..database.engine import get_engine
@@ -27,6 +32,15 @@ async def run_action(
    action_id: int, *, trigger: str = "scheduled"
 ) -> ActionResult:
    """Load an action from DB, execute it, and save the execution log."""
+    # One dispatch_id per action run so the EventLog row (and any inner log
+    # lines emitted by the action executor) share a correlation id.
+    with bind_log_context(dispatch_id=ensure_dispatch_id()):
+        return await _run_action_impl(action_id, trigger=trigger)
+
+
+async def _run_action_impl(
+    action_id: int, *, trigger: str = "scheduled"
+) -> ActionResult:
    engine = get_engine()

    # ------------------------------------------------------------------
@@ -142,7 +156,7 @@ async def run_action(
                    # without a separate action_name renderer.
                    collection_name=action.name,
                    assets_count=action_result.total_items_affected,
-                    details={
+                    details=enrich_details_with_correlation({
                        "action_type": action.action_type,
                        "trigger": trigger,
                        "rules_processed": action_result.rules_processed,
@@ -150,7 +164,7 @@ async def run_action(
                        "rules_failed": action_result.rules_failed,
                        "error": action_result.error or "",
                        "execution_id": execution_id,
-                    },
+                    }),
                ))

        await session.commit()
@@ -33,6 +33,11 @@ from sqlalchemy.orm.attributes import flag_modified
 from sqlmodel import select
 from sqlmodel.ext.asyncio.session import AsyncSession

+from notify_bridge_core.log_context import (
+    bind_log_context,
+    ensure_dispatch_id,
+    enrich_details_with_correlation,
+)
 from notify_bridge_core.models.events import EventType, ServiceEvent
 from notify_bridge_core.models.media import MediaAsset, MediaType
 from notify_bridge_core.notifications.dispatcher import (
@@ -56,6 +61,7 @@ from .dispatch_helpers import (
    load_link_data,
    resolve_provider_credential,
 )
+from .dispatch_summary import summarize_dispatch_results

 _LOGGER = logging.getLogger(__name__)

@@ -616,12 +622,12 @@ async def _mark_dropped(
        collection_name=payload.get("collection_name", ""),
        assets_count=int(payload.get("added_count", 0))
            or int(payload.get("removed_count", 0)),
-        details={
+        details=enrich_details_with_correlation({
            "dispatch_status": "deferred_then_dropped",
            "reason": reason,
            "original_event_log_id": row.event_log_id,
            "provider_type": payload.get("provider_type", ""),
-        },
+        }),
    ))


@@ -644,6 +650,28 @@ async def _process_row(
    entry produces its own target_config so a broadcast deferred row fans
    out to all current children at drain time.
    """
+    # Bind a fresh dispatch_id per drained row so the EventLog rows written
+    # by the success/drop paths AND the inner dispatcher's log lines share
+    # one id. Each deferred row is a logically separate dispatch attempt.
+    with bind_log_context(dispatch_id=ensure_dispatch_id()):
+        await _process_row_impl(
+            session, row, tracker, provider_id, provider_name,
+            provider_config, app_tz, link_by_id, dispatcher, stats,
+        )
+
+
+async def _process_row_impl(
+    session: AsyncSession,
+    row: DeferredDispatch,
+    tracker: NotificationTracker,
+    provider_id: int,
+    provider_name: str,
+    provider_config: dict[str, Any],
+    app_tz: str,
+    link_by_id: dict[int, list[dict[str, Any]]],
+    dispatcher: NotificationDispatcher,
+    stats: dict[str, int],
+) -> None:
    expanded = link_by_id.get(row.link_id)
    if not expanded:
        # Link removed/disabled between defer and drain.
@@ -735,6 +763,8 @@ async def _process_row(
    row.fired_at = datetime.now(timezone.utc)
    session.add(row)

+    summary = summarize_dispatch_results(results)
+
    if success:
        stats["fired"] += 1
        session.add(EventLog(
@@ -747,14 +777,15 @@ async def _process_row(
            collection_id=row.collection_id,
            collection_name=event.collection_name,
            assets_count=event.added_count or event.removed_count or 0,
-            details={
+            details=enrich_details_with_correlation({
                "dispatch_status": "delivered_after_quiet_hours",
                "original_event_log_id": row.event_log_id,
                "deferred_for_seconds": int(
                    (row.fired_at - row.created_at).total_seconds()
                ),
                "provider_type": event.provider_type.value,
-            },
+                "dispatch_summary": summary,
+            }),
        ))
    else:
        stats["dropped"] += 1
@@ -769,12 +800,13 @@ async def _process_row(
            collection_id=row.collection_id,
            collection_name=event.collection_name,
            assets_count=event.added_count or event.removed_count or 0,
-            details={
+            details=enrich_details_with_correlation({
                "dispatch_status": "deferred_then_failed",
                "reason": str(first_err)[:200],
                "original_event_log_id": row.event_log_id,
                "provider_type": event.provider_type.value,
-            },
+                "dispatch_summary": summary,
+            }),
        ))


@@ -0,0 +1,381 @@
+"""Temporary per-module DEBUG overrides with auto-revert.
+
+The runtime ``apply_log_levels()`` API in ``logging_setup`` already lets
+admins flip a module to DEBUG, but the existing path requires editing the
+``log_levels`` DB setting and remembering to revert it. Operators end up
+either forgetting (leaving DEBUG-flooded logs in production) or never
+turning it on (debugging through stderr only).
+
+This module gives the dashboard a cheap toggle: "give me DEBUG for
+``notify_bridge_core.notifications.telegram.client`` for 30 minutes" —
+apply immediately, schedule a one-shot job at ``now + 30 min`` that
+reverts to whatever level that module would normally have under the
+current DB-configured ``log_levels``.
+
+State is in-memory only. A server restart wipes every active override,
+which is the right semantic: ``setup_logging`` re-applies the
+DB-configured baseline at boot, so a forgotten override can never
+silently carry across a deploy. The lifespan shutdown also calls
+:func:`revert_all` to cleanly restore baselines before the process
+exits — useful for hot-reload dev loops where the server restarts in
+place.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+from dataclasses import dataclass
+from datetime import datetime, timedelta, timezone
+from typing import Any
+
+from sqlmodel.ext.asyncio.session import AsyncSession
+
+from ..database.engine import get_engine
+from ..logging_setup import (
+    _NOISY_LIBRARY_DEFAULTS,
+    parse_level_overrides,
+)
+
+_LOGGER = logging.getLogger(__name__)
+
+# Limits picked to match what "an operator clicked this button" looks like.
+# One minute is enough to reproduce a single failing dispatch; four hours is
+# long enough for a slow-rolling incident without risking a forgotten
+# override outliving a workday.
+_MIN_DURATION_MINUTES = 1
+_MAX_DURATION_MINUTES = 240
+
+# Allowlist of module namespaces an operator can flip. It catches typos
+# and blocks ``""`` (root): flipping the root logger to DEBUG floods
+# stderr with output from every dependency (boto3, jinja2, ...), which
+# the operator probably didn't want. A name that equals a prefix or sits
+# under it is accepted; anything else is rejected with a 400.
+_ALLOWED_PREFIXES = (
+    "notify_bridge_core",
+    "notify_bridge_server",
+    "sqlalchemy",
+    "aiohttp",
+    "apscheduler",
+    "urllib3",
+    "httpx",
+    "httpcore",
+    "asyncio",
+    "PIL",
+    "uvicorn",
+    "starlette",
+    "fastapi",
+)
+
+
+@dataclass(frozen=True)
+class _Override:
+    """One active DEBUG override.
+
+    ``baseline_level`` is what the module had at activation time — used
+    for the dashboard's "→ WARNING" display. The actual revert path
+    re-reads the current DB-configured ``log_levels`` so a setting change
+    made *while* the override is active is honored at expiry.
+    """
+
+    module: str
+    baseline_level: str
+    activated_at: datetime
+    expires_at: datetime
+
+
+# Module name → active override. Mutated only from the asyncio thread.
+_active: dict[str, _Override] = {}
+
+# Strong references for background tasks created via the asyncio-timer
+# fallback path. CPython's event loop holds only weak refs, so a task
+# without an external retainer can be GC'd before it fires. Tasks are
+# discarded automatically when they complete.
+_bg_tasks: set[asyncio.Task[None]] = set()
+
+
+def _is_allowed(module: str) -> bool:
+    if not module:
+        return False
+    return any(module == p or module.startswith(p + ".") for p in _ALLOWED_PREFIXES)
+
+
+def _normalize_level_name(lvl: int) -> str:
+    """Return a canonical string for a logging level code."""
+    name = logging.getLevelName(lvl)
+    if isinstance(name, str) and name and not name.startswith("Level "):
+        return name
+    return "INFO"
+
+
+def _walk_dotted(name: str) -> list[str]:
+    """Return ``name`` followed by progressively shorter dotted prefixes.
+
+    ``"sqlalchemy.engine.Engine"`` →
+    ``["sqlalchemy.engine.Engine", "sqlalchemy.engine", "sqlalchemy"]``.
+    Mirrors Python's logger-hierarchy traversal so a sub-logger inherits
+    its parent's override / noisy default rather than falling through to
+    the root level.
+    """
+    out = [name]
+    while "." in name:
+        name = name.rsplit(".", 1)[0]
+        out.append(name)
+    return out
+
+
+def _baseline_for(module: str, db_log_levels: str | None) -> str:
+    """The level ``module`` would have if no diagnostic override were active.
+
+    Precedence at each step of the dotted-parent walk, most specific first:
+      1. Explicit DB ``log_levels`` entry for that name.
+      2. Curated noisy-library default in ``_NOISY_LIBRARY_DEFAULTS``.
+      3. Root logger effective level, only if no name in the walk matched.
+    """
+    overrides = parse_level_overrides(db_log_levels or "")
+    for candidate in _walk_dotted(module):
+        if candidate in overrides:
+            return overrides[candidate]
+        if candidate in _NOISY_LIBRARY_DEFAULTS:
+            return _NOISY_LIBRARY_DEFAULTS[candidate]
+    root_level = logging.getLogger().getEffectiveLevel()
+    return _normalize_level_name(root_level)
+
+
+async def _read_db_log_levels() -> str:
+    """Snapshot the current ``log_levels`` setting in a short-lived session.
+
+    Called at activation AND at revert time so the revert reflects any
+    setting change made while the override was active. Best-effort: a
+    DB hiccup degrades to empty (no DB overrides), which makes the
+    revert use noisy-library defaults — safer than crashing the timer.
+    """
+    try:
+        from ..api.app_settings import get_setting
+        async with AsyncSession(get_engine()) as session:
+            return await get_setting(session, "log_levels") or ""
+    except Exception:  # noqa: BLE001
+        _LOGGER.debug(
+            "diagnostic_mode: failed to read log_levels from DB; "
+            "revert will use noisy-library defaults",
+            exc_info=True,
+        )
+        return ""
+
+
+def list_active() -> list[dict[str, Any]]:
+    """Snapshot the currently active overrides for the dashboard.
+
+    Also sweeps any entry whose ``expires_at`` is in the past — protects
+    against a scheduler misfire that left a ghost row in ``_active``.
+    """
+    now = datetime.now(timezone.utc)
+    out: list[dict[str, Any]] = []
+    expired: list[str] = []
+    for module, ov in _active.items():
+        if ov.expires_at <= now:
+            expired.append(module)
+            continue
+        out.append({
+            "module": ov.module,
+            "baseline_level": ov.baseline_level,
+            "current_level": "DEBUG",
+            "activated_at": ov.activated_at.isoformat(),
+            "expires_at": ov.expires_at.isoformat(),
+            "remaining_seconds": int((ov.expires_at - now).total_seconds()),
+        })
+    for module in expired:
+        _active.pop(module, None)
+    return out
+
+
+def is_active(module: str) -> bool:
+    ov = _active.get(module)
+    if ov is None:
+        return False
+    return ov.expires_at > datetime.now(timezone.utc)
+
+
+async def set_diagnostic(
+    module: str,
+    duration_minutes: int,
+) -> dict[str, Any]:
+    """Activate a DEBUG override for ``module`` lasting ``duration_minutes``.
+
+    Re-activating an already-active module replaces the prior schedule
+    (a second click restarts the window from now rather than stacking).
+
+    Returns the dashboard-ready dict; raises ``ValueError`` on bad input
+    so the API layer can surface a 400 with a precise message.
+    """
+    if not _is_allowed(module):
+        raise ValueError(
+            f"Module {module!r} is not in the diagnostic allowlist",
+        )
+    if not (_MIN_DURATION_MINUTES <= duration_minutes <= _MAX_DURATION_MINUTES):
+        raise ValueError(
+            f"duration_minutes must be between {_MIN_DURATION_MINUTES} and "
+            f"{_MAX_DURATION_MINUTES}",
+        )
+
+    db_log_levels = await _read_db_log_levels()
+    baseline = _baseline_for(module, db_log_levels)
+    now = datetime.now(timezone.utc)
+    expires_at = now + timedelta(minutes=duration_minutes)
+
+    # Apply DEBUG immediately. ``logging.getLogger(name).setLevel`` is the
+    # same primitive ``apply_log_levels`` uses, so the two mechanisms stay
+    # consistent.
+    logging.getLogger(module).setLevel("DEBUG")
+
+    # Replace any prior schedule for this module before recording the new one.
+    _remove_scheduled(module)
+    _active[module] = _Override(
+        module=module,
+        baseline_level=baseline,
+        activated_at=now,
+        expires_at=expires_at,
+    )
+    _schedule_revert(module, expires_at)
+
+    _LOGGER.info(
+        "Diagnostic mode: %s set to DEBUG (was %s) for %d min, expires at %s",
+        module, baseline, duration_minutes, expires_at.isoformat(),
+    )
+    return {
+        "module": module,
+        "baseline_level": baseline,
+        "current_level": "DEBUG",
+        "activated_at": now.isoformat(),
+        "expires_at": expires_at.isoformat(),
+        "remaining_seconds": int((expires_at - now).total_seconds()),
+    }
+
+
+async def revert_diagnostic(module: str) -> bool:
+    """Immediately end the override for ``module``. Returns ``False`` if
+    no override was active (so callers can return a 404)."""
+    ov = _active.pop(module, None)
+    if ov is None:
+        return False
+    _remove_scheduled(module)
+    db_log_levels = await _read_db_log_levels()
+    target = _baseline_for(module, db_log_levels)
+    logging.getLogger(module).setLevel(target)
+    _LOGGER.info(
+        "Diagnostic mode: %s reverted from DEBUG back to %s (manual)",
+        module, target,
+    )
+    return True
+
+
+async def revert_all() -> int:
+    """Revert every active override. Wired into the lifespan shutdown so a
+    server stop / hot-reload leaves the world in a clean state. Also
+    callable from a debug endpoint if we ever add one."""
+    count = 0
+    for module in list(_active.keys()):
+        if await revert_diagnostic(module):
+            count += 1
+    return count
+
+
+# ---------------------------------------------------------------------------
+# APScheduler glue — wired here so the API layer doesn't import scheduler.
+# ---------------------------------------------------------------------------
+
+_JOB_PREFIX = "diag_revert::"
+
+
+def _job_id_for(module: str) -> str:
+    return _JOB_PREFIX + module
+
+
+def _remove_scheduled(module: str) -> None:
+    """Drop a previously-scheduled revert job for ``module``, if any.
+
+    Best-effort: scheduler isn't always available in tests; a missing job
+    is the normal path on first-time activation. Logged at DEBUG so an
+    operator chasing a scheduler problem still sees the trail.
+    """
+    try:
+        from .scheduler import get_scheduler
+        scheduler = get_scheduler()
+    except Exception:  # noqa: BLE001
+        _LOGGER.debug(
+            "diagnostic_mode: scheduler not yet available for remove(%s)",
+            module, exc_info=True,
+        )
+        return
+    job_id = _job_id_for(module)
+    try:
+        scheduler.remove_job(job_id)
+    except Exception:  # noqa: BLE001 — JobLookupError or not-running.
+        _LOGGER.debug(
+            "diagnostic_mode: no prior schedule to remove for %s",
+            module, exc_info=True,
+        )
+
+
+def _schedule_revert(module: str, when: datetime) -> None:
+    """Schedule the auto-revert one-shot.
+
+    Falls back to a strongly-referenced ``asyncio`` task if the
+    APScheduler instance isn't running (tests, very early startup) so the
+    revert still happens.
+    """
+    try:
+        from .scheduler import get_scheduler
+        scheduler = get_scheduler()
+        if scheduler.running:
+            scheduler.add_job(
+                _expire_callback,
+                trigger="date",
+                run_date=when,
+                args=[module],
+                id=_job_id_for(module),
+                replace_existing=True,
+                misfire_grace_time=60,
+            )
+            return
+    except Exception:  # noqa: BLE001 — fall through to the task path.
+        _LOGGER.debug(
+            "diagnostic_mode: scheduler unavailable; using asyncio fallback",
+            exc_info=True,
+        )
+
+    # Fallback: in-process timer. Retain the task in a module-level set so
+    # CPython doesn't GC it before the timer fires.
+    delay = max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
+
+    async def _wait_and_expire() -> None:
+        try:
+            await asyncio.sleep(delay)
+        except asyncio.CancelledError:
+            return
+        await _expire_callback(module)
+
+    try:
+        loop = asyncio.get_running_loop()
+    except RuntimeError:
+        return
+    task = loop.create_task(_wait_and_expire())
+    _bg_tasks.add(task)
+    task.add_done_callback(_bg_tasks.discard)
+
+
+async def _expire_callback(module: str) -> None:
+    """Fired by the scheduler at ``expires_at``. Re-applies the baseline.
+
+    Re-reads ``log_levels`` from the DB so a setting change made while
+    the window was active is honored at revert time (instead of using a
+    stale snapshot taken at activation).
+    """
+    ov = _active.pop(module, None)
+    db_log_levels = await _read_db_log_levels()
+    target = _baseline_for(module, db_log_levels)
+    logging.getLogger(module).setLevel(target)
+    _LOGGER.info(
+        "Diagnostic mode: %s auto-reverted from DEBUG to %s (was active=%s)",
+        module, target, ov is not None,
+    )
@@ -0,0 +1,255 @@
+"""Aggregate per-target dispatch results into an ``EventLog.details`` summary.
+
+Every dispatch site (``event_dispatch``, ``watcher``, ``deferred_dispatch``,
+``scheduled_dispatch``) calls :func:`NotificationDispatcher.dispatch` and
+gets back a ``list[dict]`` — one entry per target. Each entry has at minimum
+``success: bool`` and (on failure) ``error: str``. Telegram media-group
+sends additionally include ``delivered_count``, ``skipped_count``,
+``failed_count``, ``errors`` and ``failed_at_chunk`` so a partial delivery
+is observable from the result.
+
+Historically the dashboard only saw the per-row ``status`` derived at
+EventLog insert time — partial failures (one target out of three failed,
+two assets out of ten dropped) showed up as a generic success/failure and
+the operator had to read stderr to find the cause. This module collapses
+the per-target dicts into a small ``dispatch_summary`` block that's merged
+into ``EventLog.details`` after the dispatch completes, so the same
+information surfaces in the UI without re-reading logs.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+from typing import Any
+
+from sqlalchemy.orm.attributes import flag_modified
+from sqlmodel.ext.asyncio.session import AsyncSession
+
+from ..database.models import EventLog
+
+_LOGGER = logging.getLogger(__name__)
+
+# Bound the error list we stash on the row. A pathological dispatch (50
+# targets, 50 media items each, all failing) would otherwise bloat the
+# row past anything useful — and the dashboard renders a fixed-height
+# strip anyway. Excess entries are summarized as ``errors_truncated``.
+_MAX_ERRORS = 20
+_MAX_MEDIA_ERRORS = 20
+# Cap error message length to avoid pathological payloads in the row.
+_MAX_ERROR_MSG_LEN = 500
+# Distinct sentinel so an operator scanning the dashboard can tell our
+# clipping apart from a literal ``…`` that often appears in upstream API
+# error text (Telegram does this in some Bad Request messages).
+_TRUNCATION_MARKER = "…[truncated]"
+
+
+def _trim(value: Any) -> Any:
+    """Truncate string values to keep the persisted summary bounded."""
+    if isinstance(value, str) and len(value) > _MAX_ERROR_MSG_LEN:
+        return value[:_MAX_ERROR_MSG_LEN] + _TRUNCATION_MARKER
+    return value
+
+
+def summarize_dispatch_results(
+    results: list[dict[str, Any]],
+) -> dict[str, Any]:
+    """Aggregate per-target dispatch results into a compact summary dict.
+
+    The shape is intentionally narrow so it round-trips cleanly through
+    SQLite JSON storage and stays cheap to render in the dashboard.
+
+    Returns a dict with keys:
+
+    * ``targets_attempted`` / ``targets_succeeded`` / ``targets_failed``
+      — counts across the results list.
+    * ``errors`` — per-target failure entries
+      (``[{index, error}, ...]``), capped at ``_MAX_ERRORS``.
+    * ``media`` — present only when at least one result reports media
+      counts. ``{delivered, skipped, failed}``.
+    * ``media_errors`` — per-item / per-chunk failure entries from the
+      Telegram media-group fallback, capped at ``_MAX_MEDIA_ERRORS``.
+    * ``errors_truncated`` / ``media_errors_truncated`` — count of dropped
+      entries when the corresponding cap was hit. Present only when > 0.
+
+    Input shape: each entry is what ``NotificationDispatcher._aggregate_results``
+    returns for one target — ``{success, receivers, successes, failures,
+    results: [per-receiver, ...], errors?, error?}``. Media counts live
+    on each per-receiver dict under ``media_delivered_count`` /
+    ``media_skipped_count`` / ``media_failed_count`` / ``media_errors``,
+    so the walk drills one level deeper than the obvious top-level reads.
+    For backward compat with simpler call sites that pass a single leaf
+    dict (the Telegram media-group result directly), the leaf shape is
+    accepted as a fallback when ``results`` is absent.
+    """
+    if not results:
+        # Empty results = nothing to summarize. Returning ``{}`` lets the
+        # callers' ``if summary`` / ``if results`` guards keep the row
+        # clean rather than stamping a misleading zero-counts block.
+        return {}
+
+    succeeded = 0
+    failed = 0
+    errors: list[dict[str, Any]] = []
+    media_delivered = 0
+    media_skipped = 0
+    media_failed = 0
+    media_errors: list[dict[str, Any]] = []
+    has_media_counts = False
+    errors_dropped = 0
+    media_errors_dropped = 0
+
+    for index, result in enumerate(results):
+        if result.get("success"):
+            succeeded += 1
+        else:
+            failed += 1
+            if len(errors) < _MAX_ERRORS:
+                errors.append({
+                    "index": index,
+                    "error": _trim(result.get("error", "unknown")),
+                })
+            else:
+                errors_dropped += 1
+
+        # Per-receiver detail is bundled under ``results`` by the
+        # dispatcher's ``_aggregate_results``. Walk it when present; fall
+        # back to reading the leaf shape directly so older callers and
+        # direct-test fixtures keep working.
+        per_receiver = result.get("results")
+        leaves: list[dict[str, Any]]
+        if isinstance(per_receiver, list):
+            leaves = [r for r in per_receiver if isinstance(r, dict)]
+        else:
+            leaves = [result]
+
+        for receiver_index, leaf in enumerate(leaves):
+            # The dispatcher's Telegram path renames the media counters
+            # to ``media_*`` to disambiguate them from the surrounding
+            # text-message result. Accept both names so a future provider
+            # that surfaces top-level counts (single-shot text+media)
+            # also gets picked up.
+            d = leaf.get("media_delivered_count")
+            if d is None:
+                d = leaf.get("delivered_count")
+            s = leaf.get("media_skipped_count")
+            if s is None:
+                s = leaf.get("skipped_count")
+            f = leaf.get("media_failed_count")
+            if f is None:
+                f = leaf.get("failed_count")
+            if d is not None or s is not None or f is not None:
+                has_media_counts = True
+                media_delivered += int(d or 0)
+                media_skipped += int(s or 0)
+                media_failed += int(f or 0)
+
+            sub_errors = leaf.get("media_errors") or leaf.get("errors") or []
+            for sub in sub_errors:
+                if not isinstance(sub, dict):
+                    # ``_aggregate_results`` populates a string list at
+                    # the target level; only dict entries carry structured
+                    # per-chunk / per-item detail worth keeping here.
+                    continue
+                if len(media_errors) >= _MAX_MEDIA_ERRORS:
+                    media_errors_dropped += 1
+                    continue
+                entry: dict[str, Any] = {"target_index": index}
+                # Stamp the receiver index whenever we drilled into the
+                # per-receiver ``results`` list (even with one receiver);
+                # single-leaf fallbacks leave the key off so the existing
+                # one-target tests stay shape-compatible.
+                if isinstance(per_receiver, list):
+                    entry["receiver_index"] = receiver_index
+                entry.update({k: _trim(v) for k, v in sub.items()})
+                media_errors.append(entry)
+
+    summary: dict[str, Any] = {
+        "targets_attempted": len(results),
+        "targets_succeeded": succeeded,
+        "targets_failed": failed,
+    }
+    if errors:
+        summary["errors"] = errors
+    if errors_dropped:
+        summary["errors_truncated"] = errors_dropped
+    if has_media_counts:
+        summary["media"] = {
+            "delivered": media_delivered,
+            "skipped": media_skipped,
+            "failed": media_failed,
+        }
+    if media_errors:
+        summary["media_errors"] = media_errors
+    if media_errors_dropped:
+        summary["media_errors_truncated"] = media_errors_dropped
+    return summary
+
+
+def attach_summary_in_place(
+    row: EventLog, results: list[dict[str, Any]],
+) -> None:
+    """Merge a dispatch summary into ``row.details`` before its session commits.
+
+    Use when the EventLog row is still attached to a session that has not
+    yet committed — the caller's session.commit() carries the update.
+    """
+    summary = summarize_dispatch_results(results)
+    if not summary:
+        return
+    details = dict(row.details or {})
+    # Don't overwrite a summary that a caller / previous pass already
+    # set explicitly — that's the same "caller wins" rule the correlation
+    # enricher follows in ``log_context.py``.
+    if "dispatch_summary" in details:
+        return
+    details["dispatch_summary"] = summary
+    row.details = details
+    # Identity-changing reassignment above is enough for SQLAlchemy to mark
+    # the column dirty. ``flag_modified`` is belt-and-suspenders against a
+    # future refactor that switches this to in-place mutation.
+    flag_modified(row, "details")
+
+
+async def record_dispatch_summary_async(
+    session: AsyncSession,
+    event_log_id: int | None,
+    results: list[dict[str, Any]],
+) -> None:
+    """Best-effort update of an already-committed ``EventLog`` row.
+
+    Used by call sites where the row was committed in an earlier
+    transaction (the polling watcher commits its EventLog rows before
+    invoking the dispatcher, so we need a follow-up update).
+
+    Best-effort: a DB hiccup here must never abort the wider dispatch
+    flow — the row keeps its prior status / details and the operator
+    can still trace via stderr (via the ``dispatch_id`` correlation
+    written at insert time).
+    """
+    if event_log_id is None or not results:
+        return
+    summary = summarize_dispatch_results(results)
+    if not summary:
+        return
+    try:
+        row = await session.get(EventLog, event_log_id)
+        if row is None:
+            return
+        details = dict(row.details or {})
+        if "dispatch_summary" in details:
+            return
+        details["dispatch_summary"] = summary
+        row.details = details
+        flag_modified(row, "details")
+        session.add(row)
+        await session.commit()
+    except asyncio.CancelledError:
+        # Cancellation must propagate so APScheduler can drain shutdown.
+        # Swallowing it here would pin the task and leave the row in an
+        # indeterminate state.
+        raise
+    except Exception:  # noqa: BLE001
+        _LOGGER.exception(
+            "Failed to record dispatch_summary on event_log %s", event_log_id,
+        )
@@ -20,6 +20,11 @@ from typing import Any, Awaitable, Callable
 from sqlmodel import select
 from sqlmodel.ext.asyncio.session import AsyncSession

+from notify_bridge_core.log_context import (
+    bind_log_context,
+    ensure_dispatch_id,
+    enrich_details_with_correlation,
+)
 from notify_bridge_core.models.events import ServiceEvent
 from notify_bridge_core.notifications.dispatcher import (
    NotificationDispatcher,
@@ -36,6 +41,7 @@ from .dispatch_helpers import (
    load_link_data,
    resolve_provider_credential,
 )
+from .dispatch_summary import attach_summary_in_place

 _LOGGER = logging.getLogger(__name__)

@@ -141,6 +147,31 @@ async def dispatch_provider_event(
    int
        Number of successfully dispatched notifications across all trackers.
    """
+    # Bind a dispatch_id for the whole event so every EventLog row written
+    # below — and every log line emitted by the inner dispatcher — share the
+    # same correlation id. The dispatcher's own ``ensure_dispatch_id()`` call
+    # reuses this id rather than generating its own.
+    with bind_log_context(dispatch_id=ensure_dispatch_id()):
+        return await _dispatch_provider_event_impl(
+            engine, provider_id, provider_name, provider_config,
+            event, detail_keys, filter_fn,
+        )
+
+
+async def _dispatch_provider_event_impl(
+    engine: Any,
+    provider_id: int,
+    provider_name: str,
+    provider_config: dict[str, Any],
+    event: ServiceEvent,
+    detail_keys: tuple[str, ...],
+    filter_fn: FilterFn,
+) -> int:
+    """Implementation body for :func:`dispatch_provider_event`.
+
+    Split out so the public function can wrap the body in
+    :func:`bind_log_context` without re-indenting the entire flow.
+    """
    dispatched = 0
    # Drain-scheduling is best-effort: a scheduling failure must not roll
    # back the persisted defer rows (startup catch-up re-establishes them).
@@ -188,10 +219,10 @@ async def dispatch_provider_event(
                collection_id=event.collection_id,
                collection_name=event.collection_name,
                assets_count=0,
-                details={
+                details=enrich_details_with_correlation({
                    "provider_type": event.provider_type.value,
                    **extra_details,
-                },
+                }),
            )
            session.add(event_log_row)
            await session.flush()
@@ -294,6 +325,11 @@ async def dispatch_provider_event(
                event.provider_type.value != "bridge_self"
            )

+            # Accumulate per-target results across every tracking-config
+            # group so the EventLog row carries a single ``dispatch_summary``
+            # covering the full fan-out (not just the last group).
+            all_results: list[dict[str, Any]] = []
+
            for tc, target_entries in groups.values():
                if not target_entries:
                    continue
@@ -308,6 +344,7 @@ async def dispatch_provider_event(
                        "Dispatcher raised for tracker %d: %s", tracker.id, err,
                    )
                    continue
+                all_results.extend(results)
                for entry, r in zip(target_entries, results):
                    _, target_id, target_name = entry
                    if r.get("success"):
@@ -332,6 +369,12 @@ async def dispatch_provider_event(
                                    "bridge_self target-failure emission failed",
                                )

+            # Merge the aggregated per-target results onto the EventLog row
+            # while the session still owns it. The commit below carries the
+            # ``dispatch_summary`` block alongside the row's original fields.
+            if all_results:
+                attach_summary_in_place(event_log_row, all_results)
+
        await session.commit()

    # Schedule drain jobs OUTSIDE the DB session so an APScheduler hiccup
@@ -28,6 +28,7 @@ from typing import Any
 from sqlmodel import select
 from sqlmodel.ext.asyncio.session import AsyncSession

+from notify_bridge_core.log_context import enrich_details_with_correlation
 from notify_bridge_core.models.events import ServiceEvent
 from notify_bridge_core.providers.home_assistant import (
    HomeAssistantAuthError,
@@ -139,11 +140,11 @@ async def _record_ha_status(
                collection_id="",
                collection_name="",
                assets_count=0,
-                details={
+                details=enrich_details_with_correlation({
                    "provider_type": "home_assistant",
                    "ha_status": state,
                    "ha_status_detail": detail or "",
-                },
+                }),
            ))
            await session.commit()
    except Exception:  # noqa: BLE001
@@ -29,6 +29,11 @@ from zoneinfo import ZoneInfo, ZoneInfoNotFoundError
 from sqlmodel import select
 from sqlmodel.ext.asyncio.session import AsyncSession

+from notify_bridge_core.log_context import (
+    bind_log_context,
+    ensure_dispatch_id,
+    enrich_details_with_correlation,
+)
 from notify_bridge_core.models.events import EventType
 from notify_bridge_core.notifications.dispatcher import (
    NotificationDispatcher,
@@ -51,6 +56,7 @@ from .dispatch_helpers import (
    load_link_data,
    resolve_provider_credential,
 )
+from .dispatch_summary import summarize_dispatch_results
 from .manual_dispatch import build_immich_dispatch_events

 _LOGGER = logging.getLogger(__name__)
@@ -135,12 +141,12 @@ async def _log_skip(
            collection_id="",
            collection_name="",
            assets_count=0,
-            details={
+            details=enrich_details_with_correlation({
                "kind": kind,
                "trigger": "cron",
                "status": "skipped",
                "skip_reason": reason,
-            },
+            }),
        ))
        await session.commit()

@@ -164,6 +170,15 @@ async def dispatch_scheduled_for_tracker(
    the slot is disabled on the tracker's default tracking config, or no link
    has a ``TemplateConfig`` with the corresponding slot row.
    """
+    # Bind a dispatch_id for the whole cron fire so the EventLog "skipped" /
+    # "sent" rows AND the inner dispatcher log lines share one correlation id.
+    with bind_log_context(dispatch_id=ensure_dispatch_id()):
+        await _dispatch_scheduled_for_tracker_impl(tracker_id, kind)
+
+
+async def _dispatch_scheduled_for_tracker_impl(
+    tracker_id: int, kind: ScheduledKind
+) -> None:
    engine = get_engine()
    async with AsyncSession(engine) as session:
        tracker = await session.get(NotificationTracker, tracker_id)
@@ -390,6 +405,9 @@ async def dispatch_scheduled_for_tracker(
        any_sent = True

        successes = sum(1 for r in results if isinstance(r, dict) and r.get("success"))
+        summary = summarize_dispatch_results(
+            [r for r in results if isinstance(r, dict)],
+        )
        async with AsyncSession(engine) as session:
            session.add(EventLog(
                user_id=tracker_user_id,
@@ -401,7 +419,7 @@ async def dispatch_scheduled_for_tracker(
                collection_id=event.collection_id,
                collection_name=event.collection_name,
                assets_count=event.added_count or 0,
-                details={
+                details=enrich_details_with_correlation({
                    "kind": kind,
                    "slot": slot_name,
                    "trigger": "cron",
@@ -410,7 +428,8 @@ async def dispatch_scheduled_for_tracker(
                    "status": "sent",
                    "targets_dispatched": total_targets,
                    "targets_succeeded": successes,
-                },
+                    "dispatch_summary": summary,
+                }),
            ))
            await session.commit()

@@ -95,6 +95,7 @@ async def send_telegram_media(
    chunk_delay: int = 0,
    max_asset_data_size: int | None = None,
    send_large_photos_as_documents: bool = False,
+    send_large_videos_as_documents: bool = False,
    chat_action: str | None = "typing",
    thumbhash_resolver: Callable[[str], str | None] | None = None,
 ) -> NotificationResult:
@@ -116,6 +117,7 @@ async def send_telegram_media(
        chunk_delay=chunk_delay,
        max_asset_data_size=max_asset_data_size,
        send_large_photos_as_documents=send_large_photos_as_documents,
+        send_large_videos_as_documents=send_large_videos_as_documents,
        chat_action=chat_action,
    )

@@ -9,6 +9,11 @@ from typing import Any, Awaitable, Callable
 from sqlmodel import select
 from sqlmodel.ext.asyncio.session import AsyncSession

+from notify_bridge_core.log_context import (
+    bind_log_context,
+    ensure_dispatch_id,
+    enrich_details_with_correlation,
+)
 from notify_bridge_core.models.events import ServiceEvent
 from notify_bridge_core.notifications.dispatcher import NotificationDispatcher, TargetConfig
 from notify_bridge_core.notifications.telegram.cache import TelegramFileCache
@@ -30,6 +35,7 @@ from .dispatch_helpers import (
    load_link_data,
    resolve_provider_credential,
 )
+from .dispatch_summary import record_dispatch_summary_async

 _LOGGER = logging.getLogger(__name__)

@@ -262,6 +268,13 @@ _POLL_FACTORIES: dict[str, PollerFactory] = {

 async def check_tracker(tracker_id: int) -> dict[str, Any]:
    """Poll a tracker's provider for changes and dispatch notifications."""
+    # Bind a per-tick dispatch_id so the EventLog row written for each detected
+    # change carries the same correlation id as the dispatcher's log lines.
+    with bind_log_context(dispatch_id=ensure_dispatch_id()):
+        return await _check_tracker_impl(tracker_id)
+
+
+async def _check_tracker_impl(tracker_id: int) -> dict[str, Any]:
    engine = get_engine()

    # Load all DB data eagerly before entering aiohttp context
@@ -457,7 +470,7 @@ async def check_tracker(tracker_id: int) -> dict[str, Any]:
                collection_id=event.collection_id,
                collection_name=event.collection_name,
                assets_count=assets_count,
-                details=details,
+                details=enrich_details_with_correlation(details),
            )
            session.add(log)
            await session.flush()
@@ -605,6 +618,10 @@ async def check_tracker(tracker_id: int) -> dict[str, Any]:
                event.provider_type.value != "bridge_self"
            )

+            # Per-event accumulator so the summary write covers every
+            # tracking-config group, not just the last one.
+            event_results: list[dict[str, Any]] = []
+
            for tc, target_entries in groups.values():
                if not target_entries:
                    continue
@@ -616,6 +633,7 @@ async def check_tracker(tracker_id: int) -> dict[str, Any]:
                    continue
                target_configs = [entry[0] for entry in target_entries]
                results = await dispatcher.dispatch(shaped_event, target_configs)
+                event_results.extend(results)
                for entry, r in zip(target_entries, results):
                    _, target_id, target_name = entry
                    if r.get("success"):
@@ -637,6 +655,15 @@ async def check_tracker(tracker_id: int) -> dict[str, Any]:
                                    "bridge_self target-failure emission failed",
                                )

+            # The EventLog row was committed in the earlier session block
+            # so we run a tiny follow-up UPDATE in a fresh session. Best-
+            # effort: a failure here logs but does not abort the watcher.
+            if event_log_id is not None and event_results:
+                async with AsyncSession(engine) as summary_session:
+                    await record_dispatch_summary_async(
+                        summary_session, event_log_id, event_results,
+                    )
+
    return {
        "status": "ok",
        "events_detected": len(events),
@@ -0,0 +1,372 @@
+"""Temporary per-module DEBUG overrides with auto-revert.
+
+Covers the in-memory service module + a smoke pass over the API layer
+using ``dependency_overrides`` to bypass auth. The APScheduler glue is
+exercised via the fallback asyncio-timer path since tests run without a
+running scheduler.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+from datetime import datetime, timedelta, timezone
+from typing import Any
+
+import pytest
+from fastapi.testclient import TestClient
+
+
+# ---------------------------------------------------------------------------
+# Test scaffolding
+# ---------------------------------------------------------------------------
+
+
+def _reset_state() -> None:
+    """Clear the module-level ``_active`` dict between tests so prior
+    activations don't bleed across cases."""
+    from notify_bridge_server.services import diagnostic_mode as svc
+
+    svc._active.clear()
+
+
+@pytest.fixture(autouse=True)
+def _stub_db_read(monkeypatch):
+    """Default every test to a fixed empty ``log_levels`` snapshot.
+
+    A test that wants to exercise DB-override precedence overrides this
+    fixture by re-patching the function explicitly.
+    """
+    async def fake() -> str:
+        return ""
+
+    from notify_bridge_server.services import diagnostic_mode as svc
+
+    monkeypatch.setattr(svc, "_read_db_log_levels", fake)
+
+
+def _patch_db_read(monkeypatch, value: str) -> None:
+    """Override the auto-applied fixture for a single test that needs a
+    non-empty ``log_levels`` value."""
+    async def fake() -> str:
+        return value
+
+    from notify_bridge_server.services import diagnostic_mode as svc
+
+    monkeypatch.setattr(svc, "_read_db_log_levels", fake)
+
+
+# ---------------------------------------------------------------------------
+# Unit tests — service module
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+async def test_set_diagnostic_applies_debug_immediately(tmp_data_dir) -> None:  # noqa: ARG001
+    from notify_bridge_server.services.diagnostic_mode import set_diagnostic
+
+    _reset_state()
+    module = "notify_bridge_core.notifications.telegram.client"
+
+    entry = await set_diagnostic(module, duration_minutes=30)
+
+    assert entry["module"] == module
+    assert entry["current_level"] == "DEBUG"
+    assert entry["remaining_seconds"] > 60 * 29
+    assert logging.getLogger(module).level == logging.DEBUG
+
+
+@pytest.mark.asyncio
+async def test_set_diagnostic_rejects_unlisted_module(tmp_data_dir) -> None:  # noqa: ARG001
+    """Only the documented namespaces should be flippable from the UI."""
+    from notify_bridge_server.services.diagnostic_mode import set_diagnostic
+
+    _reset_state()
+    with pytest.raises(ValueError, match="allowlist"):
+        await set_diagnostic("some_random_third_party", 30)
+
+
+@pytest.mark.asyncio
+async def test_set_diagnostic_rejects_root_logger(tmp_data_dir) -> None:  # noqa: ARG001
+    """The empty string would target root — explicitly disallowed."""
+    from notify_bridge_server.services.diagnostic_mode import set_diagnostic
+
+    _reset_state()
+    with pytest.raises(ValueError, match="allowlist"):
+        await set_diagnostic("", 30)
+
+
+@pytest.mark.asyncio
+async def test_set_diagnostic_rejects_unreasonable_durations(tmp_data_dir) -> None:  # noqa: ARG001
+    from notify_bridge_server.services.diagnostic_mode import set_diagnostic
+
+    _reset_state()
+    with pytest.raises(ValueError, match="duration_minutes"):
+        await set_diagnostic("notify_bridge_core", 0)
+    with pytest.raises(ValueError, match="duration_minutes"):
+        await set_diagnostic("notify_bridge_core", 9999)
+
+
+@pytest.mark.asyncio
+async def test_baseline_from_db_override(tmp_data_dir, monkeypatch) -> None:  # noqa: ARG001
+    """``log_levels`` setting wins over the noisy-library default."""
+    from notify_bridge_server.services.diagnostic_mode import set_diagnostic
+
+    _reset_state()
+    _patch_db_read(monkeypatch, "sqlalchemy.engine=ERROR")
+    entry = await set_diagnostic("sqlalchemy.engine", duration_minutes=15)
+    assert entry["baseline_level"] == "ERROR"
+
+
+@pytest.mark.asyncio
+async def test_baseline_from_noisy_default(tmp_data_dir) -> None:  # noqa: ARG001
+    """No DB override falls through to the curated noisy-lib quiet list."""
+    from notify_bridge_server.services.diagnostic_mode import set_diagnostic
+
+    _reset_state()
+    entry = await set_diagnostic("sqlalchemy.engine", duration_minutes=15)
+    assert entry["baseline_level"] == "WARNING"
+
+
+@pytest.mark.asyncio
+async def test_baseline_prefix_walks_for_submodule(tmp_data_dir, monkeypatch) -> None:  # noqa: ARG001
+    """A sub-logger like ``sqlalchemy.engine.Engine`` inherits its parent's
+    noisy-default level (WARNING), not the root INFO."""
+    from notify_bridge_server.services.diagnostic_mode import set_diagnostic
+
+    _reset_state()
+    entry = await set_diagnostic(
+        "sqlalchemy.engine.Engine", duration_minutes=15,
+    )
+    assert entry["baseline_level"] == "WARNING"
+
+
+@pytest.mark.asyncio
+async def test_baseline_prefix_walks_for_db_override(tmp_data_dir, monkeypatch) -> None:  # noqa: ARG001
+    """An explicit ``log_levels`` entry covers all sub-loggers below it."""
+    from notify_bridge_server.services.diagnostic_mode import set_diagnostic
+
+    _reset_state()
+    _patch_db_read(
+        monkeypatch, "notify_bridge_core.notifications=ERROR",
+    )
+    entry = await set_diagnostic(
+        "notify_bridge_core.notifications.telegram.client",
+        duration_minutes=15,
+    )
+    assert entry["baseline_level"] == "ERROR"
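The four baseline tests above pin down a lookup order: an explicit ``log_levels`` entry wins, then the noisy-library quiet list, and both are matched against the module's dotted prefixes before falling back to the root level. A minimal sketch of that resolution follows. It is not the service's actual ``_baseline_for``: the names ``NOISY_DEFAULTS``, ``parse_log_levels`` and ``baseline_for`` are hypothetical, the comma separator for multiple entries is an assumption, and checking every prefix against the DB overrides before consulting the quiet list is one of two orders consistent with the tests.

```python
from __future__ import annotations

from typing import Iterator

# Hypothetical quiet list; the real defaults are not shown in this diff.
NOISY_DEFAULTS = {"sqlalchemy.engine": "WARNING"}
ROOT_DEFAULT = "INFO"


def parse_log_levels(raw: str) -> dict[str, str]:
    """Parse ``"a.b=ERROR,c=DEBUG"`` into a mapping (comma separator assumed)."""
    levels: dict[str, str] = {}
    for part in raw.split(","):
        name, sep, level = part.partition("=")
        if sep and name.strip() and level.strip():
            levels[name.strip()] = level.strip().upper()
    return levels


def prefixes(module: str) -> Iterator[str]:
    """Yield ``a.b.c``, ``a.b``, ``a`` for the logger name ``a.b.c``."""
    parts = module.split(".")
    for end in range(len(parts), 0, -1):
        yield ".".join(parts[:end])


def baseline_for(module: str, db_log_levels: str = "") -> str:
    """Resolve the level a module should return to after a DEBUG window."""
    overrides = parse_log_levels(db_log_levels)
    # DB overrides are consulted first, then the quiet list (assumed order).
    for table in (overrides, NOISY_DEFAULTS):
        for name in prefixes(module):
            if name in table:
                return table[name]
    return ROOT_DEFAULT
```

Walking from the most specific prefix to the least specific mirrors how the standard ``logging`` hierarchy resolves an effective level, which is why a sub-logger such as ``sqlalchemy.engine.Engine`` picks up its parent's entry.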
+
+
+@pytest.mark.asyncio
+async def test_set_diagnostic_twice_replaces_schedule(tmp_data_dir) -> None:  # noqa: ARG001
+    """A second call for the same module replaces the schedule instead of stacking."""
+    from notify_bridge_server.services.diagnostic_mode import (
+        list_active, set_diagnostic,
+    )
+
+    _reset_state()
+    module = "notify_bridge_core"
+    await set_diagnostic(module, 5)
+    first_active = list_active()
+    assert len(first_active) == 1
+    first_expires = first_active[0]["expires_at"]
+
+    # Sleep just long enough to make the timestamps distinct, then re-set.
+    await asyncio.sleep(0.05)
+    await set_diagnostic(module, 60)
+    second_active = list_active()
+    assert len(second_active) == 1
+    assert second_active[0]["expires_at"] != first_expires
+    assert second_active[0]["remaining_seconds"] > 30 * 60
+
+
+@pytest.mark.asyncio
+async def test_manual_revert_restores_baseline(tmp_data_dir) -> None:  # noqa: ARG001
+    from notify_bridge_server.services.diagnostic_mode import (
+        revert_diagnostic, set_diagnostic,
+    )
+
+    _reset_state()
+    module = "sqlalchemy.engine"
+    await set_diagnostic(module, 30)
+    assert logging.getLogger(module).level == logging.DEBUG
+
+    reverted = await revert_diagnostic(module)
+    assert reverted is True
+    # noisy-library default is WARNING (30)
+    assert logging.getLogger(module).level == logging.WARNING
+
+
+@pytest.mark.asyncio
+async def test_revert_reads_db_at_revert_time(tmp_data_dir, monkeypatch) -> None:  # noqa: ARG001
+    """Editing ``log_levels`` while the override is active is honored when
+    the revert fires — not the snapshot taken at activation time."""
+    from notify_bridge_server.services.diagnostic_mode import (
+        revert_diagnostic, set_diagnostic,
+    )
+
+    _reset_state()
+    module = "sqlalchemy.engine"
+    _patch_db_read(monkeypatch, "")
+    await set_diagnostic(module, 30)
+
+    # Operator edits the setting mid-window — bump to ERROR.
+    _patch_db_read(monkeypatch, "sqlalchemy.engine=ERROR")
+
+    assert await revert_diagnostic(module) is True
+    assert logging.getLogger(module).level == logging.ERROR
+
+
+@pytest.mark.asyncio
+async def test_manual_revert_no_active_returns_false(tmp_data_dir) -> None:  # noqa: ARG001
+    from notify_bridge_server.services.diagnostic_mode import revert_diagnostic
+
+    _reset_state()
+    assert await revert_diagnostic("notify_bridge_core") is False
+
+
+@pytest.mark.asyncio
+async def test_auto_revert_after_window_elapses(tmp_data_dir) -> None:  # noqa: ARG001
+    """The asyncio-timer fallback fires near ``expires_at`` and restores
+    the baseline. Uses a sub-second window so the test stays fast.
+
+    Bypasses ``set_diagnostic`` (which only accepts whole-minute durations)
+    by populating the ``_active`` dict and calling ``_schedule_revert`` directly.
+    """
+    from notify_bridge_server.services import diagnostic_mode as svc
+
+    _reset_state()
+    module = "sqlalchemy.engine"
+    baseline = svc._baseline_for(module, db_log_levels="")
+    now = datetime.now(timezone.utc)
+    expires = now + timedelta(seconds=0.3)
+    logging.getLogger(module).setLevel("DEBUG")
+    svc._active[module] = svc._Override(
+        module=module,
+        baseline_level=baseline,
+        activated_at=now,
+        expires_at=expires,
+    )
+    svc._schedule_revert(module, expires)
+
+    await asyncio.sleep(0.5)
+
+    assert module not in svc._active
+    assert logging.getLogger(module).level == logging.WARNING
+
+
+@pytest.mark.asyncio
+async def test_fallback_task_retained_until_fire(tmp_data_dir) -> None:  # noqa: ARG001
+    """The asyncio fallback path must keep a strong reference to its task
+    so CPython doesn't GC it before the timer fires."""
+    from notify_bridge_server.services import diagnostic_mode as svc
+
+    _reset_state()
+    when = datetime.now(timezone.utc) + timedelta(seconds=10)
+    svc._schedule_revert("notify_bridge_core", when)
+    # The retainer set should hold exactly the task we just queued.
+    assert len(svc._bg_tasks) == 1
+    # Cancel it to clean up; the done-callback will drop it.
+    for task in list(svc._bg_tasks):
+        task.cancel()
+    await asyncio.sleep(0)
+
+
+def test_list_active_omits_and_sweeps_expired(tmp_data_dir) -> None:  # noqa: ARG001
+    """Expired entries are filtered AND removed so a delayed scheduler
+    fire doesn't leave ghost rows in ``_active`` forever."""
+    from notify_bridge_server.services import diagnostic_mode as svc
+
+    _reset_state()
+    past = datetime.now(timezone.utc) - timedelta(minutes=1)
+    svc._active["sqlalchemy.engine"] = svc._Override(
+        module="sqlalchemy.engine",
+        baseline_level="WARNING",
+        activated_at=past - timedelta(minutes=30),
+        expires_at=past,
+    )
+    assert svc.list_active() == []
+    assert "sqlalchemy.engine" not in svc._active
+
+
+@pytest.mark.asyncio
+async def test_revert_all_clears_every_override(tmp_data_dir) -> None:  # noqa: ARG001
+    from notify_bridge_server.services.diagnostic_mode import (
+        list_active, revert_all, set_diagnostic,
+    )
+
+    _reset_state()
+    await set_diagnostic("notify_bridge_core", 30)
+    await set_diagnostic("sqlalchemy.engine", 30)
+    assert len(list_active()) == 2
+
+    count = await revert_all()
+    assert count == 2
+    assert list_active() == []
+
+
+# ---------------------------------------------------------------------------
+# API smoke — bypasses auth via dependency_overrides
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture
+def _admin_client(tmp_data_dir):  # noqa: ARG001
+    """Yield a TestClient with ``require_admin`` short-circuited.
+
+    Keeps the auth-flow's SQLAlchemy/greenlet issues out of the picture
+    while still exercising the FastAPI router, path converters, and the
+    ``HTTPException`` paths.
+    """
+    _reset_state()
+    from notify_bridge_server.auth.dependencies import require_admin
+    from notify_bridge_server.database.models import User
+    from notify_bridge_server.main import app
+
+    fake = User(
+        id=1, username="admin",
+        password_hash="x", role="admin", token_version=0,
+    )
+    app.dependency_overrides[require_admin] = lambda: fake
+
+    with TestClient(app) as client:
+        yield client
+
+    app.dependency_overrides.pop(require_admin, None)
+    _reset_state()
+
+
+def test_api_post_rejects_unlisted_module_with_400(_admin_client: TestClient) -> None:
+    resp = _admin_client.post(
+        "/api/settings/diagnostic-mode",
+        json={"module": "evil.namespace", "duration_minutes": 15},
+    )
+    assert resp.status_code == 400
+    assert "allowlist" in resp.json().get("detail", "")
+
+
+def test_api_post_rejects_huge_duration_with_400(_admin_client: TestClient) -> None:
+    resp = _admin_client.post(
+        "/api/settings/diagnostic-mode",
+        json={"module": "notify_bridge_core", "duration_minutes": 99999},
+    )
+    assert resp.status_code == 400
+
+
+def test_api_delete_unknown_returns_404(_admin_client: TestClient) -> None:
+    resp = _admin_client.delete(
+        "/api/settings/diagnostic-mode/notify_bridge_core",
+    )
+    assert resp.status_code == 404
+
+
+def test_api_delete_handles_dotted_module_path(_admin_client: TestClient) -> None:
+    """``{module:path}`` lets dotted names survive URL routing intact."""
+    target = "notify_bridge_core.notifications.telegram.client"
+    _admin_client.post(
+        "/api/settings/diagnostic-mode",
+        json={"module": target, "duration_minutes": 15},
+    )
+    resp = _admin_client.delete(f"/api/settings/diagnostic-mode/{target}")
+    assert resp.status_code == 200, resp.text
+    assert resp.json()["reverted"] == target
@@ -0,0 +1,357 @@
+"""Aggregation of per-target dispatch results into ``EventLog.details``.
+
+Covers ``summarize_dispatch_results`` and ``attach_summary_in_place``.
+The async ``record_dispatch_summary_async`` is exercised through the
+in-process update path; the watcher-style flow is covered indirectly via
+the full server tests.
+"""
+
+from __future__ import annotations
+
+from typing import Any
+
+import pytest
+
+
+def test_summarize_empty_returns_empty(tmp_data_dir) -> None:  # noqa: ARG001
+    """Empty results = nothing to summarize. Callers can short-circuit
+    on the falsy return so a row with zero dispatches doesn't get a
+    misleading zero-counts block."""
+    from notify_bridge_server.services.dispatch_summary import (
+        summarize_dispatch_results,
+    )
+
+    assert summarize_dispatch_results([]) == {}
+
+
+def test_summarize_all_success_no_errors_block(tmp_data_dir) -> None:  # noqa: ARG001
+    from notify_bridge_server.services.dispatch_summary import (
+        summarize_dispatch_results,
+    )
+
+    results = [
+        {"success": True, "message_id": 1},
+        {"success": True, "message_id": 2},
+    ]
+    summary = summarize_dispatch_results(results)
+    assert summary["targets_attempted"] == 2
+    assert summary["targets_succeeded"] == 2
+    assert summary["targets_failed"] == 0
+    assert "errors" not in summary
+    assert "media" not in summary
+
+
+def test_summarize_mixed_records_only_failures(tmp_data_dir) -> None:  # noqa: ARG001
+    from notify_bridge_server.services.dispatch_summary import (
+        summarize_dispatch_results,
+    )
+
+    results = [
+        {"success": True},
+        {"success": False, "error": "Bad Request: chat not found"},
+        {"success": False, "error": "timeout"},
+    ]
+    summary = summarize_dispatch_results(results)
+    assert summary["targets_succeeded"] == 1
+    assert summary["targets_failed"] == 2
+    assert summary["errors"] == [
+        {"index": 1, "error": "Bad Request: chat not found"},
+        {"index": 2, "error": "timeout"},
+    ]
+
+
+def test_summarize_media_counts_aggregate(tmp_data_dir) -> None:  # noqa: ARG001
+    """Media counts from a Telegram media-group success are merged."""
+    from notify_bridge_server.services.dispatch_summary import (
+        summarize_dispatch_results,
+    )
+
+    results = [
+        {
+            "success": True,
+            "delivered_count": 5,
+            "skipped_count": 1,
+            "failed_count": 0,
+        },
+        {
+            "success": True,
+            "delivered_count": 3,
+            "skipped_count": 0,
+            "failed_count": 0,
+        },
+    ]
+    summary = summarize_dispatch_results(results)
+    assert summary["media"] == {"delivered": 8, "skipped": 1, "failed": 0}
+
+
+def test_summarize_sub_errors_carry_target_index(tmp_data_dir) -> None:  # noqa: ARG001
+    """Per-chunk/per-item failures from a partial media-group send are flattened."""
+    from notify_bridge_server.services.dispatch_summary import (
+        summarize_dispatch_results,
+    )
+
+    results = [
+        {"success": True, "delivered_count": 1, "skipped_count": 0, "failed_count": 0},
+        {
+            "success": True,  # group landed but with partial failure
+            "delivered_count": 2,
+            "skipped_count": 0,
+            "failed_count": 1,
+            "errors": [
+                {"kind": "chunk", "chunk": 1, "error": "Bad Request: ..."},
+                {"kind": "item", "chunk": 1, "item_index": 2, "error": "media not found"},
+            ],
+        },
+    ]
+    summary = summarize_dispatch_results(results)
+    assert summary["media_errors"] == [
+        {"target_index": 1, "kind": "chunk", "chunk": 1, "error": "Bad Request: ..."},
+        {
+            "target_index": 1,
+            "kind": "item",
+            "chunk": 1,
+            "item_index": 2,
+            "error": "media not found",
+        },
+    ]
+
+
+def test_summarize_caps_errors_and_reports_truncation(tmp_data_dir) -> None:  # noqa: ARG001
+    from notify_bridge_server.services.dispatch_summary import (
+        summarize_dispatch_results,
+    )
+
+    results: list[dict[str, Any]] = [
+        {"success": False, "error": f"err {i}"} for i in range(25)
+    ]
+    summary = summarize_dispatch_results(results)
+    assert len(summary["errors"]) == 20
+    assert summary["errors_truncated"] == 5
+
+
+def test_summarize_trims_long_error_messages(tmp_data_dir) -> None:  # noqa: ARG001
+    """A pathological multi-KB error string is bounded so the row stays small."""
+    from notify_bridge_server.services.dispatch_summary import (
+        summarize_dispatch_results,
+    )
+
+    long_err = "x" * 2000
+    results = [{"success": False, "error": long_err}]
+    summary = summarize_dispatch_results(results)
+    persisted = summary["errors"][0]["error"]
+    assert persisted.endswith("…[truncated]")
+    # 500 char body + the explicit "…[truncated]" marker.
+    assert len(persisted) == 500 + len("…[truncated]")
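The flat-shape tests above imply a small set of rules: empty input yields an empty dict, failures are recorded with their target index, at most 20 errors are kept with the overflow counted in ``errors_truncated``, and each message is cut to 500 characters plus a marker. A minimal sketch of just those rules is shown below. It is an illustration under those assumptions, not the real ``summarize_dispatch_results``; the names ``summarize``, ``trim_error`` and the three constants are hypothetical, and the media and nested per-receiver handling is deliberately left out.

```python
from __future__ import annotations

from typing import Any

MAX_ERRORS = 20        # cap implied by the truncation test
MAX_ERROR_CHARS = 500  # body length implied by the trimming test
TRUNCATION_MARKER = "…[truncated]"


def trim_error(message: str) -> str:
    """Bound one error string so a persisted row stays small."""
    if len(message) <= MAX_ERROR_CHARS:
        return message
    return message[:MAX_ERROR_CHARS] + TRUNCATION_MARKER


def summarize(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate per-target results into success and failure counts."""
    if not results:
        return {}  # falsy, so callers can skip writing a summary block
    failures = [
        {"index": i, "error": trim_error(str(r.get("error", "")))}
        for i, r in enumerate(results)
        if not r.get("success")
    ]
    summary: dict[str, Any] = {
        "targets_attempted": len(results),
        "targets_succeeded": len(results) - len(failures),
        "targets_failed": len(failures),
    }
    if failures:
        summary["errors"] = failures[:MAX_ERRORS]
        if len(failures) > MAX_ERRORS:
            summary["errors_truncated"] = len(failures) - MAX_ERRORS
    return summary
```

Omitting the ``errors`` key entirely on full success, rather than writing an empty list, matches the ``"errors" not in summary`` assertion in the all-success test.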
+
+
+@pytest.mark.asyncio
+async def test_attach_summary_in_place_mutates_details_dict(tmp_data_dir) -> None:  # noqa: ARG001
+    """In-session call merges the summary without losing original keys."""
+    from notify_bridge_server.database.models import EventLog
+    from notify_bridge_server.services.dispatch_summary import (
+        attach_summary_in_place,
+    )
+
+    row = EventLog(
+        event_type="assets_added",
+        collection_id="abc",
+        collection_name="Album",
+        details={"provider_type": "immich", "added_count": 3},
+    )
+    attach_summary_in_place(row, [{"success": True}, {"success": False, "error": "x"}])
+    assert row.details["provider_type"] == "immich"
+    assert row.details["added_count"] == 3
+    assert row.details["dispatch_summary"] == {
+        "targets_attempted": 2,
+        "targets_succeeded": 1,
+        "targets_failed": 1,
+        "errors": [{"index": 1, "error": "x"}],
+    }
+
+
+@pytest.mark.asyncio
+async def test_attach_summary_in_place_with_no_results_is_noop(tmp_data_dir) -> None:  # noqa: ARG001
+    """Empty results → no ``dispatch_summary`` key written. Original
+    details survive untouched."""
+    from notify_bridge_server.database.models import EventLog
+    from notify_bridge_server.services.dispatch_summary import (
+        attach_summary_in_place,
+    )
+
+    row = EventLog(
+        event_type="assets_added",
+        collection_id="abc",
+        collection_name="Album",
+        details={"k": "v"},
+    )
+    attach_summary_in_place(row, [])
+    assert row.details == {"k": "v"}
+    assert "dispatch_summary" not in row.details
+
+
+def test_summarize_handles_malformed_sub_errors(tmp_data_dir) -> None:  # noqa: ARG001
+    """A non-dict sub-error entry is silently skipped, not crashed on."""
+    from notify_bridge_server.services.dispatch_summary import (
+        summarize_dispatch_results,
+    )
+
+    results = [
+        {
+            "success": True,
+            "delivered_count": 1,
+            "errors": ["not a dict", {"kind": "item", "error": "real"}],
+        },
+    ]
+    summary = summarize_dispatch_results(results)
+    assert summary["media_errors"] == [
+        {"target_index": 0, "kind": "item", "error": "real"}
+    ]
+
+
+# ---------------------------------------------------------------------------
+# Integration: real dispatcher output shape from ``_aggregate_results``
+# ---------------------------------------------------------------------------
+#
+# The dispatcher wraps each Telegram fan-out in a per-target envelope:
+#
+#   {
+#     "success": True,
+#     "receivers": 2,
+#     "successes": 2,
+#     "failures": 0,
+#     "results": [<per-receiver dict>, ...],   # ← media counts live HERE
+#   }
+#
+# These tests use that exact shape so a future refactor of the dispatcher
+# doesn't silently zero out the dashboard's ``dispatch_summary.media``
+# block. Earlier versions of this file passed leaf dicts directly, which
+# masked the wrong-shape read in production.
+
+
+def test_summarize_drills_into_aggregated_per_receiver_dicts(tmp_data_dir) -> None:  # noqa: ARG001
+    """Media counts on per-receiver leaves are summed across receivers."""
+    from notify_bridge_server.services.dispatch_summary import (
+        summarize_dispatch_results,
+    )
+
+    # Two targets, each with two Telegram receivers.
+    results = [
+        {
+            "success": True,
+            "receivers": 2,
+            "successes": 2,
+            "failures": 0,
+            "results": [
+                {
+                    "success": True,
+                    "message_id": 100,
+                    "media_delivered_count": 5,
+                    "media_skipped_count": 1,
+                    "media_failed_count": 0,
+                },
+                {
+                    "success": True,
+                    "message_id": 101,
+                    "media_delivered_count": 3,
+                    "media_skipped_count": 0,
+                    "media_failed_count": 0,
+                },
+            ],
+        },
+    ]
+    summary = summarize_dispatch_results(results)
+    assert summary["media"] == {"delivered": 8, "skipped": 1, "failed": 0}
+
+
+def test_summarize_collects_aggregated_media_errors_with_receiver_index(
+    tmp_data_dir,  # noqa: ARG001
+) -> None:
+    """Per-chunk / per-item media errors carry both target AND receiver index."""
+    from notify_bridge_server.services.dispatch_summary import (
+        summarize_dispatch_results,
+    )
+
+    results = [
+        {
+            "success": True,
+            "receivers": 1,
+            "successes": 1,
+            "failures": 0,
+            "results": [
+                {
+                    "success": True,
+                    "message_id": 200,
+                    "media_delivered_count": 2,
+                    "media_failed_count": 1,
+                    "media_errors": [
+                        {"kind": "chunk", "chunk": 1, "error": "Bad Request"},
+                        {"kind": "item", "chunk": 1, "item_index": 2,
+                         "error": "media not found"},
+                    ],
+                },
+            ],
+        },
+    ]
+    summary = summarize_dispatch_results(results)
+    assert summary["media_errors"] == [
+        {"target_index": 0, "receiver_index": 0, "kind": "chunk",
+         "chunk": 1, "error": "Bad Request"},
+        {"target_index": 0, "receiver_index": 0, "kind": "item",
+         "chunk": 1, "item_index": 2, "error": "media not found"},
+    ]
+
+
+def test_summarize_aggregated_target_errors_list_is_safely_ignored(
+    tmp_data_dir,  # noqa: ARG001
+) -> None:
+    """``_aggregate_results`` stamps a flat ``errors: [str, ...]`` at the
+    target level on failure. The summarizer must not try to treat the
+    strings as structured sub-errors."""
+    from notify_bridge_server.services.dispatch_summary import (
+        summarize_dispatch_results,
+    )
+
+    results = [
+        {
+            "success": False,
+            "receivers": 2,
+            "successes": 0,
+            "failures": 2,
+            "error": "All receivers failed",
+            "errors": ["chat_not_found", "blocked_by_user"],
+            "results": [
+                {"success": False, "error": "chat_not_found"},
+                {"success": False, "error": "blocked_by_user"},
+            ],
+        },
+    ]
+    summary = summarize_dispatch_results(results)
+    assert summary["targets_failed"] == 1
+    assert summary["errors"] == [
+        {"index": 0, "error": "All receivers failed"},
+    ]
+    # The string list at the target level is ignored — the per-receiver
+    # errors are already represented by the target-level error message.
+    assert "media_errors" not in summary
+    assert "media" not in summary
+
+
+@pytest.mark.asyncio
+async def test_attach_summary_in_place_skips_when_already_set(
+    tmp_data_dir,  # noqa: ARG001
+) -> None:
+    """Caller-set ``dispatch_summary`` wins — the same "caller pins"
+    rule that ``enrich_details_with_correlation`` follows."""
+    from notify_bridge_server.database.models import EventLog
+    from notify_bridge_server.services.dispatch_summary import (
+        attach_summary_in_place,
+    )
+
+    row = EventLog(
+        event_type="assets_added",
+        collection_id="abc",
+        collection_name="Album",
+        details={"dispatch_summary": {"pinned": True}},
+    )
+    attach_summary_in_place(row, [{"success": True}])
+    assert row.details["dispatch_summary"] == {"pinned": True}
@@ -0,0 +1,158 @@
+"""Request-ID middleware + EventLog dispatch_id correlation.
+
+Covers two halves of the same correlation story:
+
+* ``RequestContextMiddleware`` generates / accepts an inbound request id,
+  binds it onto the log-context ContextVar for the duration of the request,
+  and echoes it back as the ``X-Request-Id`` response header.
+* ``enrich_details_with_correlation`` merges the active ``dispatch_id`` and
+  ``request_id`` into an ``EventLog.details`` dict so the persisted row can
+  be cross-referenced with the stderr log lines emitted during the same
+  dispatch.
+"""
+
+from __future__ import annotations
+
+import re
+
+import pytest
+from fastapi.testclient import TestClient
+
+
+_REQ_ID_PATTERN = re.compile(r"^req:[0-9a-f]{12}$")
+
+
+def test_response_carries_generated_request_id(tmp_data_dir) -> None:  # noqa: ARG001
+    """No inbound header → server generates ``req:<12 hex>`` and echoes it."""
+    from notify_bridge_server.main import app
+
+    with TestClient(app) as client:
+        resp = client.get("/api/health")
+        assert resp.status_code == 200
+        req_id = resp.headers.get("X-Request-Id")
+        assert req_id is not None
+        assert _REQ_ID_PATTERN.match(req_id), (
+            f"generated id {req_id!r} should match req:<12 hex>"
+        )
+
+
+def test_response_echoes_safe_inbound_request_id(tmp_data_dir) -> None:  # noqa: ARG001
+    """A well-formed inbound ``X-Request-Id`` is preserved unchanged."""
+    from notify_bridge_server.main import app
+
+    inbound = "abc-123_XYZ_trace"
+    with TestClient(app) as client:
+        resp = client.get("/api/health", headers={"X-Request-Id": inbound})
+        assert resp.status_code == 200
+        assert resp.headers.get("X-Request-Id") == inbound
+
+
+def test_colon_prefixed_inbound_id_is_replaced(tmp_data_dir) -> None:  # noqa: ARG001
+    """``:`` is reserved for server-minted ids — a colon in the inbound value
+    must trigger replacement so a client can't masquerade as ``disp:...``."""
+    from notify_bridge_server.main import app
+
+    with TestClient(app) as client:
+        resp = client.get(
+            "/api/health", headers={"X-Request-Id": "disp:fake12345678"},
+        )
+        assert resp.status_code == 200
+        echoed = resp.headers.get("X-Request-Id", "")
+        assert echoed != "disp:fake12345678"
+        assert _REQ_ID_PATTERN.match(echoed)
+
+
+@pytest.mark.parametrize(
+    "bad_value",
+    [
+        # CRLF injection attempt — would split log lines / inject headers.
+        "abc\r\ninjected: yes",
+        # Way too long.
+        "x" * 256,
+        # Disallowed characters.
+        "<script>alert(1)</script>",
+        # Empty after stripping.
+        "   ",
+    ],
+)
+def test_unsafe_inbound_request_id_is_replaced(
+    tmp_data_dir, bad_value: str,  # noqa: ARG001
+) -> None:
+    """An attacker-controlled id must not flow into logs verbatim."""
+    from notify_bridge_server.main import app
+
+    with TestClient(app) as client:
+        resp = client.get("/api/health", headers={"X-Request-Id": bad_value})
+        assert resp.status_code == 200
+        echoed = resp.headers.get("X-Request-Id", "")
+        assert echoed != bad_value, "unsafe id was passed through unchanged"
+        assert _REQ_ID_PATTERN.match(echoed), (
+            f"replacement id {echoed!r} should match req:<12 hex>"
+        )
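The header tests above describe a normalization rule: a well-formed inbound id is echoed, while anything containing a colon, control characters, markup, or excessive length is replaced by a server-minted ``req:<12 hex>`` value. A minimal sketch of such a rule follows. The function name ``normalize_request_id``, the allowed alphabet, and the 128-character cap are assumptions chosen to satisfy the cases in these tests; the middleware's real limits are not shown in this diff.

```python
from __future__ import annotations

import re
import secrets

# Allowed alphabet and the 128-character cap are assumptions for illustration.
_SAFE_REQUEST_ID = re.compile(r"[A-Za-z0-9_-]{1,128}")


def normalize_request_id(inbound: str | None) -> str:
    """Return a safe inbound id unchanged, or mint ``req:<12 hex>``."""
    candidate = (inbound or "").strip()
    if _SAFE_REQUEST_ID.fullmatch(candidate):
        return candidate
    # token_hex(6) yields 12 lowercase hex characters.
    return f"req:{secrets.token_hex(6)}"
```

Because ``:`` is outside the allowed alphabet, a client cannot submit a value that looks like a server-minted ``req:`` or ``disp:`` id.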
+
+
+def test_enrich_details_merges_active_correlation_ids() -> None:
+    """Within a ``bind_log_context`` block, the helper copies the active ids."""
+    from notify_bridge_core.log_context import (
+        bind_log_context,
+        enrich_details_with_correlation,
+    )
+
+    with bind_log_context(
+        dispatch_id="disp:deadbeef0001",
+        request_id="req:cafecafe0002",
+    ):
+        result = enrich_details_with_correlation({"existing": "value"})
+
+    assert result == {
+        "existing": "value",
+        "dispatch_id": "disp:deadbeef0001",
+        "request_id": "req:cafecafe0002",
+    }
+
+
+def test_enrich_details_does_not_overwrite_explicit_keys() -> None:
+    """If the caller pre-set a correlation key, the helper leaves it alone."""
+    from notify_bridge_core.log_context import (
+        bind_log_context,
+        enrich_details_with_correlation,
+    )
+
+    with bind_log_context(dispatch_id="disp:newvalue00001"):
+        result = enrich_details_with_correlation({"dispatch_id": "disp:pinned"})
+
+    assert result["dispatch_id"] == "disp:pinned"
+
+
+def test_enrich_details_no_context_returns_copy() -> None:
+    """Outside any binding, the helper returns an unmodified copy of the dict."""
+    from notify_bridge_core.log_context import enrich_details_with_correlation
+
+    original = {"key": "value"}
+    result = enrich_details_with_correlation(original)
+    assert result == original
+    # Mutating the result must not leak into the caller's dict.
+    result["extra"] = "added"
+    assert "extra" not in original
+
+
+def test_enrich_details_handles_none() -> None:
+    """``None`` is accepted (callers may build details lazily)."""
+    from notify_bridge_core.log_context import enrich_details_with_correlation
+
+    assert enrich_details_with_correlation(None) == {}
+
+
+def test_ensure_dispatch_id_generates_or_reuses() -> None:
+    """Fresh call produces a new id; inside a bind it returns the bound one."""
+    from notify_bridge_core.log_context import (
+        bind_log_context,
+        ensure_dispatch_id,
+    )
+
+    fresh = ensure_dispatch_id()
+    assert fresh.startswith("disp:")
+    assert len(fresh) == len("disp:") + 12
+
+    with bind_log_context(dispatch_id="disp:bound00000001"):
+        assert ensure_dispatch_id() == "disp:bound00000001"
@@ -0,0 +1,511 @@
+"""Tests for partial-delivery resilience in TelegramClient._send_media_group.
+
+Covers the three independent failure modes that previously aborted the
+whole send:
+
+1. **Per-item oversize** — one item over ``max_asset_data_size`` is
+   silently dropped; siblings still deliver. ``skipped_count`` reflects
+   the drop.
+2. **Combined chunk over Telegram's byte envelope** — pre-flight splits
+   into byte-budgeted sub-chunks, avoiding the 413 entirely.
+3. **Telegram-side chunk rejection after pre-flight** — fall back to
+   sending each item individually so partial delivery still happens.
+"""
+
+from __future__ import annotations
+
+from typing import Any
+from unittest.mock import patch
+
+import aiohttp
+import pytest
+from aioresponses import aioresponses
+
+from notify_bridge_core.notifications.telegram.client import (
+    TelegramClient,
+    _MediaItem,
+)
+from notify_bridge_core.notifications.telegram.media import (
+    TELEGRAM_MAX_GROUP_TOTAL_BYTES,
+)
+
+
+BOT_TOKEN = "TEST_TOKEN"
+TG = f"https://api.telegram.org/bot{BOT_TOKEN}"
+CHAT_ID = "-1001234567890"
+
+
+# ---------------------------------------------------------------------------
+# Pure unit tests for the new helpers
+# ---------------------------------------------------------------------------
+
+
+def _item(upload_bytes: int, media_type: str = "photo") -> _MediaItem:
+    """Build a synthetic _MediaItem with the given upload byte cost."""
+    if upload_bytes == 0:
+        return _MediaItem(
+            media_json={"type": media_type, "media": "file_id_cached"},
+            cache_info=None,
+            attachment=None,
+        )
+    return _MediaItem(
+        media_json={"type": media_type, "media": "attach://x"},
+        cache_info=("ck", media_type, None, upload_bytes),
+        attachment=("x", b"\x00" * upload_bytes, "f.jpg", "image/jpeg"),
+    )
+
+
+def test_split_empty_returns_empty() -> None:
+    assert TelegramClient._split_items_by_byte_budget([], 1000) == []
+
+
+def test_split_fits_in_single_group() -> None:
+    items = [_item(10), _item(20), _item(30)]
+    groups = TelegramClient._split_items_by_byte_budget(items, 100)
+    assert len(groups) == 1
+    assert sum(it.upload_bytes for it in groups[0]) == 60
+
+
+def test_split_packs_greedily_across_budget() -> None:
+    # Three items @ 40 each, budget 100 → groups of [40,40] and [40].
+    items = [_item(40), _item(40), _item(40)]
+    groups = TelegramClient._split_items_by_byte_budget(items, 100)
+    assert [len(g) for g in groups] == [2, 1]
+    assert sum(it.upload_bytes for it in groups[0]) == 80
+    assert sum(it.upload_bytes for it in groups[1]) == 40
+
+
+def test_split_oversized_single_item_kept_alone() -> None:
+    # An item that exceeds the budget on its own is placed in a group by
+    # itself, so Telegram can return a precise per-item error; the
+    # splitter never drops it silently client-side.
+    items = [_item(200)]
+    groups = TelegramClient._split_items_by_byte_budget(items, 100)
+    assert len(groups) == 1
+    assert groups[0][0].upload_bytes == 200
+
+
+def test_split_cached_items_are_free() -> None:
+    # Cached items contribute 0 bytes — they never force a split.
+    items = [_item(0), _item(0), _item(0)]
+    groups = TelegramClient._split_items_by_byte_budget(items, 10)
+    assert len(groups) == 1
+    assert len(groups[0]) == 3
+
+
+def test_split_mixes_cached_and_fresh_correctly() -> None:
+    # Cached items piggyback freely into whatever group they land in.
+    items = [_item(40), _item(0), _item(40), _item(0), _item(40)]
+    groups = TelegramClient._split_items_by_byte_budget(items, 100)
+    # [40, 0, 40] = 80 bytes (fits), next 0 fits, next 40 starts new.
+    assert [len(g) for g in groups] == [4, 1]
+
+
+def test_attach_caption_to_first_idempotent() -> None:
+    items = [_item(10), _item(10)]
+    TelegramClient._attach_caption_to_first(items, "Hello", "HTML")
+    assert items[0].media_json["caption"] == "Hello"
+    assert items[0].media_json["parse_mode"] == "HTML"
+    assert "caption" not in items[1].media_json
+    # Re-attaching overwrites in-place, doesn't duplicate.
+    TelegramClient._attach_caption_to_first(items, "Bye", "MarkdownV2")
+    assert items[0].media_json["caption"] == "Bye"
+    assert items[0].media_json["parse_mode"] == "MarkdownV2"
+
+
+def test_attach_caption_truncates_to_telegram_limit() -> None:
+    from notify_bridge_core.notifications.telegram.media import (
+        TELEGRAM_MAX_CAPTION_LENGTH,
+    )
+    items = [_item(10)]
+    long_caption = "A" * (TELEGRAM_MAX_CAPTION_LENGTH + 500)
+    TelegramClient._attach_caption_to_first(items, long_caption, "HTML")
+    assert len(items[0].media_json["caption"]) <= TELEGRAM_MAX_CAPTION_LENGTH
+
+
+def test_attach_caption_no_items_is_noop() -> None:
+    TelegramClient._attach_caption_to_first([], "x", "HTML")  # must not raise
+
+
+# ---------------------------------------------------------------------------
+# Integration tests for the full _send_media_group flow
+# ---------------------------------------------------------------------------
+
+
+def _png_bytes(size: int) -> bytes:
+    """Minimal valid PNG header + pad bytes to reach the requested size.
+
+    Required so ``check_photo_limits`` can identify the bytes as an
+    image rather than rejecting them. The PIL inspection only reads the
+    header so padding with zeros is harmless.
+    """
+    # 8-byte PNG signature + IHDR chunk for a 1x1 image (zero-padded
+    # to size). Pillow accepts this well enough to read the dimensions;
+    # the bytes after IHDR are treated as trailing garbage.
+    sig = b"\x89PNG\r\n\x1a\n"
+    ihdr = bytes.fromhex(
+        # length=13, type=IHDR, w=1, h=1, depth=8, color=2 (RGB),
+        # compression=0, filter=0, interlace=0, crc=907753de
+        "0000000d49484452000000010000000108020000009077"
+        "53de"
+    )
+    base = sig + ihdr
+    if len(base) >= size:
+        return base[:size]
+    return base + b"\x00" * (size - len(base))
+
+
+async def _build_client(session: aiohttp.ClientSession) -> TelegramClient:
+    return TelegramClient(session, BOT_TOKEN)
+
+
+@pytest.mark.asyncio
+async def test_oversized_item_skipped_others_delivered() -> None:
+    """One item over max_asset_data_size is dropped; siblings still go."""
+    mock_url_big = "http://assets.test/big.jpg"
+    mock_url_a = "http://assets.test/a.jpg"
+    mock_url_b = "http://assets.test/b.jpg"
+    max_size = 1_000_000  # 1 MB cap
+
+    # We pre-load bytes via the asset dict so we don't have to mock the
+    # asset HTTP server. The Telegram side is mocked so sendMediaGroup
+    # returns a clean 200 with two message IDs.
+    assets = [
+        {"type": "photo", "url": mock_url_big, "data": _png_bytes(2_000_000)},
+        {"type": "photo", "url": mock_url_a, "data": _png_bytes(50_000)},
+        {"type": "photo", "url": mock_url_b, "data": _png_bytes(50_000)},
+    ]
+
+    with aioresponses() as mocked:
+        mocked.post(
+            f"{TG}/sendMediaGroup",
+            payload={
+                "ok": True,
+                "result": [
+                    {"message_id": 100, "photo": [{"file_id": "fa"}]},
+                    {"message_id": 101, "photo": [{"file_id": "fb"}]},
+                ],
+            },
+        )
+        async with aiohttp.ClientSession() as sess:
+            client = await _build_client(sess)
+            result = await client._send_media_group(
+                CHAT_ID, assets, max_asset_data_size=max_size,
+            )
+
+    assert result["success"] is True
+    assert result["delivered_count"] == 2
+    assert result["skipped_count"] == 1
+    assert result["failed_count"] == 0
+    assert result["message_ids"] == [100, 101]
+
+
+@pytest.mark.asyncio
+async def test_byte_budget_splits_into_sub_chunks() -> None:
+    """Three items whose combined size exceeds the byte budget are pre-split into 2 calls."""
+    # Sized so 2 fit (sum < budget) but 3 don't (sum > budget) →
+    # [2 items, 1 item] split.
+    per_item = TELEGRAM_MAX_GROUP_TOTAL_BYTES // 3 + 1
+    # Use generated PNGs so check_photo_limits doesn't reject them as
+    # malformed; the size doesn't matter for the photo dimension check
+    # since the PNG header advertises 1x1.
+    assets = [
+        {"type": "photo", "url": f"http://t/{i}.jpg", "data": _png_bytes(per_item)}
+        for i in range(3)
+    ]
+
+    calls: list[int] = []
+
+    def _ok_response_for_n(n: int) -> dict[str, Any]:
+        return {
+            "ok": True,
+            "result": [
+                {"message_id": 200 + i, "photo": [{"file_id": f"x{i}"}]}
+                for i in range(n)
+            ],
+        }
+
+    with aioresponses() as mocked:
+        # We don't know the item count per call up front, so respond with
+        # 10-item payloads (the client ignores trailing IDs it doesn't use).
+        mocked.post(
+            f"{TG}/sendMediaGroup",
+            payload=_ok_response_for_n(10),
+            repeat=True,
+        )
+        async with aiohttp.ClientSession() as sess:
+            client = await _build_client(sess)
+            # Disable photo limits: these padded bodies are far larger
+            # than a normal photo and would otherwise trip the checks.
+            with patch(
+                "notify_bridge_core.notifications.telegram.client.check_photo_limits",
+                return_value=(False, None, None, None),
+            ):
+                result = await client._send_media_group(CHAT_ID, assets)
+
+            # Count outbound sendMediaGroup calls via the mock registry.
+            req_log = mocked.requests
+            send_calls = [
+                k for k in req_log if k[1].path.endswith("/sendMediaGroup")
+            ]
+            assert len(send_calls) >= 1
+            # aioresponses groups calls by (method, URL): tally calls per key.
+            for k in send_calls:
+                calls.append(len(req_log[k]))
+
+    assert result["success"] is True
+    # Pre-split avoided 413 entirely.
+    assert result["failed_count"] == 0
+    # The 3 items went out across 2 sub-chunks (2+1).
+    assert sum(calls) == 2
+
+
+@pytest.mark.asyncio
+async def test_chunk_413_falls_back_to_per_item() -> None:
+    """If Telegram 413s a chunk anyway, retry each item individually."""
+    assets = [
+        {"type": "photo", "url": f"http://t/{i}.jpg", "data": _png_bytes(50_000)}
+        for i in range(2)
+    ]
+
+    with aioresponses() as mocked:
+        # The group send fails hard (Telegram-side rejection).
+        mocked.post(
+            f"{TG}/sendMediaGroup",
+            status=413,
+            payload={"ok": False, "error_code": 413, "description": "Request Entity Too Large"},
+        )
+        # Per-item fallback: two sendPhoto calls succeed.
+        mocked.post(
+            f"{TG}/sendPhoto",
+            payload={"ok": True, "result": {"message_id": 300, "photo": [{"file_id": "z0"}]}},
+        )
+        mocked.post(
+            f"{TG}/sendPhoto",
+            payload={"ok": True, "result": {"message_id": 301, "photo": [{"file_id": "z1"}]}},
+        )
+
+        async with aiohttp.ClientSession() as sess:
+            client = await _build_client(sess)
+            with patch(
+                "notify_bridge_core.notifications.telegram.client.check_photo_limits",
+                return_value=(False, None, None, None),
+            ):
+                result = await client._send_media_group(CHAT_ID, assets)
+
+    assert result["success"] is True
+    assert result["delivered_count"] == 2
+    assert result["failed_count"] == 0
+    # We still record the original chunk-level error for diagnostics,
+    # tagged with kind="chunk" so operators can distinguish cause from
+    # per-item consequences.
+    assert result["errors"] is not None
+    chunk_errors = [e for e in result["errors"] if e.get("kind") == "chunk"]
+    assert len(chunk_errors) == 1
+    assert "Request Entity Too Large" in str(chunk_errors[0]["error"])
+
+
+@pytest.mark.asyncio
+async def test_chunk_failure_with_per_item_partial_failure() -> None:
+    """Per-item fallback can itself partially fail; we report both."""
+    assets = [
+        {"type": "photo", "url": f"http://t/{i}.jpg", "data": _png_bytes(50_000)}
+        for i in range(2)
+    ]
+
+    with aioresponses() as mocked:
+        mocked.post(
+            f"{TG}/sendMediaGroup",
+            status=400,
+            payload={"ok": False, "error_code": 400, "description": "Bad Request"},
+        )
+        # First per-item OK, second fails.
+        mocked.post(
+            f"{TG}/sendPhoto",
+            payload={"ok": True, "result": {"message_id": 400, "photo": [{"file_id": "p0"}]}},
+        )
+        mocked.post(
+            f"{TG}/sendPhoto",
+            status=400,
+            payload={"ok": False, "error_code": 400, "description": "PHOTO_INVALID_DIMENSIONS"},
+        )
+
+        async with aiohttp.ClientSession() as sess:
+            client = await _build_client(sess)
+            with patch(
+                "notify_bridge_core.notifications.telegram.client.check_photo_limits",
+                return_value=(False, None, None, None),
+            ):
+                result = await client._send_media_group(CHAT_ID, assets)
+
+    # At least one item delivered → overall success.
+    assert result["success"] is True
+    assert result["delivered_count"] == 1
+    assert result["failed_count"] == 1
+    assert result["message_ids"] == [400]
+    # The failed item carries its index so operators can correlate
+    # with the original asset list.
+    item_errors = [e for e in result["errors"] if e.get("kind") == "item"]
+    assert len(item_errors) == 1
+    assert item_errors[0]["item_index"] == 1
+
+
+@pytest.mark.asyncio
+async def test_document_chunk_failure_falls_back_to_sendDocument() -> None:
+    """Document items must hit /sendDocument in fallback, not /sendVideo.
+
+    Regression guard: an earlier draft routed any non-photo through
+    _VIDEO_KIND, silently misrouting documents to the video endpoint
+    where Telegram would reject them with a confusing error.
+    """
+    assets = [
+        {"type": "document", "url": f"http://t/f{i}.bin", "data": b"\x00" * 50_000}
+        for i in range(2)
+    ]
+
+    with aioresponses() as mocked:
+        mocked.post(
+            f"{TG}/sendMediaGroup",
+            status=400,
+            payload={"ok": False, "error_code": 400, "description": "Bad Request"},
+        )
+        mocked.post(
+            f"{TG}/sendDocument",
+            payload={"ok": True, "result": {"message_id": 500, "document": {"file_id": "d0"}}},
+        )
+        mocked.post(
+            f"{TG}/sendDocument",
+            payload={"ok": True, "result": {"message_id": 501, "document": {"file_id": "d1"}}},
+        )
+
+        async with aiohttp.ClientSession() as sess:
+            client = await _build_client(sess)
+            result = await client._send_media_group(CHAT_ID, assets)
+
+        # No /sendVideo or /sendPhoto calls should have been made.
+        for key in mocked.requests:
+            assert "/sendVideo" not in key[1].path
+            assert "/sendPhoto" not in key[1].path
+
+    assert result["success"] is True
+    assert result["delivered_count"] == 2
+    assert result["message_ids"] == [500, 501]
+
+
+@pytest.mark.asyncio
+async def test_oversized_video_deferred_as_document_when_opted_in() -> None:
+    """Oversized videos are sent as documents post-chunk when the flag is set.
+
+    Telegram caps sendVideo at 50 MB but accepts up to 2 GB via
+    sendDocument. With ``send_large_videos_as_documents=True``, an
+    oversized video should be deferred out of the media group, then
+    delivered as its own document send instead of being silently
+    dropped. Other items in the same group must ride through the
+    normal sendMediaGroup path unaffected.
+    """
+    # 60 MB exceeds the 50 MB sendVideo cap but is under document's 2 GB cap.
+    oversized_video = b"\x00" * (60 * 1024 * 1024)
+    assets = [
+        {"type": "video", "url": "http://t/big.mp4", "data": oversized_video,
+         "content_type": "video/mp4"},
+        {"type": "photo", "url": "http://t/a.jpg", "data": _png_bytes(50_000)},
+        {"type": "photo", "url": "http://t/b.jpg", "data": _png_bytes(50_000)},
+    ]
+
+    with aioresponses() as mocked:
+        # The 2 photos ride out in sendMediaGroup together.
+        mocked.post(
+            f"{TG}/sendMediaGroup",
+            payload={
+                "ok": True,
+                "result": [
+                    {"message_id": 700, "photo": [{"file_id": "p0"}]},
+                    {"message_id": 701, "photo": [{"file_id": "p1"}]},
+                ],
+            },
+        )
+        # The deferred video lands as a document after the chunk.
+        mocked.post(
+            f"{TG}/sendDocument",
+            payload={"ok": True, "result": {"message_id": 702, "document": {"file_id": "d0"}}},
+        )
+
+        async with aiohttp.ClientSession() as sess:
+            client = await _build_client(sess)
+            with patch(
+                "notify_bridge_core.notifications.telegram.client.check_photo_limits",
+                return_value=(False, None, None, None),
+            ):
+                result = await client._send_media_group(
+                    CHAT_ID, assets,
+                    send_large_videos_as_documents=True,
+                )
+
+        # sendVideo must NOT have been called — the oversized video
+        # bypasses sendVideo entirely and goes straight to sendDocument.
+        for key in mocked.requests:
+            assert "/sendVideo" not in key[1].path
+
+    assert result["success"] is True
+    assert result["delivered_count"] == 3
+    assert result["skipped_count"] == 0
+    assert result["failed_count"] == 0
+    assert sorted(result["message_ids"]) == [700, 701, 702]
+
+
+@pytest.mark.asyncio
+async def test_oversized_video_skipped_when_flag_off() -> None:
+    """Without the opt-in flag, oversized videos are dropped (legacy behavior)."""
+    oversized_video = b"\x00" * (60 * 1024 * 1024)
+    assets = [
+        {"type": "video", "url": "http://t/big.mp4", "data": oversized_video,
+         "content_type": "video/mp4"},
+        {"type": "photo", "url": "http://t/a.jpg", "data": _png_bytes(50_000)},
+    ]
+
+    with aioresponses() as mocked:
+        mocked.post(
+            f"{TG}/sendMediaGroup",
+            payload={
+                "ok": True,
+                "result": [{"message_id": 800, "photo": [{"file_id": "p0"}]}],
+            },
+        )
+
+        async with aiohttp.ClientSession() as sess:
+            client = await _build_client(sess)
+            with patch(
+                "notify_bridge_core.notifications.telegram.client.check_photo_limits",
+                return_value=(False, None, None, None),
+            ):
+                result = await client._send_media_group(CHAT_ID, assets)
+
+        # No sendDocument call either — video is simply dropped.
+        for key in mocked.requests:
+            assert "/sendDocument" not in key[1].path
+
+    assert result["success"] is True
+    assert result["delivered_count"] == 1
+    assert result["skipped_count"] == 1
+
+
+@pytest.mark.asyncio
+async def test_all_items_oversized_returns_failure() -> None:
+    """When every asset is filtered before send, success is False."""
+    assets = [
+        {"type": "photo", "url": "http://t/big.jpg", "data": _png_bytes(5_000_000)}
+        for _ in range(2)
+    ]
+
+    async with aiohttp.ClientSession() as sess:
+        client = await _build_client(sess)
+        # No HTTP mock needed — nothing should reach Telegram.
+        result = await client._send_media_group(
+            CHAT_ID, assets, max_asset_data_size=1_000_000,
+        )
+
+    assert result["success"] is False
+    assert result["delivered_count"] == 0
+    assert result["skipped_count"] == 2
+    assert result["failed_count"] == 0
+    assert "filtered" in result["error"]
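The splitting behaviour pinned down by the `test_split_*` unit tests above (greedy in-order packing, cached items costing zero bytes, an oversized item kept in a group of its own) can be sketched as follows. This is a minimal standalone illustration and not the project's `_split_items_by_byte_budget`: the `Item` dataclass and the `split_by_byte_budget` name are hypothetical, and the sketch ignores any per-group item-count cap the real client may also enforce.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a media item; only the upload cost matters."""

    upload_bytes: int


def split_by_byte_budget(items: list[Item], budget: int) -> list[list[Item]]:
    """Greedily pack items, in order, into groups whose total stays within budget."""
    groups: list[list[Item]] = []
    current: list[Item] = []
    used = 0
    for item in items:
        # Close the current group only when it already holds something and
        # adding this item would overflow. An item larger than the budget
        # therefore still lands in a group of its own instead of being dropped.
        if current and used + item.upload_bytes > budget:
            groups.append(current)
            current, used = [], 0
        current.append(item)
        used += item.upload_bytes
    if current:
        groups.append(current)
    return groups


def group_sizes(sizes: list[int], budget: int) -> list[int]:
    """Convenience wrapper: return the number of items in each group."""
    return [len(g) for g in split_by_byte_budget([Item(s) for s in sizes], budget)]
```

Under these assumptions, `group_sizes([40, 0, 40, 0, 40], 100)` mirrors the mixed cached/fresh case: the zero-cost items ride along with whichever group is open, and only the third 40-byte item opens a new group.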
@@ -0,0 +1,249 @@
+"""Per-send Telegram options (`disable_notification`, `message_thread_id`).
+
+Verifies the ContextVar-based plumbing inside ``TelegramClient`` so the
+two new flags actually land in the request payloads at all four send
+paths (sendMessage, single-asset send, media-group, cache-hit POST) and
+that concurrent ``asyncio.gather`` fan-outs in the dispatcher don't leak
+options between tasks.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+from typing import Any
+
+import pytest
+from aiohttp import FormData
+
+
+def test_telegram_receiver_factory_reads_new_fields() -> None:
+    """The receiver factory turns config-dict keys into typed fields."""
+    from notify_bridge_core.notifications.receiver import (
+        TelegramReceiver, build_receiver,
+    )
+
+    recv = build_receiver(
+        "telegram",
+        {
+            "chat_id": "12345",
+            "disable_notification": True,
+            "message_thread_id": "7",  # string form, common from JSON UI
+        },
+    )
+    assert isinstance(recv, TelegramReceiver)
+    assert recv.chat_id == "12345"
+    assert recv.disable_notification is True
+    assert recv.message_thread_id == 7
+
+
+def test_telegram_receiver_factory_defaults_when_missing() -> None:
+    """Missing keys default to off / general topic."""
+    from notify_bridge_core.notifications.receiver import (
+        TelegramReceiver, build_receiver,
+    )
+
+    recv = build_receiver("telegram", {"chat_id": "12345"})
+    assert isinstance(recv, TelegramReceiver)
+    assert recv.disable_notification is False
+    assert recv.message_thread_id is None
+
+
+@pytest.mark.parametrize(
+    "raw_thread, expected",
+    [
+        (None, None),
+        ("", None),
+        ("not-a-number", None),
+        ("42", 42),
+        (42, 42),
+        # ``0`` is Telegram's "general topic" sentinel — collapse to None
+        # so the Bot API just omits the field, matching the frontend's
+        # ``<= 0 → unset`` behaviour.
+        ("0", None),
+        (0, None),
+        (-5, None),
+        # bool would otherwise pass through as int(True)==1 / int(False)==0
+        # and silently route into topic #1; reject explicitly.
+        (True, None),
+        (False, None),
+    ],
+)
+def test_telegram_receiver_thread_id_coercion(raw_thread: Any, expected: Any) -> None:
+    from notify_bridge_core.notifications.receiver import build_receiver
+
+    recv = build_receiver(
+        "telegram",
+        {"chat_id": "1", "message_thread_id": raw_thread},
+    )
+    assert recv.message_thread_id == expected  # type: ignore[attr-defined]
+
+
+def test_apply_send_opts_to_payload_merges_when_bound() -> None:
+    """Inside ``_bind_send_options``, payload helper writes the two keys."""
+    from notify_bridge_core.notifications.telegram.client import (
+        _SendOptions,
+        _apply_send_opts_to_payload,
+        _bind_send_options,
+    )
+
+    payload: dict[str, Any] = {"chat_id": "1"}
+    with _bind_send_options(_SendOptions(disable_notification=True, message_thread_id=7)):
+        _apply_send_opts_to_payload(payload)
+    assert payload["disable_notification"] is True
+    assert payload["message_thread_id"] == 7
+
+
+def test_apply_send_opts_to_payload_omits_when_default() -> None:
+    """No bind = no flags written (Bot API treats omission as default)."""
+    from notify_bridge_core.notifications.telegram.client import (
+        _apply_send_opts_to_payload,
+    )
+
+    payload: dict[str, Any] = {"chat_id": "1"}
+    _apply_send_opts_to_payload(payload)
+    assert "disable_notification" not in payload
+    assert "message_thread_id" not in payload
+
+
+def test_apply_send_opts_to_form_merges_when_bound() -> None:
+    """Multipart payload helper writes the two fields when bound."""
+    from notify_bridge_core.notifications.telegram.client import (
+        _SendOptions,
+        _apply_send_opts_to_form,
+        _bind_send_options,
+    )
+
+    form = FormData()
+    with _bind_send_options(_SendOptions(disable_notification=True, message_thread_id=42)):
+        _apply_send_opts_to_form(form)
+
+    # aiohttp.FormData stores fields as ``(MultiDict{name, ...}, headers, value)``.
+    name_to_value = {}
+    for type_opts, _headers, value in form._fields:  # type: ignore[attr-defined]
+        name_to_value[type_opts.get("name")] = value
+    assert name_to_value.get("disable_notification") == "true"
+    assert name_to_value.get("message_thread_id") == "42"
+
+
+def test_bind_send_options_resets_on_exit() -> None:
+    """Token-reset semantics: the var is restored even after a raise."""
+    from notify_bridge_core.notifications.telegram.client import (
+        _SendOptions,
+        _bind_send_options,
+        _send_options_var,
+    )
+
+    default = _send_options_var.get()
+    try:
+        with _bind_send_options(_SendOptions(disable_notification=True)):
+            raise RuntimeError("boom")
+    except RuntimeError:
+        pass
+    assert _send_options_var.get() == default
+
+
+@pytest.mark.asyncio
+async def test_concurrent_binds_do_not_leak_between_tasks() -> None:
+    """Two ``asyncio.gather`` tasks see only their own bound options.
+
+    This is the load-bearing invariant for the dispatcher's per-receiver
+    fan-out: one chat with ``disable_notification=True`` must not silence
+    a peer chat in the same dispatch.
+    """
+    from notify_bridge_core.notifications.telegram.client import (
+        _SendOptions,
+        _apply_send_opts_to_payload,
+        _bind_send_options,
+    )
+
+    results: list[dict[str, Any]] = []
+
+    async def run_with(opts: _SendOptions, label: str) -> None:
+        payload: dict[str, Any] = {"label": label}
+        with _bind_send_options(opts):
+            # Yield to the loop to interleave with the sibling task.
+            await asyncio.sleep(0)
+            _apply_send_opts_to_payload(payload)
+        results.append(payload)
+
+    await asyncio.gather(
+        run_with(_SendOptions(disable_notification=True, message_thread_id=1), "silent"),
+        run_with(_SendOptions(disable_notification=False, message_thread_id=2), "loud"),
+    )
+
+    by_label = {r["label"]: r for r in results}
+    assert by_label["silent"].get("disable_notification") is True
+    assert by_label["silent"].get("message_thread_id") == 1
+    assert "disable_notification" not in by_label["loud"]  # False → omitted
+    assert by_label["loud"].get("message_thread_id") == 2
+
+
+@pytest.mark.asyncio
+async def test_send_message_passes_options_into_payload() -> None:
+    """``send_message(disable_notification=True, message_thread_id=N)``
+    surfaces both keys in the JSON request body."""
+    from notify_bridge_core.notifications.telegram.client import TelegramClient
+
+    captured: dict[str, Any] = {}
+
+    class _FakeResp:
+        status = 200
+
+        async def json(self) -> dict[str, Any]:
+            return {"ok": True, "result": {"message_id": 99}}
+
+        async def __aenter__(self) -> "_FakeResp":
+            return self
+
+        async def __aexit__(self, *args: Any) -> None:
+            return None
+
+    class _FakeSession:
+        def post(self, url: str, *, json: dict[str, Any] | None = None, **_kw: Any) -> _FakeResp:
+            captured["url"] = url
+            captured["json"] = json
+            return _FakeResp()
+
+    client = TelegramClient(_FakeSession(), "TEST:token")  # type: ignore[arg-type]
+    result = await client.send_message(
+        chat_id="123",
+        text="hello",
+        disable_notification=True,
+        message_thread_id=5,
+    )
+    assert result["success"] is True
+    payload = captured["json"]
+    assert payload["disable_notification"] is True
+    assert payload["message_thread_id"] == 5
+
+
+@pytest.mark.asyncio
+async def test_send_message_without_options_omits_keys() -> None:
+    """Default kwargs leave the payload Bot-API-clean."""
+    from notify_bridge_core.notifications.telegram.client import TelegramClient
+
+    captured: dict[str, Any] = {}
+
+    class _FakeResp:
+        status = 200
+
+        async def json(self) -> dict[str, Any]:
+            return {"ok": True, "result": {"message_id": 1}}
+
+        async def __aenter__(self) -> "_FakeResp":
+            return self
+
+        async def __aexit__(self, *args: Any) -> None:
+            return None
+
+    class _FakeSession:
+        def post(self, url: str, *, json: dict[str, Any] | None = None, **_kw: Any) -> _FakeResp:
+            captured["json"] = json
+            return _FakeResp()
+
+    client = TelegramClient(_FakeSession(), "TEST:token")  # type: ignore[arg-type]
+    await client.send_message(chat_id="123", text="hello")
+    payload = captured["json"]
+    assert "disable_notification" not in payload
+    assert "message_thread_id" not in payload