# Functional Review — Telegram, Immich, Logging (2026-05-28) Snapshot review of three subsystems, with prioritised improvement candidates. Pairs with [feature-backlog.md](feature-backlog.md) — items here are infrastructure that unlocks several backlog features. All citations are from the working tree at commit `85a8f1e` (master). Two files (`packages/core/src/notify_bridge_core/notifications/telegram/client.py`, `media.py`) had uncommitted changes at review time — see Telegram § "In-flight work". --- ## 1. Telegram infrastructure ### Telegram — what works well - Single chokepoint `TelegramClient` ([packages/core/src/notify_bridge_core/notifications/telegram/client.py](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py)) covers text/photo/video/document/media-group, with 429-aware retry, parse-error retry, file_id cache, multi-bot per-token instances, polling + webhook modes, and bot-command registration. - CLAUDE.md rule #6 satisfied for the production paths. - Caption length, group sizing, parse-mode fallback all enforced. ### In-flight work Byte-budget sub-chunking for media groups (`TELEGRAM_MAX_GROUP_TOTAL_BYTES` in [media.py](../../packages/core/src/notify_bridge_core/notifications/telegram/media.py)) with per-item fallback inside `_send_media_group`. Logic is coherent; before commit, verify `_build_media_items` callers still match the new signature (caption no longer injected at fetch time). ### Gaps, ranked by user-visible value 1. **No inline keyboards / `callback_query` handlers** — zero infra for "Favorite / Archive / Dismiss" buttons on Immich notifications. Biggest UX unlock; prerequisite for several Immich smart actions. 2. **No edit-in-place** (`editMessageText` not wrapped). Pairs naturally with deferred dispatch / quiet hours coalescing — 5 separate "asset added" messages become 1 edited message. 3. **`disable_notification` (silent send) not exposed** — already a Telegram primitive; slots into the quiet-hours `silent` mode the backlog already mentions. 4. **`message_thread_id` (forum topics)** — single field per receiver; unblocks supergroup-with-topics users. 5. **Direct `TelegramClient(...)` constructions** in [api/telegram_bots.py:314,394,404,412](../../packages/server/src/notify_bridge_server/api/telegram_bots.py) bypass `get_telegram_client()` — violates CLAUDE.md rule #6 and skips the shared file_id cache. 6. **Per-command authorization** — `commands_enabled` is all-or-nothing per chat; no per-command allowlist or admin gate. 7. **Long-message splitting** — `send_message` silently truncates at 4096 ([client.py:492](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py)). 8. **No parse-mode per target** — HTML hardcoded. --- ## 2. Immich ### Immich — what works well - Mature polling pipeline: incremental delta-fetch via `updatedAfter`, pending-asset tracking, fingerprint fast-path skip, fallback to full fetch on count-decrease ([providers/immich/provider.py](../../packages/core/src/notify_bridge_core/providers/immich/provider.py)). - Rich bot commands (status / albums / events / people / search / latest / random / favorites / summary / memory) with full asset context (CLAUDE.md rule #10 satisfied). - `auto_organize` action is well-shaped: AND person + smart-query union, exclusions, type/date/favorite filters, 500-asset batched add, idempotent diff against album asset_ids, dry-run, `ActionExecution` log. - Three scheduled features wired: periodic summaries, scheduled-asset delivery, Memory/On-This-Day (with native Immich memory API + fallback). ### Highest-leverage candidates 1. **Webhook ingestion** — `webhook_based=False` at [capabilities.py:46](../../packages/core/src/notify_bridge_core/providers/capabilities.py). Sub-second latency vs the current 5-min poll. New `/api/webhooks/immich/{secret}` route + parser + capability flip. 2. **Share-link expiry monitoring + auto-rotate action** — links silently break today; data is already fetched per event ([provider.py:541-569](../../packages/core/src/notify_bridge_core/providers/immich/provider.py)). 3. **Duplicate cluster digest** — Immich >= 1.100 `/api/duplicates` is unused; pairs with inline buttons for "merge / ignore 30d". 4. **Auto-favorite by person** (already in backlog) — smallest delta on the existing `auto_organize` executor. 5. **Per-person notification subscription** — tracker-config filter, reuses existing `asset.people` data. 6. **Album auto-curation from Inbox** — date-based target album name, move (not copy); needs the Immich move endpoint (currently we only `add_assets_to_album`). 7. **Storage / job-queue alerts** — `/api/server/stats` and `/api/jobs` unused; lightweight poll + threshold = "disk full" / "transcoding stalled" notifications. 8. **Smart-action infra polish** — descriptors are reusable, but the rule editor is JSON-shaped, action-run statistics aren't aggregated, and dry-run shows counts not the asset list. Address before adding 5 more action types. --- ## 3. Logging ### What's already in place In [logging_setup.py](../../packages/server/src/notify_bridge_server/logging_setup.py): - `dictConfig` with `JsonFormatter` (line-delimited JSON) toggleable via `NOTIFY_BRIDGE_LOG_FORMAT=json`. - `SecretMaskingFilter` redacts Telegram bot tokens + Authorization / api_key / password / refresh_token across `msg`, `exc_text`, `stack_info`. - ContextVar-driven record factory injects `request_id`, `command`, `chat_id`, `bot_id`, `dispatch_id` on every record. Text format: `[req=- cmd=- bot=- chat=- disp=-]`. - Per-module overrides via `NOTIFY_BRIDGE_LOG_LEVELS` env or DB `AppSetting`. Live runtime patch via `apply_log_levels()` — no restart. - Noisy libs pre-quieted (sqlalchemy, aiohttp, apscheduler, urllib3, asyncio, httpx, httpcore, PIL, uvicorn.access). Plus: - `EventLog` table with structured rows (event_type, status, assets_count, details JSON, FKs to tracker/provider/action/ command_tracker/bot), `event_log_retention_days=30` default, daily APScheduler cleanup `_cleanup_old_events` ([scheduler.py:332](../../packages/server/src/notify_bridge_server/services/scheduler.py)). - Prometheus counter `notify_bridge_event_log_total{status,event_type}`. - Frontend viewer with filters at [api/status.py](../../packages/server/src/notify_bridge_server/api/status.py). - `bind_log_context` actually used in: dispatcher (dispatch_id), telegram_poller (bot/chat/command/request_id), webhook commands. ### Gaps, ordered by debug-pain payoff 1. **No FastAPI request-ID middleware.** `request_id_var` is set only in webhook + Telegram poller paths. Every REST call from the SPA logs as `req=-`. Tiny middleware (read `X-Request-Id` or `uuid4()`, bind context, echo header) closes this whole-app blind spot. 2. **`dispatch_id` is in log lines but NOT persisted on the `EventLog` row.** Means you can find the failed row in the UI but can't grep stderr for the matching `disp=...`. Stash it in `details.dispatch_id` (no migration needed) — biggest cross-surface correlation win. 3. **HTTP access log is uvicorn default** (`access_log=not _cfg.debug` at [main.py:419](../../packages/server/src/notify_bridge_server/main.py)). Doesn't include `request_id`, latency, user, status as structured fields. Replace with a small `RequestLoggerMiddleware` that emits `method`, `path`, `status`, `latency_ms`, `request_id`. 4. **Telegram media-group failures log richly but aren't linked to the resulting `EventLog` row.** The dispatcher result-aggregation work in flight is the right place to dump `errors[]` into `EventLog.details.errors`. 5. **In-browser log access is missing.** EventLog rows are visible, but raw logger output requires container/SSH access. A bounded in-memory ring-buffer endpoint (admin-only, last N lines, filtered by context fields) would mean ~90% of triage stays in the UI. 6. **No "diagnostic mode" UI.** The runtime `apply_log_levels()` is great but only reachable through the app-settings JSON editor. A "Debug for 15 minutes: `notify_bridge_core.notifications.telegram.client`" button with auto-revert is a few-hours job. 7. **`EventLog.details` is freeform.** Frontend already destructures `dispatch_status`, `deferred_until`, `deferred_for_seconds`, `original_event_log_id` ([types.ts:238-261](../../frontend/src/lib/types.ts)). Define a typed `EventLogDetails` per `event_type` (Pydantic at the boundary) — prevents drift between providers. 8. **No log rotation** — `StreamHandler(sys.stderr)` only. Fine in containers, brittle on bare-metal. Optional `RotatingFileHandler` opt-in via env. 9. **No slow-query / outbound-HTTP timing logs.** `sqlalchemy.engine=WARNING` by default; no per-query duration log. Same for outbound calls to Immich / Telegram. A "duration_ms >= N" threshold logger would surface "why is this dispatch slow" without flipping global DEBUG. 10. **Action dry-run output is logger-only.** Could be streamed into the action editor. 11. **Poll-result not persisted.** Webhook payloads are logged ([api/webhook_logs.py](../../packages/server/src/notify_bridge_server/api/webhook_logs.py)), but Immich/Google-Photos poll cycles emit no "last poll: 0 changes / 245ms" row. A lightweight `provider_poll_log` (small table or ring buffer) would answer "is the poller actually running" without reading stderr. --- ## Recommended sequencing | # | Item | Status | Why first | | --- | --- | --- | --- | | 1 | Request-ID middleware + persist `dispatch_id` on `EventLog` | **SHIPPED 2026-05-28** | Unlocks the rest of the debug story; ~2 hours combined | | 2 | Finish in-flight Telegram byte-budget chunking + write `errors[]` into `EventLog.details` | **SHIPPED 2026-05-28** | Already half-done; aligns with #1 | | 3 | Telegram inline keyboards + `callback_query` handler | not started | Prereq for several Immich smart actions | | 4 | Telegram `disable_notification` + `message_thread_id` per target | **SHIPPED 2026-05-28** | Small, also feeds the open Quiet Hours v1 backlog item | | 5 | Immich webhook ingestion | not started | 5-min → sub-second; biggest user-facing latency win | | 6 | Immich share-link expiry + auto-rotate (using #3) | not started | Real silent-breakage today | | 7 | Diagnostic-mode UI (live log-level toggle with auto-revert) | **SHIPPED 2026-05-28** | Shifts triage to the browser | | 8 | Immich duplicate digest + auto-favorite by person | not started | Both ride on #3 | Items 1–4 are infrastructure that unlocks 5–8. Items 1, 2, 4 also smooth the Quiet Hours v1 / target-level windows that's top of the backlog — worth landing before that feature so quiet hours can dispatch through edited messages and silent sends from day one. --- ## Decision log - **2026-05-28** — Review completed. Starting work on item #1 (request-id middleware + persist `dispatch_id` on `EventLog`). - **2026-05-28** — Item #1 shipped. Summary of the change: - New helpers in [packages/core/src/notify_bridge_core/log_context.py](../../packages/core/src/notify_bridge_core/log_context.py): `ensure_dispatch_id()` (reuse existing or mint a new `disp:<12 hex>`) and `enrich_details_with_correlation(details)` (shallow-copy a details dict and merge active `dispatch_id` / `request_id` from the ContextVar snapshot). - New `RequestContextMiddleware` in [packages/server/src/notify_bridge_server/main.py](../../packages/server/src/notify_bridge_server/main.py) that reads inbound `X-Request-Id` (charset/length validated, `:` excluded so a client can't masquerade as a server-minted id), falls back to `req:<12 hex>`, binds the value via `bind_log_context`, and echoes it back as the response header. Added LAST so it's the outermost middleware. - Outer entry points now bind a `dispatch_id` via a thin wrapper function (`check_tracker`, `dispatch_provider_event`, `dispatch_scheduled_for_tracker`, `_process_row`, `run_action`). All 10 `EventLog(...)` creation sites wrap their `details=` payload in `enrich_details_with_correlation(...)`. - Switched `NotificationDispatcher.dispatch` to use `ensure_dispatch_id()` instead of inline `uuid.uuid4()`. - New tests in [packages/server/tests/test_request_correlation.py](../../packages/server/tests/test_request_correlation.py) (12 tests) covering header echo, charset validation, prefix- masquerade rejection, helper merge semantics. All 239 server tests green. - Reviewed by `python-reviewer` subagent (no CRITICAL/HIGH; 3 MEDIUM and 1 LOW addressed: PEP 8 imports moved to top of main.py; `RequestResponseEndpoint` type added to `dispatch`; `:` dropped from the request-id charset; shallow-copy caveat documented). - Live smoke verified: generated id `req:a9b9821f5aab` on plain request; safe inbound `my-trace-abc123` echoed unchanged; `disp:fake12345678` correctly replaced; watcher tick log lines now show distinct `disp=disp:` per tracker check. - **2026-05-28** — Item #2 shipped. Summary of the change: - Confirmed the in-flight Telegram byte-budget media-group chunking in [telegram/client.py](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py) is complete (15/15 media-group tests pass). Deleted the now-unused `split_media_by_upload_size()` from [telegram/media.py](../../packages/core/src/notify_bridge_core/notifications/telegram/media.py). - New module [services/dispatch_summary.py](../../packages/server/src/notify_bridge_server/services/dispatch_summary.py) with `summarize_dispatch_results()` (aggregator), `attach_summary_in_place()` (in-session) and `record_dispatch_summary_async()` (post-commit). Captures `targets_attempted/succeeded/failed`, per-target `errors`, media-group `media{delivered,skipped,failed}` counts and `media_errors[]` from the new `TelegramClient._send_media_group` partial-failure path. Bounded: 20 errors / 20 media errors / 500-char message cap with explicit `…[truncated]` marker. - Wired at 4 dispatch sites: - `event_dispatch.py`: accumulates per-target results across all tracking-config groups, attaches summary in-session before commit. - `deferred_dispatch.py`: inlines summary into the new EventLog row's `details` for both `delivered_after_quiet_hours` and `deferred_then_failed` paths. - `scheduled_dispatch.py`: inlines summary into the cron-fire EventLog row's `details`. - `watcher.py`: follow-up `record_dispatch_summary_async` in a fresh session because the EventLog row was committed before dispatch. - Frontend type drift fixed: [types.ts](../../frontend/src/lib/types.ts) gets new `DispatchSummary`, `DispatchSummaryError`, `DispatchSummaryMediaError` interfaces plus `dispatch_id` / `request_id` / `dispatch_summary` keys on `EventLog.details`. - New tests in [tests/test_dispatch_summary.py](../../packages/server/tests/test_dispatch_summary.py) (10 tests): empty/all-success/mixed/media-counts/sub-errors/ truncation/long-message-trim/in-place attach/no-results no-op/ malformed sub-error. All 249 server tests green. - Reviewed by `python-reviewer` subagent (no CRITICAL; 2 HIGH + 3 MEDIUM addressed: `asyncio.CancelledError` re-raise in the best-effort catch; late `from .dispatch_summary import …` calls hoisted to top of each file; empty-results contract changed from "zero-count summary attached" to "no key written"; truncation marker upgraded to `…[truncated]` for operator clarity; `flag_modified` comment tightened). - Live smoke: backend restarts cleanly, watcher tick log lines continue showing `disp=disp:` correlation, no startup errors. - **2026-05-28** — Item #4 shipped. Summary of the change: - `TelegramReceiver` dataclass in [receiver.py](../../packages/core/src/notify_bridge_core/notifications/receiver.py) gains `disable_notification: bool = False` and `message_thread_id: int | None = None`. New `_coerce_telegram_thread_id` helper collapses Telegram's "general topic" sentinels (`0`, negatives, blanks, bools) to `None` so the Bot API just omits the field — matches the frontend's `<= 0 → unset` behaviour. - `TelegramClient` ([client.py](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py)) gets a frozen `_SendOptions` + `_send_options_var` `ContextVar` pattern for the deep media paths (`_upload_media`, `_post_media_group`, `_send_from_cache`) that can't easily plumb kwargs through. `send_notification` binds the var; the 3 deep builders read it via `_apply_send_opts_to_payload` / `_apply_send_opts_to_form`. `send_message` is a leaf and just inlines its kwargs into the JSON body directly (no ContextVar needed there). - Dispatcher ([dispatcher.py](../../packages/core/src/notify_bridge_core/notifications/dispatcher.py)) passes `receiver.disable_notification` / `receiver.message_thread_id` into `client.send_message(...)` and `client.send_notification(...)`. - Frontend: new inline per-Telegram-receiver options panel in [ReceiverSection.svelte](../../frontend/src/routes/targets/ReceiverSection.svelte) triggered by a cog icon. Silent + thread-id indicators (bell-off icon, `#N` badge) on the row when set. `+page.svelte` handlers PUT the merged config to `/api/targets/{id}/receivers/{rid}`. 5 new i18n keys in `en.json` / `ru.json`. - New tests in [test_telegram_per_send_options.py](../../packages/server/tests/test_telegram_per_send_options.py) — 19 tests: factory + thread-id coercion table (including bool rejection and `0`/negative collapse), payload/form helper merge semantics, bind/reset under exceptions, concurrent-task isolation via `asyncio.gather`, end-to-end `send_message` payload assertions. All 270 server tests green. - Reviewed by `python-reviewer` subagent (no CRITICAL; 2 HIGH + 1 MEDIUM + 1 LOW addressed: dead ContextVar bind in `send_message` removed in favor of inline kwarg injection; re-entrant bind from `send_notification → send_message` auto-resolved by the same fix; `message_thread_id=0` collapse aligns backend with frontend; `_coerce_telegram_thread_id` rejects `bool` input). - Live smoke: backend restarts cleanly, no errors in startup log. - **2026-05-28** — Holistic `code-reviewer` pass over the full session diff (Features 1+2+4+7) caught a real HIGH that the per-feature Python-narrow reviews missed: ``summarize_dispatch_results`` in Feature 2 was reading the wrong dict shape. The dispatcher's ``_aggregate_results`` wraps per-receiver dicts under ``result["results"]`` and renames the Telegram media counts to ``media_delivered_count`` / ``media_skipped_count`` / ``media_failed_count``. The summarizer was reading the top-level ``delivered_count``, which is always absent in production aggregated output — meaning the ``dispatch_summary.media`` block was silently zero / missing for every real dispatch, and the ``media_errors`` list never populated. The unit tests passed because they hand-constructed leaf-shaped dicts that masked the wrong-shape read. Fixed in [dispatch_summary.py](../../packages/server/src/notify_bridge_server/services/dispatch_summary.py) by drilling into ``result["results"]`` per-receiver leaves and preferring ``media_*_count`` field names with fallback to the top-level names. Receiver index added to ``media_errors`` entries when drilling. New integration tests in [test_dispatch_summary.py](../../packages/server/tests/test_dispatch_summary.py) use the real dispatcher envelope so a future shape regression fails loudly. Also addressed MEDIUM findings: ``attach_summary_in_place`` / ``record_dispatch_summary_async`` now skip when a caller has pre-set ``dispatch_summary`` (mirrors the "caller wins" rule in ``enrich_details_with_correlation``); ``ReceiverSection.svelte`` props for the Telegram options panel are now optional + gated internally so the component stays portable; TS type for ``editingReceiverOptions.message_thread_id`` is ``number | ''`` with proper coercion in ``openEditReceiver``. 294/294 server tests green; backend restarts clean. - **2026-05-28** — Item #5 NOT shipped. Reason: Immich has no outbound webhook feature. The closest thing is `POST /sync/stream` (a server-streaming sync API designed for first-party Immich clients), and adopting it would (a) take 1-2 days of new subscription-manager infrastructure, (b) couple us to an API with no third-party stability contract, and (c) deliver 5-min → sub-second latency on photo notifications which is rarely critical. If someone later actually needs lower latency, dropping the default ``scan_interval`` is a 5-minute alternative that gets 80% of the win for 1% of the cost. Skipped in favour of #7. - **2026-05-28** — Item #7 shipped. Summary of the change: - New service module [services/diagnostic_mode.py](../../packages/server/src/notify_bridge_server/services/diagnostic_mode.py) with `set_diagnostic` / `revert_diagnostic` / `revert_all` / `list_active`. State is in-memory only — restart wipes overrides (`setup_logging` re-applies the DB baseline at boot). Modules go through an allowlist (`notify_bridge_*`, `sqlalchemy`, `aiohttp`, `apscheduler`, `urllib3`, `httpx`, `httpcore`, `asyncio`, `PIL`, `uvicorn`, `starlette`, `fastapi`) so a button press can't flip root. Duration clamped to `[1, 240]` minutes. Baseline derivation walks the dotted parents so `sqlalchemy.engine.Engine` correctly inherits `sqlalchemy.engine` → WARNING rather than falling through to root. - 3 new admin-only endpoints under `/api/settings/diagnostic-mode` in [api/app_settings.py](../../packages/server/src/notify_bridge_server/api/app_settings.py): `GET` (list active), `POST` (activate, 400 on invalid input), `DELETE /{module:path}` (manual revert, 404 if not active). - Auto-revert uses APScheduler's date trigger with `misfire_grace_time=60`, falling back to a strongly-referenced asyncio task (stored in a module-level set with `add_done_callback(discard)`) when the scheduler isn't running. `_expire_callback` re-reads `log_levels` from the DB at fire time, so an admin who edits overrides mid-window sees the new baseline restored — not a stale snapshot. - `revert_all` is wired into the FastAPI lifespan shutdown in [main.py](../../packages/server/src/notify_bridge_server/main.py) so a clean stop / hot-reload leaves the world tidy. - New frontend [DiagnosticsCassette.svelte](../../frontend/src/routes/settings/DiagnosticsCassette.svelte) sits below `LoggingCassette` in the settings page. Quick-pick module dropdown + custom-text fallback, duration chip group (5m / 15m / 30m / 1h / 2h), Activate button. Active list with countdown updated by a 1s ticker; resyncs from the backend every 30s based on elapsed time (not modulo-of-now, which the prior version had wrong). Manual revert via undo-icon button on each row. - 15 new i18n keys in `en.json` / `ru.json`. - 20 new tests in [test_diagnostic_mode.py](../../packages/server/tests/test_diagnostic_mode.py) — service-module unit tests + 4 FastAPI smoke tests via `dependency_overrides[require_admin]` exercising the router / path converter / HTTPException paths. All 290 server tests green. - Reviewed by `python-reviewer` subagent (no CRITICAL; 3 HIGH + 3 MEDIUM addressed: fallback task retention in a module-level set to prevent GC; prefix-walk for `_baseline_for` so sub-loggers inherit parent defaults; `revert_all` wired into lifespan shutdown; `list_active` now sweeps expired entries; DB `log_levels` re-read at revert time instead of snapshot at activation; frontend resync uses elapsed time. LOW items addressed: scheduler-unavailable paths log at DEBUG instead of silently passing; test cleanup of dead `_MIN_DURATION_MINUTES` mutation). - Live smoke: backend restarts cleanly, no errors in startup log.