feat: observability, per-receiver Telegram options, oversized-video fallback

Operability:
- Correlation IDs end-to-end: shared dispatch_id between log lines and
  EventLog rows (event/watcher/scheduled/deferred/action/HA/command paths)
  and a new X-Request-Id middleware that normalizes inbound ids and binds
  request_id into log context.
- dispatch_summary block merged into EventLog.details: per-target
  success/failure counts plus Telegram media delivered/skipped/failed and
  truncated error lists, so partial outcomes surface in the UI.
- Diagnostic mode: admin can flip one module to DEBUG for a bounded
  window with auto-revert (in-memory only; setup_logging() resets on
  boot, lifespan reverts on shutdown). New /diagnostic-mode endpoints
  plus DiagnosticsCassette UI on the settings page.

Telegram:
- Per-receiver options: disable_notification (silent send) and
  message_thread_id (forum-topic routing), wired through the dispatcher
  via a ContextVar so all four send sites (sendMessage, sendPhoto/
  sendVideo/sendDocument, sendMediaGroup, cache-hit POST) pick them up.
- send_large_videos_as_documents target setting: bypass the 50 MB
  sendVideo cap by falling back to sendDocument for oversized videos.
- sendMediaGroup byte-budget enforcement (TELEGRAM_MAX_GROUP_TOTAL_BYTES,
  45 MB) with per-item fallback on chunk failure so a stale file_id no
  longer silently drops a cached asset.

Tests:
- New: diagnostic_mode, dispatch_summary, request_correlation,
  telegram_media_group_partial, telegram_per_send_options.

Docs:
- .claude/reviews/: six-axis production-readiness review of v0.8.1.
- .claude/docs/functional-review-2026-05-28.md: focused review of
  Telegram/Immich/logging subsystems.
2026-05-28 15:19:31 +03:00
parent 85a8f1e71c
commit 6a8f374678
39 changed files with 7239 additions and 142 deletions
@@ -0,0 +1,435 @@
# Functional Review — Telegram, Immich, Logging (2026-05-28)
Snapshot review of three subsystems, with prioritised improvement candidates.
Pairs with [feature-backlog.md](feature-backlog.md) — items here are
infrastructure that unlocks several backlog features.
All citations are from the working tree at commit `85a8f1e` (master). Two
files (`packages/core/src/notify_bridge_core/notifications/telegram/client.py`,
`media.py`) had uncommitted changes at review time — see Telegram §
"In-flight work".
---
## 1. Telegram infrastructure
### Telegram — what works well
- Single chokepoint `TelegramClient`
([packages/core/src/notify_bridge_core/notifications/telegram/client.py](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py))
covers text/photo/video/document/media-group, with 429-aware retry,
parse-error retry, file_id cache, multi-bot per-token instances,
polling + webhook modes, and bot-command registration.
- CLAUDE.md rule #6 satisfied for the production paths.
- Caption length, group sizing, parse-mode fallback all enforced.
### In-flight work
Byte-budget sub-chunking for media groups
(`TELEGRAM_MAX_GROUP_TOTAL_BYTES` in
[media.py](../../packages/core/src/notify_bridge_core/notifications/telegram/media.py))
with per-item fallback inside `_send_media_group`. Logic is coherent;
before commit, verify `_build_media_items` callers still match the new
signature (caption no longer injected at fetch time).
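
A minimal sketch of the byte-budget sub-chunking idea described above, assuming every media item already knows its size in bytes. `MediaItem`, `chunk_by_budget`, and `MAX_GROUP_ITEMS` are hypothetical names for illustration; only `TELEGRAM_MAX_GROUP_TOTAL_BYTES` appears in the source, and treating 45 MB as 45 * 1024 * 1024 is an assumption. This is not the code in `media.py`.

```python
from dataclasses import dataclass

# 45 MB budget per the commit message; the exact byte value is an assumption.
TELEGRAM_MAX_GROUP_TOTAL_BYTES = 45 * 1024 * 1024
# Telegram's sendMediaGroup accepts 2-10 items per call.
MAX_GROUP_ITEMS = 10


@dataclass(frozen=True)
class MediaItem:  # hypothetical stand-in for the project's media item type
    name: str
    size_bytes: int


def chunk_by_budget(items, max_bytes=TELEGRAM_MAX_GROUP_TOTAL_BYTES,
                    max_items=MAX_GROUP_ITEMS):
    """Greedily pack items into ordered chunks that respect both caps."""
    chunks, current, current_bytes = [], [], 0
    for item in items:
        over_bytes = current_bytes + item.size_bytes > max_bytes
        over_count = len(current) >= max_items
        # Close the running chunk before it would exceed either cap.
        if current and (over_bytes or over_count):
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(item)
        current_bytes += item.size_bytes
    if current:
        chunks.append(current)
    return chunks
```

A chunk can end up holding a single item (for example, one file larger than the budget), so a caller would presumably need a single-send path for that case rather than `sendMediaGroup`.
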
### Gaps, ranked by user-visible value
1. **No inline keyboards / `callback_query` handlers** — zero infra for
"Favorite / Archive / Dismiss" buttons on Immich notifications.
Biggest UX unlock; prerequisite for several Immich smart actions.
2. **No edit-in-place** (`editMessageText` not wrapped). Pairs naturally
with deferred dispatch / quiet hours coalescing — 5 separate
"asset added" messages become 1 edited message.
3. **`disable_notification` (silent send) not exposed** — already a
Telegram primitive; slots into the quiet-hours `silent` mode the
backlog already mentions.
4. **`message_thread_id` (forum topics)** — single field per receiver;
unblocks supergroup-with-topics users.
5. **Direct `TelegramClient(...)` constructions** in
[api/telegram_bots.py:314,394,404,412](../../packages/server/src/notify_bridge_server/api/telegram_bots.py)
bypass `get_telegram_client()` — violates CLAUDE.md rule #6 and
skips the shared file_id cache.
6. **Per-command authorization**: `commands_enabled` is all-or-nothing
per chat; no per-command allowlist or admin gate.
7. **Long-message splitting**: `send_message` silently truncates at
4096 ([client.py:492](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py)).
8. **No parse-mode per target** — HTML hardcoded.
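
One way gap 7 could be closed is to split instead of truncate. The sketch below is a plain-text illustration only: `split_message` and `TELEGRAM_MAX_MESSAGE_CHARS` are hypothetical names, it counts Python characters, and it does not protect HTML tags from being cut in half, which a real implementation for the hardcoded HTML parse mode would have to handle.

```python
TELEGRAM_MAX_MESSAGE_CHARS = 4096  # Bot API limit for sendMessage text


def split_message(text: str, limit: int = TELEGRAM_MAX_MESSAGE_CHARS) -> list[str]:
    """Split text into parts of at most `limit` characters, preferring newlines."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    parts = []
    remaining = text
    while len(remaining) > limit:
        # Find the last newline inside the window; fall back to a hard cut.
        cut = remaining.rfind("\n", 0, limit + 1)
        if cut <= 0:
            cut = limit
        parts.append(remaining[:cut])
        # Drop the newline(s) that were used as the split point.
        remaining = remaining[cut:].lstrip("\n")
    if remaining:
        parts.append(remaining)
    return parts
```
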
---
## 2. Immich
### Immich — what works well
- Mature polling pipeline: incremental delta-fetch via `updatedAfter`,
pending-asset tracking, fingerprint fast-path skip, fallback to full
fetch on count-decrease
([providers/immich/provider.py](../../packages/core/src/notify_bridge_core/providers/immich/provider.py)).
- Rich bot commands (status / albums / events / people / search / latest
/ random / favorites / summary / memory) with full asset context
(CLAUDE.md rule #10 satisfied).
- `auto_organize` action is well-shaped: AND person + smart-query union,
exclusions, type/date/favorite filters, 500-asset batched add,
idempotent diff against album asset_ids, dry-run, `ActionExecution`
log.
- Three scheduled features wired: periodic summaries, scheduled-asset
delivery, Memory/On-This-Day (with native Immich memory API + fallback).
### Highest-leverage candidates
1. **Webhook ingestion**: `webhook_based=False` at
[capabilities.py:46](../../packages/core/src/notify_bridge_core/providers/capabilities.py).
Sub-second latency vs the current 5-min poll. New
`/api/webhooks/immich/{secret}` route + parser + capability flip.
2. **Share-link expiry monitoring + auto-rotate action** — links
silently break today; data is already fetched per event
([provider.py:541-569](../../packages/core/src/notify_bridge_core/providers/immich/provider.py)).
3. **Duplicate cluster digest** — Immich >= 1.100 `/api/duplicates` is
unused; pairs with inline buttons for "merge / ignore 30d".
4. **Auto-favorite by person** (already in backlog) — smallest delta on
the existing `auto_organize` executor.
5. **Per-person notification subscription** — tracker-config filter,
reuses existing `asset.people` data.
6. **Album auto-curation from Inbox** — date-based target album name,
move (not copy); needs the Immich move endpoint (currently we only
`add_assets_to_album`).
7. **Storage / job-queue alerts**: `/api/server/stats` and `/api/jobs`
unused; lightweight poll + threshold = "disk full" / "transcoding
stalled" notifications.
8. **Smart-action infra polish** — descriptors are reusable, but the
rule editor is JSON-shaped, action-run statistics aren't aggregated,
and dry-run shows counts not the asset list. Address before adding 5
more action types.
---
## 3. Logging
### What's already in place
In [logging_setup.py](../../packages/server/src/notify_bridge_server/logging_setup.py):
- `dictConfig` with `JsonFormatter` (line-delimited JSON) toggleable via
`NOTIFY_BRIDGE_LOG_FORMAT=json`.
- `SecretMaskingFilter` redacts Telegram bot tokens + Authorization /
api_key / password / refresh_token across `msg`, `exc_text`,
`stack_info`.
- ContextVar-driven record factory injects `request_id`, `command`,
`chat_id`, `bot_id`, `dispatch_id` on every record. Text format:
`[req=- cmd=- bot=- chat=- disp=-]`.
- Per-module overrides via `NOTIFY_BRIDGE_LOG_LEVELS` env or DB
`AppSetting`. Live runtime patch via `apply_log_levels()` — no
restart.
- Noisy libs pre-quieted (sqlalchemy, aiohttp, apscheduler, urllib3,
asyncio, httpx, httpcore, PIL, uvicorn.access).
Plus:
- `EventLog` table with structured rows (event_type, status,
assets_count, details JSON, FKs to tracker/provider/action/
command_tracker/bot), `event_log_retention_days=30` default, daily
APScheduler cleanup `_cleanup_old_events`
([scheduler.py:332](../../packages/server/src/notify_bridge_server/services/scheduler.py)).
- Prometheus counter `notify_bridge_event_log_total{status,event_type}`.
- Frontend viewer with filters at
[api/status.py](../../packages/server/src/notify_bridge_server/api/status.py).
- `bind_log_context` actually used in: dispatcher (dispatch_id),
telegram_poller (bot/chat/command/request_id), webhook commands.
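
The ContextVar-driven record factory mentioned above can be sketched with the standard library alone. This is an illustration of the general pattern, not the contents of `logging_setup.py`: `install_context_record_factory` and `dispatch_id_var` are hypothetical names, and only two of the five context fields are shown.

```python
import contextvars
import logging

# "-" mirrors the placeholder shown in the text format, e.g. [req=- disp=-].
request_id_var = contextvars.ContextVar("request_id", default="-")
dispatch_id_var = contextvars.ContextVar("dispatch_id", default="-")


def install_context_record_factory() -> None:
    """Wrap the current LogRecord factory so every record carries context fields."""
    base_factory = logging.getLogRecordFactory()

    def factory(*args, **kwargs):
        record = base_factory(*args, **kwargs)
        # Read the ContextVars at record-creation time, in the caller's context.
        record.request_id = request_id_var.get()
        record.dispatch_id = dispatch_id_var.get()
        return record

    logging.setLogRecordFactory(factory)
```

With the factory installed, a formatter string such as `[req=%(request_id)s disp=%(dispatch_id)s] %(message)s` can reference the fields on any logger without per-call `extra=` arguments.
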
### Gaps, ordered by debug-pain payoff
1. **No FastAPI request-ID middleware.** `request_id_var` is set only
in webhook + Telegram poller paths. Every REST call from the SPA
logs as `req=-`. Tiny middleware (read `X-Request-Id` or
`uuid4()`, bind context, echo header) closes this whole-app blind
spot.
2. **`dispatch_id` is in log lines but NOT persisted on the `EventLog`
row.** Means you can find the failed row in the UI but can't grep
stderr for the matching `disp=...`. Stash it in `details.dispatch_id`
(no migration needed) — biggest cross-surface correlation win.
3. **HTTP access log is uvicorn default**
(`access_log=not _cfg.debug` at
[main.py:419](../../packages/server/src/notify_bridge_server/main.py)).
Doesn't include `request_id`, latency, user, status as structured
fields. Replace with a small `RequestLoggerMiddleware` that emits
`method`, `path`, `status`, `latency_ms`, `request_id`.
4. **Telegram media-group failures log richly but aren't linked to the
resulting `EventLog` row.** The dispatcher result-aggregation work
in flight is the right place to dump `errors[]` into
`EventLog.details.errors`.
5. **In-browser log access is missing.** EventLog rows are visible, but
raw logger output requires container/SSH access. A bounded
in-memory ring-buffer endpoint (admin-only, last N lines, filtered
by context fields) would mean ~90% of triage stays in the UI.
6. **No "diagnostic mode" UI.** The runtime `apply_log_levels()` is
great but only reachable through the app-settings JSON editor.
A "Debug for 15 minutes: `notify_bridge_core.notifications.telegram.client`"
button with auto-revert is a few-hours job.
7. **`EventLog.details` is freeform.** Frontend already destructures
`dispatch_status`, `deferred_until`, `deferred_for_seconds`,
`original_event_log_id`
([types.ts:238-261](../../frontend/src/lib/types.ts)). Define a
typed `EventLogDetails` per `event_type` (Pydantic at the boundary)
— prevents drift between providers.
8. **No log rotation**: `StreamHandler(sys.stderr)` only. Fine in
containers, brittle on bare-metal. Optional `RotatingFileHandler`
opt-in via env.
9. **No slow-query / outbound-HTTP timing logs.**
`sqlalchemy.engine=WARNING` by default; no per-query duration log.
Same for outbound calls to Immich / Telegram. A
"duration_ms >= N" threshold logger would surface "why is this
dispatch slow" without flipping global DEBUG.
10. **Action dry-run output is logger-only.** Could be streamed into
the action editor.
11. **Poll-result not persisted.** Webhook payloads are logged
([api/webhook_logs.py](../../packages/server/src/notify_bridge_server/api/webhook_logs.py)),
but Immich/Google-Photos poll cycles emit no
"last poll: 0 changes / 245ms" row. A lightweight
`provider_poll_log` (small table or ring buffer) would answer
"is the poller actually running" without reading stderr.
---
## Recommended sequencing
| # | Item | Status | Why first |
| --- | --- | --- | --- |
| 1 | Request-ID middleware + persist `dispatch_id` on `EventLog` | **SHIPPED 2026-05-28** | Unlocks the rest of the debug story; ~2 hours combined |
| 2 | Finish in-flight Telegram byte-budget chunking + write `errors[]` into `EventLog.details` | **SHIPPED 2026-05-28** | Already half-done; aligns with #1 |
| 3 | Telegram inline keyboards + `callback_query` handler | not started | Prereq for several Immich smart actions |
| 4 | Telegram `disable_notification` + `message_thread_id` per target | **SHIPPED 2026-05-28** | Small, also feeds the open Quiet Hours v1 backlog item |
| 5 | Immich webhook ingestion | not started | 5-min → sub-second; biggest user-facing latency win |
| 6 | Immich share-link expiry + auto-rotate (using #3) | not started | Real silent-breakage today |
| 7 | Diagnostic-mode UI (live log-level toggle with auto-revert) | **SHIPPED 2026-05-28** | Shifts triage to the browser |
| 8 | Immich duplicate digest + auto-favorite by person | not started | Both ride on #3 |
Items 1-4 are infrastructure that unlocks items 5-8. Items 1, 2, and 4
also smooth the way for the Quiet Hours v1 / target-level windows work
at the top of the backlog; they are worth landing before that feature so
quiet hours can dispatch through edited messages and silent sends from
day one.
---
## Decision log
- **2026-05-28** — Review completed. Starting work on item #1
(request-id middleware + persist `dispatch_id` on `EventLog`).
- **2026-05-28** — Item #1 shipped. Summary of the change:
- New helpers in
[packages/core/src/notify_bridge_core/log_context.py](../../packages/core/src/notify_bridge_core/log_context.py):
`ensure_dispatch_id()` (reuse existing or mint a new
`disp:<12 hex>`) and `enrich_details_with_correlation(details)`
(shallow-copy a details dict and merge active `dispatch_id` /
`request_id` from the ContextVar snapshot).
- New `RequestContextMiddleware` in
[packages/server/src/notify_bridge_server/main.py](../../packages/server/src/notify_bridge_server/main.py)
that reads inbound `X-Request-Id` (charset/length validated, `:`
excluded so a client can't masquerade as a server-minted id),
falls back to `req:<12 hex>`, binds the value via
`bind_log_context`, and echoes it back as the response header.
Added LAST so it's the outermost middleware.
- Outer entry points now bind a `dispatch_id` via a thin wrapper
function (`check_tracker`, `dispatch_provider_event`,
`dispatch_scheduled_for_tracker`, `_process_row`, `run_action`).
All 10 `EventLog(...)` creation sites wrap their `details=`
payload in `enrich_details_with_correlation(...)`.
- Switched `NotificationDispatcher.dispatch` to use
`ensure_dispatch_id()` instead of inline `uuid.uuid4()`.
- New tests in
[packages/server/tests/test_request_correlation.py](../../packages/server/tests/test_request_correlation.py)
(12 tests) covering header echo, charset validation, prefix-
masquerade rejection, helper merge semantics. All 239 server
tests green.
- Reviewed by `python-reviewer` subagent (no CRITICAL/HIGH; 3 MEDIUM
and 1 LOW addressed: PEP 8 imports moved to top of main.py;
`RequestResponseEndpoint` type added to `dispatch`; `:` dropped
from the request-id charset; shallow-copy caveat documented).
- Live smoke verified: generated id `req:a9b9821f5aab` on plain
request; safe inbound `my-trace-abc123` echoed unchanged;
`disp:fake12345678` correctly replaced; watcher tick log lines now
show distinct `disp=disp:<hex>` per tracker check.
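
The Item #1 helpers can be approximated in a few lines of standard-library Python. This is a sketch under stated assumptions rather than the shipped code: the exact charset and the 64-character cap in `_SAFE_REQUEST_ID` are guesses, `normalize_request_id` is a hypothetical name for the middleware's validation step, and whether `ensure_dispatch_id()` also binds the minted id is assumed here.

```python
import contextvars
import re
import uuid

request_id_var = contextvars.ContextVar("request_id", default=None)
dispatch_id_var = contextvars.ContextVar("dispatch_id", default=None)

# Assumed charset and length cap. ':' is excluded so an inbound value can
# never look like a server-minted "req:<hex>" or "disp:<hex>" id.
_SAFE_REQUEST_ID = re.compile(r"[A-Za-z0-9._-]{1,64}")


def normalize_request_id(inbound):
    """Return the inbound id if it is safe, otherwise mint `req:<12 hex>`."""
    if inbound and _SAFE_REQUEST_ID.fullmatch(inbound):
        return inbound
    return "req:" + uuid.uuid4().hex[:12]


def ensure_dispatch_id():
    """Reuse the dispatch id bound in this context, or mint and bind a new one."""
    current = dispatch_id_var.get()
    if current is None:
        current = "disp:" + uuid.uuid4().hex[:12]
        dispatch_id_var.set(current)
    return current


def enrich_details_with_correlation(details):
    """Shallow-copy `details` and add active ids without overwriting caller keys."""
    enriched = dict(details or {})
    for key, var in (("dispatch_id", dispatch_id_var), ("request_id", request_id_var)):
        value = var.get()
        if value is not None:
            enriched.setdefault(key, value)  # caller-supplied values win
    return enriched
```

Because the copy is shallow, nested dicts inside `details` are still shared with the caller, which is the caveat the review asked to have documented.
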
- **2026-05-28** — Item #2 shipped. Summary of the change:
- Confirmed the in-flight Telegram byte-budget media-group chunking
in
[telegram/client.py](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py)
is complete (15/15 media-group tests pass). Deleted the now-unused
`split_media_by_upload_size()` from
[telegram/media.py](../../packages/core/src/notify_bridge_core/notifications/telegram/media.py).
- New module
[services/dispatch_summary.py](../../packages/server/src/notify_bridge_server/services/dispatch_summary.py)
with `summarize_dispatch_results()` (aggregator),
`attach_summary_in_place()` (in-session) and
`record_dispatch_summary_async()` (post-commit). Captures
`targets_attempted/succeeded/failed`, per-target `errors`,
media-group `media{delivered,skipped,failed}` counts and
`media_errors[]` from the new
`TelegramClient._send_media_group` partial-failure path.
Bounded: 20 errors / 20 media errors / 500-char message cap with
explicit `…[truncated]` marker.
- Wired at 4 dispatch sites:
- `event_dispatch.py`: accumulates per-target results across all
tracking-config groups, attaches summary in-session before
commit.
- `deferred_dispatch.py`: inlines summary into the new EventLog
row's `details` for both `delivered_after_quiet_hours` and
`deferred_then_failed` paths.
- `scheduled_dispatch.py`: inlines summary into the cron-fire
EventLog row's `details`.
- `watcher.py`: follow-up `record_dispatch_summary_async` in a
fresh session because the EventLog row was committed before
dispatch.
- Frontend type drift fixed:
[types.ts](../../frontend/src/lib/types.ts) gets new
`DispatchSummary`, `DispatchSummaryError`,
`DispatchSummaryMediaError` interfaces plus `dispatch_id` /
`request_id` / `dispatch_summary` keys on `EventLog.details`.
- New tests in
[tests/test_dispatch_summary.py](../../packages/server/tests/test_dispatch_summary.py)
(10 tests): empty/all-success/mixed/media-counts/sub-errors/
truncation/long-message-trim/in-place attach/no-results no-op/
malformed sub-error. All 249 server tests green.
- Reviewed by `python-reviewer` subagent (no CRITICAL; 2 HIGH + 3
MEDIUM addressed: `asyncio.CancelledError` re-raise in the
best-effort catch; late `from .dispatch_summary import …` calls
hoisted to top of each file; empty-results contract changed from
"zero-count summary attached" to "no key written"; truncation
marker upgraded to `…[truncated]` for operator clarity;
`flag_modified` comment tightened).
- Live smoke: backend restarts cleanly, watcher tick log lines
continue showing `disp=disp:<hex>` correlation, no startup
errors.
- **2026-05-28** — Item #4 shipped. Summary of the change:
- `TelegramReceiver` dataclass in
[receiver.py](../../packages/core/src/notify_bridge_core/notifications/receiver.py)
gains `disable_notification: bool = False` and
`message_thread_id: int | None = None`. New
`_coerce_telegram_thread_id` helper collapses Telegram's "general
topic" sentinels (`0`, negatives, blanks, bools) to `None` so the
Bot API just omits the field — matches the frontend's `<= 0 → unset`
behaviour.
- `TelegramClient`
([client.py](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py))
gets a frozen `_SendOptions` + `_send_options_var` `ContextVar`
pattern for the deep media paths (`_upload_media`,
`_post_media_group`, `_send_from_cache`) that can't easily plumb
kwargs through. `send_notification` binds the var; the 3 deep
builders read it via `_apply_send_opts_to_payload` /
`_apply_send_opts_to_form`. `send_message` is a leaf and just
inlines its kwargs into the JSON body directly (no ContextVar
needed there).
- Dispatcher
([dispatcher.py](../../packages/core/src/notify_bridge_core/notifications/dispatcher.py))
passes `receiver.disable_notification` / `receiver.message_thread_id`
into `client.send_message(...)` and `client.send_notification(...)`.
- Frontend: new inline per-Telegram-receiver options panel in
[ReceiverSection.svelte](../../frontend/src/routes/targets/ReceiverSection.svelte)
triggered by a cog icon. Silent + thread-id indicators (bell-off
icon, `#N` badge) on the row when set. `+page.svelte` handlers
PUT the merged config to `/api/targets/{id}/receivers/{rid}`.
5 new i18n keys in `en.json` / `ru.json`.
- New tests in
[test_telegram_per_send_options.py](../../packages/server/tests/test_telegram_per_send_options.py)
— 19 tests: factory + thread-id coercion table (including bool
rejection and `0`/negative collapse), payload/form helper merge
semantics, bind/reset under exceptions, concurrent-task isolation
via `asyncio.gather`, end-to-end `send_message` payload assertions.
All 270 server tests green.
- Reviewed by `python-reviewer` subagent (no CRITICAL; 2 HIGH + 1
MEDIUM + 1 LOW addressed: dead ContextVar bind in `send_message`
removed in favor of inline kwarg injection; re-entrant bind from
`send_notification → send_message` auto-resolved by the same fix;
`message_thread_id=0` collapse aligns backend with frontend;
`_coerce_telegram_thread_id` rejects `bool` input).
- Live smoke: backend restarts cleanly, no errors in startup log.
- **2026-05-28** — Holistic `code-reviewer` pass over the full session
diff (Features 1+2+4+7) caught a real HIGH that the per-feature
Python-narrow reviews missed: `summarize_dispatch_results` in
Feature 2 was reading the wrong dict shape. The dispatcher's
`_aggregate_results` wraps per-receiver dicts under
`result["results"]` and renames the Telegram media counts to
`media_delivered_count` / `media_skipped_count` /
`media_failed_count`. The summarizer was reading the top-level
`delivered_count`, which is always absent in production aggregated
output, meaning the `dispatch_summary.media` block was silently
zero / missing for every real dispatch, and the `media_errors`
list never populated. The unit tests passed because they
hand-constructed leaf-shaped dicts that masked the wrong-shape
read. Fixed in
[dispatch_summary.py](../../packages/server/src/notify_bridge_server/services/dispatch_summary.py)
by drilling into `result["results"]` per-receiver leaves and
preferring `media_*_count` field names with fallback to the
top-level names. Receiver index added to `media_errors` entries
when drilling. New integration tests in
[test_dispatch_summary.py](../../packages/server/tests/test_dispatch_summary.py)
use the real dispatcher envelope so a future shape regression fails
loudly. Also addressed MEDIUM findings: `attach_summary_in_place`
/ `record_dispatch_summary_async` now skip when a caller has
pre-set `dispatch_summary` (mirrors the "caller wins" rule in
`enrich_details_with_correlation`); `ReceiverSection.svelte`
props for the Telegram options panel are now optional + gated
internally so the component stays portable; TS type for
`editingReceiverOptions.message_thread_id` is `number | ''`
with proper coercion in `openEditReceiver`. 294/294 server tests
green; backend restarts clean.
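
The shape fix can be illustrated with a reduced summarizer that drills into `result["results"]` and prefers the `media_*_count` names. This is a sketch, not the contents of `dispatch_summary.py`: the `success` and `error` keys on the envelope are assumptions, `media_errors` is omitted for brevity, and whether the 500-character cap includes the marker is a guess.

```python
MAX_ERRORS = 20
MAX_MESSAGE_CHARS = 500
TRUNCATION_MARKER = "…[truncated]"


def _trim(message):
    """Cap a message at MAX_MESSAGE_CHARS and flag the cut explicitly."""
    text = str(message)
    if len(text) <= MAX_MESSAGE_CHARS:
        return text
    return text[:MAX_MESSAGE_CHARS] + TRUNCATION_MARKER


def _media_count(leaf, kind):
    """Prefer the aggregated `media_<kind>_count` name, then the leaf-shaped one."""
    return int(leaf.get(f"media_{kind}_count", leaf.get(f"{kind}_count", 0)) or 0)


def summarize_dispatch_results(results):
    """Aggregate per-target envelopes; return None when there is nothing to report."""
    if not results:
        return None
    summary = {
        "targets_attempted": len(results),
        "targets_succeeded": 0,
        "targets_failed": 0,
        "errors": [],
        "media": {"delivered": 0, "skipped": 0, "failed": 0},
    }
    for target_index, result in enumerate(results):
        if result.get("success"):  # assumed key name
            summary["targets_succeeded"] += 1
        else:
            summary["targets_failed"] += 1
            if len(summary["errors"]) < MAX_ERRORS:
                summary["errors"].append(
                    {"target": target_index,
                     "message": _trim(result.get("error", "unknown"))}
                )
        # Drill into per-receiver leaves; fall back to the envelope itself.
        for leaf in result.get("results") or [result]:
            for kind in ("delivered", "skipped", "failed"):
                summary["media"][kind] += _media_count(leaf, kind)
    return summary
```

Feeding the function an envelope built the way the dispatcher builds it, rather than a hand-made leaf dict, is what makes a test sensitive to the shape regression described above.
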
- **2026-05-28** — Item #5 NOT shipped. Reason: Immich has no
outbound webhook feature. The closest thing is `POST /sync/stream`
(a server-streaming sync API designed for first-party Immich
clients), and adopting it would (a) take 1-2 days of new
subscription-manager infrastructure, (b) couple us to an API with no
third-party stability contract, and (c) deliver 5-min → sub-second
latency on photo notifications which is rarely critical. If
someone later actually needs lower latency, dropping the default
`scan_interval` is a 5-minute alternative that gets 80% of the
win for 1% of the cost. Skipped in favour of #7.
- **2026-05-28** — Item #7 shipped. Summary of the change:
- New service module
[services/diagnostic_mode.py](../../packages/server/src/notify_bridge_server/services/diagnostic_mode.py)
with `set_diagnostic` / `revert_diagnostic` / `revert_all` /
`list_active`. State is in-memory only — restart wipes overrides
(`setup_logging` re-applies the DB baseline at boot). Modules go
through an allowlist (`notify_bridge_*`, `sqlalchemy`, `aiohttp`,
`apscheduler`, `urllib3`, `httpx`, `httpcore`, `asyncio`, `PIL`,
`uvicorn`, `starlette`, `fastapi`) so a button press can't flip
root. Duration clamped to `[1, 240]` minutes. Baseline derivation
walks the dotted parents so
`sqlalchemy.engine.Engine` correctly inherits `sqlalchemy.engine`
→ WARNING rather than falling through to root.
- 3 new admin-only endpoints under `/api/settings/diagnostic-mode`
in
[api/app_settings.py](../../packages/server/src/notify_bridge_server/api/app_settings.py):
`GET` (list active), `POST` (activate, 400 on invalid input),
`DELETE /{module:path}` (manual revert, 404 if not active).
- Auto-revert uses APScheduler's date trigger with `misfire_grace_time=60`,
falling back to a strongly-referenced asyncio task (stored in a
module-level set with `add_done_callback(discard)`) when the
scheduler isn't running. `_expire_callback` re-reads `log_levels`
from the DB at fire time, so an admin who edits overrides mid-window
sees the new baseline restored — not a stale snapshot.
- `revert_all` is wired into the FastAPI lifespan shutdown in
[main.py](../../packages/server/src/notify_bridge_server/main.py)
so a clean stop / hot-reload leaves the world tidy.
- New frontend
[DiagnosticsCassette.svelte](../../frontend/src/routes/settings/DiagnosticsCassette.svelte)
sits below `LoggingCassette` in the settings page. Quick-pick
module dropdown + custom-text fallback, duration chip group (5m /
15m / 30m / 1h / 2h), Activate button. Active list with countdown
updated by a 1s ticker; resyncs from the backend every 30s based
on elapsed time (not modulo-of-now, which the prior version had
wrong). Manual revert via undo-icon button on each row.
- 15 new i18n keys in `en.json` / `ru.json`.
- 20 new tests in
[test_diagnostic_mode.py](../../packages/server/tests/test_diagnostic_mode.py)
— service-module unit tests + 4 FastAPI smoke tests via
`dependency_overrides[require_admin]` exercising the router /
path converter / HTTPException paths. All 290 server tests green.
- Reviewed by `python-reviewer` subagent (no CRITICAL; 3 HIGH +
3 MEDIUM addressed: fallback task retention in a module-level set
to prevent GC; prefix-walk for `_baseline_for` so sub-loggers
inherit parent defaults; `revert_all` wired into lifespan
shutdown; `list_active` now sweeps expired entries; DB
`log_levels` re-read at revert time instead of snapshot at
activation; frontend resync uses elapsed time. LOW items
addressed: scheduler-unavailable paths log at DEBUG instead of
silently passing; test cleanup of dead `_MIN_DURATION_MINUTES`
mutation).
- Live smoke: backend restarts cleanly, no errors in startup log.
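
The baseline prefix-walk and duration clamp from Item #7 are small enough to sketch directly. The function names and the `configured_levels` mapping below are hypothetical simplifications (the shipped helper is `_baseline_for`, and it reads `log_levels` from the DB); the `[1, 240]` range comes from the text above.

```python
import logging

MIN_DURATION_MINUTES = 1
MAX_DURATION_MINUTES = 240


def clamp_duration(minutes):
    """Clamp a requested window to the [1, 240] minute range."""
    return max(MIN_DURATION_MINUTES, min(MAX_DURATION_MINUTES, int(minutes)))


def baseline_for(module, configured_levels, root_level=logging.INFO):
    """Walk dotted parents until a configured level is found, else use root_level."""
    name = module
    while name:
        if name in configured_levels:
            return configured_levels[name]
        # Drop the last dotted component: "a.b.c" -> "a.b" -> "a" -> "".
        name = name.rpartition(".")[0]
    return root_level
```

For example, with `{"sqlalchemy.engine": logging.WARNING}` configured, reverting `sqlalchemy.engine.Engine` restores WARNING because the walk stops at the nearest configured parent instead of falling through to the root level.
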