feat: observability, per-receiver Telegram options, oversized-video fallback

Operability:
- Correlation IDs end-to-end: shared dispatch_id between log lines and
  EventLog rows (event/watcher/scheduled/deferred/action/HA/command paths)
  and a new X-Request-Id middleware that normalizes inbound ids and binds
  request_id into log context.
- dispatch_summary block merged into EventLog.details: per-target
  success/failure counts plus Telegram media delivered/skipped/failed and
  truncated error lists, so partial outcomes surface in the UI.
- Diagnostic mode: admin can flip one module to DEBUG for a bounded
  window with auto-revert (in-memory only; setup_logging() resets on
  boot, lifespan reverts on shutdown). New /diagnostic-mode endpoints
  plus DiagnosticsCassette UI on the settings page.

Telegram:
- Per-receiver options: disable_notification (silent send) and
  message_thread_id (forum-topic routing), wired through the dispatcher
  via a ContextVar so all four send sites (sendMessage / sendPhoto-Video-
  Document / sendMediaGroup / cache-hit POST) pick them up.
- send_large_videos_as_documents target setting: bypass the 50 MB
  sendVideo cap by falling back to sendDocument for oversized videos.
- sendMediaGroup byte-budget enforcement (TELEGRAM_MAX_GROUP_TOTAL_BYTES,
  45 MB) with per-item fallback on chunk failure so a stale file_id no
  longer silently drops a cached asset.

Tests:
- New: diagnostic_mode, dispatch_summary, request_correlation,
  telegram_media_group_partial, telegram_per_send_options.

Docs:
- .claude/reviews/: six-axis production-readiness review of v0.8.1.
- .claude/docs/functional-review-2026-05-28.md: focused review of
  Telegram/Immich/logging subsystems.
This commit is contained in:
2026-05-28 15:19:31 +03:00
parent 85a8f1e71c
commit 6a8f374678
39 changed files with 7239 additions and 142 deletions
@@ -33,6 +33,11 @@ from sqlalchemy.orm.attributes import flag_modified
from sqlmodel import select
from sqlmodel.ext.asyncio.session import AsyncSession
from notify_bridge_core.log_context import (
bind_log_context,
ensure_dispatch_id,
enrich_details_with_correlation,
)
from notify_bridge_core.models.events import EventType, ServiceEvent
from notify_bridge_core.models.media import MediaAsset, MediaType
from notify_bridge_core.notifications.dispatcher import (
@@ -56,6 +61,7 @@ from .dispatch_helpers import (
load_link_data,
resolve_provider_credential,
)
from .dispatch_summary import summarize_dispatch_results
_LOGGER = logging.getLogger(__name__)
@@ -616,12 +622,12 @@ async def _mark_dropped(
collection_name=payload.get("collection_name", ""),
assets_count=int(payload.get("added_count", 0))
or int(payload.get("removed_count", 0)),
details={
details=enrich_details_with_correlation({
"dispatch_status": "deferred_then_dropped",
"reason": reason,
"original_event_log_id": row.event_log_id,
"provider_type": payload.get("provider_type", ""),
},
}),
))
@@ -644,6 +650,28 @@ async def _process_row(
entry produces its own target_config so a broadcast deferred row fans
out to all current children at drain time.
"""
# Bind a fresh dispatch_id per drained row so the EventLog rows written
# by the success/drop paths AND the inner dispatcher's log lines share
# one id. Each deferred row is a logically separate dispatch attempt.
with bind_log_context(dispatch_id=ensure_dispatch_id()):
await _process_row_impl(
session, row, tracker, provider_id, provider_name,
provider_config, app_tz, link_by_id, dispatcher, stats,
)
async def _process_row_impl(
session: AsyncSession,
row: DeferredDispatch,
tracker: NotificationTracker,
provider_id: int,
provider_name: str,
provider_config: dict[str, Any],
app_tz: str,
link_by_id: dict[int, list[dict[str, Any]]],
dispatcher: NotificationDispatcher,
stats: dict[str, int],
) -> None:
expanded = link_by_id.get(row.link_id)
if not expanded:
# Link removed/disabled between defer and drain.
@@ -735,6 +763,8 @@ async def _process_row(
row.fired_at = datetime.now(timezone.utc)
session.add(row)
summary = summarize_dispatch_results(results)
if success:
stats["fired"] += 1
session.add(EventLog(
@@ -747,14 +777,15 @@ async def _process_row(
collection_id=row.collection_id,
collection_name=event.collection_name,
assets_count=event.added_count or event.removed_count or 0,
details={
details=enrich_details_with_correlation({
"dispatch_status": "delivered_after_quiet_hours",
"original_event_log_id": row.event_log_id,
"deferred_for_seconds": int(
(row.fired_at - row.created_at).total_seconds()
),
"provider_type": event.provider_type.value,
},
"dispatch_summary": summary,
}),
))
else:
stats["dropped"] += 1
@@ -769,12 +800,13 @@ async def _process_row(
collection_id=row.collection_id,
collection_name=event.collection_name,
assets_count=event.added_count or event.removed_count or 0,
details={
details=enrich_details_with_correlation({
"dispatch_status": "deferred_then_failed",
"reason": str(first_err)[:200],
"original_event_log_id": row.event_log_id,
"provider_type": event.provider_type.value,
},
"dispatch_summary": summary,
}),
))