test_fallback_task_retained_until_fire asserts len(_bg_tasks) == 1, but
the set carries pending tasks from earlier tests' fallback schedules, so
the assertion saw the accumulated count instead. Drop the references (no
.cancel() — the tasks belong to closed loops, cross-loop cancel raises
RuntimeError on the next test's setup).
Operability:
- Correlation IDs end-to-end: shared dispatch_id between log lines and
EventLog rows (event/watcher/scheduled/deferred/action/HA/command paths)
and a new X-Request-Id middleware that normalizes inbound ids and binds
request_id into log context.
- dispatch_summary block merged into EventLog.details: per-target
success/failure counts plus Telegram media delivered/skipped/failed and
truncated error lists, so partial outcomes surface in the UI.
- Diagnostic mode: admin can flip one module to DEBUG for a bounded
window with auto-revert (in-memory only; setup_logging() resets on
boot, lifespan reverts on shutdown). New /diagnostic-mode endpoints
plus DiagnosticsCassette UI on the settings page.
Telegram:
- Per-receiver options: disable_notification (silent send) and
message_thread_id (forum-topic routing), wired through the dispatcher
via a ContextVar so all four send sites (sendMessage,
sendPhoto/sendVideo/sendDocument, sendMediaGroup, cache-hit POST)
pick them up.
- send_large_videos_as_documents target setting: bypass the 50 MB
sendVideo cap by falling back to sendDocument for oversized videos.
- sendMediaGroup byte-budget enforcement (TELEGRAM_MAX_GROUP_TOTAL_BYTES,
45 MB) with per-item fallback on chunk failure so a stale file_id no
longer silently drops a cached asset.
Tests:
- New: diagnostic_mode, dispatch_summary, request_correlation,
telegram_media_group_partial, telegram_per_send_options.
Docs:
- .claude/reviews/: six-axis production-readiness review of v0.8.1.
- .claude/docs/functional-review-2026-05-28.md: focused review of
Telegram/Immich/logging subsystems.
Apply six isolated, low-risk fixes surfaced by the parallel
production-readiness review (backend, frontend, security, perf,
UI/UX, bugs+features).
Backend
- Mask access_token in provider GET responses and drop it on edit
when carrying the *** placeholder — fixes plaintext leak of HA
long-lived tokens (security H-1). Centralized via
PROVIDER_SECRET_FIELDS so all call sites stay in sync (C-5).
- Hold HA status-change tasks in a module-level set with a
done_callback: the event loop keeps only weak refs to tasks from
asyncio.create_task, so a task could be GC'd before its row was
written (C-1).
- Roll back the request session in the Telegram-webhook catch-all
so a handler exception cannot leak uncommitted writes into the
next request (C-2).
- Bail before reading the 1 MiB webhook body when the Gitea
provider has no secret configured or the request has no
signature header. For the generic webhook with bearer_token
auth, verify the Authorization header before the body read.
Closes the pre-auth resource-exhaustion amplifier (C-3).
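A minimal sketch of the task-retention pattern from C-1, assuming a hypothetical registry name (`_status_tasks`) and helper (`spawn_retained`); the real module's names are not shown in this log. The set holds a strong reference until the task finishes, and the done callback drops it.

```python
import asyncio

# Hypothetical module-level registry; the real set name is not shown in the log.
_status_tasks: set = set()


def spawn_retained(coro) -> asyncio.Task:
    """Schedule coro and hold a strong reference until it finishes."""
    task = asyncio.create_task(coro)
    _status_tasks.add(task)
    # discard() drops the reference once the task completes or is cancelled.
    task.add_done_callback(_status_tasks.discard)
    return task


async def _demo() -> tuple:
    rows = []

    async def write_row(status: str) -> None:
        await asyncio.sleep(0)  # stand-in for the EventLog write
        rows.append(status)

    task = spawn_retained(write_row("connected"))
    held_while_pending = len(_status_tasks)
    await task
    await asyncio.sleep(0)  # give the done callback a chance to run
    return held_while_pending, len(_status_tasks), rows


held, remaining, rows = asyncio.run(_demo())
```

The set is empty again after completion, so it cannot grow without bound.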
Frontend
- Add supportsAutoOrganize capability to ProviderDescriptor and
consume it from RuleEditor instead of `provider.type !== 'immich'`,
bringing the last action-rule editor under CLAUDE.md rule 8
(no provider-type hardcoding in components).
- Snackbar: add role="region" + per-toast role/aria-live/aria-atomic
so screen readers announce success/error toasts.
- Sidebar nav: add aria-current="page" on the active link so the
active state is exposed to assistive technology.
- New snackbar.region key in en + ru (locale parity preserved).
Out of scope for this commit (tracked in .claude/reviews/README.md
ship-blocker list): secret encryption at rest, JWT cookie move,
Alembic adoption, webhook idempotency, deferred-dispatch crash
window, persisted Telegram update watermark, bridge_self counter
lock — each needs more than a mechanical edit.
Four root causes blocked the CI test gate; all fixed minimally:
1. test_release_provider._allow_private_urls used setenv +
importlib.reload(ssrf_mod). The reload permanently rebound
_ALLOW_PRIVATE=True in the module; monkeypatch.setenv undid the
env var on teardown but the module attribute stayed True for the
rest of the session, masking every test_ssrf*/test_ssrf_hardening
case (16 failures). Switched to monkeypatch.setattr on the module
attribute directly — restored cleanly on teardown.
2. _FakeResponse in test_release_provider lacked the content_length
attribute and a top-level read() method that the new size-cap
guards in gitea.py consult before parsing (5 failures).
3. test_gate_quiet_hours_wins_over_event_type_flag was asserting the
pre-refactor gate order. evaluate_event_gate now intentionally
reports EVENT_TYPE_DISABLED before QUIET_HOURS so deferrable
events with the event-type flag off get dropped immediately
instead of being deferred and then silently discarded at drain
time. Renamed the test and inverted the expectation.
4. resolve_version() returned 0.0.0+unknown in CI because
pip-wheel-built hatchling distributions ended up with METADATA
missing the Version field — importlib.metadata returned None.
Added __version__ = "0.8.1" to notify_bridge_server/__init__.py
as a third (always-available) candidate; resolve_version() now
picks the max of (installed, package, source).
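The "max of candidates" rule in item 4 might look like the sketch below. `pick_version` and `_version_key` are hypothetical names, and the comparison is deliberately simplified: it only orders dotted numeric versions and ignores pre-release semantics.

```python
import re
from typing import Optional, Tuple


def _version_key(version: str) -> Tuple[int, ...]:
    """Turn '0.8.1' or '0.0.0+unknown' into a comparable tuple of ints."""
    public = version.split("+", 1)[0]  # drop a local suffix such as '+unknown'
    return tuple(int(part) for part in re.findall(r"\d+", public))


def pick_version(*candidates: Optional[str]) -> str:
    """Return the highest candidate, skipping sources that returned nothing."""
    known = [c for c in candidates if c]
    if not known:
        # Same sentinel the log mentions for the "nothing found" case.
        return "0.0.0+unknown"
    return max(known, key=_version_key)


# installed metadata missing, package attribute present, source pyproject older
chosen = pick_version(None, "0.8.1", "0.8.0")
```

Because the package-level `__version__` is always importable, the sentinel branch should only be reached if every source fails.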
Adds bot commands for the bridge_self provider so operators can inspect
and manage bridge health from chat: /status, /thresholds, /reset, /health.
Includes Jinja2 templates for both locales, seed data, capability slots,
and a handler that exposes pending deferred backlog plus per-counter
reset. Also adds .claude/skills/ for project-scoped graph-aware skills.
Comprehensive multi-area pass driven by a parallel 8-agent production
review. Covers frontend, backend, database, security, performance, and
operations, plus a new self-monitoring feature.
## Critical fixes
- Planka webhook: reads bounded raw body (was NameError on every call)
- HA quiet hours: ha_state_changed/automation_triggered/service_called/
event_fired added to deferrable set (were silently dropped)
- DNS-rebinding SSRF: PinnedResolver wired into shared aiohttp session
- Telegram inbound webhook: secret now mandatory (401 without)
- Generic webhook: auth_mode="none" requires explicit
acknowledge_unauthenticated=true; per-IP rate limit 60/min
- svelte-check: 5 null-narrowing errors in EventDetailModal fixed
- Provider hardcoding: Immich-only block extracted to descriptor
featureDiscoveryHint
- command_sync: snapshot+expunge bot before exiting AsyncSession
## Bug fixes
- notifier asyncio.gather(return_exceptions=True) — one bad chat no longer
cancels peer sends
- NotificationDispatcher hoisted out of per-tracker loop
- Provider credential resolution unified across all 5 dispatch sites
- HA asyncio.shield now drains inner task on cancellation
- Provider construction switched from if/elif ladder to factory registry
- NUT first poll seeds silently (no spurious ups_on_battery)
- Quiet-hours gate: event-type-disabled now wins over deferral
- APScheduler drain job ID resolution upgraded to seconds
- HA on_status_change wired through to EventLog
- Webhook payload rollback failures now logged (not swallowed)
- Batched receivers/chats/bots in load_link_data (was per-target N+1)
- flag_modified on JSON column reassignments in deferred_dispatch
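For the notifier fix at the top of this list, a small illustration of `asyncio.gather(return_exceptions=True)`: each failure comes back as a value in the results list instead of propagating out of the gather call. `send_to_chat` and `fan_out` are hypothetical stand-ins, not the notifier's real functions.

```python
import asyncio


async def send_to_chat(chat_id: int) -> str:
    """Hypothetical sender; chat 2 simulates a failing target."""
    await asyncio.sleep(0)
    if chat_id == 2:
        raise RuntimeError("chat not found")
    return f"sent:{chat_id}"


async def fan_out(chat_ids: list) -> tuple:
    results = await asyncio.gather(
        *(send_to_chat(c) for c in chat_ids), return_exceptions=True
    )
    # Exceptions are returned in place, in the same order as the inputs.
    delivered = [r for r in results if not isinstance(r, BaseException)]
    failed = [repr(r) for r in results if isinstance(r, BaseException)]
    return delivered, failed


delivered, failed = asyncio.run(fan_out([1, 2, 3]))
```

Chats 1 and 3 are delivered even though chat 2 raised, and the caller can log or count the failure separately.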
## Database
- UNIQUE indexes on service_provider.webhook_token,
telegram_bot.webhook_path_id, partial UNIQUE on telegram_bot.bot_id,
telegram_chat(bot_id, chat_id), notification_tracker_target unique link,
partial UNIQUE on bridge_self provider per user
- Composite ix_event_log_user_event_type_created index
- save_chat_from_webhook switched to ON CONFLICT DO UPDATE
- ondelete=CASCADE on user-id FKs (model annotation; app-side cascade
delete added for existing data)
- delete_notification_tracker converted from N+1 to bulk DELETE/UPDATE
- Module-level asyncio.Lock replaced with lazy _get_lock() pattern
- VACUUM INTO snapshot now PRAGMA integrity_check verified
## Performance
- Jinja2 template compilation LRU cached (lru_cache maxsize=512)
- Per-locale render cache in NotificationDispatcher (skips re-rendering
identical content for receivers sharing a locale)
- Tracker list cached per provider_id with 5s TTL + explicit invalidation
on tracker CRUD (relieves HA chat-bus rate query pressure)
- Nav-counts collapsed from 16 round-trips to single UNION ALL
- HA event_log: skip persisting empty assets_added/removed events
## Security hardening
- Mass-assignment guard on Action create/update; cron sub-minute reject
- Backup JSON depth/node-count cap (depth ≤ 10, nodes ≤ 100k)
- _sanitize_config extended to all JSON-typed fields on backup import
- Telegram _safe_get walks redirects manually with SSRF revalidation
- Bcrypt 72-byte password length cap with clear 422
- Webhook payload body redaction; sensitive substring set extended with
oauth/client_secret/webhook_secret/csrf in both header filter and
template extras filter
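The backup depth/node cap above could be enforced with an iterative walk over the already-parsed JSON, roughly as sketched here. `check_json_budget` is a hypothetical name, and counting the root as depth 1 is an assumption; the log only states the limits (depth ≤ 10, nodes ≤ 100k).

```python
from typing import Any

MAX_DEPTH = 10
MAX_NODES = 100_000


def check_json_budget(
    root: Any, max_depth: int = MAX_DEPTH, max_nodes: int = MAX_NODES
) -> int:
    """Walk parsed JSON without recursion; raise ValueError when a cap is hit."""
    nodes = 0
    stack = [(root, 1)]  # assumption: the root value counts as depth 1
    while stack:
        value, depth = stack.pop()
        nodes += 1
        if nodes > max_nodes:
            raise ValueError("backup JSON has too many nodes")
        if isinstance(value, dict):
            children = list(value.values())
        elif isinstance(value, list):
            children = value
        else:
            continue  # scalars have no children
        if children and depth + 1 > max_depth:
            raise ValueError("backup JSON is nested too deeply")
        stack.extend((child, depth + 1) for child in children)
    return nodes


node_count = check_json_budget({"a": [1, 2, {"b": None}]})
```

An explicit stack keeps the check itself safe from recursion limits on hostile input.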
## Frontend
- 76 catch (err: any) sites converted to errMsg(err) helper
- globalProviderFilter: pure getter; reconciliation moved to one-time
$effect in +layout
- Provider-filter binding: removed paired $effects + _syncingFilter flag,
now one-way derived
- entity-cache: separate _refreshing flag for background re-fetches
- api.ts 401 handling: AuthRedirectError class + dedup _redirecting flag,
goto() instead of window.location.href
- a11y: aria-expanded on mobile More, role=switch + aria-checked on
Telegram bot toggles
## Tests & operations
- CI pytest gate added to .gitea/workflows/build.yml + release.yml
(wheel-built install to dodge editable-install slowness)
- /api/ready upgraded to deep healthcheck (db SELECT 1, scheduler.running,
HA supervisor presence) returning {ready, checks, errors, version}
- /api/metrics endpoint with prometheus_client (deferred_pending,
event_log_total, dispatch_duration, poll_failures, send_failures)
- New OPERATIONS.md covering deploy, healthchecks, metrics, backup/restore
procedures, log handling, common scenarios, upgrade flow
- New tests: test_bridge_self (11), test_gitea_parser (9),
test_planka_parser (6), test_immich_change_detector (6),
test_backup_roundtrip (1)
## New feature: bridge self-monitoring
- New bridge_self provider type — internal sink for bridge health events
- Three event types: bridge_self_poll_failures (consecutive tracker poll
failures), bridge_self_deferred_backlog (pending count crosses
threshold), bridge_self_target_failures (consecutive 5xx/network
failures per target)
- Per-user thresholds (defaults: 3 / 100 / 5) configurable via the
provider config form
- Auto-seeded on user create + /setup + boot backfill for existing users
- Anti-spam: counters reset after emission; backlog uses transition latch
- Self-loop guard: bridge_self failures don't count toward target-failure
thresholds (logged only) — wire to your own Telegram/Email/Matrix to
get notified when polls/dispatches/sends fail
- 6 default templates (3 events × 2 locales), tracking config columns
with backfill migration, frontend descriptor (excluded from "create
provider" wizard since auto-managed)
Operator-visible behavior changes (call out in release notes):
- NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET now REQUIRED for webhook mode
- Existing webhook providers with auth_mode="none" need explicit opt-in
- Generic webhook endpoint rate-limited 60/min per source IP
- HA disconnect/reconnect writes ha_status_* EventLog rows
- Every user gets a bridge_self provider — wire it to a target to
receive failure alerts
Pre-existing test failures (test_ssrf, test_release_provider) on
Python 3.13 are unrelated; CI runs on 3.12.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Home Assistant as a service provider with two coordinated surfaces:
Notifications (subscription):
- Long-lived WebSocket client (aiohttp ws_connect) with auth handshake,
exponential-backoff reconnect, bounded event queue, and area-registry
enrichment cached per (re)connect
- ServiceProvider ABC gains an optional `subscribe()` method for push-style
providers; HomeAssistantServiceProvider uses it via a per-provider
supervisor task started in the FastAPI lifespan
- 4 event types (state_changed, automation_triggered, call_service,
event_fired), 4 default Jinja templates (en + ru), HA-specific
tracker filters (entity_glob, domain_allowlist, exact entity ids)
- Extracted shared dispatch pipeline (api/webhooks.py → services/
event_dispatch.py) so subscription and webhook ingest share the same
event_log + deferred-dispatch + quiet-hours code path
Bot commands:
- /status, /entities [glob], /state <entity_id>, /areas
- Multi-command WS session so /status and /areas cost one handshake
- Sensitive-attribute blocklist (camera access_token, entity_picture, etc.)
and 30-attribute cap to keep /state output safe and within Telegram's
message size
- Error-message redaction strips URL userinfo before surfacing to chat
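One way the userinfo redaction could work is a regex that drops everything between the scheme and the `@`. This is a sketch under that assumption; `strip_userinfo` is a hypothetical name and the real handler may redact differently.

```python
import re

# Matches "scheme://user:pass@" and keeps only the scheme part.
_USERINFO_RE = re.compile(r"(?P<scheme>[a-zA-Z][a-zA-Z0-9+.-]*://)[^/\s@]+@")


def strip_userinfo(message: str) -> str:
    """Remove credentials embedded in any URL inside an error message."""
    return _USERINFO_RE.sub(r"\g<scheme>", message)


redacted = strip_userinfo(
    "Cannot connect to wss://admin:s3cret@ha.local:8123/api/websocket"
)
```

URLs without userinfo pass through unchanged, because the pattern requires an `@` before the first `/`.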
Frontend:
- HA descriptor with toggle ConfigField type (new) and tag-input filter
mode for free-text glob/domain lists (new TagInput component)
- 15 command slots + 4 notification slots wired into the existing
template-config UI
The APScheduler cron job fires daily at every entry in `periodic_times`,
and nothing in the dispatch path consulted `periodic_interval_days` or
`periodic_start_date` — so a configured 3-day interval still produced a
daily summary.
Gate the dispatch in `dispatch_scheduled_for_tracker` for kind=periodic
via `(today_in_app_tz - start_date).days % interval == 0`. Log a skip
with reason `interval_not_due` on non-firing days so operators can tell
suppressed-by-interval apart from other skip causes.
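The gate reduces to a date subtraction and a modulo. A sketch with hypothetical helper names (`today_in_tz`, `interval_due`); treating an interval below 1 as "fire every day" is an assumption, since the log does not say how invalid values are handled.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo


def today_in_tz(now_utc: datetime, app_tz: tzinfo) -> date:
    """Calendar date in the app timezone for an aware UTC instant."""
    return now_utc.astimezone(app_tz).date()


def interval_due(today: date, start_date: date, interval_days: int) -> bool:
    """True when today is a whole multiple of interval_days after start_date."""
    if interval_days < 1:
        # Assumption: a missing or invalid interval means "fire every day".
        return True
    return (today - start_date).days % interval_days == 0


start = date(2026, 5, 1)
# Days of May 2026 on which a 3-day interval fires during the first week.
fires = [d for d in range(1, 8) if interval_due(date(2026, 5, d), start, 3)]

# 22:30 UTC is already the next calendar day at UTC+3.
local_day = today_in_tz(
    datetime(2026, 5, 3, 22, 30, tzinfo=timezone.utc), timezone(timedelta(hours=3))
)
```

Computing `today` in the app timezone matters near midnight, where the UTC date and the local date differ.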
Adds an icon selector to the "On watch" provider deck letting users
choose between page-scoped stats (legacy) and full-corpus stats that
aggregate across every event matching the current filters. Backend
returns a new provider_event_counts map alongside the paginated events.
- Defer quiet-hours dispatches into new deferred_dispatch table; drain
job + periodic catch-up scan re-fire at window end with coalescing on
(link, event_type, collection_id).
- Add ON DELETE SET NULL migration on event_log_id and partial unique
index on (link_id, collection_id, event_type) WHERE status='pending'.
- Add release-check provider abstraction (Gitea/GitHub) with SSRF-safe
URL validation, settings UI cassette, and scheduled polling.
- Replace importlib-only version lookup with version.py helper that
prefers the higher of installed metadata vs source pyproject so stale
editable dev installs stop misreporting.
- Aurora frontend polish: MetaStrip component, ReleaseCassette,
EventDetailModal expansion, and i18n additions.
The generic-webhook provider has no upstream API, so /status reports
DB-derived stats: active/total trackers, provider name, and last event
timestamp (formatted via the shared get_last_event_str helper).
Includes pytest coverage for handler registration, populated stats with
a recent event, the empty-state dash sentinel, and unknown-command
fall-through. Template variable docs in command_template_configs.py
extended with the new trackers_active/trackers_total keys.
Bot commands were the only user-initiated path that didn't surface in
the dashboard. They now produce ``command_handled`` /
``command_rate_limited`` / ``command_failed`` rows in ``EventLog``
alongside tracker and action events.
Backend
- ``EventLog`` gains nullable ``command_tracker_id`` / ``telegram_bot_id``
FKs plus deletion-snapshot name columns (idempotent migration).
- New ``_log_command_event`` helper emits one row per invocation at the
three branches in ``handle_command``. Logging failures are swallowed
so they cannot block the user-visible reply.
- Telegram ``from`` is captured in poller and webhook, whitelisted to
identity fields by ``_normalize_issuer`` (drops ``language_code`` and
any future PII), persisted under ``details.issuer``.
- ``/api/status`` resolves live ``CommandTracker`` / ``TelegramBot``
names (mirroring the action pattern) and exposes ``tracker_id``,
``command_tracker_id``, ``telegram_bot_id`` so the frontend can
deep-link.
Frontend
- Event rows are now clickable and open a detail modal with full
provenance (bot → chat → issuer → provider), raw ``details`` JSON,
and per-entity action buttons.
- Buttons use the existing ``requestHighlight`` + ``goto`` crosslink
pattern, so clicking lands on the entity's list page with that
specific card scrolled into view and pulsing.
- Auto-refresh dropdown (Off / 10s / 30s / 1m / 5m) persisted in
``localStorage``; ticker pauses while the tab is hidden.
- Event-type filter, dashboard verb labels, and gradients extended for
the three new ``command_*`` types.
- Filled in pre-existing missing i18n keys (``common.hide`` /
``common.show`` / ``commandConfig.noCommandsForProvider``).
Tests
- New ``test_command_event_logging.py`` covers subject formatting,
issuer normalization, the three event branches, and graceful failure
when the DB is unreachable. ``pytest packages/server/tests/`` → 96/96.
Notifications:
- Add shared http_base, redact, and SSRF hardening modules
- Refactor dispatcher, queue, receiver and per-provider clients
(telegram, discord, email, matrix, ntfy, slack, webhook) to use
the shared base, with bounded queue and redacted error logs
- Tests for ssrf, redact, http_base, queue bounds, dispatcher
aggregation, telegram media partition, email and matrix clients
Frontend:
- Settings: log level / log format selectors now use IconGridSelect
with per-option icons and i18n descriptions
- Minor providers page and entity-cache store updates
Tooling:
- Document code-review-graph MCP usage in CLAUDE.md
- Ignore .code-review-graph/, register .mcp.json
chat_action was stored in two places — the model column and config JSON —
and dispatch_helpers unconditionally overrode the config value with the
column. The frontend only ever wrote the JSON path, so the UI choice
silently had no effect on outgoing chat actions.
Make the column the single source of truth: frontend sends chat_action
top-level, dispatch_helpers reads from the column, and a one-time
backfill migrates existing config values to the column and strips the
legacy key.
Also fix a long-standing race where the keepalive's bare sleep(4) +
finally cancel could fire one last sendChatAction after the response
already arrived, leaving a phantom indicator for ~5s. Replace with a
stop event + wait_for so callers can signal stop cleanly via the new
stop_keepalive helper.
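The stop-event approach can be sketched as follows. The signatures of `keepalive` and `stop_keepalive` here are assumptions for illustration; only the `stop_keepalive` name comes from the log. Waiting on the event with a timeout means the loop wakes as soon as stop is signalled and exits without another send.

```python
import asyncio


async def keepalive(send_action, stop: asyncio.Event, interval: float = 4.0) -> None:
    """Re-send the chat action every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        await send_action()
        try:
            # Returns immediately when stop is set, instead of sleeping it out.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            continue


async def stop_keepalive(task: asyncio.Task, stop: asyncio.Event) -> None:
    """Signal the loop to finish and wait for it to exit on its own."""
    stop.set()
    await task


async def _demo() -> tuple:
    sent = 0

    async def send_action() -> None:
        nonlocal sent
        sent += 1

    stop = asyncio.Event()
    task = asyncio.create_task(keepalive(send_action, stop, interval=0.05))
    await asyncio.sleep(0.12)
    await stop_keepalive(task, stop)
    count_at_stop = sent
    await asyncio.sleep(0.1)  # any phantom send would show up here
    return count_at_stop, sent


count_at_stop, final_count = asyncio.run(_demo())
```

Unlike cancelling a task that is mid-sleep, the loop decides for itself when to exit, so no send can start after the stop signal.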
- Gitea: NotificationTracker now exposes sender allowlist / blocklist filters
via MultiEntitySelect, populated from Gitea /users/search merged with past
EventLog senders so the picker is useful before the first webhook arrives.
- Webhook providers (gitea, planka, webhook): stop scheduling interval polling
jobs on tracker create/update/startup; hide the "every Xs" indicator in the
tracker list since there is no polling.
- Dashboard: stat cards are now <a> links that route to providers, trackers,
targets, command-trackers, or scroll to the events panel. Provider deck
rows highlight the target provider on click.
- Command trackers / command configs: auto-reselect the right config when the
provider type changes (matches notification-tracker behavior).
- Migration: drop legacy batch_duration column from notification_tracker —
the field is gone from the model but its NOT NULL constraint blocked
inserts on older DBs.
- Docs: refresh entity-relationships.md with current NotificationTracker
fields (filters, adaptive_max_skip, default_*_config_id).
Two related Telegram changes:
1. Per-chat command localization. setMyCommands now accepts a scope
(BotCommandScopeChat) and deleteMyCommands clears scoped bindings.
Command registration runs three tiers: default → per-language
(Telegram client language) → per-chat (UI override). Saving a
chat's language_override or commands_enabled toggle pushes the
binding to Telegram inline rather than waiting on the 30s
debounced bot-wide sync.
2. Unified Telegram locale resolution. Three test paths (bot test_chat,
target receiver test, target-level fan-out) used to disagree on
locale priority — the target receiver test in particular only
consulted receiver.locale and ignored the chat's language_override.
Introduced pick_telegram_locale (pure) and
resolve_telegram_chat_locale (async DB lookup) in services/notifier
so all three paths share one priority order:
receiver.locale → chat.language_override → chat.language_code → fallback
Fan-out keeps batch-loading TelegramChat rows for efficiency, just
runs them through the same priority function now.
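The shared priority order amounts to a first-non-empty scan. A sketch of what a pure picker could look like; the parameter names and the `"en"` fallback are assumptions, and the real `pick_telegram_locale` signature may differ.

```python
from typing import Optional


def pick_telegram_locale(
    receiver_locale: Optional[str],
    chat_language_override: Optional[str],
    chat_language_code: Optional[str],
    fallback: str = "en",  # assumed default; not stated in the log
) -> str:
    """Return the first non-empty locale in priority order."""
    for candidate in (receiver_locale, chat_language_override, chat_language_code):
        if candidate:
            return candidate
    return fallback


# The receiver has no locale, so the chat's override beats the client language.
locale = pick_telegram_locale(None, "ru", "en")
```

Keeping the function pure lets the fan-out path batch-load chat rows first and then apply the same rule per receiver.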
Display filters (Immich tracking config):
- favorites_only drops events with no favorited new assets, or filters
added_assets to favorites only
- assets_order_by/assets_order sort the rendered list
(date / name / rating / random / none)
- max_assets_to_show caps rendered+attached media (default 5 -> 10)
- include_tags strips people from event extras and tags from each asset
- include_asset_details strips city/country/state/lat/lon/is_favorite/
rating/description; load-bearing fields (thumbhash, file_size,
playback_size, cache keys) preserved
- New apply_tracking_display_filters helper in dispatch_helpers; wired
into watcher, webhooks, scheduled/periodic/memory, and manual
test-dispatch
- Targets sharing a TrackingConfig dispatch together; targets with
different TCs each see their own shaped event
Adaptive polling:
- Replace NotificationTracker.batch_duration with adaptive_max_skip
- Per-tracker opt-in: NULL/0 disables back-off (every tick runs);
positive N caps the skip factor at (N-1)-in-N after long idle
- Scheduler caches the cap in module state for the tick fast-path
- Migration adds the new column; API schemas/responses, frontend types,
i18n, and the tracker form updated to match
Dispatch: honor {kind}_collection_mode on TrackingConfig — "per_collection"
fans out one event per album; "combined" pools assets as before. Extract
build_immich_dispatch_events shared by cron and test paths.
Assets: collect_scheduled_assets attaches album_name/album_url/album_public_url
to each asset so combined-mode templates can attribute rows to their source
album. Default scheduled_assets templates render a multi-album header with
inline album list and per-row album link; memory_mode follows the same pattern.
UI: "Reset to default" buttons on notification and command template slots
(per-slot and whole-template), backed by new GET /*-template-configs/defaults
endpoints. tracking-configs "Preview template" now opens an inline preview
modal with locale tabs instead of navigating away; Edit button deep-links
with ?edit_slot=<name> so the destination auto-opens the config and scrolls
to the slot. Reset confirmations use ConfirmModal instead of window.confirm.
Fixes:
* NotificationDispatcher._session_ctx infinite recursion when no shared
aiohttp.ClientSession was passed — broke test dispatch for periodic/
scheduled/memory (cron path was unaffected).
* telegram-bots /chats/{id}/test now resolves chat.language_override /
language_code instead of using the raw ?locale query param, matching
the resolution the tracker-target test endpoint already used.
* scheduled_assets default template no longer emits a blank line between
header and the first asset when the multi-album branch is taken.
Introduce a third update_mode option alongside polling/webhook. 'none'
disables both polling and webhook delivery — useful when another instance
owns the listener or when the bot is send-only. Switching into 'none' now
unschedules polling and unregisters any active webhook so Telegram stops
delivering updates.
New bots default to 'none' (safer when multiple bridges share a token).
Existing bots upgraded from a pre-update_mode schema keep 'polling' so
their behavior is unchanged.
The scheduled_enabled / scheduled_times (and the periodic / memory
counterparts) on TrackingConfig had been wired into the model, the
API, and the test-dispatch path — but no production scheduler ever
read them, so users saw the slot in the UI and only ever got fires
through "Test". This adds the missing cron jobs and the dispatch
fan-out, both keyed off the app-level IANA timezone.
* services/scheduled_dispatch.py — production fan-out reusing the
test-path event builders, picking the slot template per kind, and
writing an EventLog row per fire so the dashboard reflects it.
* services/scheduler.py — _load_immich_dispatch_jobs builds one
CronTrigger per (tracker, kind, HH:MM) from the tracker's default
TrackingConfig; reschedule_immich_dispatch_jobs rebuilds them all
on any relevant CRUD or timezone change.
* tracker / link / tracking-config CRUD endpoints now trigger that rebuild.
Also: skip dispatch when scheduled/memory yield zero matching assets
(prevents header-only "On this day:" spam), and update the EN/RU
default scheduled_assets templates to surface that the delivery is
a scheduled random selection.
The static-adapter build emits an inline <script> with the hydration
payload; ``script-src 'self'`` alone blocks the SPA from starting
(browser error: "Executing inline script violates the following Content
Security Policy directive").
Mirrors the 'unsafe-inline' already present for style-src. Primary XSS
protection still comes from Svelte's auto-escaping and
frontend/src/lib/sanitize.ts for the {@html} paths that render user
content. CSP still blocks remote scripts (no https: in script-src),
framing (frame-ancestors 'none'), base-uri hijacking, and form
exfiltration.
Take a consistent, atomic copy of the DB at lifespan startup BEFORE
migrations run, so a botched future upgrade is recoverable by restoring
a single file instead of a data-loss incident.
Uses SQLite's VACUUM INTO — safe under WAL, cannot tear against
concurrent writes. Best-effort: failures are logged, never raised —
the main DB remains the source of truth.
Configurable via NOTIFY_BRIDGE_PRE_MIGRATE_SNAPSHOT_KEEP (default 5;
0 disables). Snapshots land in ``data_dir/backups/pre-migrate-<ts>.db``
and only the newest N are kept; older ones are pruned each boot.
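A rough sketch of the snapshot-and-prune step using the standard `sqlite3` module. `snapshot_db` is a hypothetical name, and the code assumes the KEEP setting means "retain the newest N files". The best-effort error handling described above is omitted for brevity.

```python
import sqlite3
import tempfile
import time
from pathlib import Path
from typing import Optional


def snapshot_db(
    db_path: Path, backup_dir: Path, keep: int, stamp: Optional[str] = None
) -> Optional[Path]:
    """Write an atomic copy via VACUUM INTO, then prune old snapshots."""
    if keep <= 0:
        return None  # 0 disables snapshots
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%dT%H%M%S")
    target = backup_dir / f"pre-migrate-{stamp}.db"
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("VACUUM INTO ?", (str(target),))
    finally:
        conn.close()
    # Timestamped names sort chronologically; assumption: keep the newest N.
    snapshots = sorted(backup_dir.glob("pre-migrate-*.db"))
    for old in snapshots[:-keep]:
        old.unlink()
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    db = root / "bridge.db"
    conn = sqlite3.connect(db)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()
    conn.close()
    for i in range(4):
        snapshot_db(db, root / "backups", keep=2, stamp=f"2026010{i + 1}T000000")
    kept = sorted(p.name for p in (root / "backups").glob("*.db"))
    restored = sqlite3.connect(root / "backups" / kept[-1])
    row_count = restored.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    restored.close()
```

Each snapshot is a complete standalone database file, so a restore is a plain file copy.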
Security
- SSRF: async DNS resolver; allow_redirects=False on all outbound clients;
matrix homeserver_url validated on create/update/test; update_provider
and email_bot merge incoming config and reject ***-masked secrets.
- Auth: bcrypt offloaded to asyncio.to_thread; JWT now carries iss/aud +
leeway and rejects missing claims; setup TOCTOU closed inside a
transaction; rate limits extended (default 600/min, 10/min on password
change, 30/min on needs-setup); constant-time login to prevent username
enumeration.
- Config: rejects known dev secret keys; validates CORS origin schemes,
port range, token lifetimes.
- Webhook handlers stream-read body with a 1 MiB cap; Discord 429 retries
bounded (3 attempts, Retry-After capped at 60 s).
- CSP + HSTS added to SecurityHeadersMiddleware.
Async / runtime
- SQLite engine: WAL, synchronous=NORMAL, foreign_keys=ON, busy_timeout,
pool_pre_ping, dispose on shutdown.
- Lifespan shutdown now stops scheduler before closing HTTP session and
disposing the engine.
- Shared aiohttp session locked against concurrent first-caller races;
core NotificationDispatcher accepts and reuses it.
- Storage and scheduled backup writes wrapped in asyncio.to_thread.
- NUT client writes bounded by asyncio.wait_for.
- Telegram poller switched from 3 s short-poll to 30 s interval + 25 s
long-poll (~10x fewer API calls).
Database
- New performance-indexes migration covers every FK/owner column and
hot-path composite (notification_tracker(provider_id, enabled);
event_log(user_id, created_at DESC); webhook_payload_log(provider_id,
created_at DESC); action_execution(action_id, started_at DESC)).
- New schema_version table for future upgrade gating.
- __system__ placeholder user (id=0) seeded so user_id=0 system defaults
satisfy the newly enforced FK; filtered out of /auth/needs-setup,
/api/users, and setup.
- list_notification_trackers rewritten to batched loads (was 1+N+N*M).
- Retention job extended to event_log, webhook_payload_log, and
action_execution; retention days exposed as a setting.
Scheduler
- AsyncIOScheduler job_defaults: coalesce, misfire_grace_time=300,
max_instances=1.
Ops
- uvicorn runs with proxy_headers, forwarded_allow_ips,
timeout_graceful_shutdown; access log suppressed in non-debug.
- FastAPI version string now reads from importlib.metadata.
- New /api/ready endpoint separate from /api/health.
- docker-compose drops the ALLOW_PRIVATE_URLS=1 default, adds mem/cpu/pid
limits, read_only + tmpfs, cap_drop:ALL, no-new-privileges; healthcheck
targets /api/ready.
- CI now runs on push/PR with backend pytest, frontend svelte-check +
build, and a non-push image build; release workflow gated on tests,
publishes immutable sha-<commit> image tag, adds Trivy scan.
Tests
- New packages/server/tests/ with 29 passing tests: config validation,
JWT round-trip + aud/alg=none rejection, SSRF scheme and private-range
enforcement (sync + async), Discord bounded retry, and a lifespan-level
/api/health + /api/ready smoke check.
- Renamed the misnamed services/test_dispatch.py to manual_dispatch.py so
pytest never auto-collects production code.
Frontend
- /login now redirects already-authenticated users to /, shows a distinct
'backend unreachable' banner (en/ru) when /auth/needs-setup fails.
Boot-time logging was a three-line basicConfig stub with no timestamps, no
correlation, and silent drops at every layer of the Telegram send path — a
/random command that delivered text but no media left zero evidence in the
log. This replaces the setup and closes every silent drop encountered end-to-end.
New infrastructure:
- notify_bridge_core.log_context: request_id/command/chat_id/bot_id/dispatch_id
ContextVars with a bind_log_context() context manager so deep call sites
(TelegramClient, NotificationDispatcher) inherit the correlation tag without
threading args through.
- notify_bridge_server.logging_setup: dictConfig-based setup with a
LogRecordFactory that tags every record, a SecretMaskingFilter that redacts
/botN:TOKEN plus Authorization/x-api-key/password/secret in messages AND
tracebacks, a JSON formatter for aggregators, text formatter with grep-friendly
[req=... cmd=... bot=... chat=... disp=...] prefix, and default dampening
for sqlalchemy/aiohttp/apscheduler/urllib3/PIL.
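To illustrate the ContextVar-based correlation described above, here is a reduced sketch with two of the five variables. It tags records with a `logging.Filter` for simplicity rather than the LogRecordFactory the commit uses; `ContextTagFilter` and `_ListHandler` are hypothetical names.

```python
import logging
from contextlib import contextmanager
from contextvars import ContextVar

request_id_var: ContextVar = ContextVar("request_id", default="-")
dispatch_id_var: ContextVar = ContextVar("dispatch_id", default="-")
_VARS = {"request_id": request_id_var, "dispatch_id": dispatch_id_var}


@contextmanager
def bind_log_context(**values):
    """Set the given context vars for the duration of the with-block."""
    tokens = [(_VARS[name], _VARS[name].set(value)) for name, value in values.items()]
    try:
        yield
    finally:
        for var, token in reversed(tokens):
            var.reset(token)  # restore whatever was bound before


class ContextTagFilter(logging.Filter):
    """Copy the current context values onto every record (stand-in for the factory)."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        record.dispatch_id = dispatch_id_var.get()
        return True


class _ListHandler(logging.Handler):
    def __init__(self) -> None:
        super().__init__()
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


handler = _ListHandler()
handler.addFilter(ContextTagFilter())
handler.setFormatter(
    logging.Formatter("[req=%(request_id)s disp=%(dispatch_id)s] %(message)s")
)
log = logging.getLogger("bind_log_context_demo")
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(handler)

with bind_log_context(request_id="tg:42"):
    log.info("received")
    with bind_log_context(dispatch_id="d1"):
        log.info("dispatching")
log.info("idle")
```

Nested bindings stack and unwind in order, so a deep call site picks up the outer request id without any argument being threaded through.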
Runtime control:
- NOTIFY_BRIDGE_LOG_LEVEL / _FORMAT / _LEVELS env vars (boot).
- DB-backed log_level / log_format / log_levels AppSettings, applied on
boot after migrations and live via apply_log_levels() when edited in
the settings UI (format still requires restart, logs a WARN).
- Frontend settings page gains a Logging card (level dropdown, format
dropdown, per-module overrides); en/ru i18n keys added.
Call-site fixes (/random media-group blind spot and adjacent):
- TelegramClient._fetch_asset: every silent drop now WARN-logs with reason
(missing url, HTTP non-200, size/dimension limits, ClientError).
- TelegramClient._send_media_group: WARN on "chunk had N items but 0 usable",
ERROR on sendMediaGroup non-ok/transport with full context; returns
success=False + "no_items_delivered" instead of success=True with an empty
message_ids list so callers can distinguish.
- TelegramClient.send_message / _upload_media / _send_from_cache: ERROR on
non-ok + transport failures with status/code/desc; DEBUG for cache-hit
fallbacks.
- NotificationDispatcher.dispatch: generates a dispatch_id, binds it, logs
start/finish with failure count, uses exc_info for target failures.
- commands/handler: missing/failed templates -> ERROR + exc_info; send_reply
and send_media_group errors upgraded WARNING -> ERROR with chat/error_code
context; rate-limit and truncation cases logged with full context.
- commands/webhook and services/telegram_poller:
bind_log_context(request_id=tg:<update_id>, command, chat_id, bot_id),
INFO on receive/dispatch/completion with duration, exc_info on raise,
INFO when commands disabled.
- commands/immich: INFO when album scope is empty; WARN per asset dropped
from media payload and a summary WARN when "N assets in, 0 out".
CronTrigger.from_crontab was constructed without a timezone, so a cron like
'0 9 * * *' fired at 09:00 host-local instead of 09:00 in the admin-configured
timezone. Now all tracker/action cron triggers are built with the app tz, and
the setting endpoint rebuilds existing cron jobs when the tz changes (since
CronTrigger freezes its tz at construction time).
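To make the failure mode concrete, here is a stdlib-only sketch (not the scheduler's code) showing that "09:00" resolves to different instants depending on which timezone the trigger is built with. next_daily_fire is a hypothetical helper; in APScheduler itself the fix corresponds to passing timezone= to CronTrigger.from_crontab.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def next_daily_fire(now: datetime, at: time, tz: tzinfo) -> datetime:
    """Next instant at which the wall clock in `tz` reads `at`.

    `now` must be timezone-aware.
    """
    local_now = now.astimezone(tz)
    candidate = datetime.combine(local_now.date(), at, tzinfo=tz)
    if candidate <= local_now:
        # Today's slot already passed in that timezone; use tomorrow's.
        next_day = local_now.date() + timedelta(days=1)
        candidate = datetime.combine(next_day, at, tzinfo=tz)
    return candidate


# Example: host clock in UTC, admin-configured timezone at UTC+3.
host_tz = timezone.utc
app_tz = timezone(timedelta(hours=3))
now = datetime(2024, 1, 1, 5, 0, tzinfo=timezone.utc)

fire_host = next_daily_fire(now, time(9, 0), host_tz)
fire_app = next_daily_fire(now, time(9, 0), app_tz)
```

Here fire_host and fire_app are three hours apart, which is the gap an admin would have observed when the trigger silently fell back to the host timezone.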
The scheduler provider also renders current_date/time/datetime/weekday in the
configured tz and exposes a new 'timezone' template variable.
EventLog entries for scheduled_message now include schedule_type,
cron_expression/interval_seconds, timezone, and fire_count, and the dashboard
shows the event type with a label/icon/color.
Bot commands like /random, /latest, /memory refetch the same albums in
quick succession; the GET /api/albums/{id} response can be tens of MB on
large albums, and /api/shared-links has no per-album filter so every
get_shared_links call was already paying for the full server-wide list.
- Module-level 60s TTL cache for album bodies, keyed by
(server_digest, album_id), 32-entry FIFO cap. Module-scoped (not
instance-scoped) because ImmichClient is constructed fresh per request
in several places, so an instance cache would never survive a second
caller. Mirrors the existing _users_cache pattern.
- Module-level 60s TTL cache for the bucketed shared-links map, keyed by
server_digest. get_shared_links(album_id) now delegates to a single
server-wide fetch that serves every album.
- server_digest hashes url+api_key so raw creds don't sit in dict keys.
- get_album(use_cache=False) escape hatch for paths that must observe
current server state — wired into ImmichActionExecutor.execute (diffs
the album to decide what to add) and ImmichServiceProvider.poll's
full-fetch path (stale data would silently delay removal events).
- Async locks guard cache writes with under-lock re-check so concurrent
misses collapse to one fetch.
Slow bot commands (/latest, /random, /favorites, /memory, /search,
/find, /person, /place, /summary) spend most of their wall time
fetching assets from the service provider, not uploading to Telegram.
Telegram chat actions expire after ~5s, so the previous one-shot hint
vanished long before media arrived — users saw nothing happening.
- TelegramClient.start_chat_action_keepalive: promoted from private
helper to public API, posts the action every 4s until cancelled.
- telegram_send.telegram_chat_action: async context manager that
starts the keep-alive task on enter and cancels + awaits it on
exit. A None action makes it a no-op so callers don't branch.
- classify_command_chat_action: maps command name to the right
Telegram action (upload_photo for media-returning commands, typing
for /summary, None for fast DB-only commands like /status /events).
- webhook.py + telegram_poller.py: wrap handle_command in the context
manager so the hint persists through the whole fetch+upload window
in both webhook and long-poll modes.
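A minimal sketch of the keep-alive pattern described above, assuming a post_action coroutine that performs the actual sendChatAction call. The names chat_action and _keepalive are hypothetical, and swallowing post errors is an assumption that the hint is best-effort.

```python
import asyncio
import contextlib
from contextlib import asynccontextmanager


async def _keepalive(post_action, action, interval):
    """Re-post the chat action until the task is cancelled."""
    while True:
        # Assumed best-effort: a failed hint must not break the command.
        with contextlib.suppress(Exception):
            await post_action(action)
        await asyncio.sleep(interval)


@asynccontextmanager
async def chat_action(post_action, action, interval=4.0):
    """Keep a chat action visible for the duration of the block."""
    if action is None:  # fast commands: nothing to show, nothing to clean up
        yield
        return
    task = asyncio.create_task(_keepalive(post_action, action, interval))
    try:
        yield
    finally:
        task.cancel()
        # Await the task so it is fully finished before the block exits.
        with contextlib.suppress(asyncio.CancelledError):
            await task
```

A caller would wrap the slow part, for example `async with chat_action(post, "upload_photo"): await handle_command(...)`, where post and handle_command stand in for the real client call and handler.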
Optimizes polling for large Immich albums (the tested path targets ~200k
assets). Combined, the changes cut the per-tick cost on idle albums from
a ~150 MB fetch to a few hundred bytes; active albums fetch O(changes)
instead of O(library).
Core changes
- ImmichAlbumMeta + get_album_meta() using ?withoutAssets=true as a
cheap change-detection probe.
- poll() fast-path: skip full fetch when meta fingerprint matches and
no pending assets are outstanding.
- poll() delta-path: search/metadata with updatedAfter when fingerprint
changed, falling back to full fetch on count decrease or mixed
add+remove that delta can't reconcile.
- asyncio.gather over meta probes so a 20-album tracker pays one
round-trip of latency instead of 20.
- Event payload cap (50 added / 200 removed) so a bulk import can't
explode a Jinja template or exceed Telegram's message limits.
- Module-level users cache (1h TTL, sha256-keyed) shared across
providers on the same Immich server.
- Tick-scoped shared-links cache via new
get_all_shared_links_by_album() — one /api/shared-links request per
tick instead of one per changed album.
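The probe-then-skip flow can be sketched as below. The fingerprint composition (asset count plus an updated-at string) is an assumption, since the notes do not say what meta_fingerprint contains; AlbumMeta and albums_needing_fetch are hypothetical names, and probe stands in for a get_album_meta()-style call.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class AlbumMeta:
    album_id: str
    asset_count: int
    updated_at: str

    def fingerprint(self) -> str:
        # Assumed composition: cheap fields that change when the album does.
        return f"{self.asset_count}:{self.updated_at}"


async def albums_needing_fetch(album_ids, probe, stored, pending=frozenset()):
    """Probe all albums concurrently; return ids that cannot take the fast path.

    stored maps album_id -> last seen fingerprint; pending holds album ids
    that still have outstanding assets and so must not be skipped.
    """
    # One gather means total latency is roughly one round-trip, not N.
    metas = await asyncio.gather(*(probe(album_id) for album_id in album_ids))
    return [
        meta.album_id
        for meta in metas
        if meta.fingerprint() != stored.get(meta.album_id)
        or meta.album_id in pending
    ]
```

Albums missing from the result take the fast path and trigger no further requests in that tick.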
Server changes
- meta_fingerprint JSON column on NotificationTrackerState + migration.
- watcher skips the asset_ids DB rewrite when the fingerprint didn't
change, avoiding ~8 MB JSON writes on idle ticks for huge albums.
- Adaptive polling: after 10 empty ticks skip 1-in-2, after 30 skip
1-in-4; the empty-tick counter resets on the first detected change and
on schedule changes.
- APScheduler jitter (interval/4, capped at 30s) to smooth thundering-
herd bursts when many trackers share the same scan_interval.
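The back-off and jitter rules can be written down as two small functions. This sketch reads "skip 1-in-N" as "run only one tick in every N", which is an interpretation rather than something the notes state; should_run_tick and jitter_seconds are hypothetical names.

```python
def should_run_tick(empty_ticks: int, tick_index: int) -> bool:
    """Decide whether this tick polls, given consecutive empty ticks so far.

    Assumed reading: after 10 empty ticks run 1 in 2, after 30 run 1 in 4.
    The caller resets empty_ticks to 0 on any detected change.
    """
    if empty_ticks >= 30:
        return tick_index % 4 == 0
    if empty_ticks >= 10:
        return tick_index % 2 == 0
    return True


def jitter_seconds(scan_interval: float) -> float:
    """Jitter window: a quarter of the interval, capped at 30 seconds."""
    return min(scan_interval / 4, 30.0)
```

For example, a 60 s scan interval gets a 15 s jitter window, while a 10-minute interval is capped at 30 s.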
common._format_assets was passing cache_key=<bare asset UUID>, but the
notification dispatcher writes keys as <host>:<uuid> (derived from the
URL by extract_asset_id_from_url). Result: the two paths populated
different keys for the same asset, so neither could hit the other's
cached file_id and the WebUI stats only ever reflected the notification
side.
Drop the explicit cache_key — TelegramClient derives <host>:<uuid> from
the URL, identical to the notification path, so one file_id cached by
any dispatch or /random / /latest reply is reused by every later send.
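A rough sketch of deriving a <host>:<uuid> key from an asset URL. This is not the body of extract_asset_id_from_url, which is not shown here; derive_cache_key is a hypothetical name, and using the URL's netloc (including any port) as the host part is an assumption.

```python
import re
from urllib.parse import urlparse

_UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def derive_cache_key(url: str):
    """Return '<host>:<uuid>' for an asset URL, or None if it has no UUID."""
    parsed = urlparse(url)
    match = _UUID_RE.search(parsed.path)
    if not parsed.netloc or match is None:
        return None
    return f"{parsed.netloc}:{match.group(0).lower()}"
```

Because both send paths would derive the key from the same URL, they land on the same cache entry regardless of which one populated it first.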