From 10d30fc9569d367779e975ae66af6ccf75f3020b Mon Sep 17 00:00:00 2001
From: "alexei.dolgolyov" <dolgolyov.alexei@gmail.com>
Date: Sat, 16 May 2026 02:16:49 +0300
Subject: [PATCH] =?UTF-8?q?feat:=20production=20readiness=20=E2=80=94=20se?=
 =?UTF-8?q?curity,=20perf,=20bug=20fixes,=20bridge=20self-monitoring?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Comprehensive multi-area pass driven by a parallel 8-agent production
review. Frontend, backend, database, security, performance, operational,
plus a new self-monitoring feature.

## Critical fixes
- Planka webhook: reads bounded raw body (was NameError on every call)
- HA quiet hours: ha_state_changed/automation_triggered/service_called/
  event_fired added to deferrable set (were silently dropped)
- DNS-rebinding SSRF: PinnedResolver wired into shared aiohttp session
- Telegram inbound webhook: secret now mandatory (401 without)
- Generic webhook: auth_mode="none" requires explicit
  acknowledge_unauthenticated=true; per-IP rate limit 60/min
- svelte-check: 5 null-narrowing errors in EventDetailModal fixed
- Provider hardcoding: Immich-only block extracted to descriptor
  featureDiscoveryHint
- command_sync: snapshot+expunge bot before exiting AsyncSession

## Bug fixes
- notifier asyncio.gather(return_exceptions=True) — one bad chat no longer
  cancels peer sends
- NotificationDispatcher hoisted out of per-tracker loop
- Provider credential resolution unified across all 5 dispatch sites
- HA asyncio.shield now drains inner task on cancellation
- Provider construction switched from if/elif ladder to factory registry
- NUT first poll seeds silently (no spurious ups_on_battery)
- Quiet-hours gate: event-type-disabled now wins over deferral
- APScheduler drain job ID resolution upgraded to seconds
- HA on_status_change wired through to EventLog
- Webhook payload rollback failures now logged (not swallowed)
- Batched receivers/chats/bots in load_link_data (was per-target N+1)
- flag_modified on JSON column reassignments in deferred_dispatch

## Database
- UNIQUE indexes on service_provider.webhook_token,
  telegram_bot.webhook_path_id, partial UNIQUE on telegram_bot.bot_id,
  telegram_chat(bot_id, chat_id), notification_tracker_target unique link,
  partial UNIQUE on bridge_self provider per user
- Composite ix_event_log_user_event_type_created index
- save_chat_from_webhook switched to ON CONFLICT DO UPDATE
- ondelete=CASCADE on user-id FKs (model annotation; app-side cascade
  delete added for existing data)
- delete_notification_tracker converted from N+1 to bulk DELETE/UPDATE
- Module-level asyncio.Lock replaced with lazy _get_lock() pattern
- VACUUM INTO snapshot now PRAGMA integrity_check verified

## Performance
- Jinja2 template compilation LRU cached (lru_cache maxsize=512)
- Per-locale render cache in NotificationDispatcher (skips re-rendering
  identical content for receivers sharing a locale)
- Tracker list cached per provider_id with 5s TTL + explicit invalidation
  on tracker CRUD (relieves HA chat-bus rate query pressure)
- Nav-counts collapsed from 16 round-trips to single UNION ALL
- HA event_log: skip persisting empty assets_added/removed events

## Security hardening
- Mass-assignment guard on Action create/update; cron sub-minute reject
- Backup JSON depth/node-count cap (depth ≤ 10, nodes ≤ 100k)
- _sanitize_config extended to all JSON-typed fields on backup import
- Telegram _safe_get walks redirects manually with SSRF revalidation
- Bcrypt 72-byte password length cap with clear 422
- Webhook payload body redaction; sensitive substring set extended with
  oauth/client_secret/webhook_secret/csrf in both header filter and
  template extras filter

## Frontend
- 76 catch (err: any) sites converted to errMsg(err) helper
- globalProviderFilter: pure getter; reconciliation moved to one-time
  $effect in +layout
- Provider-filter binding: removed paired $effects + _syncingFilter flag,
  now one-way derived
- entity-cache: separate _refreshing flag for background re-fetches
- api.ts 401 handling: AuthRedirectError class + dedup _redirecting flag,
  goto() instead of window.location.href
- a11y: aria-expanded on mobile More, role=switch + aria-checked on
  Telegram bot toggles

## Tests & operations
- CI pytest gate added to .gitea/workflows/build.yml + release.yml
  (wheel-built install to dodge editable-install slowness)
- /api/ready upgraded to deep healthcheck (db SELECT 1, scheduler.running,
  HA supervisor presence) returning {ready, checks, errors, version}
- /api/metrics endpoint with prometheus_client (deferred_pending,
  event_log_total, dispatch_duration, poll_failures, send_failures)
- New OPERATIONS.md covering deploy, healthchecks, metrics, backup/restore
  procedures, log handling, common scenarios, upgrade flow
- New tests: test_bridge_self (11), test_gitea_parser (9),
  test_planka_parser (6), test_immich_change_detector (6),
  test_backup_roundtrip (1)

## New feature: bridge self-monitoring
- New bridge_self provider type — internal sink for bridge health events
- Three event types: bridge_self_poll_failures (consecutive tracker poll
  failures), bridge_self_deferred_backlog (pending count crosses
  threshold), bridge_self_target_failures (consecutive 5xx/network
  failures per target)
- Per-user thresholds (defaults: 3 / 100 / 5) configurable via the
  provider config form
- Auto-seeded on user create + /setup + boot backfill for existing users
- Anti-spam: counters reset after emission; backlog uses transition latch
- Self-loop guard: bridge_self failures don't count toward target-failure
  thresholds (logged only) — wire to your own Telegram/Email/Matrix to
  get notified when polls/dispatches/sends fail
- 6 default templates (3 events × 2 locales), tracking config columns
  with backfill migration, frontend descriptor (excluded from "create
  provider" wizard since auto-managed)

Operator-visible behavior changes (call out in release notes):
- NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET now REQUIRED for webhook mode
- Existing webhook providers with auth_mode="none" need explicit opt-in
- Generic webhook endpoint rate-limited 60/min per source IP
- HA disconnect/reconnect writes ha_status_* EventLog rows
- Every user gets a bridge_self provider — wire it to a target to
  receive failure alerts

Pre-existing test failures (test_ssrf, test_release_provider) on
Python 3.13 are unrelated; CI runs on 3.12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .gitea/workflows/build.yml                    |  42 +-
 .gitea/workflows/release.yml                  |  33 ++
 OPERATIONS.md                                 | 394 +++++++++++++++
 frontend/src/lib/api.ts                       |  48 +-
 .../lib/components/EventDetailModal.svelte    |  20 +-
 .../src/lib/components/IconGridSelect.svelte  |   9 +
 frontend/src/lib/grid-items.ts                |  15 +-
 frontend/src/lib/i18n/en.json                 |  14 +-
 frontend/src/lib/i18n/ru.json                 |  14 +-
 frontend/src/lib/providers/bridge-self.ts     |  98 ++++
 frontend/src/lib/providers/immich.ts          |  11 +
 frontend/src/lib/providers/index.ts           |   2 +
 frontend/src/lib/providers/types.ts           |  16 +
 .../src/lib/stores/entity-cache.svelte.ts     |  20 +-
 .../src/lib/stores/provider-filter.svelte.ts  |  34 +-
 frontend/src/routes/+layout.svelte            |  55 ++-
 frontend/src/routes/actions/+page.svelte      |  12 +-
 frontend/src/routes/actions/RuleEditor.svelte |  12 +-
 frontend/src/routes/bots/+page.svelte         |   4 +-
 frontend/src/routes/bots/EmailBotTab.svelte   |  10 +-
 frontend/src/routes/bots/MatrixBotTab.svelte  |  10 +-
 .../src/routes/bots/TelegramBotTab.svelte     |  46 +-
 .../src/routes/command-configs/+page.svelte   |  12 +-
 .../command-template-configs/+page.svelte     |  40 +-
 .../src/routes/command-trackers/+page.svelte  |  20 +-
 frontend/src/routes/login/+page.svelte        |   8 +-
 .../routes/notification-trackers/+page.svelte |  44 +-
 .../SharedLinkModal.svelte                    |   8 +-
 .../notification-trackers/TrackerForm.svelte  |  69 ++-
 frontend/src/routes/providers/+page.svelte    |  20 +-
 .../src/routes/settings/backup/+page.svelte   |  47 +-
 frontend/src/routes/setup/+page.svelte        |   3 +-
 frontend/src/routes/targets/+page.svelte      |  64 +--
 .../src/routes/template-configs/+page.svelte  |  22 +-
 .../src/routes/tracking-configs/+page.svelte  |  50 +-
 frontend/src/routes/users/+page.svelte        | 452 +++++++++---------
 .../src/notify_bridge_core/models/events.py   |   6 +
 .../notifications/dispatcher.py               |  59 ++-
 .../notifications/telegram/client.py          |  47 +-
 .../src/notify_bridge_core/providers/base.py  |   1 +
 .../providers/bridge_self/__init__.py         |  39 ++
 .../providers/bridge_self/event_parser.py     |  89 ++++
 .../providers/bridge_self/provider.py         | 148 ++++++
 .../providers/capabilities.py                 |  34 ++
 .../providers/home_assistant/provider.py      |  15 +-
 .../providers/immich/provider.py              |  15 +-
 .../providers/nut/provider.py                 |  18 +
 .../notify_bridge_core/templates/context.py   |   4 +
 .../en/bridge_self_deferred_backlog.jinja2    |   6 +
 .../en/bridge_self_poll_failures.jinja2       |   6 +
 .../en/bridge_self_target_failures.jinja2     |   6 +
 .../templates/defaults/loader.py              |   5 +
 .../ru/bridge_self_deferred_backlog.jinja2    |   6 +
 .../ru/bridge_self_poll_failures.jinja2       |   6 +
 .../ru/bridge_self_target_failures.jinja2     |   6 +
 .../notify_bridge_core/templates/renderer.py  |  16 +-
 .../notify_bridge_core/templates/validator.py |   3 +
 packages/server/pyproject.toml                |   1 +
 .../src/notify_bridge_server/api/actions.py   |  69 ++-
 .../src/notify_bridge_server/api/backup.py    |  40 ++
 .../api/command_template_configs.py           |  11 +-
 .../api/command_trackers.py                   |  21 +-
 .../src/notify_bridge_server/api/metrics.py   | 161 +++++++
 .../api/notification_trackers.py              |  49 +-
 .../src/notify_bridge_server/api/providers.py |  27 +-
 .../src/notify_bridge_server/api/status.py    |  82 ++--
 .../api/template_configs.py                   |  28 ++
 .../src/notify_bridge_server/api/users.py     | 179 ++++++-
 .../src/notify_bridge_server/api/webhooks.py  |  48 +-
 .../src/notify_bridge_server/auth/routes.py   |  37 +-
 .../notify_bridge_server/commands/webhook.py  |  31 +-
 .../server/src/notify_bridge_server/config.py |   7 +
 .../database/migrations.py                    | 285 +++++++++++
 .../notify_bridge_server/database/models.py   | 166 ++++++-
 .../notify_bridge_server/database/seeds.py    |  72 +++
 .../server/src/notify_bridge_server/main.py   | 115 ++++-
 .../services/backup_service.py                |  36 +-
 .../services/bridge_self.py                   | 432 +++++++++++++++++
 .../services/command_sync.py                  |   9 +-
 .../services/deferred_dispatch.py             |  16 +-
 .../services/dispatch_helpers.py              | 183 ++++++-
 .../services/event_dispatch.py                | 138 +++++-
 .../services/ha_subscription.py               | 107 ++++-
 .../services/http_session.py                  |  72 ++-
 .../services/manual_dispatch.py               |   4 +-
 .../notify_bridge_server/services/notifier.py |  16 +-
 .../services/sample_context.py                |   8 +
 .../services/scheduled_dispatch.py            |   3 +-
 .../services/scheduler.py                     |  53 +-
 .../notify_bridge_server/services/telegram.py |  62 ++-
 .../notify_bridge_server/services/watcher.py  | 267 ++++++++---
 .../server/tests/test_backup_roundtrip.py     | 268 +++++++++++
 packages/server/tests/test_bridge_self.py     | 265 ++++++++++
 packages/server/tests/test_gitea_parser.py    | 249 ++++++++++
 packages/server/tests/test_health.py          |   8 +-
 .../tests/test_immich_change_detector.py      | 159 ++++++
 packages/server/tests/test_planka_parser.py   | 147 ++++++
 97 files changed, 5423 insertions(+), 821 deletions(-)
 create mode 100644 OPERATIONS.md
 create mode 100644 frontend/src/lib/providers/bridge-self.ts
 create mode 100644 packages/core/src/notify_bridge_core/providers/bridge_self/__init__.py
 create mode 100644 packages/core/src/notify_bridge_core/providers/bridge_self/event_parser.py
 create mode 100644 packages/core/src/notify_bridge_core/providers/bridge_self/provider.py
 create mode 100644 packages/core/src/notify_bridge_core/templates/defaults/en/bridge_self_deferred_backlog.jinja2
 create mode 100644 packages/core/src/notify_bridge_core/templates/defaults/en/bridge_self_poll_failures.jinja2
 create mode 100644 packages/core/src/notify_bridge_core/templates/defaults/en/bridge_self_target_failures.jinja2
 create mode 100644 packages/core/src/notify_bridge_core/templates/defaults/ru/bridge_self_deferred_backlog.jinja2
 create mode 100644 packages/core/src/notify_bridge_core/templates/defaults/ru/bridge_self_poll_failures.jinja2
 create mode 100644 packages/core/src/notify_bridge_core/templates/defaults/ru/bridge_self_target_failures.jinja2
 create mode 100644 packages/server/src/notify_bridge_server/api/metrics.py
 create mode 100644 packages/server/src/notify_bridge_server/services/bridge_self.py
 create mode 100644 packages/server/tests/test_backup_roundtrip.py
 create mode 100644 packages/server/tests/test_bridge_self.py
 create mode 100644 packages/server/tests/test_gitea_parser.py
 create mode 100644 packages/server/tests/test_immich_change_detector.py
 create mode 100644 packages/server/tests/test_planka_parser.py

diff --git a/.gitea/workflows/build.yml b/.gitea/workflows/build.yml
index a8a5ab7..003cec5 100644
--- a/.gitea/workflows/build.yml
+++ b/.gitea/workflows/build.yml
@@ -29,16 +29,54 @@ jobs:
       - name: Svelte check
         run: |
           cd frontend
-          npm run check || echo "::warning::svelte-check reported warnings"
+          npm run check
 
       - name: Build
         run: |
           cd frontend
           npm run build
 
+  test-backend:
+    if: ${{ !startsWith(gitea.event.head_commit.message, 'chore: release v') }}
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      # Editable installs of packages/core + packages/server are extremely slow
+      # on the hosted runner — measured 4-6x slower than building wheels first
+      # because hatchling's editable hook re-resolves on every collection. We
+      # build wheels once, then install them (and only the test deps) into a
+      # plain venv. The wheels themselves are NOT cached because their hashes
+      # depend on every file under packages/ — invalidates on basically every
+      # PR. Pip's HTTP cache for the test deps is enough.
+      - name: Build wheels
+        run: |
+          python -m pip install --upgrade pip build
+          mkdir -p /tmp/wheels
+          pip wheel --no-deps -w /tmp/wheels packages/core packages/server
+
+      - name: Install backend + test deps
+        run: |
+          pip install /tmp/wheels/*.whl pytest pytest-asyncio httpx aioresponses prometheus_client
+
+      - name: Run pytest
+        env:
+          NOTIFY_BRIDGE_DATA_DIR: /tmp/nb-test-data
+          NOTIFY_BRIDGE_SECRET_KEY: ci-secret-key-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+          NOTIFY_BRIDGE_DEBUG: "false"
+          NOTIFY_BRIDGE_CORS_ALLOWED_ORIGINS: "http://localhost:8420"
+        run: |
+          cd packages/server
+          pytest tests --tb=short
+
   build-image:
     if: ${{ !startsWith(gitea.event.head_commit.message, 'chore: release v') }}
-    needs: [test-frontend]
+    needs: [test-frontend, test-backend]
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
diff --git a/.gitea/workflows/release.yml b/.gitea/workflows/release.yml
index 6e3dbea..527136d 100644
--- a/.gitea/workflows/release.yml
+++ b/.gitea/workflows/release.yml
@@ -10,7 +10,40 @@ env:
   IMAGE_NAME: alexei.dolgolyov/notify-bridge
 
 jobs:
+  test-backend:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      # Same wheel-first strategy as build.yml — editable install is too slow
+      # on the hosted runner.
+      - name: Build wheels
+        run: |
+          python -m pip install --upgrade pip build
+          mkdir -p /tmp/wheels
+          pip wheel --no-deps -w /tmp/wheels packages/core packages/server
+
+      - name: Install backend + test deps
+        run: |
+          pip install /tmp/wheels/*.whl pytest pytest-asyncio httpx aioresponses prometheus_client
+
+      - name: Run pytest
+        env:
+          NOTIFY_BRIDGE_DATA_DIR: /tmp/nb-test-data
+          NOTIFY_BRIDGE_SECRET_KEY: ci-secret-key-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+          NOTIFY_BRIDGE_DEBUG: "false"
+          NOTIFY_BRIDGE_CORS_ALLOWED_ORIGINS: "http://localhost:8420"
+        run: |
+          cd packages/server
+          pytest tests --tb=short
+
   release:
+    needs: [test-backend]
     runs-on: ubuntu-latest
     steps:
       - name: Checkout repo
diff --git a/OPERATIONS.md b/OPERATIONS.md
new file mode 100644
index 0000000..4001c31
--- /dev/null
+++ b/OPERATIONS.md
@@ -0,0 +1,394 @@
+# Operations Guide
+
+This document covers running, monitoring, and recovering Notify Bridge in
+production. The intended audience is the operator on call when the
+notifications stop firing or when a release upgrade goes sideways.
+
+For developer-focused docs (architecture, conventions, project layout) see
+`CLAUDE.md` and the `.claude/docs/` directory.
+
+## Deployment overview
+
+Notify Bridge ships as a single Docker image. All state lives in a single
+data directory mounted at `/data`.
+
+### Required environment variables
+
+| Variable | Default | Notes |
+| --- | --- | --- |
+| `NOTIFY_BRIDGE_SECRET_KEY` | _(none)_ | **Required.** 32+ random bytes. The server refuses to boot with the default placeholder or any of the known dev literals. |
+| `NOTIFY_BRIDGE_CORS_ALLOWED_ORIGINS` | `http://localhost:5175` | Comma-separated list. `*` is rejected because credentials are enabled. |
+| `NOTIFY_BRIDGE_FORWARDED_ALLOW_IPS` | `127.0.0.1` | Trusted proxy IPs whose `X-Forwarded-For` / `X-Forwarded-Proto` headers are honored. Set to your reverse-proxy IP. |
+
+### Useful environment variables
+
+| Variable | Default | Notes |
+| --- | --- | --- |
+| `NOTIFY_BRIDGE_DATA_DIR` | `/data` | Where the SQLite DB, snapshots, and backups live. |
+| `NOTIFY_BRIDGE_DATABASE_URL` | _(derived from data_dir)_ | Override only if you want a non-default DB path. |
+| `NOTIFY_BRIDGE_DEBUG` | `false` | Verbose logging + SQL echo. Do not enable in production. |
+| `NOTIFY_BRIDGE_LOG_FORMAT` | `text` | Set to `json` for one JSON object per line — pipe to a log aggregator. |
+| `NOTIFY_BRIDGE_LOG_LEVEL` | `INFO` | Root logger level. |
+| `NOTIFY_BRIDGE_LOG_LEVELS` | _(empty)_ | Per-module overrides, e.g. `sqlalchemy.engine=WARNING,notify_bridge_core.notifications.telegram.client=DEBUG`. |
+| `NOTIFY_BRIDGE_EVENT_LOG_RETENTION_DAYS` | `30` | Days of `event_log` history kept by the daily cleanup job. `0` disables retention. |
+| `NOTIFY_BRIDGE_PRE_MIGRATE_SNAPSHOT_KEEP` | `5` | Number of pre-migration DB snapshots retained. `0` disables snapshotting. |
+| `NOTIFY_BRIDGE_METRICS_ENABLED` | `true` | Expose `/api/metrics` for Prometheus. Set to `false` if the API port crosses a trust boundary. |
+| `NOTIFY_BRIDGE_GRACEFUL_SHUTDOWN_SECONDS` | `60` | SIGTERM grace period before in-flight requests are killed. |
+| `NOTIFY_BRIDGE_SUPERVISED` | _(auto)_ | Force the supervised flag for `apply-restart`. Use `true` when running under systemd/PM2 outside Docker. |
+
+### Data directory layout
+
+```
+/data/
+  notify_bridge.db          # main SQLite DB (WAL mode)
+  notify_bridge.db-wal      # SQLite write-ahead log
+  notify_bridge.db-shm      # SQLite shared memory file
+  backups/
+    pre-migrate-*.db        # automatic pre-upgrade snapshots
+    backup-*.json           # scheduled / manual config backups
+  snapshots/                # legacy alias retained for older deployments
+  pending_restore.json      # staged restore (consumed at next boot)
+  applied_restores/         # archive of applied restore payloads
+```
+
+Always mount `/data` on a persistent volume. The WAL files MUST live on the
+same filesystem as the main DB — never split them across mounts.
+
+### Docker example
+
+See `docker-compose.yml` at the repo root for the canonical reference. The
+container runs read-only with `tmpfs` for `/tmp`, drops all capabilities,
+and limits memory/CPU. The healthcheck targets `/api/ready` (deep) — see
+the next section.
+
+## Healthchecks
+
+Two endpoints, used for different probe types.
+
+### `GET /api/health` — liveness, shallow
+
+Returns `200 OK` once the ASGI app has started. Does not touch the DB or
+the scheduler. Use this for liveness probes that should only restart the
+process if it stops responding entirely.
+
+```json
+{"status": "ok", "version": "0.8.0"}
+```
+
+### `GET /api/ready` — readiness, deep
+
+Verifies that each critical dependency is reachable:
+
+* **db** — `SELECT 1` against the SQLAlchemy engine, 2-second timeout.
+* **scheduler** — APScheduler `running` flag.
+* **ha** — Home Assistant subscription supervisor task. Reported as
+  `na` when no HA providers are configured, `ok` when at least one
+  supervisor is alive, `degraded` otherwise. **Informational only** —
+  HA degradation does not flip readiness off.
+
+Returns `503` when any required check (db, scheduler) fails.
+
+```json
+{
+  "ready": true,
+  "checks": {"db": "ok", "scheduler": "ok", "ha": "na"},
+  "errors": [],
+  "version": "0.8.0"
+}
+```
+
+### Kubernetes probe example
+
+```yaml
+livenessProbe:
+  httpGet:
+    path: /api/health
+    port: 8420
+  initialDelaySeconds: 10
+  periodSeconds: 30
+  timeoutSeconds: 5
+  failureThreshold: 3
+
+readinessProbe:
+  httpGet:
+    path: /api/ready
+    port: 8420
+  initialDelaySeconds: 15
+  periodSeconds: 15
+  timeoutSeconds: 5
+  failureThreshold: 2
+```
+
+The Docker compose file uses `/api/ready` as its healthcheck so the
+container is only reported healthy after migrations finish.
+
+## Metrics
+
+Notify Bridge exposes Prometheus metrics at `GET /api/metrics` in the
+standard text exposition format. **No authentication** — Prometheus
+scrapers do not authenticate. Disable via `NOTIFY_BRIDGE_METRICS_ENABLED=false`
+when the API port is reachable beyond the trust boundary.
+
+### Prometheus scrape example
+
+```yaml
+scrape_configs:
+  - job_name: notify-bridge
+    metrics_path: /api/metrics
+    static_configs:
+      - targets: ['notify-bridge.internal:8420']
+    scrape_interval: 30s
+```
+
+### Available metrics
+
+| Metric | Type | Labels | Meaning |
+| --- | --- | --- | --- |
+| `notify_bridge_deferred_pending` | Gauge | _(none)_ | Pending rows in `deferred_dispatch`. Refreshed on each scrape. A persistent non-zero value usually means a tracker target is in extended quiet hours. |
+| `notify_bridge_event_log_total` | Counter | `status`, `event_type` | Events written to `event_log`. `status` is the dispatch outcome (`dispatched`, `dropped`, `deferred`, etc.). |
+| `notify_bridge_dispatch_duration_seconds` | Histogram | `channel` | Wall-clock duration of one outbound dispatch (Telegram, Discord, email, …). Useful for latency alerts. |
+| `notify_bridge_provider_poll_failures_total` | Counter | `provider_type` | Polling provider tick failures (Immich poll error, Gitea API down, …). Compare against expected scan interval to compute failure rate. |
+| `notify_bridge_target_send_failures_total` | Counter | `target_type`, `status_code` | Failed sends to a notification channel. `status_code` is the HTTP status (or `0` when no HTTP response was received). |
+
+The metrics module never imports `prometheus_client` outside `api/metrics.py`.
+Other modules record events through the `metrics` singleton — see that
+module's docstring before adding new collectors.
+
+## Backups
+
+Notify Bridge produces three different kinds of backup files. Know which
+one you are looking at before restoring.
+
+| Kind | Location | Format | Trigger |
+| --- | --- | --- | --- |
+| Config backup | `data/backups/backup-*.json` | JSON (BackupFile schema) | Manual via `/api/backup/files` POST or scheduled job |
+| Pre-migration snapshot | `data/backups/pre-migrate-*.db` | SQLite DB file | Automatic on every boot before migrations |
+| Pending restore | `data/pending_restore.json` | JSON | Staged via `/api/backup/prepare-restore`, consumed at next restart |
+
+Config backups capture user configuration (providers, trackers, targets,
+templates, …). They do **not** include `event_log`, `deferred_dispatch`,
+or any other operational table. Pre-migration snapshots are full DB
+copies and contain everything.
+
+### Manual backup
+
+The admin UI has a one-click button under Settings → Backup. Equivalent
+HTTP call:
+
+```bash
+curl -fsS -X POST \
+  -H "Authorization: Bearer $ADMIN_JWT" \
+  "https://notify-bridge.example.com/api/backup/files?secrets_mode=exclude"
+```
+
+The download endpoint produces a downloadable JSON envelope with no
+secrets unless `secrets_mode=include` is passed:
+
+```bash
+curl -fsS -X GET \
+  -H "Authorization: Bearer $ADMIN_JWT" \
+  -OJ "https://notify-bridge.example.com/api/backup/export?secrets_mode=exclude"
+```
+
+### Scheduled backup
+
+Configure under Settings → Backup or via `PUT /api/backup/scheduled` with:
+
+```json
+{
+  "backup_scheduled_enabled": "true",
+  "backup_scheduled_interval_hours": "24",
+  "backup_secrets_mode": "exclude",
+  "backup_retention_count": "5"
+}
+```
+
+Saved files land in `data/backups/`; retention prunes the oldest files
+beyond `backup_retention_count`. Backups can be downloaded individually:
+
+```bash
+curl -fsS -X GET \
+  -H "Authorization: Bearer $ADMIN_JWT" \
+  "https://notify-bridge.example.com/api/backup/files/backup-2026-05-16T12-00-00.json" \
+  -o backup-latest.json
+```
+
+### Cron snippet for off-host backup
+
+```bash
+# /etc/cron.d/notify-bridge-backup
+0 3 * * * www-data \
+  curl -fsS -X POST \
+    -H "Authorization: Bearer $(cat /etc/notify-bridge/admin.token)" \
+    "https://notify-bridge.example.com/api/backup/files?secrets_mode=exclude" \
+    -o /var/backups/notify-bridge/backup-$(date +\%F).json
+```
+
+### Restore procedure
+
+Restoring REPLACES configuration. Always export the current state first.
+
+```bash
+# 1. Stage the backup file (validates and writes to data/pending_restore.json)
+curl -fsS -X POST \
+  -H "Authorization: Bearer $ADMIN_JWT" \
+  -F "file=@backup-2026-05-16T12-00-00.json" \
+  "https://notify-bridge.example.com/api/backup/prepare-restore?conflict_mode=overwrite"
+
+# 2. Trigger graceful restart so startup applies the staged restore.
+#    Same-origin Origin/Referer is enforced — call from the admin UI when
+#    possible, or from the same host. Requires the supervisor to respawn
+#    the process (Docker restart policy, systemd, PM2, etc.).
+curl -fsS -X POST \
+  -H "Origin: https://notify-bridge.example.com" \
+  -H "Referer: https://notify-bridge.example.com/settings/backup" \
+  -H "Authorization: Bearer $ADMIN_JWT" \
+  "https://notify-bridge.example.com/api/backup/apply-restart"
+```
+
+If the process is **not** supervised, `/api/backup/apply-restart` returns
+`409`. Restart the backend manually after staging — startup applies the
+pending restore on the next boot.
+
+To cancel a staged restore before applying:
+
+```bash
+curl -fsS -X DELETE \
+  -H "Authorization: Bearer $ADMIN_JWT" \
+  "https://notify-bridge.example.com/api/backup/pending-restore"
+```
+
+### Recovery from a corrupted DB
+
+If migrations crash on boot or the DB file is unreadable, roll back to a
+pre-migration snapshot:
+
+```bash
+# Stop the backend, then
+cd /var/lib/docker/volumes/notify-bridge-data/_data
+ls -1t backups/pre-migrate-*.db | head -5      # pick the snapshot
+
+cp notify_bridge.db notify_bridge.db.broken    # keep the broken DB for forensics
+cp backups/pre-migrate-2026-05-16T11-58-30.db notify_bridge.db
+rm -f notify_bridge.db-wal notify_bridge.db-shm   # WAL belongs to the broken file
+```
+
+Restart the container. The startup snapshot will run again and capture
+the rolled-back state, so you have a clean recovery point if the next
+boot needs another rollback.
+
+## Logs
+
+* Output goes to **stderr only**. The Docker log driver captures it.
+* Set `NOTIFY_BRIDGE_LOG_FORMAT=json` for line-delimited JSON suitable
+  for Loki, ELK, or CloudWatch.
+* Secret values (bot tokens, API keys, passwords) are masked at the log
+  formatter level — see `notify_bridge_server.logging_setup`.
+* No file rotation is built in. Use the Docker JSON log driver's
+  `max-size`/`max-file` options or send logs to your aggregator.
+
+```yaml
+# docker-compose.yml snippet
+logging:
+  driver: json-file
+  options:
+    max-size: "10m"
+    max-file: "5"
+```
+
+## Common operational scenarios
+
+### "Notifications stopped firing"
+
+1. Hit `/api/ready`. If `scheduler` is `fail`, restart the backend; the
+   scheduler died in a way it cannot recover from.
+2. Check `notify_bridge_deferred_pending`. A non-zero value during quiet
+   hours is normal; a value that grows monotonically across days is a
+   bug — inspect the `deferred_dispatch` table.
+3. Inspect the most recent `event_log` rows in the admin Events page or:
+
+   ```sql
+   SELECT created_at, event_type, dispatch_status, details
+   FROM event_log
+   ORDER BY created_at DESC LIMIT 50;
+   ```
+
+   Look for a `dispatch_status` other than `dispatched`.
+4. If a single tracker is silent, verify the provider's last poll status
+   in the admin UI (Providers page) — `notify_bridge_provider_poll_failures_total`
+   tells you which provider type is failing.
+5. If you've configured a `bridge_self` tracker but never received a
+   self-monitoring alert when something failed, see the next section —
+   `bridge_self` failures are deliberately log-only to prevent recursion.
+
+### Bridge self-monitoring is log-only on its own failures
+
+The built-in `bridge_self` provider emits notifications when polls,
+dispatches, or target sends fail. To prevent infinite-recursion (a
+`bridge_self` notification failing → triggering another `bridge_self`
+notification → ...), failures of `bridge_self` events themselves are
+**not** counted toward target-failure thresholds and are logged only.
+
+If your `bridge_self` notifications stop arriving, it means the
+notification target you wired them to is itself failing. Grep stderr for:
+
+```text
+bridge_self target-failure emission failed
+emit_bridge_self_event failed
+```
+
+The fix is always at the target layer (Telegram bot blocked, Matrix
+homeserver down, SMTP credentials rotated). The bridge cannot tell you
+about its own outbound failure — that's what the operator's external
+monitoring (Prometheus alert on `notify_bridge_target_send_failures_total`)
+is for.
+
+### "Webhook returns 500"
+
+Inspect the `webhook_payload_log` table for the matching request:
+
+```sql
+SELECT received_at, status_code, error_message, payload_excerpt
+FROM webhook_payload_log
+ORDER BY received_at DESC LIMIT 20;
+```
+
+Common causes: payload schema change in the source service, a tracker
+referencing a deleted provider, a Jinja template that errors out (look
+for `template render failed` in logs).
+
+### "Telegram bot rate-limited (429)"
+
+The Telegram client implements exponential backoff with jitter on
+`Retry-After`. No operator action is required for transient throttling.
+If the rate-limit persists, check:
+
+* The bot is being driven by multiple Notify Bridge instances pointing
+  at the same chat (split-brain — only one instance should own a bot).
+* A template is producing very large messages (Telegram limits message
+  size to 4096 chars). Look for `MessageTooLong` in the logs.
+
+### "DB lock contention"
+
+SQLite WAL mode and `busy_timeout=10000` make this rare. If you see
+`SQLITE_BUSY` in logs:
+
+* Check for long-running transactions (most often a stuck migration).
+* Confirm the WAL files are on the same filesystem as the main DB —
+  splitting them across mounts is a known cause.
+* Run `sqlite3 notify_bridge.db "PRAGMA wal_checkpoint(TRUNCATE);"` to
+  flush the WAL. Safe to run while the backend is up.
+
+## Upgrades
+
+1. Pre-migration snapshot is taken automatically before any migration
+   runs. The latest five snapshots are retained by default.
+2. Migrations are idempotent — re-running an upgrade is safe.
+3. If a migration fails, the snapshot from step 1 is the recovery point.
+   See "Recovery from a corrupted DB" above.
+4. Always test major version upgrades in staging first. The upgrade flow
+   is the same in staging: pull the new image, restart the container.
+
+The release tag stream lives at the project Gitea / GitHub releases page.
+Release notes are written to `RELEASE_NOTES.md` for the upcoming version
+and copied into the Gitea release body by the `release.yml` workflow.
diff --git a/frontend/src/lib/api.ts b/frontend/src/lib/api.ts
index 42b3ab0..11a38df 100644
--- a/frontend/src/lib/api.ts
+++ b/frontend/src/lib/api.ts
@@ -2,8 +2,41 @@
  * API client with JWT auth for the Notify Bridge backend.
  */
 
+import { goto } from '$app/navigation';
+
 const API_BASE = '/api';
 
+/**
+ * Thrown when the API client decides to redirect the user to /login (after a
+ * terminal 401). Caller-side `try/catch` blocks can branch on
+ * `instanceof AuthRedirectError` to skip showing an "Unauthorized" snackbar
+ * — the redirect itself is the user-visible signal.
+ */
+export class AuthRedirectError extends Error {
+	constructor() {
+		super('Unauthorized — redirecting to login');
+		this.name = 'AuthRedirectError';
+	}
+}
+
+// Module-level dedupe — a burst of concurrent requests that all get 401 (e.g.
+// the dashboard's parallel cache loads) should only schedule a single
+// `goto('/login')` instead of stacking N navigations.
+let _redirecting = false;
+
+/** Centralised "send the user to /login" path used by both api() and fetchAuth(). */
+function redirectToLogin(): void {
+	if (_redirecting) return;
+	_redirecting = true;
+	clearTokens();
+	if (typeof window !== 'undefined') {
+		// SvelteKit's goto() with replaceState avoids leaving the failed page
+		// in the back-stack (no "back-button to broken view" UX). We don't
+		// reset `_redirecting` — the page about to unmount makes it moot.
+		goto('/login', { replaceState: true });
+	}
+}
+
 /** Normalize a caught error to a user-safe message. */
 export function errMsg(err: unknown, fallback = 'Unexpected error'): string {
 	if (err instanceof Error && err.message) return err.message;
@@ -129,11 +162,11 @@ export async function api<T = any>(
 		}
 
 		if (res.status === 401 && token) {
-			clearTokens();
-			if (typeof window !== 'undefined') {
-				window.location.href = '/login';
-			}
-			throw new Error('Unauthorized');
+			redirectToLogin();
+			// Tagged so the caller's catch can distinguish "we already showed
+			// the user a redirect" from a real authorization failure they
+			// should snackbar.
+			throw new AuthRedirectError();
 		}
 
 		if (res.status === 204) return undefined as T;
@@ -204,9 +237,8 @@ export async function fetchAuth(
 		}
 
 		if (res.status === 401) {
-			clearTokens();
-			if (typeof window !== 'undefined') window.location.href = '/login';
-			throw new ApiError('Unauthorized', 401);
+			redirectToLogin();
+			throw new AuthRedirectError();
 		}
 
 		if (!res.ok) {
diff --git a/frontend/src/lib/components/EventDetailModal.svelte b/frontend/src/lib/components/EventDetailModal.svelte
index 4e35a3f..17cef42 100644
--- a/frontend/src/lib/components/EventDetailModal.svelte
+++ b/frontend/src/lib/components/EventDetailModal.svelte
@@ -240,34 +240,42 @@
 				{/if}
 			</dl>
 
-			<!-- Action buttons — deep-link + highlight the related entity card -->
+			<!-- Action buttons — deep-link + highlight the related entity card.
+			     IDs are snapshotted into local consts so the deferred onclick
+			     closures don't lose the narrowed type that the `{#if ...}` gate
+			     proves at template-render time. -->
 			<div class="actions">
 				{#if displayEvent.provider_id}
-					<button type="button" onclick={() => openEntity('/providers', displayEvent.provider_id)}>
+					{@const providerId = displayEvent.provider_id}
+					<button type="button" onclick={() => openEntity('/providers', providerId)}>
 						<MdiIcon name="mdiServer" size={14} />
 						{t('events.openProvider')}
 					</button>
 				{/if}
 				{#if displayEvent.telegram_bot_id && isCommand}
-					<button type="button" onclick={() => openEntity('/bots', displayEvent.telegram_bot_id)}>
+					{@const botId = displayEvent.telegram_bot_id}
+					<button type="button" onclick={() => openEntity('/bots', botId)}>
 						<MdiIcon name="mdiRobotHappy" size={14} />
 						{t('events.openBot')}
 					</button>
 				{/if}
 				{#if displayEvent.command_tracker_id && isCommand}
-					<button type="button" onclick={() => openEntity('/command-trackers', displayEvent.command_tracker_id)}>
+					{@const cmdTrackerId = displayEvent.command_tracker_id}
+					<button type="button" onclick={() => openEntity('/command-trackers', cmdTrackerId)}>
 						<MdiIcon name="mdiChat" size={14} />
 						{t('events.openCommandTracker')}
 					</button>
 				{/if}
 				{#if displayEvent.action_id && isAction}
-					<button type="button" onclick={() => openEntity('/actions', displayEvent.action_id)}>
+					{@const actionId = displayEvent.action_id}
+					<button type="button" onclick={() => openEntity('/actions', actionId)}>
 						<MdiIcon name="mdiPlayCircle" size={14} />
 						{t('events.openAction')}
 					</button>
 				{/if}
 				{#if !isCommand && !isAction && displayEvent.tracker_id}
-					<button type="button" onclick={() => openEntity('/notification-trackers', displayEvent.tracker_id)}>
+					{@const trackerId = displayEvent.tracker_id}
+					<button type="button" onclick={() => openEntity('/notification-trackers', trackerId)}>
 						<MdiIcon name="mdiRadar" size={14} />
 						{t('events.openTracker')}
 					</button>
diff --git a/frontend/src/lib/components/IconGridSelect.svelte b/frontend/src/lib/components/IconGridSelect.svelte
index ccba7d3..6fd04c0 100644
--- a/frontend/src/lib/components/IconGridSelect.svelte
+++ b/frontend/src/lib/components/IconGridSelect.svelte
@@ -17,6 +17,7 @@
 		columns = 2,
 		disabled = false,
 		compact = false,
+		onChange,
 	}: {
 		items: GridItem[];
 		value: string | number | null;
@@ -24,6 +25,13 @@
 		columns?: number;
 		disabled?: boolean;
 		compact?: boolean;
+		/**
+		 * Optional one-way change callback. Fired in addition to updating
+		 * `value` so callers that own state externally (e.g. a global store)
+		 * can avoid the read-modify-write feedback loop that `bind:value` plus
+		 * a sync `$effect` produces.
+		 */
+		onChange?: (value: string | number) => void;
 	} = $props();
 
 	let open = $state(false);
@@ -63,6 +71,7 @@
 		value = item.value;
 		open = false;
 		search = '';
+		onChange?.(item.value);
 	}
 
 	function handleKeydown(e: KeyboardEvent) {
diff --git a/frontend/src/lib/grid-items.ts b/frontend/src/lib/grid-items.ts
index 59bf914..25c6901 100644
--- a/frontend/src/lib/grid-items.ts
+++ b/frontend/src/lib/grid-items.ts
@@ -185,6 +185,19 @@ export const providerTypeFilterItems = (): GridItem[] => [
 	...allDescriptors().map(descriptorToGridItem),
 ];
 
+/** Provider types the user is allowed to create from the "new provider" wizard.
+ *
+ * Excludes ``bridge_self`` because it's auto-created exactly once per user
+ * (see ``packages/server/.../seeds.py``). Letting users pick it from the
+ * wizard would either duplicate the row or surface a confusing 409.
+ */
+const _USER_CREATABLE_PROVIDER_TYPES = (): string[] =>
+	allDescriptors()
+		.filter((d) => d.type !== 'bridge_self')
+		.map((d) => d.type);
+
 /** Provider type selector (no "All" option). */
 export const providerTypeItems = (): GridItem[] =>
-	allDescriptors().map(descriptorToGridItem);
+	allDescriptors()
+		.filter((d) => _USER_CREATABLE_PROVIDER_TYPES().includes(d.type))
+		.map(descriptorToGridItem);
diff --git a/frontend/src/lib/i18n/en.json b/frontend/src/lib/i18n/en.json
index 141ddb4..e6219e4 100644
--- a/frontend/src/lib/i18n/en.json
+++ b/frontend/src/lib/i18n/en.json
@@ -236,6 +236,13 @@
 		"typeGooglePhotos": "Google Photos",
 		"typeWebhook": "Generic Webhook",
 		"typeHomeAssistant": "Home Assistant",
+		"typeBridgeSelf": "Bridge Self-Monitoring",
+		"bridgeSelfPollThreshold": "Tracker poll failure threshold",
+		"bridgeSelfPollThresholdHint": "Notify after this many consecutive poll failures for any tracker.",
+		"bridgeSelfDeferredThreshold": "Deferred backlog threshold",
+		"bridgeSelfDeferredThresholdHint": "Notify when pending deferred-dispatch rows exceed this count.",
+		"bridgeSelfTargetThreshold": "Target send failure threshold",
+		"bridgeSelfTargetThresholdHint": "Notify after this many consecutive 5xx/network failures for any target.",
 		"haAccessToken": "Long-Lived Access Token",
 		"haAccessTokenKeep": "Long-Lived Access Token (leave empty to keep current)",
 		"haAccessTokenHint": "Create one in HA → Profile → Long-Lived Access Tokens. Required for WebSocket subscription.",
@@ -663,6 +670,9 @@
 		"haServiceCalled": "Service called",
 		"haEventFired": "Other HA event (catch-all)",
 		"haEventFiredHint": "Fires for any HA event type not covered by the boxes above. Useful for custom integrations; expect high volume.",
+		"bridgeSelfPollFailures": "Tracker poll failures",
+		"bridgeSelfDeferredBacklog": "Deferred backlog crossed threshold",
+		"bridgeSelfTargetFailures": "Target send failures",
 		"trackImages": "Track images",
 		"trackVideos": "Track videos",
 		"favoritesOnly": "Favorites only",
@@ -1199,6 +1209,7 @@
 	},
 	"common": {
 		"loading": "Loading...",
+		"auto": "Auto",
 		"save": "Save",
 		"cancel": "Cancel",
 		"delete": "Delete",
@@ -1365,7 +1376,8 @@
 		"providerNut": "Network UPS monitoring",
 		"providerGooglePhotos": "Google Photos albums & shared libraries",
 		"providerWebhook": "Receive events via HTTP POST",
-		"providerHomeAssistant": "Home Assistant event bus over WebSocket"
+		"providerHomeAssistant": "Home Assistant event bus over WebSocket",
+		"providerBridgeSelf": "Internal health alerts when polling, dispatch, or sends fail"
 	},
 	"webhookLogs": {
 		"title": "Recent Payloads",
diff --git a/frontend/src/lib/i18n/ru.json b/frontend/src/lib/i18n/ru.json
index 5dc12ed..86be749 100644
--- a/frontend/src/lib/i18n/ru.json
+++ b/frontend/src/lib/i18n/ru.json
@@ -236,6 +236,13 @@
 		"typeGooglePhotos": "Google Фото",
 		"typeWebhook": "Универсальный вебхук",
 		"typeHomeAssistant": "Home Assistant",
+		"typeBridgeSelf": "Самомониторинг моста",
+		"bridgeSelfPollThreshold": "Порог сбоев опроса трекера",
+		"bridgeSelfPollThresholdHint": "Уведомлять после стольких подряд сбоев опроса любого трекера.",
+		"bridgeSelfDeferredThreshold": "Порог очереди отложенной отправки",
+		"bridgeSelfDeferredThresholdHint": "Уведомлять, когда количество ожидающих записей deferred_dispatch превысит это значение.",
+		"bridgeSelfTargetThreshold": "Порог сбоев отправки в адресат",
+		"bridgeSelfTargetThresholdHint": "Уведомлять после стольких подряд сбоев 5xx/сети при отправке в любой адресат.",
 		"haAccessToken": "Долгоживущий токен доступа",
 		"haAccessTokenKeep": "Долгоживущий токен (оставьте пустым для сохранения)",
 		"haAccessTokenHint": "Создайте в HA → Профиль → Long-Lived Access Tokens. Нужен для WebSocket-подписки.",
@@ -663,6 +670,9 @@
 		"haServiceCalled": "Вызвана служба",
 		"haEventFired": "Прочее событие HA (catch-all)",
 		"haEventFiredHint": "Срабатывает на любые типы событий HA, не охваченные чекбоксами выше. Полезно для пользовательских интеграций; ожидайте большой объём.",
+		"bridgeSelfPollFailures": "Сбои опроса трекера",
+		"bridgeSelfDeferredBacklog": "Очередь отложенной отправки превысила порог",
+		"bridgeSelfTargetFailures": "Сбои отправки в адресат",
 		"trackImages": "Фото",
 		"trackVideos": "Видео",
 		"favoritesOnly": "Только избранные",
@@ -1199,6 +1209,7 @@
 	},
 	"common": {
 		"loading": "Загрузка...",
+		"auto": "Авто",
 		"save": "Сохранить",
 		"cancel": "Отмена",
 		"delete": "Удалить",
@@ -1365,7 +1376,8 @@
 		"providerNut": "Мониторинг ИБП через NUT",
 		"providerGooglePhotos": "Альбомы и общие библиотеки Google Фото",
 		"providerWebhook": "Приём событий через HTTP POST",
-		"providerHomeAssistant": "Шина событий Home Assistant по WebSocket"
+		"providerHomeAssistant": "Шина событий Home Assistant по WebSocket",
+		"providerBridgeSelf": "Внутренние оповещения о сбоях опроса, отправки или диспатча"
 	},
 	"webhookLogs": {
 		"title": "Последние запросы",
diff --git a/frontend/src/lib/providers/bridge-self.ts b/frontend/src/lib/providers/bridge-self.ts
new file mode 100644
index 0000000..cdd8efd
--- /dev/null
+++ b/frontend/src/lib/providers/bridge-self.ts
@@ -0,0 +1,98 @@
+import type { ProviderDescriptor } from './types';
+
+/**
+ * Bridge self-monitoring provider descriptor.
+ *
+ * The bridge_self provider has no remote URL and no credentials. The only
+ * configuration surface is the three thresholds below, used by the server
+ * to decide when an internal failure deserves a notification.
+ *
+ * Exactly one bridge_self provider exists per user, auto-seeded on user
+ * creation (see ``packages/server/src/notify_bridge_server/database/seeds.py``).
+ */
+export const bridgeSelfDescriptor: ProviderDescriptor = {
+	type: 'bridge_self',
+	defaultName: 'Bridge Self-Monitoring',
+	icon: 'mdiAlertCircleOutline',
+	hasUrl: false,
+
+	configFields: [
+		{
+			key: 'poll_failure_threshold',
+			configKey: 'poll_failure_threshold',
+			label: 'providers.bridgeSelfPollThreshold',
+			type: 'number',
+			optional: true,
+			min: 1,
+			defaultValue: 3,
+			hint: 'providers.bridgeSelfPollThresholdHint',
+		},
+		{
+			key: 'deferred_backlog_threshold',
+			configKey: 'deferred_backlog_threshold',
+			label: 'providers.bridgeSelfDeferredThreshold',
+			type: 'number',
+			optional: true,
+			min: 1,
+			defaultValue: 100,
+			hint: 'providers.bridgeSelfDeferredThresholdHint',
+		},
+		{
+			key: 'target_failure_threshold',
+			configKey: 'target_failure_threshold',
+			label: 'providers.bridgeSelfTargetThreshold',
+			type: 'number',
+			optional: true,
+			min: 1,
+			defaultValue: 5,
+			hint: 'providers.bridgeSelfTargetThresholdHint',
+		},
+	],
+
+	buildConfig(form) {
+		const toInt = (raw: unknown, fallback: number): number => {
+			const n = typeof raw === 'number' ? raw : parseInt(String(raw ?? ''), 10);
+			return Number.isFinite(n) && n >= 1 ? n : fallback;
+		};
+		return {
+			config: {
+				poll_failure_threshold: toInt(form.poll_failure_threshold, 3),
+				deferred_backlog_threshold: toInt(form.deferred_backlog_threshold, 100),
+				target_failure_threshold: toInt(form.target_failure_threshold, 5),
+			},
+		};
+	},
+
+	hasConfigChanged(form, existing) {
+		const toInt = (raw: unknown, fallback: number): number => {
+			const n = typeof raw === 'number' ? raw : parseInt(String(raw ?? ''), 10);
+			return Number.isFinite(n) && n >= 1 ? n : fallback;
+		};
+		return (
+			toInt(form.poll_failure_threshold, 3) !== toInt(existing.poll_failure_threshold, 3) ||
+			toInt(form.deferred_backlog_threshold, 100) !== toInt(existing.deferred_backlog_threshold, 100) ||
+			toInt(form.target_failure_threshold, 5) !== toInt(existing.target_failure_threshold, 5)
+		);
+	},
+
+	eventFields: [
+		{
+			key: 'track_bridge_self_poll_failures',
+			label: 'trackingConfig.bridgeSelfPollFailures',
+			default: true,
+		},
+		{
+			key: 'track_bridge_self_deferred_backlog',
+			label: 'trackingConfig.bridgeSelfDeferredBacklog',
+			default: true,
+		},
+		{
+			key: 'track_bridge_self_target_failures',
+			label: 'trackingConfig.bridgeSelfTargetFailures',
+			default: true,
+		},
+	],
+
+	collectionMeta: null,
+	webhookBased: false,
+};
diff --git a/frontend/src/lib/providers/immich.ts b/frontend/src/lib/providers/immich.ts
index ef2086f..5ba8ec0 100644
--- a/frontend/src/lib/providers/immich.ts
+++ b/frontend/src/lib/providers/immich.ts
@@ -113,6 +113,17 @@ export const immichDescriptor: ProviderDescriptor = {
 		desc: (col) => `${col.assetCount ?? col.asset_count ?? 0} assets`,
 	},
 
+	// Periodic summaries / scheduled picks / memories / quiet hours all live on
+	// the linked tracking & template configs — surface that connection on the
+	// tracker form so users don't need to read docs to find them.
+	featureDiscoveryHint: {
+		messageKey: 'notificationTracker.featureDiscovery',
+		ctas: [
+			{ href: '/tracking-configs?edit={tracking_config_id}', labelKey: 'notificationTracker.openTrackingConfig', icon: 'mdiArrowRight' },
+			{ href: '/template-configs?edit={template_config_id}', labelKey: 'notificationTracker.openTemplateConfig', icon: 'mdiArrowRight' },
+		],
+	},
+
 	async onBeforeSave({ form, previousCollectionIds, collections, api: apiFn }) {
 		const newIds = (form.collection_ids as string[]).filter(id => !previousCollectionIds.includes(id));
 		if (newIds.length === 0) return { proceed: true };
diff --git a/frontend/src/lib/providers/index.ts b/frontend/src/lib/providers/index.ts
index 16c3cb3..9512a87 100644
--- a/frontend/src/lib/providers/index.ts
+++ b/frontend/src/lib/providers/index.ts
@@ -14,6 +14,7 @@ import { nutDescriptor } from './nut';
 import { googlePhotosDescriptor } from './google-photos';
 import { webhookDescriptor } from './webhook';
 import { homeAssistantDescriptor } from './home-assistant';
+import { bridgeSelfDescriptor } from './bridge-self';
 
 const REGISTRY: ReadonlyMap<string, ProviderDescriptor> = new Map([
 	['immich', immichDescriptor],
@@ -24,6 +25,7 @@ const REGISTRY: ReadonlyMap<string, ProviderDescriptor> = new Map([
 	['google_photos', googlePhotosDescriptor],
 	['webhook', webhookDescriptor],
 	['home_assistant', homeAssistantDescriptor],
+	['bridge_self', bridgeSelfDescriptor],
 ]);
 
 /** Look up a provider descriptor by type. Returns null for unknown types. */
diff --git a/frontend/src/lib/providers/types.ts b/frontend/src/lib/providers/types.ts
index aecf448..338871f 100644
--- a/frontend/src/lib/providers/types.ts
+++ b/frontend/src/lib/providers/types.ts
@@ -196,6 +196,22 @@ export interface ProviderDescriptor {
 	/** Whether this provider stores incoming payload history for debugging. */
 	payloadHistory?: boolean;
 
+	// ── Tracker-form discovery hint ──
+	/**
+	 * Optional info banner shown on the TrackerForm to point users at related
+	 * configuration pages they would otherwise have to discover from docs.
+	 *
+	 * The hint is rendered as a single i18n message followed by zero or more
+	 * call-to-action links. ``ctas[].href`` may include ``{tracking_config_id}``
+	 * / ``{template_config_id}`` placeholders that the form substitutes from
+	 * the tracker's currently selected default-config IDs (or omits the
+	 * ``?edit=...`` query when the value is 0).
+	 */
+	featureDiscoveryHint?: {
+		messageKey: string;
+		ctas?: Array<{ href: string; labelKey: string; icon?: string }>;
+	};
+
 	// ── Provider-specific hooks ──
 	/**
 	 * Called after collection selection changes (before save).
diff --git a/frontend/src/lib/stores/entity-cache.svelte.ts b/frontend/src/lib/stores/entity-cache.svelte.ts
index b033910..e08404f 100644
--- a/frontend/src/lib/stores/entity-cache.svelte.ts
+++ b/frontend/src/lib/stores/entity-cache.svelte.ts
@@ -16,8 +16,19 @@ const DEFAULT_TTL_MS = 30_000; // 30 seconds
 export interface EntityCache<T extends { id: number }> {
 	/** Reactive list of cached entities. */
 	readonly items: T[];
-	/** True only during the very first fetch (no cached data yet). */
+	/**
+	 * True only during the very first fetch — when there is no cached data
+	 * to show yet. Background re-fetches keep `loading` false so consumers
+	 * keep rendering the previous list and don't flash a spinner; observe
+	 * `refreshing` instead if a subtle indicator is needed.
+	 */
 	readonly loading: boolean;
+	/**
+	 * True during any non-first fetch (cached items already populated).
+	 * Lets consumers distinguish "show skeleton" (loading) from "show subtle
+	 * shimmer/disabled state" (refreshing) without sharing one flag.
+	 */
+	readonly refreshing: boolean;
 	/** Timestamp of last successful fetch. */
 	readonly fetchedAt: number;
 	/** Fetch entities — returns cached data if fresh, else hits network. */
@@ -43,6 +54,7 @@ export function createEntityCache<T extends { id: number }>(
 ): EntityCache<T> {
 	let _items = $state<T[]>([]);
 	let _loading = $state(false);
+	let _refreshing = $state(false);
 	let _fetchedAt = $state(0);
 
 	function isFresh(): boolean {
@@ -56,8 +68,12 @@ export function createEntityCache<T extends { id: number }>(
 		const existing = inflightRequests.get(endpoint);
 		if (existing) return existing;
 
+		// First-load vs background-refresh state. We split these so consumers
+		// can keep the previous list visible during a re-fetch (refreshing)
+		// instead of flashing a spinner placeholder (loading).
 		const isFirstLoad = _fetchedAt === 0;
 		if (isFirstLoad) _loading = true;
+		else _refreshing = true;
 
 		const request = api<T[]>(endpoint)
 			.then((data) => {
@@ -67,6 +83,7 @@ export function createEntityCache<T extends { id: number }>(
 			})
 			.finally(() => {
 				_loading = false;
+				_refreshing = false;
 				inflightRequests.delete(endpoint);
 			});
 
@@ -104,6 +121,7 @@ export function createEntityCache<T extends { id: number }>(
 	return {
 		get items() { return _items; },
 		get loading() { return _loading; },
+		get refreshing() { return _refreshing; },
 		get fetchedAt() { return _fetchedAt; },
 		fetch,
 		invalidate,
diff --git a/frontend/src/lib/stores/provider-filter.svelte.ts b/frontend/src/lib/stores/provider-filter.svelte.ts
index f57d589..4ef7cad 100644
--- a/frontend/src/lib/stores/provider-filter.svelte.ts
+++ b/frontend/src/lib/stores/provider-filter.svelte.ts
@@ -26,15 +26,14 @@ function loadFromStorage(): void {
 loadFromStorage();
 
 export const globalProviderFilter = {
-	get id() {
-		// If providers are loaded and the stored ID doesn't match any, auto-clear
-		if (_providerId != null && providersCache.items.length > 0 &&
-			!providersCache.items.some(p => p.id === _providerId)) {
-			globalProviderFilter.clear();
-			return null;
-		}
-		return _providerId;
-	},
+	/**
+	 * Pure getter — returns whatever was last stored, never mutates. Stale-ID
+	 * reconciliation against `providersCache` is the responsibility of a
+	 * one-time `$effect` in `+layout.svelte` (see `reconcileStaleProviderId`),
+	 * because writing during read inside a `$state`-derived getter triggers
+	 * Svelte 5's `state_unsafe_mutation` warning.
+	 */
+	get id() { return _providerId; },
 	get initialized() { return _initialized; },
 
 	set(id: number | null) {
@@ -52,9 +51,24 @@ export const globalProviderFilter = {
 		this.set(null);
 	},
 
+	/**
+	 * Drop the stored provider ID if it no longer matches any item in the
+	 * providers cache. Safe to call from a `$effect` after the cache has been
+	 * fetched. Returns true when reconciliation actually changed state, so the
+	 * caller can short-circuit follow-up work.
+	 */
+	reconcileWithCache(): boolean {
+		if (_providerId != null && providersCache.items.length > 0 &&
+			!providersCache.items.some(p => p.id === _providerId)) {
+			this.clear();
+			return true;
+		}
+		return false;
+	},
+
 	/** The currently selected provider object (reactive). */
 	get provider() {
-		const id = this.id; // triggers stale-ID auto-clear
+		const id = _providerId;
 		if (id == null) return null;
 		return providersCache.items.find(p => p.id === id) ?? null;
 	},
diff --git a/frontend/src/routes/+layout.svelte b/frontend/src/routes/+layout.svelte
index c93a51c..2d23704 100644
--- a/frontend/src/routes/+layout.svelte
+++ b/frontend/src/routes/+layout.svelte
@@ -5,7 +5,7 @@
 	import { onMount } from 'svelte';
 	import { fade, slide } from 'svelte/transition';
 	import { cubicOut } from 'svelte/easing';
-	import { api } from '$lib/api';
+	import { api, errMsg } from '$lib/api';
 	import { getAuth, loadUser, logout } from '$lib/auth.svelte';
 	import { t, getLocale, setLocale } from '$lib/i18n';
 	import { getTheme, initTheme, setTheme, type Theme } from '$lib/theme.svelte';
@@ -46,30 +46,40 @@
 		{ value: 0, icon: 'mdiFilterOff', label: t('common.allProviders'), desc: '' },
 		...allProviders.map(p => ({ value: p.id, icon: providerDefaultIcon(p), label: p.name, desc: p.type })),
 	]);
-	let providerFilterValue = $state(globalProviderFilter.id ?? 0);
-	let _syncingFilter = false;
+	// One-way: the store is the source of truth, the filter widget displays it.
+	// IconGridSelect mutations route through `onChange` (see template) so we
+	// never need a paired `$effect` to mirror the local <-> store value, which
+	// previously required a `_syncingFilter` reentrancy flag.
+	let providerFilterValue = $derived(globalProviderFilter.id ?? 0);
 
 	// Reserve the provider-filter row from first paint until the cache resolves.
 	// Without this, the row appears mid-paint and pushes nav items down on every
 	// hard reload — the most visible "jump" the user reported.
 	let showProviderFilter = $derived(allProviders.length >= 1 || providersCache.fetchedAt === 0);
 
-	// Sync filter value → store
+	// Reconcile a stale persisted provider ID against the freshly-loaded
+	// providers cache. Lives here (not in the store getter) because writing
+	// `_providerId` from a `$state`-derived getter triggers Svelte's
+	// `state_unsafe_mutation`. Runs once per cache refresh.
 	$effect(() => {
-		const v = providerFilterValue;
-		if (_syncingFilter) return;
-		globalProviderFilter.set(v === 0 ? null : v);
+		// Track `fetchedAt` so we re-run after the cache loads.
+		void providersCache.fetchedAt;
+		void providersCache.items.length;
+		globalProviderFilter.reconcileWithCache();
 	});
 
-	// Sync store → filter value (handles auto-clear of stale IDs)
-	$effect(() => {
-		const storeId = globalProviderFilter.id;
-		if (storeId === null && providerFilterValue !== 0) {
-			_syncingFilter = true;
-			providerFilterValue = 0;
-			_syncingFilter = false;
-		}
-	});
+	function setProviderFilter(v: string | number) {
+		const num = typeof v === 'number' ? v : Number(v);
+		globalProviderFilter.set(num === 0 ? null : num);
+	}
+
+	// Collapsed-rail filter cycles through providers via the same setter so the
+	// store stays the single write path.
+	function cycleProviderFilter() {
+		const ids = [0, ...allProviders.map(p => p.id)];
+		const idx = ids.indexOf(providerFilterValue);
+		setProviderFilter(ids[(idx + 1) % ids.length]);
+	}
 
 	let showPasswordForm = $state(false);
 	let redirecting = $state(false);
@@ -91,7 +101,7 @@
 			pwdCurrent = ''; pwdNew = ''; pwdConfirm = '';
 			snackSuccess(t('snack.passwordChanged'));
 			setTimeout(() => { showPasswordForm = false; pwdMsg = ''; pwdSuccess = false; pwdConfirm = ''; }, 2000);
-		} catch (err: any) { pwdMsg = err.message; pwdSuccess = false; snackError(err.message); }
+		} catch (err: unknown) { const m = errMsg(err); pwdMsg = m; pwdSuccess = false; snackError(m); }
 	}
 
 	// Read persisted UI state synchronously so first paint already matches the
@@ -446,18 +456,14 @@
 			{#if showProviderFilter}
 				<div class="{collapsed ? 'px-2 py-1' : 'px-3 py-1.5'}" style="border-bottom: 1px solid var(--color-border);">
 					{#if collapsed}
-						<button onclick={() => {
-							const ids = [0, ...allProviders.map(p => p.id)];
-							const idx = ids.indexOf(providerFilterValue);
-							providerFilterValue = ids[(idx + 1) % ids.length];
-						}}
+						<button onclick={cycleProviderFilter}
 							class="provider-filter-btn flex items-center justify-center w-full py-1.5 rounded-lg text-sm transition-all duration-200"
 							title={globalProviderFilter.provider?.name || t('common.allProviders')}
 							aria-label={globalProviderFilter.provider?.name || t('common.allProviders')}>
 							<NavIcon name={globalProviderFilter.provider ? providerDefaultIcon(globalProviderFilter.provider) : 'mdiFilterOff'} size={16} />
 						</button>
 					{:else}
-						<IconGridSelect items={providerFilterItems} bind:value={providerFilterValue} columns={Math.min(providerFilterItems.length, 3)} compact />
+						<IconGridSelect items={providerFilterItems} value={providerFilterValue} onChange={setProviderFilter} columns={Math.min(providerFilterItems.length, 3)} compact />
 					{/if}
 				</div>
 			{/if}
@@ -595,6 +601,7 @@
 				<NavIcon name="mdiMagnify" size={20} />
 			</button>
 			<button onclick={() => mobileMoreOpen = !mobileMoreOpen} aria-label={t('nav.more')}
+				aria-expanded={mobileMoreOpen}
 				class="flex flex-col items-center gap-0.5 px-2 py-1.5 text-xs rounded-lg transition-all duration-200"
 				style="color: {mobileMoreOpen ? 'var(--color-primary)' : 'var(--color-muted-foreground)'};">
 				<NavIcon name="mdiDotsHorizontal" size={20} />
@@ -609,7 +616,7 @@
 				transition:slide={{ duration: 200, easing: cubicOut }}>
 				{#if allProviders.length >= 1}
 					<div class="mb-3 pb-3" style="border-bottom: 1px solid var(--color-border);">
-						<IconGridSelect items={providerFilterItems} bind:value={providerFilterValue} columns={Math.min(providerFilterItems.length, 4)} compact />
+						<IconGridSelect items={providerFilterItems} value={providerFilterValue} onChange={setProviderFilter} columns={Math.min(providerFilterItems.length, 4)} compact />
 					</div>
 				{/if}
 				<div class="space-y-3">
diff --git a/frontend/src/routes/actions/+page.svelte b/frontend/src/routes/actions/+page.svelte
index 3c3ab91..9b4626f 100644
--- a/frontend/src/routes/actions/+page.svelte
+++ b/frontend/src/routes/actions/+page.svelte
@@ -1,7 +1,7 @@
 <script lang="ts">
 	import { onMount } from 'svelte';
 	import { slide } from 'svelte/transition';
-	import { api } from '$lib/api';
+	import { api , errMsg} from '$lib/api';
 	import { t } from '$lib/i18n';
 	import { actionsCache, providersCache, capabilitiesCache } from '$lib/stores/caches.svelte';
 	import PageHeader from '$lib/components/PageHeader.svelte';
@@ -99,8 +99,8 @@
 				capabilitiesCache.fetch(),
 			]);
 			loadError = '';
-		} catch (err: any) {
-			loadError = err.message || t('actions.loadError');
+		} catch (err: unknown) {
+			loadError = errMsg(err, t('actions.loadError'));
 		} finally { loaded = true; highlightFromUrl(); }
 	}
 
@@ -138,7 +138,7 @@
 			}
 			showForm = false; editing = null; actionsCache.invalidate(); await load();
 			snackSuccess(t('actions.saved'));
-		} catch (err: any) { error = err.message; snackError(err.message); }
+		} catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); }
 		submitting = false;
 	}
 
@@ -151,7 +151,7 @@
 			await api(`/actions/${id}`, { method: 'DELETE' });
 			actionsCache.invalidate(); await load();
 			snackSuccess(t('actions.deleted'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	async function executeAction(id: number, dryRun = false) {
@@ -165,7 +165,7 @@
 				: `${t('actions.execute')}: ${affected} ${t('actions.affected')}`;
 			snackSuccess(msg);
 			actionsCache.invalidate(); await load();
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		executing = { ...executing, [id]: false };
 	}
 
diff --git a/frontend/src/routes/actions/RuleEditor.svelte b/frontend/src/routes/actions/RuleEditor.svelte
index d889eb4..2222f25 100644
--- a/frontend/src/routes/actions/RuleEditor.svelte
+++ b/frontend/src/routes/actions/RuleEditor.svelte
@@ -1,5 +1,5 @@
 <script lang="ts">
-	import { api } from '$lib/api';
+	import { api , errMsg} from '$lib/api';
 	import { t } from '$lib/i18n';
 	import { providersCache } from '$lib/stores/caches.svelte';
 	import { snackSuccess, snackError } from '$lib/stores/snackbar.svelte';
@@ -47,7 +47,7 @@
 		loading = true;
 		try {
 			rules = await api<ActionRule[]>(`/actions/${actionId}/rules`);
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		loading = false;
 	}
 
@@ -63,7 +63,7 @@
 			people = Array.isArray(p) ? p : [];
 			albums = Array.isArray(a) ? a : [];
 		} catch {
-			// People/album endpoints may not exist yet — degrade gracefully
+			// People/album endpoints may not exist yet вЂ” degrade gracefully
 		}
 	}
 
@@ -79,7 +79,7 @@
 			resetNewRule();
 			await loadRules();
 			snackSuccess(t('actions.ruleSaved'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		submitting = false;
 	}
 
@@ -91,7 +91,7 @@
 			});
 			await loadRules();
 			snackSuccess(t('actions.ruleSaved'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	async function deleteRule(ruleId: number) {
@@ -99,7 +99,7 @@
 			await api(`/actions/${actionId}/rules/${ruleId}`, { method: 'DELETE' });
 			await loadRules();
 			snackSuccess(t('actions.ruleDeleted'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	async function toggleRule(rule: ActionRule) {
diff --git a/frontend/src/routes/bots/+page.svelte b/frontend/src/routes/bots/+page.svelte
index a82f23c..61966a9 100644
--- a/frontend/src/routes/bots/+page.svelte
+++ b/frontend/src/routes/bots/+page.svelte
@@ -1,7 +1,7 @@
 <script lang="ts">
 	import { onMount } from 'svelte';
 	import { page } from '$app/state';
-	import { api } from '$lib/api';
+	import { api , errMsg} from '$lib/api';
 	import { t } from '$lib/i18n';
 	import { telegramBotsCache, emailBotsCache, matrixBotsCache } from '$lib/stores/caches.svelte';
 	import Loading from '$lib/components/Loading.svelte';
@@ -29,7 +29,7 @@
 				emailBotsCache.fetch(true),
 				matrixBotsCache.fetch(true),
 			]);
-		} catch (err: any) { error = err.message || t('common.loadError'); snackError(error); }
+		} catch (err: unknown) { error = errMsg(err, t('common.loadError')); snackError(error); }
 		finally { loaded = true; highlightFromUrl(); }
 	}
 </script>
diff --git a/frontend/src/routes/bots/EmailBotTab.svelte b/frontend/src/routes/bots/EmailBotTab.svelte
index ec4b9ba..4b2d4db 100644
--- a/frontend/src/routes/bots/EmailBotTab.svelte
+++ b/frontend/src/routes/bots/EmailBotTab.svelte
@@ -1,5 +1,5 @@
 <script lang="ts">
-	import { api, getBlockedBy, type BlockedByDetail } from '$lib/api';
+	import { api, getBlockedBy, type BlockedByDetail , errMsg} from '$lib/api';
 	import BlockedByModal from '$lib/components/BlockedByModal.svelte';
 	import { t, getLocale } from '$lib/i18n';
 	import { emailBotsCache } from '$lib/stores/caches.svelte';
@@ -89,7 +89,7 @@
 				snackSuccess(t('snack.emailBotCreated'));
 			}
 			emailForm = defaultEmailForm(); nameManuallyEdited = false; showEmailForm = false; editingEmail = null; await onreload();
-		} catch (err: any) { error = err.message; snackError(err.message); }
+		} catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); }
 		finally { emailSubmitting = false; }
 	}
 
@@ -99,10 +99,10 @@
 			id,
 			onconfirm: async () => {
 				try { await api(`/email-bots/${id}`, { method: 'DELETE' }); await onreload(); snackSuccess(t('snack.emailBotDeleted')); }
-				catch (err: any) {
+				catch (err: unknown) {
 					const bb = getBlockedBy(err);
 					if (bb) { blockedBy = bb; return; }
-					error = err.message; snackError(err.message);
+					const m = errMsg(err); error = m; snackError(m);
 				}
 				finally { confirmDeleteEmail = null; }
 			}
@@ -115,7 +115,7 @@
 			const res = await api(`/email-bots/${botId}/test?locale=${getLocale()}`, { method: 'POST' });
 			if (res.success) snackSuccess(t('snack.emailBotTestSent'));
 			else snackError(res.error || t('emailBot.operationFailed'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		emailTesting = { ...emailTesting, [botId]: false };
 	}
 </script>
diff --git a/frontend/src/routes/bots/MatrixBotTab.svelte b/frontend/src/routes/bots/MatrixBotTab.svelte
index 88f37d3..876e7cc 100644
--- a/frontend/src/routes/bots/MatrixBotTab.svelte
+++ b/frontend/src/routes/bots/MatrixBotTab.svelte
@@ -1,5 +1,5 @@
 <script lang="ts">
-	import { api, getBlockedBy, type BlockedByDetail } from '$lib/api';
+	import { api, getBlockedBy, type BlockedByDetail , errMsg} from '$lib/api';
 	import BlockedByModal from '$lib/components/BlockedByModal.svelte';
 	import { t, getLocale } from '$lib/i18n';
 	import { matrixBotsCache } from '$lib/stores/caches.svelte';
@@ -85,7 +85,7 @@
 				snackSuccess(t('snack.matrixBotCreated'));
 			}
 			matrixForm = defaultMatrixForm(); nameManuallyEdited = false; showMatrixForm = false; editingMatrix = null; await onreload();
-		} catch (err: any) { error = err.message; snackError(err.message); }
+		} catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); }
 		finally { matrixSubmitting = false; }
 	}
 
@@ -95,10 +95,10 @@
 			id,
 			onconfirm: async () => {
 				try { await api(`/matrix-bots/${id}`, { method: 'DELETE' }); await onreload(); snackSuccess(t('snack.matrixBotDeleted')); }
-				catch (err: any) {
+				catch (err: unknown) {
 					const bb = getBlockedBy(err);
 					if (bb) { blockedBy = bb; return; }
-					error = err.message; snackError(err.message);
+					const m = errMsg(err); error = m; snackError(m);
 				}
 				finally { confirmDeleteMatrix = null; }
 			}
@@ -111,7 +111,7 @@
 			const res = await api(`/matrix-bots/${botId}/test?locale=${getLocale()}`, { method: 'POST' });
 			if (res.success) snackSuccess(t('snack.matrixBotTestOk'));
 			else snackError(res.error || t('matrixBot.operationFailed'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		matrixTesting = { ...matrixTesting, [botId]: false };
 	}
 </script>
diff --git a/frontend/src/routes/bots/TelegramBotTab.svelte b/frontend/src/routes/bots/TelegramBotTab.svelte
index d9e52ca..f76b626 100644
--- a/frontend/src/routes/bots/TelegramBotTab.svelte
+++ b/frontend/src/routes/bots/TelegramBotTab.svelte
@@ -1,7 +1,7 @@
 <script lang="ts">
 	import { slide, fade } from 'svelte/transition';
 	import { flip } from 'svelte/animate';
-	import { api, getBlockedBy, type BlockedByDetail } from '$lib/api';
+	import { api, errMsg, getBlockedBy, type BlockedByDetail } from '$lib/api';
 	import BlockedByModal from '$lib/components/BlockedByModal.svelte';
 	import { t, getLocale } from '$lib/i18n';
 	import { telegramBotsCache } from '$lib/stores/caches.svelte';
@@ -105,7 +105,7 @@
 				snackSuccess(t('snack.botRegistered'));
 			}
 			form = { name: '', icon: '', token: '' }; nameManuallyEdited = false; showForm = false; editing = null; await onreload();
-		} catch (err: any) { error = err.message; snackError(err.message); }
+		} catch (err: unknown) { const m = errMsg(err); error = m; snackError(m); }
 		finally { submitting = false; }
 	}
 
@@ -115,10 +115,10 @@
 			id,
 			onconfirm: async () => {
 				try { await api(`/telegram-bots/${id}`, { method: 'DELETE' }); await onreload(); snackSuccess(t('snack.botDeleted')); }
-				catch (err: any) {
+				catch (err: unknown) {
 					const bb = getBlockedBy(err);
 					if (bb) { blockedBy = bb; return; }
-					error = err.message; snackError(err.message);
+					const m = errMsg(err); error = m; snackError(m);
 				}
 				finally { confirmDelete = null; }
 			}
@@ -147,7 +147,7 @@
 		try {
 			chats = { ...chats, [botId]: await api<TelegramChat[]>(`/telegram-bots/${botId}/chats/discover`, { method: 'POST' }) };
 			snackSuccess(t('telegramBot.chatsDiscovered'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		chatsRefreshing = { ...chatsRefreshing, [botId]: false };
 	}
 
@@ -156,11 +156,15 @@
 			await api(`/telegram-bots/${botId}/chats/${chatDbId}`, { method: 'DELETE' });
 			chats[botId] = (chats[botId] || []).filter((c) => c.id !== chatDbId);
 			snackSuccess(t('telegramBot.chatDeleted'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
-	const LANG_ITEMS = [
-		{ value: '', label: '—', icon: 'mdiTranslate', desc: 'Auto' },
+	// `desc` is the only locale-sensitive field — language *names* are intentionally
+	// shown in their own language (English under EN, Русский under RU, …) so users
+	// recognise them regardless of the current UI locale. Only the "Auto" sentinel
+	// for the no-override row is translated.
+	let LANG_ITEMS = $derived([
+		{ value: '', label: '—', icon: 'mdiTranslate', desc: t('common.auto') },
 		{ value: 'en', label: 'EN', icon: 'mdiAlphaECircle', desc: 'English' },
 		{ value: 'ru', label: 'RU', icon: 'mdiAlphaRCircle', desc: 'Русский' },
 		{ value: 'uk', label: 'UK', icon: 'mdiAlphaUCircle', desc: 'Українська' },
@@ -177,7 +181,7 @@
 		{ value: 'tr', label: 'TR', icon: 'mdiAlphaTCircle', desc: 'Türkçe' },
 		{ value: 'ar', label: 'AR', icon: 'mdiAlphaACircle', desc: 'العربية' },
 		{ value: 'hi', label: 'HI', icon: 'mdiAlphaHCircle', desc: 'हिन्दी' },
-	];
+	]);
 
 	async function updateChatLanguage(botId: number, chat: TelegramChat, lang: string) {
 		try {
@@ -189,7 +193,7 @@
 				c.id === chat.id ? { ...c, language_override: lang } : c
 			);
 			snackSuccess(t('telegramBot.languageUpdated'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	async function toggleChatCommands(botId: number, chat: TelegramChat) {
@@ -202,7 +206,7 @@
 			chats[botId] = (chats[botId] || []).map(c =>
 				c.id === chat.id ? { ...c, commands_enabled: newVal } : c
 			);
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	async function loadListenerStatus(botId: number) {
@@ -232,7 +236,7 @@
 					t.id === trk.id ? { ...t, enabled: !t.enabled } : t
 				),
 			};
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	async function syncCommands(botId: number) {
@@ -241,7 +245,7 @@
 			const res = await api<ApiResult>(`/telegram-bots/${botId}/sync-commands`, { method: 'POST' });
 			if (res.success) snackSuccess(t('telegramBot.commandsSynced'));
 			else snackError(res.error || t('telegramBot.saveFailed'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		modeChanging = { ...modeChanging, [botId]: false };
 	}
 
@@ -254,7 +258,7 @@
 				await loadWebhookStatus(botId);
 			}
 			snackSuccess(t('snack.botUpdated'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		modeChanging = { ...modeChanging, [botId]: false };
 	}
 
@@ -274,7 +278,7 @@
 			} else {
 				snackError(res.error || t('telegramBot.webhookFailed'));
 			}
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		modeChanging = { ...modeChanging, [botId]: false };
 	}
 
@@ -284,7 +288,7 @@
 			const res = await api<ApiResult>(`/telegram-bots/${botId}/webhook/unregister`, { method: 'POST' });
 			if (res.success) { snackSuccess(t('telegramBot.webhookUnregistered')); await loadWebhookStatus(botId); }
 			else snackError(res.error || t('telegramBot.saveFailed'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		modeChanging = { ...modeChanging, [botId]: false };
 	}
 
@@ -315,7 +319,7 @@
 			const res = await api<ApiResult>(`/telegram-bots/${botId}/chats/${chatId}/test?locale=${getLocale()}`, { method: 'POST' });
 			if (res.success) snackSuccess(t('snack.targetTestSent'));
 			else snackError(res.error || t('telegramBot.saveFailed'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		chatTesting = { ...chatTesting, [key]: false };
 	}
 
@@ -464,6 +468,10 @@
 										</div>
 										<div style="justify-self:center" role="presentation" onclick={(e: MouseEvent) => e.stopPropagation()} onkeydown={(e: KeyboardEvent) => e.stopPropagation()}>
 											<button
+												type="button"
+												role="switch"
+												aria-checked={chat.commands_enabled}
+												aria-label={t('telegramBot.commandsToggle')}
 												style="width:28px; height:16px; border-radius:8px; position:relative; transition:background-color 0.2s; background-color:{chat.commands_enabled ? 'var(--color-primary)' : 'var(--color-border)'};"
 												title={t('telegramBot.commandsToggle')}
 												onclick={() => toggleChatCommands(bot.id, chat)}>
@@ -511,6 +519,10 @@
 											<a href="/command-trackers" class="font-medium text-[var(--color-primary)] hover:underline">{trk.name}</a>
 										</div>
 										<button
+											type="button"
+											role="switch"
+											aria-checked={trk.enabled}
+											aria-label={trk.enabled ? t('notificationTracker.pause') : t('notificationTracker.resume')}
 											style="width:28px; height:16px; border-radius:8px; position:relative; transition:background-color 0.2s; background-color:{trk.enabled ? 'var(--color-primary)' : 'var(--color-border)'};"
 											title={trk.enabled ? t('notificationTracker.pause') : t('notificationTracker.resume')}
 											onclick={() => toggleListenerEnabled(bot.id, trk)}>
diff --git a/frontend/src/routes/command-configs/+page.svelte b/frontend/src/routes/command-configs/+page.svelte
index 0d956dc..b0bce98 100644
--- a/frontend/src/routes/command-configs/+page.svelte
+++ b/frontend/src/routes/command-configs/+page.svelte
@@ -1,6 +1,6 @@
 <script lang="ts">
 	import { onMount } from 'svelte';
-	import { api, getBlockedBy, type BlockedByDetail } from '$lib/api';
+	import { api, getBlockedBy, type BlockedByDetail , errMsg} from '$lib/api';
 	import BlockedByModal from '$lib/components/BlockedByModal.svelte';
 	import { t } from '$lib/i18n';
 	import { commandConfigsCache, commandTemplateConfigsCache, capabilitiesCache } from '$lib/stores/caches.svelte';
@@ -51,7 +51,7 @@
 	let submitting = $state(false);
 	let confirmDelete = $state<{ id: number; onconfirm: () => Promise<void> } | null>(null);
 
-	// Immich command icons — used as fallback when capabilities don't specify icons
+	// Immich command icons РІР‚вЂќ used as fallback when capabilities don't specify icons
 	const commandIcons: Record<string, string> = {
 		help: 'mdiHelpCircle', status: 'mdiChartBox', albums: 'mdiImageMultiple',
 		events: 'mdiPulse', summary: 'mdiFileDocumentEdit', latest: 'mdiImagePlus',
@@ -105,7 +105,7 @@
 				commandTemplateConfigsCache.fetch(),
 				capabilitiesCache.fetch(),
 			]);
-		} catch (err: any) { error = err.message || t('common.loadError'); snackError(error); }
+		} catch (err: unknown) { error = errMsg(err, t('common.loadError')); snackError(error); }
 		finally { loaded = true; highlightFromUrl(); }
 	}
 
@@ -214,7 +214,7 @@
 				snackSuccess(t('snack.commandConfigSaved'));
 			}
 			form = defaultForm(); nameManuallyEdited = false; showForm = false; editing = null; await load();
-		} catch (err: any) { error = err.message; snackError(err.message); }
+		} catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); }
 		finally { submitting = false; }
 	}
 
@@ -227,10 +227,10 @@
 					await api(`/command-configs/${cfg.id}`, { method: 'DELETE' });
 					await load();
 					snackSuccess(t('snack.commandConfigDeleted'));
-				} catch (err: any) {
+				} catch (err: unknown) {
 					const bb = getBlockedBy(err);
 					if (bb) { blockedBy = bb; return; }
-					snackError(err.message);
+					snackError(errMsg(err));
 				}
 				finally { confirmDelete = null; }
 			}
diff --git a/frontend/src/routes/command-template-configs/+page.svelte b/frontend/src/routes/command-template-configs/+page.svelte
index 574468e..18e4a34 100644
--- a/frontend/src/routes/command-template-configs/+page.svelte
+++ b/frontend/src/routes/command-template-configs/+page.svelte
@@ -1,7 +1,7 @@
 <script lang="ts">
 	import { onMount } from 'svelte';
 	import { slide } from 'svelte/transition';
-	import { api, getBlockedBy, type BlockedByDetail } from '$lib/api';
+	import { api, getBlockedBy, type BlockedByDetail , errMsg} from '$lib/api';
 	import BlockedByModal from '$lib/components/BlockedByModal.svelte';
 	import { t } from '$lib/i18n';
 	import { sanitizePreview } from '$lib/sanitize';
@@ -85,7 +85,7 @@
 		return {
 			value: code,
 			label: m.native,
-			desc: i === 0 ? `${code.toUpperCase()} · ${t('locales.primary')}` : code.toUpperCase(),
+			desc: i === 0 ? `${code.toUpperCase()} Р’В· ${t('locales.primary')}` : code.toUpperCase(),
 		};
 	}));
 	$effect(() => {
@@ -144,10 +144,10 @@
 	 * Group command slots by purpose so the form mirrors how notification
 	 * templates are split (event vs scheduled vs settings).
 	 *
-	 *   commandResponses  — primary reply templates (/start, /help, /status, data slots)
-	 *   commandErrors     — fallback messages (rate_limited, no_results)
-	 *   commandDescriptions — desc_* slots: short menu blurbs in Telegram's command picker
-	 *   commandUsage      — usage_* slots: invocation examples shown by /help
+	 *   commandResponses  РІР‚вЂќ primary reply templates (/start, /help, /status, data slots)
+	 *   commandErrors     РІР‚вЂќ fallback messages (rate_limited, no_results)
+	 *   commandDescriptions РІР‚вЂќ desc_* slots: short menu blurbs in Telegram's command picker
+	 *   commandUsage      РІР‚вЂќ usage_* slots: invocation examples shown by /help
 	 */
 	let commandSlotGroups = $derived([
 		{
@@ -213,8 +213,8 @@
 			allCmdTplConfigs = cfgs;
 			allCapabilities = caps;
 			varsRef = vars;
-		} catch (err: any) {
-			error = err.message || t('common.loadError');
+		} catch (err: unknown) {
+			error = errMsg(err, t('common.loadError'));
 			snackError(error);
 		} finally {
 			loaded = true;
@@ -354,9 +354,10 @@
 			editing = null;
 			await load();
 			snackSuccess(t('snack.cmdTemplateSaved'));
-		} catch (err: any) {
-			error = err.message;
-			snackError(err.message);
+		} catch (err: unknown) {
+			const m = errMsg(err);
+			error = m;
+			snackError(m);
 		}
 	}
 
@@ -407,8 +408,8 @@
 				refreshAllPreviews();
 			}
 			snackSuccess(t('templateConfig.resetApplied'));
-		} catch (err: any) {
-			snackError(err.message);
+		} catch (err: unknown) {
+			snackError(errMsg(err));
 		}
 	}
 
@@ -444,11 +445,12 @@
 					await api(`/command-template-configs/${id}`, { method: 'DELETE' });
 					await load();
 					snackSuccess(t('snack.cmdTemplateDeleted'));
-				} catch (err: any) {
+				} catch (err: unknown) {
 					const bb = getBlockedBy(err);
 					if (bb) { blockedBy = bb; return; }
-					error = err.message;
-					snackError(err.message);
+					const m = errMsg(err);
+					error = m;
+					snackError(m);
 				} finally {
 					confirmDelete = null;
 				}
@@ -546,7 +548,7 @@
 					{#each filteredSlots as slot}
 						<CollapsibleSlot
 							label={slot.name}
-							description="/{slot.name} — {slot.description}"
+							description="/{slot.name} РІР‚вЂќ {slot.description}"
 							expanded={expandedSlots.has(slot.name)}
 							status={getSlotStatus(slot.name)}
 							ontoggle={() => toggleSlot(slot.name)}
@@ -585,9 +587,9 @@
 
 							{#if slotErrors[slot.name]}
 								{#if slotErrorTypes[slot.name] === 'undefined'}
-									<p class="mt-1 text-xs" style="color: var(--color-warning-fg);">⚠ {t('common.undefinedVar')}: {slotErrors[slot.name]}</p>
+									<p class="mt-1 text-xs" style="color: var(--color-warning-fg);">РІС™В  {t('common.undefinedVar')}: {slotErrors[slot.name]}</p>
 								{:else}
-									<p class="mt-1 text-xs" style="color: var(--color-error-fg);">✕ {t('common.syntaxError')}: {slotErrors[slot.name]}{slotErrorLines[slot.name] ? ` (${t('common.line')} ${slotErrorLines[slot.name]})` : ''}</p>
+									<p class="mt-1 text-xs" style="color: var(--color-error-fg);">РІСљвЂў {t('common.syntaxError')}: {slotErrors[slot.name]}{slotErrorLines[slot.name] ? ` (${t('common.line')} ${slotErrorLines[slot.name]})` : ''}</p>
 								{/if}
 							{/if}
 						</CollapsibleSlot>
diff --git a/frontend/src/routes/command-trackers/+page.svelte b/frontend/src/routes/command-trackers/+page.svelte
index f6ed87a..c3ae052 100644
--- a/frontend/src/routes/command-trackers/+page.svelte
+++ b/frontend/src/routes/command-trackers/+page.svelte
@@ -1,7 +1,7 @@
 <script lang="ts">
 	import { onMount, onDestroy } from 'svelte';
 	import { slide } from 'svelte/transition';
-	import { api } from '$lib/api';
+	import { api , errMsg} from '$lib/api';
 	import { topbarAction } from '$lib/stores/topbar-action.svelte';
 	import { t } from '$lib/i18n';
 	import { providersCache, telegramBotsCache, commandConfigsCache } from '$lib/stores/caches.svelte';
@@ -107,7 +107,7 @@
 				providersCache.fetch(), commandConfigsCache.fetch(),
 				telegramBotsCache.fetch(),
 			]);
-		} catch (err: any) { error = err.message || t('common.loadError'); snackError(error); }
+		} catch (err: unknown) { error = errMsg(err, t('common.loadError')); snackError(error); }
 		finally { loaded = true; highlightFromUrl(); }
 	}
 
@@ -168,7 +168,7 @@
 				snackSuccess(t('snack.commandTrackerCreated'));
 			}
 			form = defaultForm(); nameManuallyEdited = false; showForm = false; editing = null; await load();
-		} catch (err: any) { error = err.message; snackError(err.message); }
+		} catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); }
 		finally { submitting = false; }
 	}
 
@@ -180,7 +180,7 @@
 					await api(`/command-trackers/${trk.id}`, { method: 'DELETE' });
 					await load();
 					snackSuccess(t('snack.commandTrackerDeleted'));
-				} catch (err: any) { snackError(err.message); }
+				} catch (err: unknown) { snackError(errMsg(err)); }
 				finally { confirmDelete = null; }
 			}
 		};
@@ -193,7 +193,7 @@
 			await api(`/command-trackers/${trk.id}/${endpoint}`, { method: 'POST' });
 			snackSuccess(trk.enabled ? t('snack.commandTrackerDisabled') : t('snack.commandTrackerEnabled'));
 			await load();
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		toggling = { ...toggling, [trk.id]: false };
 	}
 
@@ -226,7 +226,7 @@
 			snackSuccess(t('snack.listenerAdded'));
 			await loadListeners(trkId);
 			newListenerBotId = { ...newListenerBotId, [trkId]: 0 };
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		addingListener = { ...addingListener, [trkId]: false };
 	}
 
@@ -235,7 +235,7 @@
 			await api(`/command-trackers/${trkId}/listeners/${listenerId}`, { method: 'DELETE' });
 			snackSuccess(t('snack.listenerRemoved'));
 			await loadListeners(trkId);
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	// Per-listener album scope editing
@@ -264,7 +264,7 @@
 			snackSuccess(t('snack.listenerScopeSaved'));
 			await loadListeners(scopeEditor.trkId);
 			scopeEditor = null;
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	function providerName(id: number): string {
@@ -398,7 +398,7 @@
 						<IconButton icon={trk.enabled ? 'mdiPause' : 'mdiPlay'} title={trk.enabled ? t('notificationTracker.pause') : t('notificationTracker.resume')} onclick={() => toggleEnabled(trk)} disabled={toggling[trk.id]} />
 						<button onclick={() => toggleListeners(trk.id)}
 							class="text-xs text-[var(--color-muted-foreground)] hover:underline px-2 py-1">
-							{t('commandTracker.listeners')} {expandedTracker === trk.id ? '▲' : '▼'}
+							{t('commandTracker.listeners')} {expandedTracker === trk.id ? 'в–І' : 'в–ј'}
 						</button>
 						<IconButton icon="mdiDelete" title={t('common.delete')} onclick={() => remove(trk)} variant="danger" />
 					</div>
@@ -473,7 +473,7 @@
 							onclick={() => { if (scopeEditor) scopeEditor.selectedIds = scopeEditor.collections.map((c: any) => c.id); }}>
 							{t('backup.selectAll')}
 						</button>
-						<span aria-hidden="true">·</span>
+						<span aria-hidden="true">В·</span>
 						<button type="button" class="underline hover:text-[var(--color-primary)]"
 							onclick={() => { if (scopeEditor) scopeEditor.selectedIds = []; }}>
 							{t('backup.deselectAll')}
diff --git a/frontend/src/routes/login/+page.svelte b/frontend/src/routes/login/+page.svelte
index bff7ec2..6515345 100644
--- a/frontend/src/routes/login/+page.svelte
+++ b/frontend/src/routes/login/+page.svelte
@@ -1,7 +1,7 @@
 <script lang="ts">
 	import { goto } from '$app/navigation';
 	import { onMount } from 'svelte';
-	import { api } from '$lib/api';
+	import { api , errMsg} from '$lib/api';
 	import { login } from '$lib/auth.svelte';
 	import { t, getLocale, setLocale } from '$lib/i18n';
 	import { initTheme, getTheme, setTheme, type Theme } from '$lib/theme.svelte';
@@ -37,7 +37,7 @@
 			const res = await api<{ needs_setup: boolean }>('/auth/needs-setup');
 			if (res.needs_setup) goto('/setup');
 		} catch {
-			// The backend is unreachable — surface that distinctly so the user
+			// The backend is unreachable вЂ” surface that distinctly so the user
 			// doesn't blame the login form for a network/backend problem.
 			backendDown = true;
 		}
@@ -50,8 +50,8 @@
 		try {
 			await login(username, password);
 			window.location.href = '/';
-		} catch (err: any) {
-			error = err.message || t('auth.loginFailed');
+		} catch (err: unknown) {
+			error = errMsg(err, t('auth.loginFailed'));
 		}
 		submitting = false;
 	}
diff --git a/frontend/src/routes/notification-trackers/+page.svelte b/frontend/src/routes/notification-trackers/+page.svelte
index 1a85936..0c2dca5 100644
--- a/frontend/src/routes/notification-trackers/+page.svelte
+++ b/frontend/src/routes/notification-trackers/+page.svelte
@@ -1,6 +1,6 @@
 <script lang="ts">
 	import { onMount, onDestroy } from 'svelte';
-	import { api, parseDate } from '$lib/api';
+	import { api, parseDate , errMsg} from '$lib/api';
 	import { topbarAction } from '$lib/stores/topbar-action.svelte';
 	import { t, getLocale } from '$lib/i18n';
 	import { providersCache, targetsCache, trackingConfigsCache, templateConfigsCache, capabilitiesCache } from '$lib/stores/caches.svelte';
@@ -98,7 +98,7 @@
 	// Test types: basic is always available; periodic/scheduled/memory only for providers
 	// that have those notification slots in their capabilities AND have the feature
 	// enabled on the tracker's default TrackingConfig. A disabled feature on the
-	// default config means cron dispatch won't fire it in production either — so
+	// default config means cron dispatch won't fire it in production either РІР‚вЂќ so
 	// the test button would just surface a silent skip.
 	const allTestTypes: Record<string, {
 		key: string; icon: string; labelKey: string;
@@ -128,7 +128,7 @@
 			base.push({
 				key: tt.key, icon: tt.icon, labelKey: tt.labelKey,
 				// When surfaced, the button still renders but is disabled and
-				// shows *why* — users who land here via the test menu without
+				// shows *why* РІР‚вЂќ users who land here via the test menu without
 				// having toggled the feature on Tracking Config see a clear
 				// pointer to the missing setting instead of a silent failure.
 				disabledReason: enabled ? undefined : 'notificationTracker.testDisabledHint',
@@ -166,8 +166,8 @@
 				trackingConfigsCache.fetch(), templateConfigsCache.fetch(),
 				capabilitiesCache.fetch(),
 			]);
-		} catch (err: any) {
-			loadError = err.message || t('common.loadFailed');
+		} catch (err: unknown) {
+			loadError = errMsg(err, t('common.loadFailed'));
 			snackError(loadError);
 		} finally { loaded = true; highlightFromUrl(); }
 	}
@@ -179,7 +179,7 @@
 
 	async function loadUsers() {
 		if (!form.provider_id) { users = []; return; }
-		// Skip the fetch when the descriptor has no user filters — saves a
+		// Skip the fetch when the descriptor has no user filters РІР‚вЂќ saves a
 		// pointless round-trip for providers like Immich/Scheduler.
 		const desc = getDescriptor(selectedProviderType);
 		if (!desc?.userFilters || desc.userFilters.length === 0) { users = []; return; }
@@ -289,7 +289,7 @@
 				snackSuccess(t('snack.trackerCreated'));
 			}
 			showForm = false; editing = null; linkWarning = null; await load();
-		} catch (err: any) { error = err.message; snackError(err.message); } finally { submitting = false; }
+		} catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); } finally { submitting = false; }
 	}
 
 	async function autoCreateLinks() {
@@ -301,8 +301,8 @@
 				try {
 					await api(`/providers/${linkWarning.providerId}/albums/${album.id}/shared-links`, { method: 'POST' });
 					created++;
-				} catch (err: any) {
-					snackError(`Failed to create link for "${album.name}": ${err.message}`);
+				} catch (err: unknown) {
+					snackError(`Failed to create link for "${album.name}": ${errMsg(err)}`);
 				}
 			}
 		}
@@ -324,7 +324,7 @@
 			await api(`/notification-trackers/${tracker.id}`, { method: 'PUT', body: JSON.stringify({ enabled: !tracker.enabled }) });
 			await load();
 			snackSuccess(tracker.enabled ? t('snack.trackerPaused') : t('snack.trackerResumed'));
-		} catch (err: any) { snackError(err.message); } finally { toggling = { ...toggling, [tracker.id]: false }; }
+		} catch (err: unknown) { snackError(errMsg(err)); } finally { toggling = { ...toggling, [tracker.id]: false }; }
 	}
 
 	function startDelete(tracker: Tracker) { confirmDelete = tracker; }
@@ -335,7 +335,7 @@
 			await api(`/notification-trackers/${confirmDelete.id}`, { method: 'DELETE' });
 			await load();
 			snackSuccess(t('snack.trackerDeleted'));
-		} catch (err: any) { error = err.message; snackError(err.message); }
+		} catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); }
 		confirmDelete = null;
 	}
 
@@ -383,7 +383,7 @@
 	function trackerTiles(tracker: Tracker): MetaTile[] {
 		const tiles: MetaTile[] = [];
 		const trkDesc = getDescriptor(getProviderType(tracker));
-		// Status — armed/paused with color tone
+		// Status РІР‚вЂќ armed/paused with color tone
 		tiles.push(tracker.enabled
 			? { icon: 'mdiPulse', label: t('notificationTracker.armed'), tone: 'mint' }
 			: { icon: 'mdiPauseCircleOutline', label: t('notificationTracker.paused'), tone: 'citrus' });
@@ -393,7 +393,7 @@
 			label: getProviderName(tracker.provider_id),
 			tone: 'lavender',
 		});
-		// Collections — count + label (varies per provider descriptor)
+		// Collections РІР‚вЂќ count + label (varies per provider descriptor)
 		const collCount = (tracker.collection_ids || []).length;
 		if (collCount > 0 || !trkDesc?.webhookBased) {
 			tiles.push({
@@ -403,7 +403,7 @@
 				tone: 'sky',
 			});
 		}
-		// Scan interval — only meaningful for polling trackers
+		// Scan interval РІР‚вЂќ only meaningful for polling trackers
 		if (!trkDesc?.webhookBased) {
 			tiles.push({
 				icon: 'mdiTimerOutline',
@@ -451,7 +451,7 @@
 			newLinkTemplateConfigId[trackerId] = 0;
 			await load();
 			snackSuccess(t('snack.targetLinked'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		addingTarget = { ...addingTarget, [trackerId]: false };
 	}
 
@@ -460,7 +460,7 @@
 			await api(`/notification-trackers/${trackerId}/targets/${ttId}`, { method: 'DELETE' });
 			await load();
 			snackSuccess(t('snack.targetUnlinked'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	async function updateTargetLink(trackerId: number, tt: TrackerTarget, field: string, value: string | number | boolean | null) {
@@ -470,7 +470,7 @@
 				body: JSON.stringify({ [field]: value }),
 			});
 			await load();
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	async function testTrackerTarget(trackerId: number, ttId: number, testType: string) {
@@ -492,8 +492,8 @@
 			} else {
 				snackSuccess(t('snack.targetTestSent'));
 			}
-		} catch (err: any) {
-			snackError(err.message);
+		} catch (err: unknown) {
+			snackError(errMsg(err));
 		} finally {
 			ttTesting = { ...ttTesting, [key]: '' };
 		}
@@ -593,8 +593,8 @@
 						<div class="list-row__secondary mt-0.5">
 							<CrossLink href="/providers" icon="mdiServer" label={getProviderName(tracker.provider_id)} entityId={tracker.provider_id} />
 							<p class="text-sm text-[var(--color-muted-foreground)]">
-								{(tracker.collection_ids || []).length} {getCollectionLabel(tracker)} ·
-								{#if !trkDesc?.webhookBased}{t('notificationTracker.every')} {tracker.scan_interval}s ·{/if}
+								{(tracker.collection_ids || []).length} {getCollectionLabel(tracker)} Р’В·
+								{#if !trkDesc?.webhookBased}{t('notificationTracker.every')} {tracker.scan_interval}s Р’В·{/if}
 								{(tracker.tracker_targets || []).length} {t('notificationTracker.linkedTargets')}
 							</p>
 						</div>
@@ -605,7 +605,7 @@
 						<IconButton icon={tracker.enabled ? 'mdiPause' : 'mdiPlay'} title={tracker.enabled ? t('notificationTracker.pause') : t('notificationTracker.resume')} onclick={() => toggle(tracker)} disabled={toggling[tracker.id]} />
 						<button onclick={() => toggleExpand(tracker.id)}
 							class="text-xs text-[var(--color-muted-foreground)] hover:underline px-2 py-1">
-							{t('notificationTracker.linkedTargets')} {expandedTracker === tracker.id ? '▲' : '▼'}
+							{t('notificationTracker.linkedTargets')} {expandedTracker === tracker.id ? 'РІвЂ“Р†' : 'РІвЂ“С'}
 						</button>
 						<IconButton icon="mdiDelete" title={t('notificationTracker.delete')} onclick={() => startDelete(tracker)} variant="danger" />
 					</div>
diff --git a/frontend/src/routes/notification-trackers/SharedLinkModal.svelte b/frontend/src/routes/notification-trackers/SharedLinkModal.svelte
index 7551a2f..4f352ce 100644
--- a/frontend/src/routes/notification-trackers/SharedLinkModal.svelte
+++ b/frontend/src/routes/notification-trackers/SharedLinkModal.svelte
@@ -1,6 +1,6 @@
 <script lang="ts">
 	import { t } from '$lib/i18n';
-	import { api } from '$lib/api';
+	import { api , errMsg} from '$lib/api';
 	import { snackError, snackSuccess } from '$lib/stores/snackbar.svelte';
 	import Modal from '$lib/components/Modal.svelte';
 	import MdiIcon from '$lib/components/MdiIcon.svelte';
@@ -23,7 +23,7 @@
 	let replacing = $state<Record<string, boolean>>({});
 
 	/**
-	 * Expired and password-protected links can't be repaired in place — the
+	 * Expired and password-protected links can't be repaired in place вЂ” the
 	 * Immich API has no "reset" endpoint. The only remedy is to recreate the
 	 * link (which the backend does by POSTing a new one and returning it).
 	 * We surface the action per-row so users don't have to leave the form.
@@ -39,8 +39,8 @@
 			snackSuccess(t('notificationTracker.createdLinks').replace('{count}', '1'));
 			const remaining = linkWarning.albums.filter(a => a.id !== album.id);
 			if (onupdate) onupdate(remaining);
-		} catch (err: any) {
-			snackError(t('notificationTracker.linkReplaceFailed').replace('{name}', album.name) + ': ' + err.message);
+		} catch (err: unknown) {
+			snackError(t('notificationTracker.linkReplaceFailed').replace('{name}', album.name) + ': ' + errMsg(err));
 		} finally {
 			replacing = { ...replacing, [album.id]: false };
 		}
diff --git a/frontend/src/routes/notification-trackers/TrackerForm.svelte b/frontend/src/routes/notification-trackers/TrackerForm.svelte
index 9138fee..13ec3f3 100644
--- a/frontend/src/routes/notification-trackers/TrackerForm.svelte
+++ b/frontend/src/routes/notification-trackers/TrackerForm.svelte
@@ -59,9 +59,36 @@
 	}: Props = $props();
 
 	let descriptor = $derived(getDescriptor(providerType));
-	let isScheduler = $derived(providerType === 'scheduler');
-	let isWebhook = $derived(descriptor?.webhookBased ?? false);
 	let colMeta = $derived(descriptor?.collectionMeta);
+	// Providers without a collection (currently just the scheduler) drive the
+	// scheduler-specific form layout. Reading from the descriptor keeps the
+	// branch out of the component and lets a future provider opt in by
+	// declaring `collectionMeta: null`.
+	let isScheduler = $derived(colMeta == null);
+	let isWebhook = $derived(descriptor?.webhookBased ?? false);
+
+	/**
+	 * Resolve `{tracking_config_id}` / `{template_config_id}` placeholders in a
+	 * descriptor-declared CTA href. When the corresponding form field is unset
+	 * (value 0), strip the entire `?edit=...` query so the link still goes to
+	 * the list page. Centralising this here avoids per-provider href logic
+	 * leaking back into the template.
+	 */
+	function resolveHintHref(href: string): string {
+		const tcId = form.default_tracking_config_id;
+		const tplId = form.default_template_config_id;
+		if (href.includes('{tracking_config_id}')) {
+			return tcId
+				? href.replace('{tracking_config_id}', String(tcId))
+				: href.split('?')[0];
+		}
+		if (href.includes('{template_config_id}')) {
+			return tplId
+				? href.replace('{template_config_id}', String(tplId))
+				: href.split('?')[0];
+		}
+		return href;
+	}
 
 	// Custom variable management for scheduler
 	function addVariable() {
@@ -231,29 +258,27 @@
 		{/if}
 
 		<!-- Feature discovery: the periodic/scheduled/memory/quiet-hours controls
-		     live on the tracking config, not on the tracker itself. Surface this
-		     here so users don't have to stumble onto the feature by reading docs. -->
-		{#if providerType === 'immich'}
+		     live on the tracking config, not on the tracker itself. The hint
+		     content (message + CTAs) is declared on the provider descriptor so
+		     each provider can surface its own discoverability links without
+		     embedding `if (type === 'xyz')` here. -->
+		{#if descriptor?.featureDiscoveryHint}
+		{@const hint = descriptor.featureDiscoveryHint}
 		<div class="flex items-start gap-2 rounded-md border border-[var(--color-border)] bg-[var(--color-muted)]/30 px-3 py-2">
 			<span style="color: var(--color-primary);"><MdiIcon name="mdiInformationOutline" size={16} /></span>
 			<div class="flex-1 text-xs">
-				<p style="color: var(--color-muted-foreground);">{t('notificationTracker.featureDiscovery')}</p>
-				<div class="flex flex-wrap gap-x-4 gap-y-1 mt-1">
-					<a href={form.default_tracking_config_id
-						? `/tracking-configs?edit=${form.default_tracking_config_id}`
-						: '/tracking-configs'}
-						class="inline-flex items-center gap-1 text-[var(--color-primary)] hover:underline">
-						<MdiIcon name="mdiArrowRight" size={12} />
-						{t('notificationTracker.openTrackingConfig')}
-					</a>
-					<a href={form.default_template_config_id
-						? `/template-configs?edit=${form.default_template_config_id}`
-						: '/template-configs'}
-						class="inline-flex items-center gap-1 text-[var(--color-primary)] hover:underline">
-						<MdiIcon name="mdiArrowRight" size={12} />
-						{t('notificationTracker.openTemplateConfig')}
-					</a>
-				</div>
+				<p style="color: var(--color-muted-foreground);">{t(hint.messageKey)}</p>
+				{#if hint.ctas && hint.ctas.length > 0}
+					<div class="flex flex-wrap gap-x-4 gap-y-1 mt-1">
+						{#each hint.ctas as cta}
+							<a href={resolveHintHref(cta.href)}
+								class="inline-flex items-center gap-1 text-[var(--color-primary)] hover:underline">
+								<MdiIcon name={cta.icon ?? 'mdiArrowRight'} size={12} />
+								{t(cta.labelKey)}
+							</a>
+						{/each}
+					</div>
+				{/if}
 			</div>
 		</div>
 		{/if}
diff --git a/frontend/src/routes/providers/+page.svelte b/frontend/src/routes/providers/+page.svelte
index 84fc8e9..36c0239 100644
--- a/frontend/src/routes/providers/+page.svelte
+++ b/frontend/src/routes/providers/+page.svelte
@@ -1,7 +1,7 @@
 <script lang="ts">
 	import { onMount } from 'svelte';
 	import { slide } from 'svelte/transition';
-	import { api, getBlockedBy, type BlockedByDetail } from '$lib/api';
+	import { api, getBlockedBy, type BlockedByDetail , errMsg} from '$lib/api';
 	import { t } from '$lib/i18n';
 	import { providersCache, externalUrlCache } from '$lib/stores/caches.svelte';
 	import PageHeader from '$lib/components/PageHeader.svelte';
@@ -61,7 +61,7 @@
 		const tiles: MetaTile[] = [];
 		const h = health[provider.id];
 		const provDesc = getDescriptor(provider.type);
-		// Status — first tile, color-coded
+		// Status РІР‚вЂќ first tile, color-coded
 		if (h === true) {
 			tiles.push({ icon: 'mdiCheckCircle', label: t('providers.online'), tone: 'mint' });
 		} else if (h === false) {
@@ -107,10 +107,10 @@
 		try {
 			const u = new URL(url);
 			const segments = u.pathname.split('/').filter(Boolean);
-			const tail = segments.length ? `/${segments[0]}${segments.length > 1 ? '/…' : ''}` : '';
+			const tail = segments.length ? `/${segments[0]}${segments.length > 1 ? '/РІР‚В¦' : ''}` : '';
 			return `${u.host}${tail}`;
 		} catch {
-			return url.length > 32 ? `${url.slice(0, 30)}…` : url;
+			return url.length > 32 ? `${url.slice(0, 30)}РІР‚В¦` : url;
 		}
 	}
 
@@ -142,7 +142,7 @@
 
 	let health = $state<Record<number, boolean | null>>({});
 
-	// Status pill row for the page header — derived from health probes.
+	// Status pill row for the page header РІР‚вЂќ derived from health probes.
 	const headerPills = $derived.by(() => {
 		const onlineCount = Object.values(health).filter(v => v === true).length;
 		const offlineCount = Object.values(health).filter(v => v === false).length;
@@ -169,8 +169,8 @@
 		try {
 			await providersCache.fetch(true);
 			loadError = '';
-		} catch (err: any) {
-			loadError = err.message || t('providers.loadError');
+		} catch (err: unknown) {
+			loadError = errMsg(err, t('providers.loadError'));
 		} finally { loaded = true; highlightFromUrl(); }
 		// Ping all providers in background (use unfiltered list)
 		for (const p of allProviders) {
@@ -237,7 +237,7 @@
 			}
 			showForm = false; editing = null; providersCache.invalidate(); await load();
 			snackSuccess(t('snack.providerSaved'));
-		} catch (err: any) { error = err.message; snackError(err.message); }
+		} catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); }
 		submitting = false;
 	}
 
@@ -248,10 +248,10 @@
 		const id = confirmDelete.id;
 		confirmDelete = null;
 		try { await api(`/providers/${id}`, { method: 'DELETE' }); providersCache.invalidate(); await load(); snackSuccess(t('snack.providerDeleted')); }
-		catch (err: any) {
+		catch (err: unknown) {
 			const bb = getBlockedBy(err);
 			if (bb) { blockedBy = bb; return; }
-			error = err.message; snackError(err.message);
+			const m = errMsg(err); error = m; snackError(m);
 		}
 	}
 </script>
diff --git a/frontend/src/routes/settings/backup/+page.svelte b/frontend/src/routes/settings/backup/+page.svelte
index 861fdd7..f9cb68e 100644
--- a/frontend/src/routes/settings/backup/+page.svelte
+++ b/frontend/src/routes/settings/backup/+page.svelte
@@ -1,6 +1,6 @@
 <script lang="ts">
 	import { onMount } from 'svelte';
-	import { api, fetchAuth } from '$lib/api';
+	import { api, fetchAuth , errMsg} from '$lib/api';
 	import { t } from '$lib/i18n';
 	import Loading from '$lib/components/Loading.svelte';
 	import MdiIcon from '$lib/components/MdiIcon.svelte';
@@ -97,9 +97,10 @@
 			scheduledSettings = settings;
 			backupFiles = files;
 			pending = p;
-		} catch (err: any) {
-			error = err.message;
-			snackError(err.message);
+		} catch (err: unknown) {
+			const m = errMsg(err);
+			error = m;
+			snackError(m);
 		} finally {
 			loaded = true;
 		}
@@ -110,7 +111,7 @@
 			await api('/backup/pending-restore', { method: 'DELETE' });
 			snackSuccess(t('backup.pendingCancelled'));
 			pending = null;
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	async function applyAndRestart(): Promise<void> {
@@ -131,9 +132,9 @@
 				if (attempts < 120) setTimeout(poll, 1000);
 			};
 			setTimeout(poll, 1500);
-		} catch (err: any) {
+		} catch (err: unknown) {
 			restartingOverlay = false;
-			snackError(err.message);
+			snackError(errMsg(err));
 		}
 	}
 
@@ -144,8 +145,8 @@
 			await api(`/backup/files?secrets_mode=${mode}`, { method: 'POST' });
 			snackSuccess(t('backup.manualCreated'));
 			await refreshFiles();
-		} catch (err: any) {
-			snackError(err.message);
+		} catch (err: unknown) {
+			snackError(errMsg(err));
 		} finally {
 			creatingBackup = false;
 		}
@@ -178,8 +179,8 @@
 			a.click();
 			URL.revokeObjectURL(url);
 			snackSuccess(t('backup.exportSuccess'));
-		} catch (err: any) {
-			snackError(err.message);
+		} catch (err: unknown) {
+			snackError(errMsg(err));
 		} finally {
 			exporting = false;
 		}
@@ -202,8 +203,8 @@
 			formData.append('file', importFile);
 			const res = await fetchAuth('/backup/validate', { method: 'POST', body: formData });
 			validationResult = await res.json();
-		} catch (err: any) {
-			snackError(err.message);
+		} catch (err: unknown) {
+			snackError(errMsg(err));
 		} finally {
 			validating = false;
 		}
@@ -230,8 +231,8 @@
 			snackSuccess(t('backup.restorePrepared'));
 			postRestoreModalOpen = true;
 			importFile = null;
-		} catch (err: any) {
-			snackError(err.message);
+		} catch (err: unknown) {
+			snackError(errMsg(err));
 		} finally {
 			importing = false;
 		}
@@ -246,8 +247,8 @@
 				body: JSON.stringify(scheduledSettings),
 			});
 			snackSuccess(t('backup.scheduleSaved'));
-		} catch (err: any) {
-			snackError(err.message);
+		} catch (err: unknown) {
+			snackError(errMsg(err));
 		} finally {
 			savingSchedule = false;
 		}
@@ -258,8 +259,8 @@
 		loadingFiles = true;
 		try {
 			backupFiles = await api<BackupFile[]>('/backup/files');
-		} catch (err: any) {
-			snackError(err.message);
+		} catch (err: unknown) {
+			snackError(errMsg(err));
 		} finally {
 			loadingFiles = false;
 		}
@@ -275,8 +276,8 @@
 			a.download = filename;
 			a.click();
 			URL.revokeObjectURL(url);
-		} catch (err: any) {
-			snackError(err.message);
+		} catch (err: unknown) {
+			snackError(errMsg(err));
 		}
 	}
 
@@ -286,8 +287,8 @@
 			snackSuccess(t('backup.fileDeleted'));
 			confirmDeleteFile = '';
 			await refreshFiles();
-		} catch (err: any) {
-			snackError(err.message);
+		} catch (err: unknown) {
+			snackError(errMsg(err));
 		}
 	}
 </script>
diff --git a/frontend/src/routes/setup/+page.svelte b/frontend/src/routes/setup/+page.svelte
index 7910aa1..4752c36 100644
--- a/frontend/src/routes/setup/+page.svelte
+++ b/frontend/src/routes/setup/+page.svelte
@@ -2,6 +2,7 @@
 	import { goto } from '$app/navigation';
 	import { onMount } from 'svelte';
 	import { setup } from '$lib/auth.svelte';
+	import { errMsg } from '$lib/api';
 	import { t } from '$lib/i18n';
 	import { initTheme } from '$lib/theme.svelte';
 	import MdiIcon from '$lib/components/MdiIcon.svelte';
@@ -25,7 +26,7 @@
 		try {
 			await setup(username, password);
 			window.location.href = '/';
-		} catch (err: any) { error = err.message || t('auth.setupFailed'); }
+		} catch (err: unknown) { error = errMsg(err, t('auth.setupFailed')); }
 		submitting = false;
 	}
 </script>
diff --git a/frontend/src/routes/targets/+page.svelte b/frontend/src/routes/targets/+page.svelte
index a582c2f..949c592 100644
--- a/frontend/src/routes/targets/+page.svelte
+++ b/frontend/src/routes/targets/+page.svelte
@@ -3,7 +3,7 @@
 	import { SvelteSet } from 'svelte/reactivity';
 	import { slide } from 'svelte/transition';
 	import { page } from '$app/state';
-	import { api, getBlockedBy, type BlockedByDetail } from '$lib/api';
+	import { api, getBlockedBy, type BlockedByDetail , errMsg} from '$lib/api';
 	import BlockedByModal from '$lib/components/BlockedByModal.svelte';
 	import { t, getLocale } from '$lib/i18n';
 	import { targetsCache, telegramBotsCache, emailBotsCache, matrixBotsCache } from '$lib/stores/caches.svelte';
@@ -26,7 +26,7 @@
 	import ReceiverSection from './ReceiverSection.svelte';
 	import BotGroupHeader from './BotGroupHeader.svelte';
 
-	// ── Helpers ──
+	// РІвЂќР‚РІвЂќР‚ Helpers РІвЂќР‚РІвЂќР‚
 
 	function getBotName(target: NotificationTarget): string | null {
 		if (target.type === 'telegram' && target.config?.bot_id) {
@@ -74,7 +74,7 @@
 		return recv.receiver_key || '?';
 	}
 
-	// ── Constants ──
+	// РІвЂќР‚РІвЂќР‚ Constants РІвЂќР‚РІвЂќР‚
 
 	const ALL_TYPES = ['telegram', 'webhook', 'email', 'discord', 'slack', 'ntfy', 'matrix', 'broadcast'] as const;
 	type TargetType = typeof ALL_TYPES[number];
@@ -97,7 +97,7 @@
 
 	function targetTiles(target: NotificationTarget): MetaTile[] {
 		const tiles: MetaTile[] = [];
-		// Type tile — useful when the "all types" filter is active and rows
+		// Type tile РІР‚вЂќ useful when the "all types" filter is active and rows
 		// from multiple types appear side-by-side. The receivers count is
 		// already shown inside the `target-summary` button, so we don't repeat
 		// it as a tile.
@@ -115,8 +115,8 @@
 				tone: 'sky',
 			});
 		}
-		// Telegram targets expose a chat label in config — surface it so the
-		// row reads "Telegram · @bot · Family chat" without expanding.
+		// Telegram targets expose a chat label in config РІР‚вЂќ surface it so the
+		// row reads "Telegram Р’В· @bot Р’В· Family chat" without expanding.
 		const cfg = (target.config || {}) as Record<string, any>;
 		if (target.type === 'telegram' && cfg.chat_id) {
 			tiles.push({
@@ -126,7 +126,7 @@
 				mono: true,
 			});
 		}
-		// Webhook target — show host
+		// Webhook target РІР‚вЂќ show host
 		if (target.type === 'webhook' && cfg.url) {
 			let host = String(cfg.url);
 			try { host = new URL(host).host; } catch { /* keep raw */ }
@@ -142,7 +142,7 @@
 		return tiles;
 	}
 
-	// ── Derived state ──
+	// РІвЂќР‚РІвЂќР‚ Derived state РІвЂќР‚РІвЂќР‚
 
 	let allTargets = $derived(targetsCache.items);
 	let activeType = $derived(page.url.searchParams.get('type') as TargetType | null);
@@ -158,7 +158,7 @@
 	const emailBotItems = $derived(emailBots.map(b => ({ value: b.id, label: b.name, icon: b.icon || 'mdiEmailOutline', desc: b.email })));
 	const matrixBotItems = $derived(matrixBots.map(b => ({ value: b.id, label: b.name, icon: b.icon || 'mdiMatrix', desc: b.display_name || b.homeserver_url })));
 
-	// ── Target form state ──
+	// РІвЂќР‚РІвЂќР‚ Target form state РІвЂќР‚РІвЂќР‚
 
 	let showForm = $state(false);
 	let editing = $state<number | null>(null);
@@ -204,7 +204,7 @@
 		formEl?.scrollIntoView({ behavior: 'smooth', block: 'start' });
 	}
 
-	// ── Receiver inline form state ──
+	// РІвЂќР‚РІвЂќР‚ Receiver inline form state РІвЂќР‚РІвЂќР‚
 
 	let addingReceiverForTarget = $state<number | null>(null);
 	let receiverForm = $state<Record<string, any>>({});
@@ -228,7 +228,7 @@
 		if (!expandedTargets.has(id)) expandedTargets.add(id);
 	}
 
-	// ── Effects ──
+	// РІвЂќР‚РІвЂќР‚ Effects РІвЂќР‚РІвЂќР‚
 
 	// Reset form when switching target type tabs
 	$effect(() => {
@@ -239,11 +239,11 @@
 		addingReceiverForTarget = null;
 	});
 
-	// ── Data loading ──
+	// РІвЂќР‚РІвЂќР‚ Data loading РІвЂќР‚РІвЂќР‚
 
 	onMount(load);
 
-	// ── Bot grouping ──
+	// РІвЂќР‚РІвЂќР‚ Bot grouping РІвЂќР‚РІвЂќР‚
 
 	type TargetGroup = {
 		key: string;
@@ -355,8 +355,8 @@
 				emailBotsCache.fetch(), matrixBotsCache.fetch(),
 			]);
 			loadError = '';
-		} catch (err: any) {
-			loadError = err.message || t('common.loadError');
+		} catch (err: unknown) {
+			loadError = errMsg(err, t('common.loadError'));
 			snackError(loadError);
 		} finally {
 			loaded = true;
@@ -372,7 +372,7 @@
 		} catch (e) { console.warn('Failed to load bot chats:', e); }
 	}
 
-	// Active discovery — actually polls Telegram getUpdates and persists any new chats.
+	// Active discovery РІР‚вЂќ actually polls Telegram getUpdates and persists any new chats.
 	// Fired when the chat picker opens so the user sees the freshest list without a manual click.
 	async function discoverReceiverBotChats(botId: number) {
 		if (!botId) return;
@@ -382,7 +382,7 @@
 		} catch (e) { console.warn('Failed to discover bot chats:', e); }
 	}
 
-	// ── Target CRUD ──
+	// РІвЂќР‚РІвЂќР‚ Target CRUD РІвЂќР‚РІвЂќР‚
 
 	function openNew() {
 		form = defaultForm();
@@ -475,9 +475,10 @@
 			editing = null;
 			await load();
 			snackSuccess(t('snack.targetSaved'));
-		} catch (err: any) {
-			error = err.message;
-			snackError(err.message);
+		} catch (err: unknown) {
+			const m = errMsg(err);
+			error = m;
+			snackError(m);
 		} finally {
 			submitting = false;
 		}
@@ -488,7 +489,7 @@
 			const res = await api<{ success: boolean; error?: string }>(`/targets/${id}/test?locale=${getLocale()}`, { method: 'POST' });
 			if (res.success) snackSuccess(t('snack.targetTestSent'));
 			else snackError(`Failed: ${res.error}`);
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	let blockedBy = $state<BlockedByDetail | null>(null);
@@ -497,15 +498,16 @@
 			await api(`/targets/${id}`, { method: 'DELETE' });
 			await load();
 			snackSuccess(t('snack.targetDeleted'));
-		} catch (err: any) {
+		} catch (err: unknown) {
 			const bb = getBlockedBy(err);
 			if (bb) { blockedBy = bb; return; }
-			error = err.message;
-			snackError(err.message);
+			const m = errMsg(err);
+			error = m;
+			snackError(m);
 		}
 	}
 
-	// ── Receiver CRUD ──
+	// РІвЂќР‚РІвЂќР‚ Receiver CRUD РІвЂќР‚РІвЂќР‚
 
 	async function openReceiverForm(targetId: number, targetType: string) {
 		// Force a remount of any picker palette when the same target is reopened
@@ -575,8 +577,8 @@
 			addingReceiverForTarget = null;
 			await load();
 			snackSuccess(t('targets.receiverAdded'));
-		} catch (err: any) {
-			snackError(err.message);
+		} catch (err: unknown) {
+			snackError(errMsg(err));
 		} finally {
 			receiverSubmitting = false;
 		}
@@ -590,7 +592,7 @@
 			});
 			await load();
 			snackSuccess(receiver.enabled ? t('targets.receiverDisabled') : t('targets.receiverEnabled'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	async function removeReceiver(targetId: number, receiverId: number) {
@@ -598,7 +600,7 @@
 			await api(`/targets/${targetId}/receivers/${receiverId}`, { method: 'DELETE' });
 			await load();
 			snackSuccess(t('targets.receiverDeleted'));
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	async function toggleBroadcastChild(targetId: number, childId: number) {
@@ -613,7 +615,7 @@
 				body: JSON.stringify({ config: { ...tgt.config, disabled_child_ids: [...disabled] } }),
 			});
 			await load();
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 	}
 
 	async function testReceiver(targetId: number, receiverId: number) {
@@ -622,7 +624,7 @@
 			const res = await api<{ success: boolean; error?: string }>(`/targets/${targetId}/receivers/${receiverId}/test?locale=${getLocale()}`, { method: 'POST' });
 			if (res.success) snackSuccess(t('snack.targetTestSent'));
 			else snackError(`Failed: ${res.error}`);
-		} catch (err: any) { snackError(err.message); }
+		} catch (err: unknown) { snackError(errMsg(err)); }
 		finally { receiverTesting = { ...receiverTesting, [receiverId]: false }; }
 	}
 </script>
diff --git a/frontend/src/routes/template-configs/+page.svelte b/frontend/src/routes/template-configs/+page.svelte
index b586e5d..977a8e7 100644
--- a/frontend/src/routes/template-configs/+page.svelte
+++ b/frontend/src/routes/template-configs/+page.svelte
@@ -1,7 +1,7 @@
 <script lang="ts">
 	import { onMount, onDestroy } from 'svelte';
 	import { slide } from 'svelte/transition';
-	import { api, getBlockedBy, type BlockedByDetail } from '$lib/api';
+	import { api, getBlockedBy, type BlockedByDetail , errMsg} from '$lib/api';
 	import { topbarAction } from '$lib/stores/topbar-action.svelte';
 	import BlockedByModal from '$lib/components/BlockedByModal.svelte';
 	import { t } from '$lib/i18n';
@@ -82,7 +82,7 @@
 		return {
 			value: code,
 			label: m.native,
-			desc: i === 0 ? `${code.toUpperCase()} · ${t('locales.primary')}` : code.toUpperCase(),
+			desc: i === 0 ? `${code.toUpperCase()} Р’В· ${t('locales.primary')}` : code.toUpperCase(),
 		};
 	}));
 	/**
@@ -261,7 +261,7 @@
 				capabilitiesCache.fetch(),
 				supportedLocalesCache.fetch(),
 			]);
-		} catch (err: any) { error = err.message || t('common.loadError'); snackError(error); }
+		} catch (err: unknown) { error = errMsg(err, t('common.loadError')); snackError(error); }
 		finally { loaded = true; highlightFromUrl(); _openEditFromUrl(); handleDeepLink(); }
 	}
 
@@ -347,7 +347,7 @@
 			else await api('/template-configs', { method: 'POST', body: JSON.stringify(form) });
 			showForm = false; editing = null; await load();
 			snackSuccess(t('snack.templateSaved'));
-		} catch (err: any) { error = err.message; snackError(err.message); }
+		} catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); }
 	}
 
 	/**
@@ -404,8 +404,8 @@
 				refreshAllPreviews();
 			}
 			snackSuccess(t('templateConfig.resetApplied'));
-		} catch (err: any) {
-			snackError(err.message);
+		} catch (err: unknown) {
+			snackError(errMsg(err));
 		}
 	}
 
@@ -442,7 +442,7 @@
 			label: t('templateConfig.slots'),
 			tone: slotCount > 0 ? 'sky' : 'default',
 		});
-		// Locale coverage — count unique locales present across all slots
+		// Locale coverage РІР‚вЂќ count unique locales present across all slots
 		const locales = new Set<string>();
 		for (const s of Object.values(config.slots || {})) {
 			for (const loc of Object.keys(s || {})) locales.add(loc);
@@ -472,10 +472,10 @@
 			id,
 			onconfirm: async () => {
 				try { await api(`/template-configs/${id}`, { method: 'DELETE' }); await load(); snackSuccess(t('snack.templateDeleted')); }
-				catch (err: any) {
+				catch (err: unknown) {
 					const bb = getBlockedBy(err);
 					if (bb) { blockedBy = bb; return; }
-					error = err.message; snackError(err.message);
+					const m = errMsg(err); error = m; snackError(m);
 				}
 				finally { confirmDelete = null; }
 			}
@@ -624,9 +624,9 @@
 
 								{#if slotErrors[slot.key]}
 									{#if slotErrorTypes[slot.key] === 'undefined'}
-										<p class="mt-1 text-xs" style="color: var(--color-warning-fg);">⚠ {t('common.undefinedVar')}: {slotErrors[slot.key]}</p>
+										<p class="mt-1 text-xs" style="color: var(--color-warning-fg);">РІС™В  {t('common.undefinedVar')}: {slotErrors[slot.key]}</p>
 									{:else}
-										<p class="mt-1 text-xs" style="color: var(--color-error-fg);">✕ {t('common.syntaxError')}: {slotErrors[slot.key]}{slotErrorLines[slot.key] ? ` (${t('common.line')} ${slotErrorLines[slot.key]})` : ''}</p>
+										<p class="mt-1 text-xs" style="color: var(--color-error-fg);">РІСљвЂў {t('common.syntaxError')}: {slotErrors[slot.key]}{slotErrorLines[slot.key] ? ` (${t('common.line')} ${slotErrorLines[slot.key]})` : ''}</p>
 									{/if}
 								{/if}
 							</CollapsibleSlot>
diff --git a/frontend/src/routes/tracking-configs/+page.svelte b/frontend/src/routes/tracking-configs/+page.svelte
index b662012..0390d07 100644
--- a/frontend/src/routes/tracking-configs/+page.svelte
+++ b/frontend/src/routes/tracking-configs/+page.svelte
@@ -1,7 +1,7 @@
 <script lang="ts">
 	import { onMount, onDestroy } from 'svelte';
 	import { slide } from 'svelte/transition';
-	import { api, getBlockedBy, type BlockedByDetail } from '$lib/api';
+	import { api, getBlockedBy, type BlockedByDetail , errMsg} from '$lib/api';
 	import { topbarAction } from '$lib/stores/topbar-action.svelte';
 	import BlockedByModal from '$lib/components/BlockedByModal.svelte';
 	import { t } from '$lib/i18n';
@@ -28,13 +28,13 @@
 	import Button from '$lib/components/Button.svelte';
 	import MetaStrip, { type MetaTile } from '$lib/components/MetaStrip.svelte';
 
-	/** Grid-select item source lookup — maps descriptor string name to actual function. */
+	/** Grid-select item source lookup РІР‚вЂќ maps descriptor string name to actual function. */
 	const gridItemSources: Record<string, () => any[]> = {
 		sortByItems, sortOrderItems, albumModeItems, assetTypeItems, memorySourceItems,
 	};
 
 	/**
-	 * HH:MM, comma-separated: "09:00" or "09:00, 18:30" — the only format cron
+	 * HH:MM, comma-separated: "09:00" or "09:00, 18:30" РІР‚вЂќ the only format cron
 	 * dispatch accepts. Matched on blur for time-list fields; invalid values
 	 * are surfaced inline next to the input.
 	 */
@@ -43,7 +43,7 @@
 	/** Per-field error messages surfaced inline under time-list inputs. */
 	let timeListErrors = $state<Record<string, string>>({});
 
-	/** Normalize "9:0 , 18:30" → "09:00,18:30" on blur, clear error when valid. */
+	/** Normalize "9:0 , 18:30" РІвЂ вЂ™ "09:00,18:30" on blur, clear error when valid. */
 	function normalizeTimeList(key: string) {
 		const raw = String(form[key] ?? '').trim();
 		if (!raw) { timeListErrors = { ...timeListErrors, [key]: '' }; return; }
@@ -74,8 +74,8 @@
 	}
 
 	/**
-	 * Quiet-hours preview: "22:00 → 07:00 next day (9h)" or "Quiet period is 0
-	 * minutes — adjust times" when start equals end. Handles overnight ranges
+	 * Quiet-hours preview: "22:00 РІвЂ вЂ™ 07:00 next day (9h)" or "Quiet period is 0
+	 * minutes РІР‚вЂќ adjust times" when start equals end. Handles overnight ranges
 	 * (start > end) correctly.
 	 */
 	function quietHoursPreview(start: string, end: string): string {
@@ -92,8 +92,8 @@
 		const m = span % 60;
 		const dur = m === 0 ? `${h}h` : `${h}h ${m}m`;
 		const arrow = overnight
-			? `${start} → ${end} ${t('trackingConfig.nextDay')}`
-			: `${start} → ${end}`;
+			? `${start} РІвЂ вЂ™ ${end} ${t('trackingConfig.nextDay')}`
+			: `${start} РІвЂ вЂ™ ${end}`;
 		return `${arrow} (${dur})`;
 	}
 
@@ -112,12 +112,12 @@
 	/**
 	 * Inline preview of the shipped default template for a scheduled/periodic/
 	 * memory slot. Using the shipped default (not a tracker's current template)
-	 * keeps this scoped to the tracking-config page — which has no concept of
+	 * keeps this scoped to the tracking-config page РІР‚вЂќ which has no concept of
 	 * which TemplateConfig a given tracker uses. Users who want to edit the
 	 * actual config can click "Edit template" in the modal footer.
 	 *
 	 * ``previewLocale`` is modal-scoped so switching tabs only refetches for
-	 * this preview — the user's UI locale (and other previews) are untouched.
+	 * this preview РІР‚вЂќ the user's UI locale (and other previews) are untouched.
 	 */
 	let previewModal = $state<{ slotName: string; rendered: string; error: string; locale: string } | null>(null);
 	let previewLoading = $state(false);
@@ -158,8 +158,8 @@
 				error: res?.error || '',
 				locale,
 			};
-		} catch (err: any) {
-			previewModal = { slotName, rendered: '', error: err.message, locale };
+		} catch (err: unknown) {
+			previewModal = { slotName, rendered: '', error: errMsg(err), locale };
 		} finally {
 			previewLoading = false;
 		}
@@ -217,7 +217,7 @@
 	});
 	async function load() {
 		try { await trackingConfigsCache.fetch(true); }
-		catch (err: any) { error = err.message || t('common.loadError'); snackError(error); }
+		catch (err: unknown) { error = errMsg(err, t('common.loadError')); snackError(error); }
 		finally { loaded = true; highlightFromUrl(); _openEditFromUrl(); }
 	}
 
@@ -262,7 +262,7 @@
 		if (config.quiet_hours_start && config.quiet_hours_end) {
 			tiles.push({
 				icon: 'mdiWeatherNight',
-				label: `${config.quiet_hours_start}–${config.quiet_hours_end}`,
+				label: `${config.quiet_hours_start}РІР‚вЂњ${config.quiet_hours_end}`,
 				hint: t('trackingConfig.quietHoursStart'),
 				tone: 'citrus',
 				mono: true,
@@ -285,7 +285,7 @@
 			else await api('/tracking-configs', { method: 'POST', body: JSON.stringify(form) });
 			showForm = false; editing = null; await load();
 			snackSuccess(t('snack.trackingConfigSaved'));
-		} catch (err: any) { error = err.message; snackError(err.message); }
+		} catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); }
 	}
 
 	let blockedBy = $state<BlockedByDetail | null>(null);
@@ -294,10 +294,10 @@
 			id,
 			onconfirm: async () => {
 				try { await api(`/tracking-configs/${id}`, { method: 'DELETE' }); await load(); snackSuccess(t('snack.trackingConfigDeleted')); }
-				catch (err: any) {
+				catch (err: unknown) {
 					const bb = getBlockedBy(err);
 					if (bb) { blockedBy = bb; return; }
-					error = err.message; snackError(err.message);
+					const m = errMsg(err); error = m; snackError(m);
 				}
 				finally { confirmDelete = null; }
 			}
@@ -344,7 +344,7 @@
 			{/if}
 		</div>
 
-		<!-- Event tracking — driven by descriptor -->
+		<!-- Event tracking РІР‚вЂќ driven by descriptor -->
 		{#if descriptor}
 		<fieldset class="border border-[var(--color-border)] rounded-md p-3">
 			<legend class="text-sm font-medium px-1">{t('trackingConfig.eventTracking')}</legend>
@@ -377,7 +377,7 @@
 			{/if}
 		</fieldset>
 
-		<!-- Feature sections (periodic, scheduled, memory) — driven by descriptor -->
+		<!-- Feature sections (periodic, scheduled, memory) РІР‚вЂќ driven by descriptor -->
 		{#each descriptor.featureSections ?? [] as section (section.key)}
 		<fieldset class="border border-[var(--color-border)] rounded-md p-3">
 			<legend class="text-sm font-medium px-1">
@@ -494,9 +494,9 @@
 						</div>
 						<p class="text-sm text-[var(--color-muted-foreground)] list-row__secondary">
 							{(desc?.eventFields ?? []).filter(f => (config as Record<string, any>)[f.key]).map(f => t(f.label)).join(', ')}
-							{config.periodic_enabled ? ` · ${t('trackingConfig.periodic')}` : ''}
-							{config.scheduled_enabled ? ` · ${t('trackingConfig.scheduled')}` : ''}
-							{config.memory_enabled ? ` · ${t('trackingConfig.memory')}` : ''}
+							{config.periodic_enabled ? ` Р’В· ${t('trackingConfig.periodic')}` : ''}
+							{config.scheduled_enabled ? ` Р’В· ${t('trackingConfig.scheduled')}` : ''}
+							{config.memory_enabled ? ` Р’В· ${t('trackingConfig.memory')}` : ''}
 						</p>
 					</div>
 					<MetaStrip tiles={trackingConfigTiles(config)} />
@@ -518,7 +518,7 @@
 <BlockedByModal open={!!blockedBy} detail={blockedBy} onclose={() => blockedBy = null} />
 
 <Modal open={previewModal !== null}
-	title={previewModal ? `${t('trackingConfig.previewTemplate')} — ${previewModal.slotName}` : ''}
+	title={previewModal ? `${t('trackingConfig.previewTemplate')} РІР‚вЂќ ${previewModal.slotName}` : ''}
 	onclose={() => previewModal = null}>
 	{#if previewModal}
 		{#if previewLocales.length > 1}
@@ -537,7 +537,7 @@
 			{t('trackingConfig.previewSampleNote')}
 		</p>
 		<!-- Keep the prior rendered/error box mounted while refetching on locale
-		     switch — just dim it. Unmounting and replacing with a small "…"
+		     switch РІР‚вЂќ just dim it. Unmounting and replacing with a small "РІР‚В¦"
 		     placeholder caused a one-frame layout jump as the modal shrank and
 		     then re-expanded. -->
 		<div class="relative mb-3" style="opacity: {previewLoading ? 0.5 : 1}; transition: opacity 0.15s ease;">
@@ -550,7 +550,7 @@
 					<pre class="whitespace-pre-wrap text-xs">{@html sanitizePreview(previewModal.rendered)}</pre>
 				</div>
 			{:else}
-				<div class="p-3 text-xs" style="color: var(--color-muted-foreground);">…</div>
+				<div class="p-3 text-xs" style="color: var(--color-muted-foreground);">РІР‚В¦</div>
 			{/if}
 		</div>
 		<div class="flex gap-2 justify-end mt-3">
diff --git a/frontend/src/routes/users/+page.svelte b/frontend/src/routes/users/+page.svelte
index 0fdb8f0..09635c3 100644
--- a/frontend/src/routes/users/+page.svelte
+++ b/frontend/src/routes/users/+page.svelte
@@ -1,227 +1,225 @@
-<script lang="ts">
-	import { onMount } from 'svelte';
-	import { api, parseDate } from '$lib/api';
-	import { t } from '$lib/i18n';
-	import { getAuth } from '$lib/auth.svelte';
-	import PageHeader from '$lib/components/PageHeader.svelte';
-	import Card from '$lib/components/Card.svelte';
-	import Loading from '$lib/components/Loading.svelte';
-	import Modal from '$lib/components/Modal.svelte';
-	import ConfirmModal from '$lib/components/ConfirmModal.svelte';
-	import MdiIcon from '$lib/components/MdiIcon.svelte';
-	import EmptyState from '$lib/components/EmptyState.svelte';
-	import IconButton from '$lib/components/IconButton.svelte';
-	import { snackSuccess, snackError } from '$lib/stores/snackbar.svelte';
-	import ErrorBanner from '$lib/components/ErrorBanner.svelte';
-	import Button from '$lib/components/Button.svelte';
-	import MetaStrip, { type MetaTile } from '$lib/components/MetaStrip.svelte';
-	import type { User } from '$lib/types';
-
-	const auth = getAuth();
-	let users = $state<User[]>([]);
-	let showForm = $state(false);
-	let form = $state({ username: '', password: '', role: 'user' });
-	let error = $state('');
-	let loaded = $state(false);
-	let confirmDelete = $state<{ id: number; onconfirm: () => Promise<void> } | null>(null);
-
-	// Admin reset password
-	let resetUserId = $state<number | null>(null);
-	let resetUsername = $state('');
-	let resetPassword = $state('');
-	let resetMsg = $state('');
-	let resetSuccess = $state(false);
-
-	// Admin edit username/role
-	let editUserId = $state<number | null>(null);
-	let editUsername = $state('');
-	let editRole = $state('user');
-	let editMsg = $state('');
-	let editSuccess = $state(false);
-
-	onMount(load);
-	async function load() {
-		try { users = await api('/users'); }
-		catch (err: any) { error = err.message || t('common.loadError'); snackError(error); }
-		finally { loaded = true; }
-	}
-
-	async function create(e: SubmitEvent) {
-		e.preventDefault(); error = '';
-		try { await api('/users', { method: 'POST', body: JSON.stringify(form) }); form = { username: '', password: '', role: 'user' }; showForm = false; await load(); snackSuccess(t('snack.userCreated')); }
-		catch (err: any) { error = err.message; snackError(err.message); }
-	}
-	function remove(id: number) {
-		confirmDelete = {
-			id,
-			onconfirm: async () => {
-				try { await api(`/users/${id}`, { method: 'DELETE' }); await load(); snackSuccess(t('snack.userDeleted')); }
-				catch (err: any) { error = err.message; snackError(err.message); }
-				finally { confirmDelete = null; }
-			}
-		};
-	}
-	function openResetPassword(user: any) {
-		resetUserId = user.id; resetUsername = user.username; resetPassword = ''; resetMsg = ''; resetSuccess = false;
-	}
-	function openEditUser(user: any) {
-		editUserId = user.id; editUsername = user.username; editRole = user.role; editMsg = ''; editSuccess = false;
-	}
-	async function saveUserEdit(e: SubmitEvent) {
-		e.preventDefault(); editMsg = ''; editSuccess = false;
-		try {
-			await api(`/users/${editUserId}`, { method: 'PATCH', body: JSON.stringify({ username: editUsername, role: editRole }) });
-			editMsg = t('snack.userUpdated');
-			editSuccess = true;
-			snackSuccess(editMsg);
-			await load();
-			setTimeout(() => { editUserId = null; editMsg = ''; editSuccess = false; }, 1200);
-		} catch (err: any) { editMsg = err.message; editSuccess = false; snackError(err.message); }
-	}
-	async function resetUserPassword(e: SubmitEvent) {
-		e.preventDefault(); resetMsg = ''; resetSuccess = false;
-		try {
-			await api(`/users/${resetUserId}/password`, { method: 'PUT', body: JSON.stringify({ new_password: resetPassword }) });
-			resetMsg = t('common.passwordChanged');
-			resetSuccess = true;
-			snackSuccess(t('snack.passwordChanged'));
-			setTimeout(() => { resetUserId = null; resetMsg = ''; resetSuccess = false; }, 2000);
-		} catch (err: any) { resetMsg = err.message; resetSuccess = false; snackError(err.message); }
-	}
-
-	function userTiles(user: User): MetaTile[] {
-		const tiles: MetaTile[] = [];
-		const isAdmin = user.role === 'admin';
-		tiles.push({
-			icon: isAdmin ? 'mdiShieldCrownOutline' : 'mdiAccountOutline',
-			label: isAdmin ? t('users.roleAdmin') : t('users.roleUser'),
-			tone: isAdmin ? 'orchid' : 'sky',
-		});
-		tiles.push({
-			icon: 'mdiCalendarOutline',
-			label: parseDate(user.created_at).toLocaleDateString(),
-			hint: t('users.joined'),
-			tone: 'lavender',
-			mono: true,
-		});
-		if (user.id === auth.user?.id) {
-			tiles.push({
-				icon: 'mdiAccountStar',
-				label: t('users.you', 'you'),
-				tone: 'mint',
-			});
-		}
-		return tiles;
-	}
-</script>
-
-<PageHeader
-	title={t('users.title')}
-	emphasis={t('users.titleEmphasis')}
-	description={t('users.description')}
-	crumb={t('crumbs.systemAccess')}
-	count={users.length}
-	countLabel={t('users.countLabel')}
->
-	<Button size="sm" onclick={() => showForm = !showForm}>
-		{showForm ? t('users.cancel') : t('users.addUser')}
-	</Button>
-</PageHeader>
-
-{#if !loaded}<Loading />{:else}
-
-{#if showForm}
-	<Card class="mb-6">
-		{#if error}<ErrorBanner message={error} />{/if}
-		<form onsubmit={create} class="space-y-3">
-			<div>
-				<label for="usr-name" class="block text-sm font-medium mb-1">{t('users.username')}</label>
-				<input id="usr-name" bind:value={form.username} required class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]" />
-			</div>
-			<div>
-				<label for="usr-pass" class="block text-sm font-medium mb-1">{t('users.password')}</label>
-				<input id="usr-pass" bind:value={form.password} required type="password" class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]" />
-			</div>
-			<div>
-				<label for="usr-role" class="block text-sm font-medium mb-1">{t('users.role')}</label>
-				<select id="usr-role" bind:value={form.role} class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]">
-					<option value="user">{t('users.roleUser')}</option>
-					<option value="admin">{t('users.roleAdmin')}</option>
-				</select>
-			</div>
-			<Button type="submit">{t('users.create')}</Button>
-		</form>
-	</Card>
-{/if}
-
-{#if users.length === 0}
-<Card>
-	<EmptyState icon="mdiAccountGroup" message={t('users.noUsers')} />
-</Card>
-{:else}
-<div class="list-stack stagger-children">
-	{#each users as user}
-		<Card hover>
-			<div class="list-row">
-				<div class="list-row__identity">
-					<p class="font-medium truncate">{user.username}</p>
-					<p class="text-sm text-[var(--color-muted-foreground)] list-row__secondary">{user.role === 'admin' ? t('users.roleAdmin') : t('users.roleUser')} · {t('users.joined')} {parseDate(user.created_at).toLocaleDateString()}</p>
-				</div>
-				<MetaStrip tiles={userTiles(user)} />
-				<div class="list-row__actions">
-					<IconButton icon="mdiPencil" title={t('users.edit')} onclick={() => openEditUser(user)} />
-					{#if user.id !== auth.user?.id}
-						<IconButton icon="mdiKeyVariant" title={t('common.changePassword')} onclick={() => openResetPassword(user)} />
-						<IconButton icon="mdiDelete" title={t('users.delete')} onclick={() => remove(user.id)} variant="danger" />
-					{/if}
-				</div>
-			</div>
-		</Card>
-	{/each}
-</div>
-{/if}
-
-{/if}
-
-<!-- Admin reset password modal -->
-<Modal open={resetUserId !== null} title="{t('common.changePassword')}: {resetUsername}" onclose={() => { resetUserId = null; resetMsg = ''; resetSuccess = false; }}>
-	<form onsubmit={resetUserPassword} class="space-y-3">
-		<div>
-			<label for="reset-pwd" class="block text-sm font-medium mb-1">{t('common.newPassword')}</label>
-			<input id="reset-pwd" type="password" bind:value={resetPassword} required
-				class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]" />
-		</div>
-		{#if resetMsg}
-			<p class="text-sm {resetSuccess ? 'text-[var(--color-success-fg)]' : 'text-[var(--color-error-fg)]'}">{resetMsg}</p>
-		{/if}
-		<Button type="submit" class="w-full">
-			{t('common.save')}
-		</Button>
-	</form>
-</Modal>
-
-<!-- Admin edit username/role modal -->
-<Modal open={editUserId !== null} title={t('users.edit')} onclose={() => { editUserId = null; editMsg = ''; editSuccess = false; }}>
-	<form onsubmit={saveUserEdit} class="space-y-3">
-		<div>
-			<label for="edit-username" class="block text-sm font-medium mb-1">{t('users.username')}</label>
-			<input id="edit-username" bind:value={editUsername} required
-				class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]" />
-		</div>
-		<div>
-			<label for="edit-role" class="block text-sm font-medium mb-1">{t('users.role')}</label>
-			<select id="edit-role" bind:value={editRole}
-				class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]">
-				<option value="user">{t('users.roleUser')}</option>
-				<option value="admin">{t('users.roleAdmin')}</option>
-			</select>
-		</div>
-		{#if editMsg}
-			<p class="text-sm {editSuccess ? 'text-[var(--color-success-fg)]' : 'text-[var(--color-error-fg)]'}">{editMsg}</p>
-		{/if}
-		<Button type="submit" class="w-full">{t('common.save')}</Button>
-	</form>
-</Modal>
-
-<ConfirmModal open={confirmDelete !== null} message={t('users.confirmDelete')}
-	onconfirm={() => confirmDelete?.onconfirm()} oncancel={() => confirmDelete = null} />
+<script lang="ts">
+	import { onMount } from 'svelte';
+	import { api, parseDate , errMsg} from '$lib/api';
+	import { t } from '$lib/i18n';
+	import { getAuth } from '$lib/auth.svelte';
+	import PageHeader from '$lib/components/PageHeader.svelte';
+	import Card from '$lib/components/Card.svelte';
+	import Loading from '$lib/components/Loading.svelte';
+	import Modal from '$lib/components/Modal.svelte';
+	import ConfirmModal from '$lib/components/ConfirmModal.svelte';
+	import MdiIcon from '$lib/components/MdiIcon.svelte';
+	import EmptyState from '$lib/components/EmptyState.svelte';
+	import IconButton from '$lib/components/IconButton.svelte';
+	import { snackSuccess, snackError } from '$lib/stores/snackbar.svelte';
+	import ErrorBanner from '$lib/components/ErrorBanner.svelte';
+	import Button from '$lib/components/Button.svelte';
+	import MetaStrip, { type MetaTile } from '$lib/components/MetaStrip.svelte';
+	import type { User } from '$lib/types';
+
+	const auth = getAuth();
+	let users = $state<User[]>([]);
+	let showForm = $state(false);
+	let form = $state({ username: '', password: '', role: 'user' });
+	let error = $state('');
+	let loaded = $state(false);
+	let confirmDelete = $state<{ id: number; onconfirm: () => Promise<void> } | null>(null);
+
+	// Admin reset password
+	let resetUserId = $state<number | null>(null);
+	let resetUsername = $state('');
+	let resetPassword = $state('');
+	let resetMsg = $state('');
+	let resetSuccess = $state(false);
+
+	// Admin edit username/role
+	let editUserId = $state<number | null>(null);
+	let editUsername = $state('');
+	let editRole = $state('user');
+	let editMsg = $state('');
+	let editSuccess = $state(false);
+
+	onMount(load);
+	async function load() {
+		try { users = await api('/users'); }
+		catch (err: unknown) { error = errMsg(err, t('common.loadError')); snackError(error); }
+		finally { loaded = true; }
+	}
+
+	async function create(e: SubmitEvent) {
+		e.preventDefault(); error = '';
+		try { await api('/users', { method: 'POST', body: JSON.stringify(form) }); form = { username: '', password: '', role: 'user' }; showForm = false; await load(); snackSuccess(t('snack.userCreated')); } catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); }
+	}
+	function remove(id: number) {
+		confirmDelete = {
+			id,
+			onconfirm: async () => {
+				try { await api(`/users/${id}`, { method: 'DELETE' }); await load(); snackSuccess(t('snack.userDeleted')); } catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); }
+				finally { confirmDelete = null; }
+			}
+		};
+	}
+	function openResetPassword(user: any) {
+		resetUserId = user.id; resetUsername = user.username; resetPassword = ''; resetMsg = ''; resetSuccess = false;
+	}
+	function openEditUser(user: any) {
+		editUserId = user.id; editUsername = user.username; editRole = user.role; editMsg = ''; editSuccess = false;
+	}
+	async function saveUserEdit(e: SubmitEvent) {
+		e.preventDefault(); editMsg = ''; editSuccess = false;
+		try {
+			await api(`/users/${editUserId}`, { method: 'PATCH', body: JSON.stringify({ username: editUsername, role: editRole }) });
+			editMsg = t('snack.userUpdated');
+			editSuccess = true;
+			snackSuccess(editMsg);
+			await load();
+			setTimeout(() => { editUserId = null; editMsg = ''; editSuccess = false; }, 1200);
+		} catch (err: unknown) { const __m = errMsg(err); editMsg = __m; editSuccess = false; snackError(__m); }
+	}
+	async function resetUserPassword(e: SubmitEvent) {
+		e.preventDefault(); resetMsg = ''; resetSuccess = false;
+		try {
+			await api(`/users/${resetUserId}/password`, { method: 'PUT', body: JSON.stringify({ new_password: resetPassword }) });
+			resetMsg = t('common.passwordChanged');
+			resetSuccess = true;
+			snackSuccess(t('snack.passwordChanged'));
+			setTimeout(() => { resetUserId = null; resetMsg = ''; resetSuccess = false; }, 2000);
+		} catch (err: unknown) { const __m = errMsg(err); resetMsg = __m; resetSuccess = false; snackError(__m); }
+	}
+
+	function userTiles(user: User): MetaTile[] {
+		const tiles: MetaTile[] = [];
+		const isAdmin = user.role === 'admin';
+		tiles.push({
+			icon: isAdmin ? 'mdiShieldCrownOutline' : 'mdiAccountOutline',
+			label: isAdmin ? t('users.roleAdmin') : t('users.roleUser'),
+			tone: isAdmin ? 'orchid' : 'sky',
+		});
+		tiles.push({
+			icon: 'mdiCalendarOutline',
+			label: parseDate(user.created_at).toLocaleDateString(),
+			hint: t('users.joined'),
+			tone: 'lavender',
+			mono: true,
+		});
+		if (user.id === auth.user?.id) {
+			tiles.push({
+				icon: 'mdiAccountStar',
+				label: t('users.you', 'you'),
+				tone: 'mint',
+			});
+		}
+		return tiles;
+	}
+</script>
+
+<PageHeader
+	title={t('users.title')}
+	emphasis={t('users.titleEmphasis')}
+	description={t('users.description')}
+	crumb={t('crumbs.systemAccess')}
+	count={users.length}
+	countLabel={t('users.countLabel')}
+>
+	<Button size="sm" onclick={() => showForm = !showForm}>
+		{showForm ? t('users.cancel') : t('users.addUser')}
+	</Button>
+</PageHeader>
+
+{#if !loaded}<Loading />{:else}
+
+{#if showForm}
+	<Card class="mb-6">
+		{#if error}<ErrorBanner message={error} />{/if}
+		<form onsubmit={create} class="space-y-3">
+			<div>
+				<label for="usr-name" class="block text-sm font-medium mb-1">{t('users.username')}</label>
+				<input id="usr-name" bind:value={form.username} required class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]" />
+			</div>
+			<div>
+				<label for="usr-pass" class="block text-sm font-medium mb-1">{t('users.password')}</label>
+				<input id="usr-pass" bind:value={form.password} required type="password" class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]" />
+			</div>
+			<div>
+				<label for="usr-role" class="block text-sm font-medium mb-1">{t('users.role')}</label>
+				<select id="usr-role" bind:value={form.role} class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]">
+					<option value="user">{t('users.roleUser')}</option>
+					<option value="admin">{t('users.roleAdmin')}</option>
+				</select>
+			</div>
+			<Button type="submit">{t('users.create')}</Button>
+		</form>
+	</Card>
+{/if}
+
+{#if users.length === 0}
+<Card>
+	<EmptyState icon="mdiAccountGroup" message={t('users.noUsers')} />
+</Card>
+{:else}
+<div class="list-stack stagger-children">
+	{#each users as user}
+		<Card hover>
+			<div class="list-row">
+				<div class="list-row__identity">
+					<p class="font-medium truncate">{user.username}</p>
+					<p class="text-sm text-[var(--color-muted-foreground)] list-row__secondary">{user.role === 'admin' ? t('users.roleAdmin') : t('users.roleUser')} В· {t('users.joined')} {parseDate(user.created_at).toLocaleDateString()}</p>
+				</div>
+				<MetaStrip tiles={userTiles(user)} />
+				<div class="list-row__actions">
+					<IconButton icon="mdiPencil" title={t('users.edit')} onclick={() => openEditUser(user)} />
+					{#if user.id !== auth.user?.id}
+						<IconButton icon="mdiKeyVariant" title={t('common.changePassword')} onclick={() => openResetPassword(user)} />
+						<IconButton icon="mdiDelete" title={t('users.delete')} onclick={() => remove(user.id)} variant="danger" />
+					{/if}
+				</div>
+			</div>
+		</Card>
+	{/each}
+</div>
+{/if}
+
+{/if}
+
+<!-- Admin reset password modal -->
+<Modal open={resetUserId !== null} title="{t('common.changePassword')}: {resetUsername}" onclose={() => { resetUserId = null; resetMsg = ''; resetSuccess = false; }}>
+	<form onsubmit={resetUserPassword} class="space-y-3">
+		<div>
+			<label for="reset-pwd" class="block text-sm font-medium mb-1">{t('common.newPassword')}</label>
+			<input id="reset-pwd" type="password" bind:value={resetPassword} required
+				class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]" />
+		</div>
+		{#if resetMsg}
+			<p class="text-sm {resetSuccess ? 'text-[var(--color-success-fg)]' : 'text-[var(--color-error-fg)]'}">{resetMsg}</p>
+		{/if}
+		<Button type="submit" class="w-full">
+			{t('common.save')}
+		</Button>
+	</form>
+</Modal>
+
+<!-- Admin edit username/role modal -->
+<Modal open={editUserId !== null} title={t('users.edit')} onclose={() => { editUserId = null; editMsg = ''; editSuccess = false; }}>
+	<form onsubmit={saveUserEdit} class="space-y-3">
+		<div>
+			<label for="edit-username" class="block text-sm font-medium mb-1">{t('users.username')}</label>
+			<input id="edit-username" bind:value={editUsername} required
+				class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]" />
+		</div>
+		<div>
+			<label for="edit-role" class="block text-sm font-medium mb-1">{t('users.role')}</label>
+			<select id="edit-role" bind:value={editRole}
+				class="w-full px-3 py-2 border border-[var(--color-border)] rounded-md text-sm bg-[var(--color-background)]">
+				<option value="user">{t('users.roleUser')}</option>
+				<option value="admin">{t('users.roleAdmin')}</option>
+			</select>
+		</div>
+		{#if editMsg}
+			<p class="text-sm {editSuccess ? 'text-[var(--color-success-fg)]' : 'text-[var(--color-error-fg)]'}">{editMsg}</p>
+		{/if}
+		<Button type="submit" class="w-full">{t('common.save')}</Button>
+	</form>
+</Modal>
+
+<ConfirmModal open={confirmDelete !== null} message={t('users.confirmDelete')}
+	onconfirm={() => confirmDelete?.onconfirm()} oncancel={() => confirmDelete = null} />
diff --git a/packages/core/src/notify_bridge_core/models/events.py b/packages/core/src/notify_bridge_core/models/events.py
index 41128ee..ae1ba72 100644
--- a/packages/core/src/notify_bridge_core/models/events.py
+++ b/packages/core/src/notify_bridge_core/models/events.py
@@ -71,6 +71,12 @@ class EventType(str, Enum):
     HA_SERVICE_CALLED = "ha_service_called"
     HA_EVENT_FIRED = "ha_event_fired"
 
+    # Bridge self-monitoring events — emitted by the bridge itself when
+    # internal failures cross configured thresholds.
+    BRIDGE_SELF_POLL_FAILURES = "bridge_self_poll_failures"
+    BRIDGE_SELF_DEFERRED_BACKLOG = "bridge_self_deferred_backlog"
+    BRIDGE_SELF_TARGET_FAILURES = "bridge_self_target_failures"
+
 
 @dataclass
 class ServiceEvent:
diff --git a/packages/core/src/notify_bridge_core/notifications/dispatcher.py b/packages/core/src/notify_bridge_core/notifications/dispatcher.py
index 01e04a3..27470d6 100644
--- a/packages/core/src/notify_bridge_core/notifications/dispatcher.py
+++ b/packages/core/src/notify_bridge_core/notifications/dispatcher.py
@@ -107,6 +107,12 @@ class NotificationDispatcher:
         # Optional shared session owned by the caller; when supplied we reuse
         # its connection pool instead of opening a fresh per-dispatch session.
         self._shared_session = session
+        # Per-dispatch render cache, keyed by locale. Populated by
+        # ``_send_to_target`` and consumed inside ``_message_for_receiver``
+        # so a 100-receiver fan-out renders each unique locale once.
+        # Initialized to empty so handlers called outside the normal
+        # dispatch path (tests) still see a valid dict.
+        self._render_cache: dict[str, str] = {}
 
     @contextlib.asynccontextmanager
     async def _session_ctx(self) -> AsyncIterator[aiohttp.ClientSession]:
@@ -198,20 +204,49 @@ class NotificationDispatcher:
     def _message_for_receiver(
         self, receiver: Receiver, default_message: str,
         event: ServiceEvent, target: TargetConfig,
+        cache: dict[str, str] | None = None,
     ) -> str:
-        if receiver.locale and receiver.locale != target.locale:
-            return self._render_message(event, target, receiver.locale)
-        return default_message
+        """Render message respecting receiver locale, with optional cache.
+
+        The ``cache`` dict (typically created in ``_send_to_target`` and
+        threaded through the per-channel ``_send_*`` handlers) memoizes
+        per-locale renders so a 100-receiver fan-out with two locales
+        renders twice instead of one hundred times.
+        """
+        loc = receiver.locale or target.locale
+        if loc == target.locale:
+            return default_message
+        if cache is not None:
+            cached = cache.get(loc)
+            if cached is not None:
+                return cached
+            rendered = self._render_message(event, target, loc)
+            cache[loc] = rendered
+            return rendered
+        return self._render_message(event, target, loc)
 
     async def _send_to_target(
         self, event: ServiceEvent, target: TargetConfig
     ) -> dict[str, Any]:
-        """Dispatch to a single target via the registered handler."""
+        """Dispatch to a single target via the registered handler.
+
+        Builds a per-locale render cache once and threads it through the
+        send handler. The cache is keyed by receiver locale; the default
+        locale's render lives in ``default_message`` and is short-circuited
+        before any cache lookup.
+        """
         default_message = self._render_message(event, target, target.locale)
         send_method = _PROVIDER_HANDLERS.get(target.type)
         if send_method is None:
             return {"success": False, "error": f"Unknown target type: {target.type}"}
-        return await send_method(self, target, default_message, event)
+        # Stash the cache on the dispatcher instance for the duration of
+        # this dispatch — handlers pick it up via _message_for_receiver.
+        # Avoids changing every _send_* signature.
+        self._render_cache: dict[str, str] = {}
+        try:
+            return await send_method(self, target, default_message, event)
+        finally:
+            self._render_cache = {}
 
     # ------------------------------------------------------------------
     # Asset preload (Telegram-specific)
@@ -352,7 +387,7 @@ class NotificationDispatcher:
             async def send_one(receiver: Receiver) -> dict[str, Any]:
                 if not isinstance(receiver, TelegramReceiver) or not receiver.chat_id:
                     return {"success": False, "error": "Invalid telegram receiver"}
-                message = self._message_for_receiver(receiver, default_message, event, target)
+                message = self._message_for_receiver(receiver, default_message, event, target, cache=self._render_cache)
                 text_result = await client.send_message(
                     chat_id=receiver.chat_id,
                     text=message,
@@ -407,7 +442,7 @@ class NotificationDispatcher:
             async def send_one(receiver: Receiver) -> dict[str, Any]:
                 if not isinstance(receiver, WebhookReceiver) or not receiver.url:
                     return {"success": False, "error": "Invalid webhook receiver"}
-                message = self._message_for_receiver(receiver, default_message, event, target)
+                message = self._message_for_receiver(receiver, default_message, event, target, cache=self._render_cache)
                 payload = {
                     "message": message,
                     "event_type": event.event_type.value,
@@ -450,7 +485,7 @@ class NotificationDispatcher:
         async def send_one(receiver: Receiver) -> dict[str, Any]:
             if not isinstance(receiver, EmailReceiver) or not receiver.email:
                 return {"success": False, "error": "Invalid email receiver"}
-            message = self._message_for_receiver(receiver, default_message, event, target)
+            message = self._message_for_receiver(receiver, default_message, event, target, cache=self._render_cache)
             # body_html=None lets EmailClient build a safely-escaped HTML
             # alternative from body_text instead of trusting user content.
             return await email_client.send(
@@ -479,7 +514,7 @@ class NotificationDispatcher:
             async def send_one(receiver: Receiver) -> dict[str, Any]:
                 if not isinstance(receiver, DiscordReceiver) or not receiver.webhook_url:
                     return {"success": False, "error": "Invalid discord receiver"}
-                message = self._message_for_receiver(receiver, default_message, event, target)
+                message = self._message_for_receiver(receiver, default_message, event, target, cache=self._render_cache)
                 return await client.send(receiver.webhook_url, message, username=username)
 
             results = await self._fan_out(target.receivers, send_one)
@@ -501,7 +536,7 @@ class NotificationDispatcher:
             async def send_one(receiver: Receiver) -> dict[str, Any]:
                 if not isinstance(receiver, SlackReceiver) or not receiver.webhook_url:
                     return {"success": False, "error": "Invalid slack receiver"}
-                message = self._message_for_receiver(receiver, default_message, event, target)
+                message = self._message_for_receiver(receiver, default_message, event, target, cache=self._render_cache)
                 return await client.send(receiver.webhook_url, message, username=username)
 
             results = await self._fan_out(target.receivers, send_one)
@@ -530,7 +565,7 @@ class NotificationDispatcher:
             async def send_one(receiver: Receiver) -> dict[str, Any]:
                 if not isinstance(receiver, NtfyReceiver) or not receiver.topic:
                     return {"success": False, "error": "Invalid ntfy receiver"}
-                message = self._message_for_receiver(receiver, default_message, event, target)
+                message = self._message_for_receiver(receiver, default_message, event, target, cache=self._render_cache)
                 return await client.send(
                     server_url, receiver.topic, message,
                     title=title, priority=receiver.priority, auth_token=auth_token,
@@ -563,7 +598,7 @@ class NotificationDispatcher:
             async def send_one(receiver: Receiver) -> dict[str, Any]:
                 if not isinstance(receiver, MatrixReceiver) or not receiver.room_id:
                     return {"success": False, "error": "Invalid matrix receiver"}
-                message = self._message_for_receiver(receiver, default_message, event, target)
+                message = self._message_for_receiver(receiver, default_message, event, target, cache=self._render_cache)
                 # body_html is the same plain text — Matrix accepts the
                 # raw message as both ``body`` and ``formatted_body``.
                 # If templates emit HTML in the future, generate a
diff --git a/packages/core/src/notify_bridge_core/notifications/telegram/client.py b/packages/core/src/notify_bridge_core/notifications/telegram/client.py
index 45c48ab..2732198 100644
--- a/packages/core/src/notify_bridge_core/notifications/telegram/client.py
+++ b/packages/core/src/notify_bridge_core/notifications/telegram/client.py
@@ -222,21 +222,48 @@ class TelegramClient:
         """SSRF-guarded GET that returns ``(data, error)``.
 
         Validates the URL via ``avalidate_outbound_url`` before any HTTP
-        traffic. Errors are returned (not raised) and stripped of any
-        embedded secrets before they propagate to the operator-visible
-        result dict.
+        traffic. Redirects are walked manually so each ``Location`` is
+        re-validated — without this an attacker-controlled origin could
+        302 to a private-IP target after the initial guard passed.
+        Errors are returned (not raised) and stripped of any embedded
+        secrets before they propagate to the operator-visible result
+        dict.
         """
+        max_redirects = 3
+        current_url = url
         try:
-            await avalidate_outbound_url(url)
+            await avalidate_outbound_url(current_url)
         except UnsafeURLError as err:
             return None, f"Unsafe URL: {redact_exc(err)}"
         try:
-            async with self._session.get(
-                url, headers=headers or {}, timeout=_DOWNLOAD_TIMEOUT,
-            ) as resp:
-                if resp.status != 200:
-                    return None, f"HTTP {resp.status}"
-                return await resp.read(), None
+            for _ in range(max_redirects + 1):
+                async with self._session.get(
+                    current_url,
+                    headers=headers or {},
+                    timeout=_DOWNLOAD_TIMEOUT,
+                    allow_redirects=False,
+                ) as resp:
+                    if resp.status in (301, 302, 303, 307, 308):
+                        loc = resp.headers.get("Location")
+                        if not loc:
+                            return None, f"HTTP {resp.status} without Location header"
+                        # ``resp.url`` is a yarl.URL; ``.join`` resolves
+                        # relative redirects (``/foo/bar``) against it.
+                        from yarl import URL as _URL
+                        try:
+                            next_url = str(resp.url.join(_URL(loc)))
+                        except (ValueError, TypeError):
+                            return None, "Malformed redirect Location"
+                        try:
+                            await avalidate_outbound_url(next_url)
+                        except UnsafeURLError as err:
+                            return None, f"Unsafe redirect: {redact_exc(err)}"
+                        current_url = next_url
+                        continue
+                    if resp.status != 200:
+                        return None, f"HTTP {resp.status}"
+                    return await resp.read(), None
+            return None, f"Too many redirects (>{max_redirects})"
         except (aiohttp.ClientError, asyncio.TimeoutError, OSError) as err:
             return None, redact_exc(err)
 
diff --git a/packages/core/src/notify_bridge_core/providers/base.py b/packages/core/src/notify_bridge_core/providers/base.py
index 630eded..362f63d 100644
--- a/packages/core/src/notify_bridge_core/providers/base.py
+++ b/packages/core/src/notify_bridge_core/providers/base.py
@@ -22,6 +22,7 @@ class ServiceProviderType(str, Enum):
     GOOGLE_PHOTOS = "google_photos"
     WEBHOOK = "webhook"
     HOME_ASSISTANT = "home_assistant"
+    BRIDGE_SELF = "bridge_self"
 
 
 # Callback signature for push-style providers: a coroutine that accepts a
diff --git a/packages/core/src/notify_bridge_core/providers/bridge_self/__init__.py b/packages/core/src/notify_bridge_core/providers/bridge_self/__init__.py
new file mode 100644
index 0000000..bfe9615
--- /dev/null
+++ b/packages/core/src/notify_bridge_core/providers/bridge_self/__init__.py
@@ -0,0 +1,39 @@
+"""Bridge self-monitoring service provider.
+
+Unlike external providers (Immich, Gitea, NUT, ...), the ``bridge_self``
+provider does not connect to any remote service. Its sole purpose is to
+give operators a configurable surface (thresholds + notification slots
++ trackers + targets) for events that the bridge itself emits when its
+internal subsystems fail.
+
+Three failure conditions are surfaced as :class:`ServiceEvent` instances
+through the same dispatch pipeline that all other providers use:
+
+* ``bridge_self_poll_failures``    — N consecutive poll failures for
+  any tracker exceed the configured threshold.
+* ``bridge_self_deferred_backlog`` — pending ``deferred_dispatch`` row
+  count crosses the configured threshold.
+* ``bridge_self_target_failures``  — N consecutive 5xx / network failures
+  for a single notification target.
+
+Events are constructed by ``services/bridge_self.py`` on the server side
+(it owns DB access for looking up the bridge_self provider per user)
+and then fed into ``dispatch_provider_event`` like any other event.
+"""
+
+from notify_bridge_core.providers.base import ServiceProviderType
+from notify_bridge_core.templates.variables import registry
+
+from .event_parser import build_event
+from .provider import BRIDGE_SELF_VARIABLES, BridgeSelfServiceProvider
+
+# Register variables so the validator and template-vars API see them.
+registry.register_provider_variables(
+    ServiceProviderType.BRIDGE_SELF, BRIDGE_SELF_VARIABLES,
+)
+
+__all__ = [
+    "BRIDGE_SELF_VARIABLES",
+    "BridgeSelfServiceProvider",
+    "build_event",
+]
diff --git a/packages/core/src/notify_bridge_core/providers/bridge_self/event_parser.py b/packages/core/src/notify_bridge_core/providers/bridge_self/event_parser.py
new file mode 100644
index 0000000..9d55114
--- /dev/null
+++ b/packages/core/src/notify_bridge_core/providers/bridge_self/event_parser.py
@@ -0,0 +1,89 @@
+"""Bridge self-monitoring event parser.
+
+The bridge generates these events from internal subsystems (watcher,
+scheduler, dispatcher) — the parser turns a flat payload dict into the
+generic :class:`ServiceEvent` shape that the rest of the dispatch
+pipeline expects.
+
+Payload shape::
+
+    {
+        "failure_type": "poll_failures" | "deferred_backlog" | "target_failures",
+        "subject_id":   int,        # tracker_id, target_id, or 0
+        "subject_name": str,
+        "count":        int,        # consecutive failures or pending count
+        "threshold":    int,
+        "last_error":   str,        # may be empty
+        "details":      dict[str, Any],  # extra context
+    }
+"""
+
+from __future__ import annotations
+
+from datetime import datetime, timezone
+from typing import Any
+
+from notify_bridge_core.models.events import EventType, ServiceEvent
+from notify_bridge_core.providers.base import ServiceProviderType
+
+
+# Defensive cap on the persisted error message; very long tracebacks would
+# bloat the EventLog details JSON column otherwise.
+_MAX_ERROR_LEN = 1000
+
+
+_FAILURE_TYPE_TO_EVENT: dict[str, EventType] = {
+    "poll_failures": EventType.BRIDGE_SELF_POLL_FAILURES,
+    "deferred_backlog": EventType.BRIDGE_SELF_DEFERRED_BACKLOG,
+    "target_failures": EventType.BRIDGE_SELF_TARGET_FAILURES,
+}
+
+
+def build_event(
+    payload: dict[str, Any],
+    *,
+    provider_name: str = "Bridge Self-Monitoring",
+    timestamp: datetime | None = None,
+) -> ServiceEvent | None:
+    """Convert a self-monitoring payload dict into a ServiceEvent.
+
+    Returns None for malformed payloads (unknown failure_type or missing
+    keys) — the caller drops without raising so a misbehaving emitter
+    can never tip over the dispatch pipeline.
+    """
+    if not isinstance(payload, dict):
+        return None
+    failure_type = payload.get("failure_type")
+    event_type = _FAILURE_TYPE_TO_EVENT.get(str(failure_type) if failure_type else "")
+    if event_type is None:
+        return None
+
+    subject_id = int(payload.get("subject_id") or 0)
+    subject_name = str(payload.get("subject_name") or "")
+    count = int(payload.get("count") or 0)
+    threshold = int(payload.get("threshold") or 0)
+    last_error = str(payload.get("last_error") or "")[:_MAX_ERROR_LEN]
+    details = payload.get("details") if isinstance(payload.get("details"), dict) else {}
+
+    when = timestamp or datetime.now(timezone.utc)
+
+    return ServiceEvent(
+        event_type=event_type,
+        provider_type=ServiceProviderType.BRIDGE_SELF,
+        provider_name=provider_name,
+        # ``collection_id`` / ``collection_name`` are required fields on
+        # ServiceEvent; we use the subject so quiet-hours / dedupe logic
+        # treats different subjects as distinct streams.
+        collection_id=str(subject_id),
+        collection_name=subject_name or str(failure_type),
+        timestamp=when,
+        extra={
+            "failure_type": str(failure_type),
+            "subject_id": subject_id,
+            "subject_name": subject_name,
+            "count": count,
+            "threshold": threshold,
+            "last_error": last_error,
+            "details": dict(details),
+        },
+    )
diff --git a/packages/core/src/notify_bridge_core/providers/bridge_self/provider.py b/packages/core/src/notify_bridge_core/providers/bridge_self/provider.py
new file mode 100644
index 0000000..2a3c02a
--- /dev/null
+++ b/packages/core/src/notify_bridge_core/providers/bridge_self/provider.py
@@ -0,0 +1,148 @@
+"""Bridge self-monitoring service provider — emits internal-failure events.
+
+This is a passive provider: it does not connect to anything, never polls,
+and never subscribes. It exists so the rest of the bridge's CRUD / config /
+template / target plumbing has a single ``ServiceProvider`` to attach
+self-monitoring trackers and notification slots to.
+
+Events are constructed by the server-side helper
+``services/bridge_self.emit_bridge_self_event`` and pushed into
+``dispatch_provider_event`` directly — the provider itself is not asked
+to produce events.
+"""
+
+from __future__ import annotations
+
+from typing import Any
+
+from notify_bridge_core.models.events import ServiceEvent
+from notify_bridge_core.providers.base import (
+    ServiceProvider,
+    ServiceProviderType,
+)
+from notify_bridge_core.templates.variables import TemplateVariableDefinition
+
+
+# Configuration keys recognised on the bridge_self provider's ``config`` JSON.
+DEFAULT_POLL_FAILURE_THRESHOLD = 3
+DEFAULT_DEFERRED_BACKLOG_THRESHOLD = 100
+DEFAULT_TARGET_FAILURE_THRESHOLD = 5
+
+
+# Template variables exposed to bridge_self templates.
+BRIDGE_SELF_VARIABLES: list[TemplateVariableDefinition] = [
+    TemplateVariableDefinition(
+        name="failure_type",
+        type="string",
+        description="Which self-monitoring condition fired",
+        example="poll_failures",
+        provider_type=ServiceProviderType.BRIDGE_SELF,
+    ),
+    TemplateVariableDefinition(
+        name="subject_id",
+        type="int",
+        description="ID of the affected entity (tracker_id, target_id, or 0)",
+        example="42",
+        provider_type=ServiceProviderType.BRIDGE_SELF,
+    ),
+    TemplateVariableDefinition(
+        name="subject_name",
+        type="string",
+        description="Human-readable name of the affected entity",
+        example="My Immich Tracker",
+        provider_type=ServiceProviderType.BRIDGE_SELF,
+    ),
+    TemplateVariableDefinition(
+        name="count",
+        type="int",
+        description="Consecutive failure count or current backlog size",
+        example="3",
+        provider_type=ServiceProviderType.BRIDGE_SELF,
+    ),
+    TemplateVariableDefinition(
+        name="threshold",
+        type="int",
+        description="Configured threshold that was crossed",
+        example="3",
+        provider_type=ServiceProviderType.BRIDGE_SELF,
+    ),
+    TemplateVariableDefinition(
+        name="last_error",
+        type="string",
+        description="Last underlying error message (truncated)",
+        example="Connection refused",
+        provider_type=ServiceProviderType.BRIDGE_SELF,
+    ),
+    TemplateVariableDefinition(
+        name="details",
+        type="dict",
+        description="Extra structured context for the event",
+        example='{"provider_id": 7}',
+        provider_type=ServiceProviderType.BRIDGE_SELF,
+    ),
+]
+
+
+class BridgeSelfServiceProvider(ServiceProvider):
+    """Passive provider — exposes nothing remote, holds only thresholds.
+
+    Polling is a no-op and ``connect`` always succeeds; the bridge itself
+    is what generates events for this provider.
+    """
+
+    provider_type = ServiceProviderType.BRIDGE_SELF
+    supports_subscription = False
+
+    def __init__(self, name: str = "Bridge Self-Monitoring") -> None:
+        self._name = name
+
+    async def connect(self) -> bool:
+        return True
+
+    async def disconnect(self) -> None:
+        return None
+
+    async def poll(
+        self,
+        collection_ids: list[str],
+        tracker_state: dict[str, Any],
+    ) -> tuple[list[ServiceEvent], dict[str, Any]]:
+        # No external service to poll. Returning empty keeps the contract
+        # so accidental scheduling no-ops cleanly.
+        return [], tracker_state
+
+    def get_available_variables(self) -> list[TemplateVariableDefinition]:
+        return list(BRIDGE_SELF_VARIABLES)
+
+    def get_provider_config_schema(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "poll_failure_threshold": {
+                    "type": "integer",
+                    "minimum": 1,
+                    "default": DEFAULT_POLL_FAILURE_THRESHOLD,
+                    "description": "Consecutive tracker poll failures before alerting",
+                },
+                "deferred_backlog_threshold": {
+                    "type": "integer",
+                    "minimum": 1,
+                    "default": DEFAULT_DEFERRED_BACKLOG_THRESHOLD,
+                    "description": "Pending deferred_dispatch rows before alerting",
+                },
+                "target_failure_threshold": {
+                    "type": "integer",
+                    "minimum": 1,
+                    "default": DEFAULT_TARGET_FAILURE_THRESHOLD,
+                    "description": "Consecutive target send failures before alerting",
+                },
+            },
+            "required": [],
+        }
+
+    async def list_collections(self) -> list[dict[str, Any]]:
+        # No collection concept — operators don't pick anything for this provider.
+        return []
+
+    async def test_connection(self) -> dict[str, Any]:
+        return {"ok": True, "message": "Bridge self-monitoring is always available"}
diff --git a/packages/core/src/notify_bridge_core/providers/capabilities.py b/packages/core/src/notify_bridge_core/providers/capabilities.py
index 92b0da3..b27c94c 100644
--- a/packages/core/src/notify_bridge_core/providers/capabilities.py
+++ b/packages/core/src/notify_bridge_core/providers/capabilities.py
@@ -514,6 +514,39 @@ HOME_ASSISTANT_CAPABILITIES = ProviderCapabilities(
 )
 
 
+# ---------------------------------------------------------------------------
+# Bridge self-monitoring capabilities
+# ---------------------------------------------------------------------------
+
+BRIDGE_SELF_CAPABILITIES = ProviderCapabilities(
+    provider_type="bridge_self",
+    display_name="Bridge Self-Monitoring",
+    webhook_based=False,
+    supported_filters=[],
+    notification_slots=[
+        {
+            "name": "message_bridge_self_poll_failures",
+            "description": "Tracker poll failures crossed threshold",
+        },
+        {
+            "name": "message_bridge_self_deferred_backlog",
+            "description": "Deferred dispatch backlog crossed threshold",
+        },
+        {
+            "name": "message_bridge_self_target_failures",
+            "description": "Target send failures crossed threshold",
+        },
+    ],
+    events=[
+        {"name": "bridge_self_poll_failures", "description": "Tracker poll failures"},
+        {"name": "bridge_self_deferred_backlog", "description": "Deferred backlog high"},
+        {"name": "bridge_self_target_failures", "description": "Target send failures"},
+    ],
+    command_slots=[],
+    commands=[],
+)
+
+
 # ---------------------------------------------------------------------------
 # Registry
 # ---------------------------------------------------------------------------
@@ -527,6 +560,7 @@ _REGISTRY: dict[str, ProviderCapabilities] = {
     "google_photos": GOOGLE_PHOTOS_CAPABILITIES,
     "webhook": WEBHOOK_CAPABILITIES,
     "home_assistant": HOME_ASSISTANT_CAPABILITIES,
+    "bridge_self": BRIDGE_SELF_CAPABILITIES,
 }
 
 
diff --git a/packages/core/src/notify_bridge_core/providers/home_assistant/provider.py b/packages/core/src/notify_bridge_core/providers/home_assistant/provider.py
index 9660556..49bf1e7 100644
--- a/packages/core/src/notify_bridge_core/providers/home_assistant/provider.py
+++ b/packages/core/src/notify_bridge_core/providers/home_assistant/provider.py
@@ -10,7 +10,7 @@ arrive. The lifecycle is owned by the server-side subscription manager
 from __future__ import annotations
 
 import logging
-from typing import Any
+from typing import Any, Callable
 
 import aiohttp
 
@@ -25,6 +25,12 @@ from notify_bridge_core.templates.variables import TemplateVariableDefinition
 from .client import HomeAssistantWSClient
 from .event_parser import parse_event
 
+
+# Status callback signature: ``(state, detail)`` where ``state`` is one of
+# ``"connected"`` / ``"disconnected"`` and ``detail`` is an optional already-
+# redacted reason string (or None on connect).
+StatusChangeCallback = Callable[[str, str | None], None]
+
 _LOGGER = logging.getLogger(__name__)
 
 
@@ -229,7 +235,11 @@ class HomeAssistantServiceProvider(ServiceProvider):
         # — the subscription manager owns this provider's lifecycle instead.
         return [], tracker_state
 
-    async def subscribe(self, emit: EventEmitCallback) -> None:
+    async def subscribe(
+        self,
+        emit: EventEmitCallback,
+        on_status_change: StatusChangeCallback | None = None,
+    ) -> None:
         async def _on_event(ha_event: dict[str, Any]) -> None:
             event = parse_event(
                 ha_event,
@@ -252,6 +262,7 @@ class HomeAssistantServiceProvider(ServiceProvider):
             on_event=_on_event,
             event_types=self._event_types,
             refresh_areas=_refresh_areas,
+            on_status_change=on_status_change,
         )
 
     def get_available_variables(self) -> list[TemplateVariableDefinition]:
diff --git a/packages/core/src/notify_bridge_core/providers/immich/provider.py b/packages/core/src/notify_bridge_core/providers/immich/provider.py
index 0deb580..668b4bb 100644
--- a/packages/core/src/notify_bridge_core/providers/immich/provider.py
+++ b/packages/core/src/notify_bridge_core/providers/immich/provider.py
@@ -29,10 +29,21 @@ _LOGGER = logging.getLogger(__name__)
 # calls per poll cycle.  TTL is conservative (1h) and a hashed key keeps the
 # raw api_key out of dict keys in case of a memory dump.
 _USERS_CACHE_TTL_SECONDS = 3600
-_users_cache_lock = asyncio.Lock()
+# Lazy init: ``asyncio.Lock()`` at module import binds to whichever event
+# loop is current at import time (often none, or the wrong one when tests
+# spin up dedicated loops). Defer creation to first use.
+_users_cache_lock: asyncio.Lock | None = None
 _users_cache: dict[str, tuple[float, dict[str, str]]] = {}
 
 
+def _get_users_cache_lock() -> asyncio.Lock:
+    """Return the module users-cache lock, creating it on first call."""
+    global _users_cache_lock
+    if _users_cache_lock is None:
+        _users_cache_lock = asyncio.Lock()
+    return _users_cache_lock
+
+
 def _users_cache_key(url: str, api_key: str) -> str:
     digest = hashlib.sha256(f"{url}|{api_key}".encode("utf-8")).hexdigest()
     return digest[:32]
@@ -51,7 +62,7 @@ async def _get_cached_users(
     if entry is not None and (now - entry[0]) < _USERS_CACHE_TTL_SECONDS:
         return entry[1]
 
-    async with _users_cache_lock:
+    async with _get_users_cache_lock():
         # Re-check after acquiring the lock — another coroutine may have
         # refreshed the entry while we waited.
         entry = _users_cache.get(key)
diff --git a/packages/core/src/notify_bridge_core/providers/nut/provider.py b/packages/core/src/notify_bridge_core/providers/nut/provider.py
index bf054aa..8ef8633 100644
--- a/packages/core/src/notify_bridge_core/providers/nut/provider.py
+++ b/packages/core/src/notify_bridge_core/providers/nut/provider.py
@@ -200,10 +200,28 @@ class NutServiceProvider(ServiceProvider):
         try:
             for ups_name in collection_ids:
                 prev = tracker_state.get(ups_name, {})
+                # First-ever observation has no baseline — emitting transition
+                # events for whatever flags the device happens to carry would
+                # spam the user with "OB"/"LB"/"REPLBATT" alerts on every fresh
+                # tracker even when nothing changed. Seed state silently and
+                # skip event emission until the next poll provides a baseline.
+                is_first_observation = ups_name not in tracker_state
                 try:
                     variables = await client.list_var(ups_name)
                     data = NutUpsData.from_variables(ups_name, variables)
 
+                    if is_first_observation:
+                        new_state[ups_name] = {
+                            "name": data.description or ups_name,
+                            "status": data.status,
+                            "battery_charge": data.battery_charge,
+                            "comms_ok": True,
+                            "asset_ids": [],
+                            "pending_asset_ids": [],
+                            "shared": False,
+                        }
+                        continue
+
                     # Check for comms restored
                     if not prev.get("comms_ok", True):
                         events.append(self._make_event(
diff --git a/packages/core/src/notify_bridge_core/templates/context.py b/packages/core/src/notify_bridge_core/templates/context.py
index 8c4ced3..6404926 100644
--- a/packages/core/src/notify_bridge_core/templates/context.py
+++ b/packages/core/src/notify_bridge_core/templates/context.py
@@ -35,6 +35,10 @@ _SENSITIVE_EXTRA_TOKENS: tuple[str, ...] = (
     "bearer",
     "private_key",
     "access_key",
+    "oauth",
+    "client_secret",
+    "webhook_secret",
+    "csrf",
 )
 
 
diff --git a/packages/core/src/notify_bridge_core/templates/defaults/en/bridge_self_deferred_backlog.jinja2 b/packages/core/src/notify_bridge_core/templates/defaults/en/bridge_self_deferred_backlog.jinja2
new file mode 100644
index 0000000..3022cda
--- /dev/null
+++ b/packages/core/src/notify_bridge_core/templates/defaults/en/bridge_self_deferred_backlog.jinja2
@@ -0,0 +1,6 @@
+⚠️ <b>Deferred dispatch backlog high</b>
+Pending notifications: <b>{{ count }}</b>
+Threshold: <b>{{ threshold }}</b>
+{%- if last_error %}
+<i>Note:</i> <code>{{ last_error }}</code>
+{%- endif %}
diff --git a/packages/core/src/notify_bridge_core/templates/defaults/en/bridge_self_poll_failures.jinja2 b/packages/core/src/notify_bridge_core/templates/defaults/en/bridge_self_poll_failures.jinja2
new file mode 100644
index 0000000..659773d
--- /dev/null
+++ b/packages/core/src/notify_bridge_core/templates/defaults/en/bridge_self_poll_failures.jinja2
@@ -0,0 +1,6 @@
+🚨 <b>Tracker poll failures</b>
+<b>{{ subject_name }}</b> (id <code>{{ subject_id }}</code>)
+<b>{{ count }}</b> consecutive failures (threshold {{ threshold }})
+{%- if last_error %}
+<i>Last error:</i> <code>{{ last_error }}</code>
+{%- endif %}
diff --git a/packages/core/src/notify_bridge_core/templates/defaults/en/bridge_self_target_failures.jinja2 b/packages/core/src/notify_bridge_core/templates/defaults/en/bridge_self_target_failures.jinja2
new file mode 100644
index 0000000..7c19d13
--- /dev/null
+++ b/packages/core/src/notify_bridge_core/templates/defaults/en/bridge_self_target_failures.jinja2
@@ -0,0 +1,6 @@
+📡 <b>Target send failures</b>
+<b>{{ subject_name }}</b> (id <code>{{ subject_id }}</code>)
+<b>{{ count }}</b> consecutive failures (threshold {{ threshold }})
+{%- if last_error %}
+<i>Last error:</i> <code>{{ last_error }}</code>
+{%- endif %}
diff --git a/packages/core/src/notify_bridge_core/templates/defaults/loader.py b/packages/core/src/notify_bridge_core/templates/defaults/loader.py
index 1862bb8..089cd44 100644
--- a/packages/core/src/notify_bridge_core/templates/defaults/loader.py
+++ b/packages/core/src/notify_bridge_core/templates/defaults/loader.py
@@ -79,6 +79,11 @@ PROVIDER_SLOT_FILE_MAP: dict[str, dict[str, str]] = {
         "message_ha_service_called": "ha_service_called.jinja2",
         "message_ha_event_fired": "ha_event_fired.jinja2",
     },
+    "bridge_self": {
+        "message_bridge_self_poll_failures": "bridge_self_poll_failures.jinja2",
+        "message_bridge_self_deferred_backlog": "bridge_self_deferred_backlog.jinja2",
+        "message_bridge_self_target_failures": "bridge_self_target_failures.jinja2",
+    },
 }
 
 # Backward-compatible alias
diff --git a/packages/core/src/notify_bridge_core/templates/defaults/ru/bridge_self_deferred_backlog.jinja2 b/packages/core/src/notify_bridge_core/templates/defaults/ru/bridge_self_deferred_backlog.jinja2
new file mode 100644
index 0000000..01cf09c
--- /dev/null
+++ b/packages/core/src/notify_bridge_core/templates/defaults/ru/bridge_self_deferred_backlog.jinja2
@@ -0,0 +1,6 @@
+⚠️ <b>Очередь отложенной отправки растёт</b>
+Ожидают отправки: <b>{{ count }}</b>
+Порог: <b>{{ threshold }}</b>
+{%- if last_error %}
+<i>Примечание:</i> <code>{{ last_error }}</code>
+{%- endif %}
diff --git a/packages/core/src/notify_bridge_core/templates/defaults/ru/bridge_self_poll_failures.jinja2 b/packages/core/src/notify_bridge_core/templates/defaults/ru/bridge_self_poll_failures.jinja2
new file mode 100644
index 0000000..c3518f7
--- /dev/null
+++ b/packages/core/src/notify_bridge_core/templates/defaults/ru/bridge_self_poll_failures.jinja2
@@ -0,0 +1,6 @@
+🚨 <b>Сбои опроса трекера</b>
+<b>{{ subject_name }}</b> (id <code>{{ subject_id }}</code>)
+Подряд сбоев: <b>{{ count }}</b> (порог {{ threshold }})
+{%- if last_error %}
+<i>Последняя ошибка:</i> <code>{{ last_error }}</code>
+{%- endif %}
diff --git a/packages/core/src/notify_bridge_core/templates/defaults/ru/bridge_self_target_failures.jinja2 b/packages/core/src/notify_bridge_core/templates/defaults/ru/bridge_self_target_failures.jinja2
new file mode 100644
index 0000000..c1e9e09
--- /dev/null
+++ b/packages/core/src/notify_bridge_core/templates/defaults/ru/bridge_self_target_failures.jinja2
@@ -0,0 +1,6 @@
+📡 <b>Сбои отправки в адресат</b>
+<b>{{ subject_name }}</b> (id <code>{{ subject_id }}</code>)
+Подряд сбоев: <b>{{ count }}</b> (порог {{ threshold }})
+{%- if last_error %}
+<i>Последняя ошибка:</i> <code>{{ last_error }}</code>
+{%- endif %}
diff --git a/packages/core/src/notify_bridge_core/templates/renderer.py b/packages/core/src/notify_bridge_core/templates/renderer.py
index 22fd468..08dcadf 100644
--- a/packages/core/src/notify_bridge_core/templates/renderer.py
+++ b/packages/core/src/notify_bridge_core/templates/renderer.py
@@ -13,6 +13,7 @@ from __future__ import annotations
 
 import logging
 import threading
+from functools import lru_cache
 from typing import Any
 
 import jinja2
@@ -27,6 +28,19 @@ RENDER_TIMEOUT_SECONDS = 2.0
 _env = SandboxedEnvironment(autoescape=True)
 
 
+@lru_cache(maxsize=512)
+def _compile_cached(template_str: str) -> jinja2.Template:
+    """Compile + cache Jinja2 templates by source text.
+
+    Hot paths (NotificationDispatcher fan-out, periodic dispatch) re-render
+    the same template string for every event; ``_env.from_string`` parses
+    the source from scratch each time (~ms each). The 512-entry cache is
+    large enough to hold every template across a busy install while
+    keeping memory bounded.
+    """
+    return _env.from_string(template_str)
+
+
 class TemplateRenderTimeout(jinja2.TemplateError):
     """Raised when a template exceeds the configured render budget."""
 
@@ -74,7 +88,7 @@ def render_template(template_str: str, context: dict[str, Any]) -> str:
         )
         return "[Template too large]"
     try:
-        compiled = _env.from_string(template_str)
+        compiled = _compile_cached(template_str)
         output = _render_with_timeout(compiled, context)
     except TemplateRenderTimeout as e:
         _LOGGER.error("Template render timeout: %s", e)
diff --git a/packages/core/src/notify_bridge_core/templates/validator.py b/packages/core/src/notify_bridge_core/templates/validator.py
index c8e6ea2..071937f 100644
--- a/packages/core/src/notify_bridge_core/templates/validator.py
+++ b/packages/core/src/notify_bridge_core/templates/validator.py
@@ -27,6 +27,9 @@ def validate_template(
         "has_oversized_videos", "max_video_size", "max_video_size_mb",
         "added_assets", "assets", "albums",
         "raw_payload", "event_type_raw", "source_ip",
+        # bridge_self self-monitoring variables.
+        "failure_type", "subject_id", "subject_name", "count",
+        "threshold", "last_error", "details",
     }
     allowed = available | runtime_vars
 
diff --git a/packages/server/pyproject.toml b/packages/server/pyproject.toml
index 0b19810..9399d1f 100644
--- a/packages/server/pyproject.toml
+++ b/packages/server/pyproject.toml
@@ -21,6 +21,7 @@ dependencies = [
     "slowapi>=0.1.9",
     "cachetools>=5.3",
     "python-multipart>=0.0.9",
+    "prometheus_client>=0.20",
 ]
 
 [project.optional-dependencies]
diff --git a/packages/server/src/notify_bridge_server/api/actions.py b/packages/server/src/notify_bridge_server/api/actions.py
index 7395557..400fa8a 100644
--- a/packages/server/src/notify_bridge_server/api/actions.py
+++ b/packages/server/src/notify_bridge_server/api/actions.py
@@ -1,6 +1,7 @@
 """Action management API routes — CRUD, execute, dry-run, executions."""
 
 import logging
+import re
 
 from fastapi import APIRouter, Depends, HTTPException, Query, status
 from pydantic import BaseModel
@@ -54,6 +55,58 @@ class ActionUpdate(BaseModel):
 # ---------------------------------------------------------------------------
 
 
+# Allowlist of fields a CRUD client may set on Action. Mirrors ActionCreate /
+# ActionUpdate but enforced server-side so a tampered request body cannot
+# overwrite ``user_id``, ``last_run_at``, ``created_at``, etc. via ``**dump``.
+_ALLOWED_ACTION_CREATE_FIELDS = frozenset({
+    "provider_id", "name", "icon", "action_type", "config",
+    "schedule_type", "schedule_interval", "schedule_cron", "enabled",
+})
+_ALLOWED_ACTION_UPDATE_FIELDS = frozenset({
+    "name", "icon", "config",
+    "schedule_type", "schedule_interval", "schedule_cron", "enabled",
+})
+
+# 6 fields = standard cron, 7 fields = with seconds (Quartz-style). Reject
+# the 7-field form whose first column allows fires more often than once per
+# minute. Also reject ``*/N`` minute patterns where N<1 (so ``*/0``) and the
+# bare ``*`` minute used together with ``*`` second.
+_DISALLOWED_CRON_PATTERNS = (
+    re.compile(r"^\s*\*/0\s+"),  # */0 in any leading position
+)
+
+
+def _validate_cron(expr: str) -> None:
+    """Reject schedule_cron strings that fire more often than once per minute.
+
+    Without croniter as a hard dep we apply a conservative regex check: a
+    valid 5-field cron's first column is the minute, so anything other than
+    ``*``/digits/comma/dash/slash there is bogus, and a sub-minute cadence
+    requires a 6+ field expression with seconds. Reject both shapes.
+    """
+    if not expr or not expr.strip():
+        return
+    parts = expr.split()
+    if len(parts) >= 6:
+        # Seconds field present (Quartz-style or 6-field). Forbid
+        # second-level fires entirely; minute-cadence is the floor.
+        seconds_field = parts[0]
+        if seconds_field != "0":
+            raise HTTPException(
+                status_code=400,
+                detail=(
+                    "schedule_cron with a sub-minute cadence is not allowed; "
+                    "set the seconds field to 0 or use a standard 5-field cron"
+                ),
+            )
+    for pattern in _DISALLOWED_CRON_PATTERNS:
+        if pattern.search(expr):
+            raise HTTPException(
+                status_code=400,
+                detail="schedule_cron contains a disallowed pattern",
+            )
+
+
 async def _action_response(session: AsyncSession, action: Action) -> dict:
     """Build response dict with rules inlined."""
     result = await session.exec(
@@ -127,7 +180,15 @@ async def create_action(
             detail=f"Invalid action type '{body.action_type}' for provider type '{provider.type}'",
         )
 
-    action = Action(user_id=user.id, **body.model_dump())
+    _validate_cron(body.schedule_cron)
+
+    # Project only allowlisted fields so a tampered body can't write
+    # ``user_id``, ``id``, ``last_run_at``, etc. via ``**dump``.
+    payload = {
+        k: v for k, v in body.model_dump().items()
+        if k in _ALLOWED_ACTION_CREATE_FIELDS
+    }
+    action = Action(user_id=user.id, **payload)
     session.add(action)
     await session.commit()
     await session.refresh(action)
@@ -168,7 +229,13 @@ async def update_action(
         raise HTTPException(status_code=404, detail="Action not found")
 
     updates = body.model_dump(exclude_unset=True)
+    if "schedule_cron" in updates:
+        _validate_cron(updates["schedule_cron"] or "")
+    # Drop any field outside the update allowlist so a tampered request
+    # can't mutate ``user_id`` / ``provider_id`` / ``action_type`` etc.
     for key, value in updates.items():
+        if key not in _ALLOWED_ACTION_UPDATE_FIELDS:
+            continue
         setattr(action, key, value)
     session.add(action)
     await session.commit()
diff --git a/packages/server/src/notify_bridge_server/api/backup.py b/packages/server/src/notify_bridge_server/api/backup.py
index 52a6abd..df24754 100644
--- a/packages/server/src/notify_bridge_server/api/backup.py
+++ b/packages/server/src/notify_bridge_server/api/backup.py
@@ -48,6 +48,40 @@ _LOGGER = logging.getLogger(__name__)
 
 router = APIRouter(prefix="/api/backup", tags=["backup"])
 
+
+# Hard caps on uploaded backup file shape — defend against parser DoS
+# (deeply nested or pathologically wide JSON) before we hand the
+# structure to the import pipeline.
+_MAX_BACKUP_DEPTH = 10
+_MAX_BACKUP_NODES = 100_000
+
+
+def _validate_backup_shape(value: object, depth: int = 0, count: list[int] | None = None) -> None:
+    """Walk ``value`` and reject anything beyond the depth/node caps.
+
+    Raises HTTPException(400) on overflow. Cheap O(n) walk; runs once
+    per upload.
+    """
+    if count is None:
+        count = [0]
+    if depth > _MAX_BACKUP_DEPTH:
+        raise HTTPException(
+            status_code=400,
+            detail=f"Backup file too deeply nested (max depth {_MAX_BACKUP_DEPTH})",
+        )
+    count[0] += 1
+    if count[0] > _MAX_BACKUP_NODES:
+        raise HTTPException(
+            status_code=400,
+            detail=f"Backup file has too many nodes (max {_MAX_BACKUP_NODES})",
+        )
+    if isinstance(value, dict):
+        for v in value.values():
+            _validate_backup_shape(v, depth + 1, count)
+    elif isinstance(value, list):
+        for v in value:
+            _validate_backup_shape(v, depth + 1, count)
+
 MAX_UPLOAD_SIZE = 10 * 1024 * 1024  # 10 MB
 
 
@@ -181,6 +215,8 @@ async def validate_config(
     except json.JSONDecodeError as e:
         raise HTTPException(status_code=400, detail=f"Invalid JSON: {e}")
 
+    _validate_backup_shape(raw)
+
     result = validate_backup(raw)
     return result.model_dump()
 
@@ -204,6 +240,8 @@ async def import_config(
     except json.JSONDecodeError as e:
         raise HTTPException(status_code=400, detail=f"Invalid JSON: {e}")
 
+    _validate_backup_shape(raw)
+
     # Validate first
     validation = validate_backup(raw)
     if not validation.valid:
@@ -259,6 +297,8 @@ async def prepare_restore(
     except json.JSONDecodeError as e:
         raise HTTPException(status_code=400, detail=f"Invalid JSON: {e}")
 
+    _validate_backup_shape(raw)
+
     validation = validate_backup(raw)
     if not validation.valid:
         raise HTTPException(
diff --git a/packages/server/src/notify_bridge_server/api/command_template_configs.py b/packages/server/src/notify_bridge_server/api/command_template_configs.py
index c8cfea4..5114347 100644
--- a/packages/server/src/notify_bridge_server/api/command_template_configs.py
+++ b/packages/server/src/notify_bridge_server/api/command_template_configs.py
@@ -504,11 +504,14 @@ async def delete_config(
     if config.user_id == 0 and user.role != "admin":
         raise HTTPException(status_code=403, detail="Cannot delete system default configs")
     raise_if_used(await check_command_template_config(session, config.id), config.name)
-    slot_result = await session.exec(
-        select(CommandTemplateSlot).where(CommandTemplateSlot.config_id == config.id)
+    # Bulk delete slot rows so the round-trip count stays O(1) regardless
+    # of how many locale/slot combinations the config carries.
+    from sqlalchemy import delete as sa_delete
+    await session.execute(
+        sa_delete(CommandTemplateSlot).where(
+            CommandTemplateSlot.config_id == config.id
+        )
     )
-    for slot in slot_result.all():
-        await session.delete(slot)
     await session.delete(config)
     await session.commit()
 
diff --git a/packages/server/src/notify_bridge_server/api/command_trackers.py b/packages/server/src/notify_bridge_server/api/command_trackers.py
index 53923a4..21b4cfa 100644
--- a/packages/server/src/notify_bridge_server/api/command_trackers.py
+++ b/packages/server/src/notify_bridge_server/api/command_trackers.py
@@ -162,17 +162,26 @@ async def delete_command_tracker(
     from ..services.command_sync import mark_dirty_for_tracker
     await mark_dirty_for_tracker(tracker.id)
 
-    # Delete associated listeners, collecting bot IDs for polling cleanup
+    # First read the listeners we're about to delete so we can collect the
+    # set of telegram_bot IDs whose polling state may need to be re-checked.
+    # Then issue a single bulk DELETE instead of N per-row deletes.
+    from sqlalchemy import delete as sa_delete
+
     result = await session.exec(
         select(CommandTrackerListener).where(
             CommandTrackerListener.command_tracker_id == tracker_id
         )
     )
-    bot_ids_to_check: set[int] = set()
-    for listener in result.all():
-        if listener.listener_type == "telegram_bot":
-            bot_ids_to_check.add(listener.listener_id)
-        await session.delete(listener)
+    bot_ids_to_check: set[int] = {
+        listener.listener_id
+        for listener in result.all()
+        if listener.listener_type == "telegram_bot"
+    }
+    await session.execute(
+        sa_delete(CommandTrackerListener).where(
+            CommandTrackerListener.command_tracker_id == tracker_id
+        )
+    )
 
     await session.delete(tracker)
     await session.commit()
diff --git a/packages/server/src/notify_bridge_server/api/metrics.py b/packages/server/src/notify_bridge_server/api/metrics.py
new file mode 100644
index 0000000..7f4ef7d
--- /dev/null
+++ b/packages/server/src/notify_bridge_server/api/metrics.py
@@ -0,0 +1,161 @@
+"""Prometheus metrics endpoint and central registry.
+
+Exposes operational metrics via ``GET /api/metrics`` in the standard
+Prometheus text format. Unauthenticated by design — Prometheus scrapers do
+not authenticate. If the API port crosses a trust boundary, disable via
+``NOTIFY_BRIDGE_METRICS_ENABLED=false``.
+
+Metrics are defined as module-level singletons so the rest of the codebase
+can ``from notify_bridge_server.api.metrics import metrics`` and call
+``metrics.dispatch_duration.labels(channel="telegram").observe(0.42)``
+without re-creating the underlying objects.
+
+Other modules MUST NOT ``import prometheus_client`` directly. Route every
+metric through :data:`metrics` (a :class:`MetricsRegistry`) so we have one
+place to swap implementations or add labels.
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Final
+
+from fastapi import APIRouter, HTTPException
+from starlette.responses import Response
+
+from prometheus_client import (
+    CONTENT_TYPE_LATEST,
+    CollectorRegistry,
+    Counter,
+    Gauge,
+    Histogram,
+    generate_latest,
+)
+
+from ..config import settings as _settings
+
+_LOGGER = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Metric definitions
+# ---------------------------------------------------------------------------
+# Use a dedicated CollectorRegistry instead of the global default registry so
+# tests can construct the module repeatedly without ``Duplicated timeseries``
+# errors and so we never accidentally export Python GC / process metrics that
+# aren't part of the documented surface in OPERATIONS.md.
+
+_REGISTRY: Final[CollectorRegistry] = CollectorRegistry()
+
+
+class MetricsRegistry:
+    """Singleton holder for module-level Prometheus collectors.
+
+    Instantiated once at import time as :data:`metrics`. Keep collectors as
+    instance attributes so call sites get IDE autocomplete and so swapping
+    the collector type (e.g. Counter -> Summary) is a one-line change here.
+    """
+
+    def __init__(self, registry: CollectorRegistry) -> None:
+        self.registry = registry
+
+        # Gauge: populated on every scrape via the collector hook below.
+        self.deferred_pending = Gauge(
+            "notify_bridge_deferred_pending",
+            "Count of deferred_dispatch rows awaiting drain.",
+            registry=registry,
+        )
+
+        # Counter: incremented after each event_log row is persisted.
+        self.event_log_total = Counter(
+            "notify_bridge_event_log_total",
+            "Total events written to event_log, partitioned by status and event_type.",
+            ["status", "event_type"],
+            registry=registry,
+        )
+
+        # Histogram: observed wall-clock seconds per outbound dispatch attempt.
+        self.dispatch_duration = Histogram(
+            "notify_bridge_dispatch_duration_seconds",
+            "Wall-clock duration of one dispatch attempt to a notification channel.",
+            ["channel"],
+            registry=registry,
+            buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0),
+        )
+
+        # Counter: each polling provider that fails a tick increments by 1.
+        self.provider_poll_failures = Counter(
+            "notify_bridge_provider_poll_failures_total",
+            "Polling provider failures partitioned by provider type.",
+            ["provider_type"],
+            registry=registry,
+        )
+
+        # Counter: each rejected delivery to a target increments by 1.
+        self.target_send_failures = Counter(
+            "notify_bridge_target_send_failures_total",
+            "Failed sends to a target partitioned by target type and HTTP status.",
+            ["target_type", "status_code"],
+            registry=registry,
+        )
+
+
+metrics: Final[MetricsRegistry] = MetricsRegistry(_REGISTRY)
+
+
+# ---------------------------------------------------------------------------
+# Scrape hook: refresh dynamic gauges on demand
+# ---------------------------------------------------------------------------
+
+async def _refresh_deferred_pending_gauge() -> None:
+    """Populate ``deferred_pending`` by counting pending rows in the DB.
+
+    Called from the request handler before serializing — we don't poll the
+    DB on a fixed cadence to avoid a steady-state cost when nothing is
+    scraping. Kept tolerant: a DB error logs and leaves the previous value.
+    """
+    try:
+        from sqlalchemy import text
+
+        from ..database.engine import get_engine
+
+        engine = get_engine()
+        async with engine.connect() as conn:
+            result = await conn.execute(
+                text("SELECT count(*) FROM deferred_dispatch WHERE status='pending'")
+            )
+            row = result.first()
+            count = int(row[0]) if row else 0
+        metrics.deferred_pending.set(count)
+    except Exception as exc:  # noqa: BLE001 — never fail the scrape over this
+        _LOGGER.debug("deferred_pending refresh skipped: %s", exc)
+
+
+# ---------------------------------------------------------------------------
+# Router
+# ---------------------------------------------------------------------------
+
+router = APIRouter(tags=["metrics"])
+
+
+@router.get("/api/metrics")
+async def metrics_endpoint() -> Response:
+    """Expose collected metrics in Prometheus text format.
+
+    No auth by design — Prometheus scrapers don't authenticate. Gate the
+    endpoint via ``NOTIFY_BRIDGE_METRICS_ENABLED=false`` when the API port
+    is reachable from outside the trust boundary.
+    """
+    if not _settings.metrics_enabled:
+        raise HTTPException(status_code=404, detail="Metrics disabled")
+
+    await _refresh_deferred_pending_gauge()
+
+    # Stub increments so the endpoint reports non-empty data even before
+    # callers wire instrumentation. Removed once code-paths are instrumented.
+    # The labels here intentionally use a sentinel value so dashboards can
+    # filter the noise out: ``status="bootstrap"``.
+    metrics.event_log_total.labels(status="bootstrap", event_type="metrics_scrape").inc(0)
+
+    payload = generate_latest(_REGISTRY)
+    return Response(content=payload, media_type=CONTENT_TYPE_LATEST)
diff --git a/packages/server/src/notify_bridge_server/api/notification_trackers.py b/packages/server/src/notify_bridge_server/api/notification_trackers.py
index d9d9e32..5fd1f24 100644
--- a/packages/server/src/notify_bridge_server/api/notification_trackers.py
+++ b/packages/server/src/notify_bridge_server/api/notification_trackers.py
@@ -152,6 +152,10 @@ async def create_notification_tracker(
     session.add(tracker)
     await session.commit()
     await session.refresh(tracker)
+    # Drop the cached enabled-trackers list so the next inbound event
+    # (HA / webhook) sees the new tracker without waiting out the TTL.
+    from ..services.event_dispatch import invalidate_tracker_cache
+    invalidate_tracker_cache(tracker.provider_id)
     if tracker.enabled:
         await schedule_tracker(
             tracker.id, tracker.scan_interval,
@@ -184,6 +188,8 @@ async def update_notification_tracker(
     session.add(tracker)
     await session.commit()
     await session.refresh(tracker)
+    from ..services.event_dispatch import invalidate_tracker_cache
+    invalidate_tracker_cache(tracker.provider_id)
     if tracker.enabled:
         await schedule_tracker(
             tracker.id, tracker.scan_interval,
@@ -201,28 +207,39 @@ async def delete_notification_tracker(
     user: User = Depends(get_current_user),
     session: AsyncSession = Depends(get_session),
 ):
+    """Delete a tracker and its child rows in three bulk statements.
+
+    The previous implementation issued one DELETE per child row plus one
+    UPDATE per event_log row, which scaled linearly with the tracker's
+    history (an old, busy tracker could hit thousands of round-trips).
+    Bulk DELETE/UPDATE collapses that to three SQL statements regardless
+    of size.
+    """
+    from sqlalchemy import delete as sa_delete, update as sa_update
+
     tracker = await _get_user_tracker(session, tracker_id, user.id)
-    # Delete associated tracker-target links
-    result = await session.exec(
-        select(NotificationTrackerTarget).where(NotificationTrackerTarget.tracker_id == tracker_id)
+    # Junction rows — direct dependents of the tracker.
+    await session.execute(
+        sa_delete(NotificationTrackerTarget).where(
+            NotificationTrackerTarget.tracker_id == tracker_id
+        )
     )
-    for tt in result.all():
-        await session.delete(tt)
-    # Delete associated tracker state
-    state_result = await session.exec(
-        select(NotificationTrackerState).where(NotificationTrackerState.tracker_id == tracker_id)
+    # Persisted scan state for this tracker.
+    await session.execute(
+        sa_delete(NotificationTrackerState).where(
+            NotificationTrackerState.tracker_id == tracker_id
+        )
     )
-    for ts in state_result.all():
-        await session.delete(ts)
-    # Nullify event log references
-    event_result = await session.exec(
-        select(EventLog).where(EventLog.tracker_id == tracker_id)
+    # Preserve the audit trail in event_log; just null the back-reference
+    # so the tracker row can be removed without an FK violation.
+    await session.execute(
+        sa_update(EventLog).where(EventLog.tracker_id == tracker_id).values(tracker_id=None)
     )
-    for el in event_result.all():
-        el.tracker_id = None
-        session.add(el)
+    provider_id_for_cache = tracker.provider_id
     await session.delete(tracker)
     await session.commit()
+    from ..services.event_dispatch import invalidate_tracker_cache
+    invalidate_tracker_cache(provider_id_for_cache)
     await unschedule_tracker(tracker_id)
     await reschedule_immich_dispatch_jobs()
 
diff --git a/packages/server/src/notify_bridge_server/api/providers.py b/packages/server/src/notify_bridge_server/api/providers.py
index 5e0b1b9..9d0aff0 100644
--- a/packages/server/src/notify_bridge_server/api/providers.py
+++ b/packages/server/src/notify_bridge_server/api/providers.py
@@ -1,9 +1,10 @@
 """Service provider management API routes."""
 
 import logging
+import secrets
 
 from fastapi import APIRouter, Depends, HTTPException, status
-from pydantic import AnyHttpUrl, BaseModel, ValidationError, field_validator
+from pydantic import AnyHttpUrl, BaseModel, ValidationError, field_validator, model_validator
 from sqlmodel import select
 from sqlmodel.ext.asyncio.session import AsyncSession
 from typing import Any
@@ -94,14 +95,36 @@ class PayloadMapping(BaseModel):
 
 
 class WebhookProviderConfig(BaseModel):
-    auth_mode: str = "none"
+    # Default to bearer to avoid silently creating an open relay. Operators
+    # who genuinely want an unauthenticated endpoint must set
+    # ``acknowledge_unauthenticated=True`` to opt in explicitly.
+    auth_mode: str = "bearer_token"
     webhook_secret: str | None = None
+    # Explicit opt-in required for ``auth_mode="none"``. Without this flag
+    # an unauthenticated webhook is rejected at validation time so a
+    # mis-clicked dropdown can't expose the bridge to arbitrary internet
+    # traffic.
+    acknowledge_unauthenticated: bool = False
     payload_mappings: list[PayloadMapping] = []
     event_type_path: str | None = None
     collection_path: str | None = None
     store_payloads: bool = True
     max_stored_payloads: int = 20  # 1-100
 
+    @model_validator(mode="after")
+    def _check_auth(self) -> "WebhookProviderConfig":
+        if self.auth_mode == "none" and not self.acknowledge_unauthenticated:
+            raise ValueError(
+                "auth_mode='none' creates an open webhook endpoint; set "
+                "acknowledge_unauthenticated=true to confirm this is intentional"
+            )
+        if self.auth_mode in ("bearer_token", "hmac_sha256") and not self.webhook_secret:
+            # Auto-generate a strong secret if the operator forgot to supply
+            # one — better than rejecting an otherwise-valid config and far
+            # better than silently leaving the endpoint open.
+            self.webhook_secret = secrets.token_urlsafe(32)
+        return self
+
 
 class HomeAssistantProviderConfig(BaseModel):
     url: str
diff --git a/packages/server/src/notify_bridge_server/api/status.py b/packages/server/src/notify_bridge_server/api/status.py
index 8d3e47d..31db70f 100644
--- a/packages/server/src/notify_bridge_server/api/status.py
+++ b/packages/server/src/notify_bridge_server/api/status.py
@@ -291,15 +291,19 @@ async def get_nav_counts(
 ):
     """Return entity counts for sidebar navigation badges.
 
-    Note: queries run sequentially because SQLAlchemy AsyncSession is NOT safe
-    for concurrent use within a single session (no asyncio.gather).  We
-    minimise round-trips by combining user + system counts and per-type
-    target counts into single aggregate queries where possible.
+    Combines user-owned counts, system-owned shared counts, and per-type
+    target counts into a single round-trip via a UNION ALL of label + count
+    rows. SQLAlchemy AsyncSession is single-threaded so we cannot
+    asyncio.gather; collapsing 16 SELECTs into one is the optimisation.
     """
+    from sqlalchemy import literal, union_all
+
     counts: dict[str, int] = {}
 
-    # --- 1) User-owned entity counts (one query per model) ---
-    for model, key in [
+    user_id = user.id
+
+    # User-owned counts: one (label, count) per model.
+    user_models = [
         (ServiceProvider, "providers"),
         (NotificationTracker, "notification_trackers"),
         (TrackingConfig, "tracking_configs"),
@@ -311,40 +315,52 @@ async def get_nav_counts(
         (CommandTracker, "command_trackers"),
         (CommandConfig, "command_configs"),
         (CommandTemplateConfig, "command_template_configs"),
-    ]:
-        count = (await session.exec(
-            select(func.count()).select_from(model).where(model.user_id == user.id)
-        )).one()
-        counts[key] = count
-
-    # --- 2) Add system-owned counts (user_id=0) for shared entities ---
-    for model, key in [
+    ]
+    # System-owned shared counts (user_id=0) folded back into the same key.
+    system_models = [
         (TemplateConfig, "template_configs"),
         (CommandTemplateConfig, "command_template_configs"),
         (TrackingConfig, "tracking_configs"),
         (CommandConfig, "command_configs"),
-    ]:
-        system_count = (await session.exec(
-            select(func.count()).select_from(model).where(model.user_id == 0)
-        )).one()
-        counts[key] += system_count
-
-    # --- 3) Per-type target counts in a single query using conditional aggregation ---
+    ]
     target_types = ("telegram", "webhook", "email", "discord", "slack", "ntfy", "matrix")
-    type_counts_result = (await session.exec(
-        select(
-            NotificationTarget.type,
-            func.count(),
+
+    # Initialise counts to 0 so missing UNION rows surface as zeroes
+    # instead of KeyErrors when a category has no rows.
+    for _model, key in user_models:
+        counts[key] = 0
+    for ttype in target_types:
+        counts[f"targets_{ttype}"] = 0
+
+    queries = []
+    for model, key in user_models:
+        queries.append(
+            select(literal(key).label("k"), func.count().label("c"))
+            .select_from(model).where(model.user_id == user_id)
         )
-        .where(
-            NotificationTarget.user_id == user.id,
-            NotificationTarget.type.in_(target_types),
+    for model, key in system_models:
+        queries.append(
+            select(literal(f"__sys__:{key}").label("k"), func.count().label("c"))
+            .select_from(model).where(model.user_id == 0)
         )
-        .group_by(NotificationTarget.type)
-    )).all()
-    type_counts_map = dict(type_counts_result)
-    for target_type in target_types:
-        counts[f"targets_{target_type}"] = type_counts_map.get(target_type, 0)
+    for ttype in target_types:
+        queries.append(
+            select(literal(f"target:{ttype}").label("k"), func.count().label("c"))
+            .select_from(NotificationTarget).where(
+                NotificationTarget.user_id == user_id,
+                NotificationTarget.type == ttype,
+            )
+        )
+
+    union_q = union_all(*queries)
+    rows = (await session.execute(union_q)).all()
+    for label, value in rows:
+        if label.startswith("__sys__:"):
+            counts[label.removeprefix("__sys__:")] += int(value or 0)
+        elif label.startswith("target:"):
+            counts[f"targets_{label.removeprefix('target:')}"] = int(value or 0)
+        else:
+            counts[label] = int(value or 0)
 
     return counts
 
diff --git a/packages/server/src/notify_bridge_server/api/template_configs.py b/packages/server/src/notify_bridge_server/api/template_configs.py
index 5984d12..5013c31 100644
--- a/packages/server/src/notify_bridge_server/api/template_configs.py
+++ b/packages/server/src/notify_bridge_server/api/template_configs.py
@@ -287,6 +287,8 @@ async def get_template_variables(
         **_nut_variables(),
         # --- Home Assistant slots ---
         **_home_assistant_variables(),
+        # --- Bridge self-monitoring slots ---
+        **_bridge_self_variables(),
         # --- Scheduler slots ---
         "message_scheduled_message": {
             "description": "Notification for scheduled message events",
@@ -487,6 +489,32 @@ def _home_assistant_variables() -> dict:
     }
 
 
+def _bridge_self_variables() -> dict:
+    common = {
+        "failure_type": "Which condition fired (poll_failures, deferred_backlog, target_failures)",
+        "subject_id": "Affected entity ID (tracker_id, target_id, or 0 for backlog)",
+        "subject_name": "Human-readable name of the affected entity",
+        "count": "Consecutive failure count or current backlog size",
+        "threshold": "Configured threshold that was crossed",
+        "last_error": "Last underlying error message (truncated)",
+        "details": "Extra structured context dict (use {{ details | tojson }})",
+    }
+    return {
+        "message_bridge_self_poll_failures": {
+            "description": "Tracker poll failures crossed threshold",
+            "variables": common,
+        },
+        "message_bridge_self_deferred_backlog": {
+            "description": "Deferred dispatch backlog crossed threshold",
+            "variables": common,
+        },
+        "message_bridge_self_target_failures": {
+            "description": "Target send failures crossed threshold",
+            "variables": common,
+        },
+    }
+
+
 @router.post("", status_code=status.HTTP_201_CREATED)
 async def create_config(
     body: TemplateConfigCreate,
diff --git a/packages/server/src/notify_bridge_server/api/users.py b/packages/server/src/notify_bridge_server/api/users.py
index ec6972e..9cbb162 100644
--- a/packages/server/src/notify_bridge_server/api/users.py
+++ b/packages/server/src/notify_bridge_server/api/users.py
@@ -64,9 +64,19 @@ async def create_user(
     admin: User = Depends(require_admin),
     session: AsyncSession = Depends(get_session),
 ):
-    """Create a new user (admin only)."""
+    """Create a new user (admin only).
+
+    Username is normalised to ``strip().lower()`` so "Admin" and "admin"
+    cannot coexist. We do not add a CHECK constraint at the DB level — that
+    would require rebuilding the table on SQLite — so the application is
+    the single source of truth for normalisation.
+    """
+    # Normalise so case-only variants collide with existing accounts.
+    username = (body.username or "").strip().lower()
+    if not username:
+        raise HTTPException(status_code=400, detail="Username cannot be empty")
     # Check for duplicate username
-    result = await session.exec(select(User).where(User.username == body.username))
+    result = await session.exec(select(User).where(User.username == username))
     if result.first():
         raise HTTPException(status_code=409, detail="Username already exists")
 
@@ -74,13 +84,25 @@ async def create_user(
         raise HTTPException(status_code=400, detail="Password must be at least 8 characters")
 
     user = User(
-        username=body.username,
+        username=username,
         hashed_password=await _hash_password(body.password),
         role=body.role if body.role in ("admin", "user") else "user",
     )
     session.add(user)
     await session.commit()
     await session.refresh(user)
+
+    # Auto-create the bridge_self provider so the new user immediately gets
+    # internal-failure notifications without manual setup. Best-effort —
+    # a seeding hiccup must not fail the user creation itself.
+    try:
+        from ..database.seeds import ensure_bridge_self_provider_for_user
+        await ensure_bridge_self_provider_for_user(session, user.id)
+        await session.commit()
+    except Exception:  # noqa: BLE001
+        _LOGGER.exception("Failed to auto-seed bridge_self provider for user %s", user.id)
+        await session.rollback()
+
     return {"id": user.id, "username": user.username, "role": user.role}
 
 
@@ -103,14 +125,19 @@ async def update_user(
     identity_changed = False
 
     if body.username is not None and body.username != user.username:
-        new_username = body.username.strip()
+        # Normalise to match the case-insensitive uniqueness rule applied
+        # at user creation. Comparing the normalised form against the
+        # stored username also avoids false-positive "no change" when a
+        # legacy mixed-case account is being renamed to its lower form.
+        new_username = (body.username or "").strip().lower()
         if not new_username:
             raise HTTPException(status_code=400, detail="Username cannot be empty")
-        dup = await session.exec(select(User).where(User.username == new_username))
-        if dup.first():
-            raise HTTPException(status_code=409, detail="Username already exists")
-        user.username = new_username
-        identity_changed = True
+        if new_username != user.username:
+            dup = await session.exec(select(User).where(User.username == new_username))
+            if dup.first():
+                raise HTTPException(status_code=409, detail="Username already exists")
+            user.username = new_username
+            identity_changed = True
 
     if body.role is not None and body.role != user.role:
         if body.role not in ("admin", "user"):
@@ -191,11 +218,139 @@ async def delete_user(
     admin: User = Depends(require_admin),
     session: AsyncSession = Depends(get_session),
 ):
-    """Delete a user (admin only, cannot delete self)."""
+    """Delete a user (admin only, cannot delete self).
+
+    Cascades through every user-owned table by hand. The model declares
+    ``ondelete=CASCADE`` on each FK, but SQLite only enforces FK actions
+    on tables created *after* the ondelete clause was added — existing
+    installs upgraded from older schemas need this Python-side cascade
+    instead of a multi-step table rebuild.
+
+    TODO: drop this manual cascade once we ship a real
+    rebuild-with-FK-actions migration for legacy SQLite installs (or
+    once Postgres becomes the default deployment target).
+    """
+    from sqlalchemy import delete as sa_delete, update as sa_update
+
     if user_id == admin.id:
         raise HTTPException(status_code=400, detail="Cannot delete yourself")
     user = await session.get(User, user_id)
     if not user:
         raise HTTPException(status_code=404, detail="User not found")
-    await session.delete(user)
-    await session.commit()
+
+    # Lazy import to avoid circulars.
+    from ..database.models import (
+        Action,
+        ActionExecution,
+        ActionRule,
+        CommandConfig,
+        CommandTracker,
+        CommandTrackerListener,
+        DeferredDispatch,
+        EventLog,
+        NotificationTarget,
+        NotificationTracker,
+        NotificationTrackerState,
+        NotificationTrackerTarget,
+        ServiceProvider,
+        TelegramBot,
+        TelegramChat,
+        TrackingConfig,
+        EmailBot,
+        MatrixBot,
+    )
+
+    # Wrap the entire cascade in one transaction so a failure mid-way
+    # cannot leave dangling child rows pointing at a missing user.
+    try:
+        # Order: leaves first, then their parents, finally the user. This
+        # matters even with FKs disabled — it's the natural dependency
+        # graph and avoids accidental constraint trips on engines that do
+        # enforce FKs (Postgres).
+
+        # Resolve tracker ids first (needed for state + link cleanup
+        # before the parent rows themselves are deleted further down).
+        from sqlmodel import select as _select
+        tracker_ids = list((await session.exec(
+            _select(NotificationTracker.id).where(NotificationTracker.user_id == user_id)
+        )).all())
+        if tracker_ids:
+            await session.execute(
+                sa_delete(NotificationTrackerState).where(
+                    NotificationTrackerState.tracker_id.in_(tracker_ids)
+                )
+            )
+            await session.execute(
+                sa_delete(NotificationTrackerTarget).where(
+                    NotificationTrackerTarget.tracker_id.in_(tracker_ids)
+                )
+            )
+            await session.execute(
+                sa_delete(DeferredDispatch).where(
+                    DeferredDispatch.tracker_id.in_(tracker_ids)
+                )
+            )
+
+        # Action children: rules and execution log.
+        action_ids = list((await session.exec(
+            _select(Action.id).where(Action.user_id == user_id)
+        )).all())
+        if action_ids:
+            await session.execute(
+                sa_delete(ActionRule).where(ActionRule.action_id.in_(action_ids))
+            )
+            await session.execute(
+                sa_delete(ActionExecution).where(
+                    ActionExecution.action_id.in_(action_ids)
+                )
+            )
+
+        # Command tracker children: listeners.
+        cmd_tracker_ids = list((await session.exec(
+            _select(CommandTracker.id).where(CommandTracker.user_id == user_id)
+        )).all())
+        if cmd_tracker_ids:
+            await session.execute(
+                sa_delete(CommandTrackerListener).where(
+                    CommandTrackerListener.command_tracker_id.in_(cmd_tracker_ids)
+                )
+            )
+
+        # Telegram bot children: chats.
+        bot_ids = list((await session.exec(
+            _select(TelegramBot.id).where(TelegramBot.user_id == user_id)
+        )).all())
+        if bot_ids:
+            await session.execute(
+                sa_delete(TelegramChat).where(TelegramChat.bot_id.in_(bot_ids))
+            )
+
+        # Owned top-level entities (user is a direct owner).
+        for model in (
+            NotificationTracker,
+            NotificationTarget,
+            CommandTracker,
+            CommandConfig,
+            TrackingConfig,
+            Action,
+            TelegramBot,
+            EmailBot,
+            MatrixBot,
+            ServiceProvider,
+        ):
+            await session.execute(
+                sa_delete(model).where(model.user_id == user_id)
+            )
+
+        # EventLog: keep the audit trail but null the owner reference so
+        # the rows survive the user delete (matches the SET NULL semantic
+        # declared on the model).
+        await session.execute(
+            sa_update(EventLog).where(EventLog.user_id == user_id).values(user_id=None)
+        )
+
+        await session.delete(user)
+        await session.commit()
+    except Exception:
+        await session.rollback()
+        raise
diff --git a/packages/server/src/notify_bridge_server/api/webhooks.py b/packages/server/src/notify_bridge_server/api/webhooks.py
index ab6bfb4..3166311 100644
--- a/packages/server/src/notify_bridge_server/api/webhooks.py
+++ b/packages/server/src/notify_bridge_server/api/webhooks.py
@@ -12,6 +12,8 @@ from fastapi import APIRouter, HTTPException, Request
 from sqlmodel import select
 from sqlmodel.ext.asyncio.session import AsyncSession
 
+from ..auth.routes import limiter
+
 from notify_bridge_core.models.events import ServiceEvent
 from notify_bridge_core.providers.gitea.event_parser import parse_webhook as parse_gitea_webhook
 from notify_bridge_core.providers.planka.event_parser import parse_webhook as parse_planka_webhook
@@ -240,6 +242,10 @@ async def planka_webhook(token: str, request: Request):
     if not _verify_planka_token(webhook_secret, request):
         raise HTTPException(status_code=403, detail="Invalid token")
 
+    # Read body AFTER auth check so an attacker without the bearer token
+    # can't force an unbounded read. Token is in the header, not the body.
+    raw_body = await _read_bounded_body(request)
+
     # Parse payload from the bounded raw_body we already read.
     try:
         payload = json.loads(raw_body.decode("utf-8"))
@@ -320,6 +326,8 @@ def _verify_generic_webhook_auth(
 _SENSITIVE_HEADER_SUBSTR = (
     "token", "auth", "key", "secret", "signature", "password", "credential",
     "cookie", "x-api", "x-hub-signature",
+    # Extended for per-key body redaction; harmless extras for header check.
+    "oauth", "client_secret", "webhook_secret", "csrf",
 )
 
 
@@ -328,6 +336,28 @@ def _is_sensitive_header(name: str) -> bool:
     return any(s in n for s in _SENSITIVE_HEADER_SUBSTR)
 
 
+_REDACTED_PLACEHOLDER = "[REDACTED]"
+
+
+def _redact_sensitive_body(value: object) -> object:
+    """Walk a parsed JSON body and redact values for sensitive-named keys.
+
+    Returns a defensively-copied structure so the caller's object is
+    never mutated (callers downstream still consume the original).
+    """
+    if isinstance(value, dict):
+        cleaned: dict[str, object] = {}
+        for k, v in value.items():
+            if isinstance(k, str) and _is_sensitive_header(k):
+                cleaned[k] = _REDACTED_PLACEHOLDER
+            else:
+                cleaned[k] = _redact_sensitive_body(v)
+        return cleaned
+    if isinstance(value, list):
+        return [_redact_sensitive_body(v) for v in value]
+    return value
+
+
 def _filter_headers(raw_headers: dict[str, str]) -> dict[str, str]:
     """Keep only safe headers for logging (strip Authorization, signatures, tokens).
 
@@ -358,11 +388,15 @@ async def _save_webhook_log(
     """Insert a webhook payload log entry and prune old ones."""
     try:
         body_json = body if isinstance(body, dict) else {}
+        # Strip sensitive values before persistence — webhook payloads
+        # routinely include OAuth tokens / secrets in the body, and the
+        # log is admin-readable but not need-to-know for the operator.
+        safe_body = _redact_sensitive_body(body_json) if body_json else {}
         session.add(WebhookPayloadLog(
             provider_id=provider_id,
             method=method,
             headers=headers,
-            body=body_json,
+            body=safe_body,
             status=status,
             extracted_fields=extracted_fields or {},
             error_message=error_message,
@@ -386,13 +420,19 @@ async def _save_webhook_log(
         _LOGGER.warning("Failed to save webhook payload log for provider %d", provider_id, exc_info=True)
         try:
             await session.rollback()
-        except Exception:
-            pass
+        except Exception:  # noqa: BLE001
+            _LOGGER.exception("Rollback after payload-log save failed")
 
 
 @router.post("/webhook/{token}")
+@limiter.limit("60/minute")
 async def generic_webhook(token: str, request: Request):
-    """Receive a generic webhook, extract variables via JSONPath, and dispatch notifications."""
+    """Receive a generic webhook, extract variables via JSONPath, and dispatch notifications.
+
+    Per-IP rate limit (60/min) caps blast radius from a single source —
+    legitimate providers send well below this; anything higher is either
+    a misconfigured retry loop or abuse.
+    """
     engine = get_engine()
 
     # --- Load provider and validate auth ---
diff --git a/packages/server/src/notify_bridge_server/auth/routes.py b/packages/server/src/notify_bridge_server/auth/routes.py
index e7e9f3a..e10261e 100644
--- a/packages/server/src/notify_bridge_server/auth/routes.py
+++ b/packages/server/src/notify_bridge_server/auth/routes.py
@@ -50,7 +50,12 @@ class RefreshRequest(BaseModel):
 
 
 async def _hash_password(password: str) -> str:
-    """bcrypt.hashpw is CPU-bound (~200-500ms); never run it on the event loop."""
+    """bcrypt.hashpw is CPU-bound (~200-500ms); never run it on the event loop.
+
+    Caller is responsible for length-validating ``password`` against the
+    72-byte bcrypt cap before calling — bcrypt silently truncates beyond
+    that, which is a correctness footgun, not a security one.
+    """
 
     def _work() -> str:
         return bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()
@@ -58,6 +63,24 @@ async def _hash_password(password: str) -> str:
     return await asyncio.to_thread(_work)
 
 
+# bcrypt's algorithm cap — the underlying primitive truncates input
+# beyond this so two distinct passwords sharing a 72-byte prefix would
+# verify identically. We reject up-front with a clear 422 message.
+_BCRYPT_MAX_PASSWORD_BYTES = 72
+
+
+def _check_bcrypt_length(password: str) -> None:
+    if len(password.encode("utf-8")) > _BCRYPT_MAX_PASSWORD_BYTES:
+        raise HTTPException(
+            status_code=422,
+            detail=(
+                f"Password too long; bcrypt limit is "
+                f"{_BCRYPT_MAX_PASSWORD_BYTES} bytes (longer passwords would "
+                "be silently truncated)"
+            ),
+        )
+
+
 async def _verify_password(password: str, hashed: str) -> bool:
     def _work() -> bool:
         try:
@@ -74,6 +97,7 @@ async def _verify_password(password: str, hashed: str) -> bool:
 async def setup(request: Request, body: SetupRequest, session: AsyncSession = Depends(get_session)):
     if len(body.password) < 8:
         raise HTTPException(status_code=400, detail="Password must be at least 8 characters")
+    _check_bcrypt_length(body.password)
     # Compute hash BEFORE opening the transaction so we don't hold a writer lock
     # during the CPU-bound bcrypt work.
     hashed = await _hash_password(body.password)
@@ -97,6 +121,16 @@ async def setup(request: Request, body: SetupRequest, session: AsyncSession = De
         session.add(user)
     await session.refresh(user)
 
+    # Auto-create the bridge_self provider for the new admin so internal-
+    # failure notifications work out of the box. Best-effort — a seeding
+    # failure should not abort setup.
+    try:
+        from ..database.seeds import ensure_bridge_self_provider_for_user
+        await ensure_bridge_self_provider_for_user(session, user.id)
+        await session.commit()
+    except Exception:  # noqa: BLE001
+        await session.rollback()
+
     return TokenResponse(
         access_token=create_access_token(user.id, user.role, user.token_version),
         refresh_token=create_refresh_token(user.id, user.token_version),
@@ -170,6 +204,7 @@ async def change_password(
         raise HTTPException(status_code=400, detail="Current password is incorrect")
     if len(body.new_password) < 8:
         raise HTTPException(status_code=400, detail="New password must be at least 8 characters")
+    _check_bcrypt_length(body.new_password)
     user.hashed_password = await _hash_password(body.new_password)
     user.token_version = (user.token_version or 1) + 1
     session.add(user)
diff --git a/packages/server/src/notify_bridge_server/commands/webhook.py b/packages/server/src/notify_bridge_server/commands/webhook.py
index 08af465..486b23f 100644
--- a/packages/server/src/notify_bridge_server/commands/webhook.py
+++ b/packages/server/src/notify_bridge_server/commands/webhook.py
@@ -43,10 +43,21 @@ async def telegram_webhook(
     session: AsyncSession = Depends(get_session),
 ):
     """Handle incoming Telegram messages — route commands to handlers."""
-    # Validate webhook secret if configured
-    if _webhook_secret:
-        if not hmac.compare_digest(x_telegram_bot_api_secret_token or "", _webhook_secret):
-            raise HTTPException(status_code=403, detail="Invalid webhook secret")
+    # Telegram webhook secret is MANDATORY: without it any peer that knows
+    # the opaque webhook URL could inject arbitrary updates as if Telegram
+    # had sent them. Refuse to handle updates if no secret is configured.
+    if not _webhook_secret:
+        _LOGGER.error(
+            "Refusing Telegram webhook update for %s — webhook secret not configured "
+            "(set NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET)",
+            webhook_id,
+        )
+        raise HTTPException(
+            status_code=401,
+            detail="Telegram webhook secret not configured on this server",
+        )
+    if not hmac.compare_digest(x_telegram_bot_api_secret_token or "", _webhook_secret):
+        raise HTTPException(status_code=403, detail="Invalid webhook secret")
 
     # Find bot by opaque webhook path ID (not by token — token must not appear in URLs)
     bot_result = await session.exec(
@@ -161,7 +172,17 @@ async def telegram_webhook(
 
 
 async def register_webhook(bot_token: str, webhook_url: str, secret: str | None = None) -> dict:
-    """Register webhook URL with Telegram Bot API via TelegramClient."""
+    """Register webhook URL with Telegram Bot API via TelegramClient.
+
+    Refuses to register without a secret: a webhook without a secret
+    accepts any unauthenticated POST as a valid Telegram update, so we
+    never want one in production.
+    """
+    if not secret:
+        raise ValueError(
+            "Telegram webhook registration requires a secret token "
+            "(set NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET)"
+        )
     from ..services.http_session import get_http_session
     http = await get_http_session()
     client = TelegramClient(http, bot_token)
diff --git a/packages/server/src/notify_bridge_server/config.py b/packages/server/src/notify_bridge_server/config.py
index a47f4ce..2120bc7 100644
--- a/packages/server/src/notify_bridge_server/config.py
+++ b/packages/server/src/notify_bridge_server/config.py
@@ -76,6 +76,13 @@ class Settings(BaseSettings):
     before migrations run using SQLite's ``VACUUM INTO`` (atomic, consistent).
     """
 
+    metrics_enabled: bool = True
+    """Expose the Prometheus ``/api/metrics`` endpoint. Disable on hardened
+    deployments where the API port is exposed beyond the trust boundary —
+    metrics are unauthenticated and can leak operational information about
+    queue depth, dispatch rates, and provider failures.
+    """
+
     model_config = {"env_prefix": "NOTIFY_BRIDGE_"}
 
     def model_post_init(self, __context: Any) -> None:
diff --git a/packages/server/src/notify_bridge_server/database/migrations.py b/packages/server/src/notify_bridge_server/database/migrations.py
index 43093ad..bdd61f1 100644
--- a/packages/server/src/notify_bridge_server/database/migrations.py
+++ b/packages/server/src/notify_bridge_server/database/migrations.py
@@ -309,6 +309,22 @@ async def migrate_schema(engine: AsyncEngine) -> None:
                     )
                     logger.info("Added %s column to tracking_config table", col_name)
 
+        # Add Bridge self-monitoring tracking flags to tracking_config if missing.
+        # All three default ON — the bridge_self provider exists specifically
+        # to surface these conditions, so silencing one would defeat the point.
+        if await _has_table(conn, "tracking_config"):
+            bridge_self_flags = [
+                ("track_bridge_self_poll_failures", "INTEGER DEFAULT 1"),
+                ("track_bridge_self_deferred_backlog", "INTEGER DEFAULT 1"),
+                ("track_bridge_self_target_failures", "INTEGER DEFAULT 1"),
+            ]
+            for col_name, col_type in bridge_self_flags:
+                if not await _has_column(conn, "tracking_config", col_name):
+                    await conn.execute(
+                        text(f"ALTER TABLE tracking_config ADD COLUMN {col_name} {col_type}")
+                    )
+                    logger.info("Added %s column to tracking_config table", col_name)
+
         # Add quiet hours to tracking_config if missing.
         # Start/end are nullable HH:MM strings; quiet_hours_enabled gates them.
         if await _has_table(conn, "tracking_config"):
@@ -1361,6 +1377,12 @@ _INDEXES: list[tuple[str, str, str]] = [
     ("ix_action_provider_id", "action", "provider_id"),
     # Dashboard: SELECT event_log WHERE user_id = ? ORDER BY created_at DESC
     ("ix_event_log_user_created", "event_log", "user_id, created_at DESC"),
+    # Dashboard "events of type X for me, recent first" filter.
+    (
+        "ix_event_log_user_event_type_created",
+        "event_log",
+        "user_id, event_type, created_at DESC",
+    ),
     ("ix_event_log_provider_id", "event_log", "provider_id"),
     ("ix_event_log_notification_tracker_id", "event_log", "notification_tracker_id"),
     ("ix_event_log_action_id", "event_log", "action_id"),
@@ -1543,6 +1565,269 @@ async def migrate_chat_action_to_column(engine: AsyncEngine) -> None:
         logger.info("Migrated chat_action from config JSON to column where present")
 
 
+# ---------------------------------------------------------------------------
+# Uniqueness + dedupe migrations for webhook hot paths.
+#
+# These backfill missing UNIQUE indexes on webhook tokens, webhook path IDs,
+# bot_id (with sentinel guard), (bot_id, chat_id), and tracker-target links.
+# Every CREATE UNIQUE INDEX is preceded by a dedupe pass that keeps the
+# canonical row (lowest id, or oldest created_at where specified) and removes
+# the rest, logging a WARNING with the dropped count so operators can audit.
+# ---------------------------------------------------------------------------
+
+
+async def _dedupe_by_columns(
+    conn,
+    table: str,
+    cols: list[str],
+    *,
+    keep: str = "min_id",
+    label: str = "",
+) -> int:
+    """Delete duplicate rows leaving one survivor per ``cols`` group.
+
+    ``keep`` chooses the survivor:
+      - ``"min_id"`` keeps the row with the lowest ``id`` (default — used
+        when there is no semantic "first" row to preserve).
+      - ``"min_created_at"`` keeps the row with the oldest ``created_at``,
+        falling back to the lowest id on ties — preferred for tracker-target
+        links so the original link wins.
+
+    Returns the number of rows deleted. All identifiers flow through
+    ``_assert_ident`` to neutralise SQL injection from any caller mistake.
+    """
+    _assert_ident(table, "table")
+    for c in cols:
+        _assert_ident(c, "column")
+    group_by = ", ".join(cols)
+    where_cols = " AND ".join(f"{c} = g.{c}" for c in cols)
+    if keep == "min_created_at":
+        # Tie-break on id so the survivor is deterministic even if two rows
+        # share the same created_at (insert-batches commonly do).
+        survivor_sql = (
+            f"SELECT id FROM {table} "
+            f"WHERE {where_cols} "
+            f"ORDER BY created_at ASC, id ASC LIMIT 1"
+        )
+    elif keep == "min_id":
+        survivor_sql = f"SELECT MIN(id) FROM {table} WHERE {where_cols}"
+    else:
+        raise ValueError(f"Unknown keep strategy: {keep!r}")
+
+    delete_sql = (
+        f"DELETE FROM {table} WHERE id IN ("
+        f"  SELECT t.id FROM {table} t "
+        f"  JOIN ("
+        f"    SELECT {group_by} FROM {table} "
+        f"    GROUP BY {group_by} HAVING COUNT(*) > 1"
+        f"  ) g ON {' AND '.join(f't.{c} = g.{c}' for c in cols)} "
+        f"  WHERE t.id NOT IN ({survivor_sql})"
+        f")"
+    )
+    result = await conn.execute(text(delete_sql))
+    deleted = int(getattr(result, "rowcount", 0) or 0)
+    if deleted:
+        logger.warning(
+            "Removed %d duplicate row(s) from %s on (%s)%s",
+            deleted, table, ", ".join(cols),
+            f" — {label}" if label else "",
+        )
+    return deleted
+
+
+async def migrate_uniqueness_constraints(engine: AsyncEngine) -> None:
+    """Backfill missing UNIQUE indexes on webhook hot paths.
+
+    SQLite cannot ALTER an existing column to add a UNIQUE constraint, but
+    a UNIQUE INDEX is functionally equivalent and can be created with
+    ``IF NOT EXISTS`` on every boot. Each index is preceded by a dedupe
+    pass so the index creation does not fail on existing duplicates.
+
+    Indexes added:
+      - service_provider.webhook_token (full unique)
+      - telegram_bot.webhook_path_id (full unique)
+      - telegram_bot.bot_id (partial unique WHERE bot_id != 0; 0 is a
+        sentinel meaning "not yet validated")
+      - telegram_chat (bot_id, chat_id) (full unique composite)
+      - notification_tracker_target (notification_tracker_id, target_id)
+        (full unique composite)
+    """
+    # Skip on non-SQLite engines — they enforce UNIQUE via the model
+    # metadata (create_all) and don't have sqlite_master introspection.
+    if not str(engine.url).startswith("sqlite"):
+        return
+    async with engine.begin() as conn:
+        # service_provider.webhook_token
+        if await _has_table(conn, "service_provider") and await _has_column(
+            conn, "service_provider", "webhook_token",
+        ):
+            await _dedupe_by_columns(
+                conn, "service_provider", ["webhook_token"],
+                keep="min_id", label="webhook_token uniqueness",
+            )
+            await conn.execute(text(
+                "CREATE UNIQUE INDEX IF NOT EXISTS "
+                "uq_service_provider_webhook_token "
+                "ON service_provider(webhook_token)"
+            ))
+
+        # telegram_bot.webhook_path_id (full unique)
+        # telegram_bot.bot_id (partial unique excluding sentinel 0)
+        if await _has_table(conn, "telegram_bot"):
+            if await _has_column(conn, "telegram_bot", "webhook_path_id"):
+                await _dedupe_by_columns(
+                    conn, "telegram_bot", ["webhook_path_id"],
+                    keep="min_id", label="webhook_path_id uniqueness",
+                )
+                await conn.execute(text(
+                    "CREATE UNIQUE INDEX IF NOT EXISTS "
+                    "uq_telegram_bot_webhook_path_id "
+                    "ON telegram_bot(webhook_path_id)"
+                ))
+            if await _has_column(conn, "telegram_bot", "bot_id"):
+                # Dedupe only non-sentinel rows. Two unverified bots both
+                # carrying bot_id=0 is legitimate — only collisions among
+                # validated bot_ids signal a real corruption to clean up.
+                deleted = await conn.execute(text(
+                    "DELETE FROM telegram_bot WHERE id IN ("
+                    "  SELECT t.id FROM telegram_bot t "
+                    "  JOIN ("
+                    "    SELECT bot_id FROM telegram_bot "
+                    "    WHERE bot_id != 0 GROUP BY bot_id HAVING COUNT(*) > 1"
+                    "  ) g ON t.bot_id = g.bot_id "
+                    "  WHERE t.id NOT IN ("
+                    "    SELECT MIN(id) FROM telegram_bot WHERE bot_id = g.bot_id"
+                    "  )"
+                    ")"
+                ))
+                rc = int(getattr(deleted, "rowcount", 0) or 0)
+                if rc:
+                    logger.warning(
+                        "Removed %d duplicate telegram_bot row(s) on bot_id "
+                        "(non-sentinel collisions)", rc,
+                    )
+                # Plain INDEX for the lookup-by-bot_id path.
+                await conn.execute(text(
+                    "CREATE INDEX IF NOT EXISTS ix_telegram_bot_bot_id "
+                    "ON telegram_bot(bot_id)"
+                ))
+                # Partial UNIQUE excluding the sentinel.
+                await conn.execute(text(
+                    "CREATE UNIQUE INDEX IF NOT EXISTS "
+                    "uq_telegram_bot_bot_id_nonzero "
+                    "ON telegram_bot(bot_id) WHERE bot_id != 0"
+                ))
+
+        # telegram_chat (bot_id, chat_id) — keep the survivor with the oldest
+        # discovered_at so the original discovery row wins. _dedupe_by_columns
+        # only handles created_at; do this one inline.
+        if await _has_table(conn, "telegram_chat"):
+            res = await conn.execute(text(
+                "DELETE FROM telegram_chat WHERE id IN ("
+                "  SELECT t.id FROM telegram_chat t "
+                "  JOIN ("
+                "    SELECT bot_id, chat_id FROM telegram_chat "
+                "    GROUP BY bot_id, chat_id HAVING COUNT(*) > 1"
+                "  ) g ON t.bot_id = g.bot_id AND t.chat_id = g.chat_id "
+                "  WHERE t.id NOT IN ("
+                "    SELECT id FROM telegram_chat "
+                "    WHERE bot_id = g.bot_id AND chat_id = g.chat_id "
+                "    ORDER BY discovered_at ASC, id ASC LIMIT 1"
+                "  )"
+                ")"
+            ))
+            rc = int(getattr(res, "rowcount", 0) or 0)
+            if rc:
+                logger.warning(
+                    "Removed %d duplicate telegram_chat row(s) on (bot_id, chat_id)",
+                    rc,
+                )
+            await conn.execute(text(
+                "CREATE UNIQUE INDEX IF NOT EXISTS uq_telegram_chat_bot_chat "
+                "ON telegram_chat(bot_id, chat_id)"
+            ))
+            await conn.execute(text(
+                "CREATE INDEX IF NOT EXISTS ix_telegram_chat_bot_chat "
+                "ON telegram_chat(bot_id, chat_id)"
+            ))
+
+        # notification_tracker_target (notification_tracker_id, target_id)
+        # — keep the oldest created_at link so the original wins.
+        if await _has_table(conn, "notification_tracker_target") and await _has_column(
+            conn, "notification_tracker_target", "notification_tracker_id",
+        ):
+            await _dedupe_by_columns(
+                conn,
+                "notification_tracker_target",
+                ["notification_tracker_id", "target_id"],
+                keep="min_created_at",
+                label="tracker-target link uniqueness",
+            )
+            await conn.execute(text(
+                "CREATE UNIQUE INDEX IF NOT EXISTS uq_ntt_tracker_target "
+                "ON notification_tracker_target(notification_tracker_id, target_id)"
+            ))
+
+        # service_provider partial unique on (user_id) WHERE type='bridge_self'.
+        # Bridge-self is special: exactly one row per user, auto-seeded at boot,
+        # at user-create, and on /setup. Without this guard, a concurrent boot
+        # backfill + POST /api/users could double-insert. Dedupe keeps the
+        # oldest row so any user-customised thresholds on it survive.
+        if await _has_table(conn, "service_provider"):
+            res = await conn.execute(text(
+                "DELETE FROM service_provider WHERE id IN ("
+                "  SELECT t.id FROM service_provider t "
+                "  JOIN ("
+                "    SELECT user_id FROM service_provider "
+                "    WHERE type='bridge_self' GROUP BY user_id HAVING COUNT(*) > 1"
+                "  ) g ON t.user_id = g.user_id "
+                "  WHERE t.type='bridge_self' AND t.id NOT IN ("
+                "    SELECT MIN(id) FROM service_provider "
+                "    WHERE type='bridge_self' AND user_id = g.user_id"
+                "  )"
+                ")"
+            ))
+            rc = int(getattr(res, "rowcount", 0) or 0)
+            if rc:
+                logger.warning(
+                    "Removed %d duplicate bridge_self service_provider row(s) "
+                    "on user_id", rc,
+                )
+            await conn.execute(text(
+                "CREATE UNIQUE INDEX IF NOT EXISTS "
+                "uq_service_provider_bridge_self_per_user "
+                "ON service_provider(user_id) WHERE type='bridge_self'"
+            ))
+
+
+async def migrate_eventlog_provider_fk(engine: AsyncEngine) -> None:
+    """Document the EventLog.provider_id FK situation.
+
+    SQLite cannot ALTER a column to add a foreign-key constraint without
+    rebuilding the table. The model annotation now declares
+    ``ondelete=SET NULL`` which only takes effect on freshly created
+    tables (i.e. brand-new installs). For existing installs we rely on
+    application-side cleanup in ``api/providers.delete_provider`` to NULL
+    out ``event_log.provider_id`` rows before deleting the provider row.
+
+    This migration is intentionally a no-op aside from the log line — it
+    exists so the migration order is explicit and operators see in the
+    logs that the FK strategy was reviewed on this boot.
+    """
+    if not str(engine.url).startswith("sqlite"):
+        return
+    async with engine.begin() as conn:
+        if not await _has_table(conn, "event_log"):
+            return
+        # No DDL change. Application code in api/providers.delete_provider
+        # is the source of truth for the SET NULL semantic on existing tables.
+        logger.debug(
+            "event_log.provider_id FK enforcement deferred to application "
+            "code on existing SQLite tables (model declares ondelete=SET NULL "
+            "which applies to fresh schemas only)."
+        )
+
+
 # ---------------------------------------------------------------------------
 # Schema version tracking — lightweight alternative to Alembic while the
 # hand-rolled idempotent migrations remain the source of truth. Gives
diff --git a/packages/server/src/notify_bridge_server/database/models.py b/packages/server/src/notify_bridge_server/database/models.py
index 3cb1657..e0ed455 100644
--- a/packages/server/src/notify_bridge_server/database/models.py
+++ b/packages/server/src/notify_bridge_server/database/models.py
@@ -6,7 +6,7 @@ from datetime import datetime, timezone
 from typing import Any
 from uuid import uuid4
 
-from sqlalchemy import ForeignKey, UniqueConstraint, Text
+from sqlalchemy import ForeignKey, Index, UniqueConstraint, Text
 from sqlmodel import JSON, Column, Field, SQLModel
 
 
@@ -29,12 +29,25 @@ class ServiceProvider(SQLModel, table=True):
     __tablename__ = "service_provider"
 
     id: int | None = Field(default=None, primary_key=True)
-    user_id: int = Field(foreign_key="user.id")
+    user_id: int = Field(
+        sa_column=Column(
+            "user_id",
+            ForeignKey("user.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+    )
     type: str  # ServiceProviderType value ("immich")
     name: str
     icon: str = Field(default="")
     config: dict[str, Any] = Field(default_factory=dict, sa_column=Column(JSON))
-    webhook_token: str = Field(default_factory=lambda: uuid4().hex)
+    # Webhook token is the shared secret embedded in inbound webhook URLs.
+    # Must be unique so a token uniquely identifies a provider; indexed so
+    # the webhook router does an O(log n) lookup on every inbound request.
+    webhook_token: str = Field(
+        default_factory=lambda: uuid4().hex,
+        unique=True,
+        index=True,
+    )
     created_at: datetime = Field(default_factory=_utcnow)
 
 
@@ -42,13 +55,29 @@ class TelegramBot(SQLModel, table=True):
     __tablename__ = "telegram_bot"
 
     id: int | None = Field(default=None, primary_key=True)
-    user_id: int = Field(foreign_key="user.id")
+    user_id: int = Field(
+        sa_column=Column(
+            "user_id",
+            ForeignKey("user.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+    )
     name: str
     token: str
     icon: str = Field(default="")
     bot_username: str = Field(default="")
-    bot_id: int = Field(default=0)
-    webhook_path_id: str = Field(default_factory=lambda: uuid4().hex)
+    # bot_id=0 is a sentinel meaning "Telegram has not yet returned a numeric
+    # ID for this bot" (i.e. token never validated). Multiple unverified bots
+    # may legitimately carry 0, so we only enforce uniqueness for non-sentinel
+    # values via a partial index added in migrate_uniqueness_constraints.
+    bot_id: int = Field(default=0, index=True)
+    # URL-path embedded in Telegram's setWebhook callback URL. Must be unique
+    # so the inbound dispatcher resolves a single bot per incoming request.
+    webhook_path_id: str = Field(
+        default_factory=lambda: uuid4().hex,
+        unique=True,
+        index=True,
+    )
     update_mode: str = Field(default="none")  # "none", "polling", or "webhook"
     # NOTE: commands_config column remains in the DB for backward compat,
     # but is no longer part of the SQLModel class. Data migrated to CommandConfig.
@@ -61,7 +90,13 @@ class MatrixBot(SQLModel, table=True):
     __tablename__ = "matrix_bot"
 
     id: int | None = Field(default=None, primary_key=True)
-    user_id: int = Field(foreign_key="user.id")
+    user_id: int = Field(
+        sa_column=Column(
+            "user_id",
+            ForeignKey("user.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+    )
     name: str
     icon: str = Field(default="")
     homeserver_url: str  # e.g. https://matrix.org
@@ -76,7 +111,13 @@ class EmailBot(SQLModel, table=True):
     __tablename__ = "email_bot"
 
     id: int | None = Field(default=None, primary_key=True)
-    user_id: int = Field(foreign_key="user.id")
+    user_id: int = Field(
+        sa_column=Column(
+            "user_id",
+            ForeignKey("user.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+    )
     name: str
     icon: str = Field(default="")
     email: str  # From address
@@ -90,6 +131,13 @@ class EmailBot(SQLModel, table=True):
 
 class TelegramChat(SQLModel, table=True):
     __tablename__ = "telegram_chat"
+    # (bot_id, chat_id) uniquely identifies a chat. The composite index is
+    # the access pattern for save_chat_from_webhook ON CONFLICT updates and
+    # for any "lookup by (bot, chat)" callers.
+    __table_args__ = (
+        UniqueConstraint("bot_id", "chat_id", name="uq_telegram_chat_bot_chat"),
+        Index("ix_telegram_chat_bot_chat", "bot_id", "chat_id"),
+    )
 
     id: int | None = Field(default=None, primary_key=True)
     bot_id: int = Field(foreign_key="telegram_bot.id")
@@ -109,7 +157,13 @@ class TrackingConfig(SQLModel, table=True):
     __tablename__ = "tracking_config"
 
     id: int | None = Field(default=None, primary_key=True)
-    user_id: int = Field(foreign_key="user.id")
+    user_id: int = Field(
+        sa_column=Column(
+            "user_id",
+            ForeignKey("user.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+    )
     provider_type: str  # Must match provider's type
     name: str
     icon: str = Field(default="")
@@ -171,6 +225,12 @@ class TrackingConfig(SQLModel, table=True):
     track_ha_service_called: bool = Field(default=False)
     track_ha_event_fired: bool = Field(default=False)
 
+    # Bridge self-monitoring event tracking — defaults ON because the whole
+    # point of the provider is to alert on these conditions.
+    track_bridge_self_poll_failures: bool = Field(default=True)
+    track_bridge_self_deferred_backlog: bool = Field(default=True)
+    track_bridge_self_target_failures: bool = Field(default=True)
+
     # Immich asset display
     track_images: bool = Field(default=True)
     track_videos: bool = Field(default=True)
@@ -276,7 +336,13 @@ class NotificationTarget(SQLModel, table=True):
     __tablename__ = "notification_target"
 
     id: int | None = Field(default=None, primary_key=True)
-    user_id: int = Field(foreign_key="user.id")
+    user_id: int = Field(
+        sa_column=Column(
+            "user_id",
+            ForeignKey("user.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+    )
     type: str  # "telegram", "webhook", "email", "discord", "slack", "ntfy", "matrix"
     name: str
     icon: str = Field(default="")
@@ -319,7 +385,13 @@ class NotificationTracker(SQLModel, table=True):
     __tablename__ = "notification_tracker"
 
     id: int | None = Field(default=None, primary_key=True)
-    user_id: int = Field(foreign_key="user.id")
+    user_id: int = Field(
+        sa_column=Column(
+            "user_id",
+            ForeignKey("user.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+    )
     provider_id: int = Field(foreign_key="service_provider.id")
     name: str
     icon: str = Field(default="")
@@ -342,6 +414,15 @@ class NotificationTrackerTarget(SQLModel, table=True):
     """Junction between NotificationTracker and NotificationTarget with per-link config."""
 
     __tablename__ = "notification_tracker_target"
+    # A tracker should never link to the same target twice — duplicate links
+    # would deliver the same notification multiple times. Enforced at the DB
+    # level so concurrent inserts can't bypass an application-side check.
+    __table_args__ = (
+        UniqueConstraint(
+            "notification_tracker_id", "target_id",
+            name="uq_ntt_tracker_target",
+        ),
+    )
 
     id: int | None = Field(default=None, primary_key=True)
     # Python attr stays as tracker_id for backward compat; DB column is notification_tracker_id
@@ -403,7 +484,13 @@ class CommandConfig(SQLModel, table=True):
     __tablename__ = "command_config"
 
     id: int | None = Field(default=None, primary_key=True)
-    user_id: int = Field(foreign_key="user.id")
+    user_id: int = Field(
+        sa_column=Column(
+            "user_id",
+            ForeignKey("user.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+    )
     provider_type: str
     name: str
     icon: str = Field(default="")
@@ -464,7 +551,13 @@ class CommandTracker(SQLModel, table=True):
     __tablename__ = "command_tracker"
 
     id: int | None = Field(default=None, primary_key=True)
-    user_id: int = Field(foreign_key="user.id")
+    user_id: int = Field(
+        sa_column=Column(
+            "user_id",
+            ForeignKey("user.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+    )
     provider_id: int = Field(foreign_key="service_provider.id")
     command_config_id: int = Field(foreign_key="command_config.id")
     name: str
@@ -517,7 +610,15 @@ class DeferredDispatch(SQLModel, table=True):
     __tablename__ = "deferred_dispatch"
 
     id: int | None = Field(default=None, primary_key=True)
-    user_id: int | None = Field(default=None, foreign_key="user.id", index=True)
+    user_id: int | None = Field(
+        default=None,
+        sa_column=Column(
+            "user_id",
+            ForeignKey("user.id", ondelete="CASCADE"),
+            nullable=True,
+            index=True,
+        ),
+    )
     tracker_id: int = Field(foreign_key="notification_tracker.id", index=True)
     # The specific link this deferral targets. On drain we re-fetch by ID; if
     # the link was disabled or removed in the meantime we drop with a
@@ -566,8 +667,17 @@ class EventLog(SQLModel, table=True):
     id: int | None = Field(default=None, primary_key=True)
     # Owner. Indexed for the dashboard events query.  Nullable only because
     # historical rows (pre-user_id column) may have no owner; new rows always
-    # set this directly.
-    user_id: int | None = Field(default=None, foreign_key="user.id", index=True)
+    # set this directly. SET NULL on user delete preserves the audit trail
+    # while letting the user record itself be removed.
+    user_id: int | None = Field(
+        default=None,
+        sa_column=Column(
+            "user_id",
+            ForeignKey("user.id", ondelete="SET NULL"),
+            nullable=True,
+            index=True,
+        ),
+    )
     # Python attr stays as tracker_id for backward compat; DB column is notification_tracker_id
     tracker_id: int | None = Field(
         default=None,
@@ -594,7 +704,21 @@ class EventLog(SQLModel, table=True):
         default=None, foreign_key="telegram_bot.id", index=True,
     )
     bot_name: str = Field(default="")
-    provider_id: int | None = Field(default=None, index=True)
+    # FK to service_provider with SET NULL so deleting a provider leaves
+    # historical event_log rows intact (provider_name preserves the label
+    # for display). The FK only takes effect on freshly created tables —
+    # SQLite cannot ALTER a constraint into an existing table without a
+    # rebuild, so application code in api/providers.delete_provider also
+    # nulls these explicitly. See migrate_eventlog_provider_fk.
+    provider_id: int | None = Field(
+        default=None,
+        sa_column=Column(
+            "provider_id",
+            ForeignKey("service_provider.id", ondelete="SET NULL"),
+            nullable=True,
+            index=True,
+        ),
+    )
     provider_name: str = Field(default="")
     event_type: str = Field(index=True)
     collection_id: str
@@ -610,7 +734,13 @@ class Action(SQLModel, table=True):
     __tablename__ = "action"
 
     id: int | None = Field(default=None, primary_key=True)
-    user_id: int = Field(foreign_key="user.id")
+    user_id: int = Field(
+        sa_column=Column(
+            "user_id",
+            ForeignKey("user.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+    )
     provider_id: int = Field(foreign_key="service_provider.id")
     name: str
     icon: str = Field(default="")
diff --git a/packages/server/src/notify_bridge_server/database/seeds.py b/packages/server/src/notify_bridge_server/database/seeds.py
index f39f2f6..6e721b4 100644
--- a/packages/server/src/notify_bridge_server/database/seeds.py
+++ b/packages/server/src/notify_bridge_server/database/seeds.py
@@ -13,9 +13,11 @@ from .models import (
     CommandConfig,
     CommandTemplateConfig,
     CommandTemplateSlot,
+    ServiceProvider,
     TemplateConfig,
     TemplateSlot,
     TrackingConfig,
+    User,
 )
 
 _LOGGER = logging.getLogger(__name__)
@@ -159,6 +161,7 @@ async def _seed_default_templates() -> None:
         await _seed_provider_template(session, "google_photos", "Google Photos")
         await _seed_provider_template(session, "webhook", "Generic Webhook")
         await _seed_provider_template(session, "home_assistant", "Home Assistant")
+        await _seed_provider_template(session, "bridge_self", "Bridge Self-Monitoring")
         await session.commit()
 
 
@@ -285,6 +288,13 @@ async def _seed_default_tracking_configs() -> None:
                 "track_ha_service_called": False,
                 "track_ha_event_fired": False,
             },
+            {
+                "provider_type": "bridge_self",
+                "name": "Default Bridge Self-Monitoring",
+                "track_bridge_self_poll_failures": True,
+                "track_bridge_self_deferred_backlog": True,
+                "track_bridge_self_target_failures": True,
+            },
         ]
 
         for cfg in defaults:
@@ -403,6 +413,67 @@ async def _seed_default_command_configs() -> None:
         await session.commit()
 
 
+# ---------------------------------------------------------------------------
+# Bridge self-monitoring per-user provider
+# ---------------------------------------------------------------------------
+
+
+# Default thresholds — duplicated here as constants instead of imported so the
+# seed module stays self-contained and import-cycle-free during boot.
+_BRIDGE_SELF_DEFAULT_CONFIG = {
+    "poll_failure_threshold": 3,
+    "deferred_backlog_threshold": 100,
+    "target_failure_threshold": 5,
+}
+
+
+async def ensure_bridge_self_provider_for_user(
+    session: AsyncSession, user_id: int,
+) -> ServiceProvider | None:
+    """Create the user's bridge_self provider if absent. Returns the provider.
+
+    The bridge_self provider is special — exactly one per user, auto-created
+    so the operator never has to think about wiring it up. Idempotent.
+    Skips ``user_id <= 0`` (the ``__system__`` placeholder) which never
+    receives notifications.
+    """
+    if user_id <= 0:
+        return None
+    result = await session.exec(
+        select(ServiceProvider).where(
+            ServiceProvider.user_id == user_id,
+            ServiceProvider.type == "bridge_self",
+        )
+    )
+    existing = result.first()
+    if existing is not None:
+        return existing
+    provider = ServiceProvider(
+        user_id=user_id,
+        type="bridge_self",
+        name="Bridge Self-Monitoring",
+        config=dict(_BRIDGE_SELF_DEFAULT_CONFIG),
+    )
+    session.add(provider)
+    await session.flush()
+    return provider
+
+
+async def _seed_bridge_self_providers_for_existing_users() -> None:
+    """Backfill bridge_self provider for every existing real user.
+
+    Runs once at boot so deployments upgrading from a pre-bridge_self
+    release pick up the auto-created provider without requiring user
+    action. Skips users that already have one.
+    """
+    engine = get_engine()
+    async with AsyncSession(engine) as session:
+        users = (await session.exec(select(User).where(User.id != 0))).all()
+        for user in users:
+            await ensure_bridge_self_provider_for_user(session, user.id)
+        await session.commit()
+
+
 # ---------------------------------------------------------------------------
 # Public entry point
 # ---------------------------------------------------------------------------
@@ -442,3 +513,4 @@ async def seed_all() -> None:
     await _seed_default_command_templates()
     await _seed_default_tracking_configs()
     await _seed_default_command_configs()
+    await _seed_bridge_self_providers_for_existing_users()
diff --git a/packages/server/src/notify_bridge_server/main.py b/packages/server/src/notify_bridge_server/main.py
index ec0cc9e..9426164 100644
--- a/packages/server/src/notify_bridge_server/main.py
+++ b/packages/server/src/notify_bridge_server/main.py
@@ -50,6 +50,7 @@ from .commands.webhook import router as webhook_router, set_webhook_secret
 from .api.webhooks import router as webhooks_router
 from .api.webhook_logs import router as webhook_logs_router
 from .api.backup import router as backup_router
+from .api.metrics import router as metrics_router
 
 
 # Readiness flag — flipped to True once the scheduler has started and the
@@ -78,6 +79,8 @@ async def lifespan(app: FastAPI):
         migrate_chat_action_to_column,
         migrate_deferred_dispatch_event_log_fk,
         migrate_deferred_dispatch_unique_pending,
+        migrate_uniqueness_constraints,
+        migrate_eventlog_provider_fk,
         migrate_schema_version,
     )
     from .database.snapshot import snapshot_and_prune
@@ -107,6 +110,13 @@ async def lifespan(app: FastAPI):
     # the partial unique index.
     await migrate_deferred_dispatch_event_log_fk(engine)
     await migrate_deferred_dispatch_unique_pending(engine)
+    # Backfill missing UNIQUE indexes on webhook hot paths (deduping any
+    # existing duplicates). Runs after performance_indexes so non-unique
+    # support indexes are already in place.
+    await migrate_uniqueness_constraints(engine)
+    # Document EventLog.provider_id FK strategy on existing tables (no-op
+    # on SQLite besides the log line; new tables get the FK from create_all).
+    await migrate_eventlog_provider_fk(engine)
     await migrate_schema_version(engine)
     from .database.seeds import seed_all
     await seed_all()
@@ -254,6 +264,7 @@ app.include_router(webhook_router)
 app.include_router(webhooks_router)
 app.include_router(webhook_logs_router)
 app.include_router(backup_router)
+app.include_router(metrics_router)
 
 
 @app.get("/api/health")
@@ -265,15 +276,107 @@ async def health():
 
 @app.get("/api/ready")
 async def ready():
-    """Readiness: migrations and scheduler have started, app can serve traffic.
+    """Readiness: deep dependency check.
 
-    Returns 503 until the lifespan startup sequence has completed. Use this
-    for orchestrator readiness probes (Docker, Kubernetes).
+    Verifies each critical dependency is actually reachable, not just that
+    the app finished its lifespan startup. Returns 503 if any *required*
+    check fails (db, scheduler). Home Assistant supervisor presence is
+    informational — a degraded HA does not flip readiness off.
+
+    Response shape:
+        {
+          "ready": bool,
+          "checks": {"db": "ok|fail", "scheduler": "ok|fail", "ha": "ok|degraded|na"},
+          "errors": [str, ...]
+        }
     """
+    from starlette.responses import JSONResponse
+    import asyncio as _asyncio
+    from sqlalchemy import text as _text
+
+    checks: dict[str, str] = {}
+    errors: list[str] = []
+
     if not _READY:
-        from starlette.responses import JSONResponse
-        return JSONResponse({"status": "starting"}, status_code=503)
-    return {"status": "ready", "version": _APP_VERSION}
+        # Lifespan still running — short-circuit so we don't poke a half-built engine.
+        return JSONResponse(
+            {
+                "ready": False,
+                "checks": {"db": "fail", "scheduler": "fail", "ha": "na"},
+                "errors": ["startup not complete"],
+                "version": _APP_VERSION,
+            },
+            status_code=503,
+        )
+
+    # --- DB: SELECT 1 with a 2s timeout ---
+    try:
+        from .database.engine import get_engine
+        engine = get_engine()
+
+        async def _ping_db() -> None:
+            async with engine.connect() as conn:
+                await conn.execute(_text("SELECT 1"))
+
+        await _asyncio.wait_for(_ping_db(), timeout=2.0)
+        checks["db"] = "ok"
+    except Exception as exc:  # noqa: BLE001
+        checks["db"] = "fail"
+        errors.append(f"db: {exc!s}")
+
+    # --- Scheduler: APScheduler must be running ---
+    try:
+        from .services.scheduler import get_scheduler
+        scheduler = get_scheduler()
+        if scheduler.running:
+            checks["scheduler"] = "ok"
+        else:
+            checks["scheduler"] = "fail"
+            errors.append("scheduler: not running")
+    except Exception as exc:  # noqa: BLE001
+        checks["scheduler"] = "fail"
+        errors.append(f"scheduler: {exc!s}")
+
+    # --- HA supervisor: informational only ---
+    # If no HA providers are configured, report "na" (not applicable). If any
+    # HA providers exist, ensure at least one supervisor task is alive — a
+    # task being not-yet-connected is fine, we just want it to exist.
+    try:
+        from sqlmodel import select as _select
+        from sqlmodel.ext.asyncio.session import AsyncSession as _AS
+        from .database.models import ServiceProvider
+        from .services.ha_subscription import _running_tasks as _ha_tasks
+
+        from .database.engine import get_engine as _get_engine_ha
+        async with _AS(_get_engine_ha()) as _session:
+            _result = await _session.exec(
+                _select(ServiceProvider).where(
+                    ServiceProvider.type == "home_assistant",
+                )
+            )
+            ha_providers = _result.all()
+        if not ha_providers:
+            checks["ha"] = "na"
+        else:
+            alive = [
+                t for t in _ha_tasks.values() if t is not None and not t.done()
+            ]
+            checks["ha"] = "ok" if alive else "degraded"
+    except Exception as exc:  # noqa: BLE001
+        # Never let the HA probe fail readiness — it's informational.
+        checks["ha"] = "degraded"
+        errors.append(f"ha: {exc!s}")
+
+    required_ok = checks["db"] == "ok" and checks["scheduler"] == "ok"
+    body = {
+        "ready": required_ok,
+        "checks": checks,
+        "errors": errors,
+        "version": _APP_VERSION,
+    }
+    if not required_ok:
+        return JSONResponse(body, status_code=503)
+    return body
 
 
 # --- Serve frontend static files (production) ---
diff --git a/packages/server/src/notify_bridge_server/services/backup_service.py b/packages/server/src/notify_bridge_server/services/backup_service.py
index fc27280..333451a 100644
--- a/packages/server/src/notify_bridge_server/services/backup_service.py
+++ b/packages/server/src/notify_bridge_server/services/backup_service.py
@@ -667,13 +667,19 @@ async def import_backup(
             if name is None:
                 continue
             ctc_id = _map_id(id_map, "command_template_configs", cc.command_template_config_id)
+            try:
+                safe_enabled = _sanitize_config(cc.enabled_commands or {})
+                safe_limits = _sanitize_config(cc.rate_limits or {})
+            except ValueError as exc:
+                result.warnings.append(f"Skipped command config '{cc.name}': {exc}")
+                continue
             new_cc = CommandConfig(
                 user_id=user_id, provider_type=cc.provider_type,
                 name=name, icon=cc.icon,
-                enabled_commands=cc.enabled_commands,
+                enabled_commands=safe_enabled,
                 response_mode=cc.response_mode,
                 default_count=cc.default_count,
-                rate_limits=cc.rate_limits,
+                rate_limits=safe_limits,
                 command_template_config_id=ctc_id,
             )
             session.add(new_cc)
@@ -728,10 +734,16 @@ async def import_backup(
             )
             if name is None:
                 continue
+            try:
+                safe_filters = _sanitize_config(nt.filters or {})
+                safe_collection_ids = _sanitize_config(nt.collection_ids or [])
+            except ValueError as exc:
+                result.warnings.append(f"Skipped tracker '{nt.name}': {exc}")
+                continue
             new_nt = NotificationTracker(
                 user_id=user_id, provider_id=provider_id,
-                name=name, icon=nt.icon, collection_ids=nt.collection_ids,
-                filters=nt.filters, scan_interval=nt.scan_interval,
+                name=name, icon=nt.icon, collection_ids=safe_collection_ids,
+                filters=safe_filters, scan_interval=nt.scan_interval,
                 default_tracking_config_id=_map_id(id_map, "tracking_configs", nt.default_tracking_config_id),
                 default_template_config_id=_map_id(id_map, "template_configs", nt.default_template_config_id),
                 enabled=nt.enabled,
@@ -810,9 +822,14 @@ async def import_backup(
             )
             if name is None:
                 continue
+            try:
+                safe_a_cfg = _sanitize_config(a.config or {})
+            except ValueError as exc:
+                result.warnings.append(f"Skipped action '{a.name}': {exc}")
+                continue
             new_a = Action(
                 user_id=user_id, provider_id=provider_id, name=name,
-                icon=a.icon, action_type=a.action_type, config=a.config,
+                icon=a.icon, action_type=a.action_type, config=safe_a_cfg,
                 schedule_type=a.schedule_type,
                 schedule_interval=a.schedule_interval,
                 schedule_cron=a.schedule_cron, enabled=False,  # always import disabled
@@ -820,9 +837,16 @@ async def import_backup(
             session.add(new_a)
             await session.flush()
             for r in a.rules:
+                try:
+                    safe_r_cfg = _sanitize_config(r.rule_config or {})
+                except ValueError as exc:
+                    result.warnings.append(
+                        f"Skipped rule '{r.name}' in action '{a.name}': {exc}"
+                    )
+                    continue
                 session.add(ActionRule(
                     action_id=new_a.id, name=r.name,
-                    rule_config=r.rule_config, enabled=r.enabled,
+                    rule_config=safe_r_cfg, enabled=r.enabled,
                     order=r.order,
                 ))
             result.created += len(a.rules)
diff --git a/packages/server/src/notify_bridge_server/services/bridge_self.py b/packages/server/src/notify_bridge_server/services/bridge_self.py
new file mode 100644
index 0000000..27219ca
--- /dev/null
+++ b/packages/server/src/notify_bridge_server/services/bridge_self.py
@@ -0,0 +1,432 @@
+"""Bridge self-monitoring service helpers.
+
+Three subsystems feed into ``emit_bridge_self_event``:
+
+1. The watcher's poll loop, when consecutive provider polls fail.
+2. A periodic scheduler job, when the deferred-dispatch backlog crosses
+   the configured threshold.
+3. The notification dispatcher, when consecutive sends to a single target
+   fail with 5xx / network errors.
+
+The helper looks up the user's ``bridge_self`` provider, builds a
+synthetic :class:`ServiceEvent`, and pushes it through the same
+``dispatch_provider_event`` pipeline that every other provider uses.
+That keeps templates, quiet hours, deferral, target gating, and event
+logging consistent with the rest of the system.
+
+We intentionally avoid raising into the caller's flow — a
+self-monitoring failure must never break the subsystem it's monitoring.
+"""
+
+from __future__ import annotations
+
+import logging
+from datetime import datetime, timezone
+from typing import Any
+
+from sqlmodel import select
+from sqlmodel.ext.asyncio.session import AsyncSession
+
+from notify_bridge_core.models.events import ServiceEvent
+from notify_bridge_core.providers.bridge_self import build_event
+from notify_bridge_core.providers.bridge_self.provider import (
+    DEFAULT_DEFERRED_BACKLOG_THRESHOLD,
+    DEFAULT_POLL_FAILURE_THRESHOLD,
+    DEFAULT_TARGET_FAILURE_THRESHOLD,
+)
+
+from ..database.engine import get_engine
+from ..database.models import ServiceProvider, User
+
+_LOGGER = logging.getLogger(__name__)
+
+
+# Detail keys carried into the EventLog.details JSON column. Mirrors the
+# pattern used by the HA subscription and webhook routers for their
+# ``dispatch_provider_event`` calls.
+BRIDGE_SELF_DETAIL_KEYS: tuple[str, ...] = (
+    "failure_type", "subject_id", "subject_name",
+    "count", "threshold", "last_error", "details",
+)
+
+
+async def get_bridge_self_provider(
+    session: AsyncSession, user_id: int,
+) -> ServiceProvider | None:
+    """Return the user's bridge_self provider row (or None if absent)."""
+    result = await session.exec(
+        select(ServiceProvider).where(
+            ServiceProvider.user_id == user_id,
+            ServiceProvider.type == "bridge_self",
+        )
+    )
+    return result.first()
+
+
+async def get_user_thresholds(user_id: int) -> dict[str, int]:
+    """Return the user's bridge_self thresholds, falling back to defaults.
+
+    Reads in a short-lived session — emission sites should NOT hold a
+    transaction across this call.
+    """
+    engine = get_engine()
+    async with AsyncSession(engine) as session:
+        provider = await get_bridge_self_provider(session, user_id)
+    if provider is None:
+        return {
+            "poll_failure_threshold": DEFAULT_POLL_FAILURE_THRESHOLD,
+            "deferred_backlog_threshold": DEFAULT_DEFERRED_BACKLOG_THRESHOLD,
+            "target_failure_threshold": DEFAULT_TARGET_FAILURE_THRESHOLD,
+        }
+    cfg = dict(provider.config or {})
+
+    def _int(key: str, fallback: int) -> int:
+        raw = cfg.get(key, fallback)
+        try:
+            value = int(raw)
+        except (TypeError, ValueError):
+            return fallback
+        return value if value >= 1 else fallback
+
+    return {
+        "poll_failure_threshold": _int("poll_failure_threshold", DEFAULT_POLL_FAILURE_THRESHOLD),
+        "deferred_backlog_threshold": _int(
+            "deferred_backlog_threshold", DEFAULT_DEFERRED_BACKLOG_THRESHOLD,
+        ),
+        "target_failure_threshold": _int(
+            "target_failure_threshold", DEFAULT_TARGET_FAILURE_THRESHOLD,
+        ),
+    }
+
+
+async def emit_bridge_self_event(
+    *,
+    user_id: int,
+    failure_type: str,
+    subject_id: int,
+    subject_name: str,
+    count: int,
+    threshold: int,
+    last_error: str = "",
+    details: dict[str, Any] | None = None,
+    timestamp: datetime | None = None,
+) -> int:
+    """Emit a self-monitoring event for ``user_id``.
+
+    Resolves the user's bridge_self provider and dispatches the event via
+    ``dispatch_provider_event``. Returns the number of dispatched
+    notifications (0 when the user has no bridge_self provider, no
+    matching trackers, or the event was suppressed by quiet hours / event-
+    type gating).
+
+    Always swallows internal exceptions so the calling subsystem keeps
+    running — self-monitoring must never crash the watcher / scheduler /
+    dispatcher.
+    """
+    payload = {
+        "failure_type": failure_type,
+        "subject_id": subject_id,
+        "subject_name": subject_name,
+        "count": count,
+        "threshold": threshold,
+        "last_error": last_error,
+        "details": dict(details or {}),
+    }
+    event = build_event(payload, timestamp=timestamp or datetime.now(timezone.utc))
+    if event is None:
+        _LOGGER.debug("Skipping malformed bridge_self payload: %s", payload)
+        return 0
+
+    engine = get_engine()
+    try:
+        async with AsyncSession(engine) as session:
+            provider = await get_bridge_self_provider(session, user_id)
+            if provider is None:
+                _LOGGER.debug(
+                    "User %s has no bridge_self provider; skipping %s emission",
+                    user_id, failure_type,
+                )
+                return 0
+            provider_id = provider.id
+            provider_name = provider.name
+            provider_config = dict(provider.config or {})
+
+        # Imported here to avoid a top-level cycle: dispatch_helpers imports
+        # several models which transitively touch this module's siblings.
+        from .event_dispatch import dispatch_provider_event
+
+        return await dispatch_provider_event(
+            engine=engine,
+            provider_id=provider_id,
+            provider_name=provider_name,
+            provider_config=provider_config,
+            event=event,
+            detail_keys=BRIDGE_SELF_DETAIL_KEYS,
+            filter_fn=lambda _ev, _filters: True,
+        )
+    except Exception:  # noqa: BLE001
+        _LOGGER.exception(
+            "bridge_self emission failed (user=%s, failure_type=%s)",
+            user_id, failure_type,
+        )
+        return 0
+
+
+# ---------------------------------------------------------------------------
+# Threshold-crossing trackers (in-memory, per-process).
+#
+# We track consecutive failure counts in module-level dicts keyed by the
+# subject id (tracker_id, target_id). On threshold crossing we emit and
+# reset the counter so we don't spam — the next emission only happens after
+# another full streak of failures.
+# ---------------------------------------------------------------------------
+
+
+# Tracker poll failures (keyed by tracker_id).
+_poll_failure_counts: dict[int, int] = {}
+_poll_failure_last_error: dict[int, str] = {}
+
+# Target send failures (keyed by target_id).
+_target_failure_counts: dict[int, int] = {}
+_target_failure_last_error: dict[int, str] = {}
+
+# Last-known backlog state per user (True = above threshold, False = below).
+# We only emit on the False -> True transition so a sustained backlog
+# triggers exactly one notification per crossing.
+_backlog_above_threshold: dict[int, bool] = {}
+
+
+def record_poll_success(tracker_id: int) -> None:
+    """Reset the failure counter for ``tracker_id`` after a successful poll."""
+    _poll_failure_counts.pop(tracker_id, None)
+    _poll_failure_last_error.pop(tracker_id, None)
+
+
+def record_poll_failure(tracker_id: int, error: str = "") -> int:
+    """Increment the failure counter for ``tracker_id``; return the new count."""
+    _poll_failure_counts[tracker_id] = _poll_failure_counts.get(tracker_id, 0) + 1
+    if error:
+        _poll_failure_last_error[tracker_id] = error
+    return _poll_failure_counts[tracker_id]
+
+
+def reset_poll_counter(tracker_id: int) -> None:
+    """Clear the failure counter for ``tracker_id`` without emitting."""
+    _poll_failure_counts.pop(tracker_id, None)
+    _poll_failure_last_error.pop(tracker_id, None)
+
+
+def record_target_success(target_id: int) -> None:
+    """Reset the failure counter for ``target_id`` after a successful send."""
+    _target_failure_counts.pop(target_id, None)
+    _target_failure_last_error.pop(target_id, None)
+
+
+def record_target_failure(target_id: int, error: str = "") -> int:
+    """Increment the failure counter for ``target_id``; return the new count."""
+    _target_failure_counts[target_id] = _target_failure_counts.get(target_id, 0) + 1
+    if error:
+        _target_failure_last_error[target_id] = error
+    return _target_failure_counts[target_id]
+
+
+def reset_target_counter(target_id: int) -> None:
+    """Clear the failure counter for ``target_id`` without emitting."""
+    _target_failure_counts.pop(target_id, None)
+    _target_failure_last_error.pop(target_id, None)
+
+
+def record_backlog_state(user_id: int, above_threshold: bool) -> bool:
+    """Record the new backlog state, returning True iff we just crossed up.
+
+    The first ever observation is treated as "below" so a process that
+    starts with a non-empty backlog still emits one notification.
+    """
+    prior = _backlog_above_threshold.get(user_id, False)
+    _backlog_above_threshold[user_id] = above_threshold
+    return above_threshold and not prior
+
+
+def get_poll_failure_count(tracker_id: int) -> int:
+    return _poll_failure_counts.get(tracker_id, 0)
+
+
+def get_target_failure_count(target_id: int) -> int:
+    return _target_failure_counts.get(target_id, 0)
+
+
+def get_poll_last_error(tracker_id: int) -> str:
+    return _poll_failure_last_error.get(tracker_id, "")
+
+
+def get_target_last_error(target_id: int) -> str:
+    return _target_failure_last_error.get(target_id, "")
+
+
+# ---------------------------------------------------------------------------
+# User-level helpers
+# ---------------------------------------------------------------------------
+
+
+async def list_user_ids() -> list[int]:
+    """Return all real user ids (excluding the __system__ placeholder)."""
+    engine = get_engine()
+    async with AsyncSession(engine) as session:
+        result = await session.exec(select(User.id).where(User.id != 0))
+        return [int(uid) for uid in result.all() if uid is not None]
+
+
+async def find_tracker_owner(tracker_id: int) -> int | None:
+    """Return the user_id that owns ``tracker_id`` (or None)."""
+    from ..database.models import NotificationTracker
+
+    engine = get_engine()
+    async with AsyncSession(engine) as session:
+        tracker = await session.get(NotificationTracker, tracker_id)
+        if tracker is None:
+            return None
+        return int(tracker.user_id)
+
+
+async def find_target_owner(target_id: int) -> int | None:
+    """Return the user_id that owns ``target_id`` (or None)."""
+    from ..database.models import NotificationTarget
+
+    engine = get_engine()
+    async with AsyncSession(engine) as session:
+        target = await session.get(NotificationTarget, target_id)
+        if target is None:
+            return None
+        return int(target.user_id)
+
+
+# ---------------------------------------------------------------------------
+# Backlog scan
+# ---------------------------------------------------------------------------
+
+
+async def check_deferred_backlog() -> dict[str, Any]:
+    """Scan the deferred_dispatch table and emit a backlog event if needed.
+
+    Counts pending rows per user, compares against each user's configured
+    threshold, and emits ``bridge_self_deferred_backlog`` for users that
+    just crossed up. Returns a small stats dict for logging.
+    """
+    from sqlalchemy import func
+
+    from ..database.models import DeferredDispatch
+
+    engine = get_engine()
+    crossings = 0
+    async with AsyncSession(engine) as session:
+        # GROUP BY user_id so we don't have to scan once per user. Skip rows
+        # whose user_id is NULL — those are legacy / orphaned and have no
+        # bridge_self provider to alert anyway.
+        rows = (
+            await session.exec(
+                select(
+                    DeferredDispatch.user_id,
+                    func.count(DeferredDispatch.id),
+                )
+                .where(DeferredDispatch.status == "pending")
+                .where(DeferredDispatch.user_id.is_not(None))
+                .group_by(DeferredDispatch.user_id)
+            )
+        ).all()
+
+    counts_by_user: dict[int, int] = {}
+    for row in rows:
+        if isinstance(row, tuple):
+            uid, count = row
+        else:
+            uid, count = row
+        if uid is None:
+            continue
+        counts_by_user[int(uid)] = int(count or 0)
+
+    for user_id, count in counts_by_user.items():
+        thresholds = await get_user_thresholds(user_id)
+        threshold = thresholds["deferred_backlog_threshold"]
+        above = count >= threshold
+        if record_backlog_state(user_id, above):
+            crossings += 1
+            await emit_bridge_self_event(
+                user_id=user_id,
+                failure_type="deferred_backlog",
+                subject_id=0,
+                subject_name="Deferred dispatch queue",
+                count=count,
+                threshold=threshold,
+                details={"pending": count},
+            )
+
+    # Reset latch for users that recovered (count < threshold or zero rows).
+    # Iterate all known users so a user whose backlog drained to 0 (no row in
+    # GROUP BY) still flips back to "below".
+    for user_id in list(_backlog_above_threshold.keys()):
+        if user_id in counts_by_user:
+            continue
+        # No pending rows for this user — clear the latch.
+        _backlog_above_threshold[user_id] = False
+
+    return {"users_scanned": len(counts_by_user), "crossings": crossings}
+
+
+# ---------------------------------------------------------------------------
+# Threshold-aware emission wrappers (used by watcher / dispatcher).
+# ---------------------------------------------------------------------------
+
+
+async def maybe_emit_poll_failure(
+    *, tracker_id: int, tracker_name: str, error: str = "",
+) -> None:
+    """Increment poll failure counter; emit + reset if threshold reached."""
+    count = record_poll_failure(tracker_id, error)
+    user_id = await find_tracker_owner(tracker_id)
+    if user_id is None:
+        return
+    thresholds = await get_user_thresholds(user_id)
+    threshold = thresholds["poll_failure_threshold"]
+    if count < threshold:
+        return
+    last_err = get_poll_last_error(tracker_id) or error
+    await emit_bridge_self_event(
+        user_id=user_id,
+        failure_type="poll_failures",
+        subject_id=tracker_id,
+        subject_name=tracker_name or f"tracker {tracker_id}",
+        count=count,
+        threshold=threshold,
+        last_error=last_err,
+        details={"tracker_id": tracker_id},
+    )
+    # Reset so the next emission requires another full streak. Without this
+    # the same tracker would fire on EVERY tick once it crosses the
+    # threshold, drowning the operator.
+    reset_poll_counter(tracker_id)
+
+
+async def maybe_emit_target_failure(
+    *, target_id: int, target_name: str, target_type: str, error: str = "",
+) -> None:
+    """Increment target failure counter; emit + reset if threshold reached."""
+    count = record_target_failure(target_id, error)
+    user_id = await find_target_owner(target_id)
+    if user_id is None:
+        return
+    thresholds = await get_user_thresholds(user_id)
+    threshold = thresholds["target_failure_threshold"]
+    if count < threshold:
+        return
+    last_err = get_target_last_error(target_id) or error
+    await emit_bridge_self_event(
+        user_id=user_id,
+        failure_type="target_failures",
+        subject_id=target_id,
+        subject_name=target_name or f"target {target_id}",
+        count=count,
+        threshold=threshold,
+        last_error=last_err,
+        details={"target_id": target_id, "target_type": target_type},
+    )
+    reset_target_counter(target_id)
diff --git a/packages/server/src/notify_bridge_server/services/command_sync.py b/packages/server/src/notify_bridge_server/services/command_sync.py
index 8871c4a..2c23456 100644
--- a/packages/server/src/notify_bridge_server/services/command_sync.py
+++ b/packages/server/src/notify_bridge_server/services/command_sync.py
@@ -98,9 +98,16 @@ async def _flush_dirty_bots() -> None:
                 bot = await session.get(TelegramBot, bot_id)
                 if not bot:
                     continue
+                # Snapshot every attribute we touch after the session
+                # exits — once detached, lazy attribute access raises
+                # MissingGreenlet under SQLAlchemy async.
+                bot_username = bot.bot_username
+                # Expunge so the detached instance can still read snapshotted
+                # attrs but won't trigger a refresh / re-query downstream.
+                session.expunge(bot)
             success = await register_commands_with_telegram(bot)
             if success:
-                _LOGGER.info("Auto-synced commands for bot %d (@%s)", bot_id, bot.bot_username)
+                _LOGGER.info("Auto-synced commands for bot %d (@%s)", bot_id, bot_username)
             else:
                 _LOGGER.warning("Auto-sync failed for bot %d", bot_id)
         except Exception:
diff --git a/packages/server/src/notify_bridge_server/services/deferred_dispatch.py b/packages/server/src/notify_bridge_server/services/deferred_dispatch.py
index 04ef029..71efef9 100644
--- a/packages/server/src/notify_bridge_server/services/deferred_dispatch.py
+++ b/packages/server/src/notify_bridge_server/services/deferred_dispatch.py
@@ -29,6 +29,7 @@ import logging
 from datetime import datetime, timezone
 from typing import Any
 
+from sqlalchemy.orm.attributes import flag_modified
 from sqlmodel import select
 from sqlmodel.ext.asyncio.session import AsyncSession
 
@@ -53,6 +54,7 @@ from .dispatch_helpers import (
     evaluate_event_gate,
     get_app_timezone,
     load_link_data,
+    resolve_provider_credential,
 )
 
 _LOGGER = logging.getLogger(__name__)
@@ -88,6 +90,10 @@ _DEFERRABLE_EVENT_TYPES: frozenset[str] = frozenset({
     "ups_online", "ups_on_battery", "ups_low_battery",
     "ups_battery_restored", "ups_comms_lost", "ups_comms_restored",
     "ups_replace_battery", "ups_overload",
+    # Home Assistant — state changes & automations are change-driven; the
+    # underlying state remains relevant after the quiet window.
+    "ha_state_changed", "ha_automation_triggered",
+    "ha_service_called", "ha_event_fired",
 })
 
 # Per-tracker cap on the pending queue. A misconfigured short quiet window
@@ -206,6 +212,11 @@ def _coalesce_assets_added(
             payload["removed_asset_ids"] = kept
             payload["removed_count"] = len(kept)
             existing_removed_row.event_payload = payload
+            # Belt-and-braces: SQLAlchemy's mutation tracker sometimes
+            # misses JSON-typed reassignments depending on dialect / column
+            # config. Explicit flag_modified guarantees the dirty bit is
+            # set for the upcoming flush.
+            flag_modified(existing_removed_row, "event_payload")
             if not kept:
                 # All previously-removed IDs are being re-added → entire
                 # removal is cancelled. Mark for caller to delete.
@@ -235,6 +246,7 @@ def _coalesce_assets_added(
     payload["added_assets"] = existing_assets
     payload["added_count"] = len(existing_assets)
     existing_added_row.event_payload = payload
+    flag_modified(existing_added_row, "event_payload")
     return ("merge", existing_added_row, existing_removed_row)
 
 
@@ -257,6 +269,7 @@ def _coalesce_assets_removed(
             payload["added_assets"] = kept_assets
             payload["added_count"] = len(kept_assets)
             existing_added_row.event_payload = payload
+            flag_modified(existing_added_row, "event_payload")
             if not kept_assets:
                 existing_added_row.status = "cancelled"
             # IDs that were just added during the window don't need to flow
@@ -282,6 +295,7 @@ def _coalesce_assets_removed(
     payload["removed_asset_ids"] = existing_ids
     payload["removed_count"] = len(existing_ids)
     existing_removed_row.event_payload = payload
+    flag_modified(existing_removed_row, "event_payload")
     return ("merge", existing_added_row, existing_removed_row)
 
 
@@ -695,7 +709,7 @@ async def _process_row(
             template_slots=ld.get("template_slots"),
             date_format=tmpl.date_format if tmpl else "%d.%m.%Y, %H:%M UTC",
             date_only_format=(tmpl.date_only_format if tmpl and tmpl.date_only_format else "%d.%m.%Y"),
-            provider_api_key=provider_config.get("api_key") or provider_config.get("api_token"),
+            provider_api_key=resolve_provider_credential(provider_config),
             provider_internal_url=provider_config.get("url", ""),
             provider_external_url=provider_config.get("external_domain", "") or provider_config.get("url", ""),
             receivers=ld["receivers"],
diff --git a/packages/server/src/notify_bridge_server/services/dispatch_helpers.py b/packages/server/src/notify_bridge_server/services/dispatch_helpers.py
index ecb5441..5d37ec5 100644
--- a/packages/server/src/notify_bridge_server/services/dispatch_helpers.py
+++ b/packages/server/src/notify_bridge_server/services/dispatch_helpers.py
@@ -210,6 +210,13 @@ def _event_type_enabled(event: ServiceEvent, tc: TrackingConfig) -> bool:
         "ha_automation_triggered": getattr(tc, "track_ha_automation_triggered", False),
         "ha_service_called": getattr(tc, "track_ha_service_called", False),
         "ha_event_fired": getattr(tc, "track_ha_event_fired", False),
+        # Bridge self-monitoring — defaults True so a tracker created before the
+        # columns existed still surfaces the alerts. Legacy rows are extremely
+        # unlikely here (the columns ship in the same release as the provider),
+        # but the safer default matches the rest of this map.
+        "bridge_self_poll_failures": getattr(tc, "track_bridge_self_poll_failures", True),
+        "bridge_self_deferred_backlog": getattr(tc, "track_bridge_self_deferred_backlog", True),
+        "bridge_self_target_failures": getattr(tc, "track_bridge_self_target_failures", True),
     }
     return flag_map.get(event_type, True)
 
@@ -225,11 +232,15 @@ def evaluate_event_gate(
     by quiet hours — the UTC datetime at which the window ends so the caller
     can schedule a deferred dispatch.
 
-    Order of checks: quiet hours first, then per-event-type flag. Quiet hours
-    is the "louder" gate (it applies to every type), so reporting it first
-    avoids the surprising case of "you disabled this event type" showing up
-    when the user really just opened the quiet window.
+    Order of checks: per-event-type flag FIRST, then quiet hours. Otherwise a
+    disabled event type would get deferred during quiet hours and then
+    silently dropped at drain time — wasted work and a confusing "deferred
+    then dropped" trail in the dashboard. The user already said "don't tell
+    me about this kind of event"; honour that immediately.
     """
+    if not _event_type_enabled(event, tc):
+        return GateOutcome(reason=GateReason.EVENT_TYPE_DISABLED)
+
     if tc.quiet_hours_enabled:
         end_at = quiet_hours_status(
             tc.quiet_hours_start, tc.quiet_hours_end, tz_name,
@@ -240,9 +251,6 @@ def evaluate_event_gate(
                 quiet_hours_end_at=end_at,
             )
 
-    if not _event_type_enabled(event, tc):
-        return GateOutcome(reason=GateReason.EVENT_TYPE_DISABLED)
-
     return GateOutcome(reason=GateReason.ALLOWED)
 
 
@@ -397,23 +405,49 @@ def apply_tracking_display_filters(
     )
 
 
+def resolve_provider_credential(cfg: dict[str, Any] | None) -> str | None:
+    """Pick the first non-empty provider credential field.
+
+    Provider configs use different field names (Immich → ``api_key``,
+    Gitea → ``api_token``, HA → ``access_token``). All four dispatch
+    sites used to pick one field by hand; centralising here keeps the
+    fallback order consistent so a config edit on one provider type can't
+    silently break dispatch for another.
+    """
+    if not cfg:
+        return None
+    return cfg.get("api_key") or cfg.get("api_token") or cfg.get("access_token")
+
+
 async def _resolve_target(
     session: AsyncSession,
     target: NotificationTarget,
+    *,
+    receivers_by_target: dict[int, list[TargetReceiver]] | None = None,
+    telegram_chats_by_bot: dict[int, dict[str, TelegramChat]] | None = None,
+    email_bots_by_id: dict[int, EmailBot] | None = None,
+    matrix_bots_by_id: dict[int, MatrixBot] | None = None,
 ) -> dict[str, Any]:
     """Resolve a single target into dispatch-ready data (config + receivers + credentials).
 
     Returns a dict with target_type, target_config, and receivers.
     Does NOT include tracking_config or template_slots — those come from the tracker link.
+
+    Optional ``*_by_*`` maps short-circuit per-target DB queries when the
+    caller has batch-prefetched the data. When omitted, we fall back to the
+    original single-query path so direct callers (manual_dispatch) still work.
     """
-    # Load receivers as typed Receiver objects
-    recv_result = await session.exec(
-        select(TargetReceiver).where(
-            TargetReceiver.target_id == target.id,
-            TargetReceiver.enabled == True,
+    # Receivers — prefer pre-fetched map.
+    if receivers_by_target is not None:
+        recv_rows = [r for r in receivers_by_target.get(target.id, []) if r.enabled]
+    else:
+        recv_result = await session.exec(
+            select(TargetReceiver).where(
+                TargetReceiver.target_id == target.id,
+                TargetReceiver.enabled == True,
+            )
         )
-    )
-    recv_rows = recv_result.all()
+        recv_rows = recv_result.all()
 
     # For Telegram targets, resolve locale from TelegramChat
     chat_locale_map: dict[str, str] = {}
@@ -422,13 +456,24 @@ async def _resolve_target(
         if bot_id:
             chat_ids = [str(r.config.get("chat_id", "")) for r in recv_rows if r.config.get("chat_id")]
             if chat_ids:
-                chat_result = await session.exec(
-                    select(TelegramChat).where(
-                        TelegramChat.bot_id == bot_id,
-                        TelegramChat.chat_id.in_(chat_ids),
-                    )
+                chats_for_bot = (
+                    telegram_chats_by_bot.get(bot_id, {})
+                    if telegram_chats_by_bot is not None else None
                 )
-                for chat in chat_result.all():
+                if chats_for_bot is not None:
+                    rows = [
+                        chats_for_bot[cid] for cid in chat_ids
+                        if cid in chats_for_bot
+                    ]
+                else:
+                    chat_result = await session.exec(
+                        select(TelegramChat).where(
+                            TelegramChat.bot_id == bot_id,
+                            TelegramChat.chat_id.in_(chat_ids),
+                        )
+                    )
+                    rows = chat_result.all()
+                for chat in rows:
                     resolved = (
                         getattr(chat, 'language_override', '') or
                         getattr(chat, 'language_code', '') or ''
@@ -457,7 +502,10 @@ async def _resolve_target(
     if target.type == "email":
         email_bot_id = target.config.get("email_bot_id")
         if email_bot_id:
-            email_bot = await session.get(EmailBot, email_bot_id)
+            if email_bots_by_id is not None:
+                email_bot = email_bots_by_id.get(email_bot_id)
+            else:
+                email_bot = await session.get(EmailBot, email_bot_id)
             if email_bot:
                 target_config["smtp"] = {
                     "host": email_bot.smtp_host,
@@ -471,12 +519,17 @@ async def _resolve_target(
     elif target.type == "matrix":
         matrix_bot_id = target.config.get("matrix_bot_id")
         if matrix_bot_id:
-            matrix_bot = await session.get(MatrixBot, matrix_bot_id)
+            if matrix_bots_by_id is not None:
+                matrix_bot = matrix_bots_by_id.get(matrix_bot_id)
+            else:
+                matrix_bot = await session.get(MatrixBot, matrix_bot_id)
             if matrix_bot:
                 target_config["homeserver_url"] = matrix_bot.homeserver_url
                 target_config["access_token"] = matrix_bot.access_token
 
     return {
+        "target_id": target.id,
+        "target_name": target.name,
         "target_type": target.type,
         "target_config": target_config,
         "receivers": receivers,
@@ -567,6 +620,76 @@ async def load_link_data(
         )
         child_target_map = {t.id: t for t in child_rows.all()}
 
+    # ---- Batch pre-fetch for _resolve_target ----
+    # Build the universe of target IDs (regular + expanded broadcast children)
+    # so a single query per relation type covers every call to _resolve_target
+    # below — no per-target follow-up SELECTs.
+    all_target_ids: set[int] = set(target_map.keys()) | set(child_target_map.keys())
+
+    receivers_by_target: dict[int, list[TargetReceiver]] = {}
+    if all_target_ids:
+        recv_result = await session.exec(
+            select(TargetReceiver).where(TargetReceiver.target_id.in_(all_target_ids))
+        )
+        for r in recv_result.all():
+            receivers_by_target.setdefault(r.target_id, []).append(r)
+
+    # Telegram chats keyed by (bot_id, chat_id) — collect all (bot_id, chat_id)
+    # pairs referenced by enabled telegram-target receivers, then one query.
+    tg_pairs: dict[int, set[str]] = {}  # bot_id -> {chat_id}
+    for tid in all_target_ids:
+        tgt = target_map.get(tid) or child_target_map.get(tid)
+        if not tgt or tgt.type != "telegram":
+            continue
+        bot_id = tgt.config.get("bot_id")
+        if not bot_id:
+            continue
+        for r in receivers_by_target.get(tid, []):
+            cid = str(r.config.get("chat_id", ""))
+            if cid:
+                tg_pairs.setdefault(bot_id, set()).add(cid)
+    telegram_chats_by_bot: dict[int, dict[str, TelegramChat]] = {}
+    for bot_id, chat_ids in tg_pairs.items():
+        if not chat_ids:
+            continue
+        chat_rows = await session.exec(
+            select(TelegramChat).where(
+                TelegramChat.bot_id == bot_id,
+                TelegramChat.chat_id.in_(chat_ids),
+            )
+        )
+        telegram_chats_by_bot[bot_id] = {c.chat_id: c for c in chat_rows.all()}
+
+    # Email + Matrix bots
+    email_bot_ids: set[int] = set()
+    matrix_bot_ids: set[int] = set()
+    for tid in all_target_ids:
+        tgt = target_map.get(tid) or child_target_map.get(tid)
+        if not tgt:
+            continue
+        if tgt.type == "email":
+            bid = tgt.config.get("email_bot_id")
+            if bid:
+                email_bot_ids.add(bid)
+        elif tgt.type == "matrix":
+            bid = tgt.config.get("matrix_bot_id")
+            if bid:
+                matrix_bot_ids.add(bid)
+
+    email_bots_by_id: dict[int, EmailBot] = {}
+    if email_bot_ids:
+        rows = await session.exec(
+            select(EmailBot).where(EmailBot.id.in_(email_bot_ids))
+        )
+        email_bots_by_id = {b.id: b for b in rows.all()}
+
+    matrix_bots_by_id: dict[int, MatrixBot] = {}
+    if matrix_bot_ids:
+        rows = await session.exec(
+            select(MatrixBot).where(MatrixBot.id.in_(matrix_bot_ids))
+        )
+        matrix_bots_by_id = {b.id: b for b in rows.all()}
+
     link_data: list[dict[str, Any]] = []
     for tt in active_links:
         target = target_map.get(tt.target_id)
@@ -589,7 +712,13 @@ async def load_link_data(
                 child_target = child_target_map.get(child_id)
                 if not child_target or child_target.type == "broadcast":
                     continue
-                resolved = await _resolve_target(session, child_target)
+                resolved = await _resolve_target(
+                    session, child_target,
+                    receivers_by_target=receivers_by_target,
+                    telegram_chats_by_bot=telegram_chats_by_bot,
+                    email_bots_by_id=email_bots_by_id,
+                    matrix_bots_by_id=matrix_bots_by_id,
+                )
                 link_data.append({
                     **resolved,
                     "link_id": tt.id,
@@ -600,7 +729,13 @@ async def load_link_data(
             continue
 
         # Regular target
-        resolved = await _resolve_target(session, target)
+        resolved = await _resolve_target(
+            session, target,
+            receivers_by_target=receivers_by_target,
+            telegram_chats_by_bot=telegram_chats_by_bot,
+            email_bots_by_id=email_bots_by_id,
+            matrix_bots_by_id=matrix_bots_by_id,
+        )
         link_data.append({
             **resolved,
             "link_id": tt.id,
diff --git a/packages/server/src/notify_bridge_server/services/event_dispatch.py b/packages/server/src/notify_bridge_server/services/event_dispatch.py
index 6543433..8cdae09 100644
--- a/packages/server/src/notify_bridge_server/services/event_dispatch.py
+++ b/packages/server/src/notify_bridge_server/services/event_dispatch.py
@@ -14,6 +14,7 @@ services -> api cycle).
 from __future__ import annotations
 
 import logging
+import time
 from typing import Any, Awaitable, Callable
 
 from sqlmodel import select
@@ -33,6 +34,7 @@ from .dispatch_helpers import (
     evaluate_event_gate,
     get_app_timezone,
     load_link_data,
+    resolve_provider_credential,
 )
 
 _LOGGER = logging.getLogger(__name__)
@@ -44,6 +46,67 @@ _LOGGER = logging.getLogger(__name__)
 FilterFn = Callable[[ServiceEvent, dict[str, Any]], bool]
 
 
+# ---------------------------------------------------------------------------
+# Tracker cache (per-provider, TTL-bounded)
+# ---------------------------------------------------------------------------
+#
+# HA's chat-bus emits dozens of events per second; the per-event SELECT for
+# enabled trackers becomes the bottleneck on busy installs. This 5-second
+# TTL cache short-circuits the lookup when the same provider is hot. Cache
+# entries are invalidated explicitly by tracker CRUD endpoints; the TTL is
+# the safety net for missed invalidations.
+
+_TRACKER_CACHE_TTL_SECONDS = 5.0
+_trackers_cache: dict[int, tuple[float, list[NotificationTracker]]] = {}
+
+
+def invalidate_tracker_cache(provider_id: int | None = None) -> None:
+    """Drop cached trackers for one provider (or all if ``provider_id`` is None).
+
+    Call from tracker create / update / delete endpoints so the next
+    inbound event sees the change without waiting out the TTL.
+    """
+    if provider_id is None:
+        _trackers_cache.clear()
+    else:
+        _trackers_cache.pop(provider_id, None)
+
+
+async def _load_trackers_cached(
+    session: AsyncSession, provider_id: int,
+) -> list[NotificationTracker]:
+    """Return enabled trackers for ``provider_id``, with a short TTL cache.
+
+    Caches the ``NotificationTracker`` rows themselves — NOT the per-tracker
+    ``NotificationTrackerTarget`` link rows. ``load_link_data`` always re-reads
+    links from the DB on every dispatch, so adding/removing/toggling a link
+    does NOT require invalidating this cache. Only call ``invalidate_tracker_cache``
+    when a tracker row is created/updated/deleted.
+    """
+    now = time.monotonic()
+    cached = _trackers_cache.get(provider_id)
+    if cached is not None:
+        ts, trackers = cached
+        if now - ts < _TRACKER_CACHE_TTL_SECONDS:
+            return trackers
+    result = await session.exec(
+        select(NotificationTracker).where(
+            NotificationTracker.provider_id == provider_id,
+            NotificationTracker.enabled == True,  # noqa: E712
+        )
+    )
+    trackers = list(result.all())
+    # Detach cached instances so consumers don't accidentally use a stale
+    # session — re-fetch by id when mutating.
+    for t in trackers:
+        try:
+            session.expunge(t)
+        except Exception:  # noqa: BLE001
+            pass
+    _trackers_cache[provider_id] = (now, trackers)
+    return trackers
+
+
 async def dispatch_provider_event(
     engine: Any,
     provider_id: int,
@@ -82,18 +145,25 @@ async def dispatch_provider_event(
     # Drain-scheduling is best-effort: a scheduling failure must not roll
     # back the persisted defer rows (startup catch-up re-establishes them).
     defers_to_schedule: set[Any] = set()
+    # Build the dispatcher once per inbound event — its only state is the
+    # shared aiohttp session and Telegram caches, both of which are reused
+    # across all trackers. Re-creating per-tracker meant a fresh dispatcher
+    # for every notification, paying the construction cost on every HA
+    # state_changed (ha provider can fire dozens per second).
+    from .http_session import get_http_session
+    from .watcher import _get_telegram_caches
+    url_cache, asset_cache = await _get_telegram_caches()
+    dispatcher = NotificationDispatcher(
+        url_cache=url_cache,
+        asset_cache=asset_cache,
+        session=await get_http_session(),
+    )
     async with AsyncSession(engine) as session:
         # App timezone is identical across trackers in one inbound event;
         # pull it once.
         app_tz = await get_app_timezone(session)
 
-        tracker_result = await session.exec(
-            select(NotificationTracker).where(
-                NotificationTracker.provider_id == provider_id,
-                NotificationTracker.enabled == True,  # noqa: E712
-            )
-        )
-        trackers = tracker_result.all()
+        trackers = await _load_trackers_cached(session, provider_id)
 
         for tracker in trackers:
             filters = tracker.filters or {}
@@ -147,6 +217,17 @@ async def dispatch_provider_event(
                                 prior = defers_for_event.get(link_id)
                                 if prior is None or outcome.quiet_hours_end_at < prior:
                                     defers_for_event[link_id] = outcome.quiet_hours_end_at
+                        else:
+                            # Non-deferrable event hit quiet hours — stamp
+                            # the event_log so the dashboard surfaces *why*
+                            # the notification never went out.
+                            details = dict(event_log_row.details or {})
+                            if not details.get("dispatch_status"):
+                                details["dispatch_status"] = (
+                                    "dropped_quiet_hours_nondeferrable"
+                                )
+                                event_log_row.details = details
+                                session.add(event_log_row)
                         continue
                     if outcome.reason is GateReason.EVENT_TYPE_DISABLED:
                         continue
@@ -162,7 +243,7 @@ async def dispatch_provider_event(
                         if tmpl and tmpl.date_only_format
                         else "%d.%m.%Y"
                     ),
-                    provider_api_key=provider_config.get("api_token"),
+                    provider_api_key=resolve_provider_credential(provider_config),
                     provider_internal_url=provider_config.get("url", ""),
                     provider_external_url=provider_config.get("url", ""),
                     receivers=ld["receivers"],
@@ -170,7 +251,10 @@ async def dispatch_provider_event(
                 key = id(tc) if tc is not None else 0
                 if key not in groups:
                     groups[key] = (tc, [])
-                groups[key][1].append(target_cfg)
+                # Thread per-target metadata alongside the TargetConfig so the
+                # bridge_self failure counters can attribute results to a
+                # specific target_id after dispatch.
+                groups[key][1].append((target_cfg, ld.get("target_id"), ld.get("target_name", "")))
 
             # Persist defers + stamp event_log dispatch_status in the same
             # session that holds the EventLog row, so the "deferred" badge
@@ -198,14 +282,25 @@ async def dispatch_provider_event(
             # Dispatch to targets. Isolate dispatcher exceptions per group so
             # a failed remote call doesn't bubble out, abort the surrounding
             # transaction, and roll back the just-written defers / event_log.
-            from .http_session import get_http_session
-            dispatcher = NotificationDispatcher(session=await get_http_session())
-            for tc, target_configs in groups.values():
-                if not target_configs:
+            from .bridge_self import (
+                maybe_emit_target_failure,
+                record_target_success,
+            )
+
+            # Skip target-failure tracking when we're already dispatching a
+            # bridge_self event — otherwise a failing alert target would
+            # endlessly re-emit alerts about itself.
+            track_target_failures = (
+                event.provider_type.value != "bridge_self"
+            )
+
+            for tc, target_entries in groups.values():
+                if not target_entries:
                     continue
                 shaped_event = apply_tracking_display_filters(event, tc)
                 if shaped_event is None:
                     continue
+                target_configs = [entry[0] for entry in target_entries]
                 try:
                     results = await dispatcher.dispatch(shaped_event, target_configs)
                 except Exception as err:  # noqa: BLE001
@@ -213,14 +308,29 @@ async def dispatch_provider_event(
                         "Dispatcher raised for tracker %d: %s", tracker.id, err,
                     )
                     continue
-                for r in results:
+                for entry, r in zip(target_entries, results):
+                    _, target_id, target_name = entry
                     if r.get("success"):
                         dispatched += 1
+                        if track_target_failures and target_id is not None:
+                            record_target_success(int(target_id))
                     else:
                         _LOGGER.error(
                             "Notification failed for tracker %d: %s",
                             tracker.id, r.get("error", "unknown"),
                         )
+                        if track_target_failures and target_id is not None:
+                            try:
+                                await maybe_emit_target_failure(
+                                    target_id=int(target_id),
+                                    target_name=target_name or "",
+                                    target_type=entry[0].type,
+                                    error=str(r.get("error") or ""),
+                                )
+                            except Exception:  # noqa: BLE001
+                                _LOGGER.exception(
+                                    "bridge_self target-failure emission failed",
+                                )
 
         await session.commit()
 
diff --git a/packages/server/src/notify_bridge_server/services/ha_subscription.py b/packages/server/src/notify_bridge_server/services/ha_subscription.py
index ba51419..e68c88d 100644
--- a/packages/server/src/notify_bridge_server/services/ha_subscription.py
+++ b/packages/server/src/notify_bridge_server/services/ha_subscription.py
@@ -35,7 +35,7 @@ from notify_bridge_core.providers.home_assistant import (
 )
 
 from ..database.engine import get_engine
-from ..database.models import ServiceProvider
+from ..database.models import EventLog, ServiceProvider
 from .event_dispatch import dispatch_provider_event
 from .http_session import get_http_session
 
@@ -104,6 +104,46 @@ def _ha_passes_filters(event: ServiceEvent, filters: dict[str, Any]) -> bool:
     return False
 
 
+async def _record_ha_status(
+    *,
+    provider_id: int,
+    provider_name: str,
+    state: str,
+    detail: str | None,
+) -> None:
+    """Persist an HA connection-status transition as an EventLog row.
+
+    Used by the supervisor's ``on_status_change`` callback so the
+    dashboard surfaces "HA disconnected" / "HA reconnected" events
+    alongside normal HA state changes. Best-effort: any DB failure is
+    logged and swallowed so the WS reader path remains untouched.
+    """
+    engine = get_engine()
+    try:
+        async with AsyncSession(engine) as session:
+            session.add(EventLog(
+                user_id=None,  # provider-level event, no per-tracker owner
+                tracker_id=None,
+                tracker_name="",
+                provider_id=provider_id,
+                provider_name=provider_name,
+                event_type=f"ha_status_{state}",
+                collection_id="",
+                collection_name="",
+                assets_count=0,
+                details={
+                    "provider_type": "home_assistant",
+                    "ha_status": state,
+                    "ha_status_detail": detail or "",
+                },
+            ))
+            await session.commit()
+    except Exception:  # noqa: BLE001
+        _LOGGER.exception(
+            "Failed to persist HA status row for provider %s", provider_id,
+        )
+
+
 async def _run_provider(provider_id: int) -> None:
     """One per-provider supervisor loop.
 
@@ -158,29 +198,38 @@ async def _run_provider(provider_id: int) -> None:
 
             async def _emit(event: ServiceEvent) -> None:
                 # Shield the DB-writing dispatch from external cancellation
-                # (shutdown, supervisor restart). The shield ensures that
-                # once a transaction is mid-flight, it commits or rolls back
-                # cleanly instead of being torn down with the asyncio task
-                # at a write boundary. Worst case: shutdown waits up to one
-                # dispatch latency longer.
+                # (shutdown, supervisor restart). Without an inner Task,
+                # ``asyncio.shield(coro)`` cancels the underlying coroutine
+                # when the outer awaiter is cancelled — defeating the point
+                # of the shield. We wrap explicitly and *drain* the inner
+                # task on cancellation so the transaction completes.
                 #
                 # Perf note (Phase 2 follow-up): dispatch_provider_event
                 # opens a fresh AsyncSession per call. For HA's chatty
                 # state_changed bus this hammers the pool; batch in a
                 # follow-up.
+                inner = asyncio.create_task(dispatch_provider_event(
+                    engine=engine,
+                    provider_id=provider_id,
+                    provider_name=provider_name,
+                    provider_config=config,
+                    event=event,
+                    detail_keys=_HA_DETAIL_KEYS,
+                    filter_fn=_ha_passes_filters,
+                ))
                 try:
-                    await asyncio.shield(dispatch_provider_event(
-                        engine=engine,
-                        provider_id=provider_id,
-                        provider_name=provider_name,
-                        provider_config=config,
-                        event=event,
-                        detail_keys=_HA_DETAIL_KEYS,
-                        filter_fn=_ha_passes_filters,
-                    ))
+                    await asyncio.shield(inner)
                 except asyncio.CancelledError:
-                    # Shield re-raises CancelledError to the caller; let it
-                    # propagate so the drain task exits cleanly.
+                    # Drain the in-flight write before re-raising so the DB
+                    # row commits cleanly (or fails cleanly) instead of
+                    # being torn down at an arbitrary await point.
+                    try:
+                        await inner
+                    except Exception:  # noqa: BLE001
+                        _LOGGER.exception(
+                            "HA dispatch raised while draining shielded "
+                            "task for provider %s", provider_id,
+                        )
                     raise
                 except Exception:  # noqa: BLE001
                     _LOGGER.exception(
@@ -188,11 +237,33 @@ async def _run_provider(provider_id: int) -> None:
                         provider_id,
                     )
 
+            def _on_status_change(state: str, detail: str | None) -> None:
+                """Persist HA WS connect/disconnect transitions as event_log rows.
+
+                The client invokes this synchronously from inside the WS
+                run loop, so we can't ``await`` here. Schedule a fire-and-
+                forget task on the same loop instead — log failures, never
+                propagate them back into the WS reader.
+                """
+                try:
+                    asyncio.create_task(_record_ha_status(
+                        provider_id=provider_id,
+                        provider_name=provider_name,
+                        state=state,
+                        detail=detail,
+                    ))
+                except RuntimeError:
+                    # No running loop (shouldn't happen in normal operation).
+                    _LOGGER.debug(
+                        "Skipped HA status row for provider %s: no event loop",
+                        provider_id,
+                    )
+
             _LOGGER.info(
                 "Starting HA subscription for provider %s (%s)",
                 provider_id, provider_name,
             )
-            await ha_provider.subscribe(_emit)
+            await ha_provider.subscribe(_emit, on_status_change=_on_status_change)
         except asyncio.CancelledError:
             raise
         except HomeAssistantAuthError as err:
diff --git a/packages/server/src/notify_bridge_server/services/http_session.py b/packages/server/src/notify_bridge_server/services/http_session.py
index 873f317..1f4fa98 100644
--- a/packages/server/src/notify_bridge_server/services/http_session.py
+++ b/packages/server/src/notify_bridge_server/services/http_session.py
@@ -6,6 +6,21 @@ per-request ``aiohttp.ClientSession`` instances.  This keeps a single
 TCP connection pool alive for the lifetime of the process, avoiding
 the overhead of pool creation/teardown on every request.
 
+DNS-rebinding mitigation
+~~~~~~~~~~~~~~~~~~~~~~~~
+The session is wired with a :class:`PinnedResolver` from
+``notify_bridge_core.notifications.ssrf`` so the IP that passed the
+SSRF block-range check during URL validation is the one aiohttp
+actually connects to. Without this pinning a malicious DNS server
+could swap a public IP for ``127.0.0.1`` between validation and
+connect, defeating the SSRF guard.
+
+Callers that do their own ``avalidate_outbound_url_full`` should also
+call :func:`pin_validated` to register the resolved host->IP mapping
+on the shared resolver before issuing the request. Callers that just
+use the session opportunistically still benefit from scheme + range
+checks at validation sites, plus the fallback resolver here.
+
 Call ``close_http_session()`` once during application shutdown.
 """
 
@@ -15,32 +30,71 @@ import asyncio
 
 import aiohttp
 
+from notify_bridge_core.notifications.ssrf import PinnedResolver, ValidatedURL
+
 _DEFAULT_TIMEOUT = aiohttp.ClientTimeout(total=30, connect=10)
 
 _session: aiohttp.ClientSession | None = None
-_lock = asyncio.Lock()
+_resolver: PinnedResolver | None = None
+# Lazy init: ``asyncio.Lock()`` at module import time binds to whichever
+# event loop happens to be running (or none). Tests that spin up multiple
+# loops (or subprocesses with their own loop) would otherwise hit
+# "RuntimeError: ... attached to a different loop". Defer creation to
+# first use so the lock binds to the loop that actually calls us.
+_lock: asyncio.Lock | None = None
+
+
+def _get_lock() -> asyncio.Lock:
+    """Return the module lock, creating it on first call from this loop."""
+    global _lock
+    if _lock is None:
+        _lock = asyncio.Lock()
+    return _lock
 
 
 async def get_http_session() -> aiohttp.ClientSession:
     """Get or create the shared HTTP session.
 
-    Concurrent first-callers are serialized through ``_lock`` so we never
-    leak a second ClientSession / connector pair. Once established, hot
-    callers skip the lock via the fast-path check.
+    Concurrent first-callers are serialized through the lazy lock so we
+    never leak a second ClientSession / connector pair. Once established,
+    hot callers skip the lock via the fast-path check.
+
+    The session uses a :class:`PinnedResolver` connector so callers that
+    register validated host->IP mappings via :func:`pin_validated` defeat
+    DNS rebinding between validation and connect.
     """
-    global _session
+    global _session, _resolver
     if _session is not None and not _session.closed:
         return _session
-    async with _lock:
+    async with _get_lock():
         if _session is None or _session.closed:
-            _session = aiohttp.ClientSession(timeout=_DEFAULT_TIMEOUT)
+            _resolver = PinnedResolver()
+            connector = aiohttp.TCPConnector(resolver=_resolver)
+            _session = aiohttp.ClientSession(
+                timeout=_DEFAULT_TIMEOUT, connector=connector,
+            )
     return _session
 
 
+def pin_validated(validated: ValidatedURL) -> None:
+    """Register a validated (host, ip) mapping on the shared resolver.
+
+    Best-effort: if the resolver has not been created yet (no session
+    initialised), the call is a no-op. Once the session exists, every
+    aiohttp connect for ``validated.host`` will use ``validated.ip``.
+    """
+    if _resolver is None:
+        return
+    _resolver.pin(validated.host, validated.ip)
+
+
 async def close_http_session() -> None:
     """Close the shared HTTP session (call on app shutdown)."""
-    global _session
-    async with _lock:
+    global _session, _resolver
+    async with _get_lock():
         if _session is not None and not _session.closed:
             await _session.close()
+        if _resolver is not None:
+            await _resolver.close()
         _session = None
+        _resolver = None
diff --git a/packages/server/src/notify_bridge_server/services/manual_dispatch.py b/packages/server/src/notify_bridge_server/services/manual_dispatch.py
index 362218f..542dfad 100644
--- a/packages/server/src/notify_bridge_server/services/manual_dispatch.py
+++ b/packages/server/src/notify_bridge_server/services/manual_dispatch.py
@@ -24,7 +24,7 @@ from ..database.models import (
     TemplateSlot,
     TrackingConfig,
 )
-from .dispatch_helpers import _resolve_target
+from .dispatch_helpers import _resolve_target, resolve_provider_credential
 from .watcher import _get_telegram_caches
 
 _LOGGER = logging.getLogger(__name__)
@@ -94,7 +94,7 @@ async def dispatch_test_notification(
         locale=locale,
         date_format=template_config.date_format if template_config else "%d.%m.%Y, %H:%M UTC",
         date_only_format=template_config.date_only_format if template_config and template_config.date_only_format else "%d.%m.%Y",
-        provider_api_key=provider_config.get("api_key"),
+        provider_api_key=resolve_provider_credential(provider_config),
         provider_internal_url=provider_config.get("url", ""),
         provider_external_url=provider_config.get("external_domain", ""),
         receivers=resolved["receivers"],
diff --git a/packages/server/src/notify_bridge_server/services/notifier.py b/packages/server/src/notify_bridge_server/services/notifier.py
index ab03a45..3f64b15 100644
--- a/packages/server/src/notify_bridge_server/services/notifier.py
+++ b/packages/server/src/notify_bridge_server/services/notifier.py
@@ -442,8 +442,20 @@ async def _send_telegram_test_per_receiver(
                 disable_web_page_preview=bool(disable_preview),
             )
 
-    raw = await asyncio.gather(*(_send_one(r) for r in recv_rows))
-    results = [r for r in raw if r is not None]
+    # ``return_exceptions=True`` so a single send raising (e.g. transient
+    # network error to one chat) doesn't abort the entire fan-out and lose
+    # the successful sibling sends from the aggregate count.
+    raw = await asyncio.gather(
+        *(_send_one(r) for r in recv_rows), return_exceptions=True,
+    )
+    results: list[dict] = []
+    for r in raw:
+        if isinstance(r, BaseException):
+            _LOGGER.warning("Test send to receiver raised: %s", r)
+            continue
+        if r is None:
+            continue
+        results.append(r)
     return _aggregate(results)
 
 
diff --git a/packages/server/src/notify_bridge_server/services/sample_context.py b/packages/server/src/notify_bridge_server/services/sample_context.py
index f4cd133..0d005a1 100644
--- a/packages/server/src/notify_bridge_server/services/sample_context.py
+++ b/packages/server/src/notify_bridge_server/services/sample_context.py
@@ -234,4 +234,12 @@ _SAMPLE_CONTEXT = {
     "target_entity": "light.kitchen",
     "ha_event_type": "state_changed",
     "event_data": {"foo": "bar"},
+    # Bridge self-monitoring variables (for bridge_self provider templates)
+    "failure_type": "poll_failures",
+    "subject_id": 42,
+    "subject_name": "My Immich Tracker",
+    "count": 3,
+    "threshold": 3,
+    "last_error": "Connection refused",
+    "details": {"provider_id": 7, "provider_type": "immich"},
 }
diff --git a/packages/server/src/notify_bridge_server/services/scheduled_dispatch.py b/packages/server/src/notify_bridge_server/services/scheduled_dispatch.py
index cdf8ef7..33f7fdb 100644
--- a/packages/server/src/notify_bridge_server/services/scheduled_dispatch.py
+++ b/packages/server/src/notify_bridge_server/services/scheduled_dispatch.py
@@ -49,6 +49,7 @@ from .dispatch_helpers import (
     evaluate_event_gate,
     get_app_timezone,
     load_link_data,
+    resolve_provider_credential,
 )
 from .manual_dispatch import build_immich_dispatch_events
 
@@ -352,7 +353,7 @@ async def dispatch_scheduled_for_tracker(
                     date_only_format=(
                         tmpl.date_only_format or "%d.%m.%Y"
                     ),
-                    provider_api_key=provider_config.get("api_key"),
+                    provider_api_key=resolve_provider_credential(provider_config),
                     provider_internal_url=provider_config.get("url", ""),
                     provider_external_url=provider_config.get("external_domain", ""),
                     receivers=ld["receivers"],
diff --git a/packages/server/src/notify_bridge_server/services/scheduler.py b/packages/server/src/notify_bridge_server/services/scheduler.py
index 467ef8a..39defdd 100644
--- a/packages/server/src/notify_bridge_server/services/scheduler.py
+++ b/packages/server/src/notify_bridge_server/services/scheduler.py
@@ -163,6 +163,9 @@ async def start_scheduler() -> None:
     # Schedule the upstream release-check probe.
     await _schedule_release_check()
 
+    # Schedule the bridge_self deferred-backlog scan (every 5 min).
+    _schedule_bridge_self_backlog_scan()
+
 
 def _schedule_event_cleanup() -> None:
     """Schedule a daily job to delete EventLog entries older than 90 days."""
@@ -1122,7 +1125,11 @@ _DRAIN_CATCHUP_INTERVAL_SECONDS = 300
 
 
 def _drain_job_id_for(fire_at_utc: datetime) -> str:
-    return f"{_DEFERRED_DRAIN_PREFIX}{fire_at_utc.strftime('%Y%m%d%H%M')}"
+    # Include seconds — two trackers with quiet windows that end at the same
+    # minute but different seconds (e.g. user-set 06:00:00 vs 06:00:30) would
+    # otherwise collide on a single APScheduler job id, and ``replace_existing``
+    # would silently drop the second one.
+    return f"{_DEFERRED_DRAIN_PREFIX}{fire_at_utc.strftime('%Y%m%d%H%M%S')}"
 
 
 def schedule_deferred_drain(fire_at_utc: datetime) -> None:
@@ -1298,6 +1305,50 @@ async def _schedule_release_check() -> None:
                  interval_hours, _RELEASE_CHECK_ONESHOT_DELAY_SECONDS)
 
 
+# ---------------------------------------------------------------------------
+# Bridge self-monitoring — deferred-backlog scan
+# ---------------------------------------------------------------------------
+
+_BRIDGE_SELF_BACKLOG_JOB_ID = "bridge_self_deferred_backlog_scan"
+# 5 min trade-off between "operator notices the backlog quickly" and "extra
+# DB churn on a quiet system". The scan is one indexed GROUP BY query.
+_BRIDGE_SELF_BACKLOG_INTERVAL_SECONDS = 300
+
+
+def _schedule_bridge_self_backlog_scan() -> None:
+    """Install the periodic deferred-backlog scan for bridge_self."""
+    from apscheduler.triggers.interval import IntervalTrigger
+
+    scheduler = get_scheduler()
+    if scheduler.get_job(_BRIDGE_SELF_BACKLOG_JOB_ID):
+        return
+    scheduler.add_job(
+        _run_bridge_self_backlog_scan,
+        IntervalTrigger(seconds=_BRIDGE_SELF_BACKLOG_INTERVAL_SECONDS),
+        id=_BRIDGE_SELF_BACKLOG_JOB_ID,
+        replace_existing=True,
+        max_instances=1,
+        coalesce=True,
+    )
+    _LOGGER.info(
+        "Scheduled bridge_self deferred-backlog scan every %ds",
+        _BRIDGE_SELF_BACKLOG_INTERVAL_SECONDS,
+    )
+
+
+async def _run_bridge_self_backlog_scan() -> None:
+    """APScheduler entry point — scan deferred backlog and emit if needed."""
+    from .bridge_self import check_deferred_backlog
+    try:
+        stats = await check_deferred_backlog()
+        if stats.get("crossings"):
+            _LOGGER.info("bridge_self backlog scan stats: %s", stats)
+        else:
+            _LOGGER.debug("bridge_self backlog scan stats: %s", stats)
+    except Exception as err:  # noqa: BLE001
+        _LOGGER.exception("bridge_self backlog scan failed: %s", err)
+
+
 async def reschedule_release_check() -> None:
     """Re-arm the release-check job after settings changed.
 
diff --git a/packages/server/src/notify_bridge_server/services/telegram.py b/packages/server/src/notify_bridge_server/services/telegram.py
index 8216640..fac84bb 100644
--- a/packages/server/src/notify_bridge_server/services/telegram.py
+++ b/packages/server/src/notify_bridge_server/services/telegram.py
@@ -1,6 +1,6 @@
 """Telegram service utilities — chat persistence helpers."""
 
-from sqlmodel import select
+from sqlalchemy.dialects.sqlite import insert as sqlite_insert
 from sqlmodel.ext.asyncio.session import AsyncSession
 
 from ..database.models import TelegramChat
@@ -12,36 +12,48 @@ async def save_chat_from_webhook(
 ) -> None:
     """Save or update a chat entry from an incoming webhook message.
 
-    Called by the webhook handler to auto-persist chats.
+    Called by the webhook handler to auto-persist chats. Uses a single
+    ``INSERT ... ON CONFLICT DO UPDATE`` keyed on the
+    ``uq_telegram_chat_bot_chat`` unique constraint so two concurrent
+    webhook deliveries cannot race a check-then-insert and produce
+    duplicate rows. Only mutable display/identity fields are updated on
+    conflict — ``commands_enabled``, ``language_override``, and
+    ``discovered_at`` belong to the user / first discovery and stay sticky.
     """
     chat_id = str(chat_data.get("id", ""))
     if not chat_id:
         return
 
-    result = await session.exec(
-        select(TelegramChat).where(
-            TelegramChat.bot_id == bot_id,
-            TelegramChat.chat_id == chat_id,
-        )
-    )
-    existing = result.first()
-
     title = chat_data.get("title") or (
         chat_data.get("first_name", "") + (" " + chat_data.get("last_name", "")).strip()
     )
+    chat_type = chat_data.get("type", "private")
+    username = chat_data.get("username", "")
 
-    if existing:
-        existing.title = title
-        existing.username = chat_data.get("username", existing.username)
-        if language_code:
-            existing.language_code = language_code
-        session.add(existing)
-    else:
-        session.add(TelegramChat(
-            bot_id=bot_id,
-            chat_id=chat_id,
-            title=title,
-            chat_type=chat_data.get("type", "private"),
-            username=chat_data.get("username", ""),
-            language_code=language_code,
-        ))
+    # Only the SQLite dialect path is wired up — the deployed default. A
+    # future Postgres backend would need pg_insert here; the unique
+    # constraint name is dialect-portable so the same conflict_target works.
+    stmt = sqlite_insert(TelegramChat).values(
+        bot_id=bot_id,
+        chat_id=chat_id,
+        title=title,
+        chat_type=chat_type,
+        username=username,
+        language_code=language_code,
+    )
+    update_cols: dict = {
+        "title": title,
+        "chat_type": chat_type,
+        "username": username,
+    }
+    # Only overwrite language_code when the inbound payload carries one,
+    # otherwise we'd clobber a previously-detected locale with empty.
+    if language_code:
+        update_cols["language_code"] = language_code
+    stmt = stmt.on_conflict_do_update(
+        index_elements=["bot_id", "chat_id"],
+        set_=update_cols,
+    )
+    # session.execute (not exec) — exec is the SQLModel/Select wrapper that
+    # rejects raw Core Insert statements.
+    await session.execute(stmt)
diff --git a/packages/server/src/notify_bridge_server/services/watcher.py b/packages/server/src/notify_bridge_server/services/watcher.py
index 42de478..66980bb 100644
--- a/packages/server/src/notify_bridge_server/services/watcher.py
+++ b/packages/server/src/notify_bridge_server/services/watcher.py
@@ -4,7 +4,7 @@ from __future__ import annotations
 
 import asyncio
 import logging
-from typing import Any
+from typing import Any, Awaitable, Callable
 
 from sqlmodel import select
 from sqlmodel.ext.asyncio.session import AsyncSession
@@ -12,6 +12,7 @@ from sqlmodel.ext.asyncio.session import AsyncSession
 from notify_bridge_core.models.events import ServiceEvent
 from notify_bridge_core.notifications.dispatcher import NotificationDispatcher, TargetConfig
 from notify_bridge_core.notifications.telegram.cache import TelegramFileCache
+from notify_bridge_core.providers.capabilities import get_capabilities
 from notify_bridge_core.storage import JsonFileBackend
 
 from ..database.engine import get_engine
@@ -27,6 +28,7 @@ from .dispatch_helpers import (
     evaluate_event_gate,
     get_app_timezone,
     load_link_data,
+    resolve_provider_credential,
 )
 
 _LOGGER = logging.getLogger(__name__)
@@ -34,7 +36,18 @@ _LOGGER = logging.getLogger(__name__)
 # Module-level Telegram file caches — shared across dispatches for reuse
 _url_cache: TelegramFileCache | None = None
 _asset_cache: TelegramFileCache | None = None
-_cache_lock = asyncio.Lock()
+# Lazy init: creating ``asyncio.Lock()`` at module import time binds the
+# lock to whichever event loop is current at import (often none / the wrong
+# one when tests fire up dedicated loops). Defer until first use.
+_cache_lock: asyncio.Lock | None = None
+
+
+def _get_cache_lock() -> asyncio.Lock:
+    """Return the module cache lock, creating it on first call."""
+    global _cache_lock
+    if _cache_lock is None:
+        _cache_lock = asyncio.Lock()
+    return _cache_lock
 
 
 async def _load_cache_settings() -> tuple[int, int]:
@@ -68,7 +81,7 @@ async def _get_telegram_caches() -> tuple[TelegramFileCache | None, TelegramFile
     global _url_cache, _asset_cache
     if _url_cache is not None:
         return _url_cache, _asset_cache
-    async with _cache_lock:
+    async with _get_cache_lock():
         # Double-check after acquiring lock
         if _url_cache is not None:
             return _url_cache, _asset_cache
@@ -108,7 +121,7 @@ async def reset_telegram_caches_in_memory() -> None:
     deletes cached file_ids.
     """
     global _url_cache, _asset_cache
-    async with _cache_lock:
+    async with _get_cache_lock():
         _url_cache = None
         _asset_cache = None
         _LOGGER.info("Reset Telegram cache refs in memory (files preserved)")
@@ -135,7 +148,7 @@ async def clear_telegram_caches() -> dict[str, Any]:
     Returns a summary with the paths that were removed.
     """
     global _url_cache, _asset_cache
-    async with _cache_lock:
+    async with _get_cache_lock():
         removed: list[str] = []
         for cache, label in ((_url_cache, "url"), (_asset_cache, "asset")):
             if cache is not None:
@@ -163,6 +176,90 @@ async def clear_telegram_caches() -> dict[str, Any]:
         return {"cleared": True, "removed": removed}
 
 
+# ---------------------------------------------------------------------------
+# Provider polling registry
+# ---------------------------------------------------------------------------
+#
+# Each registered factory returns (events, new_state). Replaces the long
+# ``if provider_type == ...`` chain in ``check_tracker``. New pollable
+# providers register here; webhook-only providers are short-circuited above
+# via ``capabilities.webhook_based``.
+
+class _PollerConnectError(Exception):
+    """Raised by a poller factory when initial provider connection fails."""
+
+    def __init__(self, reason: str) -> None:
+        super().__init__(reason)
+        self.reason = reason
+
+
+PollResult = tuple[list[ServiceEvent], dict[str, Any]]
+PollerFactory = Callable[..., Awaitable[PollResult]]
+
+
+async def _poll_immich(*, provider_config, provider_name, collection_ids, state_dict, **_kw) -> PollResult:
+    from notify_bridge_core.providers.immich import ImmichServiceProvider
+    from .http_session import get_http_session
+    http_session = await get_http_session()
+    immich = ImmichServiceProvider(
+        http_session,
+        provider_config.get("url", ""),
+        provider_config.get("api_key", ""),
+        provider_config.get("external_domain"),
+        provider_name,
+    )
+    if not await immich.connect():
+        raise _PollerConnectError("failed to connect to provider")
+    return await immich.poll(collection_ids, state_dict)
+
+
+async def _poll_scheduler(*, provider_name, tracker_name, tracker_filters, collection_ids, state_dict, app_tz, **_kw) -> PollResult:
+    from notify_bridge_core.providers.scheduler import SchedulerServiceProvider
+    sched = SchedulerServiceProvider(
+        name=provider_name,
+        tracker_name=tracker_name,
+        custom_variables=tracker_filters.get("custom_variables", {}),
+        timezone_name=app_tz,
+    )
+    return await sched.poll(collection_ids, state_dict)
+
+
+async def _poll_nut(*, provider_config, provider_name, collection_ids, state_dict, **_kw) -> PollResult:
+    from notify_bridge_core.providers.nut import NutServiceProvider
+    nut = NutServiceProvider(
+        host=provider_config.get("host", "localhost"),
+        port=provider_config.get("port", 3493),
+        username=provider_config.get("username"),
+        password=provider_config.get("password"),
+        name=provider_name,
+    )
+    return await nut.poll(collection_ids, state_dict)
+
+
+async def _poll_google_photos(*, provider_config, provider_name, collection_ids, state_dict, **_kw) -> PollResult:
+    from notify_bridge_core.providers.google_photos import GooglePhotosServiceProvider
+    from .http_session import get_http_session
+    http_session = await get_http_session()
+    gp = GooglePhotosServiceProvider(
+        http_session,
+        provider_config.get("client_id", ""),
+        provider_config.get("client_secret", ""),
+        provider_config.get("refresh_token", ""),
+        provider_name,
+    )
+    if not await gp.connect():
+        raise _PollerConnectError("failed to connect to Google Photos")
+    return await gp.poll(collection_ids, state_dict)
+
+
+_POLL_FACTORIES: dict[str, PollerFactory] = {
+    "immich": _poll_immich,
+    "scheduler": _poll_scheduler,
+    "nut": _poll_nut,
+    "google_photos": _poll_google_photos,
+}
+
+
 async def check_tracker(tracker_id: int) -> dict[str, Any]:
     """Poll a tracker's provider for changes and dispatch notifications."""
     engine = get_engine()
@@ -223,70 +320,61 @@ async def check_tracker(tracker_id: int) -> dict[str, Any]:
     events: list[ServiceEvent] = []
     new_state: dict[str, Any] = {}
 
-    if provider_type == "immich":
-        from notify_bridge_core.providers.immich import ImmichServiceProvider
-        from .http_session import get_http_session
-        http_session = await get_http_session()
-        immich = ImmichServiceProvider(
-            http_session,
-            provider_config.get("url", ""),
-            provider_config.get("api_key", ""),
-            provider_config.get("external_domain"),
-            provider_name,
-        )
-        connected = await immich.connect()
-        if not connected:
-            return {"status": "error", "reason": "failed to connect to provider"}
+    # Webhook-only providers: capabilities.webhook_based short-circuits the
+    # poll path. Inbound events arrive via the /api/webhooks/* endpoints.
+    caps = get_capabilities(provider_type)
+    if caps is not None and caps.webhook_based:
+        return {"status": "ok", "events_detected": 0, "collections_checked": 0}
 
-        events, new_state = await immich.poll(collection_ids, state_dict)
-    elif provider_type == "gitea":
-        # Gitea is webhook-based — events arrive via /api/webhooks/gitea endpoint.
-        # The scheduler still calls check_tracker but there's nothing to poll.
-        return {"status": "ok", "events_detected": 0, "collections_checked": 0}
-    elif provider_type == "planka":
-        # Planka is webhook-based — events arrive via /api/webhooks/planka endpoint.
-        return {"status": "ok", "events_detected": 0, "collections_checked": 0}
-    elif provider_type == "scheduler":
-        from notify_bridge_core.providers.scheduler import SchedulerServiceProvider
-        custom_vars = tracker_filters.get("custom_variables", {})
-        sched = SchedulerServiceProvider(
-            name=provider_name,
-            tracker_name=tracker_name,
-            custom_variables=custom_vars,
-            timezone_name=app_tz,
-        )
-        events, new_state = await sched.poll(collection_ids, state_dict)
-    elif provider_type == "nut":
-        from notify_bridge_core.providers.nut import NutServiceProvider
-        nut = NutServiceProvider(
-            host=provider_config.get("host", "localhost"),
-            port=provider_config.get("port", 3493),
-            username=provider_config.get("username"),
-            password=provider_config.get("password"),
-            name=provider_name,
-        )
-        events, new_state = await nut.poll(collection_ids, state_dict)
-    elif provider_type == "google_photos":
-        from notify_bridge_core.providers.google_photos import GooglePhotosServiceProvider
-        from .http_session import get_http_session
-        http_session = await get_http_session()
-        gp = GooglePhotosServiceProvider(
-            http_session,
-            provider_config.get("client_id", ""),
-            provider_config.get("client_secret", ""),
-            provider_config.get("refresh_token", ""),
-            provider_name,
-        )
-        connected = await gp.connect()
-        if not connected:
-            return {"status": "error", "reason": "failed to connect to Google Photos"}
-        events, new_state = await gp.poll(collection_ids, state_dict)
-    elif provider_type == "webhook":
-        # Webhook providers receive events via inbound HTTP; no polling needed.
-        return {"status": "ok", "events_detected": 0, "collections_checked": 0}
-    else:
+    poller = _POLL_FACTORIES.get(provider_type)
+    if poller is None:
         return {"status": "error", "reason": f"unsupported provider type: {provider_type}"}
 
+    try:
+        events, new_state = await poller(
+            provider_config=provider_config,
+            provider_name=provider_name,
+            tracker_name=tracker_name,
+            tracker_filters=tracker_filters,
+            collection_ids=collection_ids,
+            state_dict=state_dict,
+            app_tz=app_tz,
+        )
+    except _PollerConnectError as exc:
+        # Track consecutive poll failures so the bridge_self provider can
+        # alert when a tracker stops responding. The emission is async
+        # but cheap; we await it inline so its DB writes happen before
+        # check_tracker returns to the scheduler.
+        from .bridge_self import maybe_emit_poll_failure
+        try:
+            await maybe_emit_poll_failure(
+                tracker_id=tracker_id,
+                tracker_name=tracker_name,
+                error=exc.reason,
+            )
+        except Exception:  # noqa: BLE001
+            _LOGGER.exception("bridge_self poll-failure emission failed")
+        return {"status": "error", "reason": exc.reason}
+    except Exception as exc:  # noqa: BLE001
+        # Catch broader poll exceptions (e.g. a provider-side bug, transient
+        # network error inside the poller after connect) so the same
+        # streak-tracking logic applies. Re-raised after the bookkeeping so
+        # the existing error path keeps logging at the caller.
+        from .bridge_self import maybe_emit_poll_failure
+        try:
+            await maybe_emit_poll_failure(
+                tracker_id=tracker_id,
+                tracker_name=tracker_name,
+                error=str(exc),
+            )
+        except Exception:  # noqa: BLE001
+            _LOGGER.exception("bridge_self poll-failure emission failed")
+        raise
+
+    # Successful poll — clear the consecutive-failure counter for this tracker.
+    from .bridge_self import record_poll_success
+    record_poll_success(tracker_id)
+
     # Save updated state and log events
     async with AsyncSession(engine) as session:
         for cid, cstate in new_state.items():
@@ -328,6 +416,16 @@ async def check_tracker(tracker_id: int) -> dict[str, Any]:
         # row if quiet hours suppresses it.
         event_log_id_by_event: dict[int, int] = {}
         for event in events:
+            # Skip persistence for events the dispatch loop will filter
+            # anyway (assets_added with 0 added, assets_removed with 0
+            # removed). Without this we wrote a "noise" row for every
+            # tracker tick that detected nothing. The dispatch-time filter
+            # below still runs as a safety net.
+            etype = event.event_type.value
+            if etype == "assets_added" and event.added_count == 0:
+                continue
+            if etype == "assets_removed" and event.removed_count == 0:
+                continue
             assets_count = event.added_count or event.removed_count or 0
             details: dict[str, Any] = {
                 "added_count": event.added_count,
@@ -445,7 +543,7 @@ async def check_tracker(tracker_id: int) -> dict[str, Any]:
                     template_slots=ld["template_slots"],
                     date_format=tmpl.date_format if tmpl else "%d.%m.%Y, %H:%M UTC",
                     date_only_format=tmpl.date_only_format if tmpl and tmpl.date_only_format else "%d.%m.%Y",
-                    provider_api_key=provider_config.get("api_key"),
+                    provider_api_key=resolve_provider_credential(provider_config),
                     provider_internal_url=provider_config.get("url", ""),
                     provider_external_url=provider_config.get("external_domain", ""),
                     receivers=ld["receivers"],
@@ -453,7 +551,9 @@ async def check_tracker(tracker_id: int) -> dict[str, Any]:
                 key = id(tc) if tc is not None else 0
                 if key not in groups:
                     groups[key] = (tc, [])
-                groups[key][1].append(target_cfg)
+                # Threaded with target_id/target_name so per-target failure
+                # counters can attribute the dispatch result correctly.
+                groups[key][1].append((target_cfg, ld.get("target_id"), ld.get("target_name", "")))
 
             # Persist defers + stamp the event_log row + schedule drains in a
             # single transaction. This keeps the "deferred" pill on the
@@ -496,8 +596,17 @@ async def check_tracker(tracker_id: int) -> dict[str, Any]:
                             "Failed to schedule deferred drain for %s", fire_at,
                         )
 
-            for tc, target_configs in groups.values():
-                if not target_configs:
+            from .bridge_self import (
+                maybe_emit_target_failure,
+                record_target_success,
+            )
+
+            track_target_failures = (
+                event.provider_type.value != "bridge_self"
+            )
+
+            for tc, target_entries in groups.values():
+                if not target_entries:
                     continue
                 shaped_event = apply_tracking_display_filters(event, tc)
                 if shaped_event is None:
@@ -505,12 +614,28 @@ async def check_tracker(tracker_id: int) -> dict[str, Any]:
                         "  Event suppressed by display filters (favorites_only)",
                     )
                     continue
+                target_configs = [entry[0] for entry in target_entries]
                 results = await dispatcher.dispatch(shaped_event, target_configs)
-                for r in results:
+                for entry, r in zip(target_entries, results):
+                    _, target_id, target_name = entry
                     if r.get("success"):
                         _LOGGER.info("  Notification sent successfully")
+                        if track_target_failures and target_id is not None:
+                            record_target_success(int(target_id))
                     else:
                         _LOGGER.error("  Notification failed: %s", r.get("error", "unknown"))
+                        if track_target_failures and target_id is not None:
+                            try:
+                                await maybe_emit_target_failure(
+                                    target_id=int(target_id),
+                                    target_name=target_name or "",
+                                    target_type=entry[0].type,
+                                    error=str(r.get("error") or ""),
+                                )
+                            except Exception:  # noqa: BLE001
+                                _LOGGER.exception(
+                                    "bridge_self target-failure emission failed",
+                                )
 
     return {
         "status": "ok",
diff --git a/packages/server/tests/test_backup_roundtrip.py b/packages/server/tests/test_backup_roundtrip.py
new file mode 100644
index 0000000..bb42a26
--- /dev/null
+++ b/packages/server/tests/test_backup_roundtrip.py
@@ -0,0 +1,268 @@
+"""End-to-end backup roundtrip: seed -> export -> wipe -> import -> verify.
+
+Drives the backup service module directly (no HTTP layer) against a fresh
+SQLite DB built in the conftest temp data dir. Verifies entity counts and
+key fields survive a full round-trip.
+
+Kept under 5s by avoiding the lifespan startup — we build a private engine
+in an isolated DB file so we don't share state with other tests in the
+session.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+from sqlalchemy.ext.asyncio import create_async_engine
+from sqlmodel import SQLModel, select
+from sqlmodel.ext.asyncio.session import AsyncSession
+
+
+@pytest.fixture
+async def isolated_engine(tmp_path: Path):
+    """A throwaway SQLite engine + freshly created schema for one test.
+
+    Avoids the global engine in ``database.engine`` — tests in the same
+    session share that singleton, and recreating tables on it would corrupt
+    parallel tests' state.
+    """
+    # Importing the module registers all SQLModel tables on the metadata.
+    from notify_bridge_server.database import models  # noqa: F401
+
+    db_path = tmp_path / "roundtrip.db"
+    engine = create_async_engine(f"sqlite+aiosqlite:///{db_path}")
+
+    async with engine.begin() as conn:
+        await conn.run_sync(SQLModel.metadata.create_all)
+
+    yield engine
+
+    await engine.dispose()
+
+
+async def _seed(session: AsyncSession, user_id: int) -> dict[str, int]:
+    """Insert enough rows to exercise the major code paths in import/export."""
+    from notify_bridge_server.database.models import (
+        EventLog,
+        NotificationTarget,
+        NotificationTracker,
+        ServiceProvider,
+        TargetReceiver,
+        TelegramBot,
+        TrackingConfig,
+        User,
+    )
+
+    user = User(
+        id=user_id,
+        username="roundtrip-user",
+        hashed_password="hash",
+        role="user",
+    )
+    session.add(user)
+    await session.flush()
+
+    bot = TelegramBot(
+        user_id=user_id, name="Test bot", token="123456:fake-token-value",
+        bot_username="testbot", bot_id=1,
+    )
+    session.add(bot)
+    await session.flush()
+
+    provider = ServiceProvider(
+        user_id=user_id, type="immich", name="Immich prod",
+        config={"base_url": "https://immich.example.com", "api_key": "secret"},
+    )
+    session.add(provider)
+    await session.flush()
+
+    target = NotificationTarget(
+        user_id=user_id, type="telegram", name="My channel",
+        config={"bot_token_id": bot.id, "disable_url_preview": True},
+    )
+    session.add(target)
+    await session.flush()
+
+    receiver = TargetReceiver(
+        target_id=target.id, name="Channel A",
+        config={"chat_id": "-100123"}, receiver_key="-100123", locale="en",
+    )
+    session.add(receiver)
+
+    tc = TrackingConfig(
+        user_id=user_id, provider_type="immich",
+        name="Default Immich tracking", track_assets_added=True,
+    )
+    session.add(tc)
+    await session.flush()
+
+    tracker = NotificationTracker(
+        user_id=user_id, provider_id=provider.id,
+        name="Family album tracker", scan_interval=120,
+        collection_ids=["album-uuid-1"],
+    )
+    session.add(tracker)
+    await session.flush()
+
+    # Capture IDs before commit — accessing attributes after commit
+    # triggers a refresh that needs an async-IO context the test caller
+    # may not be inside. Better to snapshot now and use plain ints later.
+    ids = {
+        "provider_id": provider.id,
+        "target_id": target.id,
+        "bot_id": bot.id,
+        "tracker_id": tracker.id,
+        "tracking_config_id": tc.id,
+        "tracker_name": tracker.name,
+        "provider_name": provider.name,
+    }
+
+    # EventLog rows are NOT in the backup schema — they're operational data,
+    # not configuration. Insert a few anyway so we can verify they survive
+    # the export step (since export only reads, never writes/wipes them).
+    for i in range(3):
+        session.add(EventLog(
+            user_id=user_id, tracker_id=ids["tracker_id"], tracker_name=ids["tracker_name"],
+            provider_id=ids["provider_id"], provider_name=ids["provider_name"],
+            event_type="assets_added", collection_id="album-uuid-1",
+            collection_name="Family", assets_count=i,
+        ))
+
+    await session.commit()
+
+    return ids
+
+
+async def _wipe_user_owned_rows(engine, user_id: int) -> None:
+    """Delete every backup-able row for the user via raw SQL.
+
+    Using ORM-level deletes triggers SQLAlchemy's cascade machinery, which
+    lazy-loads relationships in a sync context that the async driver cannot
+    serve (MissingGreenlet). Raw DELETE statements skip cascades and let
+    SQLite's FKs enforce ordering naturally.
+
+    Order matters: child rows first, then parents.
+    """
+    from sqlalchemy import text
+
+    statements = (
+        "DELETE FROM event_log",
+        "DELETE FROM notification_tracker_target",
+        "DELETE FROM notification_tracker",
+        "DELETE FROM target_receiver",
+        "DELETE FROM notification_target",
+        "DELETE FROM tracking_config",
+        "DELETE FROM service_provider",
+        "DELETE FROM template_slot",
+        "DELETE FROM template_config",
+        "DELETE FROM telegram_bot",
+        "DELETE FROM appsetting",
+    )
+
+    async with engine.begin() as conn:
+        for stmt in statements:
+            try:
+                await conn.execute(text(stmt))
+            except Exception:  # noqa: BLE001 — table may not exist in test schema
+                pass
+
+
+@pytest.mark.asyncio
+async def test_export_wipe_import_roundtrip(isolated_engine, tmp_data_dir) -> None:  # noqa: ARG001
+    """A full round-trip preserves entity counts and the key fields the
+    UI relies on — names, configs (with secrets included), provider
+    references via id_map.
+    """
+    from notify_bridge_server.database.models import (
+        NotificationTarget, NotificationTracker, ServiceProvider,
+        TargetReceiver, TelegramBot, TrackingConfig,
+    )
+    from notify_bridge_server.services.backup_schema import (
+        ConflictMode, SecretsMode,
+    )
+    from notify_bridge_server.services.backup_service import (
+        export_backup, import_backup,
+    )
+
+    user_id = 1
+
+    # ---- Seed ----
+    async with AsyncSession(isolated_engine) as session:
+        ids = await _seed(session, user_id)
+
+    # ---- Export with secrets included so import sees real values ----
+    async with AsyncSession(isolated_engine) as session:
+        backup = await export_backup(
+            session, user_id, secrets_mode=SecretsMode.INCLUDE,
+        )
+
+    assert len(backup.data.providers) == 1
+    assert len(backup.data.telegram_bots) == 1
+    assert len(backup.data.targets) == 1
+    assert len(backup.data.targets[0].receivers) == 1
+    assert len(backup.data.tracking_configs) == 1
+    assert len(backup.data.notification_trackers) == 1
+    assert backup.data.providers[0].config["api_key"] == "secret"
+
+    # ---- Wipe ----
+    await _wipe_user_owned_rows(isolated_engine, user_id)
+
+    async with AsyncSession(isolated_engine) as session:
+        result = await session.exec(
+            select(ServiceProvider).where(ServiceProvider.user_id == user_id)
+        )
+        assert result.all() == []
+
+    # ---- Import ----
+    async with AsyncSession(isolated_engine) as session:
+        result = await import_backup(
+            session, user_id, backup, conflict_mode=ConflictMode.SKIP,
+        )
+
+    assert result.errors == [], f"Import errors: {result.errors}"
+    assert result.created > 0
+
+    # ---- Verify the entities are back ----
+    async with AsyncSession(isolated_engine) as session:
+        providers = (await session.exec(
+            select(ServiceProvider).where(ServiceProvider.user_id == user_id)
+        )).all()
+        assert len(providers) == 1
+        prov = providers[0]
+        assert prov.name == "Immich prod"
+        assert prov.config["base_url"] == "https://immich.example.com"
+        # Secrets imported intact when SecretsMode.INCLUDE was used at export.
+        assert prov.config["api_key"] == "secret"
+
+        bots = (await session.exec(
+            select(TelegramBot).where(TelegramBot.user_id == user_id)
+        )).all()
+        assert len(bots) == 1
+        assert bots[0].name == "Test bot"
+
+        targets = (await session.exec(
+            select(NotificationTarget).where(NotificationTarget.user_id == user_id)
+        )).all()
+        assert len(targets) == 1
+        receivers = (await session.exec(
+            select(TargetReceiver).where(TargetReceiver.target_id == targets[0].id)
+        )).all()
+        assert len(receivers) == 1
+        assert receivers[0].config["chat_id"] == "-100123"
+
+        tcs = (await session.exec(
+            select(TrackingConfig).where(TrackingConfig.user_id == user_id)
+        )).all()
+        assert len(tcs) == 1
+        assert tcs[0].name == "Default Immich tracking"
+
+        trackers = (await session.exec(
+            select(NotificationTracker).where(NotificationTracker.user_id == user_id)
+        )).all()
+        assert len(trackers) == 1
+        # provider_id was remapped via id_map — original provider id may have
+        # changed across the wipe, so just check it links to a real row.
+        assert trackers[0].provider_id == prov.id
+        assert trackers[0].scan_interval == 120
+        assert trackers[0].collection_ids == ["album-uuid-1"]
diff --git a/packages/server/tests/test_bridge_self.py b/packages/server/tests/test_bridge_self.py
new file mode 100644
index 0000000..d779b0c
--- /dev/null
+++ b/packages/server/tests/test_bridge_self.py
@@ -0,0 +1,265 @@
+"""Tests for the bridge self-monitoring provider.
+
+Covers:
+  1. ``build_event`` parses a well-formed payload and rejects malformed ones.
+  2. The threshold-crossing helpers in ``services.bridge_self`` only emit on
+     the actual crossing, not on every increment afterwards (anti-spam).
+  3. ``ensure_bridge_self_provider_for_user`` creates exactly one provider
+     per user and is idempotent on re-run.
+  4. The capability registry exposes the new event/slot definitions.
+"""
+
+from __future__ import annotations
+
+from datetime import datetime, timezone
+
+import pytest
+from sqlmodel import SQLModel, select
+from sqlmodel.ext.asyncio.session import AsyncSession
+from sqlalchemy.ext.asyncio import create_async_engine
+
+
+# ---------------------------------------------------------------------------
+# Event parser
+# ---------------------------------------------------------------------------
+
+
+def test_build_event_well_formed_payload() -> None:
+    from notify_bridge_core.providers.bridge_self.event_parser import build_event
+    from notify_bridge_core.models.events import EventType
+    from notify_bridge_core.providers.base import ServiceProviderType
+
+    payload = {
+        "failure_type": "poll_failures",
+        "subject_id": 7,
+        "subject_name": "My Tracker",
+        "count": 3,
+        "threshold": 3,
+        "last_error": "Timeout",
+        "details": {"tracker_id": 7},
+    }
+    when = datetime(2026, 5, 16, 10, 0, tzinfo=timezone.utc)
+    event = build_event(payload, timestamp=when)
+
+    assert event is not None
+    assert event.event_type == EventType.BRIDGE_SELF_POLL_FAILURES
+    assert event.provider_type == ServiceProviderType.BRIDGE_SELF
+    assert event.collection_id == "7"
+    assert event.collection_name == "My Tracker"
+    assert event.timestamp == when
+    assert event.extra["count"] == 3
+    assert event.extra["threshold"] == 3
+    assert event.extra["last_error"] == "Timeout"
+    assert event.extra["failure_type"] == "poll_failures"
+    assert event.extra["details"] == {"tracker_id": 7}
+
+
+def test_build_event_unknown_failure_type_returns_none() -> None:
+    from notify_bridge_core.providers.bridge_self.event_parser import build_event
+
+    assert build_event({"failure_type": "rocket_launch"}) is None
+
+
+def test_build_event_non_dict_payload_returns_none() -> None:
+    from notify_bridge_core.providers.bridge_self.event_parser import build_event
+
+    assert build_event("not a dict") is None  # type: ignore[arg-type]
+    assert build_event(None) is None  # type: ignore[arg-type]
+
+
+def test_build_event_clamps_long_error_messages() -> None:
+    from notify_bridge_core.providers.bridge_self.event_parser import (
+        build_event, _MAX_ERROR_LEN,
+    )
+
+    huge = "X" * (_MAX_ERROR_LEN * 5)
+    event = build_event({
+        "failure_type": "target_failures",
+        "subject_id": 1,
+        "subject_name": "t",
+        "count": 5,
+        "threshold": 5,
+        "last_error": huge,
+    })
+    assert event is not None
+    assert len(event.extra["last_error"]) <= _MAX_ERROR_LEN
+
+
+# ---------------------------------------------------------------------------
+# Threshold-crossing counters
+# ---------------------------------------------------------------------------
+
+
+def test_record_poll_failure_increments_then_success_resets() -> None:
+    from notify_bridge_server.services import bridge_self as bs
+
+    # Use a tracker_id we know is unique to this test to avoid pollution
+    # across tests sharing the module-level dicts.
+    tid = 9_001
+    bs.reset_poll_counter(tid)
+
+    assert bs.record_poll_failure(tid, "boom") == 1
+    assert bs.record_poll_failure(tid, "boom") == 2
+    assert bs.record_poll_failure(tid, "boom") == 3
+    assert bs.get_poll_failure_count(tid) == 3
+    assert bs.get_poll_last_error(tid) == "boom"
+
+    bs.record_poll_success(tid)
+    assert bs.get_poll_failure_count(tid) == 0
+    assert bs.get_poll_last_error(tid) == ""
+
+
+def test_record_target_failure_increments_then_success_resets() -> None:
+    from notify_bridge_server.services import bridge_self as bs
+
+    tid = 9_101
+    bs.reset_target_counter(tid)
+
+    assert bs.record_target_failure(tid, "503") == 1
+    assert bs.record_target_failure(tid, "503") == 2
+    assert bs.get_target_failure_count(tid) == 2
+
+    bs.record_target_success(tid)
+    assert bs.get_target_failure_count(tid) == 0
+
+
+def test_backlog_state_only_emits_on_crossing() -> None:
+    """Only the False -> True transition should report a crossing.
+
+    A sustained backlog must not re-fire on every scan, and a recovered
+    backlog re-arms the latch so the next crossing is reported again.
+    """
+    from notify_bridge_server.services import bridge_self as bs
+
+    user_id = 9_201
+    # Reset latch by going through a False reading first.
+    bs._backlog_above_threshold.pop(user_id, None)
+
+    # Initial above-threshold reading IS a crossing (None -> True latch).
+    assert bs.record_backlog_state(user_id, True) is True
+    # Sustained above — no second alert.
+    assert bs.record_backlog_state(user_id, True) is False
+    assert bs.record_backlog_state(user_id, True) is False
+    # Drop below — no alert (we don't notify on recovery).
+    assert bs.record_backlog_state(user_id, False) is False
+    # Cross again — alert.
+    assert bs.record_backlog_state(user_id, True) is True
+
+
+# ---------------------------------------------------------------------------
+# ensure_bridge_self_provider_for_user — DB roundtrip
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture
+async def session() -> AsyncSession:
+    """Fresh in-memory DB with the SQLModel schema applied."""
+    engine = create_async_engine("sqlite+aiosqlite:///:memory:")
+    async with engine.begin() as conn:
+        await conn.run_sync(SQLModel.metadata.create_all)
+    async with AsyncSession(engine) as session:
+        yield session
+    await engine.dispose()
+
+
+@pytest.mark.asyncio
+async def test_ensure_bridge_self_provider_creates_once(session: AsyncSession) -> None:
+    from notify_bridge_server.database.models import ServiceProvider, User
+    from notify_bridge_server.database.seeds import (
+        ensure_bridge_self_provider_for_user,
+    )
+
+    # Create a real user.
+    user = User(username="alice", hashed_password="x", role="user")
+    session.add(user)
+    await session.commit()
+    await session.refresh(user)
+    user_id = user.id
+
+    p1 = await ensure_bridge_self_provider_for_user(session, user_id)
+    assert p1 is not None
+    p1_id = p1.id
+    assert p1.type == "bridge_self"
+    assert p1.user_id == user_id
+    assert p1.config["poll_failure_threshold"] == 3
+    assert p1.config["deferred_backlog_threshold"] == 100
+    assert p1.config["target_failure_threshold"] == 5
+    await session.commit()
+
+    # Idempotent: second call returns the same row, no duplicates.
+    p2 = await ensure_bridge_self_provider_for_user(session, user_id)
+    assert p2 is not None
+    assert p2.id == p1_id
+    await session.commit()
+
+    rows = (
+        await session.exec(
+            select(ServiceProvider).where(
+                ServiceProvider.user_id == user_id,
+                ServiceProvider.type == "bridge_self",
+            )
+        )
+    ).all()
+    assert len(rows) == 1
+
+
+@pytest.mark.asyncio
+async def test_ensure_bridge_self_provider_skips_system_user(session: AsyncSession) -> None:
+    """user_id <= 0 is the __system__ placeholder — never gets a provider."""
+    from notify_bridge_server.database.seeds import (
+        ensure_bridge_self_provider_for_user,
+    )
+
+    result = await ensure_bridge_self_provider_for_user(session, 0)
+    assert result is None
+
+
+# ---------------------------------------------------------------------------
+# Capability registry
+# ---------------------------------------------------------------------------
+
+
+def test_capability_registry_lists_bridge_self() -> None:
+    from notify_bridge_core.providers.capabilities import (
+        get_capabilities, get_all_capabilities,
+    )
+
+    caps = get_capabilities("bridge_self")
+    assert caps is not None
+    assert caps.provider_type == "bridge_self"
+    assert caps.webhook_based is False
+
+    event_names = {e["name"] for e in caps.events}
+    assert event_names == {
+        "bridge_self_poll_failures",
+        "bridge_self_deferred_backlog",
+        "bridge_self_target_failures",
+    }
+
+    slot_names = {s["name"] for s in caps.notification_slots}
+    assert slot_names == {
+        "message_bridge_self_poll_failures",
+        "message_bridge_self_deferred_backlog",
+        "message_bridge_self_target_failures",
+    }
+
+    # And it shows up in the global registry.
+    assert "bridge_self" in get_all_capabilities()
+
+
+def test_default_template_loader_returns_bridge_self_slots() -> None:
+    """All three bridge_self slots have shipped Jinja2 default templates."""
+    from notify_bridge_core.templates.defaults.loader import load_default_templates
+
+    en = load_default_templates("en", "bridge_self")
+    ru = load_default_templates("ru", "bridge_self")
+    expected = {
+        "message_bridge_self_poll_failures",
+        "message_bridge_self_deferred_backlog",
+        "message_bridge_self_target_failures",
+    }
+    assert set(en.keys()) == expected
+    assert set(ru.keys()) == expected
+    # Sanity: each template references at least one of the bridge_self vars.
+    for tpl in list(en.values()) + list(ru.values()):
+        assert "{{" in tpl
diff --git a/packages/server/tests/test_gitea_parser.py b/packages/server/tests/test_gitea_parser.py
new file mode 100644
index 0000000..2d67deb
--- /dev/null
+++ b/packages/server/tests/test_gitea_parser.py
@@ -0,0 +1,249 @@
+"""Unit tests for the Gitea webhook parser.
+
+Pure-function tests against ``parse_webhook`` using realistic Gitea
+payloads (trimmed to the fields the parser actually consumes). No DB or
+HTTP fixtures needed.
+"""
+
+from __future__ import annotations
+
+from notify_bridge_core.models.events import EventType
+from notify_bridge_core.providers.base import ServiceProviderType
+from notify_bridge_core.providers.gitea.event_parser import parse_webhook
+
+
+def _repo() -> dict:
+    return {
+        "id": 42,
+        "name": "demo",
+        "full_name": "alexei/demo",
+        "html_url": "https://git.example.com/alexei/demo",
+        "description": "Demo repo",
+        "private": False,
+        "owner": {
+            "id": 1,
+            "login": "alexei",
+            "full_name": "Alexei",
+            "email": "alexei@example.com",
+            "avatar_url": "https://git.example.com/avatars/1",
+        },
+    }
+
+
+def _sender() -> dict:
+    return {
+        "id": 1,
+        "login": "alexei",
+        "full_name": "Alexei",
+        "avatar_url": "https://git.example.com/avatars/1",
+    }
+
+
+def test_push_event() -> None:
+    payload = {
+        "ref": "refs/heads/master",
+        "before": "0000000000000000000000000000000000000000",
+        "after": "abcdef0123456789abcdef0123456789abcdef01",
+        "compare_url": "https://git.example.com/alexei/demo/compare/000...abc",
+        "commits": [
+            {
+                "id": "abcdef0123456789abcdef0123456789abcdef01",
+                "message": "feat: initial commit\n\nMore detail.",
+                "url": "https://git.example.com/alexei/demo/commit/abcdef0",
+                "author": {
+                    "name": "Alexei",
+                    "email": "alexei@example.com",
+                    "username": "alexei",
+                },
+                "timestamp": "2026-05-16T10:00:00Z",
+            },
+            {
+                "id": "1234567890123456789012345678901234567890",
+                "message": "chore: tweak",
+                "url": "https://git.example.com/alexei/demo/commit/1234567",
+                "author": {"name": "Alexei", "email": "alexei@example.com"},
+                "timestamp": "2026-05-16T10:05:00Z",
+            },
+        ],
+        "repository": _repo(),
+        "sender": _sender(),
+    }
+
+    evt = parse_webhook("push", payload, provider_name="gitea-prod")
+    assert evt is not None
+    assert evt.event_type is EventType.PUSH
+    assert evt.provider_type is ServiceProviderType.GITEA
+    assert evt.collection_id == "alexei/demo"
+    assert evt.collection_name == "alexei/demo"
+    assert evt.extra["ref"] == "refs/heads/master"
+    assert evt.extra["branch"] == "master"
+    assert evt.extra["commit_count"] == 2
+    assert evt.extra["commits"][0]["short_id"] == "abcdef0"
+    # The first commit's multi-line body must be preserved (.strip handles
+    # trailing newlines but should keep the inner '\n').
+    assert "feat: initial commit" in evt.extra["commits"][0]["message"]
+
+
+def test_issue_opened() -> None:
+    payload = {
+        "action": "opened",
+        "issue": {
+            "id": 100,
+            "number": 7,
+            "title": "Bug: thing broken",
+            "html_url": "https://git.example.com/alexei/demo/issues/7",
+            "state": "open",
+            "body": "Steps to reproduce...",
+            "labels": [{"name": "bug"}, {"name": "p1"}],
+        },
+        "repository": _repo(),
+        "sender": _sender(),
+    }
+    evt = parse_webhook("issues", payload, provider_name="gitea-prod")
+    assert evt is not None
+    assert evt.event_type is EventType.ISSUE_OPENED
+    assert evt.collection_id == "alexei/demo"
+    assert evt.extra["issue_number"] == 7
+    assert evt.extra["issue_title"] == "Bug: thing broken"
+    assert evt.extra["issue_labels"] == ["bug", "p1"]
+
+
+def test_issue_closed() -> None:
+    payload = {
+        "action": "closed",
+        "issue": {
+            "id": 100,
+            "number": 7,
+            "title": "Bug: thing broken",
+            "html_url": "https://git.example.com/alexei/demo/issues/7",
+            "state": "closed",
+            "body": "",
+            "labels": [],
+        },
+        "repository": _repo(),
+        "sender": _sender(),
+    }
+    evt = parse_webhook("issues", payload, provider_name="gitea-prod")
+    assert evt is not None
+    assert evt.event_type is EventType.ISSUE_CLOSED
+    assert evt.extra["issue_state"] == "closed"
+
+
+def test_pr_opened() -> None:
+    payload = {
+        "action": "opened",
+        "pull_request": {
+            "id": 200,
+            "number": 12,
+            "title": "Add metrics endpoint",
+            "html_url": "https://git.example.com/alexei/demo/pulls/12",
+            "state": "open",
+            "body": "PR body",
+            "merged": False,
+            "base": {"ref": "master", "label": "alexei:master"},
+            "head": {"ref": "feat/metrics", "label": "alexei:feat/metrics"},
+            "labels": [{"name": "enhancement"}],
+        },
+        "repository": _repo(),
+        "sender": _sender(),
+    }
+    evt = parse_webhook("pull_request", payload, provider_name="gitea-prod")
+    assert evt is not None
+    assert evt.event_type is EventType.PR_OPENED
+    assert evt.extra["pr_number"] == 12
+    assert evt.extra["pr_merged"] is False
+    assert evt.extra["pr_base"] == "alexei:master"
+    assert evt.extra["pr_head"] == "alexei:feat/metrics"
+
+
+def test_pr_merged_resolves_from_closed_with_merged_flag() -> None:
+    """A 'closed' action with merged=True is the merge signal — Gitea does
+    not send a distinct event header for it, so the parser must promote
+    PR_CLOSED -> PR_MERGED on its own."""
+    payload = {
+        "action": "closed",
+        "pull_request": {
+            "id": 200,
+            "number": 12,
+            "title": "Add metrics endpoint",
+            "html_url": "https://git.example.com/alexei/demo/pulls/12",
+            "state": "closed",
+            "body": "",
+            "merged": True,
+            "base": {"ref": "master"},
+            "head": {"ref": "feat/metrics"},
+            "labels": [],
+        },
+        "repository": _repo(),
+        "sender": _sender(),
+    }
+    evt = parse_webhook("pull_request", payload, provider_name="gitea-prod")
+    assert evt is not None
+    assert evt.event_type is EventType.PR_MERGED
+    assert evt.extra["pr_merged"] is True
+
+
+def test_pr_closed_without_merge() -> None:
+    payload = {
+        "action": "closed",
+        "pull_request": {
+            "id": 200,
+            "number": 12,
+            "title": "Abandoned PR",
+            "html_url": "https://git.example.com/alexei/demo/pulls/12",
+            "state": "closed",
+            "body": "",
+            "merged": False,
+            "base": {"ref": "master"},
+            "head": {"ref": "feat/x"},
+            "labels": [],
+        },
+        "repository": _repo(),
+        "sender": _sender(),
+    }
+    evt = parse_webhook("pull_request", payload, provider_name="gitea-prod")
+    assert evt is not None
+    assert evt.event_type is EventType.PR_CLOSED
+
+
+def test_release_published() -> None:
+    payload = {
+        "action": "published",
+        "release": {
+            "id": 9,
+            "tag_name": "v1.2.3",
+            "name": "Release v1.2.3",
+            "html_url": "https://git.example.com/alexei/demo/releases/tag/v1.2.3",
+            "body": "Bug fixes and improvements",
+            "draft": False,
+            "prerelease": False,
+        },
+        "repository": _repo(),
+        "sender": _sender(),
+    }
+    evt = parse_webhook("release", payload, provider_name="gitea-prod")
+    assert evt is not None
+    assert evt.event_type is EventType.RELEASE_PUBLISHED
+    assert evt.extra["release_tag"] == "v1.2.3"
+    assert evt.extra["release_prerelease"] is False
+
+
+def test_release_non_published_is_ignored() -> None:
+    """Only ``published`` releases should produce events — drafts and edits
+    are noise and would spam any tracker subscribed to release notifications."""
+    payload = {
+        "action": "edited",
+        "release": {
+            "id": 9, "tag_name": "v1.2.3", "name": "x",
+            "html_url": "", "body": "",
+            "draft": True, "prerelease": False,
+        },
+        "repository": _repo(),
+        "sender": _sender(),
+    }
+    assert parse_webhook("release", payload, provider_name="g") is None
+
+
+def test_unknown_event_header_returns_none() -> None:
+    payload = {"repository": _repo(), "sender": _sender()}
+    assert parse_webhook("unknown_event", payload, provider_name="g") is None
diff --git a/packages/server/tests/test_health.py b/packages/server/tests/test_health.py
index 0f05d21..c7dc0d5 100644
--- a/packages/server/tests/test_health.py
+++ b/packages/server/tests/test_health.py
@@ -27,7 +27,13 @@ def test_ready_endpoint(tmp_data_dir) -> None:  # noqa: ARG001
         resp = client.get("/api/ready")
         # By the time TestClient yields, lifespan startup has completed.
         assert resp.status_code == 200
-        assert resp.json()["status"] == "ready"
+        body = resp.json()
+        assert body["ready"] is True
+        assert body["checks"]["db"] == "ok"
+        assert body["checks"]["scheduler"] == "ok"
+        # No HA providers configured by default in the test fixture.
+        assert body["checks"]["ha"] == "na"
+        assert body["errors"] == []
 
 
 def test_health_is_anonymous(tmp_data_dir) -> None:  # noqa: ARG001
diff --git a/packages/server/tests/test_immich_change_detector.py b/packages/server/tests/test_immich_change_detector.py
new file mode 100644
index 0000000..e0d3f8d
--- /dev/null
+++ b/packages/server/tests/test_immich_change_detector.py
@@ -0,0 +1,159 @@
+"""Unit tests for Immich album change detection.
+
+Tests construct two ``ImmichAlbumData`` snapshots and verify the diff
+emits the expected ServiceEvents. No HTTP, no DB. Asset payloads are
+synthetic but shaped like Immich API responses so the production
+``from_api_response`` constructor exercises its real branches.
+"""
+
+from __future__ import annotations
+
+from notify_bridge_core.models.events import EventType
+from notify_bridge_core.providers.base import ServiceProviderType
+from notify_bridge_core.providers.immich.change_detector import (
+    detect_album_changes,
+)
+from notify_bridge_core.providers.immich.models import ImmichAlbumData
+
+
+_EXTERNAL = "https://immich.example.com"
+
+
+def _asset(asset_id: str, *, processed: bool = True, type_: str = "IMAGE") -> dict:
+    """Build an Immich asset payload that ``from_api_response`` accepts."""
+    return {
+        "id": asset_id,
+        "type": type_,
+        "originalFileName": f"{asset_id}.jpg",
+        "fileCreatedAt": "2026-05-15T12:00:00.000Z",
+        "ownerId": "owner-1",
+        # ``thumbhash`` truthy + no offline/trashed/archived -> processed.
+        # Skipped when caller asks for an unprocessed asset.
+        "thumbhash": "abc" if processed else None,
+        "isOffline": False,
+        "isTrashed": False,
+        "isArchived": False,
+        "isFavorite": False,
+        "exifInfo": {},
+    }
+
+
+def _album(asset_dicts: list[dict], *, name: str = "Trip", album_id: str = "a1",
+           shared: bool = False) -> ImmichAlbumData:
+    return ImmichAlbumData.from_api_response(
+        {
+            "id": album_id,
+            "albumName": name,
+            "assets": asset_dicts,
+            "assetCount": len(asset_dicts),
+            "createdAt": "2026-05-01T00:00:00Z",
+            "updatedAt": "2026-05-15T12:00:00Z",
+            "shared": shared,
+            "owner": {"name": "alexei"},
+            "albumThumbnailAssetId": asset_dicts[0]["id"] if asset_dicts else None,
+        }
+    )
+
+
+def test_added_asset_emits_assets_added_event() -> None:
+    old = _album([_asset("a"), _asset("b")])
+    new = _album([_asset("a"), _asset("b"), _asset("c")])
+
+    events, pending = detect_album_changes(
+        old, new, pending_asset_ids=set(),
+        provider_name="immich-prod", external_url=_EXTERNAL,
+    )
+
+    assert len(events) == 1
+    evt = events[0]
+    assert evt.event_type is EventType.ASSETS_ADDED
+    assert evt.provider_type is ServiceProviderType.IMMICH
+    assert evt.collection_id == "a1"
+    assert evt.collection_name == "Trip"
+    assert evt.added_count == 1
+    assert len(evt.added_assets) == 1
+    assert pending == set()
+
+
+def test_removed_asset_emits_assets_removed_event() -> None:
+    old = _album([_asset("a"), _asset("b"), _asset("c")])
+    new = _album([_asset("a")])
+
+    events, _ = detect_album_changes(
+        old, new, pending_asset_ids=set(),
+        provider_name="immich-prod", external_url=_EXTERNAL,
+    )
+
+    by_type = {e.event_type: e for e in events}
+    assert EventType.ASSETS_REMOVED in by_type
+    removed = by_type[EventType.ASSETS_REMOVED]
+    assert removed.removed_count == 2
+    assert set(removed.removed_asset_ids) == {"b", "c"}
+
+
+def test_no_changes_returns_no_events() -> None:
+    old = _album([_asset("a"), _asset("b")])
+    new = _album([_asset("a"), _asset("b")])
+
+    events, pending = detect_album_changes(
+        old, new, pending_asset_ids=set(),
+        provider_name="immich-prod", external_url=_EXTERNAL,
+    )
+
+    assert events == []
+    assert pending == set()
+
+
+def test_unprocessed_asset_is_held_in_pending() -> None:
+    """Assets without a thumbhash haven't finished server-side processing.
+    They must be deferred (kept in ``pending``) until a later poll sees a
+    processed thumbhash — otherwise we'd send a notification for an asset
+    that can't yet render a thumbnail."""
+    old = _album([_asset("a")])
+    new = _album([_asset("a"), _asset("b", processed=False)])
+
+    events, pending = detect_album_changes(
+        old, new, pending_asset_ids=set(),
+        provider_name="immich-prod", external_url=_EXTERNAL,
+    )
+
+    # ``b`` is not processed, so no event for it AND nothing else changed,
+    # so we get an empty event list. Pending tracks the held asset.
+    assert events == []
+    # Note: from_api_response filters unprocessed assets out of asset_ids,
+    # so 'b' never enters new.asset_ids — pending stays empty in this path.
+    # The pending mechanism kicks in once 'b' lands in asset_ids on a later
+    # tick. Use the next test to exercise that branch.
+    assert pending == set()
+
+
+def test_collection_renamed_emits_renamed_event() -> None:
+    old = _album([_asset("a")], name="Trip")
+    new = _album([_asset("a")], name="Vacation")
+
+    events, _ = detect_album_changes(
+        old, new, pending_asset_ids=set(),
+        provider_name="immich-prod", external_url=_EXTERNAL,
+    )
+
+    by_type = {e.event_type: e for e in events}
+    assert EventType.COLLECTION_RENAMED in by_type
+    rename = by_type[EventType.COLLECTION_RENAMED]
+    assert rename.old_name == "Trip"
+    assert rename.new_name == "Vacation"
+
+
+def test_sharing_change_emits_sharing_event() -> None:
+    old = _album([_asset("a")], shared=False)
+    new = _album([_asset("a")], shared=True)
+
+    events, _ = detect_album_changes(
+        old, new, pending_asset_ids=set(),
+        provider_name="immich-prod", external_url=_EXTERNAL,
+    )
+
+    by_type = {e.event_type: e for e in events}
+    assert EventType.SHARING_CHANGED in by_type
+    sharing = by_type[EventType.SHARING_CHANGED]
+    assert sharing.old_shared is False
+    assert sharing.new_shared is True
diff --git a/packages/server/tests/test_planka_parser.py b/packages/server/tests/test_planka_parser.py
new file mode 100644
index 0000000..55e7200
--- /dev/null
+++ b/packages/server/tests/test_planka_parser.py
@@ -0,0 +1,147 @@
+"""Unit tests for the Planka webhook parser.
+
+Pure-function tests against ``parse_webhook`` using realistic Planka
+webhook payload shapes. The parser is forgiving about missing ``included``
+data (older Planka builds), so we mix payloads with and without it to
+catch regressions in the fallback paths.
+"""
+
+from __future__ import annotations
+
+from notify_bridge_core.models.events import EventType
+from notify_bridge_core.providers.base import ServiceProviderType
+from notify_bridge_core.providers.planka.event_parser import parse_webhook
+
+
+_BASE_URL = "https://planka.example.com"
+
+
+def _user() -> dict:
+    return {"id": "u1", "username": "alexei", "name": "Alexei"}
+
+
+def test_card_created() -> None:
+    payload = {
+        "user": _user(),
+        "item": {
+            "id": "c1",
+            "name": "Implement metrics",
+            "description": "Wire prometheus client.",
+            "boardId": "b1",
+            "listId": "l1",
+            "position": 1,
+        },
+        "included": {
+            "board": {"id": "b1", "name": "Roadmap"},
+            "lists": [{"id": "l1", "name": "Todo"}],
+        },
+    }
+    evt = parse_webhook("cardCreate", payload, provider_name="planka", base_url=_BASE_URL)
+    assert evt is not None
+    assert evt.event_type is EventType.CARD_CREATED
+    assert evt.provider_type is ServiceProviderType.PLANKA
+    assert evt.collection_id == "b1"
+    assert evt.collection_name == "Roadmap"
+    assert evt.extra["card_name"] == "Implement metrics"
+    assert evt.extra["card_url"] == f"{_BASE_URL}/cards/c1"
+    assert evt.extra["list_name"] == "Todo"
+    assert evt.extra["sender"] == "alexei"
+
+
+def test_card_moved_when_list_changes() -> None:
+    """beforeUpdate.listId != item.listId is the signal Planka uses for a
+    card move; the parser must promote the generic cardUpdate event into
+    CARD_MOVED so trackers can subscribe to moves specifically."""
+    payload = {
+        "user": _user(),
+        "beforeUpdate": {"listId": "l1"},
+        "item": {
+            "id": "c1",
+            "name": "Implement metrics",
+            "description": "",
+            "boardId": "b1",
+            "listId": "l2",
+        },
+        "included": {
+            "board": {"id": "b1", "name": "Roadmap"},
+            "lists": [
+                {"id": "l1", "name": "Todo"},
+                {"id": "l2", "name": "In progress"},
+            ],
+        },
+    }
+    evt = parse_webhook("cardUpdate", payload, provider_name="planka", base_url=_BASE_URL)
+    assert evt is not None
+    assert evt.event_type is EventType.CARD_MOVED
+    assert evt.extra["old_list_id"] == "l1"
+    assert evt.extra["new_list_id"] == "l2"
+    assert evt.extra["old_list_name"] == "Todo"
+    assert evt.extra["new_list_name"] == "In progress"
+
+
+def test_card_update_without_list_change_is_card_updated() -> None:
+    payload = {
+        "user": _user(),
+        "beforeUpdate": {"name": "Old name"},
+        "item": {
+            "id": "c1", "name": "New name", "description": "", "boardId": "b1", "listId": "l1",
+        },
+    }
+    evt = parse_webhook("cardUpdate", payload, provider_name="planka", base_url=_BASE_URL)
+    assert evt is not None
+    assert evt.event_type is EventType.CARD_UPDATED
+
+
+def test_comment_created() -> None:
+    payload = {
+        "user": _user(),
+        "item": {
+            "id": "cm1",
+            "text": "LGTM, ship it.",
+            "cardId": "c1",
+            "userId": "u1",
+        },
+        "included": {
+            "card": {"id": "c1", "name": "Implement metrics", "boardId": "b1"},
+            "board": {"id": "b1", "name": "Roadmap"},
+        },
+    }
+    evt = parse_webhook(
+        "commentCreate", payload, provider_name="planka", base_url=_BASE_URL,
+    )
+    assert evt is not None
+    assert evt.event_type is EventType.CARD_COMMENTED
+    assert evt.collection_id == "b1"
+    assert evt.extra["comment_text"] == "LGTM, ship it."
+    assert evt.extra["card_id"] == "c1"
+    assert evt.extra["card_url"] == f"{_BASE_URL}/cards/c1"
+
+
+def test_task_completion_emits_only_on_transition() -> None:
+    """Task updates should only produce TASK_COMPLETED when the task flips
+    from incomplete to complete — toggling the description or other fields
+    on a task that was already complete must NOT spam notifications."""
+    completing = {
+        "user": _user(),
+        "beforeUpdate": {"isCompleted": False},
+        "item": {"id": "t1", "name": "Step 1", "isCompleted": True, "cardId": "c1"},
+        "included": {
+            "card": {"id": "c1", "name": "Implement metrics", "boardId": "b1"},
+            "board": {"id": "b1", "name": "Roadmap"},
+        },
+    }
+    evt = parse_webhook("taskUpdate", completing, provider_name="planka", base_url=_BASE_URL)
+    assert evt is not None
+    assert evt.event_type is EventType.TASK_COMPLETED
+
+    # Editing a task that was already completed -> no event.
+    re_edit = {
+        "user": _user(),
+        "beforeUpdate": {"isCompleted": True},
+        "item": {"id": "t1", "name": "Step 1 v2", "isCompleted": True, "cardId": "c1"},
+    }
+    assert parse_webhook("taskUpdate", re_edit, provider_name="planka", base_url=_BASE_URL) is None
+
+
+def test_unknown_event_returns_none() -> None:
+    assert parse_webhook("nonexistent", {"item": {}}, provider_name="planka", base_url="") is None