Commit Graph

121 Commits

Author SHA1 Message Date
alexei.dolgolyov faaaa39f8a chore: release v0.8.1
Release / test-backend (push) Failing after 1m36s
Release / release (push) Has been skipped
2026-05-16 12:28:07 +03:00
alexei.dolgolyov 8651767112 feat: bridge_self bot commands — status, thresholds, reset, health
Adds bot commands for the bridge_self provider so operators can inspect
and manage bridge health from chat: /status, /thresholds, /reset, /health.
Includes Jinja2 templates for both locales, seed data, capability slots,
and a handler that exposes pending deferred backlog plus per-counter
reset. Also adds .claude/skills/ for project-scoped graph-aware skills.
2026-05-16 03:43:48 +03:00
alexei.dolgolyov 10d30fc956 feat: production readiness — security, perf, bug fixes, bridge self-monitoring
Comprehensive multi-area pass driven by a parallel 8-agent production
review. Frontend, backend, database, security, performance, operational,
plus a new self-monitoring feature.

## Critical fixes
- Planka webhook: reads bounded raw body (was NameError on every call)
- HA quiet hours: ha_state_changed/automation_triggered/service_called/
  event_fired added to deferrable set (were silently dropped)
- DNS-rebinding SSRF: PinnedResolver wired into shared aiohttp session
- Telegram inbound webhook: secret now mandatory (401 without)
- Generic webhook: auth_mode="none" requires explicit
  acknowledge_unauthenticated=true; per-IP rate limit 60/min
- svelte-check: 5 null-narrowing errors in EventDetailModal fixed
- Provider hardcoding: Immich-only block extracted to descriptor
  featureDiscoveryHint
- command_sync: snapshot+expunge bot before exiting AsyncSession

## Bug fixes
- notifier asyncio.gather(return_exceptions=True) — one bad chat no longer
  cancels peer sends
- NotificationDispatcher hoisted out of per-tracker loop
- Provider credential resolution unified across all 5 dispatch sites
- HA asyncio.shield now drains inner task on cancellation
- Provider construction switched from if/elif ladder to factory registry
- NUT first poll seeds silently (no spurious ups_on_battery)
- Quiet-hours gate: event-type-disabled now wins over deferral
- APScheduler drain job ID resolution upgraded to seconds
- HA on_status_change wired through to EventLog
- Webhook payload rollback failures now logged (not swallowed)
- Batched receivers/chats/bots in load_link_data (was per-target N+1)
- flag_modified on JSON column reassignments in deferred_dispatch

## Database
- UNIQUE indexes on service_provider.webhook_token,
  telegram_bot.webhook_path_id, partial UNIQUE on telegram_bot.bot_id,
  telegram_chat(bot_id, chat_id), notification_tracker_target unique link,
  partial UNIQUE on bridge_self provider per user
- Composite ix_event_log_user_event_type_created index
- save_chat_from_webhook switched to ON CONFLICT DO UPDATE
- ondelete=CASCADE on user-id FKs (model annotation; app-side cascade
  delete added for existing data)
- delete_notification_tracker converted from N+1 to bulk DELETE/UPDATE
- Module-level asyncio.Lock replaced with lazy _get_lock() pattern
- VACUUM INTO snapshot now PRAGMA integrity_check verified

## Performance
- Jinja2 template compilation LRU cached (lru_cache maxsize=512)
- Per-locale render cache in NotificationDispatcher (skips re-rendering
  identical content for receivers sharing a locale)
- Tracker list cached per provider_id with 5s TTL + explicit invalidation
  on tracker CRUD (relieves HA chat-bus rate query pressure)
- Nav-counts collapsed from 16 round-trips to single UNION ALL
- HA event_log: skip persisting empty assets_added/removed events

## Security hardening
- Mass-assignment guard on Action create/update; cron sub-minute reject
- Backup JSON depth/node-count cap (depth ≤ 10, nodes ≤ 100k)
- _sanitize_config extended to all JSON-typed fields on backup import
- Telegram _safe_get walks redirects manually with SSRF revalidation
- Bcrypt 72-byte password length cap with clear 422
- Webhook payload body redaction; sensitive substring set extended with
  oauth/client_secret/webhook_secret/csrf in both header filter and
  template extras filter

## Frontend
- 76 catch (err: any) sites converted to errMsg(err) helper
- globalProviderFilter: pure getter; reconciliation moved to one-time
  $effect in +layout
- Provider-filter binding: removed paired $effects + _syncingFilter flag,
  now one-way derived
- entity-cache: separate _refreshing flag for background re-fetches
- api.ts 401 handling: AuthRedirectError class + dedup _redirecting flag,
  goto() instead of window.location.href
- a11y: aria-expanded on mobile More, role=switch + aria-checked on
  Telegram bot toggles

## Tests & operations
- CI pytest gate added to .gitea/workflows/build.yml + release.yml
  (wheel-built install to dodge editable-install slowness)
- /api/ready upgraded to deep healthcheck (db SELECT 1, scheduler.running,
  HA supervisor presence) returning {ready, checks, errors, version}
- /api/metrics endpoint with prometheus_client (deferred_pending,
  event_log_total, dispatch_duration, poll_failures, send_failures)
- New OPERATIONS.md covering deploy, healthchecks, metrics, backup/restore
  procedures, log handling, common scenarios, upgrade flow
- New tests: test_bridge_self (11), test_gitea_parser (9),
  test_planka_parser (6), test_immich_change_detector (6),
  test_backup_roundtrip (1)

## New feature: bridge self-monitoring
- New bridge_self provider type — internal sink for bridge health events
- Three event types: bridge_self_poll_failures (consecutive tracker poll
  failures), bridge_self_deferred_backlog (pending count crosses
  threshold), bridge_self_target_failures (consecutive 5xx/network
  failures per target)
- Per-user thresholds (defaults: 3 / 100 / 5) configurable via the
  provider config form
- Auto-seeded on user create + /setup + boot backfill for existing users
- Anti-spam: counters reset after emission; backlog uses transition latch
- Self-loop guard: bridge_self failures don't count toward target-failure
  thresholds (logged only) — wire to your own Telegram/Email/Matrix to
  get notified when polls/dispatches/sends fail
- 6 default templates (3 events × 2 locales), tracking config columns
  with backfill migration, frontend descriptor (excluded from "create
  provider" wizard since auto-managed)

Operator-visible behavior changes (call out in release notes):
- NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET now REQUIRED for webhook mode
- Existing webhook providers with auth_mode="none" need explicit opt-in
- Generic webhook endpoint rate-limited 60/min per source IP
- HA disconnect/reconnect writes ha_status_* EventLog rows
- Every user gets a bridge_self provider — wire it to a target to
  receive failure alerts

Pre-existing test failures (test_ssrf, test_release_provider) on
Python 3.13 are unrelated; CI runs on 3.12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 02:16:49 +03:00
alexei.dolgolyov 22127e2a59 feat: Home Assistant provider — WebSocket subscription + bot commands
Adds Home Assistant as a service provider with two coordinated surfaces:

Notifications (subscription):
- Long-lived WebSocket client (aiohttp ws_connect) with auth handshake,
  exponential-backoff reconnect, bounded event queue, and area-registry
  enrichment cached per (re)connect
- ServiceProvider ABC gains an optional `subscribe()` method for push-style
  providers; HomeAssistantServiceProvider uses it via a per-provider
  supervisor task started in the FastAPI lifespan
- 4 event types (state_changed, automation_triggered, call_service,
  event_fired), 4 default Jinja templates (en + ru), HA-specific
  tracker filters (entity_glob, domain_allowlist, exact entity ids)
- Extracted shared dispatch pipeline (api/webhooks.py → services/
  event_dispatch.py) so subscription and webhook ingest share the same
  event_log + deferred-dispatch + quiet-hours code path

Bot commands:
- /status, /entities [glob], /state <entity_id>, /areas
- Multi-command WS session so /status and /areas cost one handshake
- Sensitive-attribute blocklist (camera access_token, entity_picture, etc.)
  and 30-attribute cap to keep /state output safe and within Telegram's
  message size
- Error-message redaction strips URL userinfo before surfacing to chat

Frontend:
- HA descriptor with toggle ConfigField type (new) and tag-input filter
  mode for free-text glob/domain lists (new TagInput component)
- 15 command slots + 4 notification slots wired into the existing
  template-config UI
2026-05-13 14:31:56 +03:00
alexei.dolgolyov 90f958bdc6 fix(server): honor periodic_interval_days for Immich periodic summary
The APScheduler cron job fires daily at every entry in `periodic_times`,
and nothing in the dispatch path consulted `periodic_interval_days` or
`periodic_start_date` — so a configured 3-day interval still produced a
daily summary.

Gate the dispatch in `dispatch_scheduled_for_tracker` for kind=periodic
via `(today_in_app_tz - start_date).days % interval == 0`. Log a skip
with reason `interval_not_due` on non-firing days so operators can tell
suppressed-by-interval apart from other skip causes.
2026-05-13 12:14:18 +03:00
alexei.dolgolyov dec0839853 feat: on-watch stats scope selector (page vs all)
Adds an icon selector to the "On watch" provider deck letting users
choose between page-scoped stats (legacy) and full-corpus stats that
aggregate across every event matching the current filters. Backend
returns a new provider_event_counts map alongside the paginated events.
2026-05-12 14:12:59 +03:00
alexei.dolgolyov dfd7329177 chore: release v0.8.0
Release / release (push) Successful in 1m55s
2026-05-12 03:01:06 +03:00
alexei.dolgolyov ba199f24bd feat: deferred dispatch, release-check provider, settings polish
- Defer quiet-hours dispatches into new deferred_dispatch table; drain
  job + periodic catch-up scan re-fire at window end with coalescing on
  (link, event_type, collection_id).
- Add ON DELETE SET NULL migration on event_log_id and partial unique
  index on (link_id, collection_id, event_type) WHERE status='pending'.
- Add release-check provider abstraction (Gitea/GitHub) with SSRF-safe
  URL validation, settings UI cassette, and scheduled polling.
- Replace importlib-only version lookup with version.py helper that
  prefers the higher of installed metadata vs source pyproject so stale
  editable dev installs stop misreporting.
- Aurora frontend polish: MetaStrip component, ReleaseCassette,
  EventDetailModal expansion, and i18n additions.
2026-05-12 02:58:07 +03:00
alexei.dolgolyov 5d41a39406 chore: release v0.7.2
Release / release (push) Successful in 1m10s
2026-05-11 00:39:06 +03:00
alexei.dolgolyov bede928a3f feat(server): add /status command handler for webhook providers
The generic-webhook provider has no upstream API, so /status reports
DB-derived stats: active/total trackers, provider name, and last event
timestamp (formatted via the shared get_last_event_str helper).

Includes pytest coverage for handler registration, populated stats with
a recent event, the empty-state dash sentinel, and unknown-command
fall-through. Template variable docs in command_template_configs.py
extended with the new trackers_active/trackers_total keys.
2026-05-10 23:51:25 +03:00
alexei.dolgolyov 757271dadf chore: release v0.7.1
Release / release (push) Successful in 2m11s
2026-05-07 23:33:09 +03:00
alexei.dolgolyov 35a3008896 feat: log bot command invocations to the event stream
Bot commands were the only user-initiated path that didn't surface in
the dashboard. They now produce ``command_handled`` /
``command_rate_limited`` / ``command_failed`` rows in ``EventLog``
alongside tracker and action events.

Backend
- ``EventLog`` gains nullable ``command_tracker_id`` / ``telegram_bot_id``
  FKs plus deletion-snapshot name columns (idempotent migration).
- New ``_log_command_event`` helper emits one row per invocation at the
  three branches in ``handle_command``. Logging failures are swallowed
  so they cannot block the user-visible reply.
- Telegram ``from`` is captured in poller and webhook, whitelisted to
  identity fields by ``_normalize_issuer`` (drops ``language_code`` and
  any future PII), persisted under ``details.issuer``.
- ``/api/status`` resolves live ``CommandTracker`` / ``TelegramBot``
  names (mirroring the action pattern) and exposes ``tracker_id``,
  ``command_tracker_id``, ``telegram_bot_id`` so the frontend can
  deep-link.

Frontend
- Event rows are now clickable and open a detail modal with full
  provenance (bot → chat → issuer → provider), raw ``details`` JSON,
  and per-entity action buttons.
- Buttons use the existing ``requestHighlight`` + ``goto`` crosslink
  pattern, so clicking lands on the entity's list page with that
  specific card scrolled into view and pulsing.
- Auto-refresh dropdown (Off / 10s / 30s / 1m / 5m) persisted in
  ``localStorage``; ticker pauses while the tab is hidden.
- Event-type filter, dashboard verb labels, and gradients extended for
  the three new ``command_*`` types.
- Filled in pre-existing missing i18n keys (``common.hide`` /
  ``common.show`` / ``commandConfig.noCommandsForProvider``).

Tests
- New ``test_command_event_logging.py`` covers subject formatting,
  issuer normalization, the three event branches, and graceful failure
  when the DB is unreachable. ``pytest packages/server/tests/`` → 96/96.
2026-05-07 22:22:41 +03:00
alexei.dolgolyov 632e4c1aa3 chore: release v0.7.0
Release / release (push) Successful in 1m19s
2026-05-07 14:03:48 +03:00
alexei.dolgolyov 0eb899afb9 feat: harden notification stack and switch logging selectors to icon grid
Notifications:
- Add shared http_base, redact, and SSRF hardening modules
- Refactor dispatcher, queue, receiver and per-provider clients
  (telegram, discord, email, matrix, ntfy, slack, webhook) to use
  the shared base, with bounded queue and redacted error logs
- Tests for ssrf, redact, http_base, queue bounds, dispatcher
  aggregation, telegram media partition, email and matrix clients

Frontend:
- Settings: log level / log format selectors now use IconGridSelect
  with per-option icons and i18n descriptions
- Minor providers page and entity-cache store updates

Tooling:
- Document code-review-graph MCP usage in CLAUDE.md
- Ignore .code-review-graph/, register .mcp.json
2026-05-07 13:53:26 +03:00
alexei.dolgolyov 349e9136a4 chore: release v0.6.5
Release / release (push) Successful in 1m36s
2026-04-28 19:10:49 +03:00
alexei.dolgolyov aa9548d884 chore: release v0.6.4
Release / release (push) Successful in 1m41s
2026-04-27 18:27:09 +03:00
alexei.dolgolyov 72dd611f8c fix(telegram): respect chat_action UI choice, drop phantom indicator
chat_action was stored in two places — the model column and config JSON —
and dispatch_helpers unconditionally overrode the config value with the
column. The frontend only ever wrote the JSON path, so the UI choice
silently had no effect on outgoing chat actions.

Make the column the single source of truth: frontend sends chat_action
top-level, dispatch_helpers reads from the column, and a one-time
backfill migrates existing config values to the column and strips the
legacy key.

Also fix a long-standing race where the keepalive's bare sleep(4) +
finally cancel could fire one last sendChatAction after the response
already arrived, leaving a phantom indicator for ~5s. Replace with a
stop event + wait_for so callers can signal stop cleanly via the new
stop_keepalive helper.
2026-04-27 18:20:50 +03:00
alexei.dolgolyov 0e675c4b38 chore: release v0.6.3
Release / release (push) Successful in 1m15s
2026-04-27 15:42:04 +03:00
alexei.dolgolyov 42af7a6551 feat(trackers): user filters for Gitea, webhook polling cleanup, dashboard navigability
- Gitea: NotificationTracker now exposes sender allowlist / blocklist filters
  via MultiEntitySelect, populated from Gitea /users/search merged with past
  EventLog senders so the picker is useful before the first webhook arrives.
- Webhook providers (gitea, planka, webhook): stop scheduling interval polling
  jobs on tracker create/update/startup; hide the "every Xs" indicator in the
  tracker list since there is no polling.
- Dashboard: stat cards are now <a> links that route to providers, trackers,
  targets, command-trackers, or scroll to the events panel. Provider deck
  rows highlight the target provider on click.
- Command trackers / command configs: auto-reselect the right config when the
  provider type changes (matches notification-tracker behavior).
- Migration: drop legacy batch_duration column from notification_tracker —
  the field is gone from the model but its NOT NULL constraint blocked
  inserts on older DBs.
- Docs: refresh entity-relationships.md with current NotificationTracker
  fields (filters, adaptive_max_skip, default_*_config_id).
2026-04-27 15:24:44 +03:00
alexei.dolgolyov c43dc598a1 chore: release v0.6.2
Release / release (push) Successful in 1m39s
2026-04-27 14:29:44 +03:00
alexei.dolgolyov b320090a56 chore: release v0.6.1
Release / release (push) Successful in 3m17s
2026-04-25 15:25:23 +03:00
alexei.dolgolyov 9eb478fdc9 chore: release v0.6.0
Release / release (push) Successful in 1m25s
2026-04-25 14:54:16 +03:00
alexei.dolgolyov ef942b77cc feat(telegram): per-chat command localization + unified locale resolver
Two related Telegram changes:

1. Per-chat command localization. setMyCommands now accepts a scope
   (BotCommandScopeChat) and deleteMyCommands clears scoped bindings.
   Command registration runs three tiers: default → per-language
   (Telegram client language) → per-chat (UI override). Saving a
   chat's language_override or commands_enabled toggle pushes the
   binding to Telegram inline rather than waiting on the 30s
   debounced bot-wide sync.

2. Unified Telegram locale resolution. Three test paths (bot test_chat,
   target receiver test, target-level fan-out) used to disagree on
   locale priority — the target receiver test in particular only
   consulted receiver.locale and ignored the chat's language_override.
   Introduced pick_telegram_locale (pure) and
   resolve_telegram_chat_locale (async DB lookup) in services/notifier
   so all three paths share one priority order:

       receiver.locale → chat.language_override → chat.language_code → fallback

   Fan-out keeps batch-loading TelegramChat rows for efficiency, just
   runs them through the same priority function now.
2026-04-25 14:41:28 +03:00
alexei.dolgolyov 770c198ac3 chore: release v0.5.2
Release / release (push) Successful in 1m48s
2026-04-24 21:58:40 +03:00
alexei.dolgolyov ab621b6abc feat: wire tracking-config display filters + per-tracker adaptive polling
Display filters (Immich tracking config):
- favorites_only drops events with no favorited new assets, or filters
  added_assets to favorites only
- assets_order_by/assets_order sort the rendered list
  (date / name / rating / random / none)
- max_assets_to_show caps rendered+attached media (default 5 -> 10)
- include_tags strips people from event extras and tags from each asset
- include_asset_details strips city/country/state/lat/lon/is_favorite/
  rating/description; load-bearing fields (thumbhash, file_size,
  playback_size, cache keys) preserved
- New apply_tracking_display_filters helper in dispatch_helpers; wired
  into watcher, webhooks, scheduled/periodic/memory, and manual
  test-dispatch
- Targets sharing a TrackingConfig dispatch together; targets with
  different TCs each see their own shaped event

Adaptive polling:
- Replace NotificationTracker.batch_duration with adaptive_max_skip
- Per-tracker opt-in: NULL/0 disables back-off (every tick runs);
  positive N caps the skip factor at (N-1)-in-N after long idle
- Scheduler caches the cap in module state for the tick fast-path
- Migration adds the new column; API schemas/responses, frontend types,
  i18n, and the tracker form updated to match
2026-04-24 21:12:10 +03:00
alexei.dolgolyov 187b889c45 chore: release v0.5.1
Release / release (push) Successful in 1m49s
2026-04-24 19:21:39 +03:00
alexei.dolgolyov b61394f057 feat(immich): per-album scheduled/memory dispatch + template tooling
Dispatch: honor {kind}_collection_mode on TrackingConfig — "per_collection"
fans out one event per album; "combined" pools assets as before. Extract
build_immich_dispatch_events shared by cron and test paths.

Assets: collect_scheduled_assets attaches album_name/album_url/album_public_url
to each asset so combined-mode templates can attribute rows to their source
album. Default scheduled_assets templates render a multi-album header with
inline album list and per-row album link; memory_mode follows the same pattern.

UI: "Reset to default" buttons on notification and command template slots
(per-slot and whole-template), backed by new GET /*-template-configs/defaults
endpoints. tracking-configs "Preview template" now opens an inline preview
modal with locale tabs instead of navigating away; Edit button deep-links
with ?edit_slot=<name> so the destination auto-opens the config and scrolls
to the slot. Reset confirmations use ConfirmModal instead of window.confirm.

Fixes:
* NotificationDispatcher._session_ctx infinite recursion when no shared
  aiohttp.ClientSession was passed — broke test dispatch for periodic/
  scheduled/memory (cron path was unaffected).
* telegram-bots /chats/{id}/test now resolves chat.language_override /
  language_code instead of using the raw ?locale query param, matching
  the resolution the tracker-target test endpoint already used.
* scheduled_assets default template no longer emits a blank line between
  header and the first asset when the multi-album branch is taken.
2026-04-24 19:15:54 +03:00
alexei.dolgolyov be15463fd2 feat(telegram): add 'none' listener mode for bots
Introduce a third update_mode option alongside polling/webhook. 'none'
disables both polling and webhook delivery — useful when another instance
owns the listener or when the bot is send-only. Switching into 'none' now
unschedules polling and unregisters any active webhook so Telegram stops
delivering updates.

New bots default to 'none' (safer when multiple bridges share a token).
Existing bots upgraded from a pre-update_mode schema keep 'polling' so
their behavior is unchanged.
2026-04-24 15:15:25 +03:00
alexei.dolgolyov 461fb495d7 chore: release v0.5.0
Release / release (push) Successful in 1m10s
2026-04-24 14:16:34 +03:00
alexei.dolgolyov 309dec2b44 feat(immich): wire cron-fired scheduled/periodic/memory dispatch
The scheduled_enabled / scheduled_times (and the periodic / memory
counterparts) on TrackingConfig had been wired into the model, the
API, and the test-dispatch path — but no production scheduler ever
read them, so users saw the slot in the UI and only ever got fires
through "Test". This adds the missing cron jobs and the dispatch
fan-out, both keyed off the app-level IANA timezone.

* services/scheduled_dispatch.py — production fan-out reusing the
  test-path event builders, picking the slot template per kind, and
  writing an EventLog row per fire so the dashboard reflects it.
* services/scheduler.py — _load_immich_dispatch_jobs builds one
  CronTrigger per (tracker, kind, HH:MM) from the tracker's default
  TrackingConfig; reschedule_immich_dispatch_jobs rebuilds them all
  on any relevant CRUD or timezone change.
* tracker / link / tracking-config CRUD endpoints now invalidate.

Also: skip dispatch when scheduled/memory yield zero matching assets
(prevents header-only "On this day:" spam), and update the EN/RU
default scheduled_assets templates to surface that the delivery is
a scheduled random selection.
2026-04-24 12:49:47 +03:00
alexei.dolgolyov 8f0346ea03 fix(csp): allow unsafe-inline scripts for SvelteKit hydration bootstrap
The static-adapter build emits an inline <script> with the hydration
payload; ``script-src 'self'`` alone blocks the SPA from starting
(browser error: "Executing inline script violates the following Content
Security Policy directive").

Mirrors the 'unsafe-inline' already present for style-src. Primary XSS
protection still comes from Svelte's auto-escaping and
frontend/src/lib/sanitize.ts for the {@html} paths that render user
content. CSP still blocks remote scripts (no https: in script-src),
framing (frame-ancestors 'none'), base-uri hijacking, and form
exfiltration.
2026-04-23 21:08:17 +03:00
alexei.dolgolyov e44d387c7f chore: release v0.4.0
Build and Test / test-backend (push) Failing after 1m6s
Release / test (push) Failing after 1m7s
Release / release (push) Has been skipped
Build and Test / test-frontend (push) Successful in 9m54s
Build and Test / build-image (push) Has been skipped
2026-04-23 20:18:34 +03:00
alexei.dolgolyov 7cbb02b1ef feat(db): pre-migration SQLite snapshots via VACUUM INTO
Build and Test / test-backend (push) Successful in 2m38s
Build and Test / test-frontend (push) Successful in 9m44s
Build and Test / build-image (push) Failing after 17m9s
Take a consistent, atomic copy of the DB at lifespan startup BEFORE
migrations run, so a botched future upgrade is recoverable by restoring
a single file instead of a data-loss incident.

Uses SQLite's VACUUM INTO — safe under WAL, cannot tear against
concurrent writes. Best-effort: failures are logged, never raised —
the main DB remains the source of truth.

Configurable via NOTIFY_BRIDGE_PRE_MIGRATE_SNAPSHOT_KEEP (default 5;
0 disables). Snapshots land in ``data_dir/backups/pre-migrate-<ts>.db``
and the N oldest are pruned each boot.
2026-04-23 19:53:15 +03:00
alexei.dolgolyov 920920bc67 feat: production-readiness hardening across security, async, DB, ops
Build and Test / test-frontend (push) Successful in 9m37s
Build and Test / test-backend (push) Successful in 10m53s
Build and Test / build-image (push) Failing after 14m52s
Security
- SSRF: async DNS resolver; allow_redirects=False on all outbound clients;
  matrix homeserver_url validated on create/update/test; update_provider
  and email_bot merge incoming config and reject ***-masked secrets.
- Auth: bcrypt offloaded to asyncio.to_thread; JWT now carries iss/aud +
  leeway and rejects missing claims; setup TOCTOU closed inside a
  transaction; rate limits extended (default 600/min, 10/min on password
  change, 30/min on needs-setup); constant-time login to prevent username
  enumeration.
- Config: rejects known dev secret keys; validates CORS origin schemes,
  port range, token lifetimes.
- Webhook handlers stream-read body with a 1 MiB cap; Discord 429 retries
  bounded (3 attempts, Retry-After capped at 60 s).
- CSP + HSTS added to SecurityHeadersMiddleware.

Async / runtime
- SQLite engine: WAL, synchronous=NORMAL, foreign_keys=ON, busy_timeout,
  pool_pre_ping, dispose on shutdown.
- Lifespan shutdown now stops scheduler before closing HTTP session and
  disposing the engine.
- Shared aiohttp session locked against concurrent first-caller races;
  core NotificationDispatcher accepts and reuses it.
- Storage and scheduled backup writes wrapped in asyncio.to_thread.
- NUT client writes bounded by asyncio.wait_for.
- Telegram poller switched from 3 s short-poll to 30 s interval + 25 s
  long-poll (~10x fewer API calls).

Database
- New performance-indexes migration covers every FK/owner column and
  hot-path composite (notification_tracker(provider_id, enabled);
  event_log(user_id, created_at DESC); webhook_payload_log(provider_id,
  created_at DESC); action_execution(action_id, started_at DESC)).
- New schema_version table for future upgrade gating.
- __system__ placeholder user (id=0) seeded so user_id=0 system defaults
  satisfy the newly enforced FK; filtered out of /auth/needs-setup,
  /api/users, and setup.
- list_notification_trackers rewritten to batched loads (was 1+N+N*M).
- Retention job extended to event_log, webhook_payload_log, and
  action_execution; retention days exposed as a setting.

Scheduler
- AsyncIOScheduler job_defaults: coalesce, misfire_grace_time=300,
  max_instances=1.

Ops
- uvicorn runs with proxy_headers, forwarded_allow_ips,
  timeout_graceful_shutdown; access log suppressed in non-debug.
- FastAPI version string now reads from importlib.metadata.
- New /api/ready endpoint separate from /api/health.
- docker-compose drops the ALLOW_PRIVATE_URLS=1 default, adds mem/cpu/pid
  limits, read_only + tmpfs, cap_drop:ALL, no-new-privileges; healthcheck
  targets /api/ready.
- CI now runs on push/PR with backend pytest, frontend svelte-check +
  build, and a non-push image build; release workflow gated on tests,
  publishes immutable sha-<commit> image tag, adds Trivy scan.

Tests
- New packages/server/tests/ with 29 passing tests: config validation,
  JWT round-trip + aud/alg=none rejection, SSRF scheme and private-range
  enforcement (sync + async), Discord bounded retry, and a lifespan-level
  /api/health + /api/ready smoke check.
- Renamed the misnamed services/test_dispatch.py to manual_dispatch.py so
  pytest never auto-collects production code.

Frontend
- /login now redirects already-authenticated users to /, shows a distinct
  'backend unreachable' banner (en/ru) when /auth/needs-setup fails.
2026-04-23 19:44:56 +03:00
alexei.dolgolyov f50d465c0e feat(logging): production-grade logging with context vars, secret masking, and runtime level control
Boot-time logging was a three-line basicConfig stub with no timestamps, no
correlation, and silent drops at every layer of the Telegram send path — a
/random command that delivered text but no media left zero evidence in the
log. This replaces the setup and closes every silent drop encountered end-to-end.

New infrastructure:
- notify_bridge_core.log_context: request_id/command/chat_id/bot_id/dispatch_id
  ContextVars with a bind_log_context() context manager so deep call sites
  (TelegramClient, NotificationDispatcher) inherit the correlation tag without
  threading args through.
- notify_bridge_server.logging_setup: dictConfig-based setup with a
  LogRecordFactory that tags every record, a SecretMaskingFilter that redacts
  /botN:TOKEN plus Authorization/x-api-key/password/secret in messages AND
  tracebacks, a JSON formatter for aggregators, text formatter with grep-friendly
  [req=... cmd=... bot=... chat=... disp=...] prefix, and default dampening
  for sqlalchemy/aiohttp/apscheduler/urllib3/PIL.

Runtime control:
- NOTIFY_BRIDGE_LOG_LEVEL / _FORMAT / _LEVELS env vars (boot).
- DB-backed log_level / log_format / log_levels AppSettings, applied on
  boot after migrations and live via apply_log_levels() when edited in
  the settings UI (format still requires restart, logs a WARN).
- Frontend settings page gains a Logging card (level dropdown, format
  dropdown, per-module overrides); en/ru i18n keys added.

Call-site fixes (/random media-group blind spot and adjacent):
- TelegramClient._fetch_asset: every silent drop now WARN-logs with reason
  (missing url, HTTP non-200, size/dimension limits, ClientError).
- TelegramClient._send_media_group: WARN on "chunk had N items but 0 usable",
  ERROR on sendMediaGroup non-ok/transport with full context; returns
  success=False + "no_items_delivered" instead of success=True with an empty
  message_ids list so callers can distinguish.
- TelegramClient.send_message / _upload_media / _send_from_cache: ERROR on
  non-ok + transport failures with status/code/desc; DEBUG for cache-hit
  fallbacks.
- NotificationDispatcher.dispatch: generates a dispatch_id, binds it, logs
  start/finish with failure count, uses exc_info for target failures.
- commands/handler: missing/failed templates -> ERROR + exc_info; send_reply
  and send_media_group errors upgraded WARNING -> ERROR with chat/error_code
  context; rate-limit and truncation cases logged with full context.
- commands/webhook and services/telegram_poller: bind_log_context(request_id
  =tg:<update_id>, command, chat_id, bot_id), INFO on receive/dispatch/
  completion with duration, exc_info on raise, INFO when commands disabled.
- commands/immich: INFO when album scope is empty; WARN per asset dropped
  from media payload and a summary WARN when "N assets in, 0 out".
2026-04-23 14:41:26 +03:00
alexei.dolgolyov 1f880daa0c chore: release v0.3.2
Release / release (push) Successful in 1m22s
2026-04-23 13:38:28 +03:00
alexei.dolgolyov 1024085cdd fix(scheduler): honor app timezone for cron triggers and log scheduled events
CronTrigger.from_crontab was constructed without a timezone, so a cron like
'0 9 * * *' fired at 09:00 host-local instead of 09:00 in the admin-configured
timezone. Now all tracker/action cron triggers are built with the app tz, and
the setting endpoint rebuilds existing cron jobs when the tz changes (since
CronTrigger freezes its tz at construction time).

The scheduler provider also renders current_date/time/datetime/weekday in the
configured tz and exposes a new 'timezone' template variable.

EventLog entries for scheduled_message now include schedule_type,
cron_expression/interval_seconds, timezone, and fire_count, and the dashboard
shows the event type with a label/icon/color.
2026-04-23 13:35:49 +03:00
alexei.dolgolyov 5604c733d1 chore: release v0.3.1
Release / release (push) Successful in 1m1s
2026-04-22 19:27:45 +03:00
alexei.dolgolyov 3b7808aa9c perf(immich): TTL cache for album bodies and shared-link listings
Bot commands like /random, /latest, /memory refetch the same albums in
quick succession; the GET /api/albums/{id} response can be tens of MB on
large albums, and /api/shared-links has no per-album filter so every
get_shared_links call was already paying for the full server-wide list.

- Module-level 60s TTL cache for album bodies, keyed by
  (server_digest, album_id), 32-entry FIFO cap.  Module-scoped (not
  instance-scoped) because ImmichClient is constructed fresh per request
  in several places, so an instance cache would never survive a second
  caller.  Mirrors the existing _users_cache pattern.
- Module-level 60s TTL cache for the bucketed shared-links map, keyed by
  server_digest.  get_shared_links(album_id) now delegates to a single
  server-wide fetch that serves every album.
- server_digest hashes url+api_key so raw creds don't sit in dict keys.
- get_album(use_cache=False) escape hatch for paths that must observe
  current server state — wired into ImmichActionExecutor.execute (diffs
  the album to decide what to add) and ImmichServiceProvider.poll's
  full-fetch path (stale data would silently delay removal events).
- Async locks guard cache writes with under-lock re-check so concurrent
  misses collapse to one fetch.
2026-04-22 19:27:09 +03:00
alexei.dolgolyov 155d25edf9 chore: release v0.3.0
Release / release (push) Successful in 1m19s
2026-04-22 18:59:15 +03:00
alexei.dolgolyov 69711bbc84 feat(commands): keep chat-action hint alive during slow command fetches
Slow bot commands (/latest, /random, /favorites, /memory, /search,
/find, /person, /place, /summary) spend most of their wall time
fetching assets from the service provider, not uploading to Telegram.
Telegram chat actions expire after ~5s, so the previous one-shot hint
vanished long before media arrived — users saw nothing happening.

- TelegramClient.start_chat_action_keepalive: promoted from private
  helper to public API, posts the action every 4s until cancelled.
- telegram_send.telegram_chat_action: async context manager that
  starts the keep-alive task on enter and cancels + awaits it on
  exit. A None action makes it a no-op so callers don't branch.
- classify_command_chat_action: maps command name to the right
  Telegram action (upload_photo for media-returning commands, typing
  for /summary, None for fast DB-only commands like /status /events).
- webhook.py + telegram_poller.py: wrap handle_command in the context
  manager so the hint persists through the whole fetch+upload window
  in both webhook and long-poll modes.
2026-04-22 18:56:18 +03:00
alexei.dolgolyov fe38d20b96 perf(immich): skip full album fetch on idle ticks; delta-fetch for active ones
Optimizes polling for large Immich albums (tested path targets ~200k
assets). Combined impact on idle albums drops per-tick cost from ~150 MB
fetch to ~few hundred bytes; active albums fetch O(changes) instead of
O(library).

Core changes
- ImmichAlbumMeta + get_album_meta() using ?withoutAssets=true as a
  cheap change-detection probe.
- poll() fast-path: skip full fetch when meta fingerprint matches and
  no pending assets are outstanding.
- poll() delta-path: search/metadata with updatedAfter when fingerprint
  changed, falling back to full fetch on count decrease or mixed
  add+remove that delta can't reconcile.
- asyncio.gather over meta probes so a 20-album tracker pays one
  round-trip of latency instead of 20.
- Event payload cap (50 added / 200 removed) so a bulk import can't
  explode a Jinja template or exceed Telegram's message limits.
- Module-level users cache (1h TTL, sha256-keyed) shared across
  providers on the same Immich server.
- Tick-scoped shared-links cache via new
  get_all_shared_links_by_album() — one /api/shared-links request per
  tick instead of one per changed album.

Server changes
- meta_fingerprint JSON column on NotificationTrackerState + migration.
- watcher skips the asset_ids DB rewrite when the fingerprint didn't
  change, avoiding ~8 MB JSON writes on idle ticks for huge albums.
- Adaptive polling: after 10 empty ticks skip 1-in-2, after 30 skip
  1-in-4, reset on first detected change; resets on schedule changes.
- APScheduler jitter (interval/4, capped at 30s) to smooth thundering-
  herd bursts when many trackers share the same scan_interval.
2026-04-22 18:55:26 +03:00
alexei.dolgolyov d02616069d chore: release v0.2.8
Release / release (push) Successful in 1m58s
2026-04-22 18:02:09 +03:00
alexei.dolgolyov 7dae68fd93 fix(commands): match notification cache-key format so writes share one namespace
common._format_assets was passing cache_key=<bare asset UUID>, but the
notification dispatcher writes keys as <host>:<uuid> (derived from the
URL by extract_asset_id_from_url). Result: the two paths populated
different keys for the same asset, so neither could hit the other's
cached file_id and the WebUI stats only ever reflected the notification
side.

Drop the explicit cache_key — TelegramClient derives <host>:<uuid> from
the URL, identical to the notification path, so one file_id cached by
any dispatch or /random / /latest reply is reused by every later send.
2026-04-22 17:00:07 +03:00
alexei.dolgolyov e6481605ca chore: release v0.2.7
Release / release (push) Successful in 4m23s
2026-04-22 16:47:25 +03:00
alexei.dolgolyov 6de9a1289e fix(telegram): unify send routine across notifications and commands
- Route cache_key values that look like asset UUIDs through asset_cache
  in TelegramClient._get_cache_and_key. Single-asset sends previously
  stored file_ids in url_cache while the media-group path stored them
  in asset_cache, so repeat sends never hit.
- Extract build_asset_media_urls so the notification dispatcher
  (asset_to_media) and the bot command handlers (common._format_assets)
  share one rule for /video/playback vs thumbnail URLs.
- Add services/telegram_send.py as the single factory for constructing
  a TelegramClient. It always wires the shared aiohttp session and both
  file caches, so commands now reuse file_ids populated by notification
  dispatches (and vice versa) instead of re-uploading the same bytes.
- send_reply / send_media_group in commands/handler.py now delegate to
  the factory rather than constructing their own uncached clients.
2026-04-22 16:45:31 +03:00
alexei.dolgolyov 325eabd751 chore: release v0.2.6
Release / release (push) Successful in 1m19s
2026-04-22 16:29:24 +03:00
alexei.dolgolyov fab6169cf9 fix(commands): enrich search assets, surface variables for all command slots
- UI: command-template-configs now resolves slot variables against the
  active provider first (varsRef[provider_type][slot]) before falling back
  to shared entries, so provider-specific slots like /search, /status,
  /repos, /issues, /boards show the Variables button and autocomplete.
- Backend: /search, /find, /person, /place now normalize raw Immich API
  responses through build_asset_dict, extracting city/country from
  exifInfo and mapping isFavorite -> is_favorite so templates render
  location and favorite indicators.
- Telegram: extract build_telegram_asset_entry into a shared helper so
  the notification dispatcher and command media groups agree on video
  typing and /video/playback URLs; videos no longer render as still
  thumbnails in /latest /random /favorites media mode.
- Commands: send_media_group now reuses the same Telegram file_id caches
  as the notification dispatcher, avoiding re-upload churn for repeated
  commands.
2026-04-22 16:28:26 +03:00
alexei.dolgolyov 85311684d9 fix(settings): don't clobber webhook secret with its mask on save
GET /settings returns the Telegram webhook secret masked as "***<last4>".
The frontend binds that masked value into its state, and any Save ships it
back — the PUT handler then persisted the mask as the new secret, silently
invalidating HMAC for every webhook-mode bot. The next GET re-masks the
mask to itself, so the UI showed no corruption.

Treat incoming values that begin with "***" as "unchanged" for the
webhook-secret field. Empty strings still pass through (explicit clear).
2026-04-22 16:10:34 +03:00
alexei.dolgolyov d7daadadc2 chore: release v0.2.5
Release / release (push) Successful in 1m9s
2026-04-22 15:56:20 +03:00