Commit Graph

8 Commits

Author SHA1 Message Date
alexei.dolgolyov 2d59a5b994 fix: production-readiness hardening from full-codebase review
Apply six isolated, low-risk fixes surfaced by the parallel
production-readiness review (backend, frontend, security, perf,
UI/UX, bugs+features).

Backend
- Mask access_token in provider GET responses and drop it on edit
  when carrying the *** placeholder — fixes plaintext leak of HA
  long-lived tokens (security H-1). Centralized via
  PROVIDER_SECRET_FIELDS so all call sites stay in sync (C-5).
- Hold HA status-change tasks in a module-level set with a
  done_callback — asyncio.create_task only keeps weak refs and
  the task could be GC'd before its row was written (C-1).
- Roll back the request session in the Telegram-webhook catch-all
  so a handler exception cannot leak uncommitted writes into the
  next request (C-2).
- Bail before reading the 1 MiB webhook body when the Gitea
  provider has no secret configured or the request has no
  signature header. For the generic webhook with bearer_token
  auth, verify the Authorization header before the body read.
  Closes the pre-auth resource-exhaustion amplifier (C-3).

Frontend
- Add supportsAutoOrganize capability to ProviderDescriptor and
  consume it from RuleEditor instead of `provider.type !== 'immich'`,
  bringing the last action-rule editor under CLAUDE.md rule 8
  (no provider-type hardcoding in components).
- Snackbar: add role="region" + per-toast role/aria-live/aria-atomic
  so screen readers announce success/error toasts.
- Sidebar nav: add aria-current="page" on the active link so the
  active state has an accessible name.
- New snackbar.region key in en + ru (locale parity preserved).

Out of scope for this commit (tracked in .claude/reviews/README.md
ship-blocker list): secret encryption at rest, JWT cookie move,
Alembic adoption, webhook idempotency, deferred-dispatch crash
window, persisted Telegram update watermark, bridge_self counter
lock — each needs more than a mechanical edit.
2026-05-22 22:47:20 +03:00
alexei.dolgolyov 10d30fc956 feat: production readiness — security, perf, bug fixes, bridge self-monitoring
Comprehensive multi-area pass driven by a parallel 8-agent production
review. Frontend, backend, database, security, performance, operational,
plus a new self-monitoring feature.

## Critical fixes
- Planka webhook: reads bounded raw body (was NameError on every call)
- HA quiet hours: ha_state_changed/automation_triggered/service_called/
  event_fired added to deferrable set (were silently dropped)
- DNS-rebinding SSRF: PinnedResolver wired into shared aiohttp session
- Telegram inbound webhook: secret now mandatory (401 without)
- Generic webhook: auth_mode="none" requires explicit
  acknowledge_unauthenticated=true; per-IP rate limit 60/min
- svelte-check: 5 null-narrowing errors in EventDetailModal fixed
- Provider hardcoding: Immich-only block extracted to descriptor
  featureDiscoveryHint
- command_sync: snapshot+expunge bot before exiting AsyncSession

## Bug fixes
- notifier asyncio.gather(return_exceptions=True) — one bad chat no longer
  cancels peer sends
- NotificationDispatcher hoisted out of per-tracker loop
- Provider credential resolution unified across all 5 dispatch sites
- HA asyncio.shield now drains inner task on cancellation
- Provider construction switched from if/elif ladder to factory registry
- NUT first poll seeds silently (no spurious ups_on_battery)
- Quiet-hours gate: event-type-disabled now wins over deferral
- APScheduler drain job ID resolution upgraded to seconds
- HA on_status_change wired through to EventLog
- Webhook payload rollback failures now logged (not swallowed)
- Batched receivers/chats/bots in load_link_data (was per-target N+1)
- flag_modified on JSON column reassignments in deferred_dispatch

## Database
- UNIQUE indexes on service_provider.webhook_token,
  telegram_bot.webhook_path_id, partial UNIQUE on telegram_bot.bot_id,
  telegram_chat(bot_id, chat_id), notification_tracker_target unique link,
  partial UNIQUE on bridge_self provider per user
- Composite ix_event_log_user_event_type_created index
- save_chat_from_webhook switched to ON CONFLICT DO UPDATE
- ondelete=CASCADE on user-id FKs (model annotation; app-side cascade
  delete added for existing data)
- delete_notification_tracker converted from N+1 to bulk DELETE/UPDATE
- Module-level asyncio.Lock replaced with lazy _get_lock() pattern
- VACUUM INTO snapshot now PRAGMA integrity_check verified

## Performance
- Jinja2 template compilation LRU cached (lru_cache maxsize=512)
- Per-locale render cache in NotificationDispatcher (skips re-rendering
  identical content for receivers sharing a locale)
- Tracker list cached per provider_id with 5s TTL + explicit invalidation
  on tracker CRUD (relieves HA chat-bus rate query pressure)
- Nav-counts collapsed from 16 round-trips to single UNION ALL
- HA event_log: skip persisting empty assets_added/removed events

## Security hardening
- Mass-assignment guard on Action create/update; cron sub-minute reject
- Backup JSON depth/node-count cap (depth ≤ 10, nodes ≤ 100k)
- _sanitize_config extended to all JSON-typed fields on backup import
- Telegram _safe_get walks redirects manually with SSRF revalidation
- Bcrypt 72-byte password length cap with clear 422
- Webhook payload body redaction; sensitive substring set extended with
  oauth/client_secret/webhook_secret/csrf in both header filter and
  template extras filter

## Frontend
- 76 catch (err: any) sites converted to errMsg(err) helper
- globalProviderFilter: pure getter; reconciliation moved to one-time
  $effect in +layout
- Provider-filter binding: removed paired $effects + _syncingFilter flag,
  now one-way derived
- entity-cache: separate _refreshing flag for background re-fetches
- api.ts 401 handling: AuthRedirectError class + dedup _redirecting flag,
  goto() instead of window.location.href
- a11y: aria-expanded on mobile More, role=switch + aria-checked on
  Telegram bot toggles

## Tests & operations
- CI pytest gate added to .gitea/workflows/build.yml + release.yml
  (wheel-built install to dodge editable-install slowness)
- /api/ready upgraded to deep healthcheck (db SELECT 1, scheduler.running,
  HA supervisor presence) returning {ready, checks, errors, version}
- /api/metrics endpoint with prometheus_client (deferred_pending,
  event_log_total, dispatch_duration, poll_failures, send_failures)
- New OPERATIONS.md covering deploy, healthchecks, metrics, backup/restore
  procedures, log handling, common scenarios, upgrade flow
- New tests: test_bridge_self (11), test_gitea_parser (9),
  test_planka_parser (6), test_immich_change_detector (6),
  test_backup_roundtrip (1)

## New feature: bridge self-monitoring
- New bridge_self provider type — internal sink for bridge health events
- Three event types: bridge_self_poll_failures (consecutive tracker poll
  failures), bridge_self_deferred_backlog (pending count crosses
  threshold), bridge_self_target_failures (consecutive 5xx/network
  failures per target)
- Per-user thresholds (defaults: 3 / 100 / 5) configurable via the
  provider config form
- Auto-seeded on user create + /setup + boot backfill for existing users
- Anti-spam: counters reset after emission; backlog uses transition latch
- Self-loop guard: bridge_self failures don't count toward target-failure
  thresholds (logged only) — wire to your own Telegram/Email/Matrix to
  get notified when polls/dispatches/sends fail
- 6 default templates (3 events × 2 locales), tracking config columns
  with backfill migration, frontend descriptor (excluded from "create
  provider" wizard since auto-managed)

Operator-visible behavior changes (call out in release notes):
- NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET now REQUIRED for webhook mode
- Existing webhook providers with auth_mode="none" need explicit opt-in
- Generic webhook endpoint rate-limited 60/min per source IP
- HA disconnect/reconnect writes ha_status_* EventLog rows
- Every user gets a bridge_self provider — wire it to a target to
  receive failure alerts

Pre-existing test failures (test_ssrf, test_release_provider) on
Python 3.13 are unrelated; CI runs on 3.12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 02:16:49 +03:00
alexei.dolgolyov ab621b6abc feat: wire tracking-config display filters + per-tracker adaptive polling
Display filters (Immich tracking config):
- favorites_only drops events with no favorited new assets, or filters
  added_assets to favorites only
- assets_order_by/assets_order sort the rendered list
  (date / name / rating / random / none)
- max_assets_to_show caps rendered+attached media (default 5 -> 10)
- include_tags strips people from event extras and tags from each asset
- include_asset_details strips city/country/state/lat/lon/is_favorite/
  rating/description; load-bearing fields (thumbhash, file_size,
  playback_size, cache keys) preserved
- New apply_tracking_display_filters helper in dispatch_helpers; wired
  into watcher, webhooks, scheduled/periodic/memory, and manual
  test-dispatch
- Targets sharing a TrackingConfig dispatch together; targets with
  different TCs each see their own shaped event

Adaptive polling:
- Replace NotificationTracker.batch_duration with adaptive_max_skip
- Per-tracker opt-in: NULL/0 disables back-off (every tick runs);
  positive N caps the skip factor at (N-1)-in-N after long idle
- Scheduler caches the cap in module state for the tick fast-path
- Migration adds the new column; API schemas/responses, frontend types,
  i18n, and the tracker form updated to match
2026-04-24 21:12:10 +03:00
alexei.dolgolyov b61394f057 feat(immich): per-album scheduled/memory dispatch + template tooling
Dispatch: honor {kind}_collection_mode on TrackingConfig — "per_collection"
fans out one event per album; "combined" pools assets as before. Extract
build_immich_dispatch_events shared by cron and test paths.

Assets: collect_scheduled_assets attaches album_name/album_url/album_public_url
to each asset so combined-mode templates can attribute rows to their source
album. Default scheduled_assets templates render a multi-album header with
inline album list and per-row album link; memory_mode follows the same pattern.

UI: "Reset to default" buttons on notification and command template slots
(per-slot and whole-template), backed by new GET /*-template-configs/defaults
endpoints. tracking-configs "Preview template" now opens an inline preview
modal with locale tabs instead of navigating away; Edit button deep-links
with ?edit_slot=<name> so the destination auto-opens the config and scrolls
to the slot. Reset confirmations use ConfirmModal instead of window.confirm.

Fixes:
* NotificationDispatcher._session_ctx infinite recursion when no shared
  aiohttp.ClientSession was passed — broke test dispatch for periodic/
  scheduled/memory (cron path was unaffected).
* telegram-bots /chats/{id}/test now resolves chat.language_override /
  language_code instead of using the raw ?locale query param, matching
  the resolution the tracker-target test endpoint already used.
* scheduled_assets default template no longer emits a blank line between
  header and the first asset when the multi-album branch is taken.
2026-04-24 19:15:54 +03:00
alexei.dolgolyov 6c3dd67c1b feat(tracking): per-config quiet hours with app-level IANA timezone
Add quiet_hours_enabled/start/end to TrackingConfig (HH:MM strings
interpreted in the app-level timezone AppSetting). The dispatch path
loads the app timezone once per run and passes it through
event_allowed_by_config -> in_quiet_hours, so overnight windows like
22:00-07:00 work correctly in any IANA tz.

Frontend exposes a Timezone field under Settings and a Quiet Hours
section on the Immich tracking-config form with time-picker inputs.
2026-04-22 02:31:48 +03:00
alexei.dolgolyov a7a2b4efa4 feat: large polish pass — UX fixes, per-chat scope, restore/backup, action events
Backend
- Per-chat album scope for Immich commands (search/latest/memory/...): new
  allowed_album_ids on CommandTrackerListener, threaded listener/page kwargs
  through ProviderCommandHandler.handle; PATCH listener-scope endpoint.
- /search and /find accept a trailing page number; Immich client search_smart
  / search_metadata take a page param.
- Immich person-asset lookup switched from removed GET /api/people/{id}/assets
  to POST /api/search/metadata with personIds (fixes /person command and
  auto_organize rules silently returning zero candidates on Immich 1.106+).
- Auto_organize rule now sets the target album's thumbnail to the first added
  image when missing (falls back to any asset type); failures do not fail the
  rule. add_assets_to_album surfaces the Immich error body on non-2xx.
- EventLog.user_id / action_id / action_name columns with defensive migration
  + backfill. Status query filters by user_id directly; Immich/webhook paths
  emit user_id explicitly. action_runner writes an action_success/partial/
  failed event on each non-dry-run.
- Dashboard DELETE /api/status/events (scoped to user_id) + rendering live
  tracker/provider/action names via FK join with snapshot fallback.
- PATCH /api/users/{id} for username/role change with last-admin guard.
- Deletion protection returns structured {message, entity, blocked_by}
  (ApiError carries .blockedBy; frontend opens BlockedByModal).
- Backup prepare-restore → AppSetting markers + atomic write of
  pending_restore.json; lifespan hook applies on next startup and archives
  under data/applied_restores/. apply-restart sends SIGTERM so the lifespan
  shutdown runs; NOTIFY_BRIDGE_SUPERVISED env override gates the button.
  Manual POST /api/backup/files (same format as scheduled).
- New periodic-summary test path reuses shared collect_scheduled_assets
  (limit=0) so test and future production code go through one primitive.
- Per-receiver locale for Telegram test messages (resolves
  TelegramChat.language_override per chat instead of applying the first
  receiver's locale to everyone).
- Bounded concurrency (semaphores) in NotificationDispatcher._preload_asset_data
  and _refresh_telegram_chat_titles; chat title sweep extended to 24h since
  save_chat_from_webhook covers active chats opportunistically.
- Telegram poller detects the \"webhook is active\" 409 and auto-calls
  deleteWebhook for bots whose DB update_mode is polling (throttled per bot).
- TelegramClient.get_chat added (CLAUDE.md rule 6); set_album_thumbnail added.
- Seeds: rename \"Default Commands\" → \"Default Immich Commands\";
  track_assets_removed default False.

Frontend
- Global provider selector visible when there is only one provider.
- Clear-events button + i18n + ConfirmModal on the dashboard; new icons/
  labels/filters/colors for action_success / action_partial / action_failed.
- Auto-select first available tracking/template/command/config + bot on
  create forms (trackers, command-trackers, targets, template/command
  configs).
- Telegram target disable_url_preview defaults to true.
- BlockedByModal wired into 8 deletion flows; fetchAuth helper for
  multipart/binary calls (reuses api()'s refresh + ApiError mapping).
- Immich tracker 'Checking links' parallelised (concurrency cap 6).
- Backup page: pending-restore banner + Apply-now / Apply-later modal,
  restarting overlay polling /api/health, manual 'Create backup' button.
- Command-trackers listener row gets an 'Edit album scope' modal with
  inherit/explicit multiselect.
- Users page: Edit user modal (username + role).
- parseDate helper for consistent UTC date rendering.

Migrations / schema
- event_log: + user_id, action_id, action_name (+ backfill user_id from
  notification_tracker).
- command_tracker_listener: + allowed_album_ids.
2026-04-22 01:13:11 +03:00
alexei.dolgolyov 1a8c95e942 refactor: replace favorites checkbox with toggle switch in grid layout
Move "Favorites only" from a separate checkboxes array into the regular
fields grid as a toggle switch, aligning Scheduled Assets and Memory Mode
sections visually. Memory source moved to last position.
2026-03-24 17:26:02 +03:00
alexei.dolgolyov 8cb836e16c refactor: provider descriptor registry — eliminate provider-specific hardcoding
Replace all if/else chains keyed on provider type strings with a
descriptor-driven architecture. Each provider type (immich, gitea,
planka, scheduler, nut, google_photos) has a descriptor in
frontend/src/lib/providers/ that declares config fields, event
tracking fields, collection metadata, validation, and hooks.

Components now use getDescriptor(type) and render dynamically.
Dashboard provider card shows provider name + type when global
filter is active. Grid-items derived from registry.
2026-03-24 12:40:33 +03:00