Commit Graph

8 Commits

Author SHA1 Message Date
alexei.dolgolyov 10d30fc956 feat: production readiness — security, perf, bug fixes, bridge self-monitoring
Comprehensive multi-area pass driven by a parallel 8-agent production
review. Frontend, backend, database, security, performance, operational,
plus a new self-monitoring feature.

## Critical fixes
- Planka webhook: reads bounded raw body (was NameError on every call)
- HA quiet hours: ha_state_changed/automation_triggered/service_called/
  event_fired added to deferrable set (were silently dropped)
- DNS-rebinding SSRF: PinnedResolver wired into shared aiohttp session
- Telegram inbound webhook: secret now mandatory (401 without)
- Generic webhook: auth_mode="none" requires explicit
  acknowledge_unauthenticated=true; per-IP rate limit 60/min
- svelte-check: 5 null-narrowing errors in EventDetailModal fixed
- Provider hardcoding: Immich-only block extracted to descriptor
  featureDiscoveryHint
- command_sync: snapshot+expunge bot before exiting AsyncSession

## Bug fixes
- notifier asyncio.gather(return_exceptions=True) — one bad chat no longer
  cancels peer sends
- NotificationDispatcher hoisted out of per-tracker loop
- Provider credential resolution unified across all 5 dispatch sites
- HA asyncio.shield now drains inner task on cancellation
- Provider construction switched from if/elif ladder to factory registry
- NUT first poll seeds silently (no spurious ups_on_battery)
- Quiet-hours gate: event-type-disabled now wins over deferral
- APScheduler drain job ID resolution upgraded to seconds
- HA on_status_change wired through to EventLog
- Webhook payload rollback failures now logged (not swallowed)
- Batched receivers/chats/bots in load_link_data (was per-target N+1)
- flag_modified on JSON column reassignments in deferred_dispatch

## Database
- UNIQUE indexes on service_provider.webhook_token,
  telegram_bot.webhook_path_id, partial UNIQUE on telegram_bot.bot_id,
  telegram_chat(bot_id, chat_id), notification_tracker_target unique link,
  partial UNIQUE on bridge_self provider per user
- Composite ix_event_log_user_event_type_created index
- save_chat_from_webhook switched to ON CONFLICT DO UPDATE
- ondelete=CASCADE on user-id FKs (model annotation; app-side cascade
  delete added for existing data)
- delete_notification_tracker converted from N+1 to bulk DELETE/UPDATE
- Module-level asyncio.Lock replaced with lazy _get_lock() pattern
- VACUUM INTO snapshot now PRAGMA integrity_check verified

## Performance
- Jinja2 template compilation LRU cached (lru_cache maxsize=512)
- Per-locale render cache in NotificationDispatcher (skips re-rendering
  identical content for receivers sharing a locale)
- Tracker list cached per provider_id with 5s TTL + explicit invalidation
  on tracker CRUD (relieves HA chat-bus rate query pressure)
- Nav-counts collapsed from 16 round-trips to single UNION ALL
- HA event_log: skip persisting empty assets_added/removed events

## Security hardening
- Mass-assignment guard on Action create/update; cron sub-minute reject
- Backup JSON depth/node-count cap (depth ≤ 10, nodes ≤ 100k)
- _sanitize_config extended to all JSON-typed fields on backup import
- Telegram _safe_get walks redirects manually with SSRF revalidation
- Bcrypt 72-byte password length cap with clear 422
- Webhook payload body redaction; sensitive substring set extended with
  oauth/client_secret/webhook_secret/csrf in both header filter and
  template extras filter

## Frontend
- 76 catch (err: any) sites converted to errMsg(err) helper
- globalProviderFilter: pure getter; reconciliation moved to one-time
  $effect in +layout
- Provider-filter binding: removed paired $effects + _syncingFilter flag,
  now one-way derived
- entity-cache: separate _refreshing flag for background re-fetches
- api.ts 401 handling: AuthRedirectError class + dedup _redirecting flag,
  goto() instead of window.location.href
- a11y: aria-expanded on mobile More, role=switch + aria-checked on
  Telegram bot toggles

## Tests & operations
- CI pytest gate added to .gitea/workflows/build.yml + release.yml
  (wheel-built install to dodge editable-install slowness)
- /api/ready upgraded to deep healthcheck (db SELECT 1, scheduler.running,
  HA supervisor presence) returning {ready, checks, errors, version}
- /api/metrics endpoint with prometheus_client (deferred_pending,
  event_log_total, dispatch_duration, poll_failures, send_failures)
- New OPERATIONS.md covering deploy, healthchecks, metrics, backup/restore
  procedures, log handling, common scenarios, upgrade flow
- New tests: test_bridge_self (11), test_gitea_parser (9),
  test_planka_parser (6), test_immich_change_detector (6),
  test_backup_roundtrip (1)

## New feature: bridge self-monitoring
- New bridge_self provider type — internal sink for bridge health events
- Three event types: bridge_self_poll_failures (consecutive tracker poll
  failures), bridge_self_deferred_backlog (pending count crosses
  threshold), bridge_self_target_failures (consecutive 5xx/network
  failures per target)
- Per-user thresholds (defaults: 3 / 100 / 5) configurable via the
  provider config form
- Auto-seeded on user create + /setup + boot backfill for existing users
- Anti-spam: counters reset after emission; backlog uses transition latch
- Self-loop guard: bridge_self failures don't count toward target-failure
  thresholds (logged only) — wire to your own Telegram/Email/Matrix to
  get notified when polls/dispatches/sends fail
- 6 default templates (3 events × 2 locales), tracking config columns
  with backfill migration, frontend descriptor (excluded from "create
  provider" wizard since auto-managed)

Operator-visible behavior changes (call out in release notes):
- NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET now REQUIRED for webhook mode
- Existing webhook providers with auth_mode="none" need explicit opt-in
- Generic webhook endpoint rate-limited 60/min per source IP
- HA disconnect/reconnect writes ha_status_* EventLog rows
- Every user gets a bridge_self provider — wire it to a target to
  receive failure alerts

Pre-existing test failures (test_ssrf, test_release_provider) on
Python 3.13 are unrelated; CI runs on 3.12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 02:16:49 +03:00
alexei.dolgolyov 56993d2ca3 fix(security,perf): harden restore, CSRF, token_version + perf pass
Security
- Sign pending_restore.json (SHA256 stored in AppSetting, verified on
  startup apply) + refuse path outside data_dir, tighten to 0600.
- Require same-origin Origin/Referer on POST /api/backup/apply-restart —
  Bearer-in-localStorage is CSRF-reachable from any XSS'd admin tab.
- Bump token_version on role/username change and admin password reset so
  demoted admins lose admin in already-issued JWTs.  Guard last-admin
  TOCTOU via COUNT + post-commit re-check that rolls back a race.
- SSRF guard (validate_outbound_url) in ImmichClient.__init__ and the
  external_domain setter — admin-mutable URLs were bypassing the check
  that webhook/slack/discord paths already used.  Dev restart script now
  sets NOTIFY_BRIDGE_ALLOW_PRIVATE_URLS=1 so homelab Immich still works.
- Redact + cap Immich error bodies to ~120 chars before they flow into
  ActionExecution.error / EventLog.details (both UI-visible).
- Deny-list sensitive keys (api_key / token / secret / password /
  authorization / cookie / ...) in template-context merges so a rogue
  template can't exfiltrate provider creds via {{ api_key }}.
- Cap user-controlled Immich search params (query ≤256, person_ids ≤50,
  size ≤100) so a Telegram listener can't DoS upstream.
- Stream upload reads with running byte counter + content-length precheck
  instead of buffering the full body and then rejecting.
- Log Telegram parse_mode fallbacks instead of swallowing silently;
  template escape bugs now surface in server logs.
- Rollback partial imports on pending-restore failure (error recorded on
  a fresh session).

Performance
- Fix N+1 in _refresh_telegram_chat_titles: single IN query instead of
  session.get per chat.
- Parallelize album + shared-link fetches in test_dispatch (asyncio.gather)
  and per-receiver Telegram test sends in notifier (semaphore 5).
- Early-exit collect_scheduled_assets(limit=0) so the periodic-summary
  test path skips full per-album filter/sample (was O(album_assets)).
- Emit explicit CREATE INDEX IF NOT EXISTS for event_log user_id /
  action_id / provider_id so the first boot after upgrade isn't left
  unindexed for the dashboard query.
- Add AbortController timeout (120s) to fetchAuth so uploads/downloads
  don't hang indefinitely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 02:28:55 +03:00
alexei.dolgolyov a7a2b4efa4 feat: large polish pass — UX fixes, per-chat scope, restore/backup, action events
Backend
- Per-chat album scope for Immich commands (search/latest/memory/...): new
  allowed_album_ids on CommandTrackerListener, threaded listener/page kwargs
  through ProviderCommandHandler.handle; PATCH listener-scope endpoint.
- /search and /find accept a trailing page number; Immich client search_smart
  / search_metadata take a page param.
- Immich person-asset lookup switched from removed GET /api/people/{id}/assets
  to POST /api/search/metadata with personIds (fixes /person command and
  auto_organize rules silently returning zero candidates on Immich 1.106+).
- Auto_organize rule now sets the target album's thumbnail to the first added
  image when missing (falls back to any asset type); failures do not fail the
  rule. add_assets_to_album surfaces the Immich error body on non-2xx.
- EventLog.user_id / action_id / action_name columns with defensive migration
  + backfill. Status query filters by user_id directly; Immich/webhook paths
  emit user_id explicitly. action_runner writes an action_success/partial/
  failed event on each non-dry-run.
- Dashboard DELETE /api/status/events (scoped to user_id) + rendering live
  tracker/provider/action names via FK join with snapshot fallback.
- PATCH /api/users/{id} for username/role change with last-admin guard.
- Deletion protection returns structured {message, entity, blocked_by}
  (ApiError carries .blockedBy; frontend opens BlockedByModal).
- Backup prepare-restore → AppSetting markers + atomic write of
  pending_restore.json; lifespan hook applies on next startup and archives
  under data/applied_restores/. apply-restart sends SIGTERM so the lifespan
  shutdown runs; NOTIFY_BRIDGE_SUPERVISED env override gates the button.
  Manual POST /api/backup/files (same format as scheduled).
- New periodic-summary test path reuses shared collect_scheduled_assets
  (limit=0) so test and future production code go through one primitive.
- Per-receiver locale for Telegram test messages (resolves
  TelegramChat.language_override per chat instead of applying the first
  receiver's locale to everyone).
- Bounded concurrency (semaphores) in NotificationDispatcher._preload_asset_data
  and _refresh_telegram_chat_titles; chat title sweep extended to 24h since
  save_chat_from_webhook covers active chats opportunistically.
- Telegram poller detects the \"webhook is active\" 409 and auto-calls
  deleteWebhook for bots whose DB update_mode is polling (throttled per bot).
- TelegramClient.get_chat added (CLAUDE.md rule 6); set_album_thumbnail added.
- Seeds: rename \"Default Commands\" → \"Default Immich Commands\";
  track_assets_removed default False.

Frontend
- Global provider selector visible when there is only one provider.
- Clear-events button + i18n + ConfirmModal on the dashboard; new icons/
  labels/filters/colors for action_success / action_partial / action_failed.
- Auto-select first available tracking/template/command/config + bot on
  create forms (trackers, command-trackers, targets, template/command
  configs).
- Telegram target disable_url_preview defaults to true.
- BlockedByModal wired into 8 deletion flows; fetchAuth helper for
  multipart/binary calls (reuses api()'s refresh + ApiError mapping).
- Immich tracker 'Checking links' parallelised (concurrency cap 6).
- Backup page: pending-restore banner + Apply-now / Apply-later modal,
  restarting overlay polling /api/health, manual 'Create backup' button.
- Command-trackers listener row gets an 'Edit album scope' modal with
  inherit/explicit multiselect.
- Users page: Edit user modal (username + role).
- parseDate helper for consistent UTC date rendering.

Migrations / schema
- event_log: + user_id, action_id, action_name (+ backfill user_id from
  notification_tracker).
- command_tracker_listener: + allowed_album_ids.
2026-04-22 01:13:11 +03:00
alexei.dolgolyov 90bc3ccdc2 chore: pre-release cleanup
- Skip token clear/redirect on 401 for unauthenticated requests
- Fix typo in test secret key in restart-backend script
- Remove completed plan documents (entity-relationship-refactor, ux-notification-improvements)
2026-04-21 19:39:33 +03:00
alexei.dolgolyov f0739ca949 feat: security hardening — SSRF guard, template sandbox timeout, webhook log prune, auth & backup polish
- Add outbound URL validation (SSRF) for webhook/Discord/Slack/ntfy/Matrix dispatch
- Template renderer: input/output caps and thread-based render timeout
- Webhook log filter: strip Authorization/signature/token-like headers; atomic prune
- Auth/JWT/backup/config tightening; misc frontend UX fixes
2026-04-16 03:21:45 +03:00
alexei.dolgolyov 307871cae5 feat: Google Photos provider backend + API hardening
- Add Google Photos provider: client, models, change detector, capabilities
- Add notification templates (en/ru) for all GP event slots
- Add command templates (en/ru) for GP bot commands
- Register GP in slot/command loaders, capabilities, and seeds
- Harden provider API: validate OAuth credentials on create/update
- Add internal URL rewriting for asset fetches (LAN optimization)
- Fix template renderer to handle missing variables gracefully
- Improve webhook command routing for multi-provider support
- Add provider health check endpoint and watcher improvements
2026-03-25 22:07:03 +03:00
alexei.dolgolyov e0bae394ee feat: comprehensive code review fixes — security, performance, quality
Backend security:
- Reject Gitea webhooks when webhook_secret is empty (was silently skipping)
- Add slowapi rate limiting on login (5/min) and setup (3/min) endpoints
- Add CORS middleware with configurable origins
- Mask telegram_webhook_secret in settings API response
- Protect system-owned command template configs from regular user modification
- Increase minimum password length to 8 characters

Backend performance:
- Batch queries in _resolve_command_context (3 queries instead of 3N)
- Concurrent album fetching with asyncio.gather in immich commands
- Singleton Jinja2 SandboxedEnvironment (reuse instead of per-render creation)
- TTLCache for rate limits (bounded memory, auto-eviction)
- Optional aiohttp session reuse in send_reply/send_media_group

Backend code quality:
- Extract dispatch_helpers.py (shared link_data loading + event filtering)
- Extract database/seeds.py from main.py (490 lines → dedicated module)
- Split immich_handler.py (415 lines) into commands/immich/ subpackage
- Replace bare except blocks with logged warnings
- Add per-provider config validation (Pydantic models)
- Truncate command input to 512 chars
- Expose usage_* and desc_* slots in capabilities and variables API

Frontend security:
- CSS.escape() for user-controlled querySelector in highlight.ts
- Client-side password min 8 chars validation on setup and password change

Frontend code quality:
- Replace any types with proper interfaces across top files
- Decompose targets/+page.svelte into TargetForm + ReceiverSection
- Fix $derived.by usage, $state mutation patterns
- Add console.warn to empty catch blocks

Frontend UX:
- Auth redirect via goto() with "Redirecting..." state
- Platform-aware Ctrl/Cmd K keyboard hint
- Remove stat-card hover transform

Frontend accessibility:
- Modal: role=dialog, aria-modal, focus trap, restore focus
- EntitySelect/IconGridSelect: listbox/option roles, aria-selected/disabled
2026-03-23 01:59:51 +03:00
alexei.dolgolyov 751097b347 feat: comprehensive code review fixes + receivers-only architecture
Security:
- Refuse startup with default secret_key in production (was just logging)
- Settings endpoint now requires admin role
- Password validation on initial setup
- DOM-based HTML sanitizer replaces regex in template previews
- Add *.log to .gitignore

Performance & reliability:
- Token refresh deduplication prevents race condition on concurrent 401s
- Theme media query listener registered once (no leak)
- IconPicker uses $derived instead of function call per render
- Snackbar uses single-batch state update instead of while loop
- Replace 11 inline hover handlers with CSS :hover in layout

Architecture - receivers-only:
- Delivery endpoints (chat_id, email, url, room_id, topic) now stored
  exclusively in TargetReceiver rows, never in target.config
- Migration extracts existing delivery fields to receiver rows
- Notifier and dispatcher remove all config fallbacks
- Frontend targets page shows receivers list per target with
  add/remove/toggle/test per receiver
- Single-receiver test endpoint: POST /targets/{id}/receivers/{id}/test

Code quality:
- Extract AuthLayout.svelte from login/setup (150 lines CSS dedup)
- Split telegram-bots page (754→51 lines + 3 tab components)
- Split notification-trackers page (547→432 lines + 4 components)
- Deduplicate _send_reply into shared handler.send_reply()
- Add locale column to template models, replace name-based detection
- Fix delete_notification_tracker dead protection check
- Fix check_telegram_bot query (filter by type, remove bogus OR)
- Add graceful scheduler shutdown in lifespan
- Consistent /bots?tab=telegram URLs across all nav links

i18n:
- Error page, chat actions, target types, provider types internationalized
- All new receiver UI strings in EN + RU
2026-03-22 02:19:31 +03:00