Commit Graph

34 Commits

Author SHA1 Message Date
alexei.dolgolyov faaaa39f8a chore: release v0.8.1
Release / test-backend (push) Failing after 1m36s
Release / release (push) Has been skipped
2026-05-16 12:28:07 +03:00
alexei.dolgolyov 10d30fc956 feat: production readiness — security, perf, bug fixes, bridge self-monitoring
Comprehensive multi-area pass driven by a parallel 8-agent production
review. Frontend, backend, database, security, performance, operational,
plus a new self-monitoring feature.

## Critical fixes
- Planka webhook: reads bounded raw body (was NameError on every call)
- HA quiet hours: ha_state_changed/automation_triggered/service_called/
  event_fired added to deferrable set (were silently dropped)
- DNS-rebinding SSRF: PinnedResolver wired into shared aiohttp session
- Telegram inbound webhook: secret now mandatory (401 without)
- Generic webhook: auth_mode="none" requires explicit
  acknowledge_unauthenticated=true; per-IP rate limit 60/min
- svelte-check: 5 null-narrowing errors in EventDetailModal fixed
- Provider hardcoding: Immich-only block extracted to descriptor
  featureDiscoveryHint
- command_sync: snapshot+expunge bot before exiting AsyncSession

## Bug fixes
- notifier asyncio.gather(return_exceptions=True) — one bad chat no longer
  cancels peer sends
- NotificationDispatcher hoisted out of per-tracker loop
- Provider credential resolution unified across all 5 dispatch sites
- HA asyncio.shield now drains inner task on cancellation
- Provider construction switched from if/elif ladder to factory registry
- NUT first poll seeds silently (no spurious ups_on_battery)
- Quiet-hours gate: event-type-disabled now wins over deferral
- APScheduler drain job ID resolution upgraded to seconds
- HA on_status_change wired through to EventLog
- Webhook payload rollback failures now logged (not swallowed)
- Batched receivers/chats/bots in load_link_data (was per-target N+1)
- flag_modified on JSON column reassignments in deferred_dispatch

## Database
- UNIQUE indexes on service_provider.webhook_token,
  telegram_bot.webhook_path_id, partial UNIQUE on telegram_bot.bot_id,
  telegram_chat(bot_id, chat_id), notification_tracker_target unique link,
  partial UNIQUE on bridge_self provider per user
- Composite ix_event_log_user_event_type_created index
- save_chat_from_webhook switched to ON CONFLICT DO UPDATE
- ondelete=CASCADE on user-id FKs (model annotation; app-side cascade
  delete added for existing data)
- delete_notification_tracker converted from N+1 to bulk DELETE/UPDATE
- Module-level asyncio.Lock replaced with lazy _get_lock() pattern
- VACUUM INTO snapshot now PRAGMA integrity_check verified

## Performance
- Jinja2 template compilation LRU cached (lru_cache maxsize=512)
- Per-locale render cache in NotificationDispatcher (skips re-rendering
  identical content for receivers sharing a locale)
- Tracker list cached per provider_id with 5s TTL + explicit invalidation
  on tracker CRUD (relieves HA chat-bus rate query pressure)
- Nav-counts collapsed from 16 round-trips to single UNION ALL
- HA event_log: skip persisting empty assets_added/removed events

## Security hardening
- Mass-assignment guard on Action create/update; cron sub-minute reject
- Backup JSON depth/node-count cap (depth ≤ 10, nodes ≤ 100k)
- _sanitize_config extended to all JSON-typed fields on backup import
- Telegram _safe_get walks redirects manually with SSRF revalidation
- Bcrypt 72-byte password length cap with clear 422
- Webhook payload body redaction; sensitive substring set extended with
  oauth/client_secret/webhook_secret/csrf in both header filter and
  template extras filter

## Frontend
- 76 catch (err: any) sites converted to errMsg(err) helper
- globalProviderFilter: pure getter; reconciliation moved to one-time
  $effect in +layout
- Provider-filter binding: removed paired $effects + _syncingFilter flag,
  now one-way derived
- entity-cache: separate _refreshing flag for background re-fetches
- api.ts 401 handling: AuthRedirectError class + dedup _redirecting flag,
  goto() instead of window.location.href
- a11y: aria-expanded on mobile More, role=switch + aria-checked on
  Telegram bot toggles

## Tests & operations
- CI pytest gate added to .gitea/workflows/build.yml + release.yml
  (wheel-built install to dodge editable-install slowness)
- /api/ready upgraded to deep healthcheck (db SELECT 1, scheduler.running,
  HA supervisor presence) returning {ready, checks, errors, version}
- /api/metrics endpoint with prometheus_client (deferred_pending,
  event_log_total, dispatch_duration, poll_failures, send_failures)
- New OPERATIONS.md covering deploy, healthchecks, metrics, backup/restore
  procedures, log handling, common scenarios, upgrade flow
- New tests: test_bridge_self (11), test_gitea_parser (9),
  test_planka_parser (6), test_immich_change_detector (6),
  test_backup_roundtrip (1)

## New feature: bridge self-monitoring
- New bridge_self provider type — internal sink for bridge health events
- Three event types: bridge_self_poll_failures (consecutive tracker poll
  failures), bridge_self_deferred_backlog (pending count crosses
  threshold), bridge_self_target_failures (consecutive 5xx/network
  failures per target)
- Per-user thresholds (defaults: 3 / 100 / 5) configurable via the
  provider config form
- Auto-seeded on user create + /setup + boot backfill for existing users
- Anti-spam: counters reset after emission; backlog uses transition latch
- Self-loop guard: bridge_self failures don't count toward target-failure
  thresholds (logged only) — wire to your own Telegram/Email/Matrix to
  get notified when polls/dispatches/sends fail
- 6 default templates (3 events × 2 locales), tracking config columns
  with backfill migration, frontend descriptor (excluded from "create
  provider" wizard since auto-managed)

Operator-visible behavior changes (call out in release notes):
- NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET now REQUIRED for webhook mode
- Existing webhook providers with auth_mode="none" need explicit opt-in
- Generic webhook endpoint rate-limited 60/min per source IP
- HA disconnect/reconnect writes ha_status_* EventLog rows
- Every user gets a bridge_self provider — wire it to a target to
  receive failure alerts

Pre-existing test failures (test_ssrf, test_release_provider) on
Python 3.13 are unrelated; CI runs on 3.12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 02:16:49 +03:00
alexei.dolgolyov dfd7329177 chore: release v0.8.0
Release / release (push) Successful in 1m55s
2026-05-12 03:01:06 +03:00
alexei.dolgolyov 5d41a39406 chore: release v0.7.2
Release / release (push) Successful in 1m10s
2026-05-11 00:39:06 +03:00
alexei.dolgolyov 757271dadf chore: release v0.7.1
Release / release (push) Successful in 2m11s
2026-05-07 23:33:09 +03:00
alexei.dolgolyov 632e4c1aa3 chore: release v0.7.0
Release / release (push) Successful in 1m19s
2026-05-07 14:03:48 +03:00
alexei.dolgolyov 349e9136a4 chore: release v0.6.5
Release / release (push) Successful in 1m36s
2026-04-28 19:10:49 +03:00
alexei.dolgolyov aa9548d884 chore: release v0.6.4
Release / release (push) Successful in 1m41s
2026-04-27 18:27:09 +03:00
alexei.dolgolyov 0e675c4b38 chore: release v0.6.3
Release / release (push) Successful in 1m15s
2026-04-27 15:42:04 +03:00
alexei.dolgolyov c43dc598a1 chore: release v0.6.2
Release / release (push) Successful in 1m39s
2026-04-27 14:29:44 +03:00
alexei.dolgolyov b320090a56 chore: release v0.6.1
Release / release (push) Successful in 3m17s
2026-04-25 15:25:23 +03:00
alexei.dolgolyov 9eb478fdc9 chore: release v0.6.0
Release / release (push) Successful in 1m25s
2026-04-25 14:54:16 +03:00
alexei.dolgolyov 770c198ac3 chore: release v0.5.2
Release / release (push) Successful in 1m48s
2026-04-24 21:58:40 +03:00
alexei.dolgolyov 187b889c45 chore: release v0.5.1
Release / release (push) Successful in 1m49s
2026-04-24 19:21:39 +03:00
alexei.dolgolyov 461fb495d7 chore: release v0.5.0
Release / release (push) Successful in 1m10s
2026-04-24 14:16:34 +03:00
alexei.dolgolyov e44d387c7f chore: release v0.4.0
Build and Test / test-backend (push) Failing after 1m6s
Release / test (push) Failing after 1m7s
Release / release (push) Has been skipped
Build and Test / test-frontend (push) Successful in 9m54s
Build and Test / build-image (push) Has been skipped
2026-04-23 20:18:34 +03:00
alexei.dolgolyov 920920bc67 feat: production-readiness hardening across security, async, DB, ops
Build and Test / test-frontend (push) Successful in 9m37s
Build and Test / test-backend (push) Successful in 10m53s
Build and Test / build-image (push) Failing after 14m52s
Security
- SSRF: async DNS resolver; allow_redirects=False on all outbound clients;
  matrix homeserver_url validated on create/update/test; update_provider
  and email_bot merge incoming config and reject ***-masked secrets.
- Auth: bcrypt offloaded to asyncio.to_thread; JWT now carries iss/aud +
  leeway and rejects missing claims; setup TOCTOU closed inside a
  transaction; rate limits extended (default 600/min, 10/min on password
  change, 30/min on needs-setup); constant-time login to prevent username
  enumeration.
- Config: rejects known dev secret keys; validates CORS origin schemes,
  port range, token lifetimes.
- Webhook handlers stream-read body with a 1 MiB cap; Discord 429 retries
  bounded (3 attempts, Retry-After capped at 60 s).
- CSP + HSTS added to SecurityHeadersMiddleware.

Async / runtime
- SQLite engine: WAL, synchronous=NORMAL, foreign_keys=ON, busy_timeout,
  pool_pre_ping, dispose on shutdown.
- Lifespan shutdown now stops scheduler before closing HTTP session and
  disposing the engine.
- Shared aiohttp session locked against concurrent first-caller races;
  core NotificationDispatcher accepts and reuses it.
- Storage and scheduled backup writes wrapped in asyncio.to_thread.
- NUT client writes bounded by asyncio.wait_for.
- Telegram poller switched from 3 s short-poll to 30 s interval + 25 s
  long-poll (~10x fewer API calls).

Database
- New performance-indexes migration covers every FK/owner column and
  hot-path composite (notification_tracker(provider_id, enabled);
  event_log(user_id, created_at DESC); webhook_payload_log(provider_id,
  created_at DESC); action_execution(action_id, started_at DESC)).
- New schema_version table for future upgrade gating.
- __system__ placeholder user (id=0) seeded so user_id=0 system defaults
  satisfy the newly enforced FK; filtered out of /auth/needs-setup,
  /api/users, and setup.
- list_notification_trackers rewritten to batched loads (was 1+N+N*M).
- Retention job extended to event_log, webhook_payload_log, and
  action_execution; retention days exposed as a setting.

Scheduler
- AsyncIOScheduler job_defaults: coalesce, misfire_grace_time=300,
  max_instances=1.

Ops
- uvicorn runs with proxy_headers, forwarded_allow_ips,
  timeout_graceful_shutdown; access log suppressed in non-debug.
- FastAPI version string now reads from importlib.metadata.
- New /api/ready endpoint separate from /api/health.
- docker-compose drops the ALLOW_PRIVATE_URLS=1 default, adds mem/cpu/pid
  limits, read_only + tmpfs, cap_drop:ALL, no-new-privileges; healthcheck
  targets /api/ready.
- CI now runs on push/PR with backend pytest, frontend svelte-check +
  build, and a non-push image build; release workflow gated on tests,
  publishes immutable sha-<commit> image tag, adds Trivy scan.

Tests
- New packages/server/tests/ with 29 passing tests: config validation,
  JWT round-trip + aud/alg=none rejection, SSRF scheme and private-range
  enforcement (sync + async), Discord bounded retry, and a lifespan-level
  /api/health + /api/ready smoke check.
- Renamed the misnamed services/test_dispatch.py to manual_dispatch.py so
  pytest never auto-collects production code.

Frontend
- /login now redirects already-authenticated users to /, shows a distinct
  'backend unreachable' banner (en/ru) when /auth/needs-setup fails.
2026-04-23 19:44:56 +03:00
alexei.dolgolyov 1f880daa0c chore: release v0.3.2
Release / release (push) Successful in 1m22s
2026-04-23 13:38:28 +03:00
alexei.dolgolyov 5604c733d1 chore: release v0.3.1
Release / release (push) Successful in 1m1s
2026-04-22 19:27:45 +03:00
alexei.dolgolyov 155d25edf9 chore: release v0.3.0
Release / release (push) Successful in 1m19s
2026-04-22 18:59:15 +03:00
alexei.dolgolyov d02616069d chore: release v0.2.8
Release / release (push) Successful in 1m58s
2026-04-22 18:02:09 +03:00
alexei.dolgolyov e6481605ca chore: release v0.2.7
Release / release (push) Successful in 4m23s
2026-04-22 16:47:25 +03:00
alexei.dolgolyov 325eabd751 chore: release v0.2.6
Release / release (push) Successful in 1m19s
2026-04-22 16:29:24 +03:00
alexei.dolgolyov d7daadadc2 chore: release v0.2.5
Release / release (push) Successful in 1m9s
2026-04-22 15:56:20 +03:00
alexei.dolgolyov 93df538819 chore: release v0.2.4
Release / release (push) Successful in 1m18s
2026-04-22 15:36:08 +03:00
alexei.dolgolyov 5028f15f4f chore: release v0.2.3
Release / release (push) Successful in 1m17s
2026-04-22 03:30:45 +03:00
alexei.dolgolyov 83215473c7 chore: release v0.2.2
Release / release (push) Successful in 1m0s
2026-04-22 02:51:10 +03:00
alexei.dolgolyov 645331d320 chore: release v0.2.1
Release / release (push) Successful in 1m27s
2026-04-22 02:35:38 +03:00
alexei.dolgolyov fe92b206b7 chore: release v0.2.0
Release / release (push) Successful in 1m20s
2026-04-22 01:35:24 +03:00
alexei.dolgolyov 6b2211353d feat: person excludes for auto-organize rules, backup & restore system
Add person exclude criteria to Immich auto-organize — assets containing
excluded persons are filtered out after candidate gathering. Also adds
full backup/restore system with export, import, scheduled backups, and
retention management.
2026-04-02 14:13:42 +03:00
alexei.dolgolyov e0bae394ee feat: comprehensive code review fixes — security, performance, quality
Backend security:
- Reject Gitea webhooks when webhook_secret is empty (was silently skipping)
- Add slowapi rate limiting on login (5/min) and setup (3/min) endpoints
- Add CORS middleware with configurable origins
- Mask telegram_webhook_secret in settings API response
- Protect system-owned command template configs from regular user modification
- Increase minimum password length to 8 characters

Backend performance:
- Batch queries in _resolve_command_context (3 queries instead of 3N)
- Concurrent album fetching with asyncio.gather in immich commands
- Singleton Jinja2 SandboxedEnvironment (reuse instead of per-render creation)
- TTLCache for rate limits (bounded memory, auto-eviction)
- Optional aiohttp session reuse in send_reply/send_media_group

Backend code quality:
- Extract dispatch_helpers.py (shared link_data loading + event filtering)
- Extract database/seeds.py from main.py (490 lines → dedicated module)
- Split immich_handler.py (415 lines) into commands/immich/ subpackage
- Replace bare except blocks with logged warnings
- Add per-provider config validation (Pydantic models)
- Truncate command input to 512 chars
- Expose usage_* and desc_* slots in capabilities and variables API

Frontend security:
- CSS.escape() for user-controlled querySelector in highlight.ts
- Client-side password min 8 chars validation on setup and password change

Frontend code quality:
- Replace any types with proper interfaces across top files
- Decompose targets/+page.svelte into TargetForm + ReceiverSection
- Fix $derived.by usage, $state mutation patterns
- Add console.warn to empty catch blocks

Frontend UX:
- Auth redirect via goto() with "Redirecting..." state
- Platform-aware Ctrl/Cmd K keyboard hint
- Remove stat-card hover transform

Frontend accessibility:
- Modal: role=dialog, aria-modal, focus trap, restore focus
- EntitySelect/IconGridSelect: listbox/option roles, aria-selected/disabled
2026-03-23 01:59:51 +03:00
alexei.dolgolyov 03ec9b3c86 feat: telegram commands, app settings, bot polling, webhook handling, UI improvements
Adds telegram bot command system with 13 commands (search, latest, random, etc.),
webhook/polling handlers, rate limiting, app settings page, and various UI/UX
improvements across all entity pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 23:11:42 +03:00
alexei.dolgolyov 7f99c895a4 feat(notify-bridge): phase 6 - database models and server API
New database schema with ServiceProvider abstraction:
- ServiceProvider (replaces ImmichServer): type + JSON config
- Tracker (replaces AlbumTracker): owns tracking_config_id
- TrackingConfig: provider_type scoped, owned by Tracker
- TemplateConfig: provider_type scoped, owned by Target
- NotificationTarget: owns template_config_id (not tracking_config_id)
- TrackerState, EventLog, User, TelegramBot, TelegramChat

Full FastAPI server:
- /api/providers: CRUD + test connection + list collections
- /api/trackers: CRUD
- /api/tracking-configs: CRUD with provider_type filter
- /api/template-configs: CRUD with provider_type filter, system defaults
- /api/targets: CRUD
- /api/template-vars: variable docs filtered by provider type
- /api/auth: setup, login, refresh, me, password change
- /api/health: health check
- Default template seeding on first startup (EN/RU for Immich)
- pydantic-settings with NOTIFY_BRIDGE_ env prefix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 23:39:23 +03:00
alexei.dolgolyov b724447f4d feat(notify-bridge): phase 1 - project scaffolding
Set up the Notify Bridge project structure:
- packages/core (notify_bridge_core) with provider, model, notification, template packages
- packages/server (notify_bridge_server) with FastAPI skeleton and health endpoint
- frontend with SvelteKit 2, Svelte 5, Tailwind CSS v4, static adapter
- Root configs: .gitignore, README.md, CLAUDE.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 22:30:06 +03:00