notify-bridge

Author	SHA1	Message	Date
alexei.dolgolyov	d7c48b06ee	ci: isolate test backend install in venv [CI: Release / test-backend (push): Successful in 2m1s; Release / release (push): Successful in 2m3s] The persistent Gitea runner caches the setup-python toolcache between runs. A previous run that produced wheels with broken metadata (no Version field in METADATA) left a notify-bridge-server install with no RECORD file in site-packages. The next run hits: Found existing installation: notify-bridge-server None; error: uninstall-no-record-file. pip refuses to uninstall (no RECORD) and refuses to overlay (it tries to uninstall first). Switching from a system-pip install into the toolcache to an isolated /tmp/venv per run sidesteps the leak: each CI run starts with empty site-packages. Same change to build.yml and release.yml so the pre-merge gate and the release-gate both run the same setup.	2026-05-16 18:33:41 +03:00
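The per-run isolation described in d7c48b06ee can be sketched in Python with the standard `venv` module. This is a minimal illustration, not the workflow's actual step: the function name `create_isolated_venv` and the directory layout are assumptions made for the example.

```python
import os
import sys
import venv
from pathlib import Path


def create_isolated_venv(base_dir: str, with_pip: bool = True) -> Path:
    """Create a fresh virtual environment under base_dir and return its interpreter path.

    Hypothetical helper: the name and layout are illustrative only.
    """
    env_dir = Path(base_dir) / "venv"
    # clear=True wipes any leftover directory first, so a stale or RECORD-less
    # dist-info from an earlier run cannot leak into this environment.
    builder = venv.EnvBuilder(
        with_pip=with_pip,
        clear=True,
        symlinks=(os.name != "nt"),  # same default the `python -m venv` CLI uses
    )
    builder.create(env_dir)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

Later install steps would then call the returned interpreter (for example `[str(python), "-m", "pip", "install", ...]` via `subprocess.run`) instead of the runner's cached system Python.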
alexei.dolgolyov	10d30fc956	feat: production readiness — security, perf, bug fixes, bridge self-monitoring Comprehensive multi-area pass driven by a parallel 8-agent production review. Frontend, backend, database, security, performance, operational, plus a new self-monitoring feature. ## Critical fixes - Planka webhook: reads bounded raw body (was NameError on every call) - HA quiet hours: ha_state_changed/automation_triggered/service_called/ event_fired added to deferrable set (were silently dropped) - DNS-rebinding SSRF: PinnedResolver wired into shared aiohttp session - Telegram inbound webhook: secret now mandatory (401 without) - Generic webhook: auth_mode="none" requires explicit acknowledge_unauthenticated=true; per-IP rate limit 60/min - svelte-check: 5 null-narrowing errors in EventDetailModal fixed - Provider hardcoding: Immich-only block extracted to descriptor featureDiscoveryHint - command_sync: snapshot+expunge bot before exiting AsyncSession ## Bug fixes - notifier asyncio.gather(return_exceptions=True) — one bad chat no longer cancels peer sends - NotificationDispatcher hoisted out of per-tracker loop - Provider credential resolution unified across all 5 dispatch sites - HA asyncio.shield now drains inner task on cancellation - Provider construction switched from if/elif ladder to factory registry - NUT first poll seeds silently (no spurious ups_on_battery) - Quiet-hours gate: event-type-disabled now wins over deferral - APScheduler drain job ID resolution upgraded to seconds - HA on_status_change wired through to EventLog - Webhook payload rollback failures now logged (not swallowed) - Batched receivers/chats/bots in load_link_data (was per-target N+1) - flag_modified on JSON column reassignments in deferred_dispatch ## Database - UNIQUE indexes on service_provider.webhook_token, telegram_bot.webhook_path_id, partial UNIQUE on telegram_bot.bot_id, telegram_chat(bot_id, chat_id), notification_tracker_target unique link, partial UNIQUE on bridge_self 
provider per user - Composite ix_event_log_user_event_type_created index - save_chat_from_webhook switched to ON CONFLICT DO UPDATE - ondelete=CASCADE on user-id FKs (model annotation; app-side cascade delete added for existing data) - delete_notification_tracker converted from N+1 to bulk DELETE/UPDATE - Module-level asyncio.Lock replaced with lazy _get_lock() pattern - VACUUM INTO snapshot now PRAGMA integrity_check verified ## Performance - Jinja2 template compilation LRU cached (lru_cache maxsize=512) - Per-locale render cache in NotificationDispatcher (skips re-rendering identical content for receivers sharing a locale) - Tracker list cached per provider_id with 5s TTL + explicit invalidation on tracker CRUD (relieves HA chat-bus rate query pressure) - Nav-counts collapsed from 16 round-trips to single UNION ALL - HA event_log: skip persisting empty assets_added/removed events ## Security hardening - Mass-assignment guard on Action create/update; cron sub-minute reject - Backup JSON depth/node-count cap (depth ≤ 10, nodes ≤ 100k) - _sanitize_config extended to all JSON-typed fields on backup import - Telegram _safe_get walks redirects manually with SSRF revalidation - Bcrypt 72-byte password length cap with clear 422 - Webhook payload body redaction; sensitive substring set extended with oauth/client_secret/webhook_secret/csrf in both header filter and template extras filter ## Frontend - 76 catch (err: any) sites converted to errMsg(err) helper - globalProviderFilter: pure getter; reconciliation moved to one-time $effect in +layout - Provider-filter binding: removed paired $effects + _syncingFilter flag, now one-way derived - entity-cache: separate _refreshing flag for background re-fetches - api.ts 401 handling: AuthRedirectError class + dedup _redirecting flag, goto() instead of window.location.href - a11y: aria-expanded on mobile More, role=switch + aria-checked on Telegram bot toggles ## Tests & operations - CI pytest gate added to 
.gitea/workflows/build.yml + release.yml (wheel-built install to dodge editable-install slowness) - /api/ready upgraded to deep healthcheck (db SELECT 1, scheduler.running, HA supervisor presence) returning {ready, checks, errors, version} - /api/metrics endpoint with prometheus_client (deferred_pending, event_log_total, dispatch_duration, poll_failures, send_failures) - New OPERATIONS.md covering deploy, healthchecks, metrics, backup/restore procedures, log handling, common scenarios, upgrade flow - New tests: test_bridge_self (11), test_gitea_parser (9), test_planka_parser (6), test_immich_change_detector (6), test_backup_roundtrip (1) ## New feature: bridge self-monitoring - New bridge_self provider type — internal sink for bridge health events - Three event types: bridge_self_poll_failures (consecutive tracker poll failures), bridge_self_deferred_backlog (pending count crosses threshold), bridge_self_target_failures (consecutive 5xx/network failures per target) - Per-user thresholds (defaults: 3 / 100 / 5) configurable via the provider config form - Auto-seeded on user create + /setup + boot backfill for existing users - Anti-spam: counters reset after emission; backlog uses transition latch - Self-loop guard: bridge_self failures don't count toward target-failure thresholds (logged only) — wire to your own Telegram/Email/Matrix to get notified when polls/dispatches/sends fail - 6 default templates (3 events × 2 locales), tracking config columns with backfill migration, frontend descriptor (excluded from "create provider" wizard since auto-managed) Operator-visible behavior changes (call out in release notes): - NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET now REQUIRED for webhook mode - Existing webhook providers with auth_mode="none" need explicit opt-in - Generic webhook endpoint rate-limited 60/min per source IP - HA disconnect/reconnect writes ha_status_* EventLog rows - Every user gets a bridge_self provider — wire it to a target to receive failure alerts 
Pre-existing test failures (test_ssrf, test_release_provider) on Python 3.13 are unrelated; CI runs on 3.12. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 02:16:49 +03:00
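The backup JSON cap from 10d30fc956 (depth ≤ 10, nodes ≤ 100k) could look roughly like the sketch below. Only the two limits come from the commit message. The function name `check_json_limits`, counting every value as one node, and counting depth as container nesting with the root container at depth 1 are all assumptions.

```python
from typing import Any

MAX_DEPTH = 10        # limit taken from the commit message
MAX_NODES = 100_000   # limit taken from the commit message


def check_json_limits(value: Any, max_depth: int = MAX_DEPTH, max_nodes: int = MAX_NODES) -> int:
    """Return the node count of a parsed JSON value, or raise ValueError past a cap.

    Hypothetical helper: counting rules are assumptions, not the project's code.
    """
    nodes = 0
    stack = [(value, 1)]  # explicit stack: the walk itself never recurses
    while stack:
        current, depth = stack.pop()
        nodes += 1
        if nodes > max_nodes:
            raise ValueError(f"backup JSON exceeds {max_nodes} nodes")
        if isinstance(current, (dict, list)):
            if depth > max_depth:
                raise ValueError(f"backup JSON nested deeper than {max_depth} levels")
            children = current.values() if isinstance(current, dict) else current
            stack.extend((child, depth + 1) for child in children)
    return nodes
```

Running the check on the parsed document before any import logic touches it keeps an oversized backup from reaching the database layer.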
alexei.dolgolyov	19036a90bb	ci: drop trivy scan from release (never failed, output discarded) [CI: Release / release (push): Successful in 31s]	2026-04-23 20:55:30 +03:00
alexei.dolgolyov	f9040370bc	ci: drop backend pytest stage (too slow on hosted runner) [CI: Build and Test / build-image (push): Has been cancelled; Build and Test / test-frontend (push): Has been cancelled; Release / release (push): Has been cancelled] The editable install of core+server+dev pulls the full backend dependency stack (SQLAlchemy, aiohttp, pytest, httpx, slowapi, uvicorn[standard], apscheduler and their transitives) on every CI run. Even with pip cache the restore + install takes minutes per job, which is not worth it for a suite that still runs locally via ``pytest packages/server/tests``. Kept the frontend svelte-check + build and the non-push Docker image build. Release workflow no longer has a test gate either (same reason). Bring the test stage back once we have a prebuilt CI image with deps.	2026-04-23 20:40:53 +03:00
alexei.dolgolyov	3b683ce82c	ci: cache pip downloads and collapse install into one pip call [CI: Build and Test / build-image (push): Has been cancelled; Build and Test / test-backend (push): Has been cancelled; Build and Test / test-frontend (push): Has been cancelled] Two wins: * actions/setup-python's built-in pip cache, keyed on the two pyproject.toml files, turns the 20+ transitive dep downloads into a single tarball restore on cache hit. * One ``pip install -e ./core -e ./server[dev]`` call instead of two, which lets pip's resolver run once over the combined graph and skips the second invocation's overhead. Also dropped ``pip install --upgrade pip``: the runner image already ships a recent pip, and the upgrade ran once per CI job for no gain.	2026-04-23 20:40:17 +03:00
alexei.dolgolyov	2bec25353b	ci: install editable packages inside a venv [CI: Build and Test / test-frontend (push): Successful in 9m46s; Release / release (push): Has been cancelled; Release / test (push): Has been cancelled; Build and Test / build-image (push): Has been cancelled; Build and Test / test-backend (push): Has been cancelled] The hosted Gitea runner image pre-installs older versions of both packages in its system Python site-packages and retains stale ~otify_bridge_core / ~otify_bridge_server dist-info directories from prior interrupted runs. ``pip install -e`` against the system interpreter tries to uninstall those, the rollback fires mid-transaction, and the runner's ``/opt/hostedtoolcache/.../bin/notify-bridge`` console script disappears before the new install can be placed: ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/opt/hostedtoolcache/Python/3.12.12/x64/bin/notify-bridge' Installing into a fresh venv sidesteps the pre-cached state entirely (and is the recommendation pip itself prints on every run).	2026-04-23 20:23:42 +03:00
alexei.dolgolyov	920920bc67	feat: production-readiness hardening across security, async, DB, ops [CI: Build and Test / test-frontend (push): Successful in 9m37s; Build and Test / test-backend (push): Successful in 10m53s; Build and Test / build-image (push): Failing after 14m52s] Security - SSRF: async DNS resolver; allow_redirects=False on all outbound clients; matrix homeserver_url validated on create/update/test; update_provider and email_bot merge incoming config and reject **-masked secrets. - Auth: bcrypt offloaded to asyncio.to_thread; JWT now carries iss/aud + leeway and rejects missing claims; setup TOCTOU closed inside a transaction; rate limits extended (default 600/min, 10/min on password change, 30/min on needs-setup); constant-time login to prevent username enumeration. - Config: rejects known dev secret keys; validates CORS origin schemes, port range, token lifetimes. - Webhook handlers stream-read body with a 1 MiB cap; Discord 429 retries bounded (3 attempts, Retry-After capped at 60 s). - CSP + HSTS added to SecurityHeadersMiddleware. Async / runtime - SQLite engine: WAL, synchronous=NORMAL, foreign_keys=ON, busy_timeout, pool_pre_ping, dispose on shutdown. - Lifespan shutdown now stops scheduler before closing HTTP session and disposing the engine. - Shared aiohttp session locked against concurrent first-caller races; core NotificationDispatcher accepts and reuses it. - Storage and scheduled backup writes wrapped in asyncio.to_thread. - NUT client writes bounded by asyncio.wait_for. - Telegram poller switched from 3 s short-poll to 30 s interval + 25 s long-poll (~10x fewer API calls). Database - New performance-indexes migration covers every FK/owner column and hot-path composite (notification_tracker(provider_id, enabled); event_log(user_id, created_at DESC); webhook_payload_log(provider_id, created_at DESC); action_execution(action_id, started_at DESC)). - New schema_version table for future upgrade gating. 
- __system__ placeholder user (id=0) seeded so user_id=0 system defaults satisfy the newly enforced FK; filtered out of /auth/needs-setup, /api/users, and setup. - list_notification_trackers rewritten to batched loads (was 1+N+NM). - Retention job extended to event_log, webhook_payload_log, and action_execution; retention days exposed as a setting. Scheduler - AsyncIOScheduler job_defaults: coalesce, misfire_grace_time=300, max_instances=1. Ops - uvicorn runs with proxy_headers, forwarded_allow_ips, timeout_graceful_shutdown; access log suppressed in non-debug. - FastAPI version string now reads from importlib.metadata. - New /api/ready endpoint separate from /api/health. - docker-compose drops the ALLOW_PRIVATE_URLS=1 default, adds mem/cpu/pid limits, read_only + tmpfs, cap_drop:ALL, no-new-privileges; healthcheck targets /api/ready. - CI now runs on push/PR with backend pytest, frontend svelte-check + build, and a non-push image build; release workflow gated on tests, publishes immutable sha-<commit> image tag, adds Trivy scan. Tests - New packages/server/tests/ with 29 passing tests: config validation, JWT round-trip + aud/alg=none rejection, SSRF scheme and private-range enforcement (sync + async), Discord bounded retry, and a lifespan-level /api/health + /api/ready smoke check. - Renamed the misnamed services/test_dispatch.py to manual_dispatch.py so pytest never auto-collects production code. Frontend - /login now redirects already-authenticated users to /, shows a distinct 'backend unreachable' banner (en/ru) when /auth/needs-setup fails.	2026-04-23 19:44:56 +03:00
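The bounded Discord 429 handling listed in 920920bc67 (3 attempts, Retry-After capped at 60 s) might be structured like this sketch. The injected `send` and `sleep` callables, the `(status, retry_after)` tuple shape, and the 1-second fallback when no Retry-After value is present are assumptions for illustration. The real code presumably talks to aiohttp directly.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

MAX_ATTEMPTS = 3          # from the commit message
MAX_RETRY_AFTER = 60.0    # seconds, from the commit message
DEFAULT_RETRY_AFTER = 1.0 # assumption: used when the response carries no Retry-After

# Hypothetical response shape: (HTTP status, Retry-After in seconds or None)
Response = Tuple[int, Optional[float]]


async def send_with_bounded_retry(
    send: Callable[[], Awaitable[Response]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> int:
    """Call send() up to MAX_ATTEMPTS times, backing off only on HTTP 429."""
    status = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, retry_after = await send()
        if status != 429:
            return status
        if attempt == MAX_ATTEMPTS:
            break  # out of attempts: report the 429 instead of sleeping again
        wanted = DEFAULT_RETRY_AFTER if retry_after is None else retry_after
        # Clamp the server-supplied delay so a hostile or buggy value cannot stall us.
        await sleep(min(max(wanted, 0.0), MAX_RETRY_AFTER))
    return status
```

Injecting `sleep` keeps the cap testable without real waiting, which is one way a "Discord bounded retry" test like the one mentioned in the commit could be written.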
alexei.dolgolyov	f27fa42b87	fix(ci): build release payload via heredoc, drop broken env-var passing [CI: Release / release (push): Successful in 24s] Previous attempt used `python3 -c "..." KEY=VALUE` which passes KEY=VALUE as positional args, not environment variables; the python block then crashed with KeyError: 'BODY' because nothing actually set it in the environment. Consolidate into a single heredoc-fed python3 block that reads RELEASE_NOTES from the already-exported env var and reads TAG/VERSION/IS_PRE after an explicit `export`. Uses <<'PY' so shell metachars in the Python source (backticks, $, quotes) are not interpreted. Also drops the redundant intermediate BODY variable: the body is built directly inside the single python invocation.	2026-04-21 20:16:27 +03:00
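The failure mode fixed in f27fa42b87 can be reproduced from Python itself: a `KEY=VALUE` token placed after the `-c` program ends up in `sys.argv`, not in `os.environ`. The snippet below is a small demonstration. The child program and the helper `run_child` are made up for the example. Only the `BODY` variable name comes from the commit message.

```python
import os
import subprocess
import sys

# Child program: report its positional arguments and what it sees for BODY.
CHILD = "import os, sys; print(sys.argv[1:], os.environ.get('BODY'))"


def run_child(extra_args, env):
    """Run the child with the given trailing arguments and environment."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, *extra_args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Start from the current environment, minus any BODY that happens to be set.
base_env = {k: v for k, v in os.environ.items() if k != "BODY"}

# Mirrors `python3 -c "..." BODY=hello`: the token is just an argument.
as_argument = run_child(["BODY=hello"], base_env)

# Mirrors `export BODY=hello` before the call: now the child can read it.
as_env_var = run_child([], {**base_env, "BODY": "hello"})

print(as_argument)  # ['BODY=hello'] None
print(as_env_var)   # [] hello
```

In the first call `os.environ["BODY"]` would raise the `KeyError` quoted in the commit, which is why the fix exports the variables explicitly.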
alexei.dolgolyov	e12820f150	ci: robust Gitea release creation with HTTP status + diagnostics [CI: Release / release (push): Failing after 21s] Previous implementation silently assumed any missing 'id' in POST response meant "release already exists", then called an unguarded python3 on the fallback response, which crashes (exit 1) if the fallback also fails (e.g. release really doesn't exist). New logic: - Build JSON payload in Python (avoids shell escaping + CLI length limits) - Capture HTTP status explicitly - 201 → success - 409 or "already exists" message → reuse existing (with HTTP check on fetch) - Anything else → fail loudly with the response body printed This also unblocks diagnosis of the current v0.1.0 failure by surfacing the actual error the Gitea API is returning.	2026-04-21 20:09:55 +03:00
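The decision table in e12820f150 reduces to a small pure function. The sketch below assumes the HTTP status and response body have already been captured. The names `ReleaseOutcome` and `classify_release_response` are hypothetical, and the case-insensitive match on "already exists" is an assumption about how the message is detected.

```python
from enum import Enum


class ReleaseOutcome(Enum):
    CREATED = "created"
    REUSE_EXISTING = "reuse_existing"
    FAILED = "failed"


def classify_release_response(status: int, body: str) -> ReleaseOutcome:
    """Map a release-creation response to the action the workflow should take."""
    if status == 201:
        return ReleaseOutcome.CREATED
    # A conflict, or an explicit "already exists" message, means the release
    # can be fetched and reused (the fetch needs its own status check).
    if status == 409 or "already exists" in body.lower():
        return ReleaseOutcome.REUSE_EXISTING
    # Anything else should fail loudly, with the body printed for diagnosis.
    return ReleaseOutcome.FAILED
```

Keeping the classification separate from the HTTP call makes the three branches easy to exercise without a Gitea instance.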
alexei.dolgolyov	866a8df310	ci: fix changelog step on shallow checkout and small repos [CI: Release / release (push): Failing after 54s] - Set fetch-depth: 0 so previous tag lookups work across full history. - Use `-n 20` instead of HEAD~20..HEAD, which fails when the repo has fewer than 20 commits (e.g. on the first release).	2026-04-21 19:59:40 +03:00
alexei.dolgolyov	56b345188e	ci: consolidate release.yml into single checkout step [CI: Release / release (push): Failing after 1m53s] The two-step pattern (sparse-checkout RELEASE_NOTES.md, then full checkout) left sparse-checkout config active on the workspace, so the second checkout still only restored RELEASE_NOTES.md. Docker build then failed with "open Dockerfile: no such file or directory". Since both RELEASE_NOTES.md and the full source are needed in the same job, one full checkout is simpler and correct.	2026-04-21 19:50:49 +03:00
alexei.dolgolyov	eecc9e295c	ci: consolidate release tokens to single DEPLOY_TOKEN, rename redeploy step - Use one DEPLOY_TOKEN for both registry login and Gitea release API, matching the claude-code-facts convention. - Rename "Trigger Portainer redeploy" to "Trigger redeploy webhook" — the step calls a generic DOCKER_REDEPLOY_WEBHOOK_URL, not a Portainer-specific endpoint. - Add .facts-sync.json to pin this project to the facts repo commit.	2026-04-21 19:35:50 +03:00
alexei.dolgolyov	c41182ffd0	ci: sync release workflow with CI/CD docs, add manual build - Fix github.* → gitea.* context consistency - Add pre-release detection (skip :latest for alpha/beta/rc) - Add release fallback (reuse existing if creation fails) - Add prerelease field to release API call - Use sparse-checkout for RELEASE_NOTES.md - Skip Portainer redeploy for pre-releases - Add version tag without v prefix - Add manual build.yml for Docker image verification	2026-03-28 13:27:28 +03:00
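The tagging rules introduced in c41182ffd0 (skip `:latest` for alpha/beta/rc, add a version tag without the `v` prefix) can be expressed as a short helper. This is a sketch under assumptions: the function names are invented, pre-release detection by substring match is a guess at the workflow's logic, and publishing the original `v`-prefixed tag alongside the stripped one is assumed, not confirmed by the commit message.

```python
import re

# Assumption: a tag counts as a pre-release if it mentions alpha, beta, or rc.
PRERELEASE = re.compile(r"alpha|beta|rc", re.IGNORECASE)


def is_prerelease(git_tag: str) -> bool:
    """Return True for tags such as v0.2.0-rc1 or v1.0.0-beta."""
    return PRERELEASE.search(git_tag) is not None


def image_tags(git_tag: str) -> list[str]:
    """Return the image tags to publish for a release tag."""
    version = git_tag[1:] if git_tag.startswith("v") else git_tag
    tags = [git_tag]
    if version != git_tag:
        tags.append(version)       # the tag without the v prefix
    if not is_prerelease(git_tag):
        tags.append("latest")      # stable releases only
    return tags
```

For example, `image_tags("v0.1.0")` yields `["v0.1.0", "0.1.0", "latest"]`, while `image_tags("v0.2.0-rc1")` omits `latest`.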
alexei.dolgolyov	b803d004e1	refactor: comprehensive codebase review — security, performance, quality, UX Security: - Fix NUT protocol command injection (validate names against safe regex) - Enable Jinja2 autoescape=True to prevent HTML injection via external data - Add WebhookProviderConfig validation model Performance: - Shared aiohttp.ClientSession singleton (replaces 40+ per-request sessions) - Fix 4 N+1 queries with batch IN loads (poller, scheduler, memory, broadcast) - asyncio.gather for Gitea commands and notification dispatcher - Add DB indexes on NotificationTrackerState.tracker_id, CommandTrackerListener - LRU cache for compiled Jinja2 templates - Daily EventLog cleanup job (90-day retention) - 30s HTTP timeout on all external calls - GROUP BY for target type counts (replaces 7 sequential queries) Code quality: - Extract get_owned_entity() helper (replaces 11 duplicate functions) - Extract slot_helpers.py (load_slots, save_slots, render_template_preview) - Extract command_utils.py (tracker lookup, last event, collection IDs) - Extract http_session.py (shared session lifecycle) - Provider connection validation dedup (3x → 1 helper) - Command dispatch tables replacing if/elif chains - Album+links fetch helper (fetch_albums_with_links) - Provider dispatch polymorphism (list_provider_collections) - Immutable _enrich_assets (no longer mutates in-place) - Fix _format_assets return type + handler unpacking Frontend: - Fix 18+ hardcoded English strings → t() with new i18n keys (en + ru) - Mobile "More" nav panel with provider filter and search - Shared Button.svelte component (4 variants, 2 sizes) - Shared ErrorBanner.svelte component (8 pages updated) - SvelteKit goto() replacing window.location.href - Dashboard grid fixed for 4 cards, paginator opacity consistency Functionality: - max_instances=1 on scheduler jobs (prevents duplicate events) - Webhook provider in watcher (prevents error spam) - Fix stale SQLModel reference in poller - Gitea get_repo() direct 
API call	2026-03-28 13:22:26 +03:00
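The NUT command-injection fix in b803d004e1 (validate names against a safe regex) could take a shape like the following. The allow-list pattern and the helper name `build_get_var` are assumptions. `GET VAR <upsname> <varname>` is the standard NUT network protocol query used here only as an example command.

```python
import re

# Assumed allow-list: letters, digits, underscore, dot, and hyphen only.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def build_get_var(ups_name: str, var_name: str) -> bytes:
    """Build a NUT `GET VAR` line, rejecting names that could smuggle in commands."""
    for label, value in (("UPS name", ups_name), ("variable name", var_name)):
        # fullmatch requires the whole string to be safe, so embedded spaces,
        # quotes, or newlines (which would start a second command) are refused.
        if not SAFE_NAME.fullmatch(value):
            raise ValueError(f"unsafe {label}: {value!r}")
    return f"GET VAR {ups_name} {var_name}\n".encode("ascii")
```

Validating before formatting means a value such as `"ups\nSET VAR ..."` is rejected outright and never escaped and sent.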
alexei.dolgolyov	1ac6a17f6f	feat: Docker deployment + Gitea CI/CD workflow - Multi-stage Dockerfile: Node frontend build → Python wheel build → slim runtime - Backend serves SvelteKit static output via FastAPI StaticFiles mount - docker-compose.yml with named volume for /data persistence - Gitea Actions workflow: build/push Docker image + create release on v* tags - Add NOTIFY_BRIDGE_STATIC_DIR config for frontend path - Fix run() to use configurable host/port	2026-03-23 02:14:14 +03:00

15 Commits