feat: production-readiness hardening across security, async, DB, ops

Security
- SSRF: async DNS resolver; allow_redirects=False on all outbound clients; matrix homeserver_url validated on create/update/test; update_provider and email_bot merge incoming config and reject ***-masked secrets.
- Auth: bcrypt offloaded to asyncio.to_thread; JWT now carries iss/aud + leeway and rejects missing claims; setup TOCTOU closed inside a transaction; rate limits extended (default 600/min, 10/min on password change, 30/min on needs-setup); constant-time login to prevent username enumeration.
- Config: rejects known dev secret keys; validates CORS origin schemes, port range, token lifetimes.
- Webhook handlers stream-read body with a 1 MiB cap; Discord 429 retries bounded (3 attempts, Retry-After capped at 60 s).
- CSP + HSTS added to SecurityHeadersMiddleware.

Async / runtime
- SQLite engine: WAL, synchronous=NORMAL, foreign_keys=ON, busy_timeout, pool_pre_ping, dispose on shutdown.
- Lifespan shutdown now stops scheduler before closing HTTP session and disposing the engine.
- Shared aiohttp session locked against concurrent first-caller races; core NotificationDispatcher accepts and reuses it.
- Storage and scheduled backup writes wrapped in asyncio.to_thread.
- NUT client writes bounded by asyncio.wait_for.
- Telegram poller switched from 3 s short-poll to 30 s interval + 25 s long-poll (~10x fewer API calls).

Database
- New performance-indexes migration covers every FK/owner column and hot-path composite (notification_tracker(provider_id, enabled); event_log(user_id, created_at DESC); webhook_payload_log(provider_id, created_at DESC); action_execution(action_id, started_at DESC)).
- New schema_version table for future upgrade gating.
- __system__ placeholder user (id=0) seeded so user_id=0 system defaults satisfy the newly enforced FK; filtered out of /auth/needs-setup, /api/users, and setup.
- list_notification_trackers rewritten to batched loads (was 1+N+N*M).
- Retention job extended to event_log, webhook_payload_log, and action_execution; retention days exposed as a setting.

Scheduler
- AsyncIOScheduler job_defaults: coalesce, misfire_grace_time=300, max_instances=1.

Ops
- uvicorn runs with proxy_headers, forwarded_allow_ips, timeout_graceful_shutdown; access log suppressed in non-debug.
- FastAPI version string now reads from importlib.metadata.
- New /api/ready endpoint separate from /api/health.
- docker-compose drops the ALLOW_PRIVATE_URLS=1 default, adds mem/cpu/pid limits, read_only + tmpfs, cap_drop:ALL, no-new-privileges; healthcheck targets /api/ready.
- CI now runs on push/PR with backend pytest, frontend svelte-check + build, and a non-push image build; release workflow gated on tests, publishes immutable sha-<commit> image tag, adds Trivy scan.

Tests
- New packages/server/tests/ with 29 passing tests: config validation, JWT round-trip + aud/alg=none rejection, SSRF scheme and private-range enforcement (sync + async), Discord bounded retry, and a lifespan-level /api/health + /api/ready smoke check.
- Renamed the misnamed services/test_dispatch.py to manual_dispatch.py so pytest never auto-collects production code.

Frontend
- /login now redirects already-authenticated users to /, shows a distinct 'backend unreachable' banner (en/ru) when /auth/needs-setup fails.
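
The bounded Discord 429 retry mentioned under Security (3 attempts, Retry-After capped at 60 s) can be illustrated with a small sketch. This is not the code from this commit: the helper names `retry_delay` and `should_retry`, the 1 s fallback delay, and the 1-based attempt counter are assumptions made for illustration. Only the two limits come from the commit message.

```python
from typing import Optional

MAX_ATTEMPTS = 3           # limit stated in the commit message
RETRY_AFTER_CAP_S = 60.0   # cap stated in the commit message


def retry_delay(retry_after: Optional[str], default: float = 1.0) -> float:
    """Seconds to sleep before the next attempt (hypothetical helper).

    Parses a Retry-After header value given in seconds, falls back to
    `default` when the header is missing or malformed, and clamps the
    result to the range [0, RETRY_AFTER_CAP_S].
    """
    try:
        delay = float(retry_after) if retry_after is not None else default
    except ValueError:
        delay = default
    return max(0.0, min(delay, RETRY_AFTER_CAP_S))


def should_retry(status: int, attempt: int) -> bool:
    """Retry only on HTTP 429 and only while attempts remain.

    `attempt` is the 1-based number of the attempt that just failed.
    """
    return status == 429 and attempt < MAX_ATTEMPTS
```

Capping the delay keeps a hostile or buggy upstream from parking a worker for an arbitrary time, and the attempt limit guarantees the send loop terminates.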
2026-04-23 19:44:56 +03:00
parent f50d465c0e
commit 920920bc67
44 changed files with 1426 additions and 186 deletions
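
The "~10x fewer API calls" figure for the Telegram poller follows from simple arithmetic, sketched below. The sketch assumes exactly one `getUpdates` request per scheduler tick; the function names are hypothetical and do not appear in the codebase.

```python
def calls_per_minute(interval_s: float) -> float:
    """Requests per minute when one request is issued per scheduler tick."""
    return 60.0 / interval_s


def idle_gap_s(interval_s: float, long_poll_timeout_s: float) -> float:
    """Seconds per tick with no request open, if the long-poll runs to its timeout."""
    return max(0.0, interval_s - long_poll_timeout_s)


old_rate = calls_per_minute(3)    # 3 s short-poll
new_rate = calls_per_minute(30)   # 30 s interval with a 25 s long-poll
reduction = old_rate / new_rate   # factor by which request volume drops
gap = idle_gap_s(30, 25)          # window per tick with no open connection
```

Under these assumptions the rate drops from 20 to 2 requests per minute per bot, and a quiet bot has no request open for about 5 s of every 30 s tick.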
@@ -127,7 +127,14 @@ async def stop_bot_if_unused(bot_id: int) -> None:


 def schedule_bot_polling(bot_id: int) -> None:
-    """Add a polling job for a bot (idempotent)."""
+    """Add a polling job for a bot (idempotent).
+
+    We schedule at a 30 s interval, and each tick calls ``getUpdates`` with
+    ``timeout=25``. Telegram holds the connection open until an update
+    arrives or the timeout elapses, so an update that lands while a request
+    is open is delivered at once, at a cost of ~2 API calls / minute per
+    bot (down from 20 under the old 3 s short-poll).
+    """
    scheduler = get_scheduler()
    job_id = f"telegram_poll_{bot_id}"
    if scheduler.get_job(job_id):
@@ -135,13 +142,13 @@ def schedule_bot_polling(bot_id: int) -> None:
    scheduler.add_job(
        _poll_bot,
        "interval",
-        seconds=3,
+        seconds=30,
        id=job_id,
        args=[bot_id],
        replace_existing=True,
        max_instances=1,
    )
-    _LOGGER.info("Started polling for bot %d", bot_id)
+    _LOGGER.info("Started polling for bot %d (long-poll, 25s timeout)", bot_id)


 def unschedule_bot_polling(bot_id: int) -> None:
@@ -233,8 +240,10 @@ async def _poll_bot(bot_id: int) -> None:
        from .http_session import get_http_session
        http = await get_http_session()
        client = TelegramClient(http, bot_token)
+        # Long-poll: hold connection open until an update arrives or 25 s
+        # elapse. Drastically cuts API calls vs. 3 s short-poll.
        result = await client.get_updates(
-            offset=offset + 1 if offset else None, limit=50,
+            offset=offset + 1 if offset else None, limit=50, timeout=25,
        )
        if not result.get("success"):
            err_text = str(result.get("error") or "")