feat: production-readiness hardening across security, async, DB, ops
Security
- SSRF: async DNS resolver; allow_redirects=False on all outbound clients; Matrix homeserver_url validated on create/update/test; update_provider and email_bot merge incoming config and reject ***-masked secrets.
- Auth: bcrypt offloaded to asyncio.to_thread; JWT now carries iss/aud + leeway and rejects missing claims; setup TOCTOU closed inside a transaction; rate limits extended (default 600/min, 10/min on password change, 30/min on needs-setup); constant-time login to prevent username enumeration.
- Config: rejects known dev secret keys; validates CORS origin schemes, port range, and token lifetimes.
- Webhook handlers stream-read the body with a 1 MiB cap; Discord 429 retries are bounded (3 attempts, Retry-After capped at 60 s).
- CSP + HSTS added to SecurityHeadersMiddleware.

Async / runtime
- SQLite engine: WAL, synchronous=NORMAL, foreign_keys=ON, busy_timeout, pool_pre_ping, dispose on shutdown.
- Lifespan shutdown now stops the scheduler before closing the HTTP session and disposing the engine.
- Shared aiohttp session is locked against concurrent first-caller races; core NotificationDispatcher accepts and reuses it.
- Storage and scheduled backup writes wrapped in asyncio.to_thread.
- NUT client writes bounded by asyncio.wait_for.
- Telegram poller switched from a 3 s short-poll to a 30 s interval + 25 s long-poll (~10x fewer API calls).

Database
- New performance-indexes migration covers every FK/owner column plus the hot-path composites: notification_tracker(provider_id, enabled); event_log(user_id, created_at DESC); webhook_payload_log(provider_id, created_at DESC); action_execution(action_id, started_at DESC).
- New schema_version table for future upgrade gating.
- __system__ placeholder user (id=0) seeded so user_id=0 system defaults satisfy the newly enforced FK; filtered out of /auth/needs-setup, /api/users, and setup.
- list_notification_trackers rewritten to use batched loads (previously 1+N+N*M queries).
- Retention job extended to event_log, webhook_payload_log, and action_execution; retention days exposed as a setting.

Scheduler
- AsyncIOScheduler job_defaults: coalesce, misfire_grace_time=300, max_instances=1.

Ops
- uvicorn runs with proxy_headers, forwarded_allow_ips, timeout_graceful_shutdown; access log suppressed in non-debug mode.
- FastAPI version string now read from importlib.metadata.
- New /api/ready endpoint, separate from /api/health.
- docker-compose drops the ALLOW_PRIVATE_URLS=1 default and adds mem/cpu/pid limits, read_only + tmpfs, cap_drop: ALL, and no-new-privileges; healthcheck targets /api/ready.
- CI now runs on push/PR with backend pytest, frontend svelte-check + build, and a non-push image build; release workflow is gated on tests, publishes an immutable sha-<commit> image tag, and adds a Trivy scan.

Tests
- New packages/server/tests/ with 29 passing tests: config validation, JWT round-trip + aud/alg=none rejection, SSRF scheme and private-range enforcement (sync + async), Discord bounded retry, and a lifespan-level /api/health + /api/ready smoke check.
- Renamed the misnamed services/test_dispatch.py to manual_dispatch.py so pytest never auto-collects production code.

Frontend
- /login now redirects already-authenticated users to / and shows a distinct 'backend unreachable' banner (en/ru) when /auth/needs-setup fails.
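The bounded Discord 429 handling listed under Security can be sketched as follows. The sketch makes three assumptions that the commit view does not confirm: `send` is a hypothetical coroutine returning `(status, retry_after_seconds)`, the HTTP plumbing (aiohttp) is left out, and the code falls back to a 1 s delay when Retry-After is missing.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical transport: returns (HTTP status, Retry-After seconds or None).
SendFn = Callable[[], Awaitable[Tuple[int, Optional[float]]]]

MAX_ATTEMPTS = 3          # total tries, including the first one
MAX_RETRY_AFTER_S = 60.0  # never wait longer than this, whatever the server says


async def send_with_bounded_retry(
    send: SendFn,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> int:
    """Call `send`, retrying only on HTTP 429, at most MAX_ATTEMPTS times."""
    status = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, retry_after = await send()
        if status != 429 or attempt == MAX_ATTEMPTS:
            break  # success, a non-retryable error, or out of attempts
        # Clamp the server-provided delay; assume 1 s if it is absent or invalid.
        delay = retry_after if retry_after and retry_after > 0 else 1.0
        await sleep(min(delay, MAX_RETRY_AFTER_S))
    return status
```

The `sleep` parameter lets tests run without real waiting. Capping both the attempt count and the delay means a misbehaving or hostile Retry-After header cannot stall a notification worker indefinitely.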
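Below is a minimal sketch of the SSRF scheme and private-range check. It resolves names with asyncio's `loop.getaddrinfo`, which runs the blocking resolver in a thread pool; this is an assumption, since the resolver the commit actually wires in is not shown. `UnsafeURLError`, `is_blocked_ip`, `validate_outbound_url` and the `allow_private` flag are hypothetical names. The flag loosely mirrors the ALLOW_PRIVATE_URLS setting mentioned above.

```python
import asyncio
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


class UnsafeURLError(ValueError):
    """Raised when an outbound URL fails the SSRF policy (hypothetical name)."""


def is_blocked_ip(raw: str) -> bool:
    """True for loopback, private, link-local, multicast, reserved, etc."""
    ip = ipaddress.ip_address(raw.split("%", 1)[0])  # drop any IPv6 scope id
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:127.0.0.1 as 127.0.0.1
    return (
        not ip.is_global
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


async def validate_outbound_url(url: str, allow_private: bool = False) -> str:
    """Reject non-HTTP(S) URLs and hosts that resolve to non-public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        raise UnsafeURLError(f"unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    loop = asyncio.get_running_loop()
    # Non-blocking for the event loop: resolution happens in an executor.
    infos = await loop.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    if not allow_private:
        # Every resolved address must be public, not just the first one.
        for *_, sockaddr in infos:
            if is_blocked_ip(sockaddr[0]):
                raise UnsafeURLError(f"{parts.hostname} resolves to {sockaddr[0]}")
    return url
```

Checking the resolved addresses once does not by itself prevent DNS rebinding between validation and connection. Pinning the connection to the vetted address, or hooking the check into the HTTP client's resolver, closes that gap. Setting allow_redirects=False keeps a redirect from bouncing an approved request to an internal host.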
@@ -85,7 +85,21 @@ def _compute_jitter(interval_seconds: int) -> int:
 def get_scheduler() -> AsyncIOScheduler:
     global _scheduler
     if _scheduler is None:
-        _scheduler = AsyncIOScheduler()
+        # Sensible production defaults applied to every job unless overridden:
+        # * coalesce — collapse a queue of missed runs into one firing after
+        # a restart / pause, instead of bursting to catch up.
+        # * misfire_grace_time — accept firings up to 5 min late without
+        # dropping them silently.
+        # * max_instances=1 — never run two copies of the same tracker tick
+        # concurrently; the scheduler already enforces this on add_job,
+        # but we also set it as the default for safety.
+        _scheduler = AsyncIOScheduler(
+            job_defaults={
+                "coalesce": True,
+                "misfire_grace_time": 300,
+                "max_instances": 1,
+            },
+        )
     return _scheduler
 
 
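The three job_defaults in the hunk above interact when a job has a backlog of missed fire times. The following simplified pure-Python model illustrates how APScheduler 3.x treats that backlog for one job. It is written for illustration and is not APScheduler code. `runs_to_fire` is a hypothetical helper, and the model ignores job stores and executor details.

```python
from datetime import datetime, timedelta
from typing import List


def runs_to_fire(
    missed: List[datetime],
    now: datetime,
    coalesce: bool = True,
    misfire_grace_time: int = 300,
    running_instances: int = 0,
    max_instances: int = 1,
) -> List[datetime]:
    """Model which due run times (oldest first) would actually execute."""
    if running_instances >= max_instances:
        return []  # max_instances: skip this tick while a copy is still running
    # coalesce: collapse the backlog into the single most recent run time
    candidates = missed[-1:] if coalesce else list(missed)
    grace = timedelta(seconds=misfire_grace_time)
    # misfire_grace_time: a run later than the grace window counts as missed
    return [t for t in candidates if now - t <= grace]
```

With a one-minute tracker whose last ten ticks were missed, the coalesced job fires once. Without coalescing it would burst through the five ticks that are still inside the 300 s window. A single tick an hour late is dropped in either case.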
@@ -279,21 +293,38 @@ async def _refresh_telegram_chat_titles() -> None:
 
 
 async def _cleanup_old_events() -> None:
-    """Delete EventLog entries older than 90 days."""
+    """Delete EventLog / WebhookPayloadLog / ActionExecution rows older than the
+    configured retention window. A retention of 0 disables the job.
+    """
     from datetime import datetime, timedelta, timezone
 
     from sqlmodel import delete
     from sqlmodel.ext.asyncio.session import AsyncSession
 
+    from ..config import settings
     from ..database.engine import get_engine
-    from ..database.models import EventLog
+    from ..database.models import ActionExecution, EventLog, WebhookPayloadLog
 
-    cutoff = datetime.now(timezone.utc) - timedelta(days=90)
+    days = settings.event_log_retention_days
+    if days <= 0:
+        _LOGGER.debug("Event log retention disabled (days=0); skipping cleanup")
+        return
+
+    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
     engine = get_engine()
     async with AsyncSession(engine) as session:
         await session.exec(delete(EventLog).where(EventLog.created_at < cutoff))
+        await session.exec(
+            delete(WebhookPayloadLog).where(WebhookPayloadLog.created_at < cutoff)
+        )
+        await session.exec(
+            delete(ActionExecution).where(ActionExecution.started_at < cutoff)
+        )
         await session.commit()
-    _LOGGER.info("Cleaned up event log entries older than %s", cutoff.date())
+    _LOGGER.info(
+        "Cleaned event_log / webhook_payload_log / action_execution older than %s",
+        cutoff.date(),
+    )
 
 
 async def _load_tracker_jobs() -> None:
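The retention rule in the hunk above works as follows. When days <= 0 the job does nothing. Otherwise it deletes rows older than now minus days from three tables and commits once. The rule can be exercised with the standard-library sqlite3 module. This sketch swaps SQLModel's async session for plain sqlite3 and assumes timestamps are stored as ISO-8601 text with a UTC offset. The table and column names come from the commit, but `cleanup_old_rows` and the schemas are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

# (table, timestamp column) pairs mirroring the three models in the diff.
RETENTION_TARGETS = (
    ("event_log", "created_at"),
    ("webhook_payload_log", "created_at"),
    ("action_execution", "started_at"),
)


def cleanup_old_rows(
    conn: sqlite3.Connection, days: int, now: Optional[datetime] = None
) -> int:
    """Delete rows older than `days` (days <= 0 disables); return rows deleted."""
    if days <= 0:
        return 0
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat()
    deleted = 0
    with conn:  # one transaction, committed once, like session.commit() above
        for table, column in RETENTION_TARGETS:
            # Identifiers come from the constant above, never from user input.
            cur = conn.execute(f"DELETE FROM {table} WHERE {column} < ?", (cutoff,))
            deleted += cur.rowcount
    return deleted
```

Comparing ISO-8601 strings only works when every row uses the same format and offset. With native datetime columns, as in the SQLModel version, the database performs the comparison instead.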