feat: production-readiness hardening across security, async, DB, ops
Build and Test / test-frontend (push) Successful in 9m37s
Build and Test / test-backend (push) Successful in 10m53s
Build and Test / build-image (push) Failing after 14m52s

Security
- SSRF: async DNS resolver; allow_redirects=False on all outbound clients;
  matrix homeserver_url validated on create/update/test; update_provider
  and email_bot merge incoming config and reject ***-masked secrets.
- Auth: bcrypt offloaded to asyncio.to_thread; JWT now carries iss/aud +
  leeway and rejects missing claims; setup TOCTOU closed inside a
  transaction; rate limits extended (default 600/min, 10/min on password
  change, 30/min on needs-setup); constant-time login to prevent username
  enumeration.
- Config: rejects known dev secret keys; validates CORS origin schemes,
  port range, token lifetimes.
- Webhook handlers stream-read body with a 1 MiB cap; Discord 429 retries
  bounded (3 attempts, Retry-After capped at 60 s).
- CSP + HSTS added to SecurityHeadersMiddleware.

Async / runtime
- SQLite engine: WAL, synchronous=NORMAL, foreign_keys=ON, busy_timeout,
  pool_pre_ping, dispose on shutdown.
- Lifespan shutdown now stops scheduler before closing HTTP session and
  disposing the engine.
- Shared aiohttp session locked against concurrent first-caller races;
  core NotificationDispatcher accepts and reuses it.
- Storage and scheduled backup writes wrapped in asyncio.to_thread.
- NUT client writes bounded by asyncio.wait_for.
- Telegram poller switched from 3 s short-poll to 30 s interval + 25 s
  long-poll (~10x fewer API calls).

Database
- New performance-indexes migration covers every FK/owner column and
  hot-path composite (notification_tracker(provider_id, enabled);
  event_log(user_id, created_at DESC); webhook_payload_log(provider_id,
  created_at DESC); action_execution(action_id, started_at DESC)).
- New schema_version table for future upgrade gating.
- __system__ placeholder user (id=0) seeded so user_id=0 system defaults
  satisfy the newly enforced FK; filtered out of /auth/needs-setup,
  /api/users, and setup.
- list_notification_trackers rewritten to batched loads (was 1+N+N*M).
- Retention job extended to event_log, webhook_payload_log, and
  action_execution; retention days exposed as a setting.

Scheduler
- AsyncIOScheduler job_defaults: coalesce, misfire_grace_time=300,
  max_instances=1.

Ops
- uvicorn runs with proxy_headers, forwarded_allow_ips,
  timeout_graceful_shutdown; access log suppressed in non-debug.
- FastAPI version string now reads from importlib.metadata.
- New /api/ready endpoint separate from /api/health.
- docker-compose drops the ALLOW_PRIVATE_URLS=1 default, adds mem/cpu/pid
  limits, read_only + tmpfs, cap_drop:ALL, no-new-privileges; healthcheck
  targets /api/ready.
- CI now runs on push/PR with backend pytest, frontend svelte-check +
  build, and a non-push image build; release workflow gated on tests,
  publishes immutable sha-<commit> image tag, adds Trivy scan.

Tests
- New packages/server/tests/ with 29 passing tests: config validation,
  JWT round-trip + aud/alg=none rejection, SSRF scheme and private-range
  enforcement (sync + async), Discord bounded retry, and a lifespan-level
  /api/health + /api/ready smoke check.
- Renamed the misnamed services/test_dispatch.py to manual_dispatch.py so
  pytest never auto-collects production code.

Frontend
- /login now redirects already-authenticated users to /, shows a distinct
  'backend unreachable' banner (en/ru) when /auth/needs-setup fails.
This commit is contained in:
2026-04-23 19:44:56 +03:00
parent f50d465c0e
commit 920920bc67
44 changed files with 1426 additions and 186 deletions
@@ -2,8 +2,20 @@
from pathlib import Path
from typing import Any
from urllib.parse import urlparse
from pydantic_settings import BaseSettings
# Secret keys we will actively refuse. These cover the default template value
# and dev-only literals that have appeared in scripts or documentation.
_FORBIDDEN_SECRETS: frozenset[str] = frozenset(
{
"change-me-in-production",
"test-secret-key-minimum-32-chars",
"dev-secret-key-not-for-production",
}
)
class Settings(BaseSettings):
"""Application settings loaded from environment variables."""
@@ -13,29 +25,25 @@ class Settings(BaseSettings):
secret_key: str = "change-me-in-production"
def model_post_init(self, __context: Any) -> None:
if self.secret_key == "change-me-in-production":
raise ValueError(
"SECURITY: Refusing to start with the default secret_key. "
"Set NOTIFY_BRIDGE_SECRET_KEY to a random value (>=32 bytes) "
"before starting the server (debug mode included)."
)
if len(self.secret_key) < 32:
raise ValueError(
"SECURITY: NOTIFY_BRIDGE_SECRET_KEY must be at least 32 characters."
)
if "*" in self.cors_allowed_origins.split(","):
raise ValueError(
"SECURITY: wildcard '*' is not allowed in CORS origins when credentials are enabled."
)
access_token_expire_minutes: int = 60
access_token_expire_minutes: int = 15
refresh_token_expire_days: int = 30
jwt_issuer: str = "notify-bridge"
jwt_audience: str = "notify-bridge-api"
host: str = "0.0.0.0"
port: int = 8420
debug: bool = False
# Comma-separated list of trusted proxy IPs uvicorn will honor for
# X-Forwarded-For / X-Forwarded-Proto. Use "*" ONLY when you trust the
# network (never directly on the internet). Default matches uvicorn.
forwarded_allow_ips: str = "127.0.0.1"
# How long to wait for in-flight requests / scheduler jobs before force
# killing on SIGTERM.
graceful_shutdown_seconds: int = 60
anthropic_api_key: str = ""
ai_model: str = "claude-sonnet-4-20250514"
ai_max_tokens: int = 1024
@@ -49,9 +57,6 @@ class Settings(BaseSettings):
"""Path to frontend static files. Set to serve SvelteKit build via FastAPI (e.g. /app/static in Docker)."""
# --- Logging ---
# Boot-time logging configuration. DB AppSetting rows (``log_level`` /
# ``log_levels`` / ``log_format``) override these after startup, letting
# operators adjust levels from the settings UI without a restart.
log_level: str = "INFO"
"""Root log level for the app loggers (``DEBUG``/``INFO``/``WARNING``/``ERROR``)."""
@@ -61,8 +66,43 @@ class Settings(BaseSettings):
log_levels: str = ""
"""Comma-separated per-module overrides, e.g. ``notify_bridge_core.notifications.telegram.client=DEBUG,sqlalchemy.engine=INFO``."""
# --- Retention ---
event_log_retention_days: int = 30
"""Days of event_log history to retain. 0 disables the retention job."""
model_config = {"env_prefix": "NOTIFY_BRIDGE_"}
def model_post_init(self, __context: Any) -> None:
if self.secret_key in _FORBIDDEN_SECRETS:
raise ValueError(
"SECURITY: Refusing to start with a known/default secret_key. "
"Set NOTIFY_BRIDGE_SECRET_KEY to a random value (>=32 bytes) "
"before starting the server."
)
if len(self.secret_key) < 32:
raise ValueError(
"SECURITY: NOTIFY_BRIDGE_SECRET_KEY must be at least 32 characters."
)
origins = [o.strip() for o in self.cors_allowed_origins.split(",") if o.strip()]
if "*" in origins:
raise ValueError(
"SECURITY: wildcard '*' is not allowed in CORS origins when credentials are enabled."
)
for origin in origins:
parsed = urlparse(origin)
if parsed.scheme not in {"http", "https"} or not parsed.netloc:
raise ValueError(
f"CORS origin {origin!r} is invalid — must include scheme (http/https) and host."
)
if self.access_token_expire_minutes <= 0:
raise ValueError("access_token_expire_minutes must be > 0")
if self.refresh_token_expire_days <= 0:
raise ValueError("refresh_token_expire_days must be > 0")
if not (1 <= self.port <= 65535):
raise ValueError("port must be in range 1..65535")
if self.event_log_retention_days < 0:
raise ValueError("event_log_retention_days must be >= 0")
@property
def effective_database_url(self) -> str:
if self.database_url: