feat: production-readiness hardening across security, async, DB, ops
Build and Test / test-frontend (push) Successful in 9m37s
Build and Test / test-backend (push) Successful in 10m53s
Build and Test / build-image (push) Failing after 14m52s

Security
- SSRF: async DNS resolver; allow_redirects=False on all outbound clients;
  matrix homeserver_url validated on create/update/test; update_provider
  and email_bot merge incoming config and reject ***-masked secrets.
- Auth: bcrypt offloaded to asyncio.to_thread; JWT now carries iss/aud +
  leeway and rejects missing claims; setup TOCTOU closed inside a
  transaction; rate limits extended (default 600/min, 10/min on password
  change, 30/min on needs-setup); constant-time login to prevent username
  enumeration.
- Config: rejects known dev secret keys; validates CORS origin schemes,
  port range, token lifetimes.
- Webhook handlers stream-read body with a 1 MiB cap; Discord 429 retries
  bounded (3 attempts, Retry-After capped at 60 s).
- CSP + HSTS added to SecurityHeadersMiddleware.

Async / runtime
- SQLite engine: WAL, synchronous=NORMAL, foreign_keys=ON, busy_timeout,
  pool_pre_ping, dispose on shutdown.
- Lifespan shutdown now stops scheduler before closing HTTP session and
  disposing the engine.
- Shared aiohttp session locked against concurrent first-caller races;
  core NotificationDispatcher accepts and reuses it.
- Storage and scheduled backup writes wrapped in asyncio.to_thread.
- NUT client writes bounded by asyncio.wait_for.
- Telegram poller switched from 3 s short-poll to 30 s interval + 25 s
  long-poll (~10x fewer API calls).

Database
- New performance-indexes migration covers every FK/owner column and
  hot-path composite (notification_tracker(provider_id, enabled);
  event_log(user_id, created_at DESC); webhook_payload_log(provider_id,
  created_at DESC); action_execution(action_id, started_at DESC)).
- New schema_version table for future upgrade gating.
- __system__ placeholder user (id=0) seeded so user_id=0 system defaults
  satisfy the newly enforced FK; filtered out of /auth/needs-setup,
  /api/users, and setup.
- list_notification_trackers rewritten to batched loads (was 1+N+N*M).
- Retention job extended to event_log, webhook_payload_log, and
  action_execution; retention days exposed as a setting.

Scheduler
- AsyncIOScheduler job_defaults: coalesce, misfire_grace_time=300,
  max_instances=1.

Ops
- uvicorn runs with proxy_headers, forwarded_allow_ips,
  timeout_graceful_shutdown; access log suppressed in non-debug.
- FastAPI version string now reads from importlib.metadata.
- New /api/ready endpoint separate from /api/health.
- docker-compose drops the ALLOW_PRIVATE_URLS=1 default, adds mem/cpu/pid
  limits, read_only + tmpfs, cap_drop:ALL, no-new-privileges; healthcheck
  targets /api/ready.
- CI now runs on push/PR with backend pytest, frontend svelte-check +
  build, and a non-push image build; release workflow gated on tests,
  publishes immutable sha-<commit> image tag, adds Trivy scan.

Tests
- New packages/server/tests/ with 29 passing tests: config validation,
  JWT round-trip + aud/alg=none rejection, SSRF scheme and private-range
  enforcement (sync + async), Discord bounded retry, and a lifespan-level
  /api/health + /api/ready smoke check.
- Renamed the misnamed services/test_dispatch.py to manual_dispatch.py so
  pytest never auto-collects production code.

Frontend
- /login now redirects already-authenticated users to /, shows a distinct
  'backend unreachable' banner (en/ru) when /auth/needs-setup fails.
This commit is contained in:
2026-04-23 19:44:56 +03:00
parent f50d465c0e
commit 920920bc67
44 changed files with 1426 additions and 186 deletions
@@ -93,7 +93,14 @@ async def update_email_bot(
session: AsyncSession = Depends(get_session),
):
bot = await _get_user_bot(session, bot_id, user.id)
for field, value in body.model_dump(exclude_unset=True).items():
updates = body.model_dump(exclude_unset=True)
# Reject the masked value the GET response returns so the stored password
# is preserved if the user saves without retyping it.
if "smtp_password" in updates:
pw = updates["smtp_password"]
if isinstance(pw, str) and pw.startswith("***"):
updates.pop("smtp_password")
for field, value in updates.items():
setattr(bot, field, value)
session.add(bot)
await session.commit()
@@ -7,6 +7,11 @@ from pydantic import BaseModel
from sqlmodel import select
from sqlmodel.ext.asyncio.session import AsyncSession
from notify_bridge_core.notifications.ssrf import (
UnsafeURLError,
avalidate_outbound_url,
)
from ..auth.dependencies import get_current_user
from ..database.engine import get_session
from ..database.models import MatrixBot, User
@@ -33,6 +38,21 @@ class MatrixBotUpdate(BaseModel):
display_name: str | None = None
def _is_masked_secret(value: str | None) -> bool:
"""True when a field still carries our masked placeholder."""
return bool(value) and (value.startswith("***") or "..." in value)
async def _validate_homeserver_url(url: str) -> None:
"""Reject homeserver URLs that point to blocked networks."""
try:
await avalidate_outbound_url(url)
except UnsafeURLError as err:
raise HTTPException(
status_code=400, detail=f"Invalid homeserver_url: {err}"
) from err
@router.get("")
async def list_matrix_bots(
user: User = Depends(get_current_user),
@@ -50,6 +70,7 @@ async def create_matrix_bot(
user: User = Depends(get_current_user),
session: AsyncSession = Depends(get_session),
):
await _validate_homeserver_url(body.homeserver_url)
bot = MatrixBot(user_id=user.id, **body.model_dump())
session.add(bot)
await session.commit()
@@ -74,7 +95,19 @@ async def update_matrix_bot(
session: AsyncSession = Depends(get_session),
):
bot = await _get_user_bot(session, bot_id, user.id)
for field, value in body.model_dump(exclude_unset=True).items():
updates = body.model_dump(exclude_unset=True)
# Re-validate homeserver_url whenever the client supplies a new one so
# no private/loopback target can ever be saved, even via update.
if "homeserver_url" in updates and updates["homeserver_url"]:
await _validate_homeserver_url(updates["homeserver_url"])
# Never accept the masked placeholder the GET response returns. If the
# client echoes it back, keep the stored secret.
if "access_token" in updates and _is_masked_secret(updates["access_token"]):
updates.pop("access_token")
for field, value in updates.items():
setattr(bot, field, value)
session.add(bot)
await session.commit()
@@ -108,15 +141,17 @@ async def test_matrix_bot(
If room_id is not provided, just verifies the access token by calling /whoami.
"""
bot = await _get_user_bot(session, bot_id, user.id)
# Defense-in-depth: even though create/update validate the URL, a bot row
# written before this guard was added could still point at a blocked host.
await _validate_homeserver_url(bot.homeserver_url)
import aiohttp
from ..services.http_session import get_http_session
http = await get_http_session()
# Verify token with /whoami
whoami_url = f"{bot.homeserver_url.rstrip('/')}/_matrix/client/v3/account/whoami"
headers = {"Authorization": f"Bearer {bot.access_token}"}
try:
async with http.get(whoami_url, headers=headers) as resp:
async with http.get(whoami_url, headers=headers, allow_redirects=False) as resp:
if resp.status != 200:
body = await resp.text()
return {"success": False, "error": f"Auth failed: HTTP {resp.status}{body[:200]}"}
@@ -126,7 +161,6 @@ async def test_matrix_bot(
result = {"success": True, "user_id": whoami.get("user_id", "")}
# Optionally send a test message
if room_id:
from ..services.notifier import _get_test_message
from notify_bridge_core.notifications.matrix.client import MatrixClient
@@ -148,7 +182,7 @@ def _response(bot: MatrixBot) -> dict:
"name": bot.name,
"icon": bot.icon,
"homeserver_url": bot.homeserver_url,
"access_token": f"{bot.access_token[:8]}...{bot.access_token[-4:]}" if len(bot.access_token) > 12 else "***",
"access_token": f"***{bot.access_token[-4:]}" if len(bot.access_token) > 4 else "***",
"display_name": bot.display_name,
"created_at": bot.created_at.isoformat(),
}
@@ -22,7 +22,7 @@ from ..database.models import (
User,
)
from ..services.notifier import send_test_notification
from ..services.test_dispatch import dispatch_test_notification
from ..services.manual_dispatch import dispatch_test_notification
from .helpers import get_owned_entity
_LOGGER = logging.getLogger(__name__)
@@ -11,6 +11,7 @@ from ..auth.dependencies import get_current_user
from ..database.engine import get_session
from ..database.models import (
EventLog,
NotificationTarget,
NotificationTracker,
NotificationTrackerState,
NotificationTrackerTarget,
@@ -54,11 +55,79 @@ async def list_notification_trackers(
user: User = Depends(get_current_user),
session: AsyncSession = Depends(get_session),
):
# Batched loader: pull trackers, then all their tracker-target links in
# a single query, then the referenced targets in a single query. Avoids
# the old 1 + N + N*M pattern that ran ~60 round-trips for 10 trackers.
result = await session.exec(
select(NotificationTracker).where(NotificationTracker.user_id == user.id)
)
trackers = result.all()
return [await _tracker_response(session, t) for t in trackers]
trackers = list(result.all())
if not trackers:
return []
tracker_ids = [t.id for t in trackers]
tt_result = await session.exec(
select(NotificationTrackerTarget).where(
NotificationTrackerTarget.tracker_id.in_(tracker_ids)
)
)
tt_rows = list(tt_result.all())
target_ids = {tt.target_id for tt in tt_rows}
targets_by_id: dict[int, NotificationTarget] = {}
if target_ids:
tgt_result = await session.exec(
select(NotificationTarget).where(NotificationTarget.id.in_(target_ids))
)
targets_by_id = {t.id: t for t in tgt_result.all()}
tts_by_tracker: dict[int, list[NotificationTrackerTarget]] = {}
for tt in tt_rows:
tts_by_tracker.setdefault(tt.tracker_id, []).append(tt)
return [
_build_tracker_response(t, tts_by_tracker.get(t.id, []), targets_by_id)
for t in trackers
]
def _build_tracker_response(
t: NotificationTracker,
tts: list[NotificationTrackerTarget],
targets_by_id: dict[int, NotificationTarget],
) -> dict:
"""In-memory assembler for a tracker + its pre-loaded links/targets."""
tracker_targets = []
for tt in tts:
target = targets_by_id.get(tt.target_id)
tracker_targets.append({
"id": tt.id,
"tracker_id": tt.tracker_id,
"target_id": tt.target_id,
"target_name": target.name if target else None,
"target_type": target.type if target else None,
"target_icon": target.icon if target else None,
"tracking_config_id": tt.tracking_config_id,
"template_config_id": tt.template_config_id,
"enabled": tt.enabled,
"quiet_hours_start": tt.quiet_hours_start,
"quiet_hours_end": tt.quiet_hours_end,
"created_at": tt.created_at.isoformat(),
})
return {
"id": t.id,
"name": t.name,
"icon": t.icon,
"provider_id": t.provider_id,
"collection_ids": t.collection_ids,
"scan_interval": t.scan_interval,
"batch_duration": t.batch_duration,
"default_tracking_config_id": t.default_tracking_config_id,
"default_template_config_id": t.default_template_config_id,
"enabled": t.enabled,
"tracker_targets": tracker_targets,
"created_at": t.created_at.isoformat(),
}
@router.post("", status_code=status.HTTP_201_CREATED)
@@ -306,16 +306,31 @@ async def update_provider(
if body.icon is not None:
provider.icon = body.icon
config_changed = body.config is not None and body.config != provider.config
if body.config is not None:
_validate_provider_config(provider.type, body.config)
provider.config = body.config
# Merge rather than replace so the masked secrets the frontend
# receives on GET cannot silently nuke the stored values when the
# user saves without re-entering them. Any field that still carries
# our mask placeholder ("***…") is dropped from the incoming body.
incoming = dict(body.config)
for secret_field in (
"api_key", "api_token", "webhook_secret", "password",
"client_secret", "refresh_token",
):
value = incoming.get(secret_field)
if isinstance(value, str) and value.startswith("***"):
incoming.pop(secret_field, None)
new_config = {**provider.config, **incoming}
_validate_provider_config(provider.type, new_config)
config_changed = new_config != provider.config
provider.config = new_config
# Re-validate connection when config changes for known provider types
if config_changed:
test_result = await _validate_provider_connection(provider)
if test_result.get("external_domain"):
provider.config = {**provider.config, "external_domain": test_result["external_domain"]}
if config_changed:
test_result = await _validate_provider_connection(provider)
if test_result.get("external_domain"):
provider.config = {
**provider.config,
"external_domain": test_result["external_domain"],
}
session.add(provider)
await session.commit()
@@ -1,5 +1,6 @@
"""User management API routes (admin only)."""
import asyncio
import logging
from fastapi import APIRouter, Depends, HTTPException, status
@@ -14,6 +15,15 @@ from ..auth.dependencies import require_admin
from ..database.engine import get_session
from ..database.models import User
async def _hash_password(password: str) -> str:
"""Run bcrypt off the event loop. Matches the helper in auth/routes.py."""
def _work() -> str:
return bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()
return await asyncio.to_thread(_work)
_LOGGER = logging.getLogger(__name__)
router = APIRouter(prefix="/api/users", tags=["users"])
@@ -36,8 +46,12 @@ async def list_users(
admin: User = Depends(require_admin),
session: AsyncSession = Depends(get_session),
):
"""List all users (admin only)."""
result = await session.exec(select(User))
"""List all users (admin only).
Excludes the internal ``__system__`` placeholder (id=0) used as the
owner of default templates/configs — it is never a real account.
"""
result = await session.exec(select(User).where(User.id != 0))
return [
{"id": u.id, "username": u.username, "role": u.role, "created_at": u.created_at.isoformat()}
for u in result.all()
@@ -61,7 +75,7 @@ async def create_user(
user = User(
username=body.username,
hashed_password=bcrypt.hashpw(body.password.encode(), bcrypt.gensalt()).decode(),
hashed_password=await _hash_password(body.password),
role=body.role if body.role in ("admin", "user") else "user",
)
session.add(user)
@@ -162,7 +176,7 @@ async def reset_user_password(
raise HTTPException(status_code=404, detail="User not found")
if len(body.new_password) < 8:
raise HTTPException(status_code=400, detail="Password must be at least 8 characters")
user.hashed_password = bcrypt.hashpw(body.new_password.encode(), bcrypt.gensalt()).decode()
user.hashed_password = await _hash_password(body.new_password)
# Invalidate all prior JWTs issued for this user — matches the self-serve
# password-change path in auth/routes.py.
user.token_version = (user.token_version or 1) + 1
@@ -37,6 +37,42 @@ _LOGGER = logging.getLogger(__name__)
router = APIRouter(prefix="/api/webhooks", tags=["webhooks"])
# Hard cap on inbound webhook body size (1 MiB is far larger than anything
# legitimate providers send and keeps the worst-case memory footprint bounded
# when a malicious peer lies about Content-Length or streams slowly).
_MAX_WEBHOOK_BODY_BYTES = 1_000_000
async def _read_bounded_body(request: Request, limit: int = _MAX_WEBHOOK_BODY_BYTES) -> bytes:
"""Reject oversized inbound bodies before they exhaust memory.
First checks ``Content-Length`` (fast-path for honest peers), then
streams the body in chunks enforcing the same cap on actual bytes
received so a peer that lies about Content-Length cannot slip through.
"""
declared = request.headers.get("content-length")
if declared:
try:
if int(declared) > limit:
raise HTTPException(
status_code=413,
detail=f"Payload too large (max {limit} bytes)",
)
except ValueError:
raise HTTPException(status_code=400, detail="Invalid Content-Length")
chunks: list[bytes] = []
size = 0
async for chunk in request.stream():
size += len(chunk)
if size > limit:
raise HTTPException(
status_code=413,
detail=f"Payload too large (max {limit} bytes)",
)
chunks.append(chunk)
return b"".join(chunks)
async def _get_provider_by_token(
session: AsyncSession, token: str, expected_type: str,
@@ -169,7 +205,8 @@ async def _dispatch_webhook_event(
))
# Dispatch to targets
dispatcher = NotificationDispatcher()
from ..services.http_session import get_http_session
dispatcher = NotificationDispatcher(session=await get_http_session())
target_configs = _build_target_configs(event, link_data, provider_config, app_tz)
if target_configs:
results = await dispatcher.dispatch(event, target_configs)
@@ -203,7 +240,7 @@ async def gitea_webhook(token: str, request: Request):
webhook_secret = (provider.config or {}).get("webhook_secret", "")
# Read raw body for HMAC check
raw_body = await request.body()
raw_body = await _read_bounded_body(request)
if not webhook_secret:
raise HTTPException(
@@ -221,8 +258,8 @@ async def gitea_webhook(token: str, request: Request):
return {"ok": True, "skipped": "no event header"}
try:
payload = await request.json()
except (json.JSONDecodeError, ValueError):
payload = json.loads(raw_body.decode("utf-8"))
except (UnicodeDecodeError, json.JSONDecodeError, ValueError):
raise HTTPException(status_code=400, detail="Invalid JSON")
event = parse_gitea_webhook(event_header, payload, provider.name)
@@ -280,10 +317,10 @@ async def planka_webhook(token: str, request: Request):
if not _verify_planka_token(webhook_secret, request):
raise HTTPException(status_code=403, detail="Invalid token")
# Parse payload
# Parse payload from the bounded raw_body we already read.
try:
payload = await request.json()
except (json.JSONDecodeError, ValueError):
payload = json.loads(raw_body.decode("utf-8"))
except (UnicodeDecodeError, json.JSONDecodeError, ValueError):
raise HTTPException(status_code=400, detail="Invalid JSON")
event_type = payload.get("type", "")
@@ -446,23 +483,22 @@ async def generic_webhook(token: str, request: Request):
store_payloads = provider_config.get("store_payloads", True)
max_stored = min(max(int(provider_config.get("max_stored_payloads", 20)), 1), 100)
raw_body = await request.body()
raw_body = await _read_bounded_body(request)
# Enforce payload size limit BEFORE parsing JSON
if len(raw_body) > 1_000_000:
raise HTTPException(status_code=413, detail="Payload too large (max 1 MB)")
# Bounded read above already enforces the size cap; no need to re-check.
if not _verify_generic_webhook_auth(provider_config, request, raw_body):
raise HTTPException(status_code=403, detail="Authentication failed")
safe_headers = _filter_headers(dict(request.headers))
# Parse JSON payload
# Parse JSON payload from the already-bounded raw_body (request.body()
# has been consumed, so request.json() is no longer usable here).
try:
payload = await request.json()
payload = json.loads(raw_body.decode("utf-8"))
if not isinstance(payload, dict):
raise ValueError("Payload must be a JSON object")
except (json.JSONDecodeError, ValueError):
except (UnicodeDecodeError, json.JSONDecodeError, ValueError):
if store_payloads:
async with AsyncSession(get_engine()) as log_session:
await _save_webhook_log(