feat: production-readiness hardening across security, async, DB, ops
Build and Test / test-frontend (push) Successful in 9m37s
Build and Test / test-backend (push) Successful in 10m53s
Build and Test / build-image (push) Failing after 14m52s

Security
- SSRF: async DNS resolver; allow_redirects=False on all outbound clients;
  matrix homeserver_url validated on create/update/test; update_provider
  and email_bot merge incoming config and reject ***-masked secrets.
- Auth: bcrypt offloaded to asyncio.to_thread; JWT now carries iss/aud +
  leeway and rejects missing claims; setup TOCTOU closed inside a
  transaction; rate limits extended (default 600/min, 10/min on password
  change, 30/min on needs-setup); constant-time login to prevent username
  enumeration.
- Config: rejects known dev secret keys; validates CORS origin schemes,
  port range, token lifetimes.
- Webhook handlers stream-read body with a 1 MiB cap; Discord 429 retries
  bounded (3 attempts, Retry-After capped at 60 s).
- CSP + HSTS added to SecurityHeadersMiddleware.
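
A minimal sketch of the SSRF check described above, assuming a validator that resolves the host through the event loop and rejects any non-public result. The names `assert_url_allowed` and `is_allowed` are hypothetical and not taken from this commit. The sketch does not address DNS rebinding between check and connect.

```python
import asyncio
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(addr: str) -> bool:
    """Return True only for globally routable addresses."""
    # Drop an IPv6 zone id such as "%eth0" before parsing.
    return ipaddress.ip_address(addr.split("%", 1)[0]).is_global


async def assert_url_allowed(url: str) -> None:
    """Raise ValueError unless url is http(s) and resolves only to public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    loop = asyncio.get_running_loop()
    # loop.getaddrinfo performs the lookup without blocking the event loop.
    infos = await loop.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    for _family, _type, _proto, _canon, sockaddr in infos:
        if not is_public_ip(sockaddr[0]):
            raise ValueError(
                f"{parts.hostname} resolves to non-public address {sockaddr[0]}"
            )


def is_allowed(url: str) -> bool:
    """Synchronous convenience wrapper (hypothetical) for scripts and tests."""
    try:
        asyncio.run(assert_url_allowed(url))
    except ValueError:
        return False
    return True
```

Every resolved address is checked, not just the first, so a hostname with mixed public and private records is rejected.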

Async / runtime
- SQLite engine: WAL, synchronous=NORMAL, foreign_keys=ON, busy_timeout,
  pool_pre_ping, dispose on shutdown.
- Lifespan shutdown now stops scheduler before closing HTTP session and
  disposing the engine.
- Shared aiohttp session locked against concurrent first-caller races;
  core NotificationDispatcher accepts and reuses it.
- Storage and scheduled backup writes wrapped in asyncio.to_thread.
- NUT client writes bounded by asyncio.wait_for.
- Telegram poller switched from 3 s short-poll to 30 s interval + 25 s
  long-poll (~10x fewer API calls).
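
The SQLite settings listed above can be illustrated with the standard-library driver. This is a sketch only: the commit applies them to a SQLAlchemy async engine, and the 5000 ms busy timeout is an assumed value, since the message does not state one.

```python
import os
import sqlite3
import tempfile


def open_tuned(path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    """Open a SQLite connection with the pragmas named in the commit message."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a writer is active (file databases only).
    conn.execute("PRAGMA journal_mode=WAL")
    # NORMAL syncs less often than FULL.
    conn.execute("PRAGMA synchronous=NORMAL")
    # Foreign keys are off by default and must be enabled per connection.
    conn.execute("PRAGMA foreign_keys=ON")
    # Wait up to this long for a lock instead of failing immediately.
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn


def pragma(conn: sqlite3.Connection, name: str):
    """Read back the current value of a pragma."""
    return conn.execute(f"PRAGMA {name}").fetchone()[0]


def demo() -> dict:
    """Open a throwaway database file and report the effective settings."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = open_tuned(path)
    try:
        return {
            name: pragma(conn, name)
            for name in ("journal_mode", "synchronous", "foreign_keys", "busy_timeout")
        }
    finally:
        conn.close()
```

Apart from `journal_mode`, these pragmas are per-connection state, which is why an engine-level hook that runs on every new connection is the usual place for them.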

Database
- New performance-indexes migration covers every FK/owner column and
  hot-path composite (notification_tracker(provider_id, enabled);
  event_log(user_id, created_at DESC); webhook_payload_log(provider_id,
  created_at DESC); action_execution(action_id, started_at DESC)).
- New schema_version table for future upgrade gating.
- __system__ placeholder user (id=0) seeded so user_id=0 system defaults
  satisfy the newly enforced FK; filtered out of /auth/needs-setup,
  /api/users, and setup.
- list_notification_trackers rewritten to batched loads (was 1+N+N*M
  queries).
- Retention job extended to event_log, webhook_payload_log, and
  action_execution; retention days exposed as a setting.
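
The batched-load rewrite mentioned above follows a common pattern: fetch the parent rows once, then fetch all child rows with a single IN query and group them in memory. A minimal sketch with sqlite3, assuming a `name` column and a simplified schema (the table and link-column names come from the index list in this commit; everything else is hypothetical):

```python
import sqlite3
from collections import defaultdict


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE notification_tracker (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT);
        CREATE TABLE notification_tracker_target (notification_tracker_id INTEGER, target_id INTEGER);
        INSERT INTO notification_tracker VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
        INSERT INTO notification_tracker_target VALUES (1, 10), (1, 11), (3, 12);
        """
    )
    return conn


def list_trackers_batched(conn: sqlite3.Connection, user_id: int) -> list[dict]:
    """Return trackers with their target ids using two queries in total."""
    trackers = conn.execute(
        "SELECT id, name FROM notification_tracker WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    if not trackers:
        return []
    ids = [row[0] for row in trackers]
    # One IN query replaces one query per tracker. Very large id lists would
    # need chunking to stay under SQLite's bound-parameter limit.
    placeholders = ",".join("?" * len(ids))
    links = conn.execute(
        "SELECT notification_tracker_id, target_id "
        "FROM notification_tracker_target "
        f"WHERE notification_tracker_id IN ({placeholders}) "
        "ORDER BY target_id",
        ids,
    ).fetchall()
    by_tracker: dict[int, list[int]] = defaultdict(list)
    for tracker_id, target_id in links:
        by_tracker[tracker_id].append(target_id)
    return [
        {"id": tid, "name": name, "target_ids": by_tracker[tid]}
        for tid, name in trackers
    ]
```

The query count stays at two regardless of how many trackers the user owns.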

Scheduler
- AsyncIOScheduler job_defaults: coalesce, misfire_grace_time=300,
  max_instances=1.

Ops
- uvicorn runs with proxy_headers, forwarded_allow_ips,
  timeout_graceful_shutdown; access log suppressed in non-debug.
- FastAPI version string now reads from importlib.metadata.
- New /api/ready endpoint separate from /api/health.
- docker-compose drops the ALLOW_PRIVATE_URLS=1 default, adds mem/cpu/pid
  limits, read_only + tmpfs, cap_drop:ALL, no-new-privileges; healthcheck
  targets /api/ready.
- CI now runs on push/PR with backend pytest, frontend svelte-check +
  build, and an image build that is not pushed; release workflow gated on
  tests, publishes immutable sha-<commit> image tag, adds Trivy scan.

Tests
- New packages/server/tests/ with 29 passing tests: config validation,
  JWT round-trip + aud/alg=none rejection, SSRF scheme and private-range
  enforcement (sync + async), Discord bounded retry, and a lifespan-level
  /api/health + /api/ready smoke check.
- Renamed the misnamed services/test_dispatch.py to manual_dispatch.py so
  pytest never auto-collects production code.
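
The bounded Discord retry exercised by the tests above could look roughly like the following. This is a sketch, assuming "3 attempts" means three sends in total and that a missing or malformed Retry-After falls back to one second. `send_with_retry` and `capped_delay` are hypothetical names.

```python
import asyncio
import math
from typing import Awaitable, Callable, Optional, Tuple

MAX_ATTEMPTS = 3          # total sends, including the first one (assumption)
MAX_RETRY_AFTER = 60.0    # seconds; cap from the commit message
DEFAULT_DELAY = 1.0       # assumed fallback when the header is unusable

SendFn = Callable[[], Awaitable[Tuple[int, Optional[str]]]]


def capped_delay(retry_after: Optional[str]) -> float:
    """Parse a Retry-After value and clamp it to [0, MAX_RETRY_AFTER]."""
    try:
        value = float(retry_after) if retry_after is not None else DEFAULT_DELAY
    except ValueError:
        value = DEFAULT_DELAY
    if math.isnan(value):
        value = DEFAULT_DELAY
    return min(max(value, 0.0), MAX_RETRY_AFTER)


async def send_with_retry(send: SendFn, sleep=asyncio.sleep) -> int:
    """Call send() until it stops returning 429 or the attempt budget is spent."""
    status = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, retry_after = await send()
        if status != 429:
            return status
        if attempt < MAX_ATTEMPTS:
            await sleep(capped_delay(retry_after))
    return status


async def demo() -> tuple:
    """Drive the loop with a server that always rate-limits for an hour."""
    calls = 0
    delays: list = []

    async def always_429():
        nonlocal calls
        calls += 1
        return 429, "3600"

    async def fake_sleep(seconds: float) -> None:
        delays.append(seconds)

    status = await send_with_retry(always_429, sleep=fake_sleep)
    return status, calls, delays
```

Injecting `sleep` keeps a test of the cap instant: `asyncio.run(demo())` never actually waits.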

Frontend
- /login now redirects already-authenticated users to / and shows a
  distinct 'backend unreachable' banner (en/ru) when /auth/needs-setup
  fails.
2026-04-23 19:44:56 +03:00
parent f50d465c0e
commit 920920bc67
44 changed files with 1426 additions and 186 deletions
@@ -1282,3 +1282,141 @@ async def migrate_user_token_version(engine: AsyncEngine) -> None:
text("ALTER TABLE user ADD COLUMN token_version INTEGER NOT NULL DEFAULT 1")
)
logger.info("Added token_version column to user table")
# ---------------------------------------------------------------------------
# Performance indexes — covers every FK / owner column the list endpoints
# and the webhook hot-path filter on. All use CREATE INDEX IF NOT EXISTS so
# they are safe to re-run on every boot.
# ---------------------------------------------------------------------------
_INDEXES: list[tuple[str, str, str]] = [
    # (index_name, table, columns)
    ("ix_service_provider_user_id", "service_provider", "user_id"),
    ("ix_telegram_bot_user_id", "telegram_bot", "user_id"),
    ("ix_matrix_bot_user_id", "matrix_bot", "user_id"),
    ("ix_email_bot_user_id", "email_bot", "user_id"),
    ("ix_telegram_chat_bot_id", "telegram_chat", "bot_id"),
    ("ix_tracking_config_user_id", "tracking_config", "user_id"),
    ("ix_tracking_config_provider_type", "tracking_config", "provider_type"),
    ("ix_notification_target_user_id", "notification_target", "user_id"),
    ("ix_notification_target_type", "notification_target", "type"),
    ("ix_notification_tracker_user_id", "notification_tracker", "user_id"),
    ("ix_notification_tracker_provider_id", "notification_tracker", "provider_id"),
    # Composite for the webhook hot path: WHERE provider_id = ? AND enabled = true
    (
        "ix_notification_tracker_provider_enabled",
        "notification_tracker",
        "provider_id, enabled",
    ),
    ("ix_command_config_user_id", "command_config", "user_id"),
    ("ix_command_template_config_user_id", "command_template_config", "user_id"),
    ("ix_command_tracker_user_id", "command_tracker", "user_id"),
    ("ix_command_tracker_provider_id", "command_tracker", "provider_id"),
    ("ix_action_user_id", "action", "user_id"),
    ("ix_action_provider_id", "action", "provider_id"),
    # Dashboard: SELECT event_log WHERE user_id = ? ORDER BY created_at DESC
    ("ix_event_log_user_created", "event_log", "user_id, created_at DESC"),
    ("ix_event_log_provider_id", "event_log", "provider_id"),
    ("ix_event_log_notification_tracker_id", "event_log", "notification_tracker_id"),
    ("ix_event_log_action_id", "event_log", "action_id"),
    # Webhook log hot path: WHERE provider_id = ? ORDER BY created_at DESC
    (
        "ix_webhook_payload_log_provider_created",
        "webhook_payload_log",
        "provider_id, created_at DESC",
    ),
    # Notification tracker join tables
    (
        "ix_notification_tracker_target_notification_tracker_id",
        "notification_tracker_target",
        "notification_tracker_id",
    ),
    (
        "ix_notification_tracker_target_target_id",
        "notification_tracker_target",
        "target_id",
    ),
    ("ix_target_receiver_target_id", "target_receiver", "target_id"),
    ("ix_template_slot_config_id", "template_slot", "config_id"),
    ("ix_command_template_slot_config_id", "command_template_slot", "config_id"),
    ("ix_action_rule_action_id", "action_rule", "action_id"),
    ("ix_action_execution_action_started", "action_execution", "action_id, started_at DESC"),
]


async def migrate_performance_indexes(engine: AsyncEngine) -> None:
    """Create missing performance indexes on hot query paths.

    Every index is created with IF NOT EXISTS so the migration is safe to
    replay on every boot. We only create the index when the table exists;
    early boots before other migrations land would otherwise raise.
    """
    async with engine.begin() as conn:
        for name, table, columns in _INDEXES:
            _assert_ident(name, "index")
            _assert_ident(table, "table")
            # Columns list is a trusted literal constructed above, never user input.
            if not await _has_table(conn, table):
                continue
            try:
                await conn.execute(
                    text(f"CREATE INDEX IF NOT EXISTS {name} ON {table} ({columns})")
                )
            except Exception:  # pragma: no cover - log and continue
                logger.warning(
                    "Failed to create index %s on %s(%s)",
                    name, table, columns, exc_info=True,
                )
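
# The replay-safe behaviour of the migration above can be checked in isolation
# with sqlite3. This is a sketch only: `assert_ident` and `has_table` below are
# hypothetical stand-ins for the `_assert_ident` and `_has_table` helpers, whose
# real bodies are not shown in this hunk.

```python
import re
import sqlite3

_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

DEMO_INDEXES = [
    ("ix_event_log_user_created", "event_log", "user_id, created_at DESC"),
    # This table does not exist in the demo database and is skipped.
    ("ix_action_user_id", "action", "user_id"),
]


def assert_ident(value: str, kind: str) -> None:
    """Reject anything that is not a plain SQL identifier."""
    if not _IDENT_RE.fullmatch(value):
        raise ValueError(f"unsafe {kind} identifier: {value!r}")


def has_table(conn: sqlite3.Connection, table: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row is not None


def ensure_indexes(conn: sqlite3.Connection, indexes) -> list[str]:
    """Create each index if its table exists; return the names that were ensured."""
    ensured = []
    for name, table, columns in indexes:
        assert_ident(name, "index")
        assert_ident(table, "table")
        if not has_table(conn, table):
            continue
        conn.execute(f"CREATE INDEX IF NOT EXISTS {name} ON {table} ({columns})")
        ensured.append(name)
    return ensured


def index_names(conn: sqlite3.Connection, table: str) -> list[str]:
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ? ORDER BY name",
        (table,),
    ).fetchall()
    return [row[0] for row in rows]


def plan_details(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
```

# Running `ensure_indexes` twice on the same connection leaves exactly one
# index on `event_log`, and `plan_details` reports whether the dashboard query
# picks it up.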
# ---------------------------------------------------------------------------
# Schema version tracking — lightweight alternative to Alembic while the
# hand-rolled idempotent migrations remain the source of truth. Gives
# operators a single-row answer to "what schema is this DB at" and lets
# future upgrades short-circuit migrations that already ran.
# ---------------------------------------------------------------------------
CURRENT_SCHEMA_VERSION = 1


async def migrate_schema_version(engine: AsyncEngine) -> None:
    """Create schema_version table and bump it to CURRENT_SCHEMA_VERSION."""
    from datetime import datetime, timezone

    async with engine.begin() as conn:
        # Single-row table: the CHECK constraint pins the only row to id = 1.
        await conn.execute(
            text(
                "CREATE TABLE IF NOT EXISTS schema_version ("
                " id INTEGER PRIMARY KEY CHECK (id = 1),"
                " version INTEGER NOT NULL,"
                " applied_at TEXT NOT NULL"
                ")"
            )
        )
        result = await conn.execute(
            text("SELECT version FROM schema_version WHERE id = 1")
        )
        row = result.fetchone()
        now = datetime.now(timezone.utc).isoformat()
        if row is None:
            # First boot with version tracking: record the current version.
            await conn.execute(
                text(
                    "INSERT INTO schema_version (id, version, applied_at) "
                    "VALUES (1, :v, :t)"
                ),
                {"v": CURRENT_SCHEMA_VERSION, "t": now},
            )
            logger.info("Initialized schema_version at %d", CURRENT_SCHEMA_VERSION)
        elif int(row[0]) < CURRENT_SCHEMA_VERSION:
            # Only ever move forward; an equal or newer stored version is left alone.
            await conn.execute(
                text(
                    "UPDATE schema_version SET version = :v, applied_at = :t "
                    "WHERE id = 1"
                ),
                {"v": CURRENT_SCHEMA_VERSION, "t": now},
            )
            logger.info(
                "Bumped schema_version from %s to %d",
                row[0], CURRENT_SCHEMA_VERSION,
            )
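
# The commit introduces schema_version for "future upgrade gating" but does not
# yet gate anything on it. One possible shape for that gating, sketched with
# sqlite3: `run_pending`, `DEMO_STEPS`, and the other helpers are hypothetical
# and not part of this commit.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Dict

Step = Callable[[sqlite3.Connection], None]

# Hypothetical numbered upgrade steps; each runs at most once per database.
DEMO_STEPS: Dict[int, Step] = {
    1: lambda conn: conn.execute("CREATE TABLE a (x INTEGER)"),
    2: lambda conn: conn.execute("CREATE TABLE b (x INTEGER)"),
}


def read_schema_version(conn: sqlite3.Connection) -> int:
    """Return the stored version, or 0 when the table or row is missing."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'schema_version'"
    ).fetchone()
    if exists is None:
        return 0
    row = conn.execute("SELECT version FROM schema_version WHERE id = 1").fetchone()
    return int(row[0]) if row else 0


def record_schema_version(conn: sqlite3.Connection, version: int) -> None:
    """Insert or advance the single version row; never move it backwards."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version ("
        " id INTEGER PRIMARY KEY CHECK (id = 1),"
        " version INTEGER NOT NULL,"
        " applied_at TEXT NOT NULL)"
    )
    now = datetime.now(timezone.utc).isoformat()
    row = conn.execute("SELECT version FROM schema_version WHERE id = 1").fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO schema_version (id, version, applied_at) VALUES (1, ?, ?)",
            (version, now),
        )
    elif int(row[0]) < version:
        conn.execute(
            "UPDATE schema_version SET version = ?, applied_at = ? WHERE id = 1",
            (version, now),
        )
    conn.commit()


def run_pending(conn: sqlite3.Connection, steps: Dict[int, Step]) -> list:
    """Run only the steps newer than the stored version, in ascending order."""
    current = read_schema_version(conn)
    ran = []
    for version in sorted(steps):
        if version <= current:
            continue  # already applied on an earlier boot
        steps[version](conn)
        record_schema_version(conn, version)
        ran.append(version)
    return ran


def second_row_rejected(conn: sqlite3.Connection) -> bool:
    """Show that CHECK (id = 1) keeps the table single-row."""
    try:
        conn.execute(
            "INSERT INTO schema_version (id, version, applied_at) VALUES (2, 99, 'x')"
        )
    except sqlite3.IntegrityError:
        return True
    conn.rollback()
    return False
```

# A second call to `run_pending` with the same steps is a no-op, which is the
# short-circuit behaviour the schema_version comment block describes as the goal.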