fix: production-readiness hardening — security, perf, a11y, observability

Security
- Default scripts_management, callbacks_management, links_management, and media_folders_management to False so a leaked token cannot escalate to RCE through the admin CRUD endpoints.
- TokenSpec + scope hierarchy (read | control | admin); legacy bare-string api_tokens entries are promoted to admin for back-compat. Management endpoints now require admin scope.
- WebSocket subprotocol auth (Sec-WebSocket-Protocol: media-server.token.<T>) preferred over the ?token= query so the token no longer lands in URL/history/Referer; query fallback retained for HA integration back-compat.
- Origin allow-list check on the WS endpoint (CSWSH defence).
- In-process token-bucket rate limiter: 5/min for failed auths, 10/min for /api/scripts/execute and /api/callbacks/execute.
- shell=False subprocess path (shlex.split) + per-parameter regex `pattern` in ScriptParameterConfig to harden shell=true scripts against parameter injection (Windows cmd.exe env-var expansion).
- CSP gains form-action, worker-src, manifest-src directives.
- Refuse cors_origins=["*"] at startup; strip token=... from uvicorn access logs; validate the Gitea release tag against a strict SemVer regex.
- noopener noreferrer + no-referrer referrerpolicy on every outbound link.
- icacls hardening of config.yaml on Windows (current user + SYSTEM + Administrators only); 0600 still enforced on POSIX.
- WS volume handler clamps input and never drops the socket on bad messages.

Performance
- Album-art read in windows_media gated by track key; it was decoding the WinRT thumbnail twice per second regardless of track changes.
- /api/media/artwork returns a content-derived ETag + Cache-Control so the browser sends If-None-Match and gets 304s on track repeats.
- Foreground-service ctypes argtypes hoisted to one-time module init (was re-declaring ~14 prototypes per probe).
- display_service _static_cache keyed by (edid_hash, ...) tuple with eviction of disappeared monitors; fixes stale capabilities on hot-plug swaps where the new topology has the same monitor count.
- Visualizer rAF loop paused on document.hidden, resumed on visible.

Reliability / bug fixes
- Lifespan rewritten as try/yield/finally so a partial-startup failure cannot orphan background tasks or executors.
- _run_callback in routes/media.py keeps a strong task ref (GC-safe) and uses the dedicated callback executor instead of the default pool.
- macos_media.set_volume() no longer always returns True.
- TrayManager._restart_requested initialised in __init__; set before signalling exit so the main thread observes it correctly.
- Missing static_dir now logs a WARNING instead of silently disabling the UI.

UX / accessibility / PWA
- manifest.json theme_color and background_color match the Studio Reference base (#0E0D0B); added id and scope for PWA installability.
- ARIA on mini-player icon buttons; inner SVGs marked aria-hidden.
- OS mediaSession API wired so headset / lockscreen / Bluetooth buttons drive play/pause/next/prev/seek and show track metadata + artwork.

Observability
- X-Request-ID middleware (accepts the upstream id if it matches a safe regex, otherwise mints a fresh id from the first 16 hex chars of a UUID4); request_id_var added to the ContextVars and included in every log line alongside the token label.
- Audit log (append-only JSONL) for every script + callback execution, including the on_play/on_pause/etc. event callbacks. Background-thread writer; queue capped; flushed in lifespan teardown.

Deployment
- proxy_headers + forwarded_allow_ips plumbed through Settings → uvicorn.Config for reverse-proxy installs.
- HTTPS support via ssl_certfile + ssl_keyfile (+ optional password); startup refuses to launch with only one of the pair set.
- Thumbnail cache moved from project-root .cache to %LOCALAPPDATA%/media-server/cache (Windows) and $XDG_CACHE_HOME/media-server/thumbnails (POSIX).

Tests
- 35 new tests across auth scopes, rate limiter, browser path traversal (../, NUL, UNC, absolute paths), script-param validation incl. regex, Gitea tag whitelist, config atomic write + POSIX perms. 47 passed / 4 skipped.
2026-05-22 22:25:54 +03:00
parent 450f9fe1ee
commit d131ba461c
31 changed files with 1586 additions and 204 deletions
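The scope hierarchy itself lives outside this diff; only `spec.token` and `spec.scopes` are visible in `main()`. Below is a minimal sketch of how a ranked read < control < admin check and the legacy bare-string promotion could fit together, assuming scopes are strictly ordered and that non-legacy entries are dicts with `token`/`scopes` keys. `SCOPE_RANK`, `has_scope` and `promote` are hypothetical names, and this `TokenSpec` is an illustrative stand-in, not the real class.

```python
from dataclasses import dataclass

# Assumed ordering: each scope implies every scope ranked below it.
SCOPE_RANK = {"read": 0, "control": 1, "admin": 2}


@dataclass(frozen=True)
class TokenSpec:
    """Hypothetical stand-in for the project's TokenSpec."""

    token: str
    scopes: tuple[str, ...] = ("read",)  # assumed default

    def has_scope(self, required: str) -> bool:
        # Satisfied if any granted scope ranks at least as high as `required`;
        # unrecognised granted scopes rank -1 and never satisfy anything.
        need = SCOPE_RANK[required]
        return any(SCOPE_RANK.get(s, -1) >= need for s in self.scopes)


def promote(entry: "str | dict | TokenSpec") -> TokenSpec:
    """Normalise one api_tokens value; bare strings are legacy admin tokens."""
    if isinstance(entry, TokenSpec):
        return entry
    if isinstance(entry, str):
        # Back-compat: old configs mapped label -> token with no scopes.
        return TokenSpec(token=entry, scopes=("admin",))
    # Assumed new-style shape: {"token": "...", "scopes": [...]}.
    return TokenSpec(token=entry["token"], scopes=tuple(entry.get("scopes", ("read",))))
```

With ranked scopes, a management route only needs `has_scope("admin")`, and a read-only dashboard token can never pass that check.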
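The parameter-injection fix is described in the message but not shown in this diff. Below is a minimal sketch of the idea, assuming each parameter's `pattern` must match the whole value (`re.fullmatch`) and that command templates use `str.format`-style `{name}` placeholders. `build_argv` and that template syntax are hypothetical. The template is split with `shlex.split` *before* substitution, so each value becomes exactly one argv element for `subprocess.run(argv, shell=False)` and is never re-parsed by a shell.

```python
import re
import shlex


def build_argv(template: str, params: dict[str, str], patterns: dict[str, str]) -> list[str]:
    """Validate parameters, then build an argv list for shell=False execution."""
    for name, value in params.items():
        pattern = patterns.get(name)
        # Whole-value match: a pattern like [A-Za-z0-9.-]+ rejects "a&b" outright.
        if pattern is not None and not re.fullmatch(pattern, value):
            raise ValueError(f"parameter {name!r} does not match {pattern!r}")
    # Split first, then substitute per token: spaces, quotes or ';' inside a
    # value stay in one argv element. Unknown placeholders raise KeyError.
    return [tok.format_map(params) for tok in shlex.split(template)]
```

Parameters without a `pattern` pass through unchecked in this sketch. Also, `shlex.split` uses POSIX rules by default, so backslashes in an unquoted Windows path such as `C:\Tools\x.exe` are consumed as escapes; such templates need forward slashes or quoting.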
@@ -15,7 +15,7 @@ from fastapi.responses import FileResponse
 from fastapi.staticfiles import StaticFiles

 from . import __version__
-from .auth import get_token_label, token_label_var
+from .auth import get_token_label, request_id_var, token_label_var
 from .config import generate_default_config, get_config_dir, settings
 from .routes import (
    audio_router,
@@ -33,10 +33,34 @@ from .services.websocket_manager import ws_manager


 class TokenLabelFilter(logging.Filter):
-    """Add token label to log records."""
+    """Add token label + request_id to log records."""

    def filter(self, record):
        record.token_label = token_label_var.get("unknown")
+        record.request_id = request_id_var.get("-")
+        return True
+
+
+class _StripTokenQueryFilter(logging.Filter):
+    """Strip `token=...` from query strings before they hit the access log.
+
+    uvicorn's default access log format includes the full request line, so
+    `/api/media/artwork?token=SECRET` would otherwise be persisted verbatim
+    in stdout/journald/file sinks.
+    """
+
+    import re as _re
+
+    _TOKEN_RE = _re.compile(r"([?&])token=[^&\s\"']+")
+
+    def filter(self, record):  # type: ignore[override]
+        if isinstance(record.args, tuple):
+            record.args = tuple(
+                self._TOKEN_RE.sub(r"\1token=REDACTED", a) if isinstance(a, str) else a
+                for a in record.args
+            )
+        if isinstance(record.msg, str) and "token=" in record.msg:
+            record.msg = self._TOKEN_RE.sub(r"\1token=REDACTED", record.msg)
        return True


@@ -49,17 +73,34 @@ def setup_logging():

    logging.basicConfig(
        level=getattr(logging, settings.log_level.upper()),
-        format="%(asctime)s - %(name)s - [%(token_label)s] - %(levelname)s - %(message)s",
+        format=(
+            "%(asctime)s - %(name)s - [%(token_label)s] [%(request_id)s]"
+            " - %(levelname)s - %(message)s"
+        ),
        handlers=[handler],
    )

    # Suppress noisy third-party loggers
    logging.getLogger("screen_brightness_control").setLevel(logging.ERROR)

+    # Make sure the uvicorn access log never persists tokens leaked into the
+    # query string (the artwork + WS endpoints accept `?token=` for browser
+    # compatibility — see verify_token_or_query).
+    strip_filter = _StripTokenQueryFilter()
+    for name in ("uvicorn.access", "uvicorn"):
+        logging.getLogger(name).addFilter(strip_filter)
+

@asynccontextmanager
 async def lifespan(app: FastAPI):
-    """Application lifespan handler."""
+    """Application lifespan handler.
+
+    All long-lived resources started during startup are kept in local refs and
+    torn down in a `finally:` so a partial-startup failure cannot orphan tasks
+    or thread pools.
+    """
+    import asyncio
+
    setup_logging()
    logger = logging.getLogger(__name__)
    logger.info(f"Media Server starting on {settings.host}:{settings.port}")
@@ -71,92 +112,125 @@ async def lifespan(app: FastAPI):
    else:
        logger.warning("No API tokens configured — authentication is DISABLED")

-    # Start WebSocket status monitor
-    controller = get_media_controller()
-    await ws_manager.start_status_monitor(controller.get_status)
-    logger.info("WebSocket status monitor started")
-
-    # Start update checker
    update_checker = None
-    if settings.update_check_enabled:
-        from .services.gitea_release_provider import GiteaReleaseProvider
-        from .services.update_checker import UpdateChecker
-
-        provider = GiteaReleaseProvider()
-        update_checker = UpdateChecker(provider, __version__)
-        await update_checker.start(settings.update_check_interval)
-        # Store globally so health endpoint can access cached result
-        app.state.update_checker = update_checker
-
-    # Schedule periodic thumbnail cache cleanup so the 500 MB cap is actually
-    # enforced. Runs once at startup and then hourly until shutdown.
-    from .services.thumbnail_service import ThumbnailService
-
-    async def _thumbnail_cleanup_loop() -> None:
-        while True:
-            try:
-                await asyncio.to_thread(ThumbnailService.cleanup_cache)
-            except Exception as e:
-                logger.warning("Thumbnail cache cleanup failed: %s", e)
-            try:
-                await asyncio.sleep(3600)
-            except asyncio.CancelledError:
-                break
-
-    import asyncio
-    cleanup_task = asyncio.create_task(_thumbnail_cleanup_loop())
-
-    # Register audio visualizer (capture starts on-demand when clients subscribe)
+    cleanup_task: asyncio.Task | None = None
    analyzer = None
-    if settings.visualizer_enabled:
-        from .services.audio_analyzer import get_audio_analyzer
+    status_monitor_started = False

-        analyzer = get_audio_analyzer(
-            num_bins=settings.visualizer_bins,
-            target_fps=settings.visualizer_fps,
-            device_name=settings.visualizer_device,
-        )
-        if analyzer.available:
-            await ws_manager.start_audio_monitor(analyzer)
-            logger.info("Audio visualizer available (capture on-demand)")
-        else:
-            logger.info("Audio visualizer unavailable (install soundcard + numpy)")
-
-    yield
-
-    # Stop update checker
-    if update_checker is not None:
-        await update_checker.stop()
-
-    # Cancel periodic thumbnail cleanup
-    cleanup_task.cancel()
    try:
-        await cleanup_task
-    except asyncio.CancelledError:
-        pass
+        # Start WebSocket status monitor
+        controller = get_media_controller()
+        await ws_manager.start_status_monitor(controller.get_status)
+        status_monitor_started = True
+        logger.info("WebSocket status monitor started")

-    # Stop audio visualizer
-    await ws_manager.stop_audio_monitor()
-    if analyzer and analyzer.running:
-        analyzer.stop()
+        # Start update checker
+        if settings.update_check_enabled:
+            from .services.gitea_release_provider import GiteaReleaseProvider
+            from .services.update_checker import UpdateChecker

-    # Stop WebSocket status monitor
-    await ws_manager.stop_status_monitor()
+            provider = GiteaReleaseProvider()
+            update_checker = UpdateChecker(provider, __version__)
+            await update_checker.start(settings.update_check_interval)
+            # Store globally so health endpoint can access cached result
+            app.state.update_checker = update_checker

-    # Shut down dedicated thread pools so pending scripts don't leak threads
-    from .routes.callbacks import shutdown_callback_executor
-    from .routes.scripts import shutdown_script_executor
+        # Schedule periodic thumbnail cache cleanup so the 500 MB cap is actually
+        # enforced. Runs once at startup and then hourly until shutdown.
+        from .services.thumbnail_service import ThumbnailService

-    shutdown_script_executor()
-    shutdown_callback_executor()
+        async def _thumbnail_cleanup_loop() -> None:
+            while True:
+                try:
+                    await asyncio.to_thread(ThumbnailService.cleanup_cache)
+                except Exception as e:
+                    logger.warning("Thumbnail cache cleanup failed: %s", e)
+                try:
+                    await asyncio.sleep(3600)
+                except asyncio.CancelledError:
+                    break

-    # Clean up platform-specific resources
-    import platform as _platform
-    if _platform.system() == "Windows":
-        from .services.windows_media import shutdown_executor
-        shutdown_executor()
+        cleanup_task = asyncio.create_task(_thumbnail_cleanup_loop())

-    logger.info("Media Server shutting down")
+        # Register audio visualizer (capture starts on-demand when clients subscribe)
+        if settings.visualizer_enabled:
+            from .services.audio_analyzer import get_audio_analyzer
+
+            analyzer = get_audio_analyzer(
+                num_bins=settings.visualizer_bins,
+                target_fps=settings.visualizer_fps,
+                device_name=settings.visualizer_device,
+            )
+            if analyzer.available:
+                await ws_manager.start_audio_monitor(analyzer)
+                logger.info("Audio visualizer available (capture on-demand)")
+            else:
+                logger.info("Audio visualizer unavailable (install soundcard + numpy)")
+
+        yield
+    finally:
+        # Stop update checker
+        if update_checker is not None:
+            try:
+                await update_checker.stop()
+            except Exception:
+                logger.exception("Error stopping update checker")
+
+        # Cancel periodic thumbnail cleanup
+        if cleanup_task is not None:
+            cleanup_task.cancel()
+            try:
+                await cleanup_task
+            except asyncio.CancelledError:
+                pass
+            except Exception:
+                logger.exception("Error awaiting thumbnail cleanup task")
+
+        # Stop audio visualizer
+        try:
+            await ws_manager.stop_audio_monitor()
+        except Exception:
+            logger.exception("Error stopping audio monitor")
+        if analyzer and analyzer.running:
+            try:
+                analyzer.stop()
+            except Exception:
+                logger.exception("Error stopping audio analyzer")
+
+        # Stop WebSocket status monitor
+        if status_monitor_started:
+            try:
+                await ws_manager.stop_status_monitor()
+            except Exception:
+                logger.exception("Error stopping status monitor")
+
+        # Shut down dedicated thread pools so pending scripts don't leak threads
+        try:
+            from .routes.callbacks import shutdown_callback_executor
+            from .routes.scripts import shutdown_script_executor
+
+            shutdown_script_executor()
+            shutdown_callback_executor()
+        except Exception:
+            logger.exception("Error shutting down script/callback executors")
+
+        # Flush audit log writer
+        try:
+            from .services.audit_log import shutdown_audit_log
+            shutdown_audit_log()
+        except Exception:
+            logger.exception("Error flushing audit log")
+
+        # Clean up platform-specific resources
+        import platform as _platform
+        if _platform.system() == "Windows":
+            try:
+                from .services.windows_media import shutdown_executor
+                shutdown_executor()
+            except Exception:
+                logger.exception("Error shutting down windows_media executor")
+
+        logger.info("Media Server shutting down")


 def create_app() -> FastAPI:
@@ -173,7 +247,15 @@ def create_app() -> FastAPI:

    # CORS — restrict to same-origin by default; users that integrate the API
    # from another origin (e.g. Home Assistant on a different host) can set
-    # cors_origins in config.yaml.
+    # cors_origins in config.yaml. Refuse "*" outright: combined with the
+    # admin endpoints this would let a page on any origin reach the
+    # script-execution API. If users need several origins, they can list
+    # each one explicitly.
+    if any(o.strip() == "*" for o in settings.cors_origins):
+        raise RuntimeError(
+            "cors_origins must not contain '*' — list exact origins instead. "
+            "This protects the script-execution endpoints from any-origin abuse."
+        )
    cors_origins = settings.cors_origins or [
        f"http://localhost:{settings.port}",
        f"http://127.0.0.1:{settings.port}",
@@ -186,6 +268,23 @@ def create_app() -> FastAPI:
        allow_headers=["Authorization", "Content-Type"],
    )

+    # Request correlation ID — accept upstream X-Request-ID if it's a sane
+    # ASCII id, otherwise mint a fresh id (first 16 hex chars of a UUID4).
+    # Emitted on the response so clients can quote it back in bug reports.
+    import re
+    import uuid as _uuid
+
+    _REQ_ID_RE = re.compile(r"^[A-Za-z0-9._\-]{1,128}$")
+
+    @app.middleware("http")
+    async def request_id_middleware(request: Request, call_next):
+        incoming = request.headers.get("x-request-id", "")
+        req_id = incoming if _REQ_ID_RE.match(incoming) else _uuid.uuid4().hex[:16]
+        request_id_var.set(req_id)
+        response = await call_next(request)
+        response.headers["X-Request-ID"] = req_id
+        return response
+
    # Security headers — strict CSP for the bundled UI, disallow framing, hide referrer.
    @app.middleware("http")
    async def security_headers_middleware(request: Request, call_next):
@@ -200,6 +299,9 @@ def create_app() -> FastAPI:
                "style-src 'self' 'unsafe-inline'; "
                "font-src 'self' data:; "
                "frame-ancestors 'none'; "
+                "form-action 'self'; "
+                "worker-src 'self'; "
+                "manifest-src 'self'; "
                "base-uri 'self'"
            ),
        )
@@ -208,32 +310,63 @@ def create_app() -> FastAPI:
        response.headers.setdefault("Referrer-Policy", "no-referrer")
        return response

-    # Add token logging middleware
+    # Add token logging middleware + auth-failure rate limit
+    from fastapi.responses import JSONResponse
+
+    from .services.rate_limit import check as ratelimit_check
+    from .services.rate_limit import get_peer
+
    @app.middleware("http")
    async def token_logging_middleware(request: Request, call_next):
-        """Extract token label and set in context for logging."""
+        """Extract token label, set in context, and rate-limit failed auths."""
        if not settings.api_tokens:
            token_label_var.set("anonymous")
        else:
            token_label = "unknown"
+            token_present = False
+            token_valid = False

            # Try Authorization header
            auth_header = request.headers.get("authorization", "")
            if auth_header.startswith("Bearer "):
+                token_present = True
                token = auth_header[7:]
                label = get_token_label(token)
                if label:
                    token_label = label
+                    token_valid = True

            # Try query parameter (for artwork endpoint)
            elif "token" in request.query_params:
+                token_present = True
                token = request.query_params["token"]
                label = get_token_label(token)
                if label:
                    token_label = label
+                    token_valid = True

            token_label_var.set(token_label)

+            # Brute-force gate: a peer that presents a wrong token gets
+            # 5 failures per minute before being throttled (requests with no
+            # token at all are not counted here). Static-asset requests
+            # (GET /static/*, /, /sw.js) and the docs endpoints are exempt,
+            # since they're served unauthenticated by design.
+            if token_present and not token_valid:
+                path = request.url.path
+                if not (
+                    path == "/" or path == "/sw.js"
+                    or path.startswith("/static/")
+                    or path.startswith("/docs") or path.startswith("/openapi")
+                    or path.startswith("/redoc")
+                ):
+                    allowed, retry_after = ratelimit_check("auth", get_peer(request))
+                    if not allowed:
+                        return JSONResponse(
+                            status_code=429,
+                            content={"detail": "Too many authentication failures"},
+                            headers={"Retry-After": str(int(retry_after or 60))},
+                        )
+
        response = await call_next(request)
        return response

@@ -266,6 +399,11 @@ def create_app() -> FastAPI:
        async def serve_ui():
            """Serve the Web UI."""
            return FileResponse(static_dir / "index.html")
+    else:
+        logging.getLogger(__name__).warning(
+            "static_dir not found at %s — Web UI disabled (API only)",
+            static_dir,
+        )

    return app

@@ -316,8 +454,9 @@ def main():
        print(f"Config directory: {get_config_dir()}")
        if settings.api_tokens:
            print("\nAPI Tokens:")
-            for label, token in settings.api_tokens.items():
-                print(f"  {label:20} {token}")
+            for label, spec in settings.api_tokens.items():
+                scope_str = ",".join(spec.scopes)
+                print(f"  {label:20} {spec.token}  [scopes: {scope_str}]")
        else:
            print("\nAuthentication is DISABLED (no tokens configured)")
        return
@@ -374,6 +513,27 @@ def main():

    use_tray = PYSTRAY_AVAILABLE and not args.no_tray

+    # Validate TLS pair consistency before either path so we don't fail late.
+    if bool(settings.ssl_certfile) ^ bool(settings.ssl_keyfile):
+        _fatal(
+            "ERROR: ssl_certfile and ssl_keyfile must both be set, or both unset."
+        )
+
+    def _uvicorn_kwargs() -> dict:
+        kw: dict = {
+            "host": args.host,
+            "port": args.port,
+            "log_level": settings.log_level.lower(),
+            "proxy_headers": settings.proxy_headers,
+            "forwarded_allow_ips": settings.forwarded_allow_ips,
+        }
+        if settings.ssl_certfile and settings.ssl_keyfile:
+            kw["ssl_certfile"] = settings.ssl_certfile
+            kw["ssl_keyfile"] = settings.ssl_keyfile
+            if settings.ssl_keyfile_password:
+                kw["ssl_keyfile_password"] = settings.ssl_keyfile_password
+        return kw
+
    if use_tray:
        import asyncio
        import threading
@@ -381,9 +541,7 @@ def main():
        # Run uvicorn in a background thread so tray owns the main thread message loop
        uv_config = uvicorn.Config(
            "media_server.main:app",
-            host=args.host,
-            port=args.port,
-            log_level=settings.log_level.lower(),
+            **_uvicorn_kwargs(),
        )
        server = uvicorn.Server(uv_config)

@@ -421,9 +579,8 @@ def main():
    else:
        uvicorn.run(
            "media_server.main:app",
-            host=args.host,
-            port=args.port,
            reload=False,
+            **_uvicorn_kwargs(),
        )