fix: production-readiness hardening — security, perf, a11y, observability

Security
- Default scripts_management, callbacks_management, links_management, and media_folders_management to False so a leaked token cannot escalate to RCE through the admin CRUD endpoints.
- TokenSpec + scope hierarchy (read | control | admin); legacy bare-string api_tokens entries are promoted to admin for back-compat. Management endpoints now require the admin scope.
- WebSocket subprotocol auth (Sec-WebSocket-Protocol: media-server.token.<T>) is preferred over the ?token= query parameter, so the token no longer lands in URLs, browser history, or Referer headers; the query fallback is retained for back-compat with the HA integration.
- Origin allow-list check on the WS endpoint (CSWSH defence).
- In-process token-bucket rate limiter: 5/min for failed auths, 10/min for /api/scripts/execute and /api/callbacks/execute.
- shell=False subprocess path (command split with shlex.split), plus a per-parameter regex `pattern` in ScriptParameterConfig to harden shell=true scripts against parameter injection via Windows cmd.exe env-var expansion.
- CSP gains form-action, worker-src, and manifest-src directives.
- Refuse cors_origins=["*"] at startup; strip token=... from uvicorn access logs; validate the Gitea release tag against a strict SemVer regex.
- rel="noopener noreferrer" plus referrerpolicy="no-referrer" on every outbound link.
- icacls hardening of config.yaml on Windows (current user + SYSTEM + Administrators only); 0600 is still enforced on POSIX.
- The WS volume handler clamps its input and never drops the socket on bad messages.

Performance
- Album-art read in windows_media gated by track key; it previously decoded the WinRT thumbnail twice per second regardless of track changes.
- /api/media/artwork returns a content-derived ETag + Cache-Control, so the browser sends If-None-Match and gets 304s on track repeats.
- Foreground-service ctypes argtypes hoisted to one-time module init (previously re-declared ~14 prototypes per probe).
- display_service _static_cache keyed by (edid_hash, ...) tuple, with eviction of disappeared monitors; fixes stale capabilities on hot-plug swaps where the new topology has the same monitor count.
- Visualizer rAF loop paused when document.hidden, resumed when visible.

Reliability / bug fixes
- Lifespan rewritten as try/yield/finally so a partial-startup failure cannot orphan background tasks or executors.
- _run_callback in routes/media.py keeps a strong task ref (GC-safe) and uses the dedicated callback executor instead of the default pool.
- macos_media.set_volume() no longer always returns True.
- TrayManager._restart_requested is initialised in __init__ and set before signalling exit, so the main thread observes it correctly.
- A missing static_dir now logs a WARNING instead of silently disabling the UI.

UX / accessibility / PWA
- manifest.json theme_color and background_color match the Studio Reference base (#0E0D0B); added id and scope for PWA installability.
- ARIA attributes on mini-player icon buttons; inner SVGs marked aria-hidden.
- OS mediaSession API wired up so headset, lock-screen, and Bluetooth buttons drive play/pause/next/prev/seek and show track metadata + artwork.

Observability
- X-Request-ID middleware (accepts the upstream id if it matches a safe regex, otherwise generates a UUID4); request_id_var added to the ContextVars and included in every log line alongside the token label.
- Audit log (append-only JSONL) for every script and callback execution, including the on_play/on_pause/etc. event callbacks. Background-thread writer; queue capped; flushed in lifespan teardown.

Deployment
- proxy_headers + forwarded_allow_ips plumbed through Settings → uvicorn.Config for reverse-proxy installs.
- HTTPS support via ssl_certfile + ssl_keyfile (+ optional password); startup refuses to launch with only one of the pair set.
- Thumbnail cache moved from the project-root .cache to %LOCALAPPDATA%/media-server/cache (Windows) and $XDG_CACHE_HOME/media-server/thumbnails (POSIX).

Tests
- 35 new tests across auth scopes, rate limiter, browser path traversal (../, NUL, UNC, absolute paths), script-param validation including regex, Gitea tag whitelist, and config atomic write + POSIX perms. 47 passed / 4 skipped.
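The rate-limiter bullet gives only the budgets (5/min for failed auths, 10/min for the execute endpoints). Below is a minimal sketch of an in-process token bucket with those budgets, assuming one bucket per (bucket name, peer) pair that refills continuously over a 60-second window. `TokenBucketLimiter`, the bucket names `"auth_fail"` and `"execute"`, and the `(allowed, retry_after)` return tuple are hypothetical, loosely modelled on the `ratelimit_check("execute", get_peer(...))` call in the scripts route; they are not taken from `services/rate_limit`.

```python
from __future__ import annotations

import threading
import time
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class _Bucket:
    tokens: float   # tokens currently available
    updated: float  # clock reading at the last refill


class TokenBucketLimiter:
    """Hypothetical per-(bucket, peer) token-bucket limiter (sketch only)."""

    def __init__(self, per_minute: dict[str, int],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._per_minute = per_minute  # bucket name -> capacity per 60 s
        self._clock = clock
        self._buckets: dict[tuple[str, str], _Bucket] = {}
        self._lock = threading.Lock()  # checks may come from threadpool workers

    def check(self, bucket: str, peer: str) -> tuple[bool, float | None]:
        """Try to consume one token; return (allowed, seconds until one is free)."""
        capacity = self._per_minute[bucket]
        rate = capacity / 60.0  # tokens refilled per second
        now = self._clock()
        with self._lock:
            b = self._buckets.setdefault((bucket, peer), _Bucket(float(capacity), now))
            # Refill in proportion to elapsed time, never above capacity.
            b.tokens = min(float(capacity), b.tokens + (now - b.updated) * rate)
            b.updated = now
            if b.tokens >= 1.0:
                b.tokens -= 1.0
                return True, None
            # Time until the bucket climbs back to one whole token.
            return False, (1.0 - b.tokens) / rate


# Usage: a controllable clock makes the refill behaviour easy to see.
now = [0.0]
limiter = TokenBucketLimiter({"auth_fail": 5, "execute": 10}, clock=lambda: now[0])
burst = [limiter.check("execute", "10.0.0.7") for _ in range(11)]
```

With these budgets, an exhausted `execute` bucket reports about 6 s until the next token, which a route could turn into a `Retry-After` header. A production version would also evict idle (bucket, peer) entries so the dict cannot grow without bound.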
2026-05-22 22:25:54 +03:00
parent 450f9fe1ee
commit d131ba461c
31 changed files with 1586 additions and 204 deletions
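The scripts route in the diff below gates management on `token_has_scope(label, "admin")`. A minimal sketch of how the read | control | admin hierarchy and the legacy bare-string promotion could be modelled follows. The `TokenSpec` fields, `_SCOPE_RANK`, `normalise_tokens`, and this three-argument `token_has_scope` (which takes the token table explicitly) are assumptions for illustration, not the project's `auth` module.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed ordering: a scope implies every scope ranked below it.
_SCOPE_RANK = {"read": 0, "control": 1, "admin": 2}


@dataclass(frozen=True)
class TokenSpec:
    """Hypothetical shape of one api_tokens entry."""

    token: str
    scope: str = "read"

    def __post_init__(self) -> None:
        if self.scope not in _SCOPE_RANK:
            raise ValueError(f"unknown scope: {self.scope!r}")


def normalise_tokens(raw: dict[str, str | dict[str, str]]) -> dict[str, TokenSpec]:
    """Build label -> TokenSpec, promoting legacy bare-string entries to admin."""
    specs: dict[str, TokenSpec] = {}
    for label, entry in raw.items():
        if isinstance(entry, str):
            # Legacy form `label: "<token>"` had full access; keep that behaviour.
            specs[label] = TokenSpec(token=entry, scope="admin")
        else:
            specs[label] = TokenSpec(**entry)
    return specs


def token_has_scope(tokens: dict[str, TokenSpec], label: str, required: str) -> bool:
    """True if the token behind `label` holds `required` or a higher scope."""
    spec = tokens.get(label)
    if spec is None:
        return False  # unknown label (e.g. an "unknown" default): deny
    return _SCOPE_RANK[spec.scope] >= _SCOPE_RANK[required]


# Usage with an assumed config.yaml-style mapping.
tokens = normalise_tokens({
    "legacy": "legacy-secret",
    "dashboard": {"token": "dash-secret", "scope": "read"},
    "ha": {"token": "ha-secret", "scope": "control"},
})
```

Treating scopes as ranks keeps each check a single comparison, and a label missing from the table (such as the `"unknown"` default the route reads from `token_label_var`) is denied rather than falling through to admin.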
@@ -10,12 +10,14 @@ import time
 from concurrent.futures import ThreadPoolExecutor
 from typing import Any

-from fastapi import APIRouter, Depends, HTTPException, status
+from fastapi import APIRouter, Depends, HTTPException, Request, status
 from pydantic import BaseModel, Field

 from ..auth import verify_token
 from ..config import ScriptConfig, ScriptParameterConfig, settings
 from ..config_manager import config_manager
+from ..services.rate_limit import check as ratelimit_check
+from ..services.rate_limit import get_peer
 from ..services.websocket_manager import ws_manager

 router = APIRouter(prefix="/api/scripts", tags=["scripts"])
@@ -31,6 +33,12 @@ def shutdown_script_executor() -> None:


 def _require_scripts_management() -> None:
+    """Authorise a scripts-CRUD operation.
+
+    Two gates: the operator-level `scripts_management` flag in config.yaml,
+    AND the per-token `admin` scope check (read from request-context). Either
+    failure → 403.
+    """
    if not settings.scripts_management:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
@@ -39,6 +47,14 @@ def _require_scripts_management() -> None:
                " in config.yaml to enable."
            ),
        )
+    from ..auth import auth_enabled, token_has_scope, token_label_var
+    if auth_enabled():
+        label = token_label_var.get("unknown")
+        if not token_has_scope(label, "admin"):
+            raise HTTPException(
+                status_code=status.HTTP_403_FORBIDDEN,
+                detail=f"Token '{label}' lacks required scope: admin",
+            )


 class ScriptExecuteRequest(BaseModel):
@@ -215,6 +231,28 @@ def _validate_params(
            # string — just convert to str
            value = str(value)

+        # Optional regex constraint, validated against the *string form* of the
+        # value. This is the only practical defence for string parameters that
+        # flow into shell=true scripts via env vars: Windows cmd.exe expands
+        # `%VAR%` before it parses operators, so an embedded `&`/`|`/`%` would
+        # inject commands. Authors of shell scripts should ALWAYS set a pattern.
+        if pdef.pattern:
+            try:
+                if not re.fullmatch(pdef.pattern, str(value)):
+                    raise HTTPException(
+                        status_code=status.HTTP_400_BAD_REQUEST,
+                        detail=(
+                            f"Parameter '{pname}' value {value!r} does not match"
+                            f" required pattern: {pdef.pattern}"
+                        ),
+                    )
+            except re.error as e:
+                # Bad pattern in config — fail closed.
+                raise HTTPException(
+                    status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+                    detail=f"Parameter '{pname}' has invalid pattern: {e}",
+                ) from e
+
        env_vars[f"SCRIPT_PARAM_{pname.upper()}"] = str(value)

    return env_vars
@@ -223,6 +261,7 @@ def _validate_params(
@router.post("/execute/{script_name}")
 async def execute_script(
    script_name: str,
+    http_request: Request,
    request: ScriptExecuteRequest | None = None,
    _: str = Depends(verify_token),
 ) -> ScriptExecuteResponse:
@@ -235,6 +274,16 @@ async def execute_script(
    Returns:
        Execution result including stdout, stderr, and exit code
    """
+    # Rate-limit script execution per peer so a leaked token can't be used to
+    # spam the shell-exec endpoint.
+    allowed, retry_after = ratelimit_check("execute", get_peer(http_request))
+    if not allowed:
+        raise HTTPException(
+            status_code=429,
+            detail="Too many script executions, slow down",
+            headers={"Retry-After": str(max(1, int(retry_after or 60)))},
+        )
+
    # Check if script exists
    if script_name not in settings.scripts:
        raise HTTPException(
@@ -249,6 +298,8 @@ async def execute_script(

    logger.info(f"Executing script: {script_name}")

+    from ..services.audit_log import record_script_execution
+
    try:
        # Execute in dedicated thread pool to not block the default executor
        loop = asyncio.get_running_loop()
@@ -263,6 +314,15 @@ async def execute_script(
            ),
        )

+        record_script_execution(
+            kind="script",
+            name=script_name,
+            exit_code=result["exit_code"],
+            duration=result.get("execution_time"),
+            stdout=result.get("stdout"),
+            stderr=result.get("stderr"),
+        )
+
        return ScriptExecuteResponse(
            success=result["exit_code"] == 0,
            script=script_name,
@@ -274,6 +334,13 @@ async def execute_script(

    except Exception as e:
        logger.error(f"Script execution error: {e}")
+        record_script_execution(
+            kind="script",
+            name=script_name,
+            exit_code=None,
+            duration=None,
+            error=str(e),
+        )
        return ScriptExecuteResponse(
            success=False,
            script=script_name,
@@ -313,9 +380,21 @@ def _run_script(
    else:
        popen_kwargs["start_new_session"] = True

+    # When shell=False, the user-provided command string is split via shlex:
+    # POSIX rules elsewhere, non-POSIX mode on Windows (backslashes in paths
+    # survive, but quote characters are kept inside the resulting tokens).
+    # No shell is involved, so SCRIPT_PARAM_* env vars referenced as $FOO /
+    # %FOO% reach the process as literal text and are never expanded. Use
+    # shell=false for any script whose params come from external input.
+    if shell:
+        run_command: str | list[str] = command
+    else:
+        import shlex
+        run_command = shlex.split(command, posix=(sys.platform != "win32"))
+
    try:
        result = subprocess.run(
-            command,
+            run_command,
            shell=shell,
            cwd=working_dir,
            capture_output=True,