fix: production-readiness hardening — security, perf, a11y, observability

Security
- Default scripts_management, callbacks_management, links_management, and
  media_folders_management to False so a leaked token cannot escalate to RCE
  through admin CRUD endpoints.
- TokenSpec + scope hierarchy (read | control | admin); legacy bare-string
  api_tokens entries are promoted to admin for back-compat. Management
  endpoints now require admin scope.
- WebSocket subprotocol auth (Sec-WebSocket-Protocol: media-server.token.<T>)
  preferred over ?token= query so the token no longer lands in URL/history/
  Referer; query fallback retained for HA integration back-compat.
- Origin allow-list check on the WS endpoint (CSWSH defence).
- In-process token-bucket rate limiter: 5/min for failed auths,
  10/min for /api/scripts/execute and /api/callbacks/execute.
- shell=False subprocess path (shlex.split) + per-parameter regex `pattern`
  in ScriptParameterConfig to harden shell=true scripts against parameter
  injection (Windows cmd.exe env-var expansion).
- CSP gains form-action, worker-src, manifest-src directives.
- Refuse cors_origins=["*"] at startup; strip token=... from uvicorn access
  logs; validate Gitea release tag against strict SemVer regex.
- noopener noreferrer + no-referrer referrerpolicy on every outbound link.
- icacls hardening of config.yaml on Windows (current user + SYSTEM +
  Administrators only); 0600 still enforced on POSIX.
- WS volume handler clamps input and never drops the socket on bad messages.
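
The in-process rate limiter listed above can be pictured with a small token-bucket sketch. This is an illustration under assumptions, not the code in services/rate_limit.py: the class name `TokenBucketLimiter`, the `LIMITS` table, the `auth_fail` bucket name, and the injectable `clock` are hypothetical. Only the `check(bucket, peer)` call shape returning `(allowed, retry_after)` mirrors how the execute endpoint uses the limiter.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical limits table: bucket name -> (capacity, refill period in seconds).
LIMITS: Dict[str, Tuple[int, float]] = {
    "auth_fail": (5, 60.0),  # 5 failed auths per minute
    "execute": (10, 60.0),   # 10 executions per minute
}


class TokenBucketLimiter:
    """In-process token bucket keyed by (bucket, peer)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # (bucket, peer) -> (tokens remaining, timestamp of last update)
        self._state: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def check(self, bucket: str, peer: str) -> Tuple[bool, Optional[float]]:
        capacity, period = LIMITS[bucket]
        rate = capacity / period  # tokens regained per second
        now = self._clock()
        tokens, last = self._state.get((bucket, peer), (float(capacity), now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(float(capacity), tokens + (now - last) * rate)
        if tokens >= 1.0:
            self._state[(bucket, peer)] = (tokens - 1.0, now)
            return True, None
        self._state[(bucket, peer)] = (tokens, now)
        # Seconds until one full token is available again.
        return False, (1.0 - tokens) / rate
```

Keying on `(bucket, peer)` keeps one noisy client from exhausting the budget of another, and the returned delay can feed a `Retry-After` header.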

Performance
- Album-art read in windows_media gated by track key — was decoding the
  WinRT thumbnail twice per second regardless of track changes.
- /api/media/artwork returns content-derived ETag + Cache-Control so the
  browser sends If-None-Match and gets 304s on track repeats.
- Foreground-service ctypes argtypes hoisted to one-time module init
  (was re-declaring ~14 prototypes per probe).
- display_service _static_cache keyed by (edid_hash, ...) tuple with
  eviction of disappeared monitors — fixes stale capabilities on hot-plug
  swaps where the new topology has the same monitor count.
- Visualizer rAF loop paused on document.hidden, resumed on visible.
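
The artwork caching item above relies on a conditional request round trip. A minimal, framework-free sketch of that logic follows. The function names, the truncated SHA-256 digest, and the `Cache-Control` value are assumptions made for illustration; the commit does not state which hash or directive the endpoint uses.

```python
import hashlib
from typing import Dict, Optional, Tuple


def artwork_etag(data: bytes) -> str:
    """Strong ETag derived from the image bytes (quoted, per HTTP syntax)."""
    return '"' + hashlib.sha256(data).hexdigest()[:32] + '"'


def artwork_response(
    data: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for an artwork request."""
    etag = artwork_etag(data)
    # Assumed directive: let the browser cache but revalidate on each use.
    headers = {"ETag": etag, "Cache-Control": "private, max-age=0, must-revalidate"}
    # If-None-Match may carry several comma-separated validators or "*";
    # weak validators (W/"...") compare equal to their strong form here.
    candidates = [c.strip().removeprefix("W/") for c in (if_none_match or "").split(",")]
    if "*" in candidates or etag in candidates:
        return 304, headers, b""
    return 200, headers, data
```

Because the tag is derived from the bytes, a repeated track yields the same validator and the second request can be answered with an empty 304.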

Reliability / bug fixes
- Lifespan rewritten as try/yield/finally so a partial-startup failure
  cannot orphan background tasks or executors.
- _run_callback in routes/media.py keeps a strong task ref (GC-safe) and
  uses the dedicated callback executor instead of the default pool.
- macos_media.set_volume() no longer always returns True.
- TrayManager._restart_requested initialised in __init__; set before
  signalling exit so the main thread observes it correctly.
- Missing static_dir now logs a WARNING instead of silently disabling the UI.

UX / accessibility / PWA
- manifest.json theme_color and background_color match the Studio Reference
  base (#0E0D0B); added id and scope for PWA installability.
- ARIA on mini-player icon buttons; inner SVGs marked aria-hidden.
- OS mediaSession API wired so headset / lockscreen / Bluetooth buttons
  drive play/pause/next/prev/seek and show track metadata + artwork.

Observability
- X-Request-ID middleware (accept upstream id if it matches a safe regex,
  otherwise UUID4); request_id_var added to ContextVars and included in
  every log line alongside the token label.
- Audit log (append-only JSONL) for every script + callback execution,
  including the on_play/on_pause/etc. event callbacks. Background-thread
  writer; queue capped; flushed in lifespan teardown.
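
The request-id handling above can be sketched as follows. The exact "safe regex" is not given in the commit, so the pattern here is an assumption, as are the names `resolve_request_id` and `RequestIdFilter`; only `request_id_var` comes from the commit message.

```python
import logging
import re
import uuid
from contextvars import ContextVar
from typing import Optional

# Assumed "safe" shape: 8-64 chars of letters, digits, '.', '_' or '-'.
_SAFE_REQUEST_ID = re.compile(r"[A-Za-z0-9._-]{8,64}")

request_id_var: ContextVar[str] = ContextVar("request_id", default="-")


def resolve_request_id(upstream: Optional[str]) -> str:
    """Accept a well-formed upstream id, otherwise mint a UUID4."""
    if upstream and _SAFE_REQUEST_ID.fullmatch(upstream):
        return upstream
    return str(uuid.uuid4())


class RequestIdFilter(logging.Filter):
    """Copy the current request id onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True
```

Rejecting ids that fail the pattern keeps CR/LF and other control characters from an untrusted header out of the log stream.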

Deployment
- proxy_headers + forwarded_allow_ips plumbed through Settings →
  uvicorn.Config for reverse-proxy installs.
- HTTPS support via ssl_certfile + ssl_keyfile (+ optional password);
  startup refuses to launch with only one of the pair set.
- Thumbnail cache moved from project-root .cache to
  %LOCALAPPDATA%/media-server/cache (Windows) and
  $XDG_CACHE_HOME/media-server/thumbnails (POSIX).
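
The new cache locations above could be resolved along these lines. This is a sketch under assumptions: the function name and the fallbacks used when the environment variable is unset (`~/AppData/Local` on Windows, `~/.cache` on POSIX) are not taken from the commit.

```python
import os
import sys
from pathlib import Path
from typing import Mapping


def thumbnail_cache_dir(
    platform: str = sys.platform, env: Mapping[str, str] = os.environ
) -> Path:
    """Pick the per-user thumbnail cache directory for the given platform."""
    if platform == "win32":
        base = env.get("LOCALAPPDATA")
        # Assumed fallback when LOCALAPPDATA is unset.
        root = Path(base) if base else Path.home() / "AppData" / "Local"
        return root / "media-server" / "cache"
    base = env.get("XDG_CACHE_HOME")
    # The XDG default when the variable is unset is $HOME/.cache.
    root = Path(base) if base else Path.home() / ".cache"
    return root / "media-server" / "thumbnails"
```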

Tests
- 35 new tests across auth scopes, rate limiter, browser path traversal
  (../, NUL byte, UNC, absolute paths), script-param validation incl. regex,
  Gitea tag whitelist, config atomic write + POSIX perms.
  47 passed / 4 skipped.
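
The auth-scope cases in these tests rest on the scope hierarchy from the Security notes. A minimal sketch of that ordering follows, assuming that a higher scope implies the lower ones. The `TokenSpec` fields, the helper names, and the `read` default for mappings without an explicit scope are hypothetical; only the three scope names and the legacy-string promotion to admin come from the commit message.

```python
from dataclasses import dataclass
from typing import Union

# Scopes ordered from least to most privileged (assumed: higher implies lower).
SCOPE_RANK = {"read": 0, "control": 1, "admin": 2}


@dataclass(frozen=True)
class TokenSpec:
    token: str
    scope: str = "admin"


def parse_token_entry(entry: Union[str, dict]) -> TokenSpec:
    """Legacy bare strings become admin tokens; mappings carry a scope."""
    if isinstance(entry, str):
        return TokenSpec(token=entry, scope="admin")
    scope = entry.get("scope", "read")  # assumed least-privilege default
    if scope not in SCOPE_RANK:
        raise ValueError(f"unknown scope: {scope!r}")
    return TokenSpec(token=entry["token"], scope=scope)


def has_scope(spec: TokenSpec, required: str) -> bool:
    """True when the token's scope is at least the required one."""
    return SCOPE_RANK[spec.scope] >= SCOPE_RANK[required]
```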
2026-05-22 22:25:54 +03:00
parent 450f9fe1ee
commit d131ba461c
31 changed files with 1586 additions and 204 deletions
+81 -2
@@ -10,12 +10,14 @@ import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any
-from fastapi import APIRouter, Depends, HTTPException, status
+from fastapi import APIRouter, Depends, HTTPException, Request, status
from pydantic import BaseModel, Field
from ..auth import verify_token
from ..config import ScriptConfig, ScriptParameterConfig, settings
from ..config_manager import config_manager
from ..services.rate_limit import check as ratelimit_check
from ..services.rate_limit import get_peer
from ..services.websocket_manager import ws_manager
router = APIRouter(prefix="/api/scripts", tags=["scripts"])
@@ -31,6 +33,12 @@ def shutdown_script_executor() -> None:
def _require_scripts_management() -> None:
    """Authorise a scripts-CRUD operation.

    Two gates: the operator-level `scripts_management` flag in config.yaml,
    AND the per-token `admin` scope check (read from request-context). Either
    failure → 403.
    """
    if not settings.scripts_management:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
@@ -39,6 +47,14 @@ def _require_scripts_management() -> None:
                " in config.yaml to enable."
            ),
        )
    from ..auth import auth_enabled, token_has_scope, token_label_var
    if auth_enabled():
        label = token_label_var.get("unknown")
        if not token_has_scope(label, "admin"):
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
                detail=f"Token '{label}' lacks required scope: admin",
            )


class ScriptExecuteRequest(BaseModel):
@@ -215,6 +231,28 @@ def _validate_params(
            # string: just convert to str
            value = str(value)

        # Optional regex constraint, validated against the *string form* of the
        # value. This is the only practical defence for string parameters that
        # flow into shell=true scripts via env vars (Windows cmd.exe expands
        # `%VAR%` before it parses the line, so embedded `&`/`|`/`%` would
        # inject commands). Authors of shell scripts should ALWAYS define a pattern.
        if pdef.pattern:
            try:
                if not re.fullmatch(pdef.pattern, str(value)):
                    raise HTTPException(
                        status_code=status.HTTP_400_BAD_REQUEST,
                        detail=(
                            f"Parameter '{pname}' value {value!r} does not match"
                            f" required pattern: {pdef.pattern}"
                        ),
                    )
            except re.error as e:
                # Bad pattern in config: fail closed.
                raise HTTPException(
                    status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                    detail=f"Parameter '{pname}' has invalid pattern: {e}",
                ) from e

        env_vars[f"SCRIPT_PARAM_{pname.upper()}"] = str(value)

    return env_vars
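
The choice of `re.fullmatch` in the hunk above matters for the injection case the comment describes. A small standalone sketch of the difference, using a hypothetical helper `matches_pattern` and an example pattern that is not taken from any real script config:

```python
import re


def matches_pattern(pattern: str, value: str) -> bool:
    """True only if the whole value matches; re.error propagates for a bad pattern."""
    return re.fullmatch(pattern, value) is not None


# Example allow-list pattern (an assumption for illustration).
safe = r"[A-Za-z0-9_-]{1,32}"

print(matches_pattern(safe, "living_room"))   # whole value is allowed
print(matches_pattern(safe, "x & calc.exe"))  # rejected: extra characters
# re.match only anchors the start, so a trailing payload would slip through.
print(re.match(safe, "x & calc.exe") is not None)
```

With `re.match`, the leading `x` alone would satisfy the pattern and the `& calc.exe` suffix would reach the environment variable unchecked.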
@@ -223,6 +261,7 @@ def _validate_params(
@router.post("/execute/{script_name}")
async def execute_script(
    script_name: str,
    http_request: Request,
    request: ScriptExecuteRequest | None = None,
    _: str = Depends(verify_token),
) -> ScriptExecuteResponse:
@@ -235,6 +274,16 @@ async def execute_script(
    Returns:
        Execution result including stdout, stderr, and exit code
    """
    # Rate-limit script execution per peer so a leaked token can't be used to
    # spam the shell-exec endpoint.
    allowed, retry_after = ratelimit_check("execute", get_peer(http_request))
    if not allowed:
        raise HTTPException(
            status_code=429,
            detail="Too many script executions, slow down",
            headers={"Retry-After": str(int(retry_after or 60))},
        )

    # Check if script exists
    if script_name not in settings.scripts:
        raise HTTPException(
@@ -249,6 +298,8 @@ async def execute_script(
    logger.info(f"Executing script: {script_name}")

    from ..services.audit_log import record_script_execution

    try:
        # Execute in dedicated thread pool to not block the default executor
        loop = asyncio.get_running_loop()
@@ -263,6 +314,15 @@ async def execute_script(
            ),
        )

        record_script_execution(
            kind="script",
            name=script_name,
            exit_code=result["exit_code"],
            duration=result.get("execution_time"),
            stdout=result.get("stdout"),
            stderr=result.get("stderr"),
        )

        return ScriptExecuteResponse(
            success=result["exit_code"] == 0,
            script=script_name,
@@ -274,6 +334,13 @@ async def execute_script(
    except Exception as e:
        logger.error(f"Script execution error: {e}")
        record_script_execution(
            kind="script",
            name=script_name,
            exit_code=None,
            duration=None,
            error=str(e),
        )
        return ScriptExecuteResponse(
            success=False,
            script=script_name,
@@ -313,9 +380,21 @@ def _run_script(
    else:
        popen_kwargs["start_new_session"] = True

    # When shell=False, the command string is split via shlex (POSIX rules,
    # or non-POSIX mode on Windows, where backslashes survive and quotes stay
    # in the tokens). No shell runs, so metacharacters are not expanded and
    # SCRIPT_PARAM_* env vars referenced as $FOO / %FOO% reach the process as
    # literal text. Use shell=false for any script whose params come from
    # external input.
    if shell:
        run_command: str | list[str] = command
    else:
        import shlex
        run_command = shlex.split(command, posix=(sys.platform != "win32"))

    try:
        result = subprocess.run(
-            command,
+            run_command,
            shell=shell,
            cwd=working_dir,
            capture_output=True,