fix: production-readiness hardening — security, perf, a11y, observability
Lint & Test / test (push) Successful in 20s
Lint & Test / test (push) Successful in 20s
Security - Default scripts_management, callbacks_management, links_management, and media_folders_management to False so a leaked token cannot escalate to RCE through admin CRUD endpoints. - TokenSpec + scope hierarchy (read | control | admin); legacy bare-string api_tokens entries promote to admin for back-compat. Management endpoints now require admin scope. - WebSocket subprotocol auth (Sec-WebSocket-Protocol: media-server.token.<T>) preferred over ?token= query so the token no longer lands in URL/history/ Referer; query fallback retained for HA integration back-compat. - Origin allow-list check on the WS endpoint (CSWSH defence). - In-process token-bucket rate limiter: 5/min for failed auths, 10/min for /api/scripts/execute and /api/callbacks/execute. - shell=False subprocess path (shlex.split) + per-parameter regex `pattern` in ScriptParameterConfig to harden shell=true scripts against parameter injection (Windows cmd.exe env-var expansion). - CSP gains form-action, worker-src, manifest-src directives. - Refuse cors_origins=["*"] at startup; strip token=... from uvicorn access logs; validate Gitea release tag against strict SemVer regex. - noopener noreferrer + no-referrer referrerpolicy on every outbound link. - icacls hardening of config.yaml on Windows (current user + SYSTEM + Administrators only); 0600 still enforced on POSIX. - WS volume handler clamps input and never drops the socket on bad messages. Performance - Album-art read in windows_media gated by track key — was decoding the WinRT thumbnail twice per second regardless of track changes. - /api/media/artwork returns content-derived ETag + Cache-Control so the browser sends If-None-Match and gets 304s on track repeats. - Foreground-service ctypes argtypes hoisted to one-time module init (was re-declaring ~14 prototypes per probe). - display_service _static_cache keyed by (edid_hash, ...) tuple with eviction of disappeared monitors — fixes stale capabilities on hot-plug swaps where the new topology has the same monitor count. - Visualizer rAF loop paused on document.hidden, resumed on visible. Reliability / bug fixes - Lifespan rewritten as try/yield/finally so a partial-startup failure cannot orphan background tasks or executors. - _run_callback in routes/media.py keeps a strong task ref (GC-safe) and uses the dedicated callback executor instead of the default pool. - macos_media.set_volume() no longer always returns True. - TrayManager._restart_requested initialised in __init__; set before signalling exit so the main thread observes it correctly. - Missing static_dir now logs a WARNING instead of silent UI disable. UX / accessibility / PWA - manifest.json theme_color and background_color match the Studio Reference base (#0E0D0B); added id and scope for PWA installability. - ARIA on mini-player icon buttons; inner SVGs marked aria-hidden. - OS mediaSession API wired so headset / lockscreen / Bluetooth buttons drive play/pause/next/prev/seek and show track metadata + artwork. Observability - X-Request-ID middleware (accept upstream id if it matches a safe regex, otherwise UUID4); request_id_var added to ContextVars and included in every log line alongside the token label. - Audit log (append-only JSONL) for every script + callback execution, including the on_play/on_pause/etc. event callbacks. Background-thread writer; queue capped; flushed in lifespan teardown. Deployment - proxy_headers + forwarded_allow_ips plumbed through Settings → uvicorn.Config for reverse-proxy installs. - HTTPS support via ssl_certfile + ssl_keyfile (+ optional password); startup refuses to launch with only one of the pair set. - Thumbnail cache moved from project-root .cache to %LOCALAPPDATA%/media-server/cache (Windows) and $XDG_CACHE_HOME/media-server/thumbnails (POSIX). Tests - 35 new tests across auth scopes, rate limiter, browser path traversal (../ NUL UNC absolute), script-param validation incl. regex, Gitea tag whitelist, config atomic write + POSIX perms. 47 passed / 4 skipped.
This commit is contained in:
@@ -10,12 +10,14 @@ import time
|
||||
from concurrent.futures import ThreadPoolExecutor
|
||||
from typing import Any
|
||||
|
||||
from fastapi import APIRouter, Depends, HTTPException, status
|
||||
from fastapi import APIRouter, Depends, HTTPException, Request, status
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
from ..auth import verify_token
|
||||
from ..config import ScriptConfig, ScriptParameterConfig, settings
|
||||
from ..config_manager import config_manager
|
||||
from ..services.rate_limit import check as ratelimit_check
|
||||
from ..services.rate_limit import get_peer
|
||||
from ..services.websocket_manager import ws_manager
|
||||
|
||||
router = APIRouter(prefix="/api/scripts", tags=["scripts"])
|
||||
@@ -31,6 +33,12 @@ def shutdown_script_executor() -> None:
|
||||
|
||||
|
||||
def _require_scripts_management() -> None:
|
||||
"""Authorise a scripts-CRUD operation.
|
||||
|
||||
Two gates: the operator-level `scripts_management` flag in config.yaml,
|
||||
AND the per-token `admin` scope check (read from request-context). Either
|
||||
failure → 403.
|
||||
"""
|
||||
if not settings.scripts_management:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
@@ -39,6 +47,14 @@ def _require_scripts_management() -> None:
|
||||
" in config.yaml to enable."
|
||||
),
|
||||
)
|
||||
from ..auth import auth_enabled, token_has_scope, token_label_var
|
||||
if auth_enabled():
|
||||
label = token_label_var.get("unknown")
|
||||
if not token_has_scope(label, "admin"):
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail=f"Token '{label}' lacks required scope: admin",
|
||||
)
|
||||
|
||||
|
||||
class ScriptExecuteRequest(BaseModel):
|
||||
@@ -215,6 +231,28 @@ def _validate_params(
|
||||
# string — just convert to str
|
||||
value = str(value)
|
||||
|
||||
# Optional regex constraint, validated against the *string form* of the
|
||||
# value. This is the only practical defence for string parameters that
|
||||
# flow into shell=true scripts via env vars (Windows cmd.exe expands
|
||||
# `%VAR%` after argument parsing, so embedded `&`/`|`/`%` would inject
|
||||
# commands). Authors of shell scripts should ALWAYS define a pattern.
|
||||
if pdef.pattern:
|
||||
try:
|
||||
if not re.fullmatch(pdef.pattern, str(value)):
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail=(
|
||||
f"Parameter '{pname}' value {value!r} does not match"
|
||||
f" required pattern: {pdef.pattern}"
|
||||
),
|
||||
)
|
||||
except re.error as e:
|
||||
# Bad pattern in config — fail closed.
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
|
||||
detail=f"Parameter '{pname}' has invalid pattern: {e}",
|
||||
) from e
|
||||
|
||||
env_vars[f"SCRIPT_PARAM_{pname.upper()}"] = str(value)
|
||||
|
||||
return env_vars
|
||||
@@ -223,6 +261,7 @@ def _validate_params(
|
||||
@router.post("/execute/{script_name}")
|
||||
async def execute_script(
|
||||
script_name: str,
|
||||
http_request: Request,
|
||||
request: ScriptExecuteRequest | None = None,
|
||||
_: str = Depends(verify_token),
|
||||
) -> ScriptExecuteResponse:
|
||||
@@ -235,6 +274,16 @@ async def execute_script(
|
||||
Returns:
|
||||
Execution result including stdout, stderr, and exit code
|
||||
"""
|
||||
# Rate-limit script execution per peer so a leaked token can't be used to
|
||||
# spam the shell-exec endpoint.
|
||||
allowed, retry_after = ratelimit_check("execute", get_peer(http_request))
|
||||
if not allowed:
|
||||
raise HTTPException(
|
||||
status_code=429,
|
||||
detail="Too many script executions, slow down",
|
||||
headers={"Retry-After": str(int(retry_after or 60))},
|
||||
)
|
||||
|
||||
# Check if script exists
|
||||
if script_name not in settings.scripts:
|
||||
raise HTTPException(
|
||||
@@ -249,6 +298,8 @@ async def execute_script(
|
||||
|
||||
logger.info(f"Executing script: {script_name}")
|
||||
|
||||
from ..services.audit_log import record_script_execution
|
||||
|
||||
try:
|
||||
# Execute in dedicated thread pool to not block the default executor
|
||||
loop = asyncio.get_running_loop()
|
||||
@@ -263,6 +314,15 @@ async def execute_script(
|
||||
),
|
||||
)
|
||||
|
||||
record_script_execution(
|
||||
kind="script",
|
||||
name=script_name,
|
||||
exit_code=result["exit_code"],
|
||||
duration=result.get("execution_time"),
|
||||
stdout=result.get("stdout"),
|
||||
stderr=result.get("stderr"),
|
||||
)
|
||||
|
||||
return ScriptExecuteResponse(
|
||||
success=result["exit_code"] == 0,
|
||||
script=script_name,
|
||||
@@ -274,6 +334,13 @@ async def execute_script(
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Script execution error: {e}")
|
||||
record_script_execution(
|
||||
kind="script",
|
||||
name=script_name,
|
||||
exit_code=None,
|
||||
duration=None,
|
||||
error=str(e),
|
||||
)
|
||||
return ScriptExecuteResponse(
|
||||
success=False,
|
||||
script=script_name,
|
||||
@@ -313,9 +380,21 @@ def _run_script(
|
||||
else:
|
||||
popen_kwargs["start_new_session"] = True
|
||||
|
||||
# When shell=False, the user-provided command string is split via shlex
|
||||
# (POSIX rules — also works for Windows args without backslashes). This
|
||||
# disables shell metacharacter expansion entirely, so SCRIPT_PARAM_* env
|
||||
# vars referenced as $FOO / %FOO% will be treated as literal text by the
|
||||
# process, not interpreted by a shell. Use shell=false for any script
|
||||
# whose params come from external input.
|
||||
if shell:
|
||||
run_command: str | list[str] = command
|
||||
else:
|
||||
import shlex
|
||||
run_command = shlex.split(command, posix=(sys.platform != "win32"))
|
||||
|
||||
try:
|
||||
result = subprocess.run(
|
||||
command,
|
||||
run_command,
|
||||
shell=shell,
|
||||
cwd=working_dir,
|
||||
capture_output=True,
|
||||
|
||||
Reference in New Issue
Block a user