fix: production-readiness hardening — security, perf, a11y, observability

Security
- Default scripts_management, callbacks_management, links_management, and
  media_folders_management to False so a leaked token cannot escalate to RCE
  through admin CRUD endpoints.
- TokenSpec + scope hierarchy (read | control | admin); legacy bare-string
  api_tokens entries are promoted to admin for back-compat. Management
  endpoints now require admin scope.
- WebSocket subprotocol auth (Sec-WebSocket-Protocol: media-server.token.<T>)
  preferred over ?token= query so the token no longer lands in URL/history/
  Referer; query fallback retained for HA integration back-compat.
- Origin allow-list check on the WS endpoint (CSWSH defence).
- In-process token-bucket rate limiter: 5/min per peer for failed auths,
  10/min for /api/scripts/execute and /api/callbacks/execute.
- New shell=False subprocess path (argv built with shlex.split), plus a
  per-parameter regex `pattern` in ScriptParameterConfig that hardens
  scripts still using shell=True against parameter injection (e.g. Windows
  cmd.exe %VAR% env-var expansion).
- CSP gains form-action, worker-src, manifest-src directives.
- Refuse any "*" entry in cors_origins at startup; strip token=... from
  uvicorn access logs; validate Gitea release tags against a strict SemVer
  regex.
- noopener noreferrer + no-referrer referrerpolicy on every outbound link.
- icacls hardening of config.yaml on Windows (current user + SYSTEM +
  Administrators only); 0600 still enforced on POSIX.
- WS volume handler clamps input and never drops the socket on bad messages.
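The read | control | admin hierarchy above can be pictured as ranked scopes, where holding a higher scope implies the lower ones. This is a minimal sketch, assuming TokenSpec carries `token` and `scopes` fields (as the `main()` printout in the diff suggests) and that dict-form config entries default to `read` when no scopes are listed; `SCOPE_RANK`, `normalize_tokens` and `has_scope` are hypothetical names, not the project's API.

```python
from dataclasses import dataclass, field

# Ranked scopes: a token holding a higher scope implicitly holds the lower ones.
SCOPE_RANK = {"read": 0, "control": 1, "admin": 2}


@dataclass
class TokenSpec:
    token: str
    scopes: list[str] = field(default_factory=lambda: ["read"])


def normalize_tokens(raw: dict) -> dict[str, TokenSpec]:
    """Turn api_tokens config entries into TokenSpec objects.

    A legacy bare-string entry (label: "secret") is promoted to admin so
    existing installs keep working. The dict form and its "read" default
    are assumptions made for this sketch.
    """
    specs: dict[str, TokenSpec] = {}
    for label, entry in raw.items():
        if isinstance(entry, str):
            specs[label] = TokenSpec(token=entry, scopes=["admin"])
        else:
            specs[label] = TokenSpec(
                token=entry["token"], scopes=list(entry.get("scopes", ["read"]))
            )
    return specs


def has_scope(spec: TokenSpec, required: str) -> bool:
    """True if any scope on the token ranks at or above `required`."""
    needed = SCOPE_RANK[required]
    # Unknown scope strings rank below everything and never grant access.
    return any(SCOPE_RANK.get(s, -1) >= needed for s in spec.scopes)
```

Management routes would then gate on something like `has_scope(spec, "admin")`, while read-only status endpoints only need `read`.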
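For the WebSocket subprotocol auth, a browser would open the socket with `new WebSocket(url, ["media-server.token." + token])`, and the server has to pick the token out of the offered list and echo the chosen entry back (in Starlette, `await websocket.accept(subprotocol=proto)`); browsers abort the handshake if a requested subprotocol is not confirmed. The parser below is a sketch with a hypothetical function name. One constraint worth noting: RFC 6455 requires each subprotocol to be an HTTP token, so API tokens must avoid characters such as `/` and `=` (URL-safe alphabets like `secrets.token_urlsafe` output are fine).

```python
from __future__ import annotations

PREFIX = "media-server.token."


def token_from_subprotocols(header: str | None) -> tuple[str | None, str | None]:
    """Extract the API token from a Sec-WebSocket-Protocol request header.

    Returns (token, subprotocol_to_echo), or (None, None) when no entry
    carries a non-empty token. The header is a comma-separated list.
    """
    if not header:
        return None, None
    for proto in (p.strip() for p in header.split(",")):
        if proto.startswith(PREFIX) and len(proto) > len(PREFIX):
            return proto[len(PREFIX):], proto
    return None, None
```

When this returns `(None, None)`, the endpoint would fall back to the `?token=` query parameter kept for Home Assistant compatibility.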
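The token-bucket limiter can be sketched as one bucket per (kind, peer) that refills continuously. The `check(kind, peer) -> (allowed, retry_after)` shape mirrors how the diff calls `ratelimit_check("auth", get_peer(request))`; the `"execute"` kind name, the lock-around-everything design and the lack of bucket eviction are assumptions of this sketch, not the project's implementation.

```python
from __future__ import annotations

import threading
import time
from typing import Callable

# (capacity, refill window in seconds) per limiter kind, as in the commit message.
LIMITS = {"auth": (5, 60.0), "execute": (10, 60.0)}


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously over `per_seconds`."""

    def __init__(self, capacity: int, per_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = capacity / per_seconds  # tokens regained per second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def take(self) -> tuple[bool, float | None]:
        """Consume one token; when refused, return seconds until one is available."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(float(self.capacity), self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, None
        return False, (1 - self.tokens) / self.rate


_buckets: dict[tuple[str, str], TokenBucket] = {}
_lock = threading.Lock()


def check(kind: str, peer: str) -> tuple[bool, float | None]:
    """One bucket per (kind, peer); the lock guards both creation and updates."""
    with _lock:
        bucket = _buckets.get((kind, peer))
        if bucket is None:
            capacity, per_seconds = LIMITS[kind]
            bucket = _buckets[(kind, peer)] = TokenBucket(capacity, per_seconds)
        return bucket.take()
```

With 5 tokens per 60 s, a peer that burns all five waits 12 s for the next attempt, which is what the `Retry-After` header in the middleware would report.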
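The shell=False path and the per-parameter `pattern` work together: validate every value first, then split the command template with shlex.split and substitute into each argv element, so a value can never introduce new arguments. This is a sketch; the `{name}` placeholder syntax and `build_argv` are hypothetical. Note that shlex.split uses POSIX quoting rules, and on Windows subprocess joins the list back into a command line, so the regex remains the main guard for scripts that stay on shell=True.

```python
from __future__ import annotations

import re
import shlex


def build_argv(command: str, params: dict[str, str],
               patterns: dict[str, str]) -> list[str]:
    """Validate parameters, then expand them into an argv list for shell=False.

    The template is split *before* substitution, so a value such as
    "x; del *" stays one literal argument instead of becoming new tokens.
    """
    for name, value in params.items():
        pattern = patterns.get(name)
        # fullmatch: the whole value must match, not just a prefix.
        if pattern is not None and re.fullmatch(pattern, value) is None:
            raise ValueError(f"parameter {name!r} does not match {pattern!r}")
    return [part.format(**params) for part in shlex.split(command)]
```

A volume script could declare `pattern: '\d{1,3}'` for its level parameter, which rejects `%PATH%` before cmd.exe ever sees it.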
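For the Gitea tag check, the official SemVer 2.0.0 grammar from semver.org is a natural "strict SemVer regex". The sketch below uses it with an optional leading `v` (an assumption about how tags are written here) and `re.ASCII`, since Python's `\d` otherwise also matches non-ASCII digits. `is_valid_release_tag` is a hypothetical name.

```python
import re

# SemVer 2.0.0 grammar (semver.org) with an optional "v" prefix.
SEMVER_TAG = re.compile(
    r"v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?",
    re.ASCII,
)


def is_valid_release_tag(tag: str) -> bool:
    # fullmatch (not match/search) so trailing junk such as "\n" or "/../x"
    # cannot ride along after a valid prefix.
    return SEMVER_TAG.fullmatch(tag) is not None
```

Rejecting anything outside this grammar keeps a malicious release feed from smuggling path segments or URL fragments into download URLs built from the tag.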

Performance
- Album-art read in windows_media gated by track key — was decoding the
  WinRT thumbnail twice per second regardless of track changes.
- /api/media/artwork returns content-derived ETag + Cache-Control so the
  browser sends If-None-Match and gets 304s on track repeats.
- Foreground-service ctypes argtypes hoisted to one-time module init
  (was re-declaring ~14 prototypes per probe).
- display_service _static_cache keyed by (edid_hash, ...) tuple with
  eviction of disappeared monitors — fixes stale capabilities on hot-plug
  swaps where the new topology has the same monitor count.
- Visualizer rAF loop paused on document.hidden, resumed on visible.
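The artwork caching change relies on standard HTTP revalidation: a content-derived ETag, a Cache-Control value that forces revalidation, and a 304 when If-None-Match matches. This framework-free sketch assumes SHA-256 truncated to 32 hex chars and `private, no-cache`; the commit only says "content-derived ETag + Cache-Control", so both choices and the helper names are hypothetical.

```python
from __future__ import annotations

import hashlib


def artwork_etag(data: bytes) -> str:
    """Strong ETag derived from the image bytes: identical art gets the same tag."""
    return '"' + hashlib.sha256(data).hexdigest()[:32] + '"'


def etag_matches(if_none_match: str | None, etag: str) -> bool:
    """If-None-Match uses weak comparison, so any W/ prefix is ignored (RFC 9110)."""
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True
    # Our tags are hex, so a simple comma split is safe here.
    candidates = (c.strip() for c in if_none_match.split(","))
    return any(c.removeprefix("W/") == etag.removeprefix("W/") for c in candidates)


def artwork_response(data: bytes, if_none_match: str | None) -> tuple[int, dict[str, str], bytes]:
    etag = artwork_etag(data)
    # "no-cache" means "store, but revalidate every time", which is what
    # makes the browser send If-None-Match on the next request.
    headers = {"ETag": etag, "Cache-Control": "private, no-cache"}
    if etag_matches(if_none_match, etag):
        return 304, headers, b""
    return 200, headers, data
```

Because the tag depends only on the bytes, replaying the same track yields the same ETag and the browser gets an empty 304 instead of the full image.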
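The display_service fix swaps a cache that effectively tracked "how many monitors" for one keyed by monitor identity. The sketch below treats keys as opaque tuples (the commit keys by `(edid_hash, ...)`; the remaining fields are not reproduced, and the second element in the test is a placeholder). `refresh_capabilities` and `probe` are hypothetical names.

```python
from __future__ import annotations

from collections.abc import Callable, Hashable


def refresh_capabilities(cache: dict[Hashable, dict],
                         present: set[Hashable],
                         probe: Callable[[Hashable], dict]) -> dict[Hashable, dict]:
    """Sync a per-monitor capability cache with the monitors currently present.

    Entries for monitors that vanished are evicted, and new monitors are
    probed exactly once, so swapping monitor A for B (same count) cannot
    leave A's capabilities attached to B.
    """
    for key in list(cache):
        if key not in present:
            del cache[key]
    for key in present:
        if key not in cache:
            cache[key] = probe(key)
    return cache
```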

Reliability / bug fixes
- Lifespan rewritten as try/yield/finally so a partial-startup failure
  cannot orphan background tasks or executors.
- _run_callback in routes/media.py keeps a strong task ref (GC-safe) and
  uses the dedicated callback executor instead of the default pool.
- macos_media.set_volume() no longer always returns True.
- TrayManager._restart_requested initialised in __init__; set before
  signalling exit so the main thread observes it correctly.
- Missing static_dir now logs a WARNING instead of silently disabling the UI.

UX / accessibility / PWA
- manifest.json theme_color and background_color match the Studio Reference
  base (#0E0D0B); added id and scope for PWA installability.
- ARIA on mini-player icon buttons; inner SVGs marked aria-hidden.
- OS mediaSession API wired so headset / lockscreen / Bluetooth buttons
  drive play/pause/next/prev/seek and show track metadata + artwork.

Observability
- X-Request-ID middleware (accepts an upstream id if it matches a safe
  regex, otherwise mints a fresh one from the first 16 hex chars of a
  UUID4); request_id_var added to ContextVars and included in every log
  line alongside the token label.
- Audit log (append-only JSONL) for every script + callback execution,
  including the on_play/on_pause/etc. event callbacks. Background-thread
  writer; queue capped; flushed in lifespan teardown.
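The audit log design (append-only JSONL, one background writer thread, a capped queue, flush on teardown) can be sketched as below. What happens when the queue is full is not stated in the commit; this sketch assumes events are dropped and counted so request handlers never block. `AuditLog` and its field names are hypothetical.

```python
import json
import queue
import threading
import time


class AuditLog:
    """Append-only JSONL audit writer with a bounded queue and one writer thread."""

    def __init__(self, path: str, maxsize: int = 1000) -> None:
        self._path = path
        self._queue: queue.Queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0  # events discarded because the queue was full
        self._thread = threading.Thread(target=self._run, name="audit-log", daemon=True)
        self._thread.start()

    def record(self, **event) -> None:
        """Never blocks the caller: if the writer falls behind, the event is dropped."""
        event.setdefault("ts", time.time())
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            self.dropped += 1  # approximate under concurrent callers

    def _run(self) -> None:
        # Append mode; one JSON object per line keeps the file greppable.
        with open(self._path, "a", encoding="utf-8") as fh:
            while True:
                event = self._queue.get()
                if event is None:  # shutdown sentinel
                    break
                fh.write(json.dumps(event, separators=(",", ":")) + "\n")
                fh.flush()

    def shutdown(self, timeout: float = 5.0) -> None:
        """Enqueue the sentinel behind pending events and wait for the writer to drain."""
        self._queue.put(None)
        self._thread.join(timeout)
```

Since the queue is FIFO, every event recorded before `shutdown()` is written before the sentinel stops the thread, matching the "flushed in lifespan teardown" requirement.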

Deployment
- proxy_headers + forwarded_allow_ips plumbed through Settings →
  uvicorn.Config for reverse-proxy installs.
- HTTPS support via ssl_certfile + ssl_keyfile (+ optional password);
  startup refuses to launch with only one of the pair set.
- Thumbnail cache moved from project-root .cache to
  %LOCALAPPDATA%/media-server/cache (Windows) and
  $XDG_CACHE_HOME/media-server/thumbnails (POSIX).
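The new cache locations can be resolved as below. The XDG fallback follows the XDG Base Directory spec (unset, empty or relative values are ignored in favour of `$HOME/.cache`); the fallback when `%LOCALAPPDATA%` is missing is an assumption, as is the helper name `thumbnail_cache_dir`.

```python
from __future__ import annotations

import os
import platform
from collections.abc import Mapping
from pathlib import Path


def thumbnail_cache_dir(system: str | None = None,
                        env: Mapping[str, str] | None = None) -> Path:
    """Per-user thumbnail cache location following the layout in the commit message."""
    system = system or platform.system()
    env = os.environ if env is None else env
    if system == "Windows":
        # Assumed fallback if LOCALAPPDATA is somehow unset.
        base = env.get("LOCALAPPDATA") or str(Path.home() / "AppData" / "Local")
        return Path(base) / "media-server" / "cache"
    # XDG spec: unset, empty or relative values are invalid; use $HOME/.cache.
    xdg = env.get("XDG_CACHE_HOME", "")
    base = xdg if os.path.isabs(xdg) else str(Path.home() / ".cache")
    return Path(base) / "media-server" / "thumbnails"
```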

Tests
- 35 new tests across auth scopes, rate limiter, browser path traversal
  (../ NUL UNC absolute), script-param validation incl. regex, Gitea tag
  whitelist, config atomic write + POSIX perms. 47 passed / 4 skipped.
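The browser path-traversal cases listed above (`../`, NUL, UNC, absolute) can be illustrated with a resolver that checks both Windows and POSIX path flavours before resolving against the media-folder root. This is a sketch with a hypothetical `resolve_under`, not the project's browser code; it needs Python 3.9+ for `Path.is_relative_to`, and because `resolve()` follows symlinks it also rejects links that point outside the root.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def resolve_under(root: Path, rel: str) -> Path:
    """Resolve a client-supplied relative path, refusing anything outside `root`."""
    if "\x00" in rel:
        raise ValueError("NUL byte in path")
    # A non-empty anchor in either flavour means "/etc", "C:\\x", "C:x"
    # or a UNC share like "\\\\host\\share", whatever the host OS is.
    if PurePosixPath(rel).anchor or PureWindowsPath(rel).anchor:
        raise ValueError("absolute, drive-relative or UNC path")
    base = root.resolve()
    candidate = (base / rel).resolve()  # collapses ".." and follows symlinks
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes the media folder")
    return candidate
```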
2026-05-22 22:25:54 +03:00
parent 450f9fe1ee
commit d131ba461c
31 changed files with 1586 additions and 204 deletions
+247 -90
@@ -15,7 +15,7 @@ from fastapi.responses import FileResponse
from fastapi.staticfiles import StaticFiles
from . import __version__
from .auth import get_token_label, token_label_var
from .auth import get_token_label, request_id_var, token_label_var
from .config import generate_default_config, get_config_dir, settings
from .routes import (
audio_router,
@@ -33,10 +33,34 @@ from .services.websocket_manager import ws_manager
class TokenLabelFilter(logging.Filter):
"""Add token label to log records."""
"""Add token label + request_id to log records."""
def filter(self, record):
record.token_label = token_label_var.get("unknown")
record.request_id = request_id_var.get("-")
return True
class _StripTokenQueryFilter(logging.Filter):
"""Strip `token=...` from query strings before they hit the access log.
uvicorn's default access log format includes the full request line, so
`/api/media/artwork?token=SECRET` would otherwise be persisted verbatim
in stdout/journald/file sinks.
"""
import re as _re
_TOKEN_RE = _re.compile(r"([?&])token=[^&\s\"']+")
def filter(self, record): # type: ignore[override]
if isinstance(record.args, tuple):
record.args = tuple(
self._TOKEN_RE.sub(r"\1token=REDACTED", a) if isinstance(a, str) else a
for a in record.args
)
if isinstance(record.msg, str) and "token=" in record.msg:
record.msg = self._TOKEN_RE.sub(r"\1token=REDACTED", record.msg)
return True
@@ -49,17 +73,34 @@ def setup_logging():
logging.basicConfig(
level=getattr(logging, settings.log_level.upper()),
format="%(asctime)s - %(name)s - [%(token_label)s] - %(levelname)s - %(message)s",
format=(
"%(asctime)s - %(name)s - [%(token_label)s] [%(request_id)s]"
" - %(levelname)s - %(message)s"
),
handlers=[handler],
)
# Suppress noisy third-party loggers
logging.getLogger("screen_brightness_control").setLevel(logging.ERROR)
# Make sure the uvicorn access log never persists tokens leaked into the
# query string (the artwork + WS endpoints accept `?token=` for browser
# compatibility — see verify_token_or_query).
strip_filter = _StripTokenQueryFilter()
for name in ("uvicorn.access", "uvicorn"):
logging.getLogger(name).addFilter(strip_filter)
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Application lifespan handler."""
"""Application lifespan handler.
All long-lived resources started during startup are kept in local refs and
torn down in a `finally:` so a partial-startup failure cannot orphan tasks
or thread pools.
"""
import asyncio
setup_logging()
logger = logging.getLogger(__name__)
logger.info(f"Media Server starting on {settings.host}:{settings.port}")
@@ -71,92 +112,125 @@ async def lifespan(app: FastAPI):
else:
logger.warning("No API tokens configured — authentication is DISABLED")
# Start WebSocket status monitor
controller = get_media_controller()
await ws_manager.start_status_monitor(controller.get_status)
logger.info("WebSocket status monitor started")
# Start update checker
update_checker = None
if settings.update_check_enabled:
from .services.gitea_release_provider import GiteaReleaseProvider
from .services.update_checker import UpdateChecker
provider = GiteaReleaseProvider()
update_checker = UpdateChecker(provider, __version__)
await update_checker.start(settings.update_check_interval)
# Store globally so health endpoint can access cached result
app.state.update_checker = update_checker
# Schedule periodic thumbnail cache cleanup so the 500 MB cap is actually
# enforced. Runs once at startup and then hourly until shutdown.
from .services.thumbnail_service import ThumbnailService
async def _thumbnail_cleanup_loop() -> None:
while True:
try:
await asyncio.to_thread(ThumbnailService.cleanup_cache)
except Exception as e:
logger.warning("Thumbnail cache cleanup failed: %s", e)
try:
await asyncio.sleep(3600)
except asyncio.CancelledError:
break
import asyncio
cleanup_task = asyncio.create_task(_thumbnail_cleanup_loop())
# Register audio visualizer (capture starts on-demand when clients subscribe)
cleanup_task: asyncio.Task | None = None
analyzer = None
if settings.visualizer_enabled:
from .services.audio_analyzer import get_audio_analyzer
status_monitor_started = False
analyzer = get_audio_analyzer(
num_bins=settings.visualizer_bins,
target_fps=settings.visualizer_fps,
device_name=settings.visualizer_device,
)
if analyzer.available:
await ws_manager.start_audio_monitor(analyzer)
logger.info("Audio visualizer available (capture on-demand)")
else:
logger.info("Audio visualizer unavailable (install soundcard + numpy)")
yield
# Stop update checker
if update_checker is not None:
await update_checker.stop()
# Cancel periodic thumbnail cleanup
cleanup_task.cancel()
try:
await cleanup_task
except asyncio.CancelledError:
pass
# Start WebSocket status monitor
controller = get_media_controller()
await ws_manager.start_status_monitor(controller.get_status)
status_monitor_started = True
logger.info("WebSocket status monitor started")
# Stop audio visualizer
await ws_manager.stop_audio_monitor()
if analyzer and analyzer.running:
analyzer.stop()
# Start update checker
if settings.update_check_enabled:
from .services.gitea_release_provider import GiteaReleaseProvider
from .services.update_checker import UpdateChecker
# Stop WebSocket status monitor
await ws_manager.stop_status_monitor()
provider = GiteaReleaseProvider()
update_checker = UpdateChecker(provider, __version__)
await update_checker.start(settings.update_check_interval)
# Store globally so health endpoint can access cached result
app.state.update_checker = update_checker
# Shut down dedicated thread pools so pending scripts don't leak threads
from .routes.callbacks import shutdown_callback_executor
from .routes.scripts import shutdown_script_executor
# Schedule periodic thumbnail cache cleanup so the 500 MB cap is actually
# enforced. Runs once at startup and then hourly until shutdown.
from .services.thumbnail_service import ThumbnailService
shutdown_script_executor()
shutdown_callback_executor()
async def _thumbnail_cleanup_loop() -> None:
while True:
try:
await asyncio.to_thread(ThumbnailService.cleanup_cache)
except Exception as e:
logger.warning("Thumbnail cache cleanup failed: %s", e)
try:
await asyncio.sleep(3600)
except asyncio.CancelledError:
break
# Clean up platform-specific resources
import platform as _platform
if _platform.system() == "Windows":
from .services.windows_media import shutdown_executor
shutdown_executor()
cleanup_task = asyncio.create_task(_thumbnail_cleanup_loop())
logger.info("Media Server shutting down")
# Register audio visualizer (capture starts on-demand when clients subscribe)
if settings.visualizer_enabled:
from .services.audio_analyzer import get_audio_analyzer
analyzer = get_audio_analyzer(
num_bins=settings.visualizer_bins,
target_fps=settings.visualizer_fps,
device_name=settings.visualizer_device,
)
if analyzer.available:
await ws_manager.start_audio_monitor(analyzer)
logger.info("Audio visualizer available (capture on-demand)")
else:
logger.info("Audio visualizer unavailable (install soundcard + numpy)")
yield
finally:
# Stop update checker
if update_checker is not None:
try:
await update_checker.stop()
except Exception:
logger.exception("Error stopping update checker")
# Cancel periodic thumbnail cleanup
if cleanup_task is not None:
cleanup_task.cancel()
try:
await cleanup_task
except asyncio.CancelledError:
pass
except Exception:
logger.exception("Error awaiting thumbnail cleanup task")
# Stop audio visualizer
try:
await ws_manager.stop_audio_monitor()
except Exception:
logger.exception("Error stopping audio monitor")
if analyzer and analyzer.running:
try:
analyzer.stop()
except Exception:
logger.exception("Error stopping audio analyzer")
# Stop WebSocket status monitor
if status_monitor_started:
try:
await ws_manager.stop_status_monitor()
except Exception:
logger.exception("Error stopping status monitor")
# Shut down dedicated thread pools so pending scripts don't leak threads
try:
from .routes.callbacks import shutdown_callback_executor
from .routes.scripts import shutdown_script_executor
shutdown_script_executor()
shutdown_callback_executor()
except Exception:
logger.exception("Error shutting down script/callback executors")
# Flush audit log writer
try:
from .services.audit_log import shutdown_audit_log
shutdown_audit_log()
except Exception:
logger.exception("Error flushing audit log")
# Clean up platform-specific resources
import platform as _platform
if _platform.system() == "Windows":
try:
from .services.windows_media import shutdown_executor
shutdown_executor()
except Exception:
logger.exception("Error shutting down windows_media executor")
logger.info("Media Server shutting down")
def create_app() -> FastAPI:
@@ -173,7 +247,15 @@ def create_app() -> FastAPI:
# CORS — restrict to same-origin by default; users that integrate the API
# from another origin (e.g. Home Assistant on a different host) can set
# cors_origins in config.yaml.
# cors_origins in config.yaml. Refuse "*" outright: combined with the
# admin endpoints this would let any origin in the universe run
# arbitrary shell. If users genuinely need every origin, they can list
# them explicitly.
if any(o.strip() == "*" for o in settings.cors_origins):
raise RuntimeError(
"cors_origins must not contain '*' — list exact origins instead. "
"This protects the script-execution endpoints from any-origin abuse."
)
cors_origins = settings.cors_origins or [
f"http://localhost:{settings.port}",
f"http://127.0.0.1:{settings.port}",
@@ -186,6 +268,23 @@ def create_app() -> FastAPI:
allow_headers=["Authorization", "Content-Type"],
)
# Request correlation ID — accept upstream X-Request-ID if it's a sane
# ASCII id, otherwise mint a fresh UUID4. Emitted on the response so
# clients can quote it back in bug reports.
import re
import uuid as _uuid
_REQ_ID_RE = re.compile(r"^[A-Za-z0-9._\-]{1,128}$")
@app.middleware("http")
async def request_id_middleware(request: Request, call_next):
incoming = request.headers.get("x-request-id", "")
req_id = incoming if _REQ_ID_RE.match(incoming) else _uuid.uuid4().hex[:16]
request_id_var.set(req_id)
response = await call_next(request)
response.headers["X-Request-ID"] = req_id
return response
# Security headers — strict CSP for the bundled UI, disallow framing, hide referrer.
@app.middleware("http")
async def security_headers_middleware(request: Request, call_next):
@@ -200,6 +299,9 @@ def create_app() -> FastAPI:
"style-src 'self' 'unsafe-inline'; "
"font-src 'self' data:; "
"frame-ancestors 'none'; "
"form-action 'self'; "
"worker-src 'self'; "
"manifest-src 'self'; "
"base-uri 'self'"
),
)
@@ -208,32 +310,63 @@ def create_app() -> FastAPI:
response.headers.setdefault("Referrer-Policy", "no-referrer")
return response
# Add token logging middleware
# Add token logging middleware + auth-failure rate limit
from fastapi.responses import JSONResponse
from .services.rate_limit import check as ratelimit_check
from .services.rate_limit import get_peer
@app.middleware("http")
async def token_logging_middleware(request: Request, call_next):
"""Extract token label and set in context for logging."""
"""Extract token label, set in context, and rate-limit failed auths."""
if not settings.api_tokens:
token_label_var.set("anonymous")
else:
token_label = "unknown"
token_present = False
token_valid = False
# Try Authorization header
auth_header = request.headers.get("authorization", "")
if auth_header.startswith("Bearer "):
token_present = True
token = auth_header[7:]
label = get_token_label(token)
if label:
token_label = label
token_valid = True
# Try query parameter (for artwork endpoint)
elif "token" in request.query_params:
token_present = True
token = request.query_params["token"]
label = get_token_label(token)
if label:
token_label = label
token_valid = True
token_label_var.set(token_label)
# Brute-force gate: a peer that produces a wrong/missing token gets
# 5 failures per minute before being throttled. Static-asset
# requests (GET /static/*, /, /sw.js) and the docs endpoint are
# exempt — they're served unauthenticated by design.
if token_present and not token_valid:
path = request.url.path
if not (
path == "/" or path == "/sw.js"
or path.startswith("/static/")
or path.startswith("/docs") or path.startswith("/openapi")
or path.startswith("/redoc")
):
allowed, retry_after = ratelimit_check("auth", get_peer(request))
if not allowed:
return JSONResponse(
status_code=429,
content={"detail": "Too many authentication failures"},
headers={"Retry-After": str(int(retry_after or 60))},
)
response = await call_next(request)
return response
@@ -266,6 +399,11 @@ def create_app() -> FastAPI:
async def serve_ui():
"""Serve the Web UI."""
return FileResponse(static_dir / "index.html")
else:
logging.getLogger(__name__).warning(
"static_dir not found at %s — Web UI disabled (API only)",
static_dir,
)
return app
@@ -316,8 +454,9 @@ def main():
print(f"Config directory: {get_config_dir()}")
if settings.api_tokens:
print("\nAPI Tokens:")
for label, token in settings.api_tokens.items():
print(f" {label:20} {token}")
for label, spec in settings.api_tokens.items():
scope_str = ",".join(spec.scopes)
print(f" {label:20} {spec.token} [scopes: {scope_str}]")
else:
print("\nAuthentication is DISABLED (no tokens configured)")
return
@@ -374,6 +513,27 @@ def main():
use_tray = PYSTRAY_AVAILABLE and not args.no_tray
# Validate TLS pair consistency before either path so we don't fail late.
if bool(settings.ssl_certfile) ^ bool(settings.ssl_keyfile):
_fatal(
"ERROR: ssl_certfile and ssl_keyfile must both be set, or both unset."
)
def _uvicorn_kwargs() -> dict:
kw: dict = {
"host": args.host,
"port": args.port,
"log_level": settings.log_level.lower(),
"proxy_headers": settings.proxy_headers,
"forwarded_allow_ips": settings.forwarded_allow_ips,
}
if settings.ssl_certfile and settings.ssl_keyfile:
kw["ssl_certfile"] = settings.ssl_certfile
kw["ssl_keyfile"] = settings.ssl_keyfile
if settings.ssl_keyfile_password:
kw["ssl_keyfile_password"] = settings.ssl_keyfile_password
return kw
if use_tray:
import asyncio
import threading
@@ -381,9 +541,7 @@ def main():
# Run uvicorn in a background thread so tray owns the main thread message loop
uv_config = uvicorn.Config(
"media_server.main:app",
host=args.host,
port=args.port,
log_level=settings.log_level.lower(),
**_uvicorn_kwargs(),
)
server = uvicorn.Server(uv_config)
@@ -421,9 +579,8 @@ def main():
else:
uvicorn.run(
"media_server.main:app",
host=args.host,
port=args.port,
reload=False,
**_uvicorn_kwargs(),
)