fix: production-readiness hardening — security, perf, a11y, observability

Security
- Default scripts_management, callbacks_management, links_management, and
  media_folders_management to False so a leaked token cannot escalate to RCE
  through admin CRUD endpoints.
- TokenSpec + scope hierarchy (read | control | admin); legacy bare-string
  api_tokens entries promote to admin for back-compat. Management endpoints
  now require admin scope.
- WebSocket subprotocol auth (Sec-WebSocket-Protocol: media-server.token.<T>)
  preferred over ?token= query so the token no longer lands in URL/history/
  Referer; query fallback retained for HA integration back-compat.
- Origin allow-list check on the WS endpoint (CSWSH defence).
- In-process token-bucket rate limiter: 5/min for failed auths,
  10/min for /api/scripts/execute and /api/callbacks/execute.
- shell=False subprocess path (shlex.split) + per-parameter regex `pattern`
  in ScriptParameterConfig to harden shell=true scripts against parameter
  injection (Windows cmd.exe env-var expansion).
- CSP gains form-action, worker-src, manifest-src directives.
- Refuse cors_origins=["*"] at startup; strip token=... from uvicorn access
  logs; validate Gitea release tag against strict SemVer regex.
- noopener noreferrer + no-referrer referrerpolicy on every outbound link.
- icacls hardening of config.yaml on Windows (current user + SYSTEM +
  Administrators only); 0600 still enforced on POSIX.
- WS volume handler clamps input and never drops the socket on bad messages.

Performance
- Album-art read in windows_media gated by track key — was decoding the
  WinRT thumbnail twice per second regardless of track changes.
- /api/media/artwork returns content-derived ETag + Cache-Control so the
  browser sends If-None-Match and gets 304s on track repeats.
- Foreground-service ctypes argtypes hoisted to one-time module init
  (was re-declaring ~14 prototypes per probe).
- display_service _static_cache keyed by (edid_hash, ...) tuple with
  eviction of disappeared monitors — fixes stale capabilities on hot-plug
  swaps where the new topology has the same monitor count.
- Visualizer rAF loop paused on document.hidden, resumed on visible.

Reliability / bug fixes
- Lifespan rewritten as try/yield/finally so a partial-startup failure
  cannot orphan background tasks or executors.
- _run_callback in routes/media.py keeps a strong task ref (GC-safe) and
  uses the dedicated callback executor instead of the default pool.
- macos_media.set_volume() no longer always returns True.
- TrayManager._restart_requested initialised in __init__; set before
  signalling exit so the main thread observes it correctly.
- Missing static_dir now logs a WARNING instead of silent UI disable.

UX / accessibility / PWA
- manifest.json theme_color and background_color match the Studio Reference
  base (#0E0D0B); added id and scope for PWA installability.
- ARIA on mini-player icon buttons; inner SVGs marked aria-hidden.
- OS mediaSession API wired so headset / lockscreen / Bluetooth buttons
  drive play/pause/next/prev/seek and show track metadata + artwork.

Observability
- X-Request-ID middleware (accept upstream id if it matches a safe regex,
  otherwise UUID4); request_id_var added to ContextVars and included in
  every log line alongside the token label.
- Audit log (append-only JSONL) for every script + callback execution,
  including the on_play/on_pause/etc. event callbacks. Background-thread
  writer; queue capped; flushed in lifespan teardown.

Deployment
- proxy_headers + forwarded_allow_ips plumbed through Settings →
  uvicorn.Config for reverse-proxy installs.
- HTTPS support via ssl_certfile + ssl_keyfile (+ optional password);
  startup refuses to launch with only one of the pair set.
- Thumbnail cache moved from project-root .cache to
  %LOCALAPPDATA%/media-server/cache/thumbnails (Windows) and
  $XDG_CACHE_HOME/media-server/thumbnails (POSIX).

Tests
- 35 new tests across auth scopes, rate limiter, browser path traversal
  (../ NUL UNC absolute), script-param validation incl. regex, Gitea tag
  whitelist, config atomic write + POSIX perms. 47 passed / 4 skipped.
2026-05-22 22:25:54 +03:00
parent 450f9fe1ee
commit d131ba461c
31 changed files with 1586 additions and 204 deletions
+120
@@ -0,0 +1,120 @@
"""Append-only audit log for sensitive actions (script + callback execution).
Writes a single JSONL line per event to ``<config_dir>/audit.log``. The log is
write-only from the app's perspective — it never reads back, and rotation is
left to the operator (the file size is dominated by stdout/stderr truncation,
which is already capped at 10 KB per stream in `_run_script`).
Designed to be cheap: the write goes through a small background thread so the
hot path never blocks on disk I/O, and a failure to write is logged at WARNING
but never raised to callers.
"""
from __future__ import annotations
import json
import logging
import queue
import threading
import time
from typing import Any
from ..auth import token_label_var
from ..config import get_config_dir
logger = logging.getLogger(__name__)
# Cap (in characters) on stdout/stderr inside the audit record so a chatty
# script doesn't explode the log. Tighter than the 10 KB per-stream cap used
# by _run_script.
_OUTPUT_CAP = 2000
_audit_queue: "queue.Queue[dict[str, Any] | None]" = queue.Queue(maxsize=1000)
_audit_thread: threading.Thread | None = None
_audit_lock = threading.Lock()
def _ensure_writer_started() -> None:
    global _audit_thread
    with _audit_lock:
        if _audit_thread is not None and _audit_thread.is_alive():
            return
        _audit_thread = threading.Thread(
            target=_audit_writer_loop,
            name="audit-log",
            daemon=True,
        )
        _audit_thread.start()
def _audit_writer_loop() -> None:
    log_path = get_config_dir() / "audit.log"
    while True:
        try:
            record = _audit_queue.get()
        except Exception:
            return
        if record is None:
            return
        try:
            line = json.dumps(record, ensure_ascii=False, default=str)
            with open(log_path, "a", encoding="utf-8") as f:
                f.write(line + "\n")
        except OSError as e:
            logger.warning("Failed to write audit record: %s", e)
def _truncate(value: str | None) -> str | None:
    if value is None:
        return None
    if len(value) <= _OUTPUT_CAP:
        return value
    return value[:_OUTPUT_CAP] + f"\n…[truncated, {len(value) - _OUTPUT_CAP} chars]"
def record_script_execution(
    *,
    kind: str,
    name: str,
    exit_code: int | None,
    duration: float | None,
    stdout: str | None = None,
    stderr: str | None = None,
    error: str | None = None,
) -> None:
    """Append a single audit record. Never raises."""
    _ensure_writer_started()
    try:
        record = {
            "ts": time.time(),
            "iso": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime()),
            "token_label": token_label_var.get("unknown"),
            "kind": kind,
            "name": name,
            "exit_code": exit_code,
            "duration_s": round(duration, 4) if duration is not None else None,
            "success": exit_code == 0 if exit_code is not None else False,
            "stdout": _truncate(stdout),
            "stderr": _truncate(stderr),
            "error": error,
        }
        _audit_queue.put_nowait(record)
    except queue.Full:
        # Backpressure: drop oldest record to make room. We'd rather lose an
        # old entry than block the script that just ran.
        try:
            _audit_queue.get_nowait()
            _audit_queue.put_nowait(record)
        except queue.Empty:
            pass
    except Exception as e:
        logger.warning("Failed to enqueue audit record: %s", e)
def shutdown_audit_log() -> None:
    """Flush the audit queue on app shutdown."""
    try:
        _audit_queue.put_nowait(None)
    except queue.Full:
        pass
    if _audit_thread is not None:
        _audit_thread.join(timeout=2)
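The drop-oldest backpressure policy used by record_script_execution can be tried in isolation. The following is a minimal sketch, not the module's code: `enqueue_drop_oldest`, the `demo` queue, and its tiny size are hypothetical names chosen for illustration.

```python
import queue


def enqueue_drop_oldest(q: queue.Queue, item) -> bool:
    """Put item without blocking; on a full queue, evict the oldest entry first.

    Returns True if the item was enqueued, False if it had to be dropped.
    """
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        pass
    try:
        q.get_nowait()      # discard the oldest record to make room
        q.put_nowait(item)  # can still fail if another producer took the slot
        return True
    except (queue.Empty, queue.Full):
        return False


demo: queue.Queue = queue.Queue(maxsize=3)
for n in range(5):
    enqueue_drop_oldest(demo, n)

remaining = []
while not demo.empty():
    remaining.append(demo.get_nowait())
print(remaining)
```

Catching queue.Full on the retry is an addition in this sketch: with several producer threads, another thread can refill the freed slot between the get and the put.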
+38 -14
@@ -192,10 +192,11 @@ _CACHE_TTL = 5.0 # seconds
# Per-monitor cache of static capabilities (option lists + support flags).
# DDC/CI capability discovery is the slow part — it only changes when a
# monitor is replaced or rewired, so we probe it once per monitor and reuse
# it across refreshes. Cleared on explicit `rediscover` or when the monitor
# count changes (cheap stale-detection for hot-plug events).
_static_cache: dict[int, dict] = {}
_static_cache_monitor_count: int = -1
# it across refreshes. Keyed by a stable identity tuple: ("edid", edid_hash)
# when EDID is available, else ("mm", manufacturer, model, name), else
# ("idx", index). A hot-plug swap that keeps the monitor count but changes
# the devices therefore probes the new monitor instead of serving stale
# capabilities.
_static_cache: dict[tuple, dict] = {}
def _enum_name(value, enum_cls=None) -> str | None:
@@ -353,7 +354,7 @@ def list_monitors(force_refresh: bool = False, rediscover: bool = False) -> list
next probe re-runs DDC/CI capability discovery. Use after hot-plug
or when a monitor's reported capabilities change.
"""
global _monitor_cache, _cache_time, _static_cache_monitor_count
global _monitor_cache, _cache_time
if (
not force_refresh
@@ -372,12 +373,11 @@ def list_monitors(force_refresh: bool = False, rediscover: bool = False) -> list
info_list = sbc.list_monitors_info()
brightnesses = sbc.get_brightness()
# Invalidate the static cache on explicit rediscover OR on topology
# change (hot-plug / disconnect). Both indicate the cached probe is
# potentially stale.
if rediscover or len(info_list) != _static_cache_monitor_count:
# Explicit rediscover wipes the whole cache; otherwise rely on stable
# per-monitor keys (EDID hash, falling back to manufacturer/model/name)
# plus the end-of-scan eviction, so a hot-plug swap drops the entry for
# the missing monitor automatically.
if rediscover:
_static_cache.clear()
_static_cache_monitor_count = len(info_list)
mc = _load_monitorcontrol()
ddc_monitors = []
@@ -387,6 +387,9 @@ def list_monitors(force_refresh: bool = False, rediscover: bool = False) -> list
except Exception:
pass
import hashlib
seen_keys: set[tuple] = set()
for i, info in enumerate(info_list):
name = info.get("name", f"Monitor {i}")
model = info.get("model", "")
@@ -400,6 +403,21 @@ def list_monitors(force_refresh: bool = False, rediscover: bool = False) -> list
edid = info.get("edid", "")
resolution = _parse_edid_resolution(edid) if edid else None
# Stable cache key: the EDID hash identifies the physical monitor.
# Fall back to (manufacturer, model, name) when EDID is missing,
# then to the legacy index as a last resort.
if edid:
edid_hash = hashlib.blake2b(
edid.encode("utf-8") if isinstance(edid, str) else bytes(edid),
digest_size=8,
).hexdigest()
cache_key: tuple = ("edid", edid_hash)
elif manufacturer or model:
cache_key = ("mm", manufacturer, model, name)
else:
cache_key = ("idx", i)
seen_keys.add(cache_key)
static: dict = {}
dynamic: dict = {}
@@ -409,13 +427,13 @@ def list_monitors(force_refresh: bool = False, rediscover: bool = False) -> list
if power_supported and i < len(ddc_monitors):
try:
with ddc_monitors[i] as mon:
if i not in _static_cache:
_static_cache[i] = _probe_static_open(mon, mc, i)
static = _static_cache[i]
if cache_key not in _static_cache:
_static_cache[cache_key] = _probe_static_open(mon, mc, i)
static = _static_cache[cache_key]
dynamic = _probe_dynamic_open(mon, mc, i, static)
except Exception as e:
logger.debug("Monitor %d: DDC/CI session failed: %s", i, e)
static = _static_cache.get(i, {})
static = _static_cache.get(cache_key, {})
monitors.append(MonitorInfo(
id=i,
@@ -439,6 +457,12 @@ def list_monitors(force_refresh: bool = False, rediscover: bool = False) -> list
available_picture_modes=static.get("available_picture_modes", []),
picture_mode_supported=static.get("picture_mode_supported", False),
))
# Evict cache entries for monitors that disappeared from this scan so
# the next hot-plug of a different monitor with the same identity
# tuple (e.g. same model) doesn't hit a stale entry first.
for stale_key in list(_static_cache.keys()):
if stale_key not in seen_keys:
_static_cache.pop(stale_key, None)
except Exception as e:
logger.error("Failed to enumerate monitors: %s", e)
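The keying and eviction scheme in this hunk can be sketched as a standalone example. This is an illustration under assumptions, not the service code: `monitor_cache_key`, `evict_missing`, and the sample scan dictionaries are hypothetical, and the `manufacturer` lookup assumes the info dict carries that field.

```python
import hashlib


def monitor_cache_key(info: dict, index: int) -> tuple:
    """Derive a cache key: EDID hash first, then identity fields, then index."""
    edid = info.get("edid", "")
    if edid:
        raw = edid.encode("utf-8") if isinstance(edid, str) else bytes(edid)
        return ("edid", hashlib.blake2b(raw, digest_size=8).hexdigest())
    manufacturer = info.get("manufacturer", "")  # assumed field name
    model = info.get("model", "")
    if manufacturer or model:
        return ("mm", manufacturer, model, info.get("name", f"Monitor {index}"))
    return ("idx", index)


def evict_missing(cache: dict, seen: set) -> None:
    """Drop cached entries whose key was not observed in the latest scan."""
    for key in list(cache):
        if key not in seen:
            del cache[key]


# Two scans with the same monitor count, but monitor B is swapped for C.
cache: dict = {}
first_scan = [{"edid": "00ffaa", "model": "A"}, {"edid": "00ffbb", "model": "B"}]
second_scan = [{"edid": "00ffaa", "model": "A"}, {"edid": "00ffcc", "model": "C"}]

for scan in (first_scan, second_scan):
    seen = set()
    for i, info in enumerate(scan):
        key = monitor_cache_key(info, i)
        seen.add(key)
        # Stand-in for the expensive DDC/CI probe: only runs on a cache miss.
        cache.setdefault(key, {"probed_for": info["model"]})
    evict_missing(cache, seen)

print(sorted(entry["probed_for"] for entry in cache.values()))
```

A count-based invalidation would have kept B's capabilities for monitor C here, since both scans report two monitors.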
+36 -7
@@ -86,9 +86,29 @@ class _Cache:
_cache = _Cache()
# Win32 handles + signatures are declared once at module load (when running on
# Windows). The TTL cache fires this hundreds of times per minute; redoing the
# DLL load + ~10 argtype assignments per call was the largest chunk of probe
# cost. Keep these guarded behind a lazy init so non-Windows platforms don't
# pay the import.
_WIN32_INITIALIZED = False
_win32_user32 = None
_win32_kernel32 = None
_win32_psapi = None
def _init_win32_apis() -> None:
"""Declare ctypes argtypes/restype on every Win32 call we make.
CRITICAL: ctypes defaults to `c_int` (32-bit) for HANDLE/HWND/HMONITOR
which silently truncates 64-bit pointer values on x64 — that corrupts the
handle so `CloseHandle()` can either fail or close the wrong kernel
object, and pointer-equality comparisons (monitor index lookup) miss.
"""
global _WIN32_INITIALIZED, _win32_user32, _win32_kernel32, _win32_psapi
if _WIN32_INITIALIZED:
return
def _probe_windows() -> ForegroundInfo:
"""Probe foreground window state on Windows via Win32 API."""
import ctypes
import ctypes.wintypes as wt
@@ -96,11 +116,6 @@ def _probe_windows() -> ForegroundInfo:
kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
psapi = ctypes.WinDLL("psapi", use_last_error=True)
# CRITICAL: declare argtypes/restype on every Win32 call that returns a
# HANDLE/HWND/HMONITOR. ctypes defaults to `c_int` (32-bit) which
# silently truncates 64-bit pointer values on x64 — that corrupts the
# handle so `CloseHandle()` can either fail or close the wrong kernel
# object, and pointer-equality comparisons (monitor index lookup) miss.
user32.GetForegroundWindow.restype = wt.HWND
user32.GetWindowThreadProcessId.argtypes = [wt.HWND, ctypes.POINTER(wt.DWORD)]
user32.GetWindowThreadProcessId.restype = wt.DWORD
@@ -137,6 +152,20 @@ def _probe_windows() -> ForegroundInfo:
psapi.GetModuleFileNameExW.argtypes = [wt.HANDLE, wt.HMODULE, wt.LPWSTR, wt.DWORD]
psapi.GetModuleFileNameExW.restype = wt.DWORD
_win32_user32, _win32_kernel32, _win32_psapi = user32, kernel32, psapi
_WIN32_INITIALIZED = True
def _probe_windows() -> ForegroundInfo:
"""Probe foreground window state on Windows via Win32 API."""
import ctypes
import ctypes.wintypes as wt
_init_win32_apis()
user32 = _win32_user32
kernel32 = _win32_kernel32
psapi = _win32_psapi
hwnd = user32.GetForegroundWindow()
if not hwnd:
return ForegroundInfo(available=True, error="no foreground window")
@@ -2,6 +2,7 @@
import json
import logging
import re
import urllib.error
import urllib.request
from typing import Optional
@@ -15,6 +16,11 @@ _DEFAULT_BASE_URL = "https://git.dolgolyov-family.by"
_DEFAULT_OWNER = "alexei.dolgolyov"
_DEFAULT_REPO = "media-player-server"
# Restrictive tag whitelist — prevents a hostile Gitea response (or MITM) from
# injecting `..`, slashes, or URL-altering characters into the release URL we
# broadcast to clients. SemVer + pre-release suffix only.
_TAG_RE = re.compile(r"^v?\d+\.\d+\.\d+(?:[\w.\-+]{0,32})?$")
class GiteaReleaseProvider(ReleaseProvider):
"""Fetches the latest release from a Gitea repository."""
@@ -53,6 +59,9 @@ class GiteaReleaseProvider(ReleaseProvider):
continue
tag = release.get("tag_name", "")
if not isinstance(tag, str) or not _TAG_RE.match(tag):
logger.warning("Rejecting malformed release tag from upstream: %r", tag)
continue
version = tag.lstrip("v")
if not version:
continue
+6 -2
@@ -264,8 +264,12 @@ class MacOSMediaController(MediaController):
async def set_volume(self, volume: int) -> bool:
"""Set system volume."""
result = self._run_osascript(f"set volume output volume {volume}")
return result is not None or True # osascript returns empty on success
# osascript returns empty string on success and None on failure (the
# _run_osascript helper catches subprocess errors). The previous
# `result is not None or True` always returned True regardless of
# outcome — surface real failures so the route can return 503.
result = self._run_osascript(f"set volume output volume {int(volume)}")
return result is not None
async def toggle_mute(self) -> bool:
"""Toggle mute state."""
+95
@@ -0,0 +1,95 @@
"""In-process token-bucket rate limiter.
Light enough for a single-process app: one dict keyed by ``(bucket, peer)``
guarded by a thread lock. No extra dependency, no Redis. Good enough for
defeating credential-stuffing and runaway clients on a LAN; not a substitute
for an upstream WAF in a public deployment.
Buckets:
auth — failed-auth attempts, 5/min/peer (used in auth middleware)
execute — script + callback execute calls, 10/min/peer (LAN-friendly)
default — generic POST/DELETE writes, 60/min/peer
"""
from __future__ import annotations
import logging
import threading
import time
from dataclasses import dataclass
from typing import Optional
logger = logging.getLogger(__name__)
@dataclass
class BucketConfig:
    capacity: float  # max tokens (= burst size)
    refill_per_sec: float  # tokens added per second
# Defaults — tuned for "trusted LAN" use; operator can override via Settings.
BUCKETS: dict[str, BucketConfig] = {
    "auth": BucketConfig(capacity=5, refill_per_sec=5 / 60),  # 5/min
    "execute": BucketConfig(capacity=10, refill_per_sec=10 / 60),  # 10/min
    "default": BucketConfig(capacity=60, refill_per_sec=60 / 60),  # 60/min
}
_state: dict[tuple[str, str], tuple[float, float]] = {}
_lock = threading.Lock()
_LAST_CLEANUP = 0.0
def _evict_stale_locked(now: float) -> None:
    """Drop entries whose buckets are full (= idle for capacity / refill seconds)."""
    global _LAST_CLEANUP
    if now - _LAST_CLEANUP < 60:
        return
    _LAST_CLEANUP = now
    stale = []
    for key, (tokens, last) in _state.items():
        bucket = BUCKETS.get(key[0])
        if bucket is None:
            continue
        if tokens >= bucket.capacity and (now - last) > 3600:
            stale.append(key)
    for key in stale:
        _state.pop(key, None)
def check(bucket: str, peer: str) -> tuple[bool, Optional[float]]:
    """Try to consume one token from ``(bucket, peer)``.
    Returns:
        (allowed, retry_after_seconds). When allowed=True retry_after is None.
        When allowed=False, retry_after is the seconds to wait for one more token.
    """
    cfg = BUCKETS.get(bucket) or BUCKETS["default"]
    now = time.monotonic()
    with _lock:
        _evict_stale_locked(now)
        tokens, last = _state.get((bucket, peer), (cfg.capacity, now))
        elapsed = max(0.0, now - last)
        tokens = min(cfg.capacity, tokens + elapsed * cfg.refill_per_sec)
        if tokens >= 1:
            tokens -= 1
            _state[(bucket, peer)] = (tokens, now)
            return True, None
        deficit = 1 - tokens
        retry = deficit / cfg.refill_per_sec if cfg.refill_per_sec > 0 else 60
        _state[(bucket, peer)] = (tokens, now)
        return False, retry
def get_peer(request) -> str:
    """Best-effort peer identifier from a Starlette request.
    Honors X-Forwarded-For (only when settings.proxy_headers is True, which is
    already enforced by uvicorn's middleware) so a reverse-proxied install
    still rate-limits per real client.
    """
    client = getattr(request, "client", None)
    if client and client.host:
        return client.host
    return "unknown"
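The token-bucket arithmetic above is easier to follow with the clock passed in explicitly. The following is a minimal sketch, not the module's implementation: the `TokenBucket` class and its `is_idle` method are hypothetical, and it assumes `refill_per_sec` is positive. One detail worth checking in a design like this: in the listing, the stored token count is written back after the subtraction, so it appears to stay below capacity, and a staleness test that compares the stored value to capacity may never match. The sketch compares the refilled value instead.

```python
from __future__ import annotations


class TokenBucket:
    """One peer's bucket. Time is passed in so the logic is easy to test."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full, so a burst up to capacity is allowed
        self.last = now

    def _refilled(self, now: float) -> float:
        """Token count as of `now`, without mutating state."""
        elapsed = max(0.0, now - self.last)
        return min(self.capacity, self.tokens + elapsed * self.refill_per_sec)

    def consume(self, now: float) -> tuple[bool, float | None]:
        """Try to take one token; on refusal, return the wait for the next one."""
        self.tokens = self._refilled(now)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, None
        return False, (1 - self.tokens) / self.refill_per_sec

    def is_idle(self, now: float) -> bool:
        """True once the bucket has refilled completely, i.e. it is safe to evict."""
        return self._refilled(now) >= self.capacity


bucket = TokenBucket(capacity=5, refill_per_sec=5 / 60)  # the "auth" settings
results = [bucket.consume(now=0.0)[0] for _ in range(6)]
print(results)
```

With the "auth" settings, five attempts in a burst are allowed and the sixth is refused with a wait of about 12 seconds, which is one token at 5 per minute.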
+16 -5
@@ -26,12 +26,23 @@ class ThumbnailService:
def get_cache_dir() -> Path:
"""Get the thumbnail cache directory path.
Returns:
Path to the cache directory (project-local).
Returns user-writable platform cache dir so installs under
``%PROGRAMFILES%`` / ``/opt`` work without elevated permissions.
Mirrors the platform branching of ``config.get_config_dir``.
"""
# Store cache in project directory: media-server/.cache/thumbnails/
project_root = Path(__file__).parent.parent.parent
cache_dir = project_root / ".cache" / "thumbnails"
import os
if os.name == "nt":
# %LOCALAPPDATA% is machine-local, so the cache stays out of roaming-profile sync.
base = Path(os.environ.get("LOCALAPPDATA")
or os.environ.get("APPDATA")
or Path.home() / "AppData" / "Local")
cache_dir = base / "media-server" / "cache" / "thumbnails"
else:
# XDG_CACHE_HOME convention; falls back to ~/.cache.
xdg = os.environ.get("XDG_CACHE_HOME")
base = Path(xdg) if xdg else Path.home() / ".cache"
cache_dir = base / "media-server" / "thumbnails"
cache_dir.mkdir(parents=True, exist_ok=True)
return cache_dir
+9 -3
@@ -33,9 +33,15 @@ class ConnectionManager:
self._audio_task: asyncio.Task | None = None
self._audio_analyzer = None
async def connect(self, websocket: WebSocket) -> None:
"""Accept a new WebSocket connection."""
await websocket.accept()
async def connect(self, websocket: WebSocket, already_accepted: bool = False) -> None:
"""Accept a new WebSocket connection.
``already_accepted=True`` is for callers that needed to call
``websocket.accept(subprotocol=...)`` themselves (token-via-subprotocol
auth path).
"""
if not already_accepted:
await websocket.accept()
async with self._lock:
self._active_connections.add(websocket)
logger.info(
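The `already_accepted` flag exists because the subprotocol auth path has to inspect the `Sec-WebSocket-Protocol` header and echo the chosen value back before the manager sees the socket. A minimal sketch of the header parsing follows, assuming the `media-server.token.<T>` format from the commit message; the `extract_token` helper is hypothetical and not taken from the codebase.

```python
from __future__ import annotations

_PREFIX = "media-server.token."


def extract_token(header_value: str | None) -> tuple[str | None, str | None]:
    """Return (token, subprotocol_to_echo) from a Sec-WebSocket-Protocol value.

    The header is a comma-separated list of offered subprotocols. The first
    entry carrying the prefix and a non-empty token wins.
    """
    if not header_value:
        return None, None
    for offered in (part.strip() for part in header_value.split(",")):
        if offered.startswith(_PREFIX) and len(offered) > len(_PREFIX):
            return offered[len(_PREFIX):], offered
    return None, None


token, chosen = extract_token("chat, media-server.token.abc123")
print(token, chosen)

# In the endpoint, the caller would then (per the docstring above) do roughly:
#     await websocket.accept(subprotocol=chosen)
#     await manager.connect(websocket, already_accepted=True)
```

Echoing the selected subprotocol in the accept matters because browsers close the connection when the server does not confirm one of the offered values.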
+51 -23
@@ -31,8 +31,15 @@ def _thread_loop() -> asyncio.AbstractEventLoop:
_thread_local.loop = loop
return loop
# Global storage for current album art (as bytes)
# Global storage for current album art (as bytes). Guarded by _art_lock so the
# WinRT polling thread and the FastAPI handler thread don't race on swap.
_current_album_art_bytes: bytes | None = None
_art_lock = threading.Lock()
# Identity of the track whose art is currently in _current_album_art_bytes.
# Used to gate the expensive WinRT thumbnail.open_read_async() so the bytes
# aren't re-decoded on every 500ms status poll.
_current_album_art_key: tuple | None = None
# Lock protecting _position_cache and _track_skip_pending from concurrent access
_position_lock = threading.Lock()
@@ -56,8 +63,9 @@ _track_skip_pending = {
def get_current_album_art() -> bytes | None:
"""Get the current album art bytes."""
return _current_album_art_bytes
"""Get the current album art bytes (thread-safe snapshot)."""
with _art_lock:
return _current_album_art_bytes
# Windows-specific imports
try:
@@ -379,28 +387,48 @@ def _sync_get_media_status() -> dict[str, Any]:
except Exception as e:
logger.debug(f"Timeline parse error: {e}")
# Try to get album art (requires media_props)
# Try to get album art (requires media_props). Gated by track key so
# the WinRT IPC + bytes copy only runs when the track actually
# changes; otherwise we just preserve the existing cached bytes.
if media_props:
try:
thumbnail = media_props.thumbnail
if thumbnail:
stream = loop.run_until_complete(thumbnail.open_read_async())
if stream:
size = stream.size
if size > 0 and size < 10 * 1024 * 1024: # Max 10MB
from winsdk.windows.storage.streams import DataReader
reader = DataReader(stream)
loop.run_until_complete(reader.load_async(size))
buffer = bytearray(size)
reader.read_bytes(buffer)
reader.close()
stream.close()
track_key = (
getattr(media_props, "title", "") or "",
getattr(media_props, "artist", "") or "",
getattr(media_props, "album_title", "") or "",
)
global _current_album_art_bytes, _current_album_art_key
if track_key == _current_album_art_key and _current_album_art_bytes:
# Same track — reuse cached art bytes without touching WinRT.
result["album_art_url"] = "/api/media/artwork"
else:
try:
thumbnail = media_props.thumbnail
if thumbnail:
stream = loop.run_until_complete(thumbnail.open_read_async())
if stream:
size = stream.size
if size > 0 and size < 10 * 1024 * 1024: # Max 10MB
from winsdk.windows.storage.streams import DataReader
reader = DataReader(stream)
loop.run_until_complete(reader.load_async(size))
buffer = bytearray(size)
reader.read_bytes(buffer)
reader.close()
stream.close()
global _current_album_art_bytes
_current_album_art_bytes = bytes(buffer)
result["album_art_url"] = "/api/media/artwork"
except Exception as e:
logger.debug(f"Failed to get album art: {e}")
with _art_lock:
_current_album_art_bytes = bytes(buffer)
_current_album_art_key = track_key
result["album_art_url"] = "/api/media/artwork"
else:
# No thumbnail on this track — drop stale bytes so
# the ETag flips and clients don't keep showing the
# previous album's cover.
with _art_lock:
_current_album_art_bytes = None
_current_album_art_key = track_key
except Exception as e:
logger.debug(f"Failed to get album art: {e}")
result["source"] = session.source_app_user_model_id
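Dropping the cached bytes when a track has no thumbnail is what lets the content-derived ETag on /api/media/artwork change. The route itself is not shown in this excerpt, so the following is only a sketch of the idea under assumptions: `artwork_etag`, `conditional_response`, the hash choice, and the Cache-Control value are all hypothetical.

```python
from __future__ import annotations

import hashlib


def artwork_etag(data: bytes) -> str:
    """Strong ETag derived from the image bytes (quoted, as HTTP requires)."""
    return '"' + hashlib.blake2b(data, digest_size=16).hexdigest() + '"'


def conditional_response(
    data: bytes | None, if_none_match: str | None
) -> tuple[int, dict, bytes]:
    """Return (status, headers, body) for an artwork request."""
    if data is None:
        return 404, {}, b""
    etag = artwork_etag(data)
    headers = {"ETag": etag, "Cache-Control": "private, max-age=0, must-revalidate"}
    candidates = [c.strip() for c in (if_none_match or "").split(",")]
    if etag in candidates or "*" in candidates:
        return 304, headers, b""  # client copy is current, send no body
    return 200, headers, data


cover_a, cover_b = b"cover-of-album-a", b"cover-of-album-b"
status_first, headers_first, _ = conditional_response(cover_a, None)
status_repeat, _, body_repeat = conditional_response(cover_a, headers_first["ETag"])
status_changed, _, _ = conditional_response(cover_b, headers_first["ETag"])
print(status_first, status_repeat, status_changed)
```

Because the tag depends only on the bytes, a repeat of the same track revalidates with a 304, while a new cover (or no cover at all) produces a different result for the same If-None-Match value.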