10d30fc956
Comprehensive multi-area pass driven by a parallel 8-agent production
review, covering frontend, backend, database, security, performance and
operations, plus a new self-monitoring feature.
## Critical fixes
- Planka webhook: reads bounded raw body (was NameError on every call)
- HA quiet hours: ha_state_changed/automation_triggered/service_called/
event_fired added to deferrable set (were silently dropped)
- DNS-rebinding SSRF: PinnedResolver wired into shared aiohttp session
- Telegram inbound webhook: secret now mandatory (401 without)
- Generic webhook: auth_mode="none" requires explicit
acknowledge_unauthenticated=true; per-IP rate limit 60/min
- svelte-check: 5 null-narrowing errors in EventDetailModal fixed
- Provider hardcoding: Immich-only block extracted to descriptor
featureDiscoveryHint
- command_sync: snapshot+expunge bot before exiting AsyncSession
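The mandatory Telegram secret builds on the Bot API's `secret_token` parameter to `setWebhook`. Once it is set, Telegram sends that value in the `X-Telegram-Bot-Api-Secret-Token` header of every update. Below is a minimal fail-closed sketch. It assumes the configured value is whatever `NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET` holds. The function name and its status-code return value are hypothetical, not the bridge's actual handler.

```python
from __future__ import annotations

import hmac

SECRET_HEADER = "X-Telegram-Bot-Api-Secret-Token"


def telegram_webhook_status(headers: dict[str, str], configured_secret: str | None) -> int:
    """Hypothetical helper: the HTTP status a webhook handler would answer with.

    Fails closed: a missing configured secret, a missing header or a wrong
    header value all yield 401.
    """
    if not configured_secret:
        # Webhook mode without a configured secret is refused outright.
        return 401
    # HTTP header names are case-insensitive; normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get(SECRET_HEADER.lower())
    if supplied is None:
        return 401
    # Constant-time comparison so timing does not reveal how much of a
    # guessed secret was correct.
    if not hmac.compare_digest(supplied.encode(), configured_secret.encode()):
        return 401
    return 200
```

Comparing bytes rather than `str` keeps `hmac.compare_digest` from raising on non-ASCII input.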
## Bug fixes
- notifier asyncio.gather(return_exceptions=True) — one bad chat no longer
cancels peer sends
- NotificationDispatcher hoisted out of per-tracker loop
- Provider credential resolution unified across all 5 dispatch sites
- HA asyncio.shield now drains inner task on cancellation
- Provider construction switched from if/elif ladder to factory registry
- NUT first poll seeds silently (no spurious ups_on_battery)
- Quiet-hours gate: event-type-disabled now wins over deferral
- APScheduler drain job ID resolution upgraded to seconds
- HA on_status_change wired through to EventLog
- Webhook payload rollback failures now logged (not swallowed)
- Batched receivers/chats/bots in load_link_data (was per-target N+1)
- flag_modified on JSON column reassignments in deferred_dispatch
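The notifier fix relies on how `asyncio.gather` reports failures. With the default `return_exceptions=False`, the first exception from any send propagates straight to the awaiting caller, and the other results are lost to it. With `return_exceptions=True`, every send runs to completion and failures come back as values in input order. Here is a minimal sketch with a hypothetical `send_to_chat` stand-in in which chat 2 always fails:

```python
import asyncio


class ChatSendError(Exception):
    """Hypothetical error raised when one chat rejects a message."""


async def send_to_chat(chat_id: int) -> str:
    # Stand-in for a real send; chat 2 simulates a bad chat.
    await asyncio.sleep(0)
    if chat_id == 2:
        raise ChatSendError(f"chat {chat_id} rejected the message")
    return f"sent:{chat_id}"


async def fan_out(chat_ids: list[int]) -> tuple[list[str], list[BaseException]]:
    """Send to every chat concurrently; split successes from failures."""
    results = await asyncio.gather(
        *(send_to_chat(chat_id) for chat_id in chat_ids),
        return_exceptions=True,  # failures are returned, not raised
    )
    sent = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return sent, failed
```

Running `asyncio.run(fan_out([1, 2, 3]))` yields the two successful sends plus one `ChatSendError`, which can then be logged per chat.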
## Database
- UNIQUE indexes on service_provider.webhook_token,
telegram_bot.webhook_path_id, partial UNIQUE on telegram_bot.bot_id,
telegram_chat(bot_id, chat_id), notification_tracker_target unique link,
partial UNIQUE on bridge_self provider per user
- Composite ix_event_log_user_event_type_created index
- save_chat_from_webhook switched to ON CONFLICT DO UPDATE
- ondelete=CASCADE on user-id FKs (model annotation; app-side cascade
delete added for existing data)
- delete_notification_tracker converted from N+1 to bulk DELETE/UPDATE
- Module-level asyncio.Lock replaced with lazy _get_lock() pattern
- VACUUM INTO snapshot now verified with PRAGMA integrity_check
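`ON CONFLICT DO UPDATE` turns the chat save into a single atomic upsert keyed on the new `telegram_chat(bot_id, chat_id)` unique index, replacing a racy select-then-insert. Below is a minimal stdlib `sqlite3` sketch. The `title` column and both helper names are assumptions for illustration. The real code presumably goes through SQLAlchemy rather than raw SQL. SQLite upsert requires SQLite 3.24 or newer.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the telegram_chat table (columns partly hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE telegram_chat (
            id      INTEGER PRIMARY KEY,
            bot_id  INTEGER NOT NULL,
            chat_id INTEGER NOT NULL,
            title   TEXT
        )
        """
    )
    # The unique index is what gives ON CONFLICT a target to match on.
    conn.execute(
        "CREATE UNIQUE INDEX ux_telegram_chat_bot_chat "
        "ON telegram_chat (bot_id, chat_id)"
    )
    return conn


def save_chat(conn: sqlite3.Connection, bot_id: int, chat_id: int, title: str) -> None:
    """Insert the chat, or update its title if (bot_id, chat_id) already exists."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT INTO telegram_chat (bot_id, chat_id, title)
            VALUES (?, ?, ?)
            ON CONFLICT (bot_id, chat_id) DO UPDATE SET title = excluded.title
            """,
            (bot_id, chat_id, title),
        )
```

Two webhooks racing on the same chat both end in one row, because the conflict is resolved inside a single statement.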
## Performance
- Jinja2 template compilation LRU cached (lru_cache maxsize=512)
- Per-locale render cache in NotificationDispatcher (skips re-rendering
identical content for receivers sharing a locale)
- Tracker list cached per provider_id with 5s TTL + explicit invalidation
on tracker CRUD (relieves HA chat-bus rate query pressure)
- Nav-counts collapsed from 16 round-trips to single UNION ALL
- HA event_log: skip persisting empty assets_added/removed events
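The tracker-list cache pairs a short TTL with explicit invalidation. Tracker CRUD then becomes visible immediately instead of after up to 5s. Below is a minimal synchronous sketch with an injectable clock for testing. The class and method names are hypothetical, and the real cache is likely async and keyed directly on `provider_id`.

```python
import time
from collections.abc import Callable, Hashable
from typing import Generic, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Hypothetical per-key cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 5.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, V]] = {}

    def get(self, key: Hashable, load: Callable[[], V]) -> V:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = load()  # miss or expired: reload and restamp
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Called from tracker create/update/delete so a stale list never lingers.
        self._entries.pop(key, None)
```

Using `time.monotonic` as the default clock keeps expiry immune to wall-clock adjustments.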
## Security hardening
- Mass-assignment guard on Action create/update; sub-minute cron rejected
- Backup JSON depth/node-count cap (depth ≤ 10, nodes ≤ 100k)
- _sanitize_config extended to all JSON-typed fields on backup import
- Telegram _safe_get walks redirects manually with SSRF revalidation
- Bcrypt 72-byte password length cap with clear 422
- Webhook payload body redaction; sensitive substring set extended with
oauth/client_secret/webhook_secret/csrf in both header filter and
template extras filter
## Frontend
- 76 catch (err: any) sites converted to errMsg(err) helper
- globalProviderFilter: pure getter; reconciliation moved to one-time
$effect in +layout
- Provider-filter binding: removed paired $effects + _syncingFilter flag,
now one-way derived
- entity-cache: separate _refreshing flag for background re-fetches
- api.ts 401 handling: AuthRedirectError class + dedup _redirecting flag,
goto() instead of window.location.href
- a11y: aria-expanded on mobile More, role=switch + aria-checked on
Telegram bot toggles
## Tests & operations
- CI pytest gate added to .gitea/workflows/build.yml + release.yml
(wheel-built install to dodge editable-install slowness)
- /api/ready upgraded to deep healthcheck (db SELECT 1, scheduler.running,
HA supervisor presence) returning {ready, checks, errors, version}
- /api/metrics endpoint with prometheus_client (deferred_pending,
event_log_total, dispatch_duration, poll_failures, send_failures)
- New OPERATIONS.md covering deploy, healthchecks, metrics, backup/restore
procedures, log handling, common scenarios, upgrade flow
- New tests: test_bridge_self (11), test_gitea_parser (9),
test_planka_parser (6), test_immich_change_detector (6),
test_backup_roundtrip (1)
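The deep `/api/ready` check runs a set of probes, records pass/fail per check, and collects error strings into the `{ready, checks, errors, version}` shape. Below is a minimal framework-free sketch. It assumes each probe is an async callable that raises on failure. The probe names in the usage mirror the bullet, but their bodies, the timeout and the `VERSION` value are placeholders.

```python
import asyncio
from collections.abc import Awaitable, Callable

VERSION = "0.0.0-example"  # placeholder; the real endpoint reports the app version

Probe = Callable[[], Awaitable[None]]


async def run_readiness(probes: dict[str, Probe], timeout: float = 2.0) -> dict:
    """Run every probe; one passes if it finishes within `timeout` without raising."""
    checks: dict[str, bool] = {}
    errors: list[str] = []
    for name, probe in probes.items():
        try:
            await asyncio.wait_for(probe(), timeout)
            checks[name] = True
        except Exception as exc:  # timeouts and probe errors both mark the check down
            checks[name] = False
            errors.append(f"{name}: {exc!r}")
    return {
        "ready": all(checks.values()),
        "checks": checks,
        "errors": errors,
        "version": VERSION,
    }
```

For orchestrator probes, the usual convention is to pair `ready: false` with an HTTP 503 so traffic stops being routed to the instance.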
## New feature: bridge self-monitoring
- New bridge_self provider type — internal sink for bridge health events
- Three event types: bridge_self_poll_failures (consecutive tracker poll
failures), bridge_self_deferred_backlog (pending count crosses
threshold), bridge_self_target_failures (consecutive 5xx/network
failures per target)
- Per-user thresholds (defaults: 3 / 100 / 5) configurable via the
provider config form
- Auto-seeded on user create + /setup + boot backfill for existing users
- Anti-spam: counters reset after emission; backlog uses transition latch
- Self-loop guard: bridge_self failures don't count toward target-failure
thresholds (logged only) — wire to your own Telegram/Email/Matrix to
get notified when polls/dispatches/sends fail
- 6 default templates (3 events × 2 locales), tracking config columns
with backfill migration, frontend descriptor (excluded from "create
provider" wizard since auto-managed)
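The anti-spam rules can be sketched as two small state machines:

- A consecutive-failure counter that fires at its threshold and then resets.
- A backlog latch that fires once when the pending count reaches the threshold, and re-arms only after the count drops back below it.

The class names are hypothetical, and the defaults come from the bullet (3 / 100 / 5). Three choices here are assumptions: a success also resets the counter, "crosses" means `>=`, and the latch re-arms strictly below the threshold.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveFailureCounter:
    """Hypothetical: emit once per `threshold` consecutive failures."""

    threshold: int
    count: int = 0

    def record(self, ok: bool) -> bool:
        """Return True when this observation should emit an alert."""
        if ok:
            self.count = 0  # assumption: any success breaks the run
            return False
        self.count += 1
        if self.count >= self.threshold:
            self.count = 0  # reset after emission; the next alert needs a fresh run
            return True
        return False


@dataclass
class BacklogLatch:
    """Hypothetical: emit on the upward crossing of `threshold`, re-arm below it."""

    threshold: int
    latched: bool = False

    def observe(self, pending: int) -> bool:
        if pending >= self.threshold and not self.latched:
            self.latched = True
            return True
        if pending < self.threshold:
            self.latched = False
        return False
```

With these two machines, a backlog hovering at 150 produces one alert rather than one per check.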
Operator-visible behavior changes (call out in release notes):
- NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET now REQUIRED for webhook mode
- Existing webhook providers with auth_mode="none" need explicit opt-in
- Generic webhook endpoint rate-limited 60/min per source IP
- HA disconnect/reconnect writes ha_status_* EventLog rows
- Every user gets a bridge_self provider — wire it to a target to
receive failure alerts
Pre-existing test failures (test_ssrf, test_release_provider) on
Python 3.13 are unrelated; CI runs on 3.12.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
225 lines
7.9 KiB
Python
"""Authentication API routes."""

import asyncio

import bcrypt
from fastapi import APIRouter, Depends, HTTPException, Request, status
from pydantic import BaseModel
from slowapi import Limiter
from slowapi.util import get_remote_address
from sqlmodel import func, select
from sqlmodel.ext.asyncio.session import AsyncSession

from ..database.engine import get_session
from ..database.models import User
from .dependencies import get_current_user
from .jwt import create_access_token, create_refresh_token, decode_token

router = APIRouter(prefix="/api/auth", tags=["auth"])

# Default rate limit applied by SlowAPIMiddleware to every route that does NOT
# specify its own @limiter.limit(...); protects against blanket abuse.
limiter = Limiter(key_func=get_remote_address, default_limits=["600/minute"])


class SetupRequest(BaseModel):
    username: str
    password: str


class LoginRequest(BaseModel):
    username: str
    password: str


class TokenResponse(BaseModel):
    access_token: str
    refresh_token: str
    token_type: str = "bearer"


class UserResponse(BaseModel):
    id: int
    username: str
    role: str


class RefreshRequest(BaseModel):
    refresh_token: str


async def _hash_password(password: str) -> str:
    """bcrypt.hashpw is CPU-bound (~200-500ms); never run it on the event loop.

    Caller is responsible for length-validating ``password`` against the
    72-byte bcrypt cap before calling; bcrypt silently truncates beyond
    that, which is a correctness footgun, not a security one.
    """

    def _work() -> str:
        return bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()

    return await asyncio.to_thread(_work)


# bcrypt's algorithm cap: the underlying primitive truncates input
# beyond this, so two distinct passwords sharing a 72-byte prefix would
# verify identically. We reject up-front with a clear 422 message.
_BCRYPT_MAX_PASSWORD_BYTES = 72


def _check_bcrypt_length(password: str) -> None:
    if len(password.encode("utf-8")) > _BCRYPT_MAX_PASSWORD_BYTES:
        raise HTTPException(
            status_code=422,
            detail=(
                "Password too long; bcrypt limit is "
                f"{_BCRYPT_MAX_PASSWORD_BYTES} bytes (longer passwords would "
                "be silently truncated)"
            ),
        )


async def _verify_password(password: str, hashed: str) -> bool:
    def _work() -> bool:
        try:
            return bcrypt.checkpw(password.encode(), hashed.encode())
        except ValueError:
            # Malformed hash in DB: treat as mismatch, never raise to caller.
            return False

    return await asyncio.to_thread(_work)


@router.post("/setup", response_model=TokenResponse)
@limiter.limit("3/minute")
async def setup(request: Request, body: SetupRequest, session: AsyncSession = Depends(get_session)):
    if len(body.password) < 8:
        raise HTTPException(status_code=400, detail="Password must be at least 8 characters")
    _check_bcrypt_length(body.password)
    # Compute hash BEFORE opening the transaction so we don't hold a writer lock
    # during the CPU-bound bcrypt work.
    hashed = await _hash_password(body.password)

    # Serialize setup via an INSERT-inside-transaction-with-count-guard.
    # SQLite's writer lock plus the count check inside the transaction closes
    # the TOCTOU window between two concurrent POSTs. We ignore id=0, the
    # internal "__system__" placeholder used for ownership of default
    # templates, which is never a real admin.
    async with session.begin():
        result = await session.exec(
            select(func.count()).select_from(User).where(User.id != 0)
        )
        count = result.one()
        if count > 0:
            raise HTTPException(
                status_code=status.HTTP_409_CONFLICT,
                detail="Setup already completed.",
            )
        user = User(username=body.username, hashed_password=hashed, role="admin")
        session.add(user)
        # refresh() rejects a pending instance ("not persistent within this
        # Session"); flush first so the INSERT runs and the id is assigned.
        await session.flush()
        await session.refresh(user)
        # Capture the token claims now: the commit (with expire_on_commit) or
        # the rollback below expires the instance, and implicit IO on
        # attribute access is not allowed under AsyncSession.
        user_id, user_role, token_version = user.id, user.role, user.token_version

    # Auto-create the bridge_self provider for the new admin so internal-
    # failure notifications work out of the box. Best-effort: a seeding
    # failure should not abort setup.
    try:
        from ..database.seeds import ensure_bridge_self_provider_for_user
        await ensure_bridge_self_provider_for_user(session, user_id)
        await session.commit()
    except Exception:  # noqa: BLE001
        await session.rollback()

    return TokenResponse(
        access_token=create_access_token(user_id, user_role, token_version),
        refresh_token=create_refresh_token(user_id, token_version),
    )


# Real bcrypt hash (default cost, same as stored hashes) that unknown-user
# logins are verified against, so they pay the full bcrypt cost. A hand-built
# placeholder risks being rejected by bcrypt's salt parser with ValueError,
# which _verify_password turns into an instant False, reopening the timing
# side channel. Computed once at import.
_DUMMY_PASSWORD_HASH = bcrypt.hashpw(b"timing-equalizer", bcrypt.gensalt()).decode()


@router.post("/login", response_model=TokenResponse)
@limiter.limit("5/minute")
async def login(request: Request, body: LoginRequest, session: AsyncSession = Depends(get_session)):
    result = await session.exec(select(User).where(User.username == body.username))
    user = result.first()
    # Always run a bcrypt verification to keep the response time constant,
    # preventing username-enumeration via timing side channel.
    password_ok = await _verify_password(
        body.password,
        user.hashed_password if user else _DUMMY_PASSWORD_HASH,
    )
    if not user or not password_ok:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid username or password")

    return TokenResponse(
        access_token=create_access_token(user.id, user.role, user.token_version),
        refresh_token=create_refresh_token(user.id, user.token_version),
    )


@router.post("/refresh", response_model=TokenResponse)
@limiter.limit("10/minute")
async def refresh(request: Request, body: RefreshRequest, session: AsyncSession = Depends(get_session)):
    import jwt as pyjwt
    try:
        payload = decode_token(body.refresh_token)
        if payload.get("type") != "refresh":
            raise HTTPException(status_code=401, detail="Invalid token type")
        user_id = int(payload["sub"])
        token_version = int(payload.get("ver", 1))
    # TypeError: a claim that is present but null, e.g. int(None).
    except (pyjwt.PyJWTError, KeyError, TypeError, ValueError) as exc:
        raise HTTPException(status_code=401, detail="Invalid refresh token") from exc

    user = await session.get(User, user_id)
    if not user:
        raise HTTPException(status_code=401, detail="User not found")
    if token_version != user.token_version:
        raise HTTPException(status_code=401, detail="Refresh token revoked")

    return TokenResponse(
        access_token=create_access_token(user.id, user.role, user.token_version),
        refresh_token=create_refresh_token(user.id, user.token_version),
    )


@router.get("/me", response_model=UserResponse)
async def me(user: User = Depends(get_current_user)):
    return UserResponse(id=user.id, username=user.username, role=user.role)


class PasswordChangeRequest(BaseModel):
    current_password: str
    new_password: str


@router.put("/password")
@limiter.limit("10/minute")
async def change_password(
    request: Request,
    body: PasswordChangeRequest,
    user: User = Depends(get_current_user),
    session: AsyncSession = Depends(get_session),
):
    if not await _verify_password(body.current_password, user.hashed_password):
        raise HTTPException(status_code=400, detail="Current password is incorrect")
    if len(body.new_password) < 8:
        raise HTTPException(status_code=400, detail="New password must be at least 8 characters")
    _check_bcrypt_length(body.new_password)
    user.hashed_password = await _hash_password(body.new_password)
    user.token_version = (user.token_version or 1) + 1
    session.add(user)
    await session.commit()
    return {"success": True}


@router.get("/needs-setup")
@limiter.limit("30/minute")
async def needs_setup(request: Request, session: AsyncSession = Depends(get_session)):
    # Exclude the internal __system__ placeholder (id=0) from the count so
    # a fresh install still reports needs_setup=True.
    result = await session.exec(
        select(func.count()).select_from(User).where(User.id != 0)
    )
    count = result.one()
    return {"needs_setup": count == 0}