alexei.dolgolyov 6a8f374678 feat: observability, per-receiver Telegram options, oversized-video fallback

Operability:
- Correlation IDs end-to-end: shared dispatch_id between log lines and
  EventLog rows (event/watcher/scheduled/deferred/action/HA/command paths)
  and a new X-Request-Id middleware that normalizes inbound ids and binds
  request_id into log context.
- dispatch_summary block merged into EventLog.details: per-target
  success/failure counts plus Telegram media delivered/skipped/failed and
  truncated error lists, so partial outcomes surface in the UI.
- Diagnostic mode: admin can flip one module to DEBUG for a bounded
  window with auto-revert (in-memory only; setup_logging() resets on
  boot, lifespan reverts on shutdown). New /diagnostic-mode endpoints
  plus DiagnosticsCassette UI on the settings page.

Telegram:
- Per-receiver options: disable_notification (silent send) and
  message_thread_id (forum-topic routing), wired through the dispatcher
  via a ContextVar so all four send sites (sendMessage / sendPhoto-Video-
  Document / sendMediaGroup / cache-hit POST) pick them up.
- send_large_videos_as_documents target setting: bypass the 50 MB
  sendVideo cap by falling back to sendDocument for oversized videos.
- sendMediaGroup byte-budget enforcement (TELEGRAM_MAX_GROUP_TOTAL_BYTES,
  45 MB) with per-item fallback on chunk failure so a stale file_id no
  longer silently drops a cached asset.

Tests:
- New: diagnostic_mode, dispatch_summary, request_correlation,
  telegram_media_group_partial, telegram_per_send_options.

Docs:
- .claude/reviews/: six-axis production-readiness review of v0.8.1.
- .claude/docs/functional-review-2026-05-28.md: focused review of
  Telegram/Immich/logging subsystems.

2026-05-28 15:19:31 +03:00


Security Review — notify-bridge v0.8.1

Reviewer: security-reviewer (Opus 4.7) — 2026-05-22
Branch: master @ a20635a
Scope: packages/server, packages/core, frontend/src, Dockerfile, docker-compose.yml, .gitea/workflows/, env handling.

Executive Summary

Overall posture is strong. The project applies many non-obvious controls correctly: Jinja2 SandboxedEnvironment on every render path; bcrypt with a 72-byte length guard and constant-time login (dummy hash on missing user); JWT with token_version revocation; SSRF guard with CGNAT, IPv4-mapped-IPv6 unwrapping, and a PinnedResolver that defeats DNS rebinding; secret-masking log filter; path-traversal-safe backup file resolver; security headers + CSP; non-root Docker user; required SECRET_KEY >= 32 chars with a rejection list; non-default Telegram webhook secret enforced; HMAC signature checks on Gitea/Generic webhooks; provider-config secret masking on GET; ownership checks (get_owned_entity) on every parameterised route I sampled.
HIGH — Home Assistant access_token is not masked. It is stored in provider.config, never added to the mask list in _provider_response, never added to the placeholder-drop list in update_provider. Any logged-in user can GET /api/providers/{id} and read their HA token in cleartext, and a partial save will wipe it. Trivial fix.
HIGH — Secrets at rest are plaintext. Telegram bot tokens (telegram_bot.token), provider configs containing api_key/api_token/webhook_secret/access_token/SMTP passwords, and email-bot SMTP passwords are stored unencrypted in SQLite. Disk theft, an unrelated read primitive, or any backup leak exposes all credentials. The masking on the API is good UX, but the DB itself has no encryption-at-rest. The exported JSON backup respects a secrets_mode flag (good) but the live DB does not.
MEDIUM — Template-preview endpoints bypass the timeout/size watchdog. template_configs.preview_config, template_configs.preview_raw, command_template_configs.preview_raw, and notifier.send_test_template_notification construct fresh SandboxedEnvironment(autoescape=False) instances and call .render(...) directly. The hardened helper render_template() (timeout, source cap, output cap, autoescape) is bypassed. A logged-in user can wedge a worker thread with nested loops such as {% for i in range(100000) %}{% for j in range(100000) %}x{% endfor %}{% endfor %} (the sandbox rejects a single range() above its 100,000-item MAX_RANGE cap, but nesting multiplies past it). Single-tenant deployment limits the blast radius, but the renderer should be the single chokepoint.
MEDIUM — Login rate limit is per-IP only. POST /api/auth/login is limited to 5/min, keyed on get_remote_address. An attacker who spreads attempts across a proxy pool or otherwise rotates source IPs (cheap on residential / cloud) trivially bypasses it. There is no per-username lockout, no exponential backoff, no captcha. Combined with no MFA, this leaves the admin account vulnerable to a slow online dictionary attack against a single password (8-char minimum, no complexity requirement).
LOW / INFO — Several smaller findings: webhook payload logs persist the source payload (now with key-level redaction, but the redactor is name-based and will miss high-entropy secret values under non-obvious keys); no replay protection on inbound webhooks (no nonce/timestamp window); the /api/auth/setup 3/min limit + JWT issuance race window is hardened with a transaction count guard (good); the dummy bcrypt hash literal used for timing-equalisation is malformed, so bcrypt.checkpw raises a ValueError that is swallowed and returns False without running the KDF, which weakens the equalisation and is easy for a maintainer to regress further; CSP allows script-src 'unsafe-inline' (necessary for SvelteKit hydration, acceptable risk acknowledged in code).

Findings

CRITICAL

None found.

HIGH

H-1. Home Assistant access_token leaked in provider GET responses

CWE: CWE-522 (Insufficiently Protected Credentials), CWE-200 (Exposure of Sensitive Information)
Files:
- packages/server/src/notify_bridge_server/api/providers.py:616-624 — _provider_response masks ("api_key", "api_token", "webhook_secret", "password", "client_secret", "refresh_token") but not access_token.
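- packages/server/src/notify_bridge_server/api/providers.py:399-405 — update_provider also omits access_token from the placeholder-drop list. The two lists are consistent with each other today, so fix both together: masking the field on GET without dropping the placeholder on update would let a save overwrite the stored token with the mask.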
Scenario: Any user authenticated to the bridge (any role) calls GET /api/providers/{id} for an HA provider they own and the response includes config.access_token in cleartext. The HA long-lived token grants full control of the user's Home Assistant instance (lights, locks, cameras, scripts, devices). In a multi-user deployment, even within the same admin account, a stolen JWT exfiltrates the HA token; in a single-user deployment, any read primitive (XSS via a future template feature, an MITM on an HTTPS misconfiguration) gives the same result.
Remediation: Add access_token to both lists.

# providers.py:_provider_response
for secret_field in (
    "api_key", "api_token", "webhook_secret", "password",
    "client_secret", "refresh_token", "access_token",  # <-- add
):
    ...

# providers.py:update_provider
for secret_field in (
    "api_key", "api_token", "webhook_secret", "password",
    "client_secret", "refresh_token", "access_token",  # <-- add
):
    value = incoming.get(secret_field)
    if isinstance(value, str) and value.startswith("***"):
        incoming.pop(secret_field, None)

Better still: replace the hand-maintained tuple with a single module-level constant _PROVIDER_SECRET_FIELDS referenced from both call sites, plus a unit test that asserts every field declared on the per-provider Pydantic configs whose name appears in a denylist (token, secret, password, key, credential) is in the set. That prevents the next provider type from re-introducing the same gap.
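
A minimal sketch of that shared-constant approach, using a plain dataclass as a stand-in for the per-provider Pydantic configs. The helper names (mask_secrets, drop_masked_placeholders, unmasked_secret_fields) and the HomeAssistantConfig class are hypothetical; only the field tuple and the *** placeholder convention come from the snippets above.

```python
from dataclasses import dataclass, fields

# Single source of truth, referenced by both the GET masker and the update path.
_PROVIDER_SECRET_FIELDS = frozenset({
    "api_key", "api_token", "webhook_secret", "password",
    "client_secret", "refresh_token", "access_token",
})

# Substrings that mark a field name as credential-like (the denylist).
_SECRET_NAME_HINTS = ("token", "secret", "password", "key", "credential")


def mask_secrets(config: dict) -> dict:
    """Return a copy of config with every non-empty secret field set to '***'."""
    return {
        k: "***" if k in _PROVIDER_SECRET_FIELDS and v else v
        for k, v in config.items()
    }


def drop_masked_placeholders(incoming: dict) -> dict:
    """Drop secret fields still holding the '***' placeholder, so a partial
    save keeps the stored value instead of overwriting it."""
    return {
        k: v for k, v in incoming.items()
        if not (k in _PROVIDER_SECRET_FIELDS
                and isinstance(v, str) and v.startswith("***"))
    }


def unmasked_secret_fields(config_cls) -> set:
    """Field names that look like credentials but are missing from the set."""
    return {
        f.name for f in fields(config_cls)
        if any(hint in f.name.lower() for hint in _SECRET_NAME_HINTS)
        and f.name not in _PROVIDER_SECRET_FIELDS
    }


@dataclass
class HomeAssistantConfig:  # hypothetical stand-in for the real config model
    url: str
    access_token: str
```

A regression test would then assert that unmasked_secret_fields(cls) is empty for every registered config class; with real Pydantic models the field names would come from the model's field registry rather than dataclasses.fields.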

H-2. Secrets stored in plaintext at rest

CWE: CWE-312 (Cleartext Storage of Sensitive Information), CWE-256 (Plaintext Storage of a Password)
Files:
- packages/server/src/notify_bridge_server/database/models.py:54-84 — TelegramBot.token: str
- packages/server/src/notify_bridge_server/database/models.py:87-100 — MatrixBot (access_token in config)
- ServiceProvider.config: dict[str, Any] (JSON column) holds Immich api_key, Gitea webhook_secret + api_token, Google Photos client_secret + refresh_token, HA access_token, etc.
- EmailBot.smtp_password: str (per api/email_bots.py:142)
Scenario: An attacker who can read the SQLite file (compromised host, mis-permissioned backup volume, snapshot artifact in data_dir/backups/, leaked debug dump) gets every credential the bridge speaks: Telegram bot tokens (full bot control), Immich/Gitea/Planka API keys (read all photos / repos), Google Photos refresh tokens (long-lived, hard to revoke at scale), HA long-lived tokens (smart-home), SMTP passwords. The pre-migrate VACUUM-INTO snapshots (packages/server/src/notify_bridge_server/database/snapshot.py) inherit the same plaintext exposure and live alongside the active DB.
Remediation options, in order of effort:
1. Short term: document the threat in OPERATIONS.md, enforce file-system permissions on /data (the Dockerfile chowns to appuser already, but the host bind-mount must be chmod 700), and ensure backups are encrypted at the storage layer (S3 SSE / Borg / restic).
2. Better: column-level encryption with a key derived from NOTIFY_BRIDGE_SECRET_KEY (or a separate NOTIFY_BRIDGE_DB_ENCRYPTION_KEY). Use the cryptography library's Fernet for each sensitive column; envelope the secret JSON keys, not the whole row, so WHERE clauses and existing migrations keep working. Add a one-shot migration that re-encrypts existing rows.
3. Best: encrypt with a KMS-backed key (HashiCorp Vault Transit, AWS KMS) and rotate per-secret data keys. This is overkill for a homelab-style deployment but mandatory if the bridge is ever multi-tenant.
Skeleton for option 2:

# new file packages/server/src/notify_bridge_server/security/secretbox.py
import base64
import hashlib

from cryptography.fernet import Fernet, InvalidToken
from .config import settings

def _key() -> bytes:
    # Derive a deterministic Fernet key from secret_key. Anyone with secret_key
    # can decrypt (same threat model as JWT signing), but anyone with the DB
    # alone cannot.
    h = hashlib.sha256(settings.secret_key.encode()).digest()
    return base64.urlsafe_b64encode(h)

_fernet = Fernet(_key())

def encrypt_secret(plaintext: str) -> str:
    return _fernet.encrypt(plaintext.encode()).decode()

def decrypt_secret(ciphertext: str) -> str:
    # Raises InvalidToken if the value was tampered with or the key changed.
    return _fernet.decrypt(ciphertext.encode()).decode()

Apply at write time in update_provider / create_provider, decrypt at read time inside make_immich_provider, make_gitea_provider, the Telegram client constructor, etc. Add a migration that scans every ServiceProvider.config JSON and re-encrypts the listed keys in place.
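
The in-place re-encryption step could look roughly like the sketch below. It takes the encrypt/decrypt callables as parameters so it is independent of the cipher; the enc:v1: prefix is a hypothetical marker (not something the codebase has today) that keeps the migration idempotent and lets readers tolerate rows that are still plaintext. Only top-level keys are handled here; nested config structures would need a recursive walk.

```python
from typing import Callable

# Hypothetical marker so the migration can tell ciphertext from plaintext
# and stays idempotent if it is run twice.
ENC_PREFIX = "enc:v1:"

SECRET_KEYS = (
    "api_key", "api_token", "webhook_secret", "password",
    "client_secret", "refresh_token", "access_token",
)


def encrypt_config_secrets(config: dict, encrypt: Callable[[str], str]) -> dict:
    """Return a copy of config with the listed secret values encrypted.

    Non-secret keys stay readable, so queries on them keep working.
    """
    out = dict(config)
    for key in SECRET_KEYS:
        value = out.get(key)
        if isinstance(value, str) and value and not value.startswith(ENC_PREFIX):
            out[key] = ENC_PREFIX + encrypt(value)
    return out


def decrypt_config_secrets(config: dict, decrypt: Callable[[str], str]) -> dict:
    """Inverse of encrypt_config_secrets; legacy plaintext values pass through."""
    out = dict(config)
    for key in SECRET_KEYS:
        value = out.get(key)
        if isinstance(value, str) and value.startswith(ENC_PREFIX):
            out[key] = decrypt(value[len(ENC_PREFIX):])
    return out
```

In the migration, each ServiceProvider.config would be passed through encrypt_config_secrets with encrypt_secret from the skeleton above and written back in the same transaction.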

MEDIUM

M-1. Template preview endpoints skip the renderer watchdog

CWE: CWE-400 (Uncontrolled Resource Consumption), CWE-1333 (Inefficient Regular Expression Complexity — analogous)
Files:
- packages/server/src/notify_bridge_server/api/template_configs.py:608-613 — preview_config calls SandboxedEnvironment(autoescape=False).from_string(template_body).render(...) directly.
- packages/server/src/notify_bridge_server/api/slot_helpers.py:72-90 — render_template_preview (used by /preview-raw for both notification and command templates).
- packages/server/src/notify_bridge_server/services/notifier.py:494-499 — send_test_template_notification.
- The hardened helper packages/core/src/notify_bridge_core/templates/renderer.py:48-108 (with timeout, length caps, output cap) is not used here.
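Scenario: An authenticated admin submits a CPU-heavy template such as {% for i in range(100000) %}{% for j in range(100000) %}x{% endfor %}{% endfor %} to POST /api/template-configs/preview-raw. Jinja2 has no built-in render timeout. The sandbox blocks unsafe attribute access and caps a single range() at 100,000 items (jinja2.sandbox.MAX_RANGE), but nested loops still multiply into effectively unbounded CPU time and output. The request occupies a FastAPI executor thread until the worker is OOM-killed or the client times out. Repeat to DoS the API.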
Remediation: Route every render through a single, hardened helper.

# Use the existing core helper consistently
from notify_bridge_core.templates.renderer import render_template
rendered = render_template(template_str, context)  # already has timeout + caps

For the strict-undefined two-pass validation in render_template_preview, fold the watchdog into the helper itself rather than skipping it.
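
One way to picture a watchdog that wraps any render callable, including the two-pass preview, is sketched below. This is not the project's render_template(); the caps, the pool size, and the names guarded_render and TemplateLimitError are assumptions. A Python thread cannot be killed, so on timeout the runaway render keeps burning CPU until it finishes; the small pool only bounds how many can run at once. A subprocess-based renderer would be needed for a hard kill.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable

# Assumed budgets, not the project's real values.
MAX_SOURCE_CHARS = 16_384
MAX_OUTPUT_CHARS = 65_536
RENDER_TIMEOUT_S = 2.0

# A small dedicated pool bounds how many previews can run concurrently.
_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="tmpl-preview")


class TemplateLimitError(Exception):
    """Raised when a template exceeds a size or time budget."""


def guarded_render(source: str, render: Callable[[str], str],
                   timeout: float = RENDER_TIMEOUT_S) -> str:
    """Run render(source) under a source cap, a wait timeout and an output cap."""
    if len(source) > MAX_SOURCE_CHARS:
        raise TemplateLimitError("template source too large")
    future = _pool.submit(render, source)
    try:
        output = future.result(timeout=timeout)
    except FutureTimeout:
        future.cancel()  # only effective if the task has not started yet
        raise TemplateLimitError("template render timed out") from None
    if len(output) > MAX_OUTPUT_CHARS:
        raise TemplateLimitError("rendered output too large")
    return output
```

Each preview endpoint would pass its own render closure (strict-undefined pass, then lenient pass) so the budget applies to both passes without duplicating the checks.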

M-2. Login rate limit is per-IP only

CWE: CWE-307 (Improper Restriction of Excessive Authentication Attempts)
Files: packages/server/src/notify_bridge_server/auth/routes.py:140-157.
Scenario: @limiter.limit("5/minute") keyed on get_remote_address gives 5 attempts per source IP per minute = 7,200/day per IP. An attacker rotating across 10 IPs (cheap cloud, residential proxies, even a Tor exit pool) gets 72,000/day. With the 8-character minimum password and no complexity requirement, a common 8-character password is reachable in days, not centuries. There is no per-username lockout, no captcha, no MFA.
Remediation:
1. Add a per-username sliding-window limiter on top of the per-IP one. Use a second Limiter whose key_func returns the lower-cased username from the body. Re-check after parsing the body.
2. Add an exponential lockout: after N consecutive failures for a username, require a cooldown (record in a LoginFailure table or in-memory TTLCache).
3. Document and recommend deploying behind a reverse proxy that adds CAPTCHA / WAF rate-limiting for login (Cloudflare Turnstile is cheap).
4. Track and log failed logins (auth-event audit trail) with src IP + username + timestamp.

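# Sketch: a second limiter that keys by username from the parsed body.
from cachetools import TTLCache
from fastapi import HTTPException
# In-memory TTLCache: 10 attempts per username per 15 minutes
_username_attempts = TTLCache(maxsize=10_000, ttl=15 * 60)

async def _check_username_quota(username: str) -> None:
    key = username.lower()
    attempts = _username_attempts.get(key, 0)  # missing key counts as zero
    if attempts >= 10:
        raise HTTPException(429, "Too many attempts for this account")
    _username_attempts[key] = attempts + 1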

M-3. Webhook payload log redactor is keyword-based, misses value-based secrets

CWE: CWE-532 (Insertion of Sensitive Information into Log File)
Files: packages/server/src/notify_bridge_server/api/webhooks.py:326-358.
Scenario: _redact_sensitive_body walks the JSON and redacts values whose keys contain token, auth, key, secret, etc. A webhook provider that ships secrets under an innocent key (e.g. "oauth_state": "ya29.a0...", "continuation": "ABCDE...", "x_state": "...") leaves the secret in the persisted payload log. The log row is admin-readable and exported in backups.
Remediation: Layer a high-entropy value detector on top of the key matcher (e.g. flag any value that matches [A-Za-z0-9_\-+/=]{32,} and has Shannon entropy ≥ 3.5 bits per character). At minimum, also redact values that start with well-known token prefixes (ya29., xoxb-, ghp_, glpat-, sk-, Bearer ).
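
A rough sketch of such a value-based detector, meant to run after the existing key-name pass. The function names and the [REDACTED] replacement string are hypothetical; the character class and the 3.5 threshold are the ones suggested above. Expect some false positives (long hex digests can clear the threshold), which is usually the right trade-off for a persisted log.

```python
import math
import re
from collections import Counter

_TOKEN_SHAPE = re.compile(r"[A-Za-z0-9_\-+/=]{32,}")
# GitLab personal access tokens start with "glpat-".
_KNOWN_PREFIXES = ("ya29.", "xoxb-", "ghp_", "glpat-", "sk-", "Bearer ")
_MIN_ENTROPY_BITS = 3.5


def shannon_entropy(value: str) -> float:
    """Shannon entropy of value in bits per character."""
    if not value:
        return 0.0
    n = len(value)
    return -sum(c / n * math.log2(c / n) for c in Counter(value).values())


def looks_like_secret(value: str) -> bool:
    """True for known token prefixes or long, high-entropy token-shaped strings."""
    if value.startswith(_KNOWN_PREFIXES):
        return True
    return (bool(_TOKEN_SHAPE.fullmatch(value))
            and shannon_entropy(value) >= _MIN_ENTROPY_BITS)


def redact_values(node):
    """Recursively replace secret-looking string values in parsed JSON."""
    if isinstance(node, dict):
        return {k: redact_values(v) for k, v in node.items()}
    if isinstance(node, list):
        return [redact_values(v) for v in node]
    if isinstance(node, str) and looks_like_secret(node):
        return "[REDACTED]"
    return node
```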

M-4. Webhook ingestion has no replay protection

CWE: CWE-294 (Authentication Bypass by Capture-replay)
Files: packages/server/src/notify_bridge_server/api/webhooks.py — Gitea/Planka/Generic.
Scenario: An attacker who once intercepts a signed Gitea push event (network downgrade, log leak from a proxy, exfil from the Gitea side) can replay it indefinitely. The HMAC stays valid; the bridge has no nonce / timestamp window / delivery-ID cache. With a webhook that fires assets_added it's just noise. With a webhook that triggers an action (planka card-created → /api/actions/{id}/execute chained logic), it could be more.
Remediation: For Gitea, store the last N X-Gitea-Delivery UUIDs per provider and reject duplicates; enforce uniqueness with a partial unique index and prune old rows to cap the table. For the generic webhook, add an optional replay_window_seconds plus a timestamp-extracting JSONPath in the provider config. Keep signature checks on a constant-time string comparison.
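
The two checks can be sketched in memory as follows. The class and function names are hypothetical, and a real implementation would persist delivery IDs in the database so the guard survives restarts and works across workers; this only illustrates the accept/reject logic.

```python
import time
from collections import OrderedDict
from typing import Optional


class DeliveryReplayGuard:
    """Remembers the most recent `capacity` delivery IDs per provider."""

    def __init__(self, capacity: int = 1000):
        self._capacity = capacity
        self._seen = {}  # provider_id -> OrderedDict of delivery IDs

    def accept(self, provider_id: int, delivery_id: str) -> bool:
        """Return True for a first-time delivery, False for a replay."""
        seen = self._seen.setdefault(provider_id, OrderedDict())
        if delivery_id in seen:
            return False
        seen[delivery_id] = None
        if len(seen) > self._capacity:
            seen.popitem(last=False)  # evict the oldest ID
        return True


def within_replay_window(sent_at: float, window_seconds: int,
                         now: Optional[float] = None) -> bool:
    """True when the payload timestamp is within window_seconds of now."""
    now = time.time() if now is None else now
    return abs(now - sent_at) <= window_seconds
```

Both checks should run only after the HMAC has been verified, so an unauthenticated sender cannot fill the cache with junk IDs.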

M-5. `bcrypt.checkpw` dummy-hash literal is malformed

CWE: CWE-208 (Observable Timing Discrepancy) — partial.
Files: packages/server/src/notify_bridge_server/auth/routes.py:147-152.
Scenario: When the username doesn't exist, the code calls _verify_password(body.password, "$2b$12$" + "a" * 53). That hash is not a real bcrypt hash; bcrypt.checkpw raises ValueError, which _verify_password swallows before returning False. The exception path is faster than a real bcrypt verify (no key schedule), so the timing of "user does not exist" differs from "user exists, wrong password". The control is also fragile: a maintainer changing the swallow behaviour later could regress it entirely.
Remediation: Cache one valid dummy bcrypt hash at module load time so the verify path actually runs the KDF.

_DUMMY_BCRYPT_HASH = bcrypt.hashpw(b"x", bcrypt.gensalt()).decode()  # module load
...
password_ok = await _verify_password(
    body.password,
    user.hashed_password if user else _DUMMY_BCRYPT_HASH,
)

M-6. Setup endpoint relies on `User.id != 0` filter — robust but a single typo breaks it

CWE: CWE-302 (Authentication Bypass) — defence-in-depth.
Files: packages/server/src/notify_bridge_server/auth/routes.py:97-119.
Scenario: POST /api/auth/setup is gated by "no users with id != 0". The __system__ sentinel is id=0. If a future migration changes the sentinel id, or the WHERE clause is dropped during a refactor, setup re-opens silently and an internet-reachable bridge would let an attacker claim the admin account.
Remediation: Add a defence-in-depth flag AppSetting.setup_completed=true set during the first successful setup, and require it to be unset (in addition to the count check). This bakes the invariant into a single boolean that's easier to audit.

M-7. Anonymous Prometheus metrics endpoint leaks operational data

CWE: CWE-200 (Exposure of Sensitive Information to an Unauthorized Actor)
Files: packages/server/src/notify_bridge_server/api/metrics.py:138-159.
Notes: This is documented and gated by NOTIFY_BRIDGE_METRICS_ENABLED, and the comment explicitly says scrapers don't authenticate. Acceptable when the API port is firewalled to the scraper. Surface it here as informational so an operator who exposes the API directly to the internet (e.g. via reverse-proxy without an ACL) doesn't accidentally expose dispatch rates, provider names, queue depths.
Remediation: keep the env flag, but additionally allow metrics_basic_auth_user / metrics_basic_auth_password as a soft credential check on the endpoint so a "default enabled, default protected" mode is possible. Document the threat in OPERATIONS.md next to the env var.
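
A minimal sketch of that soft credential check, written against the raw Authorization header so it does not depend on any framework helper. The function name is hypothetical; the two expected values would come from the proposed metrics_basic_auth_user / metrics_basic_auth_password settings.

```python
import base64
import binascii
import hmac
from typing import Optional


def metrics_auth_ok(authorization: Optional[str], user: str, password: str) -> bool:
    """Validate an HTTP Basic Authorization header with constant-time compares."""
    if not authorization:
        return False
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    got_user, sep, got_password = decoded.partition(":")
    if not sep:
        return False
    # Evaluate both comparisons so a wrong username does not short-circuit.
    user_ok = hmac.compare_digest(got_user.encode(), user.encode())
    password_ok = hmac.compare_digest(got_password.encode(), password.encode())
    return user_ok and password_ok
```

The endpoint would return 401 with a WWW-Authenticate: Basic header when this check fails and credentials are configured.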

LOW

L-1. CSP allows `'unsafe-inline'` for scripts

CWE: CWE-1021 (Improper Restriction of Rendered UI Layers or Frames) — adjacent.
File: packages/server/src/notify_bridge_server/main.py:186-201.
Notes: Comment explicitly justifies it — SvelteKit static adapter emits an inline bootstrap. Acceptable, but 'strict-dynamic' with a per-page nonce (or moving the bootstrap into a hashed external module) eliminates the gap entirely. Track as INFO unless future XSS-injection paths emerge.

L-2. CSP `style-src 'unsafe-inline'` allows inline-style XSS payloads

CWE: CWE-79 (Cross-site Scripting) — defence-in-depth.
Same file as L-1. Inline styles are not directly executable, but they are a known vector for click-jacking and data-exfil via CSS selectors. Same remediation path: nonce-based CSP.

L-3. `frame-ancestors 'none'` and `X-Frame-Options: DENY` are both set (no finding)

INFO only. Both X-Frame-Options: DENY and frame-ancestors 'none' are set; modern browsers honour CSP, legacy ones honour XFO. Good.

L-4. Webhook `_filter_headers` allowlist accepts unknown `X-*` headers

CWE: CWE-532
File: packages/server/src/notify_bridge_server/api/webhooks.py:361-374.
Notes: The filter strips known sensitive headers, then accepts any X-*. A custom auth header like X-Custom-Authentication: <token> would slip past the substring check if the name doesn't contain auth/token/key/secret/etc. Low risk because the well-known providers we support don't ship such headers, but a misconfigured generic webhook will leave a credential in the log row.
Remediation: invert the policy and log only an explicit allowlist of known-safe X-* headers (even X-Forwarded-For is borderline, since it can carry PII).
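
The inverted policy is small enough to sketch directly. The function name and the specific allowlist entries are illustrative assumptions; the real list should be built from the headers the supported providers actually send.

```python
# Illustrative allowlist; anything not named here is never written to the log.
SAFE_LOGGED_HEADERS = frozenset({
    "content-type",
    "user-agent",
    "x-gitea-event",
    "x-gitea-delivery",
    "x-github-event",
})


def filter_headers_for_log(headers: dict) -> dict:
    """Keep only explicitly allowlisted headers (case-insensitive match)."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in SAFE_LOGGED_HEADERS
    }
```

With this shape, a new custom auth header is dropped by default instead of depending on its name containing a recognised substring.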

L-5. `external_url` setting is not validated against an allow-list

CWE: CWE-918 (SSRF), CWE-79 (XSS in the rendered Telegram webhook URL).
File: packages/server/src/notify_bridge_server/api/app_settings.py:329-339 reads, packages/server/src/notify_bridge_server/api/telegram_bots.py:247 writes it into the registered Telegram webhook URL.
Notes: An admin can set external_url to anything. The value is used to build the URL passed to Telegram in setWebhook. Telegram itself enforces an HTTPS-only allow-list, so the actual risk is bounded. Still — validate scheme + host + that it doesn't include credentials or fragments.
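
A sketch of the scheme/host/credentials/fragment validation using only the standard library. The function name is hypothetical, and requiring https outright is an assumption; a homelab deployment that legitimately serves external_url over plain HTTP would need a looser scheme rule.

```python
from urllib.parse import urlsplit


def validate_external_url(value: str) -> str:
    """Return the normalised URL, or raise ValueError if it is unsafe."""
    cleaned = value.strip()
    parts = urlsplit(cleaned)
    if parts.scheme != "https":
        raise ValueError("external_url must use https")
    if not parts.hostname:
        raise ValueError("external_url must include a host")
    if parts.username is not None or parts.password is not None:
        raise ValueError("external_url must not embed credentials")
    if parts.query or parts.fragment:
        raise ValueError("external_url must not include a query or fragment")
    _ = parts.port  # accessing .port raises ValueError on a malformed port
    # Drop trailing slashes so later path joins stay predictable.
    return cleaned.rstrip("/")
```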

L-6. Bot token GET endpoint is intentional but worth auditing

File: packages/server/src/notify_bridge_server/api/telegram_bots.py:148-156.
Notes: GET /api/telegram-bots/{bot_id}/token returns the full Telegram bot token to the owner. Used by the frontend to construct webhook URLs. Limiting to a single short-lived nonce per register_bot_webhook flow would be safer than exposing the token directly. Currently INFO; revisit if a multi-user role model lands.

L-7. SQLite journal mode + backup snapshot file permissions

File: packages/server/src/notify_bridge_server/database/snapshot.py:60-95.
Notes: Snapshots are written via VACUUM INTO 'path'. They land in data_dir/backups/ with default umask permissions. In the Docker image the dir is owned by appuser and only that user runs the process, so this is fine. On a host bind-mount, an operator who forgets to lock down /data exposes every credential in every snapshot to anyone with shell access. Document this in OPERATIONS.md.
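
Beyond documentation, the snapshot writer could tighten permissions itself. The helper below is a hypothetical sketch, not existing code; it narrows the file and its directory after the snapshot is written. Setting a restrictive umask before VACUUM INTO runs would additionally close the short window in which the file exists with default permissions.

```python
import os
import stat
from pathlib import Path


def harden_snapshot(path: Path) -> None:
    """Restrict a snapshot file and its containing directory to the owner."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # 0o600: owner read/write
    os.chmod(path.parent, stat.S_IRWXU)          # 0o700: owner-only directory
```

On a host bind-mount this only helps if the container user owns the directory; otherwise the operator still has to fix ownership on the host.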

L-8. No CSRF token on state-changing endpoints

CWE: CWE-352
Notes: The API uses Authorization: Bearer <jwt> exclusively (no cookies). Browsers don't auto-attach Authorization headers cross-origin, so this is not classical CSRF-exploitable. Combined with strict CORS (allow_credentials=True, explicit origin allowlist, wildcard rejected on startup) and the Origin/Referer same-host check on the backup endpoints, the practical risk is essentially zero. INFO only.

INFO / NEEDS VERIFICATION

N-1. Jinja2 `SandboxedEnvironment` is the standard sandbox — confirm it covers your threat model

The sandbox blocks __class__, __mro__, etc., but it is well-known that Jinja2's sandbox is not a security boundary against a determined attacker who can author templates. The threat model here is "templates are admin-authored, so we trust them but use the sandbox as defence-in-depth"; that is reasonable. Document explicitly in OPERATIONS.md that anyone with template-edit permission has effective RCE on the worker thread ({{ foo.__init__.__globals__... }} style escapes have been published in the past; new ones surface periodically).
Verification: run bandit -r packages/ and safety check against the pinned versions (jinja2>=3.1). Jinja2 CVEs to track: CVE-2024-34064 (xmlattr filter injection, fixed in 3.1.4), the sandbox breakouts CVE-2024-56326 (fixed in 3.1.5) and CVE-2025-27516 (fixed in 3.1.6), and any later disclosures. Raise the floor to jinja2>=3.1.6 so that the known sandbox escapes are patched.

N-2. `apscheduler<4`

Notes: The pin apscheduler>=3.10,<4 keeps the bridge on the 3.x line, which is in maintenance. No known CVEs as of this review. Track when 4.x stabilises and migrate.

N-3. `python-multipart>=0.0.9`

Notes: This package had high-severity bugs prior to 0.0.6. The minimum here is 0.0.9 — good.

N-4. No signed-image / SBOM on the container

Notes: The release.yml workflow builds and pushes a multi-tag image but does not sign with cosign or emit an SBOM. For an internet-facing deployment, consider adding cosign sign against the image digest, and syft packages to emit an SBOM at release time. INFO only.

N-5. Frontend dependencies are pinned via caret (`^`) ranges

Notes: package.json uses ^x.y.z. CI runs npm ci against package-lock.json, so reproducibility is fine at build time. There is no npm audit step in .gitea/workflows/build.yml. Add npm audit --audit-level=high to the frontend build job.

N-6. `NOTIFY_BRIDGE_ALLOW_PRIVATE_URLS=1` is a footgun

File: packages/core/src/notify_bridge_core/notifications/ssrf.py:39-52.
Notes: When set, the SSRF guard becomes a no-op. The warning at boot is the only mitigation. Acceptable for the documented homelab use-case; document that the env flag must NEVER be set on an internet-reachable instance, and consider refusing to enable it when cors_allowed_origins resolves to a non-loopback host (defence-in-depth interlock).

N-7. Verify the auth flow at the WebSocket boundary

File: packages/core/src/notify_bridge_core/providers/home_assistant/client.py:54-83.
The _ws_url_from_base correctly strips userinfo before connecting and _redact defangs error messages — verify that wss:// URLs go through SSRF validation (currently the HA URL is validated by AnyHttpUrl at config time but I did not find a call to avalidate_outbound_url_full on the HA WS connect path; the resolver would not pin a host the validator never saw).
Action: confirm by reading ha_subscription.py for explicit validation, or add a check that calls avalidate_outbound_url_full against the derived ws_url (treating ws/wss like http/https for the block-range check) before ws_connect.
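
If the existing validator only understands http/https, the derived WebSocket URL can be mapped to its HTTP equivalent before validation. The helper below is a hypothetical sketch; it deliberately does not call avalidate_outbound_url_full, whose exact signature was not checked in this review.

```python
from urllib.parse import urlsplit, urlunsplit

_WS_TO_HTTP = {"ws": "http", "wss": "https"}


def http_equivalent(ws_url: str) -> str:
    """Map ws:// to http:// and wss:// to https://, keeping host, port and
    path, so an HTTP-oriented SSRF validator inspects the same target."""
    parts = urlsplit(ws_url)
    if parts.scheme not in _WS_TO_HTTP:
        raise ValueError(f"not a WebSocket URL: {parts.scheme!r}")
    return urlunsplit(parts._replace(scheme=_WS_TO_HTTP[parts.scheme]))
```

The result would be passed to the SSRF validator before ws_connect, and the connection should then use the address the validator pinned rather than resolving the hostname a second time.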

Prioritised Fix List (Top 10)

HIGH H-1 — Add access_token to the secret-mask list in providers._provider_response and the placeholder-drop list in providers.update_provider. Add a regression test that GETs an HA provider and asserts the response does not contain the cleartext token.
HIGH H-2 — Implement column-level encryption for TelegramBot.token, MatrixBot access tokens, EmailBot.smtp_password, and the sensitive keys inside ServiceProvider.config. Use Fernet with a key derived from SECRET_KEY. Write a one-shot migration.
MEDIUM M-1 — Replace the ad-hoc SandboxedEnvironment(...).render() calls in the four preview/test paths with the single hardened render_template() helper that already has timeout + size caps.
MEDIUM M-2 — Add per-username login lockout (TTL cache or DB-backed) on top of the per-IP 5/minute. Log failed login attempts.
MEDIUM M-5 — Replace the malformed dummy bcrypt literal in login() with a real bcrypt hash computed once at module load so the timing-equalisation actually runs the KDF.
MEDIUM M-3 — Strengthen _redact_sensitive_body with a value-entropy heuristic and well-known token-prefix matching.
MEDIUM M-4 — Add replay protection on Gitea webhooks via the X-Gitea-Delivery header (small table + partial unique index).
MEDIUM M-7 — Make the metrics endpoint require either a flag or a Basic Auth credential; document in OPERATIONS.md that the API port should not be internet-exposed when metrics are on.
MEDIUM M-6 — Add a defence-in-depth setup_completed boolean in app_setting and check it in /api/auth/setup in addition to the count.
INFO N-5 — Add npm audit --audit-level=high to the frontend build job in .gitea/workflows/build.yml so dependency CVEs land in CI.

What was confirmed safe (worth keeping)

JWT design: HS256 with iss/aud/exp/type/sub/ver; refresh/access split; token_version revocation on role change, username change, and password change.
bcrypt with 72-byte length guard; CPU-bound work run in a thread.
SSRF guard with: scheme allowlist, IPv4-mapped-IPv6 unwrap, CGNAT block, IDN normalisation, async resolver, PinnedResolver to defeat DNS rebinding.
SQL access goes through SQLModel/SQLAlchemy with bind parameters; the only f"..." SQL is in DDL (column adds, index creates, VACUUM INTO) using server-controlled identifiers — sampled and clean.
Sandbox is SandboxedEnvironment everywhere a user-controllable template is rendered (six locations checked).
Frontend {@html} is wrapped in sanitizePreview() everywhere (tracking-configs, template-configs, command-template-configs).
Provider config secrets are masked on GET (except H-1).
_resolve_backup_file rejects .., NUL, separators, and enforces relative_to(base).
CORS rejects wildcard with credentials at startup; secret_key default values are rejected with a clear error.
Docker: non-root user, read_only: true, tmpfs: /tmp, no-new-privileges, cap_drop: ALL, resource limits, healthcheck on /api/ready.
Logging: SecretMaskingFilter masks Telegram bot tokens, Authorization, x-api-key, password, secret, access_token, refresh_token from formatted messages, exception text, and stack traces.
Telegram webhook: secret token mandatory, refused on missing config, opaque webhook_path_id separate from bot token.
Inbound generic webhook: refuses auth_mode="none" unless an explicit acknowledgment field is set; auto-generates a strong secret if missing for bearer_token/hmac_sha256.
Inbound payload size capped at 1 MiB with a streaming check that doesn't trust Content-Length.

Methodology

Manual code review of every authentication, authorization, webhook ingestion, template rendering, secret-handling, and outbound HTTP path under packages/.
Cross-checked CORS / CSP / security headers and rate-limiter configuration in main.py + auth/routes.py.
Sampled API routes for ownership enforcement (get_owned_entity / _get_user_provider / _get_user_bot) — all sampled routes apply it; no IDOR found.
Grepped for Environment( / jinja2.Environment / f"..." SQL / {@html} / subprocess / eval / os.system / known-bad patterns.
Reviewed CI workflows for secret leakage in env blocks and image-signing posture.
Reviewed Dockerfile + docker-compose for least-privilege and read-only root.
No dynamic testing performed; static review only. Run pytest (already gated in CI) + bandit -r packages/ + npm audit in CI to backstop this review.
