6a8f374678
Operability:
- Correlation IDs end-to-end: shared dispatch_id between log lines and EventLog rows (event/watcher/scheduled/deferred/action/HA/command paths) and a new X-Request-Id middleware that normalizes inbound ids and binds request_id into log context.
- dispatch_summary block merged into EventLog.details: per-target success/failure counts plus Telegram media delivered/skipped/failed and truncated error lists, so partial outcomes surface in the UI.
- Diagnostic mode: admin can flip one module to DEBUG for a bounded window with auto-revert (in-memory only; setup_logging() resets on boot, lifespan reverts on shutdown). New /diagnostic-mode endpoints plus DiagnosticsCassette UI on the settings page.

Telegram:
- Per-receiver options: disable_notification (silent send) and message_thread_id (forum-topic routing), wired through the dispatcher via a ContextVar so all four send sites (sendMessage / sendPhoto-Video-Document / sendMediaGroup / cache-hit POST) pick them up.
- send_large_videos_as_documents target setting: bypass the 50 MB sendVideo cap by falling back to sendDocument for oversized videos.
- sendMediaGroup byte-budget enforcement (TELEGRAM_MAX_GROUP_TOTAL_BYTES, 45 MB) with per-item fallback on chunk failure so a stale file_id no longer silently drops a cached asset.

Tests:
- New: diagnostic_mode, dispatch_summary, request_correlation, telegram_media_group_partial, telegram_per_send_options.

Docs:
- .claude/reviews/: six-axis production-readiness review of v0.8.1.
- .claude/docs/functional-review-2026-05-28.md: focused review of Telegram/Immich/logging subsystems.

# Backend Production-Readiness Review

Scope: `packages/server/src/notify_bridge_server/` and `packages/core/src/notify_bridge_core/` (~44k LOC, Python 3.11, FastAPI + SQLModel async + APScheduler + aiohttp).

## Executive Summary

- **Overall quality is high.** The Jinja2 sandbox is consistently applied (every `Environment` instantiation is `SandboxedEnvironment`), JWT auth uses bcrypt offloaded to a worker thread, an SSRF guard exists with DNS-rebinding mitigation, secrets are masked in logs via a dedicated filter, and most async/SQL patterns show production-aware design (per-tracker sessions, batched IN-queries, partial unique indexes).
- **Top correctness risk: a fire-and-forget `asyncio.create_task` in `ha_subscription._on_status_change`** (no reference stored, so GC can drop the task), plus thread-unsafe in-memory counters in `bridge_self`. Both bite on chatty HA installs.
- **Module-level dict caches shared across the event loop have small read-modify-write windows** in `services/scheduler.py` (adaptive state), `services/bridge_self.py` (failure counters), `commands/handler.py` (TTLCache rate limits), and `command_sync._dirty_bots`. They work under low concurrency but are risky under load.
- **Very large hot-path functions and modules concentrate too much logic in one place:** `services/watcher.py:check_tracker` (381 lines), `services/dispatch_helpers.py:load_link_data` (208 lines), the 1880-line `database/migrations.py`, and the 1365-line `services/scheduler.py`.
- **Provider-type hardcoding** persists in `api/providers.py`, `services/__init__.py`, `services/action_runner.py`, and `services/manual_dispatch.py` (`if provider.type == "immich"` chains). The watcher's `_POLL_FACTORIES` registry is the right model; extend it.
- **Webhook handlers read the request body BEFORE authenticating** in the Gitea and generic-webhook routes. The Planka route gets it right. Net impact: a peer that knows the URL but not the secret can force a read of up to 1 MiB per request.
- **`autoescape` is inconsistent**: `True` for runtime templates (`renderer.py`, `commands/handler.py`), `False` for preview / sample-context renders in `api/template_configs.py`, `api/slot_helpers.py`, and `services/notifier.send_test_template_notification`. The risk is lower (admin-authored input), but the mismatch invites surprises.

---

## CRITICAL

### [C-1] `_on_status_change` schedules an unstored task (GC + drop risk)

File: [packages/server/src/notify_bridge_server/services/ha_subscription.py:240-260](../../packages/server/src/notify_bridge_server/services/ha_subscription.py#L240)

The task created by `asyncio.create_task(_record_ha_status(...))` at line 249 is not held anywhere. The event loop keeps only a weak reference to a task, so a task whose only reference is the `create_task` return value may be garbage-collected before it completes (the Python docs explicitly warn: save a reference to the result). Result: an HA disconnect/reconnect EventLog row can silently disappear under memory pressure.

**Fix:** Keep a module-level `set[asyncio.Task]`, add each new task to it, and discard the task in a `task.add_done_callback`. `ha_subscription.start_all` already does this correctly (lines 315-320), so the pattern is already in-house.

### [C-2] Telegram-webhook handler returns 200 OK on uncommitted writes

File: [packages/server/src/notify_bridge_server/commands/webhook.py:130-169](../../packages/server/src/notify_bridge_server/commands/webhook.py#L130)

The catch-all at line 162 swallows `handle_command` exceptions and returns OK to Telegram. The request has already called `await session.commit()` at line 96 (after `save_chat_from_webhook`), and any subsequent writes via the dispatcher use NEW sessions inside the command path. If a downstream session inside `handle_command` partially commits before raising, the `get_session` dependency does NOT roll back automatically; its context manager only closes the session.

**Fix:** Either call `session.rollback()` explicitly in the `except` block, or wrap the per-request mutations in `async with session.begin():` so the transaction is rolled back on exception.

### [C-3] Gitea/generic webhook reads body BEFORE verifying secret is configured

File: [packages/server/src/notify_bridge_server/api/webhooks.py:167-178](../../packages/server/src/notify_bridge_server/api/webhooks.py#L167) and lines 449-454

The sequence is: read up to 1 MiB of `raw_body`, then check whether `webhook_secret` is empty. A peer that has learned the URL but has no secret can force a 1 MiB body read per request. Planka's handler (line 232 onward) validates the bearer token BEFORE the body read; that is the correct pattern.

**Fix:** Hoist the `if not webhook_secret` check (Gitea) and the `if auth_mode == "none"` short-circuit (generic) above `_read_bounded_body`. The Gitea HMAC check still needs the body, but bailing out on the missing-configuration error first costs nothing.

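
The check ordering recommended in C-3 can be sketched without any web framework. The handler name, the status codes, and the `read_body` callback below are assumptions made for illustration; only the ordering (configuration and header checks first, body read second, HMAC last) reflects the finding.

```python
import hashlib
import hmac
from typing import Callable

MAX_BODY_BYTES = 1024 * 1024  # 1 MiB cap, matching the limit described above


def handle_signed_webhook(
    webhook_secret: str,
    signature_hex: str,
    read_body: Callable[[int], bytes],
) -> int:
    """Return an HTTP-style status code (illustrative values)."""
    # Cheap checks first: no body bytes have been read at this point.
    if not webhook_secret:
        return 403
    if not signature_hex:
        return 401
    # Only now pay for the (bounded) body read that the HMAC needs.
    body = read_body(MAX_BODY_BYTES)
    expected = hmac.new(webhook_secret.encode(), body, hashlib.sha256).hexdigest()
    return 200 if hmac.compare_digest(expected, signature_hex) else 401


read_calls = []


def fake_read(limit: int) -> bytes:
    read_calls.append(limit)
    return b"{}"


unconfigured_status = handle_signed_webhook("", "abc", fake_read)
reads_when_unconfigured = len(read_calls)
```

With an empty secret the handler returns before `fake_read` is ever called, which is the property the fix is after.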
### [C-4] `bridge_self` in-memory counters are not async-safe

File: [packages/server/src/notify_bridge_server/services/bridge_self.py:186-230](../../packages/server/src/notify_bridge_server/services/bridge_self.py#L186)

`record_poll_failure` does `_poll_failure_counts[tracker_id] = _poll_failure_counts.get(tracker_id, 0) + 1`. These dicts are accessed concurrently from the poll loop, HA push, webhook ingest, and dispatcher target-failure recording. Individual dict operations are atomic, but the get + 1 + set sequence is not when it interleaves with another coroutine that touches the same key. Symptoms: missed threshold crossings and occasional double emission. The same pattern appears in `_target_failure_counts` and `_backlog_above_threshold`.

**Fix:** Wrap the mutating operations in an `asyncio.Lock`. The reset-and-re-arm semantics already assume serial access, so make that explicit.

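
One way to make the serial-access assumption in C-4 explicit is sketched below. `FailureCounter` is a hypothetical wrapper, not the existing module-level dicts, and the threshold value is arbitrary.

```python
import asyncio


class FailureCounter:
    """Counts failures per key and reports a threshold crossing exactly once."""

    def __init__(self, threshold: int) -> None:
        self._threshold = threshold
        self._counts: dict = {}
        # Created inside the running loop by the caller.
        self._lock = asyncio.Lock()

    async def record_failure(self, key: int) -> bool:
        # Read, increment, and threshold check share one critical section,
        # so the sequence stays serial even if an await is added inside it later.
        async with self._lock:
            count = self._counts.get(key, 0) + 1
            self._counts[key] = count
            return count == self._threshold

    async def reset(self, key: int) -> None:
        async with self._lock:
            self._counts.pop(key, None)


async def _demo() -> int:
    counter = FailureCounter(threshold=3)
    crossings = await asyncio.gather(
        *(counter.record_failure(7) for _ in range(10))
    )
    return sum(crossings)


threshold_crossings = asyncio.run(_demo())
```

Returning the crossing decision from inside the lock means callers never have to re-read the counter to decide whether to emit an alert.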
### [C-5] `PROVIDER_SECRET_FIELDS` audit needed for backup exports

File: [packages/server/src/notify_bridge_server/api/providers.py:617-625](../../packages/server/src/notify_bridge_server/api/providers.py#L617) and [services/backup_service.py:84-93](../../packages/server/src/notify_bridge_server/services/backup_service.py#L84)

`_apply_secrets_provider` redacts only fields named in `PROVIDER_SECRET_FIELDS`. The webhook flow uses a field called `webhook_secret` (Gitea, Planka, generic); verify that it is in `PROVIDER_SECRET_FIELDS` (defined in `backup_schema.py`). A backup export with `secrets_mode=INCLUDE` that misses `webhook_secret` leaks a token that grants webhook-forgery rights.

**Action:** Audit `PROVIDER_SECRET_FIELDS`. Specifically check that it includes: `api_key`, `api_token`, `access_token`, `webhook_secret`, `password`, `client_secret`, `refresh_token`. The `_provider_response` mask list at `api/providers.py:620` is a good cross-reference; both should be the same constant.

---

## HIGH

### [H-1] `_compile_template` `lru_cache` competes across tenants

File: [packages/server/src/notify_bridge_server/commands/handler.py:99-103](../../packages/server/src/notify_bridge_server/commands/handler.py#L99)

`lru_cache(maxsize=256)` is keyed by the raw template string. Old versions of edited templates remain cached until evicted, and there is no invalidation on template edit. On a multi-tenant install, one tenant's 256 distinct templates can evict another's.

**Fix:** Drop the cache (a Jinja compile is sub-millisecond) OR add an invalidation call from the template-edit endpoints. The notification renderer (`renderer.py:31`) uses 512 slots and has the same problem, so apply the same fix there.

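
The invalidation option from H-1 can be sketched with the standard library alone. Here `string.Template` stands in for the sandboxed Jinja2 compile step, and `on_template_edited` is a hypothetical hook that the template-edit endpoints would call.

```python
from functools import lru_cache
from string import Template


@lru_cache(maxsize=256)
def compile_template(source: str) -> Template:
    # string.Template is only a stand-in for the real compile step.
    return Template(source)


def on_template_edited() -> None:
    """Hypothetical hook: drop every cached compile after an edit."""
    compile_template.cache_clear()


compile_template("Hello $name")
compile_template("Hello $name")  # second call is served from the cache
hits_before_edit = compile_template.cache_info().hits

on_template_edited()
entries_after_edit = compile_template.cache_info().currsize
```

Clearing the whole cache is coarse, but with sub-millisecond compiles the cost of recompiling is small compared with the risk of carrying stale entries.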
### [H-2] `check_tracker` is 381 lines with deep coupling

File: [packages/server/src/notify_bridge_server/services/watcher.py:263-644](../../packages/server/src/notify_bridge_server/services/watcher.py#L263)

The function loads the tracker, polls, writes state, persists EventLog rows, evaluates gates, defers, dispatches, and records `bridge_self` metrics, all in one place. Refactor candidates: `_poll_phase`, `_persist_state_and_events`, `_dispatch_phase`. This is the watcher's hot path; bugs here affect every tracker tick.

### [H-3] `load_link_data` returns untyped `dict[str, Any]`

File: [packages/server/src/notify_bridge_server/services/dispatch_helpers.py:539-747](../../packages/server/src/notify_bridge_server/services/dispatch_helpers.py#L539)

Five call sites consume `ld["target_type"]`, `ld.get("link_id")`, etc., with no static guarantee against key typos.

**Fix:** Introduce a frozen `@dataclass` named `LinkData`, and do the same for per-receiver entries.

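
A frozen dataclass of the kind suggested in H-3 might look like the sketch below. Only the two field names come from the dict keys quoted above; the field types and the default are assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkData:
    # Names mirror the dict keys quoted above; the types are assumed.
    target_type: str
    link_id: Optional[int] = None


ld = LinkData(target_type="telegram", link_id=42)

try:
    ld.target_type = "email"  # frozen instances reject assignment
    was_mutated = True
except FrozenInstanceError:
    was_mutated = False
```

A typo such as `ld.targt_type` now fails under a type checker and raises `AttributeError` at runtime, whereas `ld.get("targt_type")` on a dict would quietly return `None`.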
### [H-4] N+1 in `_resolve_command_context` template-slot loop

File: [packages/server/src/notify_bridge_server/commands/handler.py:200-215](../../packages/server/src/notify_bridge_server/commands/handler.py#L200)

The loop issues one SELECT per distinct `command_template_config_id`. Trackers, configs, and providers are already batched, so finish the job: a single `WHERE config_id IN (...)` query plus a Python pivot.

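
The batched query plus pivot described in H-4 is shown here against an in-memory SQLite table rather than the project's SQLModel session. The `slot` table and its columns are a hypothetical schema chosen only to keep the sketch runnable.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slot (config_id INTEGER, name TEXT)")  # hypothetical schema
conn.executemany(
    "INSERT INTO slot VALUES (?, ?)",
    [(1, "start"), (1, "help"), (2, "status"), (3, "unused")],
)


def load_slots(conn: sqlite3.Connection, config_ids) -> dict:
    """Fetch slots for all config ids in ONE query, then pivot by config id."""
    ids = sorted(set(config_ids))
    if not ids:
        return {}
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT config_id, name FROM slot "
        f"WHERE config_id IN ({placeholders}) ORDER BY rowid",
        ids,
    )
    by_config = defaultdict(list)
    for config_id, name in rows:
        by_config[config_id].append(name)
    return dict(by_config)


slots_by_config = load_slots(conn, [1, 2, 2])
```

The same shape applies to the receiver loop in H-5: collect the ids, issue one `IN (...)` query, and group the rows in Python.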
### [H-5] N+1 in `backup_service.export_backup` receiver loop

File: [packages/server/src/notify_bridge_server/services/backup_service.py:187-189](../../packages/server/src/notify_bridge_server/services/backup_service.py#L187)

50 targets = 51 SELECTs. Batch with `WHERE target_id IN (...)`. Audit other sections of this 941-line file for the same pattern (templates -> slots, command configs -> slots).

### [H-6] `_dirty_bots` mutated from request and scheduler without a lock

File: [packages/server/src/notify_bridge_server/services/command_sync.py:25-95](../../packages/server/src/notify_bridge_server/services/command_sync.py#L25)

`mark_bot_dirty` runs in request handlers, `_flush_dirty_bots` on the scheduler executor. This is currently safe (snapshot via `ready = [...]`) but fragile.

**Fix:** Snapshot under a lock, or move to a thread-safe primitive.

### [H-7] HA reconnect cycle has no way for CRUD to short-circuit a stale supervisor

File: [packages/server/src/notify_bridge_server/services/ha_subscription.py:163-175](../../packages/server/src/notify_bridge_server/services/ha_subscription.py#L163)

Because configuration is reloaded only on reconnect, the supervisor for a disabled HA provider keeps running at the 30s/300s cadence until its next reconnect attempt. CRUD endpoints should call `reload_provider` (defined at line 339); verify the wiring.

### [H-8] Cached expunged ORM instances are footguns

File: [packages/server/src/notify_bridge_server/services/event_dispatch.py:75-107](../../packages/server/src/notify_bridge_server/services/event_dispatch.py#L75)

`_load_trackers_cached` returns expunged `NotificationTracker` rows. A future maintainer calling `session.add(tracker)` on a stale cached instance would trigger a `DetachedInstanceError` or a silent re-INSERT. Document this prominently, and ideally convert the cache to a typed projection.

### [H-9] Pending-restore at startup has no timeout

File: [packages/server/src/notify_bridge_server/main.py:142-143](../../packages/server/src/notify_bridge_server/main.py#L142)

`apply_pending_restore_if_any` runs in the lifespan handler; a partially corrupt restore could block startup indefinitely. Container liveness probes then fail after the grace period.

**Fix:** Use `asyncio.wait_for` with a generous timeout, or kick the restore off as a background task while the app starts.

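
The timeout option from H-9 can be sketched as follows. `apply_pending_restore` here is a stand-in that simply sleeps to simulate a hung restore, and the 50 ms budget is a demo value; a real deployment would choose a far more generous one.

```python
import asyncio

RESTORE_TIMEOUT_S = 0.05  # demo value only


async def apply_pending_restore() -> None:
    """Stand-in for the real restore step; simulates a hang."""
    await asyncio.sleep(10)


async def startup() -> str:
    try:
        await asyncio.wait_for(apply_pending_restore(), timeout=RESTORE_TIMEOUT_S)
    except asyncio.TimeoutError:
        # Surface the failure instead of blocking startup forever.
        # What happens next (abort, degrade, retry) is an application decision.
        return "restore timed out"
    return "restore applied"


startup_result = asyncio.run(startup())
```

`asyncio.wait_for` cancels the inner coroutine on timeout, so the restore step should be written to tolerate cancellation at any await point.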
### [H-10] Jinja2 render watchdog uses daemon thread that can pin a CPU forever

File: [packages/core/src/notify_bridge_core/templates/renderer.py:48-73](../../packages/core/src/notify_bridge_core/templates/renderer.py#L48)

The code comment acknowledges the trade-off. Multiple concurrent runaway renders can exhaust CPU cores while callers believe they have timed out. Add a process-level `BoundedSemaphore` that caps concurrent in-flight renders.

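
A process-level cap of the kind suggested in H-10 could look like this sketch. The cap value, `RenderBusy`, `try_begin_render`, and `end_render` are all hypothetical; the key assumption is that the slot is released by the worker thread when the render really ends, not when the caller stops waiting.

```python
import threading

MAX_IN_FLIGHT_RENDERS = 2  # hypothetical cap
_render_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT_RENDERS)


class RenderBusy(RuntimeError):
    """Raised when every render slot is taken."""


def try_begin_render() -> None:
    # Non-blocking acquire: refuse new work rather than queue behind
    # renders that may never finish.
    if not _render_slots.acquire(blocking=False):
        raise RenderBusy("too many renders in flight")


def end_render() -> None:
    # Called when the render thread actually finishes, so an abandoned
    # runaway render keeps occupying its slot.
    _render_slots.release()


try_begin_render()
try_begin_render()
try:
    try_begin_render()
    third_admitted = True
except RenderBusy:
    third_admitted = False
```

Because a runaway render never releases its slot, the number of CPU cores that can be pinned is bounded by the cap instead of by the request rate.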
### [H-11] `_aggregate` drops all but the first error

File: [packages/server/src/notify_bridge_server/services/notifier.py:326-335](../../packages/server/src/notify_bridge_server/services/notifier.py#L326)

When all sends fail, only `results[0]` is returned. Distinct subsequent errors are lost.

**Fix:** Aggregate all errors into a `details` field.

### [H-12] Generic-webhook header dict materialised twice

File: [packages/server/src/notify_bridge_server/api/webhooks.py:456](../../packages/server/src/notify_bridge_server/api/webhooks.py#L456) and line 475

`dict(request.headers)` materialises the full header map, then `_filter_headers` and `_redact_sensitive_body` walk the payload. With a malicious peer sending many headers (Starlette default 100), the cost is bounded but wasteful.

### [H-13] SSRF redirect-walk has no aggregate wall-clock budget

File: [packages/core/src/notify_bridge_core/notifications/telegram/client.py:232-268](../../packages/core/src/notify_bridge_core/notifications/telegram/client.py#L232)

`max_redirects = 3`, each hop with the 120s `_DOWNLOAD_TIMEOUT`. Worst case per request (the initial request plus three redirects): 480s. `_TARGET_TIMEOUT_S = 120s` in the dispatcher caps the top-level case, but per-asset preloads inside media groups don't all share that cap.

### [H-14] Backlog recovery logic flips latch for in-flight users

File: [packages/server/src/notify_bridge_server/services/bridge_self.py:544-551](../../packages/server/src/notify_bridge_server/services/bridge_self.py#L544)

The recovery loop iterates all known users and flips the latch to `False` for any user not in `counts_by_user`. If a user transiently has no `user_id` set on deferred rows (legacy / orphaned), they're excluded from the GROUP BY and incorrectly marked as recovered.

### [H-15] `quiet_hours_status` silently returns None on start == end

File: [packages/server/src/notify_bridge_server/services/dispatch_helpers.py:110-111](../../packages/server/src/notify_bridge_server/services/dispatch_helpers.py#L110)

The code comment notes this is almost always a user mistake. The silent return leaves the user wondering why notifications still arrive at all hours. Surface it via a WARNING log and a UI hint.

---

## MEDIUM

### [M-1] `register_commands_with_telegram` chat overrides loop is sequential

File: [packages/server/src/notify_bridge_server/commands/handler.py:723-776](../../packages/server/src/notify_bridge_server/commands/handler.py#L723)

50 chats with overrides = 50 sequential Telegram round-trips. Use `asyncio.gather` with a semaphore, as in `_refresh_telegram_chat_titles`.

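
The gather-with-semaphore approach from M-1 can be sketched as below. `register_overrides`, `send_one`, and the cap of 5 are illustrative assumptions; the existing `_refresh_telegram_chat_titles` may be structured differently.

```python
import asyncio

MAX_CONCURRENT_CALLS = 5  # hypothetical cap


async def register_overrides(chat_ids, send_one):
    """Call send_one(chat_id) for every chat, at most MAX_CONCURRENT_CALLS at a time."""
    sem = asyncio.Semaphore(MAX_CONCURRENT_CALLS)

    async def guarded(chat_id):
        async with sem:
            return await send_one(chat_id)

    # return_exceptions=True so one failing chat does not hide the other results.
    return await asyncio.gather(
        *(guarded(c) for c in chat_ids), return_exceptions=True
    )


async def _demo():
    in_flight = 0
    peak = 0

    async def fake_send(chat_id):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.001)  # simulated round-trip
        in_flight -= 1
        return chat_id

    results = await register_overrides(range(50), fake_send)
    return results, peak


override_results, peak_in_flight = asyncio.run(_demo())
```

The semaphore keeps the burst within whatever rate the remote API tolerates while still overlapping the round-trips.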
### [M-2] `_run_provider` exception backoff has no escalation

File: [packages/server/src/notify_bridge_server/services/ha_subscription.py:278-283](../../packages/server/src/notify_bridge_server/services/ha_subscription.py#L278)

A persistent bug in `_emit` causes a reconnect every 30s forever. Add exponential backoff with a cap, and raise a `bridge_self` alert after N failures.

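
The escalation proposed in M-2 reduces to two small pure functions. The 30s base comes from the finding; the 300s cap is an assumption borrowed from the 30s/300s cadence mentioned in H-7, and `ALERT_AFTER` is a hypothetical value for N.

```python
BASE_DELAY_S = 30.0   # current fixed delay, per the finding
MAX_DELAY_S = 300.0   # assumed cap
ALERT_AFTER = 5       # hypothetical N


def next_delay(consecutive_failures: int) -> float:
    """Delay before the next retry: doubles per failure, capped at MAX_DELAY_S."""
    # Clamp the exponent so very long failure streaks cannot overflow.
    exponent = min(max(consecutive_failures - 1, 0), 16)
    return min(BASE_DELAY_S * (2 ** exponent), MAX_DELAY_S)


def should_alert(consecutive_failures: int) -> bool:
    # Fire once, exactly when the streak reaches the threshold.
    return consecutive_failures == ALERT_AFTER


delays = [next_delay(n) for n in range(1, 7)]
```

Resetting the failure count to zero after a successful run returns the loop to the base delay.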
### [M-3] `database/migrations.py` is 1880 lines

File: [packages/server/src/notify_bridge_server/database/migrations.py](../../packages/server/src/notify_bridge_server/database/migrations.py)

Past the 800-line guideline. Split it into one file per migration under `database/migrations/<name>.py` and list them in `main.py`.

### [M-4] Locale-resolution logic duplicated

File: [packages/server/src/notify_bridge_server/services/dispatch_helpers.py:484-491](../../packages/server/src/notify_bridge_server/services/dispatch_helpers.py#L484) and [services/notifier.py:46](../../packages/server/src/notify_bridge_server/services/notifier.py#L46)

There are two implementations of locale priority. Consolidate them into one source of truth.

### [M-5] `_normalize_locale` duplicated across modules

File: [packages/server/src/notify_bridge_server/commands/handler.py:632](../../packages/server/src/notify_bridge_server/commands/handler.py#L632)

Five-line copy; move it to `commands/command_utils.py`.

### [M-6] Provider-type if-chain in `_test_provider_connection`

File: [packages/server/src/notify_bridge_server/api/providers.py:203-250](../../packages/server/src/notify_bridge_server/api/providers.py#L203)

The same chain appears in `services/__init__.py:_make_collection_provider`. Both are candidates for a single registry.

### [M-7] Secret masking exposes last 4 chars unconditionally

File: [packages/server/src/notify_bridge_server/api/providers.py:624](../../packages/server/src/notify_bridge_server/api/providers.py#L624) and [services/backup_service.py:81](../../packages/server/src/notify_bridge_server/services/backup_service.py#L81)

Fine for 32-char Immich keys, but for short secrets it reveals a large share of the value (half of an 8-character secret). Use a plain `"***"` when `len(value) < 16`.

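
A length-aware mask along the lines of M-7 is sketched below. The function name, the `***` prefix format, and the handling of empty values are assumptions; only the 16-character threshold and the 4-character suffix come from the finding.

```python
MIN_LENGTH_FOR_SUFFIX = 16  # threshold suggested in the finding


def mask_secret(value: str) -> str:
    """Mask a secret, revealing the last 4 characters only for long values."""
    if not value:
        return ""
    if len(value) < MIN_LENGTH_FOR_SUFFIX:
        # Short secrets: reveal nothing.
        return "***"
    return "***" + value[-4:]


masked_short = mask_secret("abcd1234")
masked_long = mask_secret("k" * 28 + "wxyz")
```

Sharing one helper between `api/providers.py` and `services/backup_service.py` would also keep the two masking sites from drifting apart.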
### [M-8] Deprecated `validate_outbound_url` still imported

File: [packages/core/src/notify_bridge_core/providers/immich/client.py:14](../../packages/core/src/notify_bridge_core/providers/immich/client.py#L14)

The sync version calls the blocking `socket.getaddrinfo` on the event loop. Migrate to `avalidate_outbound_url`.

### [M-9] Lazy cache init has confusing DCL comment

File: [packages/server/src/notify_bridge_server/services/watcher.py:81-113](../../packages/server/src/notify_bridge_server/services/watcher.py#L81)

The comment "Double-check after acquiring lock" implies classic double-checked locking (DCL). Under asyncio the unlocked first check is safe because there's no thread context switch, but the comment should be reworded to make that explicit.

### [M-10] Dispatcher concurrency cap is per-dispatch, not process-wide

File: [packages/core/src/notify_bridge_core/notifications/dispatcher.py:58](../../packages/core/src/notify_bridge_core/notifications/dispatcher.py#L58)

`_DISPATCH_CONCURRENCY = 16` is applied INSIDE `dispatch()`. An HA storm of N events with M targets each yields N x min(M, 16) concurrent sends with no outer cap. Add a process-level semaphore in `event_dispatch.py`.

### [M-11] `success=True` returned for partial failures

File: [packages/server/src/notify_bridge_server/services/notifier.py:329-335](../../packages/server/src/notify_bridge_server/services/notifier.py#L329)

A test send that fails on 1 of 3 receivers returns `success=True` with a `partial_failures` count. Introduce a `status: "ok"|"partial"|"fail"` field.

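
A three-state result of the kind proposed in M-11, which also keeps every error as suggested in H-11, might be shaped like this. `SendResult`, `AggregateResult`, and `aggregate` are hypothetical names and do not reflect the current return type of `_aggregate`.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional


@dataclass
class SendResult:  # hypothetical per-receiver outcome
    receiver: str
    error: Optional[str] = None


@dataclass
class AggregateResult:
    status: Literal["ok", "partial", "fail"]
    errors: List[str] = field(default_factory=list)


def aggregate(results: List[SendResult]) -> AggregateResult:
    """Collapse per-receiver results into one status plus ALL error messages."""
    errors = [f"{r.receiver}: {r.error}" for r in results if r.error is not None]
    if not errors:
        status = "ok"
    elif len(errors) < len(results):
        status = "partial"
    else:
        status = "fail"
    return AggregateResult(status=status, errors=errors)


partial = aggregate([
    SendResult("a"),
    SendResult("b", error="timeout"),
    SendResult("c"),
])
```

Callers that only need a boolean can still derive it as `status == "ok"`, while the UI gets enough detail to show which receivers failed.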
### [M-12] Telegram command registration not retried on 429

File: [packages/server/src/notify_bridge_server/commands/handler.py:671-693](../../packages/server/src/notify_bridge_server/commands/handler.py#L671)

`set_my_commands`/`delete_my_commands` aren't retried. Adopt the retry-after handling that `_upload_media` has.

### [M-13] `event_log_id_by_event` keyed on `id(event)`

File: [packages/server/src/notify_bridge_server/services/watcher.py:417-464](../../packages/server/src/notify_bridge_server/services/watcher.py#L417)

Using the CPython object address as a key works because the events are kept alive in scope, but a typed key would be safer.

### [M-14] Bcrypt-length error wording could be clearer

File: [packages/server/src/notify_bridge_server/auth/routes.py:69-81](../../packages/server/src/notify_bridge_server/auth/routes.py#L69)

A user typing 70 ASCII characters plus an emoji gets rejected and doesn't understand why, because the limit is counted in bytes, not characters. Clarify the byte-count language.

### [M-15] CSP allows `unsafe-inline` for `script-src`

File: [packages/server/src/notify_bridge_server/main.py:186-201](../../packages/server/src/notify_bridge_server/main.py#L186)

Acknowledged. SvelteKit's CSP configuration (`kit.csp`) can emit hashes; switching to it unblocks dropping `unsafe-inline`.

### [M-16] Telegram-webhook body size not capped

File: [packages/server/src/notify_bridge_server/commands/webhook.py:71](../../packages/server/src/notify_bridge_server/commands/webhook.py#L71)

`update = await request.json()` reads the body with no cap. Adopt the `_read_bounded_body` pattern.

### [M-17] `_log_command_event` swallows DB failures invisibly

File: [packages/server/src/notify_bridge_server/commands/handler.py:353-357](../../packages/server/src/notify_bridge_server/commands/handler.py#L353)

A hard DB failure here leaves no trace. Add a metrics counter.

### [M-18] `apply_tracking_display_filters` is a 60-line if-branched function

File: [packages/server/src/notify_bridge_server/services/dispatch_helpers.py:350-405](../../packages/server/src/notify_bridge_server/services/dispatch_helpers.py#L350)

Split it into `_filter_favorites`, `_apply_order_and_limit`, and `_strip_details_and_tags`.

---

## LOW

### [L-1] `from .database.models import *` in `main.py`

File: [packages/server/src/notify_bridge_server/main.py:26](../../packages/server/src/notify_bridge_server/main.py#L26)

The code comment is honest about the purpose, but explicit imports or a single module import would be clearer.

### [L-2] None comparisons

All comparisons were verified via grep to use `is None`. No findings.

### [L-3] Magic numbers

Constants are well-named throughout (`_TG_429_MAX_ATTEMPTS`, `_MAX_PENDING_PER_TRACKER`, `DEBOUNCE_SECONDS`, etc.). Only nit: the `seconds=30` literal in `scheduler.schedule_bot_polling` could be promoted to a named constant.

### [L-4] `noqa E712` repeated 8+ times for SQLModel boolean comparisons

Switch to `.is_(True)`, the SQLAlchemy idiom, or ignore E712 in the project's ruff config.

### [L-5] `_check_same_origin` is best-effort by design

Acceptable.

### [L-6] `_normalize_host` strips IPv6 zone IDs silently

File: [packages/core/src/notify_bridge_core/notifications/ssrf.py:105-106](../../packages/core/src/notify_bridge_core/notifications/ssrf.py#L105)

A debug log whenever stripping changes the host would help with diagnosis.

### [L-7] `_compute_jitter` cap of 30s might be tight on hourly polls

File: [packages/server/src/notify_bridge_server/services/scheduler.py:91-105](../../packages/server/src/notify_bridge_server/services/scheduler.py#L91)

Revisit if jitter collisions become a real-world issue.

### [L-8] `SmtpConfig` repr may leak password

File: [packages/server/src/notify_bridge_server/services/notifier.py:205-213](../../packages/server/src/notify_bridge_server/services/notifier.py#L205)

If `SmtpConfig` is a vanilla dataclass, `repr()` will leak the password. Verify in `notify_bridge_core.notifications.email.client`, and add `field(repr=False)` or a custom `__repr__` if needed.

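
The `field(repr=False)` option from L-8 is shown here on a stand-in class. `SmtpSettings` and its three fields are illustrative; the real `SmtpConfig` definition is not shown in this review.

```python
from dataclasses import dataclass, field


@dataclass
class SmtpSettings:  # stand-in for the real SmtpConfig
    host: str
    username: str
    # Excluded from the generated __repr__, so logs and tracebacks omit it.
    password: str = field(repr=False)


settings = SmtpSettings(host="smtp.example.com", username="bridge", password="hunter2")
rendered = repr(settings)
```

The field still takes part in `__init__` and `__eq__`; only the string representation changes.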
### [L-9] `noqa BLE001` count is high

49 occurrences across 26 files. Each is defensible; consider narrowing the caught exception types where possible.

### [L-10] `_normalize_for_json` does not handle UUID/Decimal

File: [packages/server/src/notify_bridge_server/services/deferred_dispatch.py:124-133](../../packages/server/src/notify_bridge_server/services/deferred_dispatch.py#L124)

No current consumer emits these, but a fallback `str()` for unknown types would prevent future breakage.

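
The `str()` fallback suggested in L-10 can be sketched as a small recursive normaliser. The branches for dates, dicts, and sequences are assumptions about what the existing `_normalize_for_json` already covers; the final fallback line is the part the finding asks for.

```python
import json
import uuid
from datetime import date, datetime
from decimal import Decimal


def normalize_for_json(value):
    """Recursively convert value into JSON-serialisable types."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, dict):
        return {str(k): normalize_for_json(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [normalize_for_json(v) for v in value]
    # Fallback for UUID, Decimal, and any other unexpected type.
    return str(value)


payload = normalize_for_json({
    "id": uuid.UUID(int=1),
    "amount": Decimal("1.50"),
    "tags": ("a", "b"),
})
encoded = json.dumps(payload)
```

Converting `Decimal` through `str()` preserves the exact digits, which a `float()` conversion would not guarantee.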
---

## Approval Verdict

**Block.** The CRITICAL findings (C-1 unstored task, C-2 missing rollback, C-3 unauthenticated body read, C-4 racy counters, C-5 secret-mask audit) must be fixed before the backend is declared production-ready. Once those are addressed, the HIGH findings can land in a follow-up.

## Quick Wins (low effort, high value)

1. **Track every fire-and-forget `asyncio.create_task` in a module-level set.** Search for `asyncio.create_task(` with no assignment. Definite hit: `ha_subscription.py:249`.
2. **Move the webhook-secret check before `_read_bounded_body`** in the Gitea + generic webhook handlers. A 5-line move per endpoint eliminates pre-auth resource exhaustion.
3. **Add an `asyncio.Lock` around `_poll_failure_counts` and `_target_failure_counts` mutations.** This eliminates C-4.
4. **Split `migrations.py`.** A mechanical refactor of about 1 hour that improves blame/review.
5. **Batch the receiver query in `backup_service.export_backup`.** A single `IN (...)` query, ~10x faster.
6. **Replace `from .database.models import *` with explicit imports.** A small clarity win.