feat: production readiness — security, perf, bug fixes, bridge self-monitoring
Comprehensive multi-area pass driven by a parallel 8-agent production
review covering frontend, backend, database, security, performance, and
operations, plus a new self-monitoring feature.
## Critical fixes
- Planka webhook: reads bounded raw body (was NameError on every call)
- HA quiet hours: ha_state_changed/automation_triggered/service_called/
event_fired added to deferrable set (were silently dropped)
- DNS-rebinding SSRF: PinnedResolver wired into shared aiohttp session
- Telegram inbound webhook: secret now mandatory (401 without)
- Generic webhook: auth_mode="none" requires explicit
acknowledge_unauthenticated=true; per-IP rate limit 60/min
- svelte-check: 5 null-narrowing errors in EventDetailModal fixed
- Provider hardcoding: Immich-only block extracted to descriptor
featureDiscoveryHint
- command_sync: snapshot+expunge bot before exiting AsyncSession
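The Telegram and generic-webhook rules above come down to two small checks. Here is a minimal sketch. It assumes the secret is registered through the Bot API's `setWebhook` `secret_token` parameter, which makes Telegram echo it back in the `X-Telegram-Bot-Api-Secret-Token` header. `check_telegram_secret` and `validate_generic_auth` are hypothetical names, not the bridge's actual functions:

```python
import hmac
from typing import Mapping, Optional

# Header Telegram sends when setWebhook was called with secret_token.
SECRET_HEADER = "X-Telegram-Bot-Api-Secret-Token"


def check_telegram_secret(headers: Mapping[str, str], configured: Optional[str]) -> int:
    """Return an HTTP status: 200 to proceed, 401 to reject (hypothetical helper).

    An unset secret is treated as a rejection, not as "auth disabled",
    because the secret is mandatory in webhook mode.
    """
    if not configured:
        return 401
    supplied = headers.get(SECRET_HEADER)
    if supplied is None:
        return 401
    # Constant-time comparison, so response timing does not leak the secret.
    if not hmac.compare_digest(supplied.encode(), configured.encode()):
        return 401
    return 200


def validate_generic_auth(auth_mode: str, acknowledge_unauthenticated: bool) -> None:
    """Refuse auth_mode="none" unless the operator explicitly opted in (hypothetical)."""
    if auth_mode == "none" and not acknowledge_unauthenticated:
        raise ValueError('auth_mode="none" requires acknowledge_unauthenticated=true')
```

A real handler should read headers from a case-insensitive mapping, such as aiohttp's `request.headers`. The plain dict used in a quick test is case-sensitive.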
## Bug fixes
- notifier asyncio.gather(return_exceptions=True) — one bad chat no longer
cancels peer sends
- NotificationDispatcher hoisted out of per-tracker loop
- Provider credential resolution unified across all 5 dispatch sites
- HA asyncio.shield now drains inner task on cancellation
- Provider construction switched from if/elif ladder to factory registry
- NUT first poll seeds silently (no spurious ups_on_battery)
- Quiet-hours gate: event-type-disabled now wins over deferral
- APScheduler drain job ID resolution upgraded to seconds
- HA on_status_change wired through to EventLog
- Webhook payload rollback failures now logged (not swallowed)
- Batched receivers/chats/bots in load_link_data (was per-target N+1)
- flag_modified on JSON column reassignments in deferred_dispatch
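The notifier fix relies on `asyncio.gather(..., return_exceptions=True)`. With that flag, a failing send comes back as an exception object in the results list instead of being raised out of `gather()`, so the caller still collects every peer's outcome. Here is a minimal sketch. `send_to_all` and the `send` callable are hypothetical stand-ins for the notifier's per-chat send:

```python
import asyncio
from typing import Awaitable, Callable, Dict, Sequence


async def send_to_all(
    chat_ids: Sequence[int],
    send: Callable[[int], Awaitable[None]],
) -> Dict[int, BaseException]:
    """Send to every chat concurrently and return {chat_id: error} for failures."""
    results = await asyncio.gather(
        *(send(chat_id) for chat_id in chat_ids),
        return_exceptions=True,  # failures become results instead of propagating
    )
    failures: Dict[int, BaseException] = {}
    # gather() preserves input order, so zip pairs each result with its chat.
    for chat_id, result in zip(chat_ids, results):
        if isinstance(result, BaseException):
            failures[chat_id] = result
    return failures
```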
## Database
- UNIQUE indexes on service_provider.webhook_token,
telegram_bot.webhook_path_id, partial UNIQUE on telegram_bot.bot_id,
telegram_chat(bot_id, chat_id), notification_tracker_target unique link,
partial UNIQUE on bridge_self provider per user
- Composite ix_event_log_user_event_type_created index
- save_chat_from_webhook switched to ON CONFLICT DO UPDATE
- ondelete=CASCADE on user-id FKs (model annotation; app-side cascade
delete added for existing data)
- delete_notification_tracker converted from N+1 to bulk DELETE/UPDATE
- Module-level asyncio.Lock replaced with lazy _get_lock() pattern
- VACUUM INTO snapshot is now verified with PRAGMA integrity_check
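The dedupe-then-index pattern behind the new UNIQUE indexes can be exercised against an in-memory SQLite database. This sketch uses the standard-library `sqlite3` module and a toy `telegram_bot` table with only the two columns needed here, which is an assumption and not the real schema. It keeps the lowest `id` per non-zero `bot_id`, then adds the partial unique index so sentinel rows with `bot_id = 0` may still repeat. `dedupe_and_index` is a hypothetical name:

```python
import sqlite3


def dedupe_and_index(conn: sqlite3.Connection) -> int:
    """Drop non-sentinel bot_id duplicates, then enforce uniqueness.

    Returns the number of rows deleted. Survivor per bot_id is MIN(id);
    the sentinel value 0 is excluded from both the dedupe and the index.
    """
    cur = conn.execute(
        "DELETE FROM telegram_bot WHERE id IN ("
        " SELECT t.id FROM telegram_bot t"
        " JOIN (SELECT bot_id FROM telegram_bot"
        "       WHERE bot_id != 0 GROUP BY bot_id HAVING COUNT(*) > 1) g"
        " ON t.bot_id = g.bot_id"
        " WHERE t.id NOT IN ("
        "   SELECT MIN(id) FROM telegram_bot WHERE bot_id = g.bot_id))"
    )
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS uq_telegram_bot_bot_id_nonzero "
        "ON telegram_bot(bot_id) WHERE bot_id != 0"
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telegram_bot (id INTEGER PRIMARY KEY, bot_id INTEGER)")
conn.executemany(
    "INSERT INTO telegram_bot (id, bot_id) VALUES (?, ?)",
    [(1, 111), (2, 111), (3, 0), (4, 0), (5, 222)],
)
removed = dedupe_and_index(conn)  # deletes row 2, the later duplicate of 111
```

Rows with `bot_id = 0` survive both steps, and further sentinel rows can still be inserted afterwards. That is the reason for making the index partial.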
## Performance
- Jinja2 template compilation LRU cached (lru_cache maxsize=512)
- Per-locale render cache in NotificationDispatcher (skips re-rendering
identical content for receivers sharing a locale)
- Tracker list cached per provider_id with 5s TTL + explicit invalidation
on tracker CRUD (relieves HA chat-bus rate query pressure)
- Nav-counts collapsed from 16 round-trips to single UNION ALL
- HA event_log: skip persisting empty assets_added/removed events
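The tracker-list cache can be pictured as a per-key TTL map with an explicit invalidation hook. Here is a minimal sketch, assuming a 5-second TTL keyed by `provider_id`. `TTLCache` is a hypothetical class. The loader is synchronous here for brevity, although the bridge's query is presumably async:

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Per-key cache whose entries expire ``ttl`` seconds after loading."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], V]) -> V:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh
        value = loader()  # miss or expired: reload and restamp
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop ``key`` immediately, e.g. after a tracker create/update/delete."""
        self._entries.pop(key, None)
```

Injecting the clock lets the expiry logic be tested without sleeping.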
## Security hardening
- Mass-assignment guard on Action create/update; sub-minute cron schedules
rejected
- Backup JSON depth/node-count cap (depth ≤ 10, nodes ≤ 100k)
- _sanitize_config extended to all JSON-typed fields on backup import
- Telegram _safe_get walks redirects manually with SSRF revalidation
- Bcrypt 72-byte password length cap with clear 422
- Webhook payload body redaction; sensitive substring set extended with
oauth/client_secret/webhook_secret/csrf in both header filter and
template extras filter
## Frontend
- 76 catch (err: any) sites converted to errMsg(err) helper
- globalProviderFilter: pure getter; reconciliation moved to one-time
$effect in +layout
- Provider-filter binding: removed paired $effects + _syncingFilter flag,
now one-way derived
- entity-cache: separate _refreshing flag for background re-fetches
- api.ts 401 handling: AuthRedirectError class + dedup _redirecting flag,
goto() instead of window.location.href
- a11y: aria-expanded on mobile More, role=switch + aria-checked on
Telegram bot toggles
## Tests & operations
- CI pytest gate added to .gitea/workflows/build.yml + release.yml
(wheel-built install to dodge editable-install slowness)
- /api/ready upgraded to deep healthcheck (db SELECT 1, scheduler.running,
HA supervisor presence) returning {ready, checks, errors, version}
- /api/metrics endpoint with prometheus_client (deferred_pending,
event_log_total, dispatch_duration, poll_failures, send_failures)
- New OPERATIONS.md covering deploy, healthchecks, metrics, backup/restore
procedures, log handling, common scenarios, upgrade flow
- New tests: test_bridge_self (11), test_gitea_parser (9),
test_planka_parser (6), test_immich_change_detector (6),
test_backup_roundtrip (1)
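The deep `/api/ready` check is essentially a set of named probes folded into one response. Here is a minimal sketch of that aggregation. `run_readiness`, the per-check timeout, and the shape of `errors` (a name-to-message map) are assumptions, because the commit only names the top-level keys:

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def run_readiness(
    checks: Dict[str, Callable[[], Awaitable[bool]]],
    version: str,
    timeout: float = 2.0,
) -> Dict[str, Any]:
    """Run each named check and build a {ready, checks, errors, version} body.

    A check fails if it returns a falsy value, raises, or exceeds ``timeout``.
    """
    results: Dict[str, bool] = {}
    errors: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            ok = bool(await asyncio.wait_for(check(), timeout))
            if not ok:
                errors[name] = "check returned false"
        except asyncio.TimeoutError:
            ok = False
            errors[name] = f"timed out after {timeout}s"
        except Exception as exc:  # report the failure instead of crashing the probe
            ok = False
            errors[name] = repr(exc)
        results[name] = ok
    return {
        "ready": all(results.values()),
        "checks": results,
        "errors": errors,
        "version": version,
    }
```

Wrapping each probe in `asyncio.wait_for` prevents a hung database or scheduler from hanging the readiness endpoint itself.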
## New feature: bridge self-monitoring
- New bridge_self provider type — internal sink for bridge health events
- Three event types: bridge_self_poll_failures (consecutive tracker poll
failures), bridge_self_deferred_backlog (pending count crosses
threshold), bridge_self_target_failures (consecutive 5xx/network
failures per target)
- Per-user thresholds (defaults: 3 / 100 / 5) configurable via the
provider config form
- Auto-seeded on user create + /setup + boot backfill for existing users
- Anti-spam: counters reset after emission; backlog uses transition latch
- Self-loop guard: bridge_self failures don't count toward target-failure
thresholds (logged only) — wire to your own Telegram/Email/Matrix to
get notified when polls/dispatches/sends fail
- 6 default templates (3 events × 2 locales), tracking config columns
with backfill migration, frontend descriptor (excluded from "create
provider" wizard since auto-managed)
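The anti-spam rules above amount to two small state machines. The first is a consecutive-failure counter that resets after it fires. The second is a latch that fires once when the backlog crosses its threshold and re-arms only after the backlog drops back below it. Here is a minimal sketch. The class names are hypothetical, and treating "crosses" as reaching the threshold (`>=`) is an assumption:

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class ConsecutiveFailureCounter:
    """Signal once every ``threshold`` consecutive failures per key."""

    threshold: int
    counts: Dict[Hashable, int] = field(default_factory=dict)

    def record(self, key: Hashable, ok: bool) -> bool:
        if ok:
            self.counts.pop(key, None)  # any success breaks the streak
            return False
        n = self.counts.get(key, 0) + 1
        if n >= self.threshold:
            self.counts.pop(key, None)  # reset after emitting (anti-spam)
            return True
        self.counts[key] = n
        return False


@dataclass
class BacklogLatch:
    """Signal only on the transition from below ``threshold`` to at or above it."""

    threshold: int
    latched: bool = False

    def observe(self, pending: int) -> bool:
        if pending >= self.threshold:
            if self.latched:
                return False
            self.latched = True
            return True
        self.latched = False  # re-arm once the backlog drains below threshold
        return False
```

With the defaults (3 / 100 / 5), a tracker that keeps failing would emit on its 3rd, 6th, 9th consecutive failure, and so on. The backlog alert fires once each time pending work climbs to 100 or more.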
## Operator-visible behavior changes (call out in release notes)
- NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET now REQUIRED for webhook mode
- Existing webhook providers with auth_mode="none" need explicit opt-in
- Generic webhook endpoint rate-limited 60/min per source IP
- HA disconnect/reconnect writes ha_status_* EventLog rows
- Every user gets a bridge_self provider — wire it to a target to
receive failure alerts
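A per-source-IP limit of 60 requests per minute can be implemented as a sliding-window log. Here is a minimal sketch. `SlidingWindowLimiter` is a hypothetical class, and the bridge's limiter may use a different algorithm, such as a fixed window:

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most ``limit`` requests per ``window`` seconds per key."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, source_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[source_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

Behind a reverse proxy, the peer address is the proxy's own. The key should therefore be the client address that the proxy forwards; otherwise all callers share one budget. This sketch never forgets a key, so a long-running server would also want periodic pruning of idle IPs.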
Pre-existing test failures (test_ssrf, test_release_provider) on
Python 3.13 are unrelated; CI runs on 3.12.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -309,6 +309,22 @@ async def migrate_schema(engine: AsyncEngine) -> None:
                    )
                    logger.info("Added %s column to tracking_config table", col_name)

        # Add Bridge self-monitoring tracking flags to tracking_config if missing.
        # All three default ON — the bridge_self provider exists specifically
        # to surface these conditions, so silencing one would defeat the point.
        if await _has_table(conn, "tracking_config"):
            bridge_self_flags = [
                ("track_bridge_self_poll_failures", "INTEGER DEFAULT 1"),
                ("track_bridge_self_deferred_backlog", "INTEGER DEFAULT 1"),
                ("track_bridge_self_target_failures", "INTEGER DEFAULT 1"),
            ]
            for col_name, col_type in bridge_self_flags:
                if not await _has_column(conn, "tracking_config", col_name):
                    await conn.execute(
                        text(f"ALTER TABLE tracking_config ADD COLUMN {col_name} {col_type}")
                    )
                    logger.info("Added %s column to tracking_config table", col_name)

        # Add quiet hours to tracking_config if missing.
        # Start/end are nullable HH:MM strings; quiet_hours_enabled gates them.
        if await _has_table(conn, "tracking_config"):
@@ -1361,6 +1377,12 @@ _INDEXES: list[tuple[str, str, str]] = [
    ("ix_action_provider_id", "action", "provider_id"),
    # Dashboard: SELECT event_log WHERE user_id = ? ORDER BY created_at DESC
    ("ix_event_log_user_created", "event_log", "user_id, created_at DESC"),
    # Dashboard "events of type X for me, recent first" filter.
    (
        "ix_event_log_user_event_type_created",
        "event_log",
        "user_id, event_type, created_at DESC",
    ),
    ("ix_event_log_provider_id", "event_log", "provider_id"),
    ("ix_event_log_notification_tracker_id", "event_log", "notification_tracker_id"),
    ("ix_event_log_action_id", "event_log", "action_id"),
@@ -1543,6 +1565,269 @@ async def migrate_chat_action_to_column(engine: AsyncEngine) -> None:
        logger.info("Migrated chat_action from config JSON to column where present")


# ---------------------------------------------------------------------------
# Uniqueness + dedupe migrations for webhook hot paths.
#
# These backfill missing UNIQUE indexes on webhook tokens, webhook path IDs,
# bot_id (with sentinel guard), (bot_id, chat_id), and tracker-target links.
# Every CREATE UNIQUE INDEX is preceded by a dedupe pass that keeps the
# canonical row (lowest id, or oldest created_at where specified) and removes
# the rest, logging a WARNING with the dropped count so operators can audit.
# ---------------------------------------------------------------------------


async def _dedupe_by_columns(
    conn,
    table: str,
    cols: list[str],
    *,
    keep: str = "min_id",
    label: str = "",
) -> int:
    """Delete duplicate rows leaving one survivor per ``cols`` group.

    ``keep`` chooses the survivor:
    - ``"min_id"`` keeps the row with the lowest ``id`` (default — used
      when there is no semantic "first" row to preserve).
    - ``"min_created_at"`` keeps the row with the oldest ``created_at``,
      falling back to the lowest id on ties — preferred for tracker-target
      links so the original link wins.

    Returns the number of rows deleted. All identifiers flow through
    ``_assert_ident`` to neutralise SQL injection from any caller mistake.
    """
    _assert_ident(table, "table")
    for c in cols:
        _assert_ident(c, "column")
    group_by = ", ".join(cols)
    where_cols = " AND ".join(f"{c} = g.{c}" for c in cols)
    if keep == "min_created_at":
        # Tie-break on id so the survivor is deterministic even if two rows
        # share the same created_at (insert-batches commonly do).
        survivor_sql = (
            f"SELECT id FROM {table} "
            f"WHERE {where_cols} "
            f"ORDER BY created_at ASC, id ASC LIMIT 1"
        )
    elif keep == "min_id":
        survivor_sql = f"SELECT MIN(id) FROM {table} WHERE {where_cols}"
    else:
        raise ValueError(f"Unknown keep strategy: {keep!r}")

    delete_sql = (
        f"DELETE FROM {table} WHERE id IN ("
        f" SELECT t.id FROM {table} t "
        f" JOIN ("
        f" SELECT {group_by} FROM {table} "
        f" GROUP BY {group_by} HAVING COUNT(*) > 1"
        f" ) g ON {' AND '.join(f't.{c} = g.{c}' for c in cols)} "
        f" WHERE t.id NOT IN ({survivor_sql})"
        f")"
    )
    result = await conn.execute(text(delete_sql))
    deleted = int(getattr(result, "rowcount", 0) or 0)
    if deleted:
        logger.warning(
            "Removed %d duplicate row(s) from %s on (%s)%s",
            deleted, table, ", ".join(cols),
            f" — {label}" if label else "",
        )
    return deleted


async def migrate_uniqueness_constraints(engine: AsyncEngine) -> None:
    """Backfill missing UNIQUE indexes on webhook hot paths.

    SQLite cannot ALTER an existing column to add a UNIQUE constraint, but
    a UNIQUE INDEX is functionally equivalent and can be created with
    ``IF NOT EXISTS`` on every boot. Each index is preceded by a dedupe
    pass so the index creation does not fail on existing duplicates.

    Indexes added:
    - service_provider.webhook_token (full unique)
    - telegram_bot.webhook_path_id (full unique)
    - telegram_bot.bot_id (partial unique WHERE bot_id != 0; 0 is a
      sentinel meaning "not yet validated")
    - telegram_chat (bot_id, chat_id) (full unique composite)
    - notification_tracker_target (notification_tracker_id, target_id)
      (full unique composite)
    - service_provider (user_id) (partial unique WHERE type='bridge_self')
    """
    # Skip on non-SQLite engines — they enforce UNIQUE via the model
    # metadata (create_all) and don't have sqlite_master introspection.
    if not str(engine.url).startswith("sqlite"):
        return
    async with engine.begin() as conn:
        # service_provider.webhook_token
        if await _has_table(conn, "service_provider") and await _has_column(
            conn, "service_provider", "webhook_token",
        ):
            await _dedupe_by_columns(
                conn, "service_provider", ["webhook_token"],
                keep="min_id", label="webhook_token uniqueness",
            )
            await conn.execute(text(
                "CREATE UNIQUE INDEX IF NOT EXISTS "
                "uq_service_provider_webhook_token "
                "ON service_provider(webhook_token)"
            ))

        # telegram_bot.webhook_path_id (full unique)
        # telegram_bot.bot_id (partial unique excluding sentinel 0)
        if await _has_table(conn, "telegram_bot"):
            if await _has_column(conn, "telegram_bot", "webhook_path_id"):
                await _dedupe_by_columns(
                    conn, "telegram_bot", ["webhook_path_id"],
                    keep="min_id", label="webhook_path_id uniqueness",
                )
                await conn.execute(text(
                    "CREATE UNIQUE INDEX IF NOT EXISTS "
                    "uq_telegram_bot_webhook_path_id "
                    "ON telegram_bot(webhook_path_id)"
                ))
            if await _has_column(conn, "telegram_bot", "bot_id"):
                # Dedupe only non-sentinel rows. Two unverified bots both
                # carrying bot_id=0 is legitimate — only collisions among
                # validated bot_ids signal a real corruption to clean up.
                deleted = await conn.execute(text(
                    "DELETE FROM telegram_bot WHERE id IN ("
                    " SELECT t.id FROM telegram_bot t "
                    " JOIN ("
                    " SELECT bot_id FROM telegram_bot "
                    " WHERE bot_id != 0 GROUP BY bot_id HAVING COUNT(*) > 1"
                    " ) g ON t.bot_id = g.bot_id "
                    " WHERE t.id NOT IN ("
                    " SELECT MIN(id) FROM telegram_bot WHERE bot_id = g.bot_id"
                    " )"
                    ")"
                ))
                rc = int(getattr(deleted, "rowcount", 0) or 0)
                if rc:
                    logger.warning(
                        "Removed %d duplicate telegram_bot row(s) on bot_id "
                        "(non-sentinel collisions)", rc,
                    )
                # Plain INDEX for the lookup-by-bot_id path.
                await conn.execute(text(
                    "CREATE INDEX IF NOT EXISTS ix_telegram_bot_bot_id "
                    "ON telegram_bot(bot_id)"
                ))
                # Partial UNIQUE excluding the sentinel.
                await conn.execute(text(
                    "CREATE UNIQUE INDEX IF NOT EXISTS "
                    "uq_telegram_bot_bot_id_nonzero "
                    "ON telegram_bot(bot_id) WHERE bot_id != 0"
                ))

        # telegram_chat (bot_id, chat_id) — keep the survivor with the oldest
        # discovered_at so the original discovery row wins. _dedupe_by_columns
        # only handles created_at; do this one inline.
        if await _has_table(conn, "telegram_chat"):
            res = await conn.execute(text(
                "DELETE FROM telegram_chat WHERE id IN ("
                " SELECT t.id FROM telegram_chat t "
                " JOIN ("
                " SELECT bot_id, chat_id FROM telegram_chat "
                " GROUP BY bot_id, chat_id HAVING COUNT(*) > 1"
                " ) g ON t.bot_id = g.bot_id AND t.chat_id = g.chat_id "
                " WHERE t.id NOT IN ("
                " SELECT id FROM telegram_chat "
                " WHERE bot_id = g.bot_id AND chat_id = g.chat_id "
                " ORDER BY discovered_at ASC, id ASC LIMIT 1"
                " )"
                ")"
            ))
            rc = int(getattr(res, "rowcount", 0) or 0)
            if rc:
                logger.warning(
                    "Removed %d duplicate telegram_chat row(s) on (bot_id, chat_id)",
                    rc,
                )
            await conn.execute(text(
                "CREATE UNIQUE INDEX IF NOT EXISTS uq_telegram_chat_bot_chat "
                "ON telegram_chat(bot_id, chat_id)"
            ))
            await conn.execute(text(
                "CREATE INDEX IF NOT EXISTS ix_telegram_chat_bot_chat "
                "ON telegram_chat(bot_id, chat_id)"
            ))

        # notification_tracker_target (notification_tracker_id, target_id)
        # — keep the oldest created_at link so the original wins.
        if await _has_table(conn, "notification_tracker_target") and await _has_column(
            conn, "notification_tracker_target", "notification_tracker_id",
        ):
            await _dedupe_by_columns(
                conn,
                "notification_tracker_target",
                ["notification_tracker_id", "target_id"],
                keep="min_created_at",
                label="tracker-target link uniqueness",
            )
            await conn.execute(text(
                "CREATE UNIQUE INDEX IF NOT EXISTS uq_ntt_tracker_target "
                "ON notification_tracker_target(notification_tracker_id, target_id)"
            ))

        # service_provider partial unique on (user_id) WHERE type='bridge_self'.
        # Bridge-self is special: exactly one row per user, auto-seeded at boot,
        # at user-create, and on /setup. Without this guard, a concurrent boot
        # backfill + POST /api/users could double-insert. Dedupe keeps the
        # oldest row so any user-customised thresholds on it survive.
        if await _has_table(conn, "service_provider"):
            res = await conn.execute(text(
                "DELETE FROM service_provider WHERE id IN ("
                " SELECT t.id FROM service_provider t "
                " JOIN ("
                " SELECT user_id FROM service_provider "
                " WHERE type='bridge_self' GROUP BY user_id HAVING COUNT(*) > 1"
                " ) g ON t.user_id = g.user_id "
                " WHERE t.type='bridge_self' AND t.id NOT IN ("
                " SELECT MIN(id) FROM service_provider "
                " WHERE type='bridge_self' AND user_id = g.user_id"
                " )"
                ")"
            ))
            rc = int(getattr(res, "rowcount", 0) or 0)
            if rc:
                logger.warning(
                    "Removed %d duplicate bridge_self service_provider row(s) "
                    "on user_id", rc,
                )
            await conn.execute(text(
                "CREATE UNIQUE INDEX IF NOT EXISTS "
                "uq_service_provider_bridge_self_per_user "
                "ON service_provider(user_id) WHERE type='bridge_self'"
            ))


async def migrate_eventlog_provider_fk(engine: AsyncEngine) -> None:
    """Document the EventLog.provider_id FK situation.

    SQLite cannot ALTER a column to add a foreign-key constraint without
    rebuilding the table. The model annotation now declares
    ``ondelete=SET NULL`` which only takes effect on freshly created
    tables (i.e. brand-new installs). For existing installs we rely on
    application-side cleanup in ``api/providers.delete_provider`` to NULL
    out ``event_log.provider_id`` rows before deleting the provider row.

    This migration is intentionally a no-op aside from the log line — it
    exists so the migration order is explicit and operators see in the
    logs that the FK strategy was reviewed on this boot.
    """
    if not str(engine.url).startswith("sqlite"):
        return
    async with engine.begin() as conn:
        if not await _has_table(conn, "event_log"):
            return
        # No DDL change. Application code in api/providers.delete_provider
        # is the source of truth for the SET NULL semantic on existing tables.
        logger.debug(
            "event_log.provider_id FK enforcement deferred to application "
            "code on existing SQLite tables (model declares ondelete=SET NULL "
            "which applies to fresh schemas only)."
        )


# ---------------------------------------------------------------------------
# Schema version tracking — lightweight alternative to Alembic while the
# hand-rolled idempotent migrations remain the source of truth. Gives