feat: production readiness — security, perf, bug fixes, bridge self-monitoring

Comprehensive multi-area pass driven by a parallel 8-agent production
review. Frontend, backend, database, security, performance, operational,
plus a new self-monitoring feature.

## Critical fixes
- Planka webhook: reads bounded raw body (was NameError on every call)
- HA quiet hours: ha_state_changed/automation_triggered/service_called/
  event_fired added to deferrable set (were silently dropped)
- DNS-rebinding SSRF: PinnedResolver wired into shared aiohttp session
- Telegram inbound webhook: secret now mandatory (401 without)
- Generic webhook: auth_mode="none" requires explicit
  acknowledge_unauthenticated=true; per-IP rate limit 60/min
- svelte-check: 5 null-narrowing errors in EventDetailModal fixed
- Provider hardcoding: Immich-only block extracted to descriptor
  featureDiscoveryHint
- command_sync: snapshot+expunge bot before exiting AsyncSession

## Bug fixes
- notifier asyncio.gather(return_exceptions=True) — one bad chat no longer
  cancels peer sends
- NotificationDispatcher hoisted out of per-tracker loop
- Provider credential resolution unified across all 5 dispatch sites
- HA asyncio.shield now drains inner task on cancellation
- Provider construction switched from if/elif ladder to factory registry
- NUT first poll seeds silently (no spurious ups_on_battery)
- Quiet-hours gate: event-type-disabled now wins over deferral
- APScheduler drain job ID resolution upgraded to seconds
- HA on_status_change wired through to EventLog
- Webhook payload rollback failures now logged (not swallowed)
- Batched receivers/chats/bots in load_link_data (was per-target N+1)
- flag_modified on JSON column reassignments in deferred_dispatch

## Database
- UNIQUE indexes on service_provider.webhook_token,
  telegram_bot.webhook_path_id, partial UNIQUE on telegram_bot.bot_id,
  telegram_chat(bot_id, chat_id), notification_tracker_target unique link,
  partial UNIQUE on bridge_self provider per user
- Composite ix_event_log_user_event_type_created index
- save_chat_from_webhook switched to ON CONFLICT DO UPDATE
- ondelete=CASCADE on user-id FKs (model annotation; app-side cascade
  delete added for existing data)
- delete_notification_tracker converted from N+1 to bulk DELETE/UPDATE
- Module-level asyncio.Lock replaced with lazy _get_lock() pattern
- VACUUM INTO snapshot now PRAGMA integrity_check verified

## Performance
- Jinja2 template compilation LRU cached (lru_cache maxsize=512)
- Per-locale render cache in NotificationDispatcher (skips re-rendering
  identical content for receivers sharing a locale)
- Tracker list cached per provider_id with 5s TTL + explicit invalidation
  on tracker CRUD (relieves HA chat-bus rate query pressure)
- Nav-counts collapsed from 16 round-trips to single UNION ALL
- HA event_log: skip persisting empty assets_added/removed events

## Security hardening
- Mass-assignment guard on Action create/update; cron sub-minute reject
- Backup JSON depth/node-count cap (depth ≤ 10, nodes ≤ 100k)
- _sanitize_config extended to all JSON-typed fields on backup import
- Telegram _safe_get walks redirects manually with SSRF revalidation
- Bcrypt 72-byte password length cap with clear 422
- Webhook payload body redaction; sensitive substring set extended with
  oauth/client_secret/webhook_secret/csrf in both header filter and
  template extras filter

## Frontend
- 76 catch (err: any) sites converted to errMsg(err) helper
- globalProviderFilter: pure getter; reconciliation moved to one-time
  $effect in +layout
- Provider-filter binding: removed paired $effects + _syncingFilter flag,
  now one-way derived
- entity-cache: separate _refreshing flag for background re-fetches
- api.ts 401 handling: AuthRedirectError class + dedup _redirecting flag,
  goto() instead of window.location.href
- a11y: aria-expanded on mobile More, role=switch + aria-checked on
  Telegram bot toggles

## Tests & operations
- CI pytest gate added to .gitea/workflows/build.yml + release.yml
  (wheel-built install to dodge editable-install slowness)
- /api/ready upgraded to deep healthcheck (db SELECT 1, scheduler.running,
  HA supervisor presence) returning {ready, checks, errors, version}
- /api/metrics endpoint with prometheus_client (deferred_pending,
  event_log_total, dispatch_duration, poll_failures, send_failures)
- New OPERATIONS.md covering deploy, healthchecks, metrics, backup/restore
  procedures, log handling, common scenarios, upgrade flow
- New tests: test_bridge_self (11), test_gitea_parser (9),
  test_planka_parser (6), test_immich_change_detector (6),
  test_backup_roundtrip (1)

## New feature: bridge self-monitoring
- New bridge_self provider type — internal sink for bridge health events
- Three event types: bridge_self_poll_failures (consecutive tracker poll
  failures), bridge_self_deferred_backlog (pending count crosses
  threshold), bridge_self_target_failures (consecutive 5xx/network
  failures per target)
- Per-user thresholds (defaults: 3 / 100 / 5) configurable via the
  provider config form
- Auto-seeded on user create + /setup + boot backfill for existing users
- Anti-spam: counters reset after emission; backlog uses transition latch
- Self-loop guard: bridge_self failures don't count toward target-failure
  thresholds (logged only) — wire to your own Telegram/Email/Matrix to
  get notified when polls/dispatches/sends fail
- 6 default templates (3 events × 2 locales), tracking config columns
  with backfill migration, frontend descriptor (excluded from "create
  provider" wizard since auto-managed)

Operator-visible behavior changes (call out in release notes):
- NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET now REQUIRED for webhook mode
- Existing webhook providers with auth_mode="none" need explicit opt-in
- Generic webhook endpoint rate-limited 60/min per source IP
- HA disconnect/reconnect writes ha_status_* EventLog rows
- Every user gets a bridge_self provider — wire it to a target to
  receive failure alerts

Pre-existing test failures (test_ssrf, test_release_provider) on
Python 3.13 are unrelated; CI runs on 3.12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-16 02:16:49 +03:00
parent 22127e2a59
commit 10d30fc956
97 changed files with 5423 additions and 821 deletions
@@ -1,7 +1,7 @@
<script lang="ts">
import { onMount, onDestroy } from 'svelte';
import { slide } from 'svelte/transition';
import { api, getBlockedBy, type BlockedByDetail } from '$lib/api';
import { api, getBlockedBy, type BlockedByDetail , errMsg} from '$lib/api';
import { topbarAction } from '$lib/stores/topbar-action.svelte';
import BlockedByModal from '$lib/components/BlockedByModal.svelte';
import { t } from '$lib/i18n';
@@ -28,13 +28,13 @@
import Button from '$lib/components/Button.svelte';
import MetaStrip, { type MetaTile } from '$lib/components/MetaStrip.svelte';
/** Grid-select item source lookup maps descriptor string name to actual function. */
/** Grid-select item source lookup — maps descriptor string name to actual function. */
const gridItemSources: Record<string, () => any[]> = {
sortByItems, sortOrderItems, albumModeItems, assetTypeItems, memorySourceItems,
};
/**
* HH:MM, comma-separated: "09:00" or "09:00, 18:30" the only format cron
* HH:MM, comma-separated: "09:00" or "09:00, 18:30" — the only format cron
* dispatch accepts. Matched on blur for time-list fields; invalid values
* are surfaced inline next to the input.
*/
@@ -43,7 +43,7 @@
/** Per-field error messages surfaced inline under time-list inputs. */
let timeListErrors = $state<Record<string, string>>({});
/** Normalize "9:0 , 18:30" "09:00,18:30" on blur, clear error when valid. */
/** Normalize "9:0 , 18:30" в†’ "09:00,18:30" on blur, clear error when valid. */
function normalizeTimeList(key: string) {
const raw = String(form[key] ?? '').trim();
if (!raw) { timeListErrors = { ...timeListErrors, [key]: '' }; return; }
@@ -74,8 +74,8 @@
}
/**
* Quiet-hours preview: "22:00 07:00 next day (9h)" or "Quiet period is 0
* minutes adjust times" when start equals end. Handles overnight ranges
* Quiet-hours preview: "22:00 в†’ 07:00 next day (9h)" or "Quiet period is 0
* minutes — adjust times" when start equals end. Handles overnight ranges
* (start > end) correctly.
*/
function quietHoursPreview(start: string, end: string): string {
@@ -92,8 +92,8 @@
const m = span % 60;
const dur = m === 0 ? `${h}h` : `${h}h ${m}m`;
const arrow = overnight
? `${start} ${end} ${t('trackingConfig.nextDay')}`
: `${start} ${end}`;
? `${start} в†’ ${end} ${t('trackingConfig.nextDay')}`
: `${start} в†’ ${end}`;
return `${arrow} (${dur})`;
}
@@ -112,12 +112,12 @@
/**
* Inline preview of the shipped default template for a scheduled/periodic/
* memory slot. Using the shipped default (not a tracker's current template)
* keeps this scoped to the tracking-config page which has no concept of
* keeps this scoped to the tracking-config page — which has no concept of
* which TemplateConfig a given tracker uses. Users who want to edit the
* actual config can click "Edit template" in the modal footer.
*
* ``previewLocale`` is modal-scoped so switching tabs only refetches for
* this preview the user's UI locale (and other previews) are untouched.
* this preview — the user's UI locale (and other previews) are untouched.
*/
let previewModal = $state<{ slotName: string; rendered: string; error: string; locale: string } | null>(null);
let previewLoading = $state(false);
@@ -158,8 +158,8 @@
error: res?.error || '',
locale,
};
} catch (err: any) {
previewModal = { slotName, rendered: '', error: err.message, locale };
} catch (err: unknown) {
previewModal = { slotName, rendered: '', error: errMsg(err), locale };
} finally {
previewLoading = false;
}
@@ -217,7 +217,7 @@
});
async function load() {
try { await trackingConfigsCache.fetch(true); }
catch (err: any) { error = err.message || t('common.loadError'); snackError(error); }
catch (err: unknown) { error = errMsg(err, t('common.loadError')); snackError(error); }
finally { loaded = true; highlightFromUrl(); _openEditFromUrl(); }
}
@@ -262,7 +262,7 @@
if (config.quiet_hours_start && config.quiet_hours_end) {
tiles.push({
icon: 'mdiWeatherNight',
label: `${config.quiet_hours_start}${config.quiet_hours_end}`,
label: `${config.quiet_hours_start}–${config.quiet_hours_end}`,
hint: t('trackingConfig.quietHoursStart'),
tone: 'citrus',
mono: true,
@@ -285,7 +285,7 @@
else await api('/tracking-configs', { method: 'POST', body: JSON.stringify(form) });
showForm = false; editing = null; await load();
snackSuccess(t('snack.trackingConfigSaved'));
} catch (err: any) { error = err.message; snackError(err.message); }
} catch (err: unknown) { const __m = errMsg(err); error = __m; snackError(__m); }
}
let blockedBy = $state<BlockedByDetail | null>(null);
@@ -294,10 +294,10 @@
id,
onconfirm: async () => {
try { await api(`/tracking-configs/${id}`, { method: 'DELETE' }); await load(); snackSuccess(t('snack.trackingConfigDeleted')); }
catch (err: any) {
catch (err: unknown) {
const bb = getBlockedBy(err);
if (bb) { blockedBy = bb; return; }
error = err.message; snackError(err.message);
const m = errMsg(err); error = m; snackError(m);
}
finally { confirmDelete = null; }
}
@@ -344,7 +344,7 @@
{/if}
</div>
<!-- Event tracking driven by descriptor -->
<!-- Event tracking — driven by descriptor -->
{#if descriptor}
<fieldset class="border border-[var(--color-border)] rounded-md p-3">
<legend class="text-sm font-medium px-1">{t('trackingConfig.eventTracking')}</legend>
@@ -377,7 +377,7 @@
{/if}
</fieldset>
<!-- Feature sections (periodic, scheduled, memory) driven by descriptor -->
<!-- Feature sections (periodic, scheduled, memory) — driven by descriptor -->
{#each descriptor.featureSections ?? [] as section (section.key)}
<fieldset class="border border-[var(--color-border)] rounded-md p-3">
<legend class="text-sm font-medium px-1">
@@ -494,9 +494,9 @@
</div>
<p class="text-sm text-[var(--color-muted-foreground)] list-row__secondary">
{(desc?.eventFields ?? []).filter(f => (config as Record<string, any>)[f.key]).map(f => t(f.label)).join(', ')}
{config.periodic_enabled ? ` · ${t('trackingConfig.periodic')}` : ''}
{config.scheduled_enabled ? ` · ${t('trackingConfig.scheduled')}` : ''}
{config.memory_enabled ? ` · ${t('trackingConfig.memory')}` : ''}
{config.periodic_enabled ? ` Р’В· ${t('trackingConfig.periodic')}` : ''}
{config.scheduled_enabled ? ` Р’В· ${t('trackingConfig.scheduled')}` : ''}
{config.memory_enabled ? ` Р’В· ${t('trackingConfig.memory')}` : ''}
</p>
</div>
<MetaStrip tiles={trackingConfigTiles(config)} />
@@ -518,7 +518,7 @@
<BlockedByModal open={!!blockedBy} detail={blockedBy} onclose={() => blockedBy = null} />
<Modal open={previewModal !== null}
title={previewModal ? `${t('trackingConfig.previewTemplate')} ${previewModal.slotName}` : ''}
title={previewModal ? `${t('trackingConfig.previewTemplate')} — ${previewModal.slotName}` : ''}
onclose={() => previewModal = null}>
{#if previewModal}
{#if previewLocales.length > 1}
@@ -537,7 +537,7 @@
{t('trackingConfig.previewSampleNote')}
</p>
<!-- Keep the prior rendered/error box mounted while refetching on locale
switch just dim it. Unmounting and replacing with a small ""
switch — just dim it. Unmounting and replacing with a small "…"
placeholder caused a one-frame layout jump as the modal shrank and
then re-expanded. -->
<div class="relative mb-3" style="opacity: {previewLoading ? 0.5 : 1}; transition: opacity 0.15s ease;">
@@ -550,7 +550,7 @@
<pre class="whitespace-pre-wrap text-xs">{@html sanitizePreview(previewModal.rendered)}</pre>
</div>
{:else}
<div class="p-3 text-xs" style="color: var(--color-muted-foreground);"></div>
<div class="p-3 text-xs" style="color: var(--color-muted-foreground);">…</div>
{/if}
</div>
<div class="flex gap-2 justify-end mt-3">