feat: Home Assistant provider — WebSocket subscription + bot commands

Adds Home Assistant as a service provider with two coordinated surfaces,
notifications (push subscription) and bot commands, plus frontend wiring
for both.

Notifications (subscription):
- Long-lived WebSocket client (aiohttp ws_connect) with auth handshake,
  exponential-backoff reconnect, a bounded event queue, and area-registry
  enrichment cached per (re)connect
- ServiceProvider ABC gains an optional `subscribe()` method for push-style
  providers; HomeAssistantServiceProvider uses it via a per-provider
  supervisor task started in the FastAPI lifespan
- 4 event types (state_changed, automation_triggered, call_service,
  event_fired), 4 default Jinja templates (en + ru), and HA-specific tracker
  filters (entity_glob, domain_allowlist, exact entity ids)
- Extracted the shared dispatch pipeline (api/webhooks.py →
  services/event_dispatch.py) so subscription and webhook ingest share the
  same event_log + deferred-dispatch + quiet-hours code path

Bot commands:
- /status, /entities [glob], /state <entity_id>, /areas
- Multi-command WS session so /status and /areas cost one handshake
- Sensitive-attribute blocklist (camera access_token, entity_picture, etc.)
  and a 30-attribute cap keep /state output safe and within Telegram's
  message size limit
- Error-message redaction strips URL userinfo before errors reach the chat

Frontend:
- HA descriptor with a new toggle ConfigField type and a new tag-input
  filter mode for free-text glob/domain lists (TagInput component)
- 15 command slots + 4 notification slots wired into the existing
  template-config UI
2026-05-13 14:31:56 +03:00
parent 90f958bdc6
commit 22127e2a59
79 changed files with 4042 additions and 210 deletions
@@ -0,0 +1,177 @@
+# Feature Backlog
+
+Curated feature ideas, narrowed from a brainstorming pass on 2026-05-13.
+Order is **rough sequencing preference**, not strict priority — adjust as we go.
+
+---
+
+## 1. Quiet Hours — close the gaps in the existing system
+
+**Reality check (verified 2026-05-13).** Quiet hours are already shipped under
+the "deferred dispatch" name in v0.8.0. The pipeline lives at
+`packages/server/src/notify_bridge_server/services/deferred_dispatch.py` with
+helpers in `dispatch_helpers.py` and tests in
+`tests/test_deferred_dispatch.py`. What exists:
+
+- Per-tracking-config window: `tracking_config.quiet_hours_enabled`,
+  `quiet_hours_start`, `quiet_hours_end`.
+- Per-link override: `notification_tracker_target.quiet_hours_start`,
+  `quiet_hours_end`.
+- Smart coalescing: an asset add and a matching remove within the same window
+  cancel each other out, and repeated adds are merged by set union.
+- Post-window drain via APScheduler one-shot date jobs.
+- Wall-clock event types (`scheduled_message`) drop instead of deferring.
+- Frontend status surface: `deferred`, `deferred_then_dropped`,
+  `deferred_then_failed`, with `deferred_until` and `deferred_for_seconds`
+  fields exposed in the event log.
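
A minimal sketch of the coalescing rule, assuming pending changes for one album are tracked as two sets of asset ids. `PendingAssetChanges` is hypothetical and is not the code in `deferred_dispatch.py`; it only shows why add/remove pairs vanish and repeated adds collapse.

```python
class PendingAssetChanges:
    """Hypothetical per-window buffer of asset ids added to / removed from one album."""

    def __init__(self) -> None:
        self.added: set[str] = set()
        self.removed: set[str] = set()

    def add(self, asset_ids: set[str]) -> None:
        for asset_id in asset_ids:
            if asset_id in self.removed:
                # Removed earlier in the window, now re-added: net change is zero.
                self.removed.discard(asset_id)
            else:
                # Repeated adds collapse via set union.
                self.added.add(asset_id)

    def remove(self, asset_ids: set[str]) -> None:
        for asset_id in asset_ids:
            if asset_id in self.added:
                # Added earlier in the window, now removed: the two cancel out.
                self.added.discard(asset_id)
            else:
                self.removed.add(asset_id)

    def is_empty(self) -> bool:
        # An empty buffer at drain time means nothing needs to be sent.
        return not (self.added or self.removed)
```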
+
+**What's NOT there (the actual gaps):**
+
+| Gap | Sketch |
+| --- | --- |
+| **Target-level windows** | Today, hours bind to the *watcher* (tracking config / link). Users naturally think of DND at the *destination* ("don't ping my phone at night, regardless of source"). New column on `notification_target` + dispatcher gate. |
+| **Multiple windows per row** | Today is a single HH:MM range. Real schedules want weekday-evening + weekend-all-day. JSON list of windows. |
+| **Days-of-week** | Same window every day. Need `days: ["mon", "tue", ...]` filter per window. |
+| **Per-window timezone** | Uses the global app TZ. Multi-traveller / multi-target setups want per-window TZ. |
+| **Silent mode** | Modes today are defer-or-drop. Telegram `disable_notification=true` ("send but don't ring") is a third useful mode. |
+| **Per-receiver windows** | One bot → multiple chats, each potentially with its own DND. Today it's all-or-nothing per target. |
+
+**Recommended cut for v1 of "extend quiet hours":**
+
+- Add target-level quiet hours (new column `notification_target.quiet_hours_json`
+  = list of `{days, start, end, mode, tz}`).
+- Modes: `drop`, `defer`, `silent`. `defer` reuses the existing
+  deferred-dispatch pipeline (just changes who decides). `silent` maps to
+  `disable_notification=true` for Telegram; other targets fall through to
+  normal send (or we treat `silent` as `defer` for non-Telegram targets — TBD).
+- Dispatcher precedence: target window wins over link/tracking-config window
+  when both are configured. Document this explicitly.
+- Frontend: new "Quiet hours" fieldset in the target editor (Aurora cassette
+  style). Reuses Timezone picker; new day-picker chip row.
+- Skip days-of-week + multi-window in v1 if scope grows — ship the target-level
+  cut first, then iterate.
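
A sketch of how one window from the proposed `quiet_hours_json` list might be evaluated, assuming `HH:MM` strings, lowercase three-letter day names, and an optional IANA `tz` that falls back to the app timezone. `QuietWindow` and `window_covers` are hypothetical names. The tricky case is a window that crosses midnight; here its after-midnight part counts toward the day the window started.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, time, timezone, tzinfo
from zoneinfo import ZoneInfo

DAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")  # datetime.weekday() order


@dataclass(frozen=True)
class QuietWindow:
    days: tuple[str, ...]   # empty tuple = every day
    start: str              # "HH:MM", local to tz
    end: str                # "HH:MM"; end < start means the window crosses midnight
    mode: str               # "drop" | "defer" | "silent"
    tz: str | None = None   # IANA name; None = fall back to the app timezone


def window_covers(window: QuietWindow, now: datetime, app_tz: tzinfo = timezone.utc) -> bool:
    """Return True if the timezone-aware datetime `now` falls inside `window`."""
    tz = ZoneInfo(window.tz) if window.tz else app_tz
    local = now.astimezone(tz)
    t = local.time()
    start, end = time.fromisoformat(window.start), time.fromisoformat(window.end)
    days = window.days or DAYS
    today = DAYS[local.weekday()]
    yesterday = DAYS[(local.weekday() - 1) % 7]
    if start <= end:
        # Same-day window; start == end is treated as empty.
        return today in days and start <= t < end
    # Overnight window: the evening part belongs to today, the early-morning
    # part to the previous day (a Monday 22:00-07:00 window covers Tuesday 03:00).
    return (t >= start and today in days) or (t < end and yesterday in days)
```

Resolving a named `tz` needs the system tz database (or the `tzdata` package) for `zoneinfo`.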
+
+**Open questions.**
+
+- How exactly do target / link / tracking-config windows combine? Proposal:
+  any window covering "now" wins (drop > defer > silent precedence).
+- Should `silent` for non-Telegram targets degrade to normal send or to
+  defer? Defer is the safer default.
+- Does the event log need a new status (`silenced` / `dropped_by_target_qh`)
+  to make precedence visible?
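
The precedence proposed in the first open question (any covering window wins, drop > defer > silent) reduces to picking the most restrictive active mode. A sketch with hypothetical names, which also applies the "degrade `silent` to `defer`" option for targets that lack a silent-send flag:

```python
from __future__ import annotations

from typing import Iterable, Optional

# Proposed precedence: the most restrictive active mode wins.
MODE_RANK = {"silent": 1, "defer": 2, "drop": 3}


def effective_mode(active_modes: Iterable[Optional[str]], target_supports_silent: bool) -> Optional[str]:
    """Combine the modes of every window covering "now" (target, link, tracking config).

    `None` entries stand for layers with no active window. A None result
    means no window is active, so the event is sent normally.
    """
    modes = [m for m in active_modes if m is not None]
    mode = max(modes, key=MODE_RANK.__getitem__, default=None)
    if mode == "silent" and not target_supports_silent:
        # "Defer is the safer default" for targets without a silent-send flag.
        return "defer"
    return mode
```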
+
+---
+
+## 2. Immich Smart Actions (expand beyond Auto-Organize)
+
+**What.** Extend the existing Smart Actions pattern (currently:
+**Immich Auto-Organize**) with more rule-driven actions against the Immich API.
+
+**Why.** Auto-Organize already proves the descriptor → rule editor → executor
+pipeline. Adding actions is mostly authoring new executors + small UI rule
+shapes, not new infra.
+
+**Candidates (pick in this order).**
+
+1. **Auto-favorite by person** — when an asset is detected to contain person
+   X (or any person from a set), mark it as a favorite.
+2. **Auto-archive by age / album** — assets older than N days in a given
+   album get archived. Pair with a "dry-run shows count" UX like
+   Auto-Organize already has.
+3. **Duplicate cluster nudge** — periodically run Immich's duplicate API and
+   send a digest notification with inline buttons ("review", "ignore for 30d").
+   Depends on inline-button support, which is not yet tracked in this backlog.
+4. **Share-link rotation** — for an album, regenerate the share link every N
+   days; notify with the new URL.
+5. **Pending-delete review** — push a weekly digest of trash contents before
+   Immich's auto-purge fires.
+
+**Shape.**
+
+- Reuse the existing **action descriptor** layer
+  (`packages/core/src/notify_bridge_core/providers/actions.py`,
+  `action_executor.py`) and the frontend rule editor used by Auto-Organize.
+- Each new action = (a) executor in core, (b) rule schema in the descriptor,
+  (c) frontend descriptor extension for the rule editor fields.
+- Persist as `provider_actions` rows (already exists for Auto-Organize) with
+  a discriminator + JSON config.
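
A sketch of the discriminator-plus-JSON-config shape, using two of the candidate actions. Everything here is hypothetical: `Asset` is a simplified stand-in for Immich asset data, the config keys are invented, and no Immich API is called. The executors only select asset ids, which is also what a dry-run count needs.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass(frozen=True)
class Asset:
    id: str
    album_ids: frozenset[str]
    person_ids: frozenset[str]
    taken_at: datetime


@dataclass
class ActionResult:
    kind: str
    asset_ids: list[str]
    dry_run: bool


def auto_favorite_by_person(config: dict, assets: list[Asset], now: datetime) -> list[str]:
    wanted = set(config["person_ids"])
    return [a.id for a in assets if a.person_ids & wanted]


def auto_archive_by_age(config: dict, assets: list[Asset], now: datetime) -> list[str]:
    cutoff = now - timedelta(days=config["older_than_days"])
    return [a.id for a in assets if config["album_id"] in a.album_ids and a.taken_at < cutoff]


# Discriminator value stored on the provider_actions row -> executor.
EXECUTORS: dict[str, Callable[[dict, list[Asset], datetime], list[str]]] = {
    "auto_favorite_by_person": auto_favorite_by_person,
    "auto_archive_by_age": auto_archive_by_age,
}


def run_action(kind: str, config_json: str, assets: list[Asset], *,
               dry_run: bool = True, now: datetime | None = None) -> ActionResult:
    executor = EXECUTORS[kind]  # unknown discriminators raise KeyError
    selected = executor(json.loads(config_json), assets, now or datetime.now(timezone.utc))
    # A real executor would call the Immich API here when dry_run is False;
    # this sketch only reports which assets would be touched.
    return ActionResult(kind, selected, dry_run)
```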
+
+**Open questions.**
+
+- Does "auto-favorite by person" need a confirmation queue or run silently?
+  Default to silent + event_log entry.
+- How do we surface "this action moved/changed X assets" in the dashboard?
+  Probably a per-action stat tile on the provider detail page.
+
+---
+
+## 3. Home Assistant Provider
+
+**Full plan:** [feature-home-assistant.md](feature-home-assistant.md).
+
+**One-line summary.** New WebSocket-based service provider with a 3-phase
+ship: subscribe + dispatch (Phase 1), bot commands (Phase 2), HA service
+calls as Smart Actions (Phase 3). Chosen over webhook ingest because
+Phases 2 + 3 force a long-lived API connection anyway; consolidating on WS
+avoids a refactor.
+
+**Status:** planned, not started.
+
+---
+
+## 4. Block-Based Template Builder
+
+**What.** A visual, drag-and-drop builder for notification and command
+templates that compiles down to Jinja2. Lives alongside (not instead of) the
+current `JinjaEditor`. Author can flip between views.
+
+**Why.** The current Jinja editor is powerful but unforgiving. A block UI
+lowers the floor for new users and provides a discovery surface for the
+variables documented in `template_configs.py`.
+
+**Shape.**
+
+- Frontend-only feature for v1 — compiles to the same Jinja strings the
+  backend already accepts.
+- Blocks: `Text`, `Variable`, `If`, `For`, `Link`, `Image`, `Icon`, `Caption`,
+  `Group` (HTML span/group). Each block knows its serialized Jinja
+  representation.
+- Round-trip: variables, simple `{% if %}` / `{% for %}` blocks, and string
+  literals parse back to blocks; arbitrary Jinja stays in a "Raw" block that
+  the user can edit as text.
+- Variable picker reads `get_template_variables(provider_type, slot)`. This is
+  the same data already shown in the template-help panel.
+- Preview pane unchanged — reuses `services/sample_context.py` server
+  rendering.
+- Toggle in the template editor: **Visual / Code**.
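
A sketch of the block-to-Jinja compile step for a subset of the block types above. It is written in Python for brevity even though the builder is frontend-only (the real version would be TypeScript); the class names and the `{% raw %}` escaping rule for literal text are assumptions, not a spec.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Text:
    value: str


@dataclass
class Variable:
    name: str


@dataclass
class If:
    condition: str
    body: list[Block]


@dataclass
class For:
    item: str
    iterable: str
    body: list[Block]


@dataclass
class Raw:
    source: str  # unparsed Jinja, emitted verbatim


Block = Union[Text, Variable, If, For, Raw]


def compile_blocks(blocks: list[Block]) -> str:
    return "".join(compile_block(b) for b in blocks)


def compile_block(block: Block) -> str:
    if isinstance(block, Text):
        # Literal text containing Jinja delimiters is wrapped in {% raw %}
        # so it renders exactly as typed.
        if any(d in block.value for d in ("{{", "{%", "{#")):
            return "{% raw %}" + block.value + "{% endraw %}"
        return block.value
    if isinstance(block, Variable):
        return "{{ " + block.name + " }}"
    if isinstance(block, If):
        return "{% if " + block.condition + " %}" + compile_blocks(block.body) + "{% endif %}"
    if isinstance(block, For):
        return ("{% for " + block.item + " in " + block.iterable + " %}"
                + compile_blocks(block.body) + "{% endfor %}")
    if isinstance(block, Raw):
        return block.source
    raise TypeError(f"unknown block: {block!r}")
```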
+
+**Open questions.**
+
+- Round-tripping arbitrary Jinja is hard. v1: parseable subset → blocks,
+  anything else → single Raw block. Show a banner explaining.
+- Locale handling: same compiled Jinja, just authored per locale tab.
+- Do we want a marketplace of pre-built block compositions? Out of scope for
+  v1 — bundle import/export is a separate backlog item.
+
+---
+
+## Recommended Sequencing
+
+1. **Quiet Hours per Target** — small, isolated, immediate user value.
+2. **Immich Smart Actions** — incremental on existing pattern; ship one
+   action at a time (start with auto-favorite by person).
+3. **Home Assistant Provider** — multi-file, follows new-provider checklist;
+   biggest user-base expansion.
+4. **Block-Based Template Builder** — largest frontend lift; benefits from
+   the variable-doc work that the other features will exercise.
+
+Dependencies are loose — 1 and 2 are independent of 3 and 4. The block
+builder pairs nicely with Home Assistant because HA's rich context surfaces
+the value of an easier authoring UX.
+
+---
+
+## Decision log
+
+- **2026-05-13** — Backlog seeded with these four items selected from a
+  broader brainstorm. Not started.
@@ -0,0 +1,284 @@
+# Home Assistant Provider — Implementation Plan
+
+> Status: **planned, not started**. Sequencing: third item on the backlog
+> (see [feature-backlog.md](feature-backlog.md)).
+> Last updated: 2026-05-13.
+
+## Decision: WebSocket subscription, not webhook
+
+We considered three ingest modes (webhook automation, WebSocket subscription,
+hybrid). The WebSocket route is chosen as the architectural foundation because
+the medium-term roadmap forces it anyway:
+
+| Phase | Capability | Needs API access? |
+| --- | --- | --- |
+| 1 | Subscribe to events, emit notifications | Read (event stream) |
+| 2 | Bot commands (`/state`, `/entities`, `/areas`) | Read (REST or WS get_states) |
+| 3 | Smart Actions (`light.turn_on`, scene activation) | Write (call_service) |
+
+A webhook-only Phase 1 would still need a REST client by Phase 2 and a write
+path by Phase 3 — net result is two client implementations + one event
+pipeline refactor. WebSocket consolidates all three phases on one connection.
+
+**Tradeoff (be honest):** WebSocket introduces a long-lived-connection pattern
+this codebase does not have yet. Reconnect logic, missed-events-on-restart
+gap, and a new shape on the `ServiceProvider` ABC are real costs. Phase 1 is
+**not** shippable in one short session — plan for a multi-session build.
+
+## Provider abstraction extension
+
+The current `ServiceProvider` ABC
+([packages/core/src/notify_bridge_core/providers/base.py](../../packages/core/src/notify_bridge_core/providers/base.py))
+is poll-oriented: every provider implements `poll(collection_ids, state) →
+(events, new_state)`. Webhook providers (Gitea, Planka, Webhook) satisfy this
+by no-op'ing `poll` and shoving events in via `api/webhooks.py` instead.
+
+Home Assistant fits neither cleanly. The plan:
+
+1. Add an **optional** `async subscribe(emit) → None` method on the base ABC.
+   Default implementation raises `NotImplementedError`. Polling providers do
+   not override it. The scheduler / lifecycle layer (currently `services/watcher.py`)
+   gains a "subscription manager" branch that, for any provider whose class
+   overrides `subscribe`, starts a long-lived task instead of registering
+   a polling job.
+2. `emit` is a callback `(event: ServiceEvent) → None` provided by the
+   subscription manager — it routes events to the dispatcher exactly like the
+   webhook handler does today. Keeping the dispatch path unchanged is the
+   point of this design.
+3. Reconnect lives **inside** `subscribe`: the method is expected to be a
+   `while not cancelled: try connect; on drop, sleep with backoff, retry`
+   loop. The manager cancels the task on shutdown via the cooperative cancel
+   token used elsewhere.
+
+This is a small, additive change to one ABC. No existing provider is
+modified.
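
A minimal sketch of the ABC change and the manager's override check, assuming an asyncio-based lifecycle. `ServiceEvent` is reduced to two fields, `start_subscription` is a hypothetical name, and `Task.cancel()` stands in for the cooperative cancel token mentioned above.

```python
from __future__ import annotations

import abc
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ServiceEvent:
    event_type: str
    data: dict[str, Any] = field(default_factory=dict)


Emit = Callable[[ServiceEvent], None]


class ServiceProvider(abc.ABC):
    @abc.abstractmethod
    async def poll(self, collection_ids: list[str], state: dict) -> tuple[list[ServiceEvent], dict]:
        """Existing poll-oriented contract."""

    async def subscribe(self, emit: Emit) -> None:
        """Optional push contract: long-lived, owns its own reconnect loop."""
        raise NotImplementedError


def overrides_subscribe(provider: ServiceProvider) -> bool:
    # Compare the class attribute, not the bound method, so subclasses that
    # inherit the default are treated as polling providers.
    return type(provider).subscribe is not ServiceProvider.subscribe


def start_subscription(provider: ServiceProvider, emit: Emit) -> asyncio.Task | None:
    """Must be called from a running event loop (e.g. the app lifespan)."""
    if not overrides_subscribe(provider):
        return None  # polling providers keep their scheduler job
    return asyncio.create_task(provider.subscribe(emit))
```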
+
+## Phase 1 — Subscribe + Dispatch (MVP)
+
+### Scope
+
+- Long-lived WebSocket connection to HA, authenticated with a long-lived
+  access token.
+- Subscribe to the event bus with optional `event_type` filter (defaults to
+  `state_changed`).
+- Translate HA events into `ServiceEvent` and dispatch via the existing
+  pipeline. Notifications go out exactly as they do today for any other
+  provider.
+- Filter UI: entity-id glob list, domain allowlist (e.g. `light.*`,
+  `binary_sensor.*`), event-type allowlist. **Hard-required** to avoid the HA
+  firehose drowning the bridge.
+- Connection test + entity listing via WS `get_states` (no REST client yet —
+  WS gives us both subscribe and read).
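
A sketch of the tracker filter check, assuming the union semantics proposed under risk 5 below and bare domain names (`light`, not `light.*`) in the domain allowlist. `entity_passes` is hypothetical; `fnmatchcase` keeps glob matching case-sensitive, which suits HA's lowercase entity ids.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def entity_passes(
    entity_id: str,
    *,
    entity_ids: Iterable[str] = (),
    globs: Iterable[str] = (),
    domains: Iterable[str] = (),
) -> bool:
    """Union semantics: the entity passes if ANY configured rule matches.

    With no rules configured nothing passes, because filters are required
    to keep the HA firehose out of the bridge.
    """
    domain = entity_id.split(".", 1)[0]
    return (
        entity_id in set(entity_ids)
        or any(fnmatchcase(entity_id, pattern) for pattern in globs)
        or domain in set(domains)
    )
```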
+
+### Out of scope for Phase 1
+
+- Bot commands → Phase 2.
+- Service calls → Phase 3.
+- Replay of events missed during disconnect (HA does not support this; we
+  document the gap and surface "reconnected after N seconds" in the event
+  log).
+- Webhook-style ingestion (path-embedded token webhook receiver). If a user
+  prefers webhooks, we add it later as a second ingestion mode on the same
+  provider — out of scope for v1.
+
+### Event types (v1)
+
+| HA event | ServiceEvent type | Notification slot |
+| --- | --- | --- |
+| `state_changed` | `ha_state_changed` | `message_state_changed` |
+| `automation_triggered` | `ha_automation_triggered` | `message_automation_triggered` |
+| `call_service` | `ha_service_called` | `message_service_called` |
+| (custom event types) | `ha_event_fired` | `message_event_fired` |
+
+Default tracking config enables `state_changed` only — the others are loud
+and opt-in.
+
+### Context variables exposed to templates
+
+Pulled directly from HA's `state_changed` payload, normalized:
+
+- `entity_id` — `light.kitchen`
+- `friendly_name` — `attributes.friendly_name` or fallback to `entity_id`
+- `domain` — derived from `entity_id` before the dot
+- `old_state` — `data.old_state.state` (`None` when the entity was just created)
+- `new_state` — `data.new_state.state` (`None` when the entity was removed)
+- `attributes` — dict of new-state attributes (raw)
+- `device_class` — `attributes.device_class` if present
+- `area` — area name, if one is assigned (best effort). HA does not expose
+  the area in state attributes; it has to be resolved through the entity,
+  device, and area registries, which costs extra WS calls (see "Open questions")
+- `last_changed`, `last_updated` — ISO timestamps
+- For non-`state_changed` events: `event_type`, `event_data` (full dict)
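
A sketch of the normalization step, assuming the WS `state_changed` payload shape (`data.entity_id`, `data.old_state`, `data.new_state`) and registry list entries keyed `entity_id`/`area_id`/`device_id` (entities), `id`/`area_id` (devices), and `area_id`/`name` (areas). Both function names are hypothetical; check the field names against recorded HA fixtures before relying on them.

```python
from __future__ import annotations

from typing import Any


def build_area_lookup(entity_registry: list[dict], device_registry: list[dict],
                      area_registry: list[dict]) -> dict[str, str]:
    """Map entity_id -> area name; entities without their own area inherit the device's."""
    area_names = {a["area_id"]: a["name"] for a in area_registry}
    device_area = {d["id"]: d.get("area_id") for d in device_registry}
    lookup: dict[str, str] = {}
    for entry in entity_registry:
        area_id = entry.get("area_id") or device_area.get(entry.get("device_id"))
        if area_id in area_names:
            lookup[entry["entity_id"]] = area_names[area_id]
    return lookup


def build_context(event: dict[str, Any], area_by_entity: dict[str, str] | None = None) -> dict[str, Any]:
    """Normalize one HA bus event into template variables."""
    data = event.get("data") or {}
    if event.get("event_type") != "state_changed":
        return {"event_type": event.get("event_type"), "event_data": data}
    entity_id = data["entity_id"]
    old = data.get("old_state") or {}  # None when the entity was just created
    new = data.get("new_state") or {}  # None when the entity was removed
    attrs = new.get("attributes") or {}
    return {
        "entity_id": entity_id,
        "friendly_name": attrs.get("friendly_name") or entity_id,
        "domain": entity_id.split(".", 1)[0],
        "old_state": old.get("state"),
        "new_state": new.get("state"),
        "attributes": attrs,
        "device_class": attrs.get("device_class"),
        "area": (area_by_entity or {}).get(entity_id),
        "last_changed": new.get("last_changed"),
        "last_updated": new.get("last_updated"),
    }
```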
+
+### File touch map (Phase 1)
+
+**Core** (`packages/core/src/notify_bridge_core/`)
+
+| Path | Action | Notes |
+| --- | --- | --- |
+| `providers/base.py` | Modify | Add optional `subscribe(emit)` ABC method (default `NotImplementedError`); add `HOME_ASSISTANT = "home_assistant"` to `ServiceProviderType` |
+| `providers/capabilities.py` | Modify | Add `HOME_ASSISTANT_CAPABILITIES` + register |
+| `providers/home_assistant/__init__.py` | Create | Export + register template variables |
+| `providers/home_assistant/client.py` | Create | WebSocket client (auth, subscribe, get_states, call_service stub) |
+| `providers/home_assistant/event_parser.py` | Create | HA event dict → `ServiceEvent` |
+| `providers/home_assistant/provider.py` | Create | Class with `connect`, `disconnect`, `subscribe`, `list_collections` (entity list), `get_available_variables`, `get_provider_config_schema`, `test_connection`. `poll` raises NotImplementedError. |
+| `templates/defaults/en/home_assistant_*.jinja2` | Create | 4 slot templates |
+| `templates/defaults/ru/home_assistant_*.jinja2` | Create | 4 slot templates |
+| `templates/defaults/loader.py` | Modify | Add to `PROVIDER_SLOT_FILE_MAP` |
+| `templates/command_defaults/loader.py` | Modify | Stub entry — empty commands list for now |
+| `templates/context.py` | Modify | HA context builder |
+| `templates/validator.py` | Modify | Whitelist HA variable names |
+
+**Server** (`packages/server/src/notify_bridge_server/`)
+
+| Path | Action | Notes |
+| --- | --- | --- |
+| `services/watcher.py` *(or scheduler / lifecycle module that hosts polling)* | Modify | Add subscription-manager branch — for providers whose class overrides `subscribe`, start/stop long-running task instead of polling |
+| `services/scheduler.py` | Verify | Confirm we cancel HA subscription on shutdown (graceful_shutdown_seconds path) |
+| `api/template_configs.py` | Modify | `get_template_variables()` entry |
+| `api/command_template_configs.py` | Modify | Sample ctx (minimal for Phase 1 — no commands) |
+| `services/sample_context.py` | Modify | `_SAMPLE_CONTEXT["home_assistant"]` |
+| `database/seeds.py` | Modify | Seed notification templates + default tracking config |
+
+**Frontend** (`frontend/src/`)
+
+| Path | Action | Notes |
+| --- | --- | --- |
+| `lib/providers/home-assistant.ts` | Create | Descriptor per CLAUDE.md rule 11 |
+| `lib/providers/index.ts` | Modify | Register descriptor |
+| `lib/locales/en.json` | Modify | `providers.typeHomeAssistant`, `gridDesc.providerHomeAssistant` |
+| `lib/locales/ru.json` | Modify | Same |
+
+**Tests**
+
+| Path | Action |
+| --- | --- |
+| `packages/core/tests/providers/test_home_assistant_parser.py` | Create — HA payload → `ServiceEvent` |
+| `packages/core/tests/providers/test_home_assistant_client.py` | Create — WS auth, subscribe, reconnect (use a fake server) |
+| `packages/server/tests/test_home_assistant_subscription.py` | Create — subscription manager lifecycle, event flows through dispatcher |
+
+### Frontend descriptor essentials
+
+```text
+type: "home_assistant"
+defaultName: "Home Assistant"
+icon: "home" (consider Lucide icon; HA logo if a custom asset exists)
+hasUrl: true            // base URL of HA (used to derive WS URL)
+configFields:
+  - url:                http(s)://homeassistant.local:8123
+  - access_token:       long-lived access token (required)
+  - allowed_event_types: comma-separated, defaults to "state_changed"
+eventFields: 4 checkboxes (state_changed, automation_triggered,
+                           call_service, event_fired)
+extraTrackingFields:
+  - entity_glob: tag input ("light.*", "binary_sensor.*_motion")
+  - domain_allowlist: tag input
+collectionMeta: { label: "Entities", icon: "..." }
+webhookBased: false     // we are NOT webhook based
+```
+
+The WS URL is derived from the base URL: `https://host[:port]` becomes
+`wss://host[:port]/api/websocket`, and a plain-`http://` HA gets `ws://`
+instead. Document this in the UI hint.
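
A sketch of the derivation using only `urllib.parse`. `websocket_url` is a hypothetical helper; dropping any `user:password@` part is our own assumption, since the token travels in the auth message rather than in the URL.

```python
from urllib.parse import urlsplit, urlunsplit

_WS_SCHEMES = {"http": "ws", "https": "wss"}


def websocket_url(base_url: str) -> str:
    """Derive HA's WebSocket endpoint from the configured base URL."""
    parts = urlsplit(base_url)
    scheme = _WS_SCHEMES.get(parts.scheme)
    if scheme is None or not parts.netloc:
        # Report only the scheme so a URL with credentials is never echoed.
        raise ValueError(f"expected an http(s) base URL, got scheme {parts.scheme!r}")
    # Keep host and port, drop any user:password@ prefix.
    netloc = parts.netloc.rsplit("@", 1)[-1]
    return urlunsplit((scheme, netloc, "/api/websocket", "", ""))
```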
+
+### Auth model
+
+- **Long-lived access token** from HA (Profile → Long-Lived Access Tokens).
+- Stored encrypted at rest via the same path the other providers use for
+  secrets (the bridge already has a secret-encryption helper — verify the
+  exact module name during implementation).
+- WS auth handshake: connect → server sends `auth_required` → client sends
+  `{type: "auth", access_token: "..."}` → server replies `auth_ok` or
+  `auth_invalid`.
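
The handshake above, sketched against two injected callables so it can be tested without a live HA. `authenticate` and `AuthError` are hypothetical names; the message types follow the sequence listed above.

```python
from __future__ import annotations

import json
from typing import Awaitable, Callable


class AuthError(Exception):
    """HA rejected the token or sent an unexpected message during auth."""


async def authenticate(
    recv: Callable[[], Awaitable[str]],
    send: Callable[[str], Awaitable[None]],
    access_token: str,
) -> str:
    """Run HA's WS auth handshake; return the server's ha_version on success."""
    first = json.loads(await recv())
    if first.get("type") != "auth_required":
        raise AuthError(f"expected auth_required, got {first.get('type')!r}")
    await send(json.dumps({"type": "auth", "access_token": access_token}))
    reply = json.loads(await recv())
    if reply.get("type") == "auth_ok":
        return reply.get("ha_version", "")
    if reply.get("type") == "auth_invalid":
        # Surface HA's message, never the token itself.
        raise AuthError(reply.get("message", "auth_invalid"))
    raise AuthError(f"unexpected auth reply {reply.get('type')!r}")
```

With aiohttp, `ws.receive_str` and `ws.send_str` on the `ws_connect` response fit the `recv` / `send` shapes.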
+
+### Risks / open questions (Phase 1)
+
+1. **Reconnect strategy.** Exponential backoff capped at 60s, jittered.
+   On reconnect, log a `connection_restored_after` event so the UI can
+   surface the gap. Document that HA does not support event replay.
+2. **Area registry.** Resolving an entity's `area_id` takes more than the
+   area list: `config/entity_registry/list` (falling back to
+   `config/device_registry/list` for entities that inherit their device's
+   area), plus `config/area_registry/list` for area names. Decision needed:
+   fetch once on connect and cache, refetch on the `area_registry_updated`
+   event, or omit `area` from the context entirely in v1. Recommendation:
+   fetch on connect, refetch on `area_registry_updated`, skip if it fails
+   (best-effort).
+3. **TLS verification for self-signed HA.** Homelab users often have
+   self-signed certs. Need a `verify_tls: bool` config field (default true)
+   and a clear warning when disabled. Same pattern as
+   `NOTIFY_BRIDGE_ALLOW_PRIVATE_URLS` for the SSRF case.
+4. **Backpressure.** HA's `state_changed` can fire hundreds of events per
+   minute in a busy install. The subscription manager must drop or coalesce
+   if the dispatcher backlog grows beyond a threshold. Cheapest cut: a
+   bounded `asyncio.Queue` between WS receiver and dispatch — `put_nowait`
+   with overflow counter visible in the event log.
+5. **Entity filter precedence.** Tracking-config has `collection_ids`
+   (entity_id list) and we want `entity_glob` + `domain_allowlist`. Decision:
+   if both `collection_ids` and globs are set, union them (any match passes).
+   Documented prominently in the tracker UI.
+6. **Library choice.** `hass-client` is a Python WS client maintained by the
+   HA community; alternative is rolling our own with `websockets`. The
+   latter is ~150 LOC and has no external dependency surface. Recommendation:
+   roll our own. Re-evaluate if Phase 3 needs registry-aware service calls.
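
Risks 1 and 4 meet in the receiver loop. A sketch, assuming `connect()` is a hypothetical async generator that yields decoded events until the socket drops: capped exponential backoff with full jitter, a retry counter that resets once events flow again, and a bounded queue whose overflow is counted instead of blocking. The `connection_restored_after` log event is left out for brevity.

```python
from __future__ import annotations

import asyncio
import random
from typing import Any, AsyncIterator, Awaitable, Callable


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    # Clamp the exponent so 2**attempt never overflows the float conversion.
    return rng() * min(cap, base * 2 ** min(attempt, 16))


class EventBuffer:
    """Bounded hand-off between the WS receiver and dispatch."""

    def __init__(self, maxsize: int = 1000) -> None:
        self.queue: asyncio.Queue[Any] = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0  # overflow counter to surface in the event log

    def offer(self, event: Any) -> None:
        try:
            self.queue.put_nowait(event)
        except asyncio.QueueFull:
            self.dropped += 1


async def run_subscription(
    connect: Callable[[], AsyncIterator[Any]],
    buffer: EventBuffer,
    *,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    delay: Callable[[int], float] = backoff_delay,
) -> None:
    """Reconnect forever; only cancellation (CancelledError) ends the loop."""
    attempt = 0
    while True:
        try:
            async for event in connect():
                attempt = 0  # a received event means the connection is healthy
                buffer.offer(event)
        except Exception:
            # CancelledError derives from BaseException (Python 3.8+), so
            # cancellation is not swallowed here. A real client would log the error.
            pass
        await sleep(delay(attempt))
        attempt += 1
```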
+
+## Phase 2 — Bot Commands
+
+Adds Telegram bot commands for HA tracking configs.
+
+- `/status` — connection status, subscribed event count
+- `/entities <glob>` — list matching entities + current state
+- `/state <entity_id>` — full state + attributes for one entity
+- `/areas` — area registry summary
+- `/help`
+
+These use the existing WS connection (no new client) via WS commands
+`get_states`, `config/area_registry/list`. Template slots and command
+template configs follow the same pattern as Gitea/Planka — see
+[CLAUDE.md](../../CLAUDE.md) rule 7 / rule 11 for the full set of locations
+that must be updated.
+
+Out-of-scope for Phase 2: any command that mutates HA state.
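
A sketch of output safeguards for `/state`, assuming a `get_states` entry with `entity_id`, `state`, and `attributes`. Only two blocklist entries are shown, and `format_state` is a hypothetical name; 4096 characters is Telegram's limit for a message's text.

```python
from __future__ import annotations

from typing import Any

# Attributes never echoed to chat (illustrative subset of a longer blocklist).
SENSITIVE_ATTRIBUTES = frozenset({"access_token", "entity_picture"})
MAX_ATTRIBUTES = 30
TELEGRAM_TEXT_LIMIT = 4096


def format_state(state: dict[str, Any], limit: int = TELEGRAM_TEXT_LIMIT) -> str:
    attrs = sorted(
        ((k, v) for k, v in state.get("attributes", {}).items() if k not in SENSITIVE_ATTRIBUTES),
        key=lambda kv: kv[0],
    )
    lines = [f"{state['entity_id']}: {state.get('state')}"]
    lines += [f"  {k}: {v}" for k, v in attrs[:MAX_ATTRIBUTES]]
    if len(attrs) > MAX_ATTRIBUTES:
        lines.append(f"  ... and {len(attrs) - MAX_ATTRIBUTES} more attributes")
    text = "\n".join(lines)
    # Hard stop for oversized values (long lists, JSON blobs).
    return text if len(text) <= limit else text[: limit - 3] + "..."
```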
+
+## Phase 3 — Smart Actions (Service Calls)
+
+A new action descriptor in the existing Smart Actions framework
+([packages/core/src/notify_bridge_core/providers/actions.py](../../packages/core/src/notify_bridge_core/providers/actions.py)).
+
+- Action type: `ha_call_service`
+- Rule: trigger event → service call (e.g. "on motion event in
+  `binary_sensor.front_door` → call `light.turn_on` on `light.porch`")
+- Executor uses the existing WS connection to send `call_service`.
+
+This phase is gated behind explicit per-target authorization in the UI — HA
+service calls can do anything the access token allows, including unlocking
+doors. Default state: **disabled**, with a clear consent flow when enabling.
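
A sketch of the rule-to-message step for the porch-light example, with the authorization gate defaulting to off. `ServiceCallRule` and `build_call_service` are hypothetical; the message follows HA's WS `call_service` shape (`domain`, `service`, `target`), and the event dict uses the template context keys from Phase 1.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ServiceCallRule:
    trigger_entity: str   # e.g. "binary_sensor.front_door"
    trigger_state: str    # e.g. "on"
    domain: str           # e.g. "light"
    service: str          # e.g. "turn_on"
    target_entity: str    # e.g. "light.porch"


def build_call_service(rule: ServiceCallRule, event: dict[str, Any], *,
                       enabled: bool, message_id: int) -> dict[str, Any] | None:
    """Return a WS `call_service` message, or None if nothing should be sent."""
    if not enabled:
        # Default state: disabled until the user completes the consent flow.
        return None
    if event.get("entity_id") != rule.trigger_entity or event.get("new_state") != rule.trigger_state:
        return None
    return {
        "id": message_id,  # message ids must increase on each connection
        "type": "call_service",
        "domain": rule.domain,
        "service": rule.service,
        "target": {"entity_id": rule.target_entity},
    }
```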
+
+## Rough effort estimates
+
+These are rough — sub-task discovery during Phase 1 will refine them.
+
+| Phase | Estimate (focused work) |
+| --- | --- |
+| Phase 1 (subscribe + dispatch) | 2–3 sessions |
+| Phase 2 (bot commands) | 1 session |
+| Phase 3 (smart actions) | 1–2 sessions |
+
+## When to start
+
+Phase 1 work order, once you green-light it:
+
+1. ABC extension (`base.py`) + tests for the new `subscribe` shape on a fake
+   provider.
+2. WS client + parser + unit tests against recorded HA fixtures (no live HA
+   needed for these).
+3. Subscription manager in `services/watcher.py` — integration test with the
+   fake provider from step 1.
+4. Templates (en + ru), capabilities entry, validator whitelist.
+5. Server: seeds, sample context, template_configs entry.
+6. Frontend: descriptor, locale keys, i18n.
+7. End-to-end smoke test against a real HA instance (homelab).
+
+Per the project rule, restart the backend after **every** change in
+`packages/server/` or `packages/core/`.
+
+## Decision log
+
+- **2026-05-13** — Plan drafted. Ingest mode = WebSocket (chosen over
+  webhook for future-proofing toward Phases 2 + 3). Not started.