# Home Assistant Provider: Implementation Plan

> Status: **planned, not started**. Sequencing: third item on the backlog
> (see [feature-backlog.md](feature-backlog.md)).
> Last updated: 2026-05-13.

## Decision: WebSocket subscription, not webhook

We considered three ingest modes (webhook automation, WebSocket subscription, hybrid). The WebSocket route is chosen as the architectural foundation because the medium-term roadmap forces it anyway:

| Phase | Capability | Needs API access? |
| --- | --- | --- |
| 1 | Subscribe to events, emit notifications | Read (event stream) |
| 2 | Bot commands (`/state`, `/entities`, `/areas`) | Read (REST or WS get_states) |
| 3 | Smart Actions (`light.turn_on`, scene activation) | Write (call_service) |

A webhook-only Phase 1 would still need a REST client by Phase 2 and a write path by Phase 3; the net result is two client implementations + one event pipeline refactor. WebSocket consolidates all three phases on one connection.

**Tradeoff (be honest):** WebSocket introduces a long-lived-connection pattern this codebase does not have yet. Reconnect logic, the missed-events-on-restart gap, and a new shape on the `ServiceProvider` ABC are real costs. Phase 1 is **not** shippable in one short session; plan for a multi-session build.

## Provider abstraction extension

The current `ServiceProvider` ABC ([packages/core/src/notify_bridge_core/providers/base.py](../../packages/core/src/notify_bridge_core/providers/base.py)) is poll-oriented: every provider implements `poll(collection_ids, state) → (events, new_state)`. Webhook providers (Gitea, Planka, Webhook) satisfy this by no-op'ing `poll` and shoving events in via `api/webhooks.py` instead.

Home Assistant fits neither cleanly. The plan:

1. Add an **optional** `async subscribe(emit) → None` method on the base ABC. Default implementation raises `NotImplementedError`. Polling providers do not override it.
   The scheduler / lifecycle layer (currently `services/watcher.py`) gains a "subscription manager" branch that, for any provider whose class overrides `subscribe`, starts a long-lived task instead of registering a polling job.
2. `emit` is a callback `(event: ServiceEvent) → None` provided by the subscription manager; it routes events to the dispatcher exactly like the webhook handler does today. Keeping the dispatch path unchanged is the point of this design.
3. Reconnect lives **inside** `subscribe`: the method is expected to be a `while not cancelled: try connect; on drop, sleep with backoff, retry` loop. The manager cancels the task on shutdown via the cooperative cancel token used elsewhere.

This is a small, additive change to one ABC. No existing provider is modified.

## Phase 1: Subscribe + Dispatch (MVP)

### Scope

- Long-lived WebSocket connection to HA, authenticated with a long-lived access token.
- Subscribe to the event bus with optional `event_type` filter (defaults to `state_changed`).
- Translate HA events into `ServiceEvent` and dispatch via the existing pipeline. Notifications go out exactly as they do today for any other provider.
- Filter UI: entity-id glob list, domain allowlist (e.g. `light.*`, `binary_sensor.*`), event-type allowlist. **Hard-required** to avoid the HA firehose drowning the bridge.
- Connection test + entity listing via WS `get_states` (no REST client yet; WS gives us both subscribe and read).

### Out of scope for Phase 1

- Bot commands → Phase 2.
- Service calls → Phase 3.
- Replay of events missed during disconnect (HA does not support this; we document the gap and surface "reconnected after N seconds" in the event log).
- Webhook-style ingestion (path-embedded token webhook receiver). If a user prefers webhooks, we add it later as a second ingestion mode on the same provider; out of scope for v1.
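
The entity filter named in the scope list can be sketched as a pure function. This is a minimal sketch, not the planned implementation: `entity_passes` and its parameter names are hypothetical. It assumes the union precedence recorded under "Risks / open questions" (any match passes), extends that union to the domain allowlist, accepts allowlist entries written either as `light` or as `light.*`, and rejects everything when no filter is configured.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def entity_passes(
    entity_id: str,
    collection_ids: Iterable[str] = (),
    entity_globs: Iterable[str] = (),
    domain_allowlist: Iterable[str] = (),
) -> bool:
    """Return True if any configured filter matches the entity (union semantics)."""
    # Explicitly selected entities always pass.
    if entity_id in set(collection_ids):
        return True

    # Assumption: allowlist entries may be written "light" or "light.*".
    domain = entity_id.partition(".")[0]
    allowed_domains = {
        entry[:-2] if entry.endswith(".*") else entry for entry in domain_allowlist
    }
    if domain in allowed_domains:
        return True

    # Case-sensitive shell-style globs such as "binary_sensor.*_motion".
    # Assumption: with no filters configured, nothing passes.
    return any(fnmatchcase(entity_id, pattern) for pattern in entity_globs)


globs = ["binary_sensor.*_motion"]
print(entity_passes("binary_sensor.hall_motion", entity_globs=globs))  # True
print(entity_passes("binary_sensor.hall_door", entity_globs=globs))    # False
print(entity_passes("light.porch", domain_allowlist=["light.*"]))      # True
```

Keeping the check free of I/O means it can run before an event reaches the dispatcher and can be unit-tested without a live HA instance.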
### Event types (v1)

| HA event | ServiceEvent type | Notification slot |
| --- | --- | --- |
| `state_changed` | `ha_state_changed` | `message_state_changed` |
| `automation_triggered` | `ha_automation_triggered` | `message_automation_triggered` |
| `call_service` | `ha_service_called` | `message_service_called` |
| (custom event types) | `ha_event_fired` | `message_event_fired` |

Default tracking config enables `state_changed` only; the others are loud and opt-in.

### Context variables exposed to templates

Pulled directly from HA's `state_changed` payload, normalized:

- `entity_id`: e.g. `light.kitchen`
- `friendly_name`: `attributes.friendly_name`, falling back to `entity_id`
- `domain`: derived from `entity_id` (the part before the dot)
- `old_state`: `data.old_state.state`
- `new_state`: `data.new_state.state`
- `attributes`: dict of new-state attributes (raw)
- `device_class`: `attributes.device_class` if present
- `area`: `attributes.area_id` if present (best effort; only set if HA exposes it via the area registry, which costs a separate `config/area_registry/list` WS call; see "Risks / open questions")
- `last_changed`, `last_updated`: ISO timestamps
- For non-`state_changed` events: `event_type`, `event_data` (full dict)

### File touch map (Phase 1)

**Core** (`packages/core/src/notify_bridge_core/`)

| Path | Action | Notes |
| --- | --- | --- |
| `providers/base.py` | Modify | Add optional `subscribe(emit)` ABC method (default `NotImplementedError`); add `HOME_ASSISTANT = "home_assistant"` to `ServiceProviderType` |
| `providers/capabilities.py` | Modify | Add `HOME_ASSISTANT_CAPABILITIES` + register |
| `providers/home_assistant/__init__.py` | Create | Export + register template variables |
| `providers/home_assistant/client.py` | Create | WebSocket client (auth, subscribe, get_states, call_service stub) |
| `providers/home_assistant/event_parser.py` | Create | HA event dict → `ServiceEvent` |
| `providers/home_assistant/provider.py` | Create | Class with `connect`, `disconnect`, `subscribe`, `list_collections` (entity list), `get_available_variables`, `get_provider_config_schema`, `test_connection`. `poll` raises `NotImplementedError`. |
| `templates/defaults/en/home_assistant_*.jinja2` | Create | 4 slot templates |
| `templates/defaults/ru/home_assistant_*.jinja2` | Create | 4 slot templates |
| `templates/defaults/loader.py` | Modify | Add to `PROVIDER_SLOT_FILE_MAP` |
| `templates/command_defaults/loader.py` | Modify | Stub entry: empty commands list for now |
| `templates/context.py` | Modify | HA context builder |
| `templates/validator.py` | Modify | Whitelist HA variable names |

**Server** (`packages/server/src/notify_bridge_server/`)

| Path | Action | Notes |
| --- | --- | --- |
| `services/watcher.py` *(or scheduler / lifecycle module that hosts polling)* | Modify | Add subscription-manager branch: for providers whose class overrides `subscribe`, start/stop a long-running task instead of polling |
| `services/scheduler.py` | Verify | Confirm we cancel HA subscription on shutdown (graceful_shutdown_seconds path) |
| `api/template_configs.py` | Modify | `get_template_variables()` entry |
| `api/command_template_configs.py` | Modify | Sample ctx (minimal for Phase 1, no commands) |
| `services/sample_context.py` | Modify | `_SAMPLE_CONTEXT["home_assistant"]` |
| `database/seeds.py` | Modify | Seed notification templates + default tracking config |

**Frontend** (`frontend/src/`)

| Path | Action | Notes |
| --- | --- | --- |
| `lib/providers/home-assistant.ts` | Create | Descriptor per CLAUDE.md rule 11 |
| `lib/providers/index.ts` | Modify | Register descriptor |
| `lib/locales/en.json` | Modify | `providers.typeHomeAssistant`, `gridDesc.providerHomeAssistant` |
| `lib/locales/ru.json` | Modify | Same |

**Tests**

| Path | Action |
| --- | --- |
| `packages/core/tests/providers/test_home_assistant_parser.py` | Create: HA payload → `ServiceEvent` |
| `packages/core/tests/providers/test_home_assistant_client.py` | Create: WS auth, subscribe, reconnect (use a fake server) |
| `packages/server/tests/test_home_assistant_subscription.py` | Create: subscription manager lifecycle, event flows through dispatcher |

### Frontend descriptor essentials

```text
type: "home_assistant"
defaultName: "Home Assistant"
icon: "home" (consider Lucide icon; HA logo if a custom asset exists)
hasUrl: true   // base URL of HA (used to derive WS URL)
configFields:
  - url: http(s)://homeassistant.local:8123
  - access_token: long-lived access token (required)
  - allowed_event_types: comma-separated, defaults to "state_changed"
eventFields: 4 checkboxes (state_changed, automation_triggered, call_service, event_fired)
extraTrackingFields:
  - entity_glob: tag input ("light.*", "binary_sensor.*_motion")
  - domain_allowlist: tag input
collectionMeta: { label: "Entities", icon: "..." }
webhookBased: false   // we are NOT webhook based
```

WS URL is derived: `wss://{host}/api/websocket` (or `ws://` for plain http HA). Document this in the UI hint.

### Auth model

- **Long-lived access token** from HA (Profile → Long-Lived Access Tokens).
- Stored encrypted at rest via the same path the other providers use for secrets (the bridge already has a secret-encryption helper; verify the exact module name during implementation).
- WS auth handshake: connect → server sends `auth_required` → client sends `{type: "auth", access_token: "..."}` → server replies `auth_ok` or `auth_invalid`.

### Risks / open questions (Phase 1)

1. **Reconnect strategy.** Exponential backoff capped at 60s, jittered. On reconnect, log a `connection_restored_after` event so the UI can surface the gap. Document that HA does not support event replay.
2. **Area registry.** Pulling `area_id` for entities requires a separate `config/area_registry/list` WS call. Decision needed: fetch once on connect and cache, refetch on `area_registry_updated` event, or omit `area` from the context entirely in v1.
   Recommendation: fetch on connect, refetch on `area_registry_updated`, skip if it fails (best-effort).
3. **TLS verification for self-signed HA.** Homelab users often have self-signed certs. Need a `verify_tls: bool` config field (default true) and a clear warning when disabled. Same pattern as `NOTIFY_BRIDGE_ALLOW_PRIVATE_URLS` for the SSRF case.
4. **Backpressure.** HA's `state_changed` can fire hundreds of events per minute in a busy install. The subscription manager must drop or coalesce if the dispatcher backlog grows beyond a threshold. Cheapest cut: a bounded `asyncio.Queue` between WS receiver and dispatch, using `put_nowait` with an overflow counter visible in the event log.
5. **Entity filter precedence.** Tracking-config has `collection_ids` (entity_id list) and we want `entity_glob` + `domain_allowlist`. Decision: if both `collection_ids` and globs are set, union them (any match passes). Documented prominently in the tracker UI.
6. **Library choice.** `hass-client` is a Python WS client maintained by the HA community; alternative is rolling our own with `websockets`. The latter is ~150 LOC and has no external dependency surface. Recommendation: roll our own. Re-evaluate if Phase 3 needs registry-aware service calls.

## Phase 2: Bot Commands

Adds Telegram bot commands for HA tracking configs.

- `/status`: connection status, subscribed event count
- `/entities `: list matching entities + current state
- `/state `: full state + attributes for one entity
- `/areas`: area registry summary
- `/help`

These use the existing WS connection (no new client) via WS commands `get_states`, `config/area_registry/list`. Template slots and command template configs follow the same pattern as Gitea/Planka; see [CLAUDE.md](../../CLAUDE.md) rule 7 / rule 11 for the full set of locations that must be updated.

Out of scope for Phase 2: any command that mutates HA state.
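
A minimal sketch of how the Phase 2 read commands might be framed on the existing connection. Home Assistant's WebSocket API expects each command to carry an integer `id` and a `type`, and answers with a `result` message that echoes the `id`. The `CommandFramer` class and `unwrap_result` helper are hypothetical names, not part of the planned client, and the mapping of bot commands to WS commands in the comments is an assumption.

```python
import itertools
import json
from typing import Any


class CommandFramer:
    """Builds HA WebSocket command frames with increasing ids (hypothetical helper)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def frame(self, command_type: str, **fields: Any) -> tuple[int, str]:
        """Return (id, JSON text) for one command, e.g. `get_states`."""
        command_id = next(self._ids)
        message = {"id": command_id, "type": command_type, **fields}
        return command_id, json.dumps(message)


def unwrap_result(raw: str, expected_id: int) -> Any:
    """Return the `result` payload of the reply to `expected_id`.

    Raises LookupError if the frame answers a different command and
    RuntimeError if HA reported a failure.
    """
    message = json.loads(raw)
    if message.get("type") != "result" or message.get("id") != expected_id:
        raise LookupError("frame is not the reply to this command")
    if not message.get("success"):
        error = message.get("error") or {}
        raise RuntimeError(error.get("message", "command failed"))
    return message.get("result")


framer = CommandFramer()
# Assumption: /entities and /state read from get_states, /areas from the area registry.
states_id, states_frame = framer.frame("get_states")
areas_id, areas_frame = framer.frame("config/area_registry/list")

# A canned reply in the shape HA uses for successful commands.
reply = json.dumps({
    "id": states_id,
    "type": "result",
    "success": True,
    "result": [{"entity_id": "light.kitchen", "state": "on", "attributes": {}}],
})
states = unwrap_result(reply, states_id)
```

Correlating replies by `id` matters here because event frames from the Phase 1 subscription arrive interleaved with command results on the same connection.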
## Phase 3: Smart Actions (Service Calls)

A new action descriptor in the existing Smart Actions framework ([packages/core/src/notify_bridge_core/providers/actions.py](../../packages/core/src/notify_bridge_core/providers/actions.py)).

- Action type: `ha_call_service`
- Rule: trigger event → service call (e.g. "on motion event in `binary_sensor.front_door` → call `light.turn_on` on `light.porch`")
- Executor uses the existing WS connection to send `call_service`.

This phase is gated behind explicit per-target authorization in the UI: HA service calls can do anything the access token allows, including unlocking doors. Default state: **disabled**, with a clear consent flow when enabling.

## Rough effort estimates

These are rough; sub-task discovery during Phase 1 will refine them.

| Phase | Estimate (focused work) |
| --- | --- |
| Phase 1 (subscribe + dispatch) | 2–3 sessions |
| Phase 2 (bot commands) | 1 session |
| Phase 3 (smart actions) | 1–2 sessions |

## When to start

Phase 1 work order, once you green-light it:

1. ABC extension (`base.py`) + tests for the new `subscribe` shape on a fake provider.
2. WS client + parser + unit tests against recorded HA fixtures (no live HA needed for these).
3. Subscription manager in `services/watcher.py`, with an integration test using the fake provider from step 1.
4. Templates (en + ru), capabilities entry, validator whitelist.
5. Server: seeds, sample context, template_configs entry.
6. Frontend: descriptor, locale keys, i18n.
7. End-to-end smoke test against a real HA instance (homelab).

Backend restart cadence per the project rule: after **every** change in `packages/server/` or `packages/core/`.

## Decision log

- **2026-05-13**: Plan drafted. Ingest mode = WebSocket (chosen over webhook for future-proofing toward Phases 2 + 3). Not started.