# Home Assistant Provider: Implementation Plan

> Status: **planned, not started**. Sequencing: third item on the backlog
> (see [feature-backlog.md](feature-backlog.md)).
> Last updated: 2026-05-13.

## Decision: WebSocket subscription, not webhook

We considered three ingest modes (webhook automation, WebSocket subscription,
hybrid). The WebSocket route is chosen as the architectural foundation because
the medium-term roadmap forces it anyway:

| Phase | Capability | Needs API access? |
| --- | --- | --- |
| 1 | Subscribe to events, emit notifications | Read (event stream) |
| 2 | Bot commands (`/state`, `/entities`, `/areas`) | Read (REST or WS `get_states`) |
| 3 | Smart Actions (`light.turn_on`, scene activation) | Write (`call_service`) |

A webhook-only Phase 1 would still need a REST client by Phase 2 and a write
path by Phase 3. The net result is two client implementations plus one event
pipeline refactor. WebSocket consolidates all three phases on one connection.

**Tradeoff (be honest):** WebSocket introduces a long-lived-connection pattern
this codebase does not have yet. Reconnect logic, the missed-events-on-restart
gap, and a new shape on the `ServiceProvider` ABC are real costs. Phase 1 is
**not** shippable in one short session; plan for a multi-session build.

## Provider abstraction extension

The current `ServiceProvider` ABC
([packages/core/src/notify_bridge_core/providers/base.py](../../packages/core/src/notify_bridge_core/providers/base.py))
is poll-oriented: every provider implements `poll(collection_ids, state) →
(events, new_state)`. Webhook providers (Gitea, Planka, Webhook) satisfy this
by no-op'ing `poll` and pushing events in via `api/webhooks.py` instead.

Home Assistant fits neither cleanly. The plan:

1. Add an **optional** `async subscribe(emit) → None` method on the base ABC.
   Default implementation raises `NotImplementedError`. Polling providers do
   not override it. The scheduler / lifecycle layer (currently `services/watcher.py`)
   gains a "subscription manager" branch that, for any provider whose class
   overrides `subscribe`, starts a long-lived task instead of registering
   a polling job.
2. `emit` is a callback `(event: ServiceEvent) → None` provided by the
   subscription manager. It routes events to the dispatcher exactly like the
   webhook handler does today. Keeping the dispatch path unchanged is the
   point of this design.
3. Reconnect lives **inside** `subscribe`: the method is expected to be a
   `while not cancelled: try connect; on drop, sleep with backoff, retry`
   loop. The manager cancels the task on shutdown via the cooperative cancel
   token used elsewhere.

This is a small, additive change to one ABC. No existing provider is
modified.

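The three steps above can be sketched roughly as follows. This is a minimal illustration, not the real `base.py`: the trimmed `ServiceProvider`, the `overrides_subscribe` check, the `backoff_delay` helper, and `FakePushProvider` are all hypothetical names, and plain `asyncio` task cancellation stands in for whatever cooperative cancel token the codebase actually uses. Full jitter is one possible reading of "sleep with backoff".

```python
import asyncio
import random
from abc import ABC
from typing import Any, Callable

Emit = Callable[[Any], None]  # stands in for (event: ServiceEvent) -> None


class ServiceProvider(ABC):
    """Trimmed stand-in for the real ABC; only the new hook is shown."""

    async def subscribe(self, emit: Emit) -> None:
        raise NotImplementedError


def overrides_subscribe(provider: ServiceProvider) -> bool:
    """True when the provider's class supplies its own subscribe()."""
    return type(provider).subscribe is not ServiceProvider.subscribe


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff capped at `cap` seconds, with full jitter."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


class FakePushProvider(ServiceProvider):
    """Hypothetical provider: the first session drops, the second delivers."""

    def __init__(self) -> None:
        self.sessions = 0

    async def _run_session(self, emit: Emit) -> None:
        self.sessions += 1
        if self.sessions == 1:
            raise ConnectionError("simulated drop")
        emit({"type": "ha_state_changed"})
        await asyncio.Event().wait()  # hold the "connection" until cancelled

    async def subscribe(self, emit: Emit) -> None:
        attempt = 0
        while True:  # leaves the loop only through task cancellation
            try:
                await self._run_session(emit)
                attempt = 0
            except ConnectionError:
                await asyncio.sleep(backoff_delay(attempt, base=0.01))
                attempt += 1
```

A subscription manager built this way would call `overrides_subscribe(provider)` once per provider, start `asyncio.create_task(provider.subscribe(emit))` for push-style providers, and call `task.cancel()` on shutdown. Polling providers never reach the new branch.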
## Phase 1: Subscribe + Dispatch (MVP)

### Scope

- Long-lived WebSocket connection to HA, authenticated with a long-lived
  access token.
- Subscribe to the event bus with optional `event_type` filter (defaults to
  `state_changed`).
- Translate HA events into `ServiceEvent` and dispatch via the existing
  pipeline. Notifications go out exactly as they do today for any other
  provider.
- Filter UI: entity-id glob list (e.g. `light.*`, `binary_sensor.*`), domain
  allowlist (e.g. `light`, `binary_sensor`), event-type allowlist.
  **Hard-required** to avoid the HA firehose drowning the bridge.
- Connection test + entity listing via WS `get_states` (no REST client yet;
  WS gives us both subscribe and read).

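The subscribe and read commands in this scope are plain JSON frames on the same socket. A minimal sketch of building them, assuming the HA WebSocket API's `subscribe_events` and `get_states` command types and its requirement that each command carries a per-connection `id` that only increases; `CommandBuilder` is a hypothetical helper name.

```python
import itertools
from typing import Any, Dict, Iterator, Optional


class CommandBuilder:
    """Builds HA WebSocket command frames with strictly increasing ids."""

    def __init__(self) -> None:
        self._ids: Iterator[int] = itertools.count(1)

    def _frame(self, type_: str, **fields: Any) -> Dict[str, Any]:
        return {"id": next(self._ids), "type": type_, **fields}

    def subscribe_events(
        self, event_type: Optional[str] = "state_changed"
    ) -> Dict[str, Any]:
        # Leaving event_type out subscribes to every event on the bus.
        if event_type is None:
            return self._frame("subscribe_events")
        return self._frame("subscribe_events", event_type=event_type)

    def get_states(self) -> Dict[str, Any]:
        # Used for the connection test and the entity listing.
        return self._frame("get_states")
```

One builder per connection keeps ids unique; after a reconnect a fresh builder restarts the counter, which matches a fresh server-side session.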
### Out of scope for Phase 1

- Bot commands → Phase 2.
- Service calls → Phase 3.
- Replay of events missed during disconnect (HA does not support this; we
  document the gap and surface "reconnected after N seconds" in the event
  log).
- Webhook-style ingestion (path-embedded token webhook receiver). If a user
  prefers webhooks, we add it later as a second ingestion mode on the same
  provider. Out of scope for v1.

### Event types (v1)

| HA event | ServiceEvent type | Notification slot |
| --- | --- | --- |
| `state_changed` | `ha_state_changed` | `message_state_changed` |
| `automation_triggered` | `ha_automation_triggered` | `message_automation_triggered` |
| `call_service` | `ha_service_called` | `message_service_called` |
| (custom event types) | `ha_event_fired` | `message_event_fired` |

Default tracking config enables `state_changed` only; the others are loud
and opt-in.

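The table translates directly into a lookup with a catch-all for custom event types. The sketch below only restates the mapping; `Route`, `route_for`, and `DEFAULT_ENABLED` are hypothetical names, and the real parser may organize this differently.

```python
from typing import NamedTuple


class Route(NamedTuple):
    service_event_type: str
    slot: str


_ROUTES = {
    "state_changed": Route("ha_state_changed", "message_state_changed"),
    "automation_triggered": Route(
        "ha_automation_triggered", "message_automation_triggered"
    ),
    "call_service": Route("ha_service_called", "message_service_called"),
}

# Any event type not listed above is treated as a custom event.
_FALLBACK = Route("ha_event_fired", "message_event_fired")

# Only state_changed is enabled by default; the rest are opt-in.
DEFAULT_ENABLED = frozenset({"state_changed"})


def route_for(ha_event_type: str) -> Route:
    """Map an HA event type to its ServiceEvent type and notification slot."""
    return _ROUTES.get(ha_event_type, _FALLBACK)
```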
### Context variables exposed to templates

Pulled directly from HA's `state_changed` payload, normalized:

- `entity_id`: e.g. `light.kitchen`
- `friendly_name`: `attributes.friendly_name`, falling back to `entity_id`
- `domain`: the part of `entity_id` before the dot
- `old_state`: `old_state.state` from the event data
- `new_state`: `new_state.state` from the event data
- `attributes`: dict of new-state attributes (raw)
- `device_class`: `attributes.device_class` if present
- `area`: `attributes.area_id` if present (best effort; only set if HA
  exposes it via the area registry, which costs a
  `config/area_registry/list` WS call; see "Risks / open questions (Phase 1)")
- `last_changed`, `last_updated`: ISO timestamps
- For non-`state_changed` events: `event_type`, `event_data` (full dict)

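A rough sketch of the normalization for `state_changed`, assuming the event's `data` object carries `entity_id` plus `old_state` and `new_state` state objects (either of which can be null when an entity is created or removed). The function name and the optional `area` argument are hypothetical; the real builder would live in `templates/context.py`.

```python
from typing import Any, Dict, Optional


def build_state_changed_context(
    event: Dict[str, Any], area: Optional[str] = None
) -> Dict[str, Any]:
    """Flatten an HA state_changed event into template variables."""
    data = event.get("data") or {}
    old = data.get("old_state") or {}  # null when the entity was just created
    new = data.get("new_state") or {}  # null when the entity was removed
    entity_id = data.get("entity_id", "")
    attributes = new.get("attributes") or {}
    return {
        "entity_id": entity_id,
        "friendly_name": attributes.get("friendly_name") or entity_id,
        "domain": entity_id.split(".", 1)[0],
        "old_state": old.get("state"),
        "new_state": new.get("state"),
        "attributes": attributes,
        "device_class": attributes.get("device_class"),
        # Best effort: prefer the attribute, else a registry lookup result.
        "area": attributes.get("area_id") or area,
        "last_changed": new.get("last_changed"),
        "last_updated": new.get("last_updated"),
    }
```

Missing values come through as `None`, so templates can guard with a simple `{% if device_class %}` instead of handling key errors.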
### File touch map (Phase 1)

**Core** (`packages/core/src/notify_bridge_core/`)

| Path | Action | Notes |
| --- | --- | --- |
| `providers/base.py` | Modify | Add optional `subscribe(emit)` ABC method (default `NotImplementedError`); add `HOME_ASSISTANT = "home_assistant"` to `ServiceProviderType` |
| `providers/capabilities.py` | Modify | Add `HOME_ASSISTANT_CAPABILITIES` + register |
| `providers/home_assistant/__init__.py` | Create | Export + register template variables |
| `providers/home_assistant/client.py` | Create | WebSocket client (auth, subscribe, get_states, call_service stub) |
| `providers/home_assistant/event_parser.py` | Create | HA event dict → `ServiceEvent` |
| `providers/home_assistant/provider.py` | Create | Class with `connect`, `disconnect`, `subscribe`, `list_collections` (entity list), `get_available_variables`, `get_provider_config_schema`, `test_connection`. `poll` raises `NotImplementedError`. |
| `templates/defaults/en/home_assistant_*.jinja2` | Create | 4 slot templates |
| `templates/defaults/ru/home_assistant_*.jinja2` | Create | 4 slot templates |
| `templates/defaults/loader.py` | Modify | Add to `PROVIDER_SLOT_FILE_MAP` |
| `templates/command_defaults/loader.py` | Modify | Stub entry: empty commands list for now |
| `templates/context.py` | Modify | HA context builder |
| `templates/validator.py` | Modify | Whitelist HA variable names |

**Server** (`packages/server/src/notify_bridge_server/`)

| Path | Action | Notes |
| --- | --- | --- |
| `services/watcher.py` *(or scheduler / lifecycle module that hosts polling)* | Modify | Add subscription-manager branch: for providers whose class overrides `subscribe`, start/stop long-running task instead of polling |
| `services/scheduler.py` | Verify | Confirm we cancel HA subscription on shutdown (graceful_shutdown_seconds path) |
| `api/template_configs.py` | Modify | `get_template_variables()` entry |
| `api/command_template_configs.py` | Modify | Sample ctx (minimal for Phase 1, no commands) |
| `services/sample_context.py` | Modify | `_SAMPLE_CONTEXT["home_assistant"]` |
| `database/seeds.py` | Modify | Seed notification templates + default tracking config |

**Frontend** (`frontend/src/`)

| Path | Action | Notes |
| --- | --- | --- |
| `lib/providers/home-assistant.ts` | Create | Descriptor per CLAUDE.md rule 11 |
| `lib/providers/index.ts` | Modify | Register descriptor |
| `lib/locales/en.json` | Modify | `providers.typeHomeAssistant`, `gridDesc.providerHomeAssistant` |
| `lib/locales/ru.json` | Modify | Same |

**Tests**

| Path | Action | Notes |
| --- | --- | --- |
| `packages/core/tests/providers/test_home_assistant_parser.py` | Create | HA payload → `ServiceEvent` |
| `packages/core/tests/providers/test_home_assistant_client.py` | Create | WS auth, subscribe, reconnect (use a fake server) |
| `packages/server/tests/test_home_assistant_subscription.py` | Create | Subscription manager lifecycle, event flows through dispatcher |

### Frontend descriptor essentials

```text
type: "home_assistant"
defaultName: "Home Assistant"
icon: "home" (consider Lucide icon; HA logo if a custom asset exists)
hasUrl: true            // base URL of HA (used to derive WS URL)
configFields:
  - url: http(s)://homeassistant.local:8123
  - access_token: long-lived access token (required)
  - allowed_event_types: comma-separated, defaults to "state_changed"
eventFields: 4 checkboxes (state_changed, automation_triggered,
             call_service, event_fired)
extraTrackingFields:
  - entity_glob: tag input ("light.*", "binary_sensor.*_motion")
  - domain_allowlist: tag input
collectionMeta: { label: "Entities", icon: "..." }
webhookBased: false     // we are NOT webhook based
```

WS URL is derived: `wss://{host}/api/websocket` (or `ws://` for plain http
HA). Document this in the UI hint.

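The URL derivation is small enough to show in full. A minimal sketch using only the standard library; `derive_ws_url` is a hypothetical name, and keeping any path prefix from the base URL (for HA behind a reverse proxy sub-path) is an assumption, not something the plan specifies.

```python
from urllib.parse import urlsplit, urlunsplit


def derive_ws_url(base_url: str) -> str:
    """Turn the configured HA base URL into its WebSocket endpoint."""
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    scheme = "wss" if parts.scheme == "https" else "ws"
    # Assumption: preserve a path prefix, drop query string and fragment.
    path = parts.path.rstrip("/") + "/api/websocket"
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```

Rejecting anything other than `http` or `https` up front gives the connection test a clear error to show instead of a socket failure later.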
### Auth model

- **Long-lived access token** from HA (Profile → Long-Lived Access Tokens).
- Stored encrypted at rest via the same path the other providers use for
  secrets (the bridge already has a secret-encryption helper; verify the
  exact module name during implementation).
- WS auth handshake: connect → server sends `auth_required` → client sends
  `{type: "auth", access_token: "..."}` → server replies `auth_ok` or
  `auth_invalid`.

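The handshake can be written against two small callables so it is testable without a socket. This is a sketch under that assumption: `recv` and `send` are hypothetical adapters over whichever WebSocket library is chosen, and `AuthError` is an illustrative exception name.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Recv = Callable[[], Awaitable[Dict[str, Any]]]
Send = Callable[[Dict[str, Any]], Awaitable[None]]


class AuthError(Exception):
    """Raised when the HA auth handshake does not end in auth_ok."""


async def authenticate(recv: Recv, send: Send, access_token: str) -> None:
    """Run the auth_required -> auth -> auth_ok / auth_invalid exchange."""
    first = await recv()
    if first.get("type") != "auth_required":
        raise AuthError(f"expected auth_required, got {first.get('type')!r}")

    await send({"type": "auth", "access_token": access_token})

    reply = await recv()
    reply_type = reply.get("type")
    if reply_type == "auth_ok":
        return
    if reply_type == "auth_invalid":
        # Surface the server's message, never the token itself.
        raise AuthError(reply.get("message", "auth_invalid"))
    raise AuthError(f"unexpected reply during auth: {reply_type!r}")
```

Treating `auth_invalid` as a distinct error matters for the reconnect loop: a bad token should stop retries and surface in the UI, while a dropped socket should back off and try again.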
### Risks / open questions (Phase 1)

1. **Reconnect strategy.** Exponential backoff capped at 60s, jittered.
   On reconnect, log a `connection_restored_after` event so the UI can
   surface the gap. Document that HA does not support event replay.
2. **Area registry.** Pulling `area_id` for entities requires a separate
   `config/area_registry/list` WS call. Decision needed: fetch once on
   connect and cache, refetch on `area_registry_updated` event, or skip
   `area` from the context entirely in v1. Recommendation: fetch on
   connect, refetch on `area_registry_updated`, skip if it fails (best-effort).
3. **TLS verification for self-signed HA.** Homelab users often have
   self-signed certs. Need a `verify_tls: bool` config field (default true)
   and a clear warning when disabled. Same pattern as
   `NOTIFY_BRIDGE_ALLOW_PRIVATE_URLS` for the SSRF case.
4. **Backpressure.** HA's `state_changed` can fire hundreds of events per
   minute in a busy install. The subscription manager must drop or coalesce
   if the dispatcher backlog grows beyond a threshold. Cheapest cut: a
   bounded `asyncio.Queue` between WS receiver and dispatch, using
   `put_nowait` with an overflow counter visible in the event log.
5. **Entity filter precedence.** Tracking-config has `collection_ids`
   (entity_id list) and we want `entity_glob` + `domain_allowlist`. Decision:
   if both `collection_ids` and globs are set, union them (any match passes).
   Documented prominently in the tracker UI.
6. **Library choice.** `hass-client` is a Python WS client maintained by the
   HA community; alternative is rolling our own with `websockets`. The
   latter is ~150 LOC and has no external dependency surface. Recommendation:
   roll our own. Re-evaluate if Phase 3 needs registry-aware service calls.

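Items 4 and 5 are the two that reduce to a few lines of code. The sketch below shows one possible shape for each: a bounded buffer that counts drops instead of blocking the WS receiver, and a union-style filter. `BoundedEventBuffer` and `entity_passes` are hypothetical names; folding `domain_allowlist` into the same union, and letting nothing pass when no filter is configured, are assumptions rather than settled decisions.

```python
import asyncio
from fnmatch import fnmatchcase
from typing import Any, Iterable


class BoundedEventBuffer:
    """Sits between the WS receiver and dispatch; drops when full."""

    def __init__(self, maxsize: int = 1000) -> None:
        self._queue: "asyncio.Queue[Any]" = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0  # overflow counter to surface in the event log

    def offer(self, event: Any) -> bool:
        """Enqueue without waiting; return False if the event was dropped."""
        try:
            self._queue.put_nowait(event)
            return True
        except asyncio.QueueFull:
            self.dropped += 1
            return False

    async def take(self) -> Any:
        return await self._queue.get()


def entity_passes(
    entity_id: str,
    collection_ids: Iterable[str] = (),
    entity_globs: Iterable[str] = (),
    domain_allowlist: Iterable[str] = (),
) -> bool:
    """Union semantics: a match on any configured filter passes."""
    if entity_id in set(collection_ids):
        return True
    if any(fnmatchcase(entity_id, pattern) for pattern in entity_globs):
        return True
    domain = entity_id.split(".", 1)[0]
    return domain in set(domain_allowlist)
```

Running `entity_passes` before `offer` keeps filtered-out events from ever consuming queue capacity, so the overflow counter only reflects events the user actually asked for.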
## Phase 2: Bot Commands

Adds Telegram bot commands for HA tracking configs.

- `/status`: connection status, subscribed event count
- `/entities <glob>`: list matching entities + current state
- `/state <entity_id>`: full state + attributes for one entity
- `/areas`: area registry summary
- `/help`

These use the existing WS connection (no new client) via WS commands
`get_states`, `config/area_registry/list`. Template slots and command
template configs follow the same pattern as Gitea/Planka; see
[CLAUDE.md](../../CLAUDE.md) rule 7 / rule 11 for the full set of locations
that must be updated.

Out-of-scope for Phase 2: any command that mutates HA state.

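As an illustration of how thin these commands can be, here is a sketch of the data side of `/entities <glob>`: it takes the list returned by `get_states` and produces the lines a command template would render. `format_entities` and its `limit` cap are hypothetical; the real output would go through the command template slots rather than a hard-coded format.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List


def format_entities(
    states: List[Dict[str, Any]], glob: str = "*", limit: int = 30
) -> str:
    """Filter a get_states result by entity-id glob and list current states."""
    matches = sorted(
        (s for s in states if fnmatchcase(s["entity_id"], glob)),
        key=lambda s: s["entity_id"],
    )
    if not matches:
        return f"No entities match {glob!r}."
    lines = [f"{s['entity_id']}: {s['state']}" for s in matches[:limit]]
    if len(matches) > limit:
        # Keep the reply short enough for a single chat message.
        lines.append(f"(+{len(matches) - limit} more)")
    return "\n".join(lines)
```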
## Phase 3: Smart Actions (Service Calls)

A new action descriptor in the existing Smart Actions framework
([packages/core/src/notify_bridge_core/providers/actions.py](../../packages/core/src/notify_bridge_core/providers/actions.py)).

- Action type: `ha_call_service`
- Rule: trigger event → service call (e.g. "on motion event in
  `binary_sensor.front_door` → call `light.turn_on` on `light.porch`")
- Executor uses the existing WS connection to send `call_service`.

This phase is gated behind explicit per-target authorization in the UI, because
HA service calls can do anything the access token allows, including unlocking
doors. Default state: **disabled**, with a clear consent flow when enabling.

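One way to make the per-target gate hard to bypass is to put it in the function that builds the frame. The sketch below assumes the HA WebSocket `call_service` command shape (`domain`, `service`, optional `service_data` and `target`); `build_call_service`, `ServiceCallDenied`, and the `(service, entity_id)` pair representation of user consent are hypothetical.

```python
from typing import Any, Dict, FrozenSet, Mapping, Optional, Tuple

Grant = Tuple[str, str]  # ("light.turn_on", "light.porch")


class ServiceCallDenied(Exception):
    """Raised when a rule targets a service/entity pair without consent."""


def build_call_service(
    msg_id: int,
    domain: str,
    service: str,
    entity_id: str,
    authorized: FrozenSet[Grant],
    service_data: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a call_service frame, refusing anything not explicitly granted."""
    grant = (f"{domain}.{service}", entity_id)
    if grant not in authorized:  # an empty set means the feature is disabled
        raise ServiceCallDenied(f"{grant[0]} on {entity_id} is not authorized")
    frame: Dict[str, Any] = {
        "id": msg_id,
        "type": "call_service",
        "domain": domain,
        "service": service,
        "target": {"entity_id": entity_id},
    }
    if service_data:
        frame["service_data"] = dict(service_data)
    return frame
```

With the default of an empty grant set, every call raises, which matches the "disabled until the user consents" requirement above.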
## Rough effort estimates

These are rough; sub-task discovery during Phase 1 will refine them.

| Phase | Estimate (focused work) |
| --- | --- |
| Phase 1 (subscribe + dispatch) | 2–3 sessions |
| Phase 2 (bot commands) | 1 session |
| Phase 3 (smart actions) | 1–2 sessions |

## When to start

Phase 1 work order, once you green-light it:

1. ABC extension (`base.py`) + tests for the new `subscribe` shape on a fake
   provider.
2. WS client + parser + unit tests against recorded HA fixtures (no live HA
   needed for these).
3. Subscription manager in `services/watcher.py`, with an integration test
   using the fake provider from step 1.
4. Templates (en + ru), capabilities entry, validator whitelist.
5. Server: seeds, sample context, template_configs entry.
6. Frontend: descriptor, locale keys, i18n.
7. End-to-end smoke test against a real HA instance (homelab).

Backend restart cadence per the project rule: after **every** change in
`packages/server/` or `packages/core/`.

## Decision log

- **2026-05-13**: Plan drafted. Ingest mode = WebSocket (chosen over
  webhook for future-proofing toward Phases 2 + 3). Not started.