Files
notify-bridge/.claude/docs/feature-home-assistant.md
T
alexei.dolgolyov 22127e2a59 feat: Home Assistant provider — WebSocket subscription + bot commands
Adds Home Assistant as a service provider with two coordinated surfaces:

Notifications (subscription):
- Long-lived WebSocket client (aiohttp ws_connect) with auth handshake,
  exponential-backoff reconnect, bounded event queue, and area-registry
  enrichment cached per (re)connect
- ServiceProvider ABC gains an optional `subscribe()` method for push-style
  providers; HomeAssistantServiceProvider uses it via a per-provider
  supervisor task started in the FastAPI lifespan
- 4 event types (state_changed, automation_triggered, call_service,
  event_fired), 4 default Jinja templates (en + ru), HA-specific
  tracker filters (entity_glob, domain_allowlist, exact entity ids)
- Extracted shared dispatch pipeline (api/webhooks.py → services/
  event_dispatch.py) so subscription and webhook ingest share the same
  event_log + deferred-dispatch + quiet-hours code path

Bot commands:
- /status, /entities [glob], /state <entity_id>, /areas
- Multi-command WS session so /status and /areas cost one handshake
- Sensitive-attribute blocklist (camera access_token, entity_picture, etc.)
  and 30-attribute cap to keep /state output safe and within Telegram's
  message size
- Error-message redaction strips URL userinfo before surfacing to chat

Frontend:
- HA descriptor with toggle ConfigField type (new) and tag-input filter
  mode for free-text glob/domain lists (new TagInput component)
- 15 command slots + 4 notification slots wired into the existing
  template-config UI
2026-05-13 14:31:56 +03:00

285 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Home Assistant Provider — Implementation Plan
> Status: **planned, not started**. Sequencing: third item on the backlog
> (see [feature-backlog.md](feature-backlog.md)).
> Last updated: 2026-05-13.
## Decision: WebSocket subscription, not webhook
We considered three ingest modes (webhook automation, WebSocket subscription,
hybrid). The WebSocket route is chosen as the architectural foundation because
the medium-term roadmap forces it anyway:
| Phase | Capability | Needs API access? |
| --- | --- | --- |
| 1 | Subscribe to events, emit notifications | Read (event stream) |
| 2 | Bot commands (`/state`, `/entities`, `/areas`) | Read (REST or WS get_states) |
| 3 | Smart Actions (`light.turn_on`, scene activation) | Write (call_service) |
A webhook-only Phase 1 would still need a REST client by Phase 2 and a write
path by Phase 3 — net result is two client implementations + one event
pipeline refactor. WebSocket consolidates all three phases on one connection.
**Tradeoff (be honest):** WebSocket introduces a long-lived-connection pattern
this codebase does not have yet. Reconnect logic, missed-events-on-restart
gap, and a new shape on the `ServiceProvider` ABC are real costs. Phase 1 is
**not** shippable in one short session — plan for a multi-session build.
## Provider abstraction extension
The current `ServiceProvider` ABC
([packages/core/src/notify_bridge_core/providers/base.py](../../packages/core/src/notify_bridge_core/providers/base.py))
is poll-oriented: every provider implements `poll(collection_ids, state) →
(events, new_state)`. Webhook providers (Gitea, Planka, Webhook) satisfy this
by no-op'ing `poll` and shoving events in via `api/webhooks.py` instead.
Home Assistant fits neither cleanly. The plan:
1. Add an **optional** `async subscribe(emit) → None` method on the base ABC.
Default implementation raises `NotImplementedError`. Polling providers do
not override it. The scheduler / lifecycle layer (currently `services/watcher.py`)
gains a "subscription manager" branch that, for any provider whose class
overrides `subscribe`, starts a long-lived task instead of registering
a polling job.
2. `emit` is a callback `(event: ServiceEvent) → None` provided by the
subscription manager — it routes events to the dispatcher exactly like the
webhook handler does today. Keeping the dispatch path unchanged is the
point of this design.
3. Reconnect lives **inside** `subscribe`: the method is expected to be a
`while not cancelled: try connect; on drop, sleep with backoff, retry`
loop. The manager cancels the task on shutdown via the cooperative cancel
token used elsewhere.
This is a small, additive change to one ABC. No existing provider is
modified.
## Phase 1 — Subscribe + Dispatch (MVP)
### Scope
- Long-lived WebSocket connection to HA, authenticated with a long-lived
access token.
- Subscribe to the event bus with optional `event_type` filter (defaults to
`state_changed`).
- Translate HA events into `ServiceEvent` and dispatch via the existing
pipeline. Notifications go out exactly as they do today for any other
provider.
- Filter UI: entity-id glob list, domain allowlist (e.g. `light.*`,
`binary_sensor.*`), event-type allowlist. **Hard-required** to avoid the HA
firehose drowning the bridge.
- Connection test + entity listing via WS `get_states` (no REST client yet —
WS gives us both subscribe and read).
### Out of scope for Phase 1
- Bot commands → Phase 2.
- Service calls → Phase 3.
- Replay of events missed during disconnect (HA does not support this; we
document the gap and surface "reconnected after N seconds" in the event
log).
- Webhook-style ingestion (path-embedded token webhook receiver). If a user
prefers webhooks, we add it later as a second ingestion mode on the same
provider — out of scope for v1.
### Event types (v1)
| HA event | ServiceEvent type | Notification slot |
| --- | --- | --- |
| `state_changed` | `ha_state_changed` | `message_state_changed` |
| `automation_triggered` | `ha_automation_triggered` | `message_automation_triggered` |
| `call_service` | `ha_service_called` | `message_service_called` |
| (custom event types) | `ha_event_fired` | `message_event_fired` |
Default tracking config enables `state_changed` only — the others are loud
and opt-in.
### Context variables exposed to templates
Pulled directly from HA's `state_changed` payload, normalized:
- `entity_id``light.kitchen`
- `friendly_name``attributes.friendly_name` or fallback to `entity_id`
- `domain` — derived from `entity_id` before the dot
- `old_state``from_state.state`
- `new_state``to_state.state`
- `attributes` — dict of new-state attributes (raw)
- `device_class``attributes.device_class` if present
- `area``attributes.area_id` if present (best effort; only set if HA
exposes it via the area registry, which costs a `get_registry` WS call —
see "Open questions")
- `last_changed`, `last_updated` — ISO timestamps
- For non-`state_changed` events: `event_type`, `event_data` (full dict)
### File touch map (Phase 1)
**Core** (`packages/core/src/notify_bridge_core/`)
| Path | Action | Notes |
| --- | --- | --- |
| `providers/base.py` | Modify | Add optional `subscribe(emit)` ABC method (default `NotImplementedError`); add `HOME_ASSISTANT = "home_assistant"` to `ServiceProviderType` |
| `providers/capabilities.py` | Modify | Add `HOME_ASSISTANT_CAPABILITIES` + register |
| `providers/home_assistant/__init__.py` | Create | Export + register template variables |
| `providers/home_assistant/client.py` | Create | WebSocket client (auth, subscribe, get_states, call_service stub) |
| `providers/home_assistant/event_parser.py` | Create | HA event dict → `ServiceEvent` |
| `providers/home_assistant/provider.py` | Create | Class with `connect`, `disconnect`, `subscribe`, `list_collections` (entity list), `get_available_variables`, `get_provider_config_schema`, `test_connection`. `poll` raises NotImplementedError. |
| `templates/defaults/en/home_assistant_*.jinja2` | Create | 4 slot templates |
| `templates/defaults/ru/home_assistant_*.jinja2` | Create | 4 slot templates |
| `templates/defaults/loader.py` | Modify | Add to `PROVIDER_SLOT_FILE_MAP` |
| `templates/command_defaults/loader.py` | Modify | Stub entry — empty commands list for now |
| `templates/context.py` | Modify | HA context builder |
| `templates/validator.py` | Modify | Whitelist HA variable names |
**Server** (`packages/server/src/notify_bridge_server/`)
| Path | Action | Notes |
| --- | --- | --- |
| `services/watcher.py` *(or scheduler / lifecycle module that hosts polling)* | Modify | Add subscription-manager branch — for providers whose class overrides `subscribe`, start/stop long-running task instead of polling |
| `services/scheduler.py` | Verify | Confirm we cancel HA subscription on shutdown (graceful_shutdown_seconds path) |
| `api/template_configs.py` | Modify | `get_template_variables()` entry |
| `api/command_template_configs.py` | Modify | Sample ctx (minimal for Phase 1 — no commands) |
| `services/sample_context.py` | Modify | `_SAMPLE_CONTEXT["home_assistant"]` |
| `database/seeds.py` | Modify | Seed notification templates + default tracking config |
**Frontend** (`frontend/src/`)
| Path | Action | Notes |
| --- | --- | --- |
| `lib/providers/home-assistant.ts` | Create | Descriptor per CLAUDE.md rule 11 |
| `lib/providers/index.ts` | Modify | Register descriptor |
| `lib/locales/en.json` | Modify | `providers.typeHomeAssistant`, `gridDesc.providerHomeAssistant` |
| `lib/locales/ru.json` | Modify | Same |
**Tests**
| Path | Action |
| --- | --- |
| `packages/core/tests/providers/test_home_assistant_parser.py` | Create — HA payload → `ServiceEvent` |
| `packages/core/tests/providers/test_home_assistant_client.py` | Create — WS auth, subscribe, reconnect (use a fake server) |
| `packages/server/tests/test_home_assistant_subscription.py` | Create — subscription manager lifecycle, event flows through dispatcher |
### Frontend descriptor essentials
```text
type: "home_assistant"
defaultName: "Home Assistant"
icon: "home" (consider Lucide icon; HA logo if a custom asset exists)
hasUrl: true // base URL of HA (used to derive WS URL)
configFields:
- url: http(s)://homeassistant.local:8123
- access_token: long-lived access token (required)
- allowed_event_types: comma-separated, defaults to "state_changed"
eventFields: 4 checkboxes (state_changed, automation_triggered,
call_service, event_fired)
extraTrackingFields:
- entity_glob: tag input ("light.*", "binary_sensor.*_motion")
- domain_allowlist: tag input
collectionMeta: { label: "Entities", icon: "..." }
webhookBased: false // we are NOT webhook based
```
WS URL is derived: `wss://{host}/api/websocket` (or `ws://` for plain http
HA). Document this in the UI hint.
### Auth model
- **Long-lived access token** from HA (Profile → Long-Lived Access Tokens).
- Stored encrypted at rest via the same path the other providers use for
secrets (the bridge already has a secret-encryption helper — verify the
exact module name during implementation).
- WS auth handshake: connect → server sends `auth_required` → client sends
`{type: "auth", access_token: "..."}` → server replies `auth_ok` or
`auth_invalid`.
### Risks / open questions (Phase 1)
1. **Reconnect strategy.** Exponential backoff capped at 60s, jittered.
On reconnect, log a `connection_restored_after` event so the UI can
surface the gap. Document that HA does not support event replay.
2. **Area registry.** Pulling `area_id` for entities requires a separate
`config/area_registry/list` WS call. Decision needed: fetch once on
connect and cache, refetch on `area_registry_updated` event, or skip
`area` from the context entirely in v1. Recommendation: fetch on
connect, refetch on `area_registry_updated`, skip if it fails (best-effort).
3. **TLS verification for self-signed HA.** Homelab users often have
self-signed certs. Need a `verify_tls: bool` config field (default true)
and a clear warning when disabled. Same pattern as
`NOTIFY_BRIDGE_ALLOW_PRIVATE_URLS` for the SSRF case.
4. **Backpressure.** HA's `state_changed` can fire hundreds of events per
minute in a busy install. The subscription manager must drop or coalesce
if the dispatcher backlog grows beyond a threshold. Cheapest cut: a
bounded `asyncio.Queue` between WS receiver and dispatch — `put_nowait`
with overflow counter visible in the event log.
5. **Entity filter precedence.** Tracking-config has `collection_ids`
(entity_id list) and we want `entity_glob` + `domain_allowlist`. Decision:
if both `collection_ids` and globs are set, union them (any match passes).
Documented prominently in the tracker UI.
6. **Library choice.** `hass-client` is a Python WS client maintained by the
HA community; alternative is rolling our own with `websockets`. The
latter is ~150 LOC and has no external dependency surface. Recommendation:
roll our own. Re-evaluate if Phase 3 needs registry-aware service calls.
## Phase 2 — Bot Commands
Adds Telegram bot commands for HA tracking configs.
- `/status` — connection status, subscribed event count
- `/entities <glob>` — list matching entities + current state
- `/state <entity_id>` — full state + attributes for one entity
- `/areas` — area registry summary
- `/help`
These use the existing WS connection (no new client) via WS commands
`get_states`, `config/area_registry/list`. Template slots and command
template configs follow the same pattern as Gitea/Planka — see
[CLAUDE.md](../../CLAUDE.md) rule 7 / rule 11 for the full set of locations
that must be updated.
Out-of-scope for Phase 2: any command that mutates HA state.
## Phase 3 — Smart Actions (Service Calls)
A new action descriptor in the existing Smart Actions framework
([packages/core/src/notify_bridge_core/providers/actions.py](../../packages/core/src/notify_bridge_core/providers/actions.py)).
- Action type: `ha_call_service`
- Rule: trigger event → service call (e.g. "on motion event in
`binary_sensor.front_door` → call `light.turn_on` on `light.porch`")
- Executor uses the existing WS connection to send `call_service`.
This phase is gated behind explicit per-target authorization in the UI — HA
service calls can do anything the access token allows, including unlocking
doors. Default state: **disabled**, with a clear consent flow when enabling.
## Rough effort estimates
These are rough — sub-task discovery during Phase 1 will refine them.
| Phase | Estimate (focused work) |
| --- | --- |
| Phase 1 (subscribe + dispatch) | 23 sessions |
| Phase 2 (bot commands) | 1 session |
| Phase 3 (smart actions) | 12 sessions |
## When to start
Phase 1 work order, once you green-light it:
1. ABC extension (`base.py`) + tests for the new `subscribe` shape on a fake
provider.
2. WS client + parser + unit tests against recorded HA fixtures (no live HA
needed for these).
3. Subscription manager in `services/watcher.py` — integration test with the
fake provider from step 1.
4. Templates (en + ru), capabilities entry, validator whitelist.
5. Server: seeds, sample context, template_configs entry.
6. Frontend: descriptor, locale keys, i18n.
7. End-to-end smoke test against a real HA instance (homelab).
Backend restart cadence per the project rule: after **every** change in
`packages/server/` or `packages/core/`.
## Decision log
- **2026-05-13** — Plan drafted. Ingest mode = WebSocket (chosen over
webhook for future-proofing toward Phases 2 + 3). Not started.