feat: Home Assistant provider — WebSocket subscription + bot commands

Adds Home Assistant as a service provider with two coordinated surfaces:

Notifications (subscription):
- Long-lived WebSocket client (aiohttp ws_connect) with auth handshake,
  exponential-backoff reconnect, bounded event queue, and area-registry
  enrichment cached per (re)connect
- ServiceProvider ABC gains an optional `subscribe()` method for push-style
  providers; HomeAssistantServiceProvider uses it via a per-provider
  supervisor task started in the FastAPI lifespan
- 4 event types (state_changed, automation_triggered, call_service,
  event_fired), 4 default Jinja templates (en + ru), HA-specific
  tracker filters (entity_glob, domain_allowlist, exact entity ids)
- Extracted shared dispatch pipeline (api/webhooks.py → services/
  event_dispatch.py) so subscription and webhook ingest share the same
  event_log + deferred-dispatch + quiet-hours code path

Bot commands:
- /status, /entities [glob], /state <entity_id>, /areas
- Multi-command WS session so /status and /areas cost one handshake
- Sensitive-attribute blocklist (camera access_token, entity_picture, etc.)
  and 30-attribute cap to keep /state output safe and within Telegram's
  message size limit
- Error-message redaction strips URL userinfo before surfacing to chat

Frontend:
- HA descriptor with toggle ConfigField type (new) and tag-input filter
  mode for free-text glob/domain lists (new TagInput component)
- 15 command slots + 4 notification slots wired into the existing
  template-config UI
2026-05-13 14:31:56 +03:00
parent 90f958bdc6
commit 22127e2a59
79 changed files with 4042 additions and 210 deletions
# Home Assistant Provider — Implementation Plan
> Status: **planned, not started**. Sequencing: third item on the backlog
> (see [feature-backlog.md](feature-backlog.md)).
> Last updated: 2026-05-13.
## Decision: WebSocket subscription, not webhook
We considered three ingest modes (webhook automation, WebSocket subscription,
hybrid). The WebSocket route is chosen as the architectural foundation because
the medium-term roadmap forces it anyway:
| Phase | Capability | Needs API access? |
| --- | --- | --- |
| 1 | Subscribe to events, emit notifications | Read (event stream) |
| 2 | Bot commands (`/state`, `/entities`, `/areas`) | Read (REST or WS get_states) |
| 3 | Smart Actions (`light.turn_on`, scene activation) | Write (call_service) |
A webhook-only Phase 1 would still need a REST client by Phase 2 and a write
path by Phase 3 — net result is two client implementations + one event
pipeline refactor. WebSocket consolidates all three phases on one connection.
**Tradeoff (be honest):** WebSocket introduces a long-lived-connection pattern
this codebase does not have yet. Reconnect logic, missed-events-on-restart
gap, and a new shape on the `ServiceProvider` ABC are real costs. Phase 1 is
**not** shippable in one short session — plan for a multi-session build.
## Provider abstraction extension
The current `ServiceProvider` ABC
([packages/core/src/notify_bridge_core/providers/base.py](../../packages/core/src/notify_bridge_core/providers/base.py))
is poll-oriented: every provider implements `poll(collection_ids, state) →
(events, new_state)`. Webhook providers (Gitea, Planka, Webhook) satisfy this
by no-op'ing `poll` and shoving events in via `api/webhooks.py` instead.
Home Assistant fits neither cleanly. The plan:
1. Add an **optional** `async subscribe(emit) → None` method on the base ABC.
Default implementation raises `NotImplementedError`. Polling providers do
not override it. The scheduler / lifecycle layer (currently `services/watcher.py`)
gains a "subscription manager" branch that, for any provider whose class
overrides `subscribe`, starts a long-lived task instead of registering
a polling job.
2. `emit` is a callback `(event: ServiceEvent) → None` provided by the
subscription manager — it routes events to the dispatcher exactly like the
webhook handler does today. Keeping the dispatch path unchanged is the
point of this design.
3. Reconnect lives **inside** `subscribe`: the method is expected to be a
`while not cancelled: try connect; on drop, sleep with backoff, retry`
loop. The manager cancels the task on shutdown via the cooperative cancel
token used elsewhere.
This is a small, additive change to one ABC. No existing provider is
modified.
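
A minimal sketch of the extension, built from simplified stand-ins. `ServiceEvent` is a two-field dataclass rather than the real core model, and the `poll` signature is abbreviated. `overrides_subscribe` and `start_subscriptions` are hypothetical names for the manager branch. The sketch shows two things: the override check, and one long-lived task per push provider. Shutdown here is plain `Task.cancel()`; the real manager would hook into whatever cancel path the scheduler already uses.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable


@dataclass
class ServiceEvent:
    """Hypothetical stand-in for the core ServiceEvent model."""
    event_type: str
    data: dict[str, Any] = field(default_factory=dict)


Emit = Callable[[ServiceEvent], None]


class ServiceProvider(ABC):
    @abstractmethod
    async def poll(self, collection_ids: list[str], state: dict) -> tuple[list[ServiceEvent], dict]:
        """Existing poll-oriented contract (signature simplified here)."""

    async def subscribe(self, emit: Emit) -> None:
        """Optional push-style entry point; polling providers never override it."""
        raise NotImplementedError


def overrides_subscribe(provider: ServiceProvider) -> bool:
    # A subclass that does not define subscribe() still resolves to the
    # base-class function object, so an identity check is enough.
    return type(provider).subscribe is not ServiceProvider.subscribe


def start_subscriptions(providers: Iterable[ServiceProvider], emit: Emit) -> list[asyncio.Task]:
    """Hypothetical manager branch: one long-lived task per push provider.

    Must run inside an event loop. Cancelling a returned task raises
    CancelledError inside that provider's subscribe() loop.
    """
    return [
        asyncio.create_task(p.subscribe(emit), name=f"subscribe:{type(p).__name__}")
        for p in providers
        if overrides_subscribe(p)
    ]
```

Polling providers pass through untouched, which is what keeps the change additive.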
## Phase 1 — Subscribe + Dispatch (MVP)
### Scope
- Long-lived WebSocket connection to HA, authenticated with a long-lived
access token.
- Subscribe to the event bus with optional `event_type` filter (defaults to
`state_changed`).
- Translate HA events into `ServiceEvent` and dispatch via the existing
pipeline. Notifications go out exactly as they do today for any other
provider.
- Filter UI: entity-id glob list (e.g. `light.*`, `binary_sensor.*_motion`),
domain allowlist (e.g. `light`, `binary_sensor`), event-type allowlist.
**Hard-required** to avoid the HA firehose drowning the bridge.
- Connection test + entity listing via WS `get_states` (no REST client yet —
WS gives us both subscribe and read).
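
The entity filters can be sketched as a single predicate. This is an illustration, not the bridge's code: `entity_passes` is a hypothetical name. It applies the union rule from open question 5 across exact ids, globs, and domains. Treating "no filters configured" as "nothing passes" is an assumption that fits the hard-required stance above.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def entity_passes(
    entity_id: str,
    collection_ids: Iterable[str] = (),
    entity_globs: Iterable[str] = (),
    domain_allowlist: Iterable[str] = (),
) -> bool:
    """Union semantics: the entity passes if ANY configured filter matches.

    With every filter empty nothing matches, so an unconfigured tracker stays
    quiet instead of forwarding the whole event stream.
    """
    domain = entity_id.split(".", 1)[0]
    if entity_id in set(collection_ids):
        return True
    if domain in set(domain_allowlist):
        return True
    # fnmatchcase: case-sensitive shell-style globs ("*" also matches ".").
    return any(fnmatchcase(entity_id, pattern) for pattern in entity_globs)
```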
### Out of scope for Phase 1
- Bot commands → Phase 2.
- Service calls → Phase 3.
- Replay of events missed during disconnect (HA does not support this; we
document the gap and surface "reconnected after N seconds" in the event
log).
- Webhook-style ingestion (path-embedded token webhook receiver). If a user
prefers webhooks, we add it later as a second ingestion mode on the same
provider — out of scope for v1.
### Event types (v1)
| HA event | ServiceEvent type | Notification slot |
| --- | --- | --- |
| `state_changed` | `ha_state_changed` | `message_state_changed` |
| `automation_triggered` | `ha_automation_triggered` | `message_automation_triggered` |
| `call_service` | `ha_service_called` | `message_service_called` |
| (custom event types) | `ha_event_fired` | `message_event_fired` |
Default tracking config enables `state_changed` only — the others are loud
and opt-in.
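
The table is mechanical enough to encode directly. In this sketch (the dict and function names are hypothetical), every HA event type not listed falls through to `ha_event_fired`. The slot name is derived from the type by swapping the `ha_` prefix for `message_`, which holds for all four rows.

```python
HA_TO_SERVICE_EVENT: dict[str, str] = {
    "state_changed": "ha_state_changed",
    "automation_triggered": "ha_automation_triggered",
    "call_service": "ha_service_called",
}
DEFAULT_EVENT_TYPES = frozenset({"state_changed"})  # the others are opt-in


def classify(ha_event_type: str) -> tuple[str, str]:
    """Return (ServiceEvent type, notification slot) for an HA event type."""
    service_type = HA_TO_SERVICE_EVENT.get(ha_event_type, "ha_event_fired")
    slot = "message_" + service_type.removeprefix("ha_")
    return service_type, slot
```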
### Context variables exposed to templates
Pulled directly from HA's `state_changed` payload, normalized:
- `entity_id`: e.g. `light.kitchen`
- `friendly_name`: `attributes.friendly_name`, falling back to `entity_id`
- `domain`: derived from `entity_id` (the part before the dot)
- `old_state`: `data.old_state.state` (`None` when the entity was just added)
- `new_state`: `data.new_state.state` (`None` when the entity was removed)
- `attributes`: dict of new-state attributes (raw)
- `device_class`: `attributes.device_class`, if present
- `area`: area name, if resolvable (best effort; state attributes do not
normally carry it, so it needs registry lookups such as
`config/area_registry/list`; see "Risks / open questions (Phase 1)", item 2)
- `last_changed`, `last_updated`: ISO timestamps
- For non-`state_changed` events: `event_type`, `event_data` (full dict)
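
The normalization can be sketched under one assumption about the input. The function takes the inner `event` object of a WS `event` message for a `state_changed` event. Its `data` carries `entity_id`, `old_state`, and `new_state`. Each state object has `state`, `attributes`, `last_changed`, and `last_updated`, and either state may be `null`. `build_state_context` is a hypothetical name, and so is `area_by_entity`, a precomputed entity-id-to-area-name map.

```python
from typing import Any, Optional


def build_state_context(event: dict[str, Any],
                        area_by_entity: Optional[dict[str, str]] = None) -> dict[str, Any]:
    """Normalize an HA state_changed event into template variables (sketch)."""
    data = event.get("data") or {}
    entity_id: str = data["entity_id"]
    old = data.get("old_state") or {}  # None when the entity was just added
    new = data.get("new_state") or {}  # None when the entity was removed
    attributes = new.get("attributes") or {}
    return {
        "entity_id": entity_id,
        "friendly_name": attributes.get("friendly_name") or entity_id,
        "domain": entity_id.split(".", 1)[0],
        "old_state": old.get("state"),
        "new_state": new.get("state"),
        "attributes": attributes,
        "device_class": attributes.get("device_class"),
        "area": (area_by_entity or {}).get(entity_id),  # best effort
        "last_changed": new.get("last_changed"),
        "last_updated": new.get("last_updated"),
    }
```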
### File touch map (Phase 1)
**Core** (`packages/core/src/notify_bridge_core/`)
| Path | Action | Notes |
| --- | --- | --- |
| `providers/base.py` | Modify | Add optional `subscribe(emit)` ABC method (default `NotImplementedError`); add `HOME_ASSISTANT = "home_assistant"` to `ServiceProviderType` |
| `providers/capabilities.py` | Modify | Add `HOME_ASSISTANT_CAPABILITIES` + register |
| `providers/home_assistant/__init__.py` | Create | Export + register template variables |
| `providers/home_assistant/client.py` | Create | WebSocket client (auth, subscribe, get_states, call_service stub) |
| `providers/home_assistant/event_parser.py` | Create | HA event dict → `ServiceEvent` |
| `providers/home_assistant/provider.py` | Create | Class with `connect`, `disconnect`, `subscribe`, `list_collections` (entity list), `get_available_variables`, `get_provider_config_schema`, `test_connection`. `poll` raises NotImplementedError. |
| `templates/defaults/en/home_assistant_*.jinja2` | Create | 4 slot templates |
| `templates/defaults/ru/home_assistant_*.jinja2` | Create | 4 slot templates |
| `templates/defaults/loader.py` | Modify | Add to `PROVIDER_SLOT_FILE_MAP` |
| `templates/command_defaults/loader.py` | Modify | Stub entry — empty commands list for now |
| `templates/context.py` | Modify | HA context builder |
| `templates/validator.py` | Modify | Whitelist HA variable names |
**Server** (`packages/server/src/notify_bridge_server/`)
| Path | Action | Notes |
| --- | --- | --- |
| `services/watcher.py` *(or scheduler / lifecycle module that hosts polling)* | Modify | Add subscription-manager branch — for providers whose class overrides `subscribe`, start/stop long-running task instead of polling |
| `services/scheduler.py` | Verify | Confirm we cancel HA subscription on shutdown (graceful_shutdown_seconds path) |
| `api/template_configs.py` | Modify | `get_template_variables()` entry |
| `api/command_template_configs.py` | Modify | Sample ctx (minimal for Phase 1 — no commands) |
| `services/sample_context.py` | Modify | `_SAMPLE_CONTEXT["home_assistant"]` |
| `database/seeds.py` | Modify | Seed notification templates + default tracking config |
**Frontend** (`frontend/src/`)
| Path | Action | Notes |
| --- | --- | --- |
| `lib/providers/home-assistant.ts` | Create | Descriptor per CLAUDE.md rule 11 |
| `lib/providers/index.ts` | Modify | Register descriptor |
| `lib/locales/en.json` | Modify | `providers.typeHomeAssistant`, `gridDesc.providerHomeAssistant` |
| `lib/locales/ru.json` | Modify | Same |
**Tests**
| Path | Action |
| --- | --- |
| `packages/core/tests/providers/test_home_assistant_parser.py` | Create — HA payload → `ServiceEvent` |
| `packages/core/tests/providers/test_home_assistant_client.py` | Create — WS auth, subscribe, reconnect (use a fake server) |
| `packages/server/tests/test_home_assistant_subscription.py` | Create — subscription manager lifecycle, event flows through dispatcher |
### Frontend descriptor essentials
```text
type: "home_assistant"
defaultName: "Home Assistant"
icon: "home" (consider Lucide icon; HA logo if a custom asset exists)
hasUrl: true // base URL of HA (used to derive WS URL)
configFields:
- url: http(s)://homeassistant.local:8123
- access_token: long-lived access token (required)
- allowed_event_types: comma-separated, defaults to "state_changed"
eventFields: 4 checkboxes (state_changed, automation_triggered,
call_service, event_fired)
extraTrackingFields:
- entity_glob: tag input ("light.*", "binary_sensor.*_motion")
- domain_allowlist: tag input
collectionMeta: { label: "Entities", icon: "..." }
webhookBased: false // we are NOT webhook based
```
The WS URL is derived from the base URL: `wss://{host}/api/websocket` when
HA is served over `https`, `ws://{host}/api/websocket` for plain `http`
(`{host}` includes the port). Document this in the UI hint.
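
One way to do the derivation, as a hedged sketch. `ws_url_from_base` is a hypothetical helper. It maps `http` to `ws` and `https` to `wss`, and keeps host and port. Going beyond the plan, it also preserves any path prefix, so that HA served under a reverse-proxy subpath still resolves; that part is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def ws_url_from_base(base_url: str) -> str:
    """Derive HA's WebSocket endpoint from the configured base URL."""
    parts = urlsplit(base_url.strip())
    ws_scheme = {"http": "ws", "https": "wss"}.get(parts.scheme.lower())
    if ws_scheme is None or not parts.netloc:
        raise ValueError(f"expected an http(s) base URL, got {base_url!r}")
    # Keep host:port; keep any path prefix (assumption: reverse-proxy subpath).
    path = parts.path.rstrip("/") + "/api/websocket"
    return urlunsplit((ws_scheme, parts.netloc, path, "", ""))
```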
### Auth model
- **Long-lived access token** from HA (Profile → Long-Lived Access Tokens).
- Stored encrypted at rest via the same path the other providers use for
secrets (the bridge already has a secret-encryption helper — verify the
exact module name during implementation).
- WS auth handshake: connect → server sends `auth_required` → client sends
`{type: "auth", access_token: "..."}` → server replies `auth_ok` or
`auth_invalid`.
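
The handshake is short enough to sketch in full. The sketch assumes a socket object with awaitable `receive_json()` and `send_json()`; aiohttp's client WebSocket provides both. `authenticate` and `HAAuthError` are hypothetical names, and the 10 s per-message timeout is a placeholder. On success it returns the `ha_version` that HA includes in `auth_ok`.

```python
import asyncio
from typing import Any, Protocol


class JsonWebSocket(Protocol):
    """Anything with awaitable receive_json()/send_json(), e.g. aiohttp's ClientWebSocketResponse."""
    async def receive_json(self) -> Any: ...
    async def send_json(self, data: Any) -> None: ...


class HAAuthError(Exception):
    """Raised when the HA auth handshake fails (hypothetical exception type)."""


async def authenticate(ws: JsonWebSocket, access_token: str, timeout: float = 10.0) -> str:
    """Run HA's auth_required -> auth -> auth_ok handshake and return ha_version."""
    hello = await asyncio.wait_for(ws.receive_json(), timeout)
    if hello.get("type") != "auth_required":
        raise HAAuthError(f"expected auth_required, got {hello.get('type')!r}")
    await ws.send_json({"type": "auth", "access_token": access_token})
    reply = await asyncio.wait_for(ws.receive_json(), timeout)
    kind = reply.get("type")
    if kind == "auth_ok":
        return reply.get("ha_version", "")
    if kind == "auth_invalid":
        # Surface HA's message, never the token itself.
        raise HAAuthError(reply.get("message", "auth_invalid"))
    raise HAAuthError(f"unexpected reply during auth: {kind!r}")
```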
### Risks / open questions (Phase 1)
1. **Reconnect strategy.** Exponential backoff capped at 60s, jittered.
On reconnect, log a `connection_restored_after` event so the UI can
surface the gap. Document that HA does not support event replay.
2. **Area registry.** Pulling `area_id` for entities requires a separate
`config/area_registry/list` WS call. Decision needed: fetch once on
connect and cache, refetch on `area_registry_updated` event, or skip
`area` from the context entirely in v1. Recommendation: fetch on
connect, refetch on `area_registry_updated`, skip if it fails (best-effort).
3. **TLS verification for self-signed HA.** Homelab users often have
self-signed certs. Need a `verify_tls: bool` config field (default true)
and a clear warning when disabled. Same pattern as
`NOTIFY_BRIDGE_ALLOW_PRIVATE_URLS` for the SSRF case.
4. **Backpressure.** HA's `state_changed` can fire hundreds of events per
minute in a busy install. The subscription manager must drop or coalesce
if the dispatcher backlog grows beyond a threshold. Cheapest cut: a
bounded `asyncio.Queue` between WS receiver and dispatch — `put_nowait`
with overflow counter visible in the event log.
5. **Entity filter precedence.** The tracking config already has
`collection_ids` (a list of entity ids), and we also want `entity_glob` +
`domain_allowlist`. Decision: if `collection_ids` and the glob/domain filters
are both set, union them (any match passes). Document this prominently in
the tracker UI.
6. **Library choice.** `hass-client` is a Python WS client maintained by the
HA community; the alternative is rolling our own with `websockets`. The
latter is ~150 LOC and avoids depending on a third-party HA client.
Recommendation: roll our own. Re-evaluate if Phase 3 needs registry-aware
service calls.
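
Items 1 and 4 are small enough to sketch, with a few assumptions. "Jittered" is implemented here as full jitter: a uniform draw between 0 and the capped exponential delay. The 1 s base and the queue size of 1000 are placeholders. `backoff_delay` and `BoundedEventQueue` are hypothetical names. On overflow the queue rejects the newest event and counts it, which is the `put_nowait` plus overflow-counter cut described in item 4.

```python
import asyncio
import random
from typing import Any, Callable


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter backoff: rng() * min(cap, base * 2**attempt).

    attempt is clamped so 2**attempt never overflows the float conversion.
    """
    ceiling = min(cap, base * (2 ** min(attempt, 32)))
    return rng() * ceiling


class BoundedEventQueue:
    """Bounded buffer between the WS receiver and dispatch; drops on overflow."""

    def __init__(self, maxsize: int = 1000) -> None:
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0  # overflow counter to surface in the event log

    def offer(self, event: Any) -> bool:
        """Non-blocking enqueue; never stalls the WS receive loop."""
        try:
            self._queue.put_nowait(event)
            return True
        except asyncio.QueueFull:
            self.dropped += 1
            return False

    async def get(self) -> Any:
        return await self._queue.get()
```

Dropping keeps the receive loop from blocking, so HA does not see a stalled client during a burst. Coalescing per entity would lose less information, but it needs more bookkeeping.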
## Phase 2 — Bot Commands
Adds Telegram bot commands for HA tracking configs.
- `/status` — connection status, subscribed event count
- `/entities <glob>` — list matching entities + current state
- `/state <entity_id>` — full state + attributes for one entity
- `/areas` — area registry summary
- `/help`
These use the existing WS connection (no new client) via WS commands
`get_states`, `config/area_registry/list`. Template slots and command
template configs follow the same pattern as Gitea/Planka — see
[CLAUDE.md](../../CLAUDE.md) rule 7 / rule 11 for the full set of locations
that must be updated.
Out of scope for Phase 2: any command that mutates HA state.
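
Commands such as `/status` and `/areas` may issue several WS commands on one connection, so replies have to be matched to requests. HA tags each command with a client-chosen `id` and answers with a `result` message that carries the same `id`, so a small correlator is enough. This sketch uses hypothetical names. It assumes a single id counter per connection, shared with `subscribe_events`, so ids stay unique.

```python
import asyncio
import itertools
from typing import Any


class CommandCorrelator:
    """Matches HA WebSocket `result` replies to the commands that requested them.

    Hypothetical helper. Ids come from one per-connection counter, so they are
    unique and increasing across every command sent on that socket.
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._pending: dict[int, asyncio.Future] = {}

    def build(self, command_type: str, **fields: Any) -> tuple[dict[str, Any], asyncio.Future]:
        """Create the outgoing message plus a future for its result (needs a running loop)."""
        msg_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[msg_id] = future
        return {"id": msg_id, "type": command_type, **fields}, future

    def feed(self, message: dict[str, Any]) -> bool:
        """Resolve the matching future; return False for events or unknown ids."""
        if message.get("type") != "result":
            return False
        future = self._pending.pop(message.get("id"), None)
        if future is None or future.done():
            return False
        if message.get("success"):
            future.set_result(message.get("result"))
        else:
            error = message.get("error") or {}
            future.set_exception(RuntimeError(f"{error.get('code')}: {error.get('message')}"))
        return True
```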
## Phase 3 — Smart Actions (Service Calls)
A new action descriptor in the existing Smart Actions framework
([packages/core/src/notify_bridge_core/providers/actions.py](../../packages/core/src/notify_bridge_core/providers/actions.py)).
- Action type: `ha_call_service`
- Rule: trigger event → service call (e.g. "on motion event in
`binary_sensor.front_door` → call `light.turn_on` on `light.porch`")
- Executor uses the existing WS connection to send `call_service`.
This phase is gated behind explicit per-target authorization in the UI — HA
service calls can do anything the access token allows, including unlocking
doors. Default state: **disabled**, with a clear consent flow when enabling.
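
A sketch of the gate in front of the executor, under stated assumptions. `ServiceCallGrant` is a hypothetical consent record keyed by `(domain, service, entity_id)`, and it is disabled by default. `build_call_service` produces the HA `call_service` message (with `target.entity_id`) only after the grant check passes. Sending the message is left to the existing client.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ServiceCallGrant:
    """Hypothetical per-target consent record; disabled with no grants by default."""
    enabled: bool = False
    allowed: set[tuple[str, str, str]] = field(default_factory=set)  # (domain, service, entity_id)


def build_call_service(msg_id: int, domain: str, service: str, entity_id: str,
                       grant: ServiceCallGrant,
                       service_data: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Return a call_service message, or raise if this exact target is not authorized."""
    if not grant.enabled:
        raise PermissionError("Home Assistant service calls are disabled")
    if (domain, service, entity_id) not in grant.allowed:
        raise PermissionError(f"{domain}.{service} on {entity_id} is not authorized")
    message: dict[str, Any] = {
        "id": msg_id,
        "type": "call_service",
        "domain": domain,
        "service": service,
        "target": {"entity_id": entity_id},
    }
    if service_data:
        message["service_data"] = dict(service_data)
    return message
```

Keying the grant on the exact triple means that consenting to `light.turn_on` on the porch does not also allow `lock.unlock` on the front door.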
## Rough effort estimates
These are rough — sub-task discovery during Phase 1 will refine them.
| Phase | Estimate (focused work) |
| --- | --- |
| Phase 1 (subscribe + dispatch) | 2–3 sessions |
| Phase 2 (bot commands) | 1 session |
| Phase 3 (smart actions) | 1–2 sessions |
## When to start
Phase 1 work order, once you green-light it:
1. ABC extension (`base.py`) + tests for the new `subscribe` shape on a fake
provider.
2. WS client + parser + unit tests against recorded HA fixtures (no live HA
needed for these).
3. Subscription manager in `services/watcher.py` — integration test with the
fake provider from step 1.
4. Templates (en + ru), capabilities entry, validator whitelist.
5. Server: seeds, sample context, template_configs entry.
6. Frontend: descriptor, locale keys, i18n.
7. End-to-end smoke test against a real HA instance (homelab).
Backend restart cadence per the project rule: after **every** change in
`packages/server/` or `packages/core/`.
## Decision log
- **2026-05-13** — Plan drafted. Ingest mode = WebSocket (chosen over
webhook for future-proofing toward Phases 2 + 3). Not started.