notify-bridge/.claude/docs/feature-home-assistant.md
alexei.dolgolyov 22127e2a59 feat: Home Assistant provider — WebSocket subscription + bot commands
Adds Home Assistant as a service provider with two coordinated surfaces:

Notifications (subscription):
- Long-lived WebSocket client (aiohttp ws_connect) with auth handshake,
  exponential-backoff reconnect, bounded event queue, and area-registry
  enrichment cached per (re)connect
- ServiceProvider ABC gains an optional `subscribe()` method for push-style
  providers; HomeAssistantServiceProvider uses it via a per-provider
  supervisor task started in the FastAPI lifespan
- 4 event types (state_changed, automation_triggered, call_service,
  event_fired), 4 default Jinja templates (en + ru), HA-specific
  tracker filters (entity_glob, domain_allowlist, exact entity ids)
- Extracted shared dispatch pipeline (api/webhooks.py → services/
  event_dispatch.py) so subscription and webhook ingest share the same
  event_log + deferred-dispatch + quiet-hours code path

Bot commands:
- /status, /entities [glob], /state <entity_id>, /areas
- Multi-command WS session so /status and /areas cost one handshake
- Sensitive-attribute blocklist (camera access_token, entity_picture, etc.)
  and 30-attribute cap to keep /state output safe and within Telegram's
  message size
- Error-message redaction strips URL userinfo before surfacing to chat

Frontend:
- HA descriptor with toggle ConfigField type (new) and tag-input filter
  mode for free-text glob/domain lists (new TagInput component)
- 15 command slots + 4 notification slots wired into the existing
  template-config UI
2026-05-13 14:31:56 +03:00


Home Assistant Provider — Implementation Plan

Status: planned, not started. Sequencing: third item on the backlog (see feature-backlog.md). Last updated: 2026-05-13.

Decision: WebSocket subscription, not webhook

We considered three ingest modes (webhook automation, WebSocket subscription, hybrid). The WebSocket route is chosen as the architectural foundation because the medium-term roadmap forces it anyway:

| Phase | Capability | Needs API access? |
|---|---|---|
| 1 | Subscribe to events, emit notifications | Read (event stream) |
| 2 | Bot commands (/state, /entities, /areas) | Read (REST or WS get_states) |
| 3 | Smart Actions (light.turn_on, scene activation) | Write (call_service) |

A webhook-only Phase 1 would still need a REST client by Phase 2 and a write path by Phase 3 — net result is two client implementations + one event pipeline refactor. WebSocket consolidates all three phases on one connection.

Tradeoff (be honest): WebSocket introduces a long-lived-connection pattern this codebase does not have yet. Reconnect logic, missed-events-on-restart gap, and a new shape on the ServiceProvider ABC are real costs. Phase 1 is not shippable in one short session — plan for a multi-session build.

Provider abstraction extension

The current ServiceProvider ABC (packages/core/src/notify_bridge_core/providers/base.py) is poll-oriented: every provider implements poll(collection_ids, state) → (events, new_state). Webhook providers (Gitea, Planka, Webhook) satisfy this by no-op'ing poll and shoving events in via api/webhooks.py instead.

Home Assistant fits neither cleanly. The plan:

  1. Add an optional async subscribe(emit) → None method on the base ABC. Default implementation raises NotImplementedError. Polling providers do not override it. The scheduler / lifecycle layer (currently services/watcher.py) gains a "subscription manager" branch that, for any provider whose class overrides subscribe, starts a long-lived task instead of registering a polling job.
  2. emit is a callback (event: ServiceEvent) → None provided by the subscription manager — it routes events to the dispatcher exactly like the webhook handler does today. Keeping the dispatch path unchanged is the point of this design.
  3. Reconnect lives inside subscribe: the method is expected to run a "while not cancelled" loop that connects, and on a drop sleeps with backoff and retries. The manager cancels the task on shutdown via the cooperative cancel token used elsewhere.

This is a small, additive change to one ABC. No existing provider is modified.
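A minimal Python sketch of the ABC extension, assuming simplified signatures. ServiceEvent here is a stand-in dataclass. uses_subscription, backoff_delay, PollingProvider and FlakyPushProvider are hypothetical names used only for illustration. The sketch shows three things: the default subscribe raising NotImplementedError, how the subscription manager could detect an override, and the reconnect loop from item 3. That loop uses capped, full-jitter exponential backoff, with the 60s cap taken from the reconnect strategy listed under the risks.

```python
from __future__ import annotations

import abc
import asyncio
import random
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ServiceEvent:
    """Stand-in for the real ServiceEvent model (assumed shape)."""
    event_type: str
    data: dict[str, Any] = field(default_factory=dict)


# emit: (event: ServiceEvent) -> None, supplied by the subscription manager.
Emit = Callable[[ServiceEvent], None]


class ServiceProvider(abc.ABC):
    @abc.abstractmethod
    async def poll(self, collection_ids: list[str], state: dict) -> tuple[list[ServiceEvent], dict]:
        """Existing poll-oriented contract."""

    async def subscribe(self, emit: Emit) -> None:
        """Optional push-style hook; polling providers do not override it."""
        raise NotImplementedError


def uses_subscription(provider: ServiceProvider) -> bool:
    """True if the provider's class overrides subscribe (start a task, not a poll job)."""
    return type(provider).subscribe is not ServiceProvider.subscribe


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** min(attempt, 16)))


class PollingProvider(ServiceProvider):
    """Existing-style provider: implements poll, never touches subscribe."""

    async def poll(self, collection_ids, state):
        return [], state


class FlakyPushProvider(ServiceProvider):
    """Hypothetical push provider whose first connection attempts fail."""

    def __init__(self, failures: int = 2) -> None:
        self.attempts = 0
        self.failures = failures

    async def poll(self, collection_ids, state):
        raise NotImplementedError("push-only provider")

    async def _connect_and_listen(self, emit: Emit) -> None:
        self.attempts += 1
        if self.attempts <= self.failures:
            raise ConnectionError("simulated drop")
        emit(ServiceEvent("ha_state_changed", {"entity_id": "light.kitchen"}))
        await asyncio.Event().wait()  # stay "connected" until cancelled

    async def subscribe(self, emit: Emit) -> None:
        attempt = 0
        while True:  # exits only when the manager cancels the task
            try:
                await self._connect_and_listen(emit)
                attempt = 0
            except ConnectionError:
                await asyncio.sleep(backoff_delay(attempt, base=0.01))
                attempt += 1


async def demo() -> tuple[int, list[ServiceEvent]]:
    """Manager side: start the subscription task, wait for one event, then cancel."""
    events: list[ServiceEvent] = []
    provider = FlakyPushProvider()
    task = asyncio.create_task(provider.subscribe(events.append))
    while not events:
        await asyncio.sleep(0.01)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return provider.attempts, events
```

Cancellation propagates out of subscribe because the loop only catches ConnectionError. A shutdown therefore never gets mistaken for a dropped connection.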

Phase 1 — Subscribe + Dispatch (MVP)

Scope

  • Long-lived WebSocket connection to HA, authenticated with a long-lived access token.
  • Subscribe to the event bus with optional event_type filter (defaults to state_changed).
  • Translate HA events into ServiceEvent and dispatch via the existing pipeline. Notifications go out exactly as they do today for any other provider.
  • Filter UI: entity-id glob list (e.g. light.*, binary_sensor.*), domain allowlist, event-type allowlist. Hard-required to keep the HA firehose from drowning the bridge.
  • Connection test + entity listing via WS get_states (no REST client yet — WS gives us both subscribe and read).
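The following Python sketch shows one way the filter could be evaluated, under several assumptions. Exact ids, entity globs and the domain allowlist are combined as a union (any match passes), following the precedence decision in the risks list. The event-type allowlist acts as an additional gate. An empty entity filter matches nothing, which is our reading of "hard-required". entity_matches and should_dispatch are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def entity_matches(
    entity_id: str,
    exact_ids: Iterable[str] = (),
    globs: Iterable[str] = (),
    domains: Iterable[str] = (),
) -> bool:
    """Union semantics: an exact id, glob, or domain match lets the entity through."""
    domain = entity_id.split(".", 1)[0]
    return (
        entity_id in set(exact_ids)
        # fnmatchcase keeps matching case-sensitive on every OS
        or any(fnmatchcase(entity_id, pattern) for pattern in globs)
        or domain in set(domains)
    )


def should_dispatch(
    event_type: str,
    entity_id: str,
    allowed_event_types: Iterable[str] = ("state_changed",),
    **entity_filters: Iterable[str],
) -> bool:
    """Event-type allowlist first, then the entity filter."""
    if event_type not in set(allowed_event_types):
        return False
    return entity_matches(entity_id, **entity_filters)
```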

Out of scope for Phase 1

  • Bot commands → Phase 2.
  • Service calls → Phase 3.
  • Replay of events missed during disconnect (HA does not support this; we document the gap and surface "reconnected after N seconds" in the event log).
  • Webhook-style ingestion (path-embedded token webhook receiver). If a user prefers webhooks, we add it later as a second ingestion mode on the same provider — out of scope for v1.

Event types (v1)

| HA event | ServiceEvent type | Notification slot |
|---|---|---|
| state_changed | ha_state_changed | message_state_changed |
| automation_triggered | ha_automation_triggered | message_automation_triggered |
| call_service | ha_service_called | message_service_called |
| (custom event types) | ha_event_fired | message_event_fired |

Default tracking config enables state_changed only — the others are loud and opt-in.
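The event-type table reduces to a lookup with a fallback for custom event types. This Python sketch encodes it; classify_ha_event, EVENT_TYPE_MAP, FALLBACK and DEFAULT_ENABLED are hypothetical names, not existing code.

```python
from __future__ import annotations

# HA event_type -> (ServiceEvent type, notification slot), per the event-type table.
EVENT_TYPE_MAP: dict[str, tuple[str, str]] = {
    "state_changed": ("ha_state_changed", "message_state_changed"),
    "automation_triggered": ("ha_automation_triggered", "message_automation_triggered"),
    "call_service": ("ha_service_called", "message_service_called"),
}
# Any other (custom) event type falls through to the generic slot.
FALLBACK: tuple[str, str] = ("ha_event_fired", "message_event_fired")
# Only state_changed is on by default; the others are opt-in.
DEFAULT_ENABLED: frozenset[str] = frozenset({"state_changed"})


def classify_ha_event(ha_event_type: str) -> tuple[str, str]:
    return EVENT_TYPE_MAP.get(ha_event_type, FALLBACK)
```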

Context variables exposed to templates

Pulled directly from HA's state_changed payload, normalized:

  • entity_id: e.g. light.kitchen
  • friendly_name: attributes.friendly_name, falling back to entity_id
  • domain: derived from entity_id (the part before the dot)
  • old_state: data.old_state.state (old_state is null when the entity is newly added)
  • new_state: data.new_state.state (new_state is null when the entity is removed)
  • attributes: dict of new-state attributes (raw)
  • device_class: attributes.device_class, if present
  • area: the entity's area, if known (best effort; HA exposes areas through its registries rather than state attributes, which costs extra WS calls such as config/area_registry/list; see "Open questions")
  • last_changed, last_updated: ISO timestamps
  • For non-state_changed events: event_type, event_data (full dict)
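The following Python sketch shows how the context builder could normalize these fields. It assumes the input is the "event" object of an HA WebSocket event message (event_type plus data with entity_id, old_state and new_state). It also assumes area names come from a registry-derived cache passed in as a plain dict. build_ha_context and areas_by_entity are hypothetical names.

```python
from __future__ import annotations

from typing import Any, Mapping


def build_ha_context(
    event: Mapping[str, Any],
    areas_by_entity: Mapping[str, str] | None = None,
) -> dict[str, Any]:
    """Normalize an HA event object into template variables."""
    event_type = event.get("event_type", "")
    data = event.get("data") or {}
    if event_type != "state_changed":
        return {"event_type": event_type, "event_data": dict(data)}

    entity_id = data["entity_id"]
    old = data.get("old_state") or {}  # None when the entity was just added
    new = data.get("new_state") or {}  # None when the entity was removed
    attrs = dict(new.get("attributes") or {})

    ctx: dict[str, Any] = {
        "entity_id": entity_id,
        "friendly_name": attrs.get("friendly_name") or entity_id,
        "domain": entity_id.split(".", 1)[0],
        "old_state": old.get("state"),
        "new_state": new.get("state"),
        "attributes": attrs,
        "last_changed": new.get("last_changed"),
        "last_updated": new.get("last_updated"),
    }
    if "device_class" in attrs:
        ctx["device_class"] = attrs["device_class"]
    # Best effort: area comes from a registry-derived cache (hypothetical), not attributes.
    area = (areas_by_entity or {}).get(entity_id)
    if area is not None:
        ctx["area"] = area
    return ctx
```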

File touch map (Phase 1)

Core (packages/core/src/notify_bridge_core/)

| Path | Action | Notes |
|---|---|---|
| `providers/base.py` | Modify | Add optional subscribe(emit) ABC method (default NotImplementedError); add HOME_ASSISTANT = "home_assistant" to ServiceProviderType |
| `providers/capabilities.py` | Modify | Add HOME_ASSISTANT_CAPABILITIES + register |
| `providers/home_assistant/__init__.py` | Create | Export + register template variables |
| `providers/home_assistant/client.py` | Create | WebSocket client (auth, subscribe, get_states, call_service stub) |
| `providers/home_assistant/event_parser.py` | Create | HA event dict → ServiceEvent |
| `providers/home_assistant/provider.py` | Create | Class with connect, disconnect, subscribe, list_collections (entity list), get_available_variables, get_provider_config_schema, test_connection. poll raises NotImplementedError. |
| `templates/defaults/en/home_assistant_*.jinja2` | Create | 4 slot templates |
| `templates/defaults/ru/home_assistant_*.jinja2` | Create | 4 slot templates |
| `templates/defaults/loader.py` | Modify | Add to PROVIDER_SLOT_FILE_MAP |
| `templates/command_defaults/loader.py` | Modify | Stub entry (empty commands list for now) |
| `templates/context.py` | Modify | HA context builder |
| `templates/validator.py` | Modify | Whitelist HA variable names |

Server (packages/server/src/notify_bridge_server/)

| Path | Action | Notes |
|---|---|---|
| `services/watcher.py` (or whichever scheduler/lifecycle module hosts polling) | Modify | Add subscription-manager branch: for providers whose class overrides subscribe, start/stop a long-running task instead of polling |
| `services/scheduler.py` | Verify | Confirm the HA subscription is cancelled on shutdown (graceful_shutdown_seconds path) |
| `api/template_configs.py` | Modify | get_template_variables() entry |
| `api/command_template_configs.py` | Modify | Sample ctx (minimal for Phase 1, no commands) |
| `services/sample_context.py` | Modify | `_SAMPLE_CONTEXT["home_assistant"]` |
| `database/seeds.py` | Modify | Seed notification templates + default tracking config |

Frontend (frontend/src/)

| Path | Action | Notes |
|---|---|---|
| `lib/providers/home-assistant.ts` | Create | Descriptor per CLAUDE.md rule 11 |
| `lib/providers/index.ts` | Modify | Register descriptor |
| `lib/locales/en.json` | Modify | providers.typeHomeAssistant, gridDesc.providerHomeAssistant |
| `lib/locales/ru.json` | Modify | Same |

Tests

| Path | Action | Notes |
|---|---|---|
| `packages/core/tests/providers/test_home_assistant_parser.py` | Create | HA payload → ServiceEvent |
| `packages/core/tests/providers/test_home_assistant_client.py` | Create | WS auth, subscribe, reconnect (use a fake server) |
| `packages/server/tests/test_home_assistant_subscription.py` | Create | Subscription manager lifecycle; events flow through the dispatcher |

Frontend descriptor essentials

type: "home_assistant"
defaultName: "Home Assistant"
icon: "home" (consider Lucide icon; HA logo if a custom asset exists)
hasUrl: true            // base URL of HA (used to derive WS URL)
configFields:
  - url:                http(s)://homeassistant.local:8123
  - access_token:       long-lived access token (required)
  - allowed_event_types: comma-separated, defaults to "state_changed"
eventFields: 4 checkboxes (state_changed, automation_triggered,
                           call_service, event_fired)
extraTrackingFields:
  - entity_glob: tag input ("light.*", "binary_sensor.*_motion")
  - domain_allowlist: tag input
collectionMeta: { label: "Entities", icon: "..." }
webhookBased: false     // we are NOT webhook based

The WS URL is derived from the base URL: wss://{host}/api/websocket for an https base URL, ws://{host}/api/websocket for plain-http HA. Document this in the UI hint.
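A standard-library Python sketch of that derivation. Two behaviours are assumptions here: any path prefix is kept, which matters for HA behind a reverse proxy under a sub-path, and non-http(s) input is rejected. derive_ws_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def derive_ws_url(base_url: str) -> str:
    """Map an HA base URL to its WebSocket endpoint: https -> wss, http -> ws."""
    parts = urlsplit(base_url.strip())
    scheme = {"https": "wss", "http": "ws"}.get(parts.scheme.lower())
    if scheme is None or not parts.netloc:
        raise ValueError(f"expected an http(s) base URL, got {base_url!r}")
    path = parts.path.rstrip("/") + "/api/websocket"
    # Query and fragment are dropped; the token travels in the auth message, not the URL.
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```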

Auth model

  • Long-lived access token from HA (Profile → Long-Lived Access Tokens).
  • Stored encrypted at rest via the same path the other providers use for secrets (the bridge already has a secret-encryption helper — verify the exact module name during implementation).
  • WS auth handshake: connect → server sends auth_required → client sends {type: "auth", access_token: "..."} → server replies auth_ok or auth_invalid.
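The Python sketch below covers the handshake and the first post-auth commands without tying them to a socket library. The message types (auth_required, auth, auth_ok, auth_invalid, subscribe_events, get_states) follow HA's WebSocket API. Several pieces are assumptions for illustration: the recv/send callables, HAAuthError and CommandBuilder are hypothetical. In the demo, the scripted "server" replies and the version string are fake data.

```python
from __future__ import annotations

import asyncio
import itertools
from typing import Any, Awaitable, Callable, Dict

Message = Dict[str, Any]
Recv = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]


class HAAuthError(Exception):
    """Raised on auth_invalid or an unexpected handshake message."""


async def authenticate(recv: Recv, send: Send, access_token: str) -> str:
    """Run the HA auth handshake; return the server's ha_version on success."""
    hello = await recv()
    if hello.get("type") != "auth_required":
        raise HAAuthError(f"expected auth_required, got {hello.get('type')!r}")
    await send({"type": "auth", "access_token": access_token})
    reply = await recv()
    if reply.get("type") == "auth_ok":
        return reply.get("ha_version", "")
    raise HAAuthError(reply.get("message") or f"handshake failed: {reply.get('type')!r}")


class CommandBuilder:
    """After auth, each command carries an id that must increase per connection."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def subscribe_events(self, event_type: str | None = None) -> Message:
        msg: Message = {"id": next(self._ids), "type": "subscribe_events"}
        if event_type:
            msg["event_type"] = event_type  # omit to receive every event type
        return msg

    def get_states(self) -> Message:
        return {"id": next(self._ids), "type": "get_states"}


async def demo(server_replies: list[Message]) -> tuple[str, list[Message]]:
    """Drive the handshake against scripted replies instead of a real socket."""
    inbox: asyncio.Queue = asyncio.Queue()
    for msg in server_replies:
        inbox.put_nowait(msg)
    sent: list[Message] = []

    async def send(msg: Message) -> None:
        sent.append(msg)

    version = await authenticate(inbox.get, send, "TOKEN")
    return version, sent
```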

Risks / open questions (Phase 1)

  1. Reconnect strategy. Exponential backoff capped at 60s, jittered. On reconnect, log a connection_restored_after event so the UI can surface the gap. Document that HA does not support event replay.
  2. Area registry. Pulling area_id for entities requires a separate config/area_registry/list WS call. Decision needed: fetch once on connect and cache, refetch on area_registry_updated event, or skip area from the context entirely in v1. Recommendation: fetch on connect, refetch on area_registry_updated, skip if it fails (best-effort).
  3. TLS verification for self-signed HA. Homelab users often have self-signed certs. Need a verify_tls: bool config field (default true) and a clear warning when disabled. Same pattern as NOTIFY_BRIDGE_ALLOW_PRIVATE_URLS for the SSRF case.
  4. Backpressure. HA's state_changed can fire hundreds of events per minute in a busy install. The subscription manager must drop or coalesce if the dispatcher backlog grows beyond a threshold. Cheapest cut: a bounded asyncio.Queue between WS receiver and dispatch — put_nowait with overflow counter visible in the event log.
  5. Entity filter precedence. Tracking-config has collection_ids (entity_id list) and we want entity_glob + domain_allowlist. Decision: if both collection_ids and globs are set, union them (any match passes). Documented prominently in the tracker UI.
  6. Library choice. hass-client is a Python WS client maintained by the HA community; alternative is rolling our own with websockets. The latter is ~150 LOC and has no external dependency surface. Recommendation: roll our own. Re-evaluate if Phase 3 needs registry-aware service calls.
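The Python sketch below illustrates the "cheapest cut" for backpressure in risk 4: a bounded asyncio.Queue that the WS receiver feeds with put_nowait, plus an overflow counter. The overflow policy shown here (drop the newest event and count it) is an assumption, as are the class and method names. Logging the counter is left to the caller.

```python
from __future__ import annotations

import asyncio
from typing import Any


class BoundedEventBuffer:
    """Bounded queue between the WS receiver and dispatch.

    Overflow policy (assumed): drop the newest event and count it, so the
    receiver never blocks and the count can be surfaced in the event log.
    """

    def __init__(self, maxsize: int = 1000) -> None:
        self.queue: asyncio.Queue[Any] = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0

    def offer(self, event: Any) -> bool:
        """Called by the receiver for each incoming event; never blocks."""
        try:
            self.queue.put_nowait(event)
            return True
        except asyncio.QueueFull:
            self.dropped += 1
            return False

    def take_dropped(self) -> int:
        """Read and reset the overflow counter (e.g. to log it periodically)."""
        count, self.dropped = self.dropped, 0
        return count


async def demo() -> tuple[list[bool], int, list[int]]:
    buffer = BoundedEventBuffer(maxsize=2)
    accepted = [buffer.offer(i) for i in range(5)]
    drained = [buffer.queue.get_nowait() for _ in range(buffer.queue.qsize())]
    return accepted, buffer.take_dropped(), drained
```

Dropping the newest event keeps the receiver loop non-blocking, so the WebSocket keeps being read and HA does not see a stalled client. Coalescing per entity_id would be the next refinement if plain drops prove too lossy.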

Phase 2 — Bot Commands

Adds Telegram bot commands for HA tracking configs.

  • /status — connection status, subscribed event count
  • /entities <glob> — list matching entities + current state
  • /state <entity_id> — full state + attributes for one entity
  • /areas — area registry summary
  • /help

These use the existing WS connection (no new client) via WS commands get_states, config/area_registry/list. Template slots and command template configs follow the same pattern as Gitea/Planka — see CLAUDE.md rule 7 / rule 11 for the full set of locations that must be updated.

Out-of-scope for Phase 2: any command that mutates HA state.
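This Python sketch shows read-only helpers that /entities and /state could share, operating on the list returned by get_states. The two blocklisted keys (access_token, entity_picture) and the 30-attribute cap come from the implementing commit message. The following are our assumptions: treating the blocklist as a flat key set, sorting attribute names, and the helper names list_entities and safe_attributes.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from typing import Any, Iterable, Mapping

# Partial: only the keys named in the commit message; the real blocklist is longer.
SENSITIVE_ATTRIBUTES = frozenset({"access_token", "entity_picture"})
MAX_ATTRIBUTES = 30


def list_entities(states: Iterable[Mapping[str, Any]], pattern: str = "*") -> list[tuple[str, str]]:
    """/entities [glob]: sorted (entity_id, state) pairs whose id matches the glob."""
    return sorted(
        (s["entity_id"], str(s.get("state", "")))
        for s in states
        if fnmatchcase(s["entity_id"], pattern)
    )


def safe_attributes(attributes: Mapping[str, Any]) -> tuple[dict[str, Any], int]:
    """/state: drop sensitive keys, cap the rest; also return how many were cut."""
    kept = [(k, v) for k, v in sorted(attributes.items()) if k not in SENSITIVE_ATTRIBUTES]
    return dict(kept[:MAX_ATTRIBUTES]), max(0, len(kept) - MAX_ATTRIBUTES)
```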

Phase 3 — Smart Actions (Service Calls)

A new action descriptor in the existing Smart Actions framework (packages/core/src/notify_bridge_core/providers/actions.py).

  • Action type: ha_call_service
  • Rule: trigger event → service call (e.g. "on motion event in binary_sensor.front_door → call light.turn_on on light.porch")
  • Executor uses the existing WS connection to send call_service.

This phase is gated behind explicit per-target authorization in the UI — HA service calls can do anything the access token allows, including unlocking doors. Default state: disabled, with a clear consent flow when enabling.
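The following Python sketch shows how the per-target gate could sit in front of the call_service message. The message shape (domain, service, service_data, target) follows HA's WebSocket API. Everything else is hypothetical: the policy model keyed by (domain, service, entity_id), its disabled default, and the names ServiceCallPolicy, build_call_service and ServiceCallNotAuthorized.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Mapping


class ServiceCallNotAuthorized(Exception):
    pass


@dataclass
class ServiceCallPolicy:
    """Per-target consent: nothing is callable until explicitly enabled and allowed."""
    enabled: bool = False  # default state: disabled
    allowed: set[tuple[str, str, str]] = field(default_factory=set)  # (domain, service, entity_id)

    def permits(self, domain: str, service: str, entity_id: str) -> bool:
        return self.enabled and (domain, service, entity_id) in self.allowed


def build_call_service(
    msg_id: int,
    policy: ServiceCallPolicy,
    domain: str,
    service: str,
    entity_id: str,
    service_data: Mapping[str, Any] | None = None,
) -> dict[str, Any]:
    """Return an HA call_service WS message, or raise if the target is not authorized."""
    if not policy.permits(domain, service, entity_id):
        raise ServiceCallNotAuthorized(f"{domain}.{service} on {entity_id} is not authorized")
    return {
        "id": msg_id,
        "type": "call_service",
        "domain": domain,
        "service": service,
        "service_data": dict(service_data or {}),
        "target": {"entity_id": entity_id},
    }
```

Keying consent on the full (domain, service, entity_id) triple means approving light.turn_on for the porch never implicitly approves lock.unlock anywhere.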

Rough effort estimates

These are rough — sub-task discovery during Phase 1 will refine them.

| Phase | Estimate (focused work) |
|---|---|
| Phase 1 (subscribe + dispatch) | 2–3 sessions |
| Phase 2 (bot commands) | 1 session |
| Phase 3 (smart actions) | 1–2 sessions |

When to start

Phase 1 work order, once you green-light it:

  1. ABC extension (base.py) + tests for the new subscribe shape on a fake provider.
  2. WS client + parser + unit tests against recorded HA fixtures (no live HA needed for these).
  3. Subscription manager in services/watcher.py — integration test with the fake provider from step 1.
  4. Templates (en + ru), capabilities entry, validator whitelist.
  5. Server: seeds, sample context, template_configs entry.
  6. Frontend: descriptor, locale keys, i18n.
  7. End-to-end smoke test against a real HA instance (homelab).

Per the project rule, restart the backend after every change in packages/server/ or packages/core/.

Decision log

  • 2026-05-13 — Plan drafted. Ingest mode = WebSocket (chosen over webhook for future-proofing toward Phases 2 + 3). Not started.