feat: Home Assistant provider — WebSocket subscription + bot commands

Adds Home Assistant as a service provider with two coordinated surfaces:

Notifications (subscription):
- Long-lived WebSocket client (aiohttp ws_connect) with auth handshake,
  exponential-backoff reconnect, bounded event queue, and area-registry
  enrichment cached per (re)connect
- ServiceProvider ABC gains an optional `subscribe()` method for push-style
  providers; HomeAssistantServiceProvider uses it via a per-provider
  supervisor task started in the FastAPI lifespan
- 4 event types (state_changed, automation_triggered, call_service,
  event_fired), 4 default Jinja templates (en + ru), HA-specific
  tracker filters (entity_glob, domain_allowlist, exact entity ids)
- Extracted shared dispatch pipeline (api/webhooks.py → services/
  event_dispatch.py) so subscription and webhook ingest share the same
  event_log + deferred-dispatch + quiet-hours code path

Bot commands:
- /status, /entities [glob], /state <entity_id>, /areas
- Multi-command WS session so /status and /areas cost one handshake
- Sensitive-attribute blocklist (camera access_token, entity_picture, etc.)
  and 30-attribute cap to keep /state output safe and within Telegram's
  message size
- Error-message redaction strips URL userinfo before surfacing to chat

Frontend:
- HA descriptor with toggle ConfigField type (new) and tag-input filter
  mode for free-text glob/domain lists (new TagInput component)
- 15 command slots + 4 notification slots wired into the existing
  template-config UI
This commit is contained in:
2026-05-13 14:31:56 +03:00
parent 90f958bdc6
commit 22127e2a59
79 changed files with 4042 additions and 210 deletions
+177
View File
@@ -0,0 +1,177 @@
# Feature Backlog
Curated feature ideas, narrowed from a brainstorming pass on 2026-05-13.
Order is **rough sequencing preference**, not strict priority — adjust as we go.
---
## 1. Quiet Hours — close the gaps in the existing system
**Reality check (verified 2026-05-13).** Quiet hours are already shipped under
the "deferred dispatch" name in v0.8.0. The pipeline lives at
`packages/server/src/notify_bridge_server/services/deferred_dispatch.py` with
helpers in `dispatch_helpers.py` and tests in
`tests/test_deferred_dispatch.py`. What exists:
- Per-tracking-config window: `tracking_config.quiet_hours_enabled`,
`quiet_hours_start`, `quiet_hours_end`.
- Per-link override: `notification_tracker_target.quiet_hours_start`,
`quiet_hours_end`.
- Smart coalescing (asset add + asset remove during a window cancels each
other out, set-union merge for repeated adds).
- Post-window drain via APScheduler one-shot date jobs.
- Wall-clock event types (`scheduled_message`) drop instead of deferring.
- Frontend status surface: `deferred`, `deferred_then_dropped`,
`deferred_then_failed`, with `deferred_until` and `deferred_for_seconds`
fields exposed in the event log.
**What's NOT there (the actual gaps):**
| Gap | Sketch |
| --- | --- |
| **Target-level windows** | Today, hours bind to the *watcher* (tracking config / link). Users naturally think of DND at the *destination* ("don't ping my phone at night, regardless of source"). New column on `notification_target` + dispatcher gate. |
| **Multiple windows per row** | Today is a single HH:MM range. Real schedules want weekday-evening + weekend-all-day. JSON list of windows. |
| **Days-of-week** | Same window every day. Need `days: ["mon", "tue", ...]` filter per window. |
| **Per-window timezone** | Uses the global app TZ. Multi-traveller / multi-target setups want per-window TZ. |
| **Silent mode** | Modes today are defer-or-drop. Telegram `disable_notification=true` ("send but don't ring") is a third useful mode. |
| **Per-receiver windows** | One bot → multiple chats, each potentially with its own DND. Today it's all-or-nothing per target. |
**Recommended cut for v1 of "extend quiet hours":**
- Add target-level quiet hours (new column `notification_target.quiet_hours_json`
= list of `{days, start, end, mode, tz}`).
- Modes: `drop`, `defer`, `silent`. `defer` reuses the existing
deferred-dispatch pipeline (just changes who decides). `silent` maps to
`disable_notification=true` for Telegram; other targets fall through to
normal send (or we treat `silent` as `defer` for non-Telegram targets — TBD).
- Dispatcher precedence: target window wins over link/tracking-config window
when both are configured. Document this explicitly.
- Frontend: new "Quiet hours" fieldset in the target editor (Aurora cassette
style). Reuses Timezone picker; new day-picker chip row.
- Skip days-of-week + multi-window in v1 if scope grows — ship the target-level
cut first, then iterate.
**Open questions.**
- How exactly do target / link / tracking-config windows combine? Proposal:
any window covering "now" wins (drop > defer > silent precedence).
- Should `silent` for non-Telegram targets degrade to normal send or to
defer? Defer is the safer default.
- Does the event log need a new status (`silenced` / `dropped_by_target_qh`)
to make precedence visible?
---
## 2. Immich Smart Actions (expand beyond Auto-Organize)
**What.** Extend the existing Smart Actions pattern (currently:
**Immich Auto-Organize**) with more rule-driven actions against the Immich API.
**Why.** Auto-Organize already proves the descriptor → rule editor → executor
pipeline. Adding actions is mostly authoring new executors + small UI rule
shapes, not new infra.
**Candidates (pick in this order).**
1. **Auto-favorite by person** — when an asset is detected containing person
X (or any of a set), mark it favorite.
2. **Auto-archive by age / album** — assets older than N days in a given
album get archived. Pair with a "dry-run shows count" UX like
Auto-Organize already has.
3. **Duplicate cluster nudge** — periodically run Immich's duplicate API and
send a digest notification with inline buttons ("review", "ignore for 30d").
Depends on inline-button work (see backlog item 4 dependencies).
4. **Share-link rotation** — for an album, regenerate the share link every N
days; notify with the new URL.
5. **Pending-delete review** — push a weekly digest of trash contents before
Immich's auto-purge fires.
**Shape.**
- Reuse the existing **action descriptor** layer
(`packages/core/src/notify_bridge_core/providers/actions.py`,
`action_executor.py`) and the frontend rule editor used by Auto-Organize.
- Each new action = (a) executor in core, (b) rule schema in the descriptor,
(c) frontend descriptor extension for the rule editor fields.
- Persist as `provider_actions` rows (already exists for Auto-Organize) with
a discriminator + JSON config.
**Open questions.**
- Does "auto-favorite by person" need a confirmation queue or run silently?
Default to silent + event_log entry.
- How do we surface "this action moved/changed X assets" in the dashboard?
Probably a per-action stat tile on the provider detail page.
---
## 3. Home Assistant Provider
**Full plan:** [feature-home-assistant.md](feature-home-assistant.md).
**One-line summary.** New WebSocket-based service provider with a 3-phase
ship: subscribe + dispatch (Phase 1), bot commands (Phase 2), HA service
calls as Smart Actions (Phase 3). Chosen over webhook ingest because
Phases 2 + 3 force a long-lived API connection anyway; consolidating on WS
avoids a refactor.
**Status:** planned, not started.
---
## 4. Block-Based Template Builder
**What.** A visual, drag-and-drop builder for notification and command
templates that compiles down to Jinja2. Lives alongside (not instead of) the
current `JinjaEditor`. Author can flip between views.
**Why.** The current Jinja editor is powerful but unforgiving. A block UI
lowers the floor for new users and provides a discovery surface for the
variables documented in `template_configs.py`.
**Shape.**
- Frontend-only feature for v1 — compiles to the same Jinja strings the
backend already accepts.
- Blocks: `Text`, `Variable`, `If`, `For`, `Link`, `Image`, `Icon`, `Caption`,
`Group` (HTML span/group). Each block knows its serialized Jinja
representation.
- Round-trip: variables, simple `{% if %}` / `{% for %}` blocks, and string
literals parse back to blocks; arbitrary Jinja stays in a "Raw" block that
the user can edit as text.
- Variable picker reads `get_template_variables(provider_type, slot)`. This is
the same data already shown in the template-help panel.
- Preview pane unchanged — reuses `services/sample_context.py` server
rendering.
- Toggle in the template editor: **Visual / Code**.
**Open questions.**
- Round-tripping arbitrary Jinja is hard. v1: parseable subset → blocks,
anything else → single Raw block. Show a banner explaining.
- Locale handling: same compiled Jinja, just authored per locale tab.
- Do we want a marketplace of pre-built block compositions? Out of scope for
v1 — bundle import/export is a separate backlog item.
---
## Recommended Sequencing
1. **Quiet Hours per Target** — small, isolated, immediate user value.
2. **Immich Smart Actions** — incremental on existing pattern; ship one
action at a time (start with auto-favorite by person).
3. **Home Assistant Provider** — multi-file, follows new-provider checklist;
biggest user-base expansion.
4. **Block-Based Template Builder** — largest frontend lift; benefits from
the variable-doc work that the other features will exercise.
Dependencies are loose — 1 and 2 are independent of 3 and 4. The block
builder pairs nicely with Home Assistant because HA's rich context surfaces
the value of an easier authoring UX.
---
## Decision log
- **2026-05-13** — Backlog seeded with these four items selected from a
broader brainstorm. Not started.
+284
View File
@@ -0,0 +1,284 @@
# Home Assistant Provider — Implementation Plan
> Status: **planned, not started**. Sequencing: third item on the backlog
> (see [feature-backlog.md](feature-backlog.md)).
> Last updated: 2026-05-13.
## Decision: WebSocket subscription, not webhook
We considered three ingest modes (webhook automation, WebSocket subscription,
hybrid). The WebSocket route is chosen as the architectural foundation because
the medium-term roadmap forces it anyway:
| Phase | Capability | Needs API access? |
| --- | --- | --- |
| 1 | Subscribe to events, emit notifications | Read (event stream) |
| 2 | Bot commands (`/state`, `/entities`, `/areas`) | Read (REST or WS get_states) |
| 3 | Smart Actions (`light.turn_on`, scene activation) | Write (call_service) |
A webhook-only Phase 1 would still need a REST client by Phase 2 and a write
path by Phase 3 — net result is two client implementations + one event
pipeline refactor. WebSocket consolidates all three phases on one connection.
**Tradeoff (be honest):** WebSocket introduces a long-lived-connection pattern
this codebase does not have yet. Reconnect logic, missed-events-on-restart
gap, and a new shape on the `ServiceProvider` ABC are real costs. Phase 1 is
**not** shippable in one short session — plan for a multi-session build.
## Provider abstraction extension
The current `ServiceProvider` ABC
([packages/core/src/notify_bridge_core/providers/base.py](../../packages/core/src/notify_bridge_core/providers/base.py))
is poll-oriented: every provider implements `poll(collection_ids, state) →
(events, new_state)`. Webhook providers (Gitea, Planka, Webhook) satisfy this
by no-op'ing `poll` and shoving events in via `api/webhooks.py` instead.
Home Assistant fits neither cleanly. The plan:
1. Add an **optional** `async subscribe(emit) → None` method on the base ABC.
Default implementation raises `NotImplementedError`. Polling providers do
not override it. The scheduler / lifecycle layer (currently `services/watcher.py`)
gains a "subscription manager" branch that, for any provider whose class
overrides `subscribe`, starts a long-lived task instead of registering
a polling job.
2. `emit` is a callback `(event: ServiceEvent) → None` provided by the
subscription manager — it routes events to the dispatcher exactly like the
webhook handler does today. Keeping the dispatch path unchanged is the
point of this design.
3. Reconnect lives **inside** `subscribe`: the method is expected to be a
`while not cancelled: try connect; on drop, sleep with backoff, retry`
loop. The manager cancels the task on shutdown via the cooperative cancel
token used elsewhere.
This is a small, additive change to one ABC. No existing provider is
modified.
## Phase 1 — Subscribe + Dispatch (MVP)
### Scope
- Long-lived WebSocket connection to HA, authenticated with a long-lived
access token.
- Subscribe to the event bus with optional `event_type` filter (defaults to
`state_changed`).
- Translate HA events into `ServiceEvent` and dispatch via the existing
pipeline. Notifications go out exactly as they do today for any other
provider.
- Filter UI: entity-id glob list, domain allowlist (e.g. `light.*`,
`binary_sensor.*`), event-type allowlist. **Hard-required** to avoid the HA
firehose drowning the bridge.
- Connection test + entity listing via WS `get_states` (no REST client yet —
WS gives us both subscribe and read).
### Out of scope for Phase 1
- Bot commands → Phase 2.
- Service calls → Phase 3.
- Replay of events missed during disconnect (HA does not support this; we
document the gap and surface "reconnected after N seconds" in the event
log).
- Webhook-style ingestion (path-embedded token webhook receiver). If a user
prefers webhooks, we add it later as a second ingestion mode on the same
provider — out of scope for v1.
### Event types (v1)
| HA event | ServiceEvent type | Notification slot |
| --- | --- | --- |
| `state_changed` | `ha_state_changed` | `message_state_changed` |
| `automation_triggered` | `ha_automation_triggered` | `message_automation_triggered` |
| `call_service` | `ha_service_called` | `message_service_called` |
| (custom event types) | `ha_event_fired` | `message_event_fired` |
Default tracking config enables `state_changed` only — the others are loud
and opt-in.
### Context variables exposed to templates
Pulled directly from HA's `state_changed` payload, normalized:
- `entity_id``light.kitchen`
- `friendly_name``attributes.friendly_name` or fallback to `entity_id`
- `domain` — derived from `entity_id` before the dot
- `old_state``from_state.state`
- `new_state``to_state.state`
- `attributes` — dict of new-state attributes (raw)
- `device_class``attributes.device_class` if present
- `area``attributes.area_id` if present (best effort; only set if HA
exposes it via the area registry, which costs a `get_registry` WS call —
see "Open questions")
- `last_changed`, `last_updated` — ISO timestamps
- For non-`state_changed` events: `event_type`, `event_data` (full dict)
### File touch map (Phase 1)
**Core** (`packages/core/src/notify_bridge_core/`)
| Path | Action | Notes |
| --- | --- | --- |
| `providers/base.py` | Modify | Add optional `subscribe(emit)` ABC method (default `NotImplementedError`); add `HOME_ASSISTANT = "home_assistant"` to `ServiceProviderType` |
| `providers/capabilities.py` | Modify | Add `HOME_ASSISTANT_CAPABILITIES` + register |
| `providers/home_assistant/__init__.py` | Create | Export + register template variables |
| `providers/home_assistant/client.py` | Create | WebSocket client (auth, subscribe, get_states, call_service stub) |
| `providers/home_assistant/event_parser.py` | Create | HA event dict → `ServiceEvent` |
| `providers/home_assistant/provider.py` | Create | Class with `connect`, `disconnect`, `subscribe`, `list_collections` (entity list), `get_available_variables`, `get_provider_config_schema`, `test_connection`. `poll` raises NotImplementedError. |
| `templates/defaults/en/home_assistant_*.jinja2` | Create | 4 slot templates |
| `templates/defaults/ru/home_assistant_*.jinja2` | Create | 4 slot templates |
| `templates/defaults/loader.py` | Modify | Add to `PROVIDER_SLOT_FILE_MAP` |
| `templates/command_defaults/loader.py` | Modify | Stub entry — empty commands list for now |
| `templates/context.py` | Modify | HA context builder |
| `templates/validator.py` | Modify | Whitelist HA variable names |
**Server** (`packages/server/src/notify_bridge_server/`)
| Path | Action | Notes |
| --- | --- | --- |
| `services/watcher.py` *(or scheduler / lifecycle module that hosts polling)* | Modify | Add subscription-manager branch — for providers whose class overrides `subscribe`, start/stop long-running task instead of polling |
| `services/scheduler.py` | Verify | Confirm we cancel HA subscription on shutdown (graceful_shutdown_seconds path) |
| `api/template_configs.py` | Modify | `get_template_variables()` entry |
| `api/command_template_configs.py` | Modify | Sample ctx (minimal for Phase 1 — no commands) |
| `services/sample_context.py` | Modify | `_SAMPLE_CONTEXT["home_assistant"]` |
| `database/seeds.py` | Modify | Seed notification templates + default tracking config |
**Frontend** (`frontend/src/`)
| Path | Action | Notes |
| --- | --- | --- |
| `lib/providers/home-assistant.ts` | Create | Descriptor per CLAUDE.md rule 11 |
| `lib/providers/index.ts` | Modify | Register descriptor |
| `lib/locales/en.json` | Modify | `providers.typeHomeAssistant`, `gridDesc.providerHomeAssistant` |
| `lib/locales/ru.json` | Modify | Same |
**Tests**
| Path | Action |
| --- | --- |
| `packages/core/tests/providers/test_home_assistant_parser.py` | Create — HA payload → `ServiceEvent` |
| `packages/core/tests/providers/test_home_assistant_client.py` | Create — WS auth, subscribe, reconnect (use a fake server) |
| `packages/server/tests/test_home_assistant_subscription.py` | Create — subscription manager lifecycle, event flows through dispatcher |
### Frontend descriptor essentials
```text
type: "home_assistant"
defaultName: "Home Assistant"
icon: "home" (consider Lucide icon; HA logo if a custom asset exists)
hasUrl: true // base URL of HA (used to derive WS URL)
configFields:
- url: http(s)://homeassistant.local:8123
- access_token: long-lived access token (required)
- allowed_event_types: comma-separated, defaults to "state_changed"
eventFields: 4 checkboxes (state_changed, automation_triggered,
call_service, event_fired)
extraTrackingFields:
- entity_glob: tag input ("light.*", "binary_sensor.*_motion")
- domain_allowlist: tag input
collectionMeta: { label: "Entities", icon: "..." }
webhookBased: false // we are NOT webhook based
```
WS URL is derived: `wss://{host}/api/websocket` (or `ws://` for plain http
HA). Document this in the UI hint.
### Auth model
- **Long-lived access token** from HA (Profile → Long-Lived Access Tokens).
- Stored encrypted at rest via the same path the other providers use for
secrets (the bridge already has a secret-encryption helper — verify the
exact module name during implementation).
- WS auth handshake: connect → server sends `auth_required` → client sends
`{type: "auth", access_token: "..."}` → server replies `auth_ok` or
`auth_invalid`.
### Risks / open questions (Phase 1)
1. **Reconnect strategy.** Exponential backoff capped at 60s, jittered.
On reconnect, log a `connection_restored_after` event so the UI can
surface the gap. Document that HA does not support event replay.
2. **Area registry.** Pulling `area_id` for entities requires a separate
`config/area_registry/list` WS call. Decision needed: fetch once on
connect and cache, refetch on `area_registry_updated` event, or skip
`area` from the context entirely in v1. Recommendation: fetch on
connect, refetch on `area_registry_updated`, skip if it fails (best-effort).
3. **TLS verification for self-signed HA.** Homelab users often have
self-signed certs. Need a `verify_tls: bool` config field (default true)
and a clear warning when disabled. Same pattern as
`NOTIFY_BRIDGE_ALLOW_PRIVATE_URLS` for the SSRF case.
4. **Backpressure.** HA's `state_changed` can fire hundreds of events per
minute in a busy install. The subscription manager must drop or coalesce
if the dispatcher backlog grows beyond a threshold. Cheapest cut: a
bounded `asyncio.Queue` between WS receiver and dispatch — `put_nowait`
with overflow counter visible in the event log.
5. **Entity filter precedence.** Tracking-config has `collection_ids`
(entity_id list) and we want `entity_glob` + `domain_allowlist`. Decision:
if both `collection_ids` and globs are set, union them (any match passes).
Documented prominently in the tracker UI.
6. **Library choice.** `hass-client` is a Python WS client maintained by the
HA community; alternative is rolling our own with `websockets`. The
latter is ~150 LOC and has no external dependency surface. Recommendation:
roll our own. Re-evaluate if Phase 3 needs registry-aware service calls.
## Phase 2 — Bot Commands
Adds Telegram bot commands for HA tracking configs.
- `/status` — connection status, subscribed event count
- `/entities <glob>` — list matching entities + current state
- `/state <entity_id>` — full state + attributes for one entity
- `/areas` — area registry summary
- `/help`
These use the existing WS connection (no new client) via WS commands
`get_states`, `config/area_registry/list`. Template slots and command
template configs follow the same pattern as Gitea/Planka — see
[CLAUDE.md](../../CLAUDE.md) rule 7 / rule 11 for the full set of locations
that must be updated.
Out-of-scope for Phase 2: any command that mutates HA state.
## Phase 3 — Smart Actions (Service Calls)
A new action descriptor in the existing Smart Actions framework
([packages/core/src/notify_bridge_core/providers/actions.py](../../packages/core/src/notify_bridge_core/providers/actions.py)).
- Action type: `ha_call_service`
- Rule: trigger event → service call (e.g. "on motion event in
`binary_sensor.front_door` → call `light.turn_on` on `light.porch`")
- Executor uses the existing WS connection to send `call_service`.
This phase is gated behind explicit per-target authorization in the UI — HA
service calls can do anything the access token allows, including unlocking
doors. Default state: **disabled**, with a clear consent flow when enabling.
## Rough effort estimates
These are rough — sub-task discovery during Phase 1 will refine them.
| Phase | Estimate (focused work) |
| --- | --- |
| Phase 1 (subscribe + dispatch) | 23 sessions |
| Phase 2 (bot commands) | 1 session |
| Phase 3 (smart actions) | 12 sessions |
## When to start
Phase 1 work order, once you green-light it:
1. ABC extension (`base.py`) + tests for the new `subscribe` shape on a fake
provider.
2. WS client + parser + unit tests against recorded HA fixtures (no live HA
needed for these).
3. Subscription manager in `services/watcher.py` — integration test with the
fake provider from step 1.
4. Templates (en + ru), capabilities entry, validator whitelist.
5. Server: seeds, sample context, template_configs entry.
6. Frontend: descriptor, locale keys, i18n.
7. End-to-end smoke test against a real HA instance (homelab).
Backend restart cadence per the project rule: after **every** change in
`packages/server/` or `packages/core/`.
## Decision log
- **2026-05-13** — Plan drafted. Ingest mode = WebSocket (chosen over
webhook for future-proofing toward Phases 2 + 3). Not started.