# BLE LED Controllers: Investigation & Implementation Notes

Reference for anyone touching the BLE device provider (`server/src/ledgrab/core/devices/ble_*`). Captures the protocol quirks, Windows/bleak traps, and hardware lockdown we hit while bringing up SP110E / Triones / Zengge / Govee support.

## Architecture

```
BLEDeviceProvider → BLEClient → BLETransport (desktop: bleak, Android: Kotlin BleBridge via Chaquopy)
                        │
                        └─ BLEProtocol (family-specific wire bytes: sp110e.py, triones.py, zengge.py, govee.py)
```

- One `BLEProtocol` dataclass per controller family. Each supplies GATT UUIDs, write type (with/without response), `encode_color` / `encode_power` functions, name prefixes for discovery, and an optional `init_writes` handshake sequence.
- `BLEClient` is whole-strip only. `send_pixels()` averages incoming pixel arrays and emits one solid color per frame; none of these protocols support per-pixel streaming.
- Discovery auto-detects the family via advertised name prefix first, then falls back to service UUID matching. The detected family is returned on `DiscoveredDevice.ble_family` and preselected in the UI.
- The settings modal lets users change the family after creation: wrong family → writes go to a characteristic the device ignores → strip stays dark.

## Protocol Quirks

### SP110E / SP108E (critical handshake)

The controller **silently tears the GATT link down within ~1 second of connect** unless a two-write handshake arrives immediately:

```
Write 01 00        → characteristic FFE2
Write 01 B7 E3 D5  → characteristic FFE1
```

Without this, the first real write later hangs for 30 s because bleak thinks the link is up but the peripheral has already dropped it. We carry these writes in `PROTOCOL.init_writes` and execute them from `BLEClient.connect()` right after GATT open.

Color frame is **4 bytes** (`RR GG BB 1E`), not 5; the earlier implementation had a stray `0x00` padding byte that the device tolerated but that isn't documented.
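The handshake and frame layout above can be sketched in a few lines of Python. This is an illustration only, not the project's `sp110e.py`: `ProtocolSketch`, `average_color`, the short `"ffe1"`/`"ffe2"` identifiers, and the integer-mean averaging are all assumptions made for the example. Only the byte values and the 4-byte frame layout come from these notes.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Short characteristic IDs as written in the notes; the real module
# presumably uses full UUIDs (assumption).
CHAR_FFE1 = "ffe1"
CHAR_FFE2 = "ffe2"


@dataclass(frozen=True)
class ProtocolSketch:
    """Hypothetical stand-in for the project's BLEProtocol dataclass."""
    name: str
    init_writes: Tuple[Tuple[str, bytes], ...]
    encode_color: Callable[[int, int, int], bytes]


def encode_color(r: int, g: int, b: int) -> bytes:
    """Build the 4-byte SP110E color frame: RR GG BB 1E."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return bytes((r, g, b, 0x1E))


def average_color(pixels: Sequence[Tuple[int, int, int]]) -> Tuple[int, int, int]:
    """Collapse a pixel array to one solid color (integer mean, an assumption)."""
    if not pixels:
        raise ValueError("no pixels to average")
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) // n,
        sum(p[1] for p in pixels) // n,
        sum(p[2] for p in pixels) // n,
    )


SP110E_SKETCH = ProtocolSketch(
    name="sp110e",
    # Handshake order matters: FFE2 first, then FFE1.
    init_writes=(
        (CHAR_FFE2, bytes.fromhex("0100")),
        (CHAR_FFE1, bytes.fromhex("01b7e3d5")),
    ),
    encode_color=encode_color,
)

frame = SP110E_SKETCH.encode_color(*average_color([(255, 0, 0), (0, 0, 255)]))
```

Keeping `init_writes` as plain `(characteristic, payload)` pairs means the client can replay them in order right after GATT open without knowing anything family-specific.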
Source for the SP110E details above: [mbullington's reverse-engineering gist](https://gist.github.com/mbullington/37957501a07ad065b67d4e8d39bfe012).

### Triones / Zengge / Govee

No init handshake required. Color frames and command bytes are documented inline in each protocol module.

Notable: Zengge and SP110E share service UUID `FFE0/FFE1`, so name-based identification is the only reliable way to tell them apart. In `_register_builtins()`, SP110E is registered first so it wins the `identify_family_by_service_uuids` tie by default. Change this if the user base flips.

## bleak + Windows WinRT Traps

These bit us hard. All are now worked around, but future BLE work should keep them in mind.

### 1. `asyncio.wait_for` hangs forever on WinRT

`BleakClient.connect()` / `write_gatt_char()` wrap WinRT `IAsyncOperation`s. When asyncio tries to cancel them (as `wait_for` does on timeout), the WinRT task **never finishes cancelling**, so `wait_for` itself blocks forever while awaiting the cancellation. Symptom: log stops with no timeout error, process is alive but wedged.

**Fix**: `_bounded_await()` in [ble_transport.py](../server/src/ledgrab/core/devices/ble_transport.py) uses `asyncio.wait()` instead, which returns on timeout without awaiting pending tasks. This orphans the hanging WinRT task but frees the caller.

### 2. Connect by raw MAC string fails on Windows

Passing `BleakClient("AA:BB:CC:DD:EE:FF")` makes WinRT guess the address type (public vs random static vs random resolvable). Guesses wrong → connect silently times out. Symptom: `TimeoutError: BLE connect to ... exceeded 10.0s` with no other signal.

**Fix**: Always pre-scan with `BleakScanner.find_device_by_address()` and pass the returned `BLEDevice` object to `BleakClient`. Costs ~400 ms but makes connect reliable.

### 3. Client-side fetch timeout too short for BLE target start

The target-start endpoint does a ~5 s pre-scan + up to 10 s GATT connect + init handshake.
Default `fetchWithAuth` has a 10 s timeout and 3× retry, so the UI was aborting and retrying concurrent `/start` requests into the server.

**Fix**: `startTargetProcessing` overrides `timeout: 30000, retry: false`.

### 4. `Start-Process -WindowStyle Hidden` from bash/WSL strips handles

When `restart.ps1` is invoked from Git-Bash / WSL, `Start-Process` inherited handles cause the child uvicorn to exit immediately. Stream redirection fixes it.

**Fix**: `restart.ps1` always uses `-RedirectStandardOutput`/`-RedirectStandardError` to a temp log. Failed startups dump the stderr tail to the caller so root cause is visible.

## Vendor Lockdown (the dead end)

Some controllers (notably the one we tested, advertising as `AlexTable` at `16:61:05:70:68:44`) **only accept connections from the vendor phone app**. Diagnostic sequence:

| Test | Result | Meaning |
| --- | --- | --- |
| LedGrab `BleakClient.connect()` | 10 s timeout | Windows can't connect |
| Windows "Bluetooth LE Explorer" | Hangs on connect | Same Windows stack as bleak, so not our bug |
| Phone **OS** Bluetooth Settings | Can't connect | Phone OS uses generic BLE stack; also fails |
| Phone **LED Hue** app | Connects fine | Vendor app is the *only* working client |

At this point, further Windows/bleak tweaks have no effect. The peripheral firmware rejects generic GATT connects and only stays connected when the LED Hue app emits its vendor-specific handshake.

To unlock such a controller from LedGrab you'd need to:

1. Enable **Developer Options → Bluetooth HCI snoop log** on Android.
2. Reproduce the LED Hue flow (connect → color change → disconnect).
3. `adb bugreport bugreport.zip`; extract `btsnoop_hci.log`.
4. Open in Wireshark; identify the vendor handshake bytes written during connect.
5. Add them to the protocol's `init_writes`.

Alternatively, replace the BLE controller hardware with **WLED on ESP32**: $3, fully supported, vastly more capable.
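For steps 4 and 5 of the capture procedure above, a small helper can turn hex strings copied out of Wireshark into payload bytes ready for an `init_writes` list. This is a sketch under assumptions: `parse_captured_writes` is a hypothetical name, the `(characteristic, payload)` pair shape is a guess at how `init_writes` is stored, and the LED Hue handshake bytes are unknown. The sample rows reuse the SP110E handshake purely as placeholder data.

```python
from typing import Iterable, List, Tuple


def parse_captured_writes(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, bytes]]:
    """Convert (characteristic, hex string) rows into (characteristic, payload) pairs.

    Accepts both space-separated ("01 B7 E3 D5") and colon-separated
    ("01:b7:e3:d5") hex, since either form can end up on the clipboard.
    """
    writes: List[Tuple[str, bytes]] = []
    for characteristic, hex_payload in rows:
        # bytes.fromhex skips ASCII whitespace; colons must be removed first.
        payload = bytes.fromhex(hex_payload.replace(":", ""))
        if not payload:
            raise ValueError(f"empty payload for characteristic {characteristic!r}")
        writes.append((characteristic.lower(), payload))
    return writes


# Placeholder rows only: these are the SP110E bytes, not a captured
# LED Hue handshake.
captured = parse_captured_writes([
    ("FFE2", "01 00"),
    ("FFE1", "01:B7:E3:D5"),
])
```

Preserving capture order in the returned list matters, because a handshake of this kind is order-sensitive.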
## Frontend

- BLE family picker uses the project's shared `IconSelect` grid (project rule; see [CLAUDE.md](../CLAUDE.md): "NEVER use plain HTML `