docs(android): add audio-capture design + missing-functionality review

- android-audio-capture-plan.md: design behind the merged on-device audio capture feature (487259a).
- android-missing-functionality.md: Android missing-feature review notes.
2026-06-02 03:30:43 +03:00
parent 487259a96d
commit 4b2e8fc5ec
2 changed files with 461 additions and 0 deletions
@@ -0,0 +1,308 @@
+# Plan: Android on-device audio capture
+
+> Status: proposed plan (not yet approved). No code changes. Last updated 2026-06-01.
+
+## Context
+
+LedGrab's audio-reactive features (music analyzer, audio value sources, band filters)
+depend on capturing an audio stream and running it through `AudioAnalyzer`
+(`server/src/ledgrab/core/audio/analysis.py`). On desktop this is fed by **WASAPI**
+(Windows) or **Sounddevice/PortAudio** (cross-platform). On the **experimental
+Android-TV build** neither is available — `sounddevice` has no Chaquopy wheel and PortAudio
+isn't bundled — so `core/audio/__init__.py` registers only `DemoAudioEngine`, and
+audio-reactive lighting is effectively dead on Android.
+
+Android does not need PortAudio: the platform exposes **`AudioPlaybackCapture`** (API 29+),
+which captures system playback audio and **takes a `MediaProjection` token — the very token
+the app already obtains for screen capture** (`ScreenCapture(projection, …)`). This plan adds
+a push-based Android audio engine so the TV box can drive sound-reactive lighting from its own
+media playback, at parity with how desktop audio feeds the analyzer.
+
+The design mirrors the working screen-capture bridge
+(`mediaprojection_engine.py` ↔ `ScreenCapture.kt` ↔ `PythonBridge`) and the existing audio
+engine abstraction (`AudioCaptureEngine` / `AudioCaptureStreamBase` /
+`AudioEngineRegistry`). **No new Python dependencies** (`numpy` is already bundled) → no
+Chaquopy / `build.gradle.kts` `pip {}` changes.
+
+---
+
+## Approach
+
+A new **push-based** audio engine registered in the existing `AudioEngineRegistry`:
+
+- **Python:** `AndroidAudioEngine` + `AndroidAudioCaptureStream` mirroring `SounddeviceEngine`,
+  but `read_chunk()` pops PCM from a module-level queue that **Kotlin fills** (mirror of
+  `mediaprojection_engine.push_frame`). High `ENGINE_PRIORITY` so
+  `AudioEngineRegistry.get_best_available_engine()` selects it on Android. The existing
+  `ManagedAudioStream` capture loop and `AudioAnalyzer` consume `read_chunk()` unchanged.
+- **Android:** an `AudioCapture` helper using `AudioRecord` + `AudioPlaybackCaptureConfiguration`
+  (reusing `CaptureService`'s `MediaProjection`), pushing float32 PCM to Python, with a
+  microphone (`AudioSource.MIC`) fallback when playback capture is unavailable. It is wired into
+  `CaptureService` next to `ScreenCapture`.
+
+```
+[media playback] → AudioRecord (AudioPlaybackCapture, reuses MediaProjection)
+   → AudioCapture.kt → PythonBridge.pushAudio(pcmFloat32, frames, channels)
+   → android_audio_engine.push_samples()  [module-level queue]
+   → AndroidAudioCaptureStream.read_chunk()  → ManagedAudioStream → AudioAnalyzer  [unchanged]
+```
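Below is a minimal, self-contained sketch of the hand-off in the diagram above, runnable on desktop Python.
- A producer thread stands in for `AudioCapture.kt` → `PythonBridge.pushAudio` → `push_samples()`.
- A consumer loop stands in for `ManagedAudioStream`.
- The helper names (`kotlin_side`, `read_chunk`, `capture_loop`) are hypothetical.
- Chunks stay as raw little-endian bytes decoded with `struct` instead of numpy arrays.

```python
import queue
import struct
import threading
from typing import List, Optional

# Bounded hand-off queue, standing in for android_audio_engine._pcm_queue.
pcm_queue: "queue.Queue[bytes]" = queue.Queue(maxsize=8)


def kotlin_side(n_chunks: int, frames: int, channels: int) -> None:
    """Hypothetical producer: pushes interleaved little-endian float32 chunks."""
    for i in range(n_chunks):
        interleaved = [i * 0.5] * (frames * channels)
        pcm_queue.put(struct.pack(f"<{len(interleaved)}f", *interleaved), timeout=1.0)


def read_chunk() -> Optional[List[float]]:
    """Mirrors the plan's read_chunk(): wait briefly, return None when no PCM arrived."""
    try:
        raw = pcm_queue.get(timeout=0.1)
    except queue.Empty:
        return None
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def capture_loop(source: threading.Thread) -> List[List[float]]:
    """Hypothetical consumer: polls until the producer is done and the queue is empty."""
    received: List[List[float]] = []
    while source.is_alive() or not pcm_queue.empty():
        chunk = read_chunk()
        if chunk is not None:  # None is an idle tick, not an error
            received.append(chunk)
    return received


producer = threading.Thread(target=kotlin_side, args=(5, 4, 2))
producer.start()
chunks = capture_loop(producer)
producer.join()
print(len(chunks), len(chunks[0]))  # 5 8
```

Returning `None` after a short timeout keeps the consumer responsive (e.g. to shutdown) instead of blocking indefinitely while no PCM arrives, such as during paused playback.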
+
+---
+
+## Part A — Python (server)
+
+**New file: `server/src/ledgrab/core/audio/android_audio_engine.py`** — mirror
+`mediaprojection_engine.py` (queue + configure + push) and `sounddevice_engine.py` (engine/stream shape):
+
+```python
+import queue
+from typing import Any, Dict, List
+
+import numpy as np
+
+from ledgrab.core.audio.base import AudioCaptureEngine, AudioCaptureStreamBase, AudioDeviceInfo
+from ledgrab.utils import get_logger
+
+logger = get_logger(__name__)
+
+# Bounded hand-off queue: Kotlin's capture thread produces, ManagedAudioStream's loop consumes.
+_pcm_queue: "queue.Queue[np.ndarray]" = queue.Queue(maxsize=8)
+_sample_rate = 48000
+_channels = 2
+_chunk_size = 1024
+_active = False
+
+
+def configure(sample_rate: int, channels: int, chunk_size: int) -> None:
+    """Called from Kotlin before audio frames start flowing. Drains stale PCM."""
+    global _sample_rate, _channels, _chunk_size, _active
+    while True:
+        try:
+            _pcm_queue.get_nowait()
+        except queue.Empty:
+            break
+    _sample_rate, _channels, _chunk_size = sample_rate, channels, chunk_size
+    _active = True
+
+
+def push_samples(pcm_float32: bytes) -> None:
+    """Push one interleaved little-endian float32 PCM chunk from Kotlin. Drops oldest if full."""
+    # astype() copies, so the queued array never aliases a buffer Kotlin reuses,
+    # and converts the explicit little-endian input to native float32.
+    samples = np.frombuffer(pcm_float32, dtype="<f4").astype(np.float32)
+    try:
+        _pcm_queue.put_nowait(samples)
+    except queue.Full:
+        try:
+            _pcm_queue.get_nowait()  # evict the oldest chunk
+        except queue.Empty:
+            pass
+        try:
+            _pcm_queue.put_nowait(samples)
+        except queue.Full:
+            pass  # lost the race to another push; drop this chunk
+
+
+def shutdown() -> None:
+    global _active
+    _active = False
+
+
+class AndroidAudioCaptureStream(AudioCaptureStreamBase):
+    @property
+    def channels(self) -> int:
+        return _channels
+
+    @property
+    def sample_rate(self) -> int:
+        return _sample_rate
+
+    @property
+    def chunk_size(self) -> int:
+        return _chunk_size
+
+    def initialize(self) -> None:
+        if not _active:
+            raise RuntimeError("Android audio engine not configured (only valid in-app).")
+        self._initialized = True
+
+    def cleanup(self) -> None:
+        self._initialized = False
+
+    def read_chunk(self) -> np.ndarray | None:
+        try:
+            return _pcm_queue.get(timeout=0.1)  # 1-D float32, interleaved
+        except queue.Empty:
+            return None
+
+
+class AndroidAudioEngine(AudioCaptureEngine):
+    ENGINE_TYPE = "android_playback"
+    ENGINE_PRIORITY = 100  # highest on Android (demo is lower)
+
+    @classmethod
+    def is_available(cls) -> bool:
+        from ledgrab.utils.platform import is_android
+
+        return is_android() and _active
+
+    @classmethod
+    def get_default_config(cls) -> Dict[str, Any]:
+        return {"sample_rate": _sample_rate, "channels": _channels, "chunk_size": _chunk_size}
+
+    @classmethod
+    def enumerate_devices(cls) -> List[AudioDeviceInfo]:
+        if not cls.is_available():
+            return []
+        return [
+            AudioDeviceInfo(
+                index=0,
+                name="Android playback (system audio)",
+                is_input=True,
+                is_loopback=True,
+                channels=_channels,
+                default_samplerate=float(_sample_rate),
+            )
+        ]
+
+    @classmethod
+    def create_stream(cls, device_index, is_loopback, config) -> AndroidAudioCaptureStream:
+        merged = {**cls.get_default_config(), **(config or {})}
+        return AndroidAudioCaptureStream(device_index, is_loopback, merged)
+```
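The overflow policy in `push_samples()` is easy to get subtly wrong, so here is a standalone reproduction of it.
- It uses integer chunk IDs instead of numpy arrays.
- `push_drop_oldest` and `drain` are illustrative names, not part of the module.

After 10 pushes into a `maxsize=8` queue, only the newest 8 chunks survive. That is the behavior the "queue drops oldest when full" test should pin down.

```python
import queue

pcm_queue: "queue.Queue[int]" = queue.Queue(maxsize=8)


def push_drop_oldest(item: int) -> None:
    """Same policy as push_samples(): on overflow evict the oldest chunk, then retry once."""
    try:
        pcm_queue.put_nowait(item)
    except queue.Full:
        try:
            pcm_queue.get_nowait()
        except queue.Empty:
            pass
        try:
            pcm_queue.put_nowait(item)
        except queue.Full:
            pass  # another producer refilled the slot; drop this chunk


def drain() -> int:
    """Same idea as configure()'s drain loop; returns how many stale chunks were discarded."""
    dropped = 0
    while True:
        try:
            pcm_queue.get_nowait()
        except queue.Empty:
            return dropped
        dropped += 1


for chunk_id in range(10):
    push_drop_oldest(chunk_id)

survivors = list(pcm_queue.queue)  # peek at the underlying deque without consuming
print(survivors)  # [2, 3, 4, 5, 6, 7, 8, 9]
print(drain(), pcm_queue.qsize())  # 8 0
```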
+
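+**Modify `server/src/ledgrab/core/audio/__init__.py`:** register behind a guarded import,
+matching the existing `_has_wasapi` / `_has_sounddevice` pattern. The module depends only on
+`numpy` and `ledgrab`, so the import also succeeds on desktop. The engine is kept from being
+selected off Android by `is_available()`, not by the import guard: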
+
+```python
+try:
+    from ledgrab.core.audio.android_audio_engine import AndroidAudioEngine
+    _has_android_audio = True
+except ImportError:
+    _has_android_audio = False
+...
+if _has_android_audio:
+    AudioEngineRegistry.register(AndroidAudioEngine)
+```
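This registration relies on one selection rule: the engine with the highest `ENGINE_PRIORITY` among those whose `is_available()` is true. Below is a toy model of that rule.
- `Registry`, `DemoEngine`, `AndroidEngine` and the `state` flags are hypothetical stand-ins, not LedGrab's actual `AudioEngineRegistry` implementation.
- The demo engine's type string and priority are also assumed.

```python
from typing import Dict, Optional, Type

# Hypothetical flags standing in for is_android() and the module-level _active.
state = {"is_android": False, "configured": False}


class Engine:
    ENGINE_TYPE = "base"
    ENGINE_PRIORITY = 0

    @classmethod
    def is_available(cls) -> bool:
        return False


class DemoEngine(Engine):
    ENGINE_TYPE = "demo"  # assumed type string
    ENGINE_PRIORITY = 1  # assumed value; the plan only says demo is lower than 100

    @classmethod
    def is_available(cls) -> bool:
        return True


class AndroidEngine(Engine):
    ENGINE_TYPE = "android_playback"
    ENGINE_PRIORITY = 100

    @classmethod
    def is_available(cls) -> bool:
        return state["is_android"] and state["configured"]


class Registry:
    """Hypothetical stand-in: the highest-priority available engine wins."""

    def __init__(self) -> None:
        self._engines: Dict[str, Type[Engine]] = {}

    def register(self, engine: Type[Engine]) -> None:
        self._engines[engine.ENGINE_TYPE] = engine

    def get_best_available_engine(self) -> Optional[str]:
        available = [e for e in self._engines.values() if e.is_available()]
        if not available:
            return None
        return max(available, key=lambda e: e.ENGINE_PRIORITY).ENGINE_TYPE


registry = Registry()
registry.register(DemoEngine)
registry.register(AndroidEngine)  # always registered; availability is checked at selection time

desktop = registry.get_best_available_engine()
state["is_android"] = True
before_configure = registry.get_best_available_engine()
state["configured"] = True  # what configure() does via _active
after_configure = registry.get_best_available_engine()
print(desktop, before_configure, after_configure)  # demo demo android_playback
```

The middle case matters in this model. If an engine is picked before Kotlin has called `configure()`, the demo engine wins, so selection has to happen (or be re-run) after `CaptureService` starts audio.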
+
+**Reused, unchanged:** `AudioEngineRegistry.get_best_available_engine()` (picks by priority),
+`ManagedAudioStream._capture_loop()` (`audio_capture.py`), `AudioAnalyzer`, the audio value
+sources, and the device-enumeration endpoints. The Android engine appears as one loopback
+device named "Android playback (system audio)".
+
+---
+
+## Part B — Android (Kotlin + manifest)
+
+**New file: `android/app/src/main/java/com/ledgrab/android/AudioCapture.kt`**
+
+Mirrors `ScreenCapture.kt`, taking the same `MediaProjection`:
+
+```kotlin
+class AudioCapture(
+    private val projection: MediaProjection,
+    private val bridge: PythonBridge,
+    private val sampleRate: Int = 48000,
+    private val channels: Int = 2,
+    private val chunkFrames: Int = 1024,
+)
+```
+
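+- `start()` (API 29+, MediaProjection mode):
+  - Build the capture config with `AudioPlaybackCaptureConfiguration.Builder(projection)`,
+    calling `addMatchingUsage()` for `USAGE_MEDIA`, `USAGE_GAME` and `USAGE_UNKNOWN` (the
+    capturable set).
+  - Create the recorder with `AudioRecord.Builder().setAudioPlaybackCaptureConfig(cfg)` and
+    `setAudioFormat()`, using an `AudioFormat.Builder()` format with `ENCODING_PCM_FLOAT`,
+    `sampleRate` and `CHANNEL_IN_STEREO`.
+  - On a dedicated `HandlerThread`, loop `audioRecord.read(floatBuf, …, READ_BLOCKING)`. The
+    return value counts floats, not frames, so `framesRead = floatsRead / channels`; a negative
+    value is an error code and must not be pushed. Wrap the samples into a little-endian float32
+    `ByteArray` (reusable buffer, like `ScreenCapture`'s `frameBuffer`) →
+    `bridge.pushAudio(bytes, framesRead, channels)`.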
+- `stop()`: stop/release `AudioRecord`, quit the thread.
+- **Mic fallback** (`startMic()`): `AudioSource.MIC` for root mode (no MediaProjection) or
+  API < 29. Used only when playback capture is unavailable.
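Below is the per-iteration arithmetic of that read loop, modeled in Python so it can be checked on desktop. In the plan this code lives in Kotlin.
- `encode_read` is a hypothetical helper.
- Keeping only whole frames is an assumption of this sketch. With `READ_BLOCKING` and a buffer sized to whole frames, partial frames should not normally occur.

```python
import struct
from typing import List, Optional, Tuple

CHANNELS = 2
CHUNK_FRAMES = 1024


def encode_read(float_buf: List[float], floats_read: int,
                channels: int = CHANNELS) -> Optional[Tuple[bytes, int]]:
    """Model one loop iteration: an AudioRecord.read() result -> (payload, framesRead)."""
    if floats_read <= 0:
        return None  # negative values are AudioRecord error codes; nothing to push
    whole = floats_read - floats_read % channels  # read() counts floats; keep whole frames
    if whole == 0:
        return None
    payload = struct.pack(f"<{whole}f", *float_buf[:whole])  # little-endian float32
    return payload, whole // channels


buf = [0.25, -0.25] * CHUNK_FRAMES  # one full stereo read: L, R, L, R, ...
payload, frames = encode_read(buf, len(buf))
print(len(payload), frames)  # 8192 1024
```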
+
+**Modify `android/app/src/main/java/com/ledgrab/android/PythonBridge.kt`** — add the audio
+push path (same shape as `pushFrame`, with a cached PyObject handle):
+
+```kotlin
+@Volatile private var androidAudioEngine: PyObject? = null
+
+fun configureAudio(sampleRate: Int, channels: Int, chunkFrames: Int) {
+    val engine = Python.getInstance().getModule("ledgrab.core.audio.android_audio_engine")
+    engine.callAttr("configure", sampleRate, channels, chunkFrames)
+    androidAudioEngine = engine
+}
+fun pushAudio(pcmFloat32: ByteArray, frames: Int, channels: Int) {
+    if (!running) return
+    androidAudioEngine?.let {
+        try { it.callAttr("push_samples", pcmFloat32) }
+        catch (e: Exception) { Log.w(TAG, "pushAudio failed: ${e.message}") }
+    }
+}
+```
+
+**Modify `android/app/src/main/java/com/ledgrab/android/CaptureService.kt`** — in the
+MediaProjection start path (where `ScreenCapture` is created with the projection), if
+`RECORD_AUDIO` is granted and API ≥ 29, also `bridge.configureAudio(...)` and start an
+`AudioCapture(projection, bridge)`. Stop/release it in `onDestroy` alongside `ScreenCapture`.
+Root path → optional mic fallback (or skip; see Risks).
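The gating described for `CaptureService` reduces to a small decision table. So do the API-level and root-mode caveats listed under Risks.
- `choose_audio_mode` and `AudioMode` are hypothetical, sketched in Python for clarity.
- The real check would be Kotlin code using `Build.VERSION.SDK_INT` and `checkSelfPermission`.
- Both paths need `RECORD_AUDIO`.

```python
from enum import Enum
from typing import Optional


class AudioMode(Enum):
    PLAYBACK = "playback"  # AudioPlaybackCapture via the shared MediaProjection
    MIC = "mic"            # AudioSource.MIC fallback


def choose_audio_mode(api_level: int, record_audio_granted: bool,
                      has_projection: bool, mic_fallback_enabled: bool) -> Optional[AudioMode]:
    """Hypothetical gate: which audio path (if any) CaptureService should start."""
    if not record_audio_granted:
        return None  # degrade gracefully: screen capture continues without audio
    if has_projection and api_level >= 29:
        return AudioMode.PLAYBACK
    if mic_fallback_enabled:
        return AudioMode.MIC  # root mode (no MediaProjection) or API 24-28
    return None


scenarios = {
    "MediaProjection, API 34": (34, True, True, False),
    "MediaProjection, API 28": (28, True, True, True),
    "root mode, API 34": (34, True, False, True),
    "RECORD_AUDIO denied": (34, False, True, True),
}
for label, args in scenarios.items():
    mode = choose_audio_mode(*args)
    print(f"{label}: {mode.value if mode else 'no audio'}")
```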
+
+**Modify `android/app/src/main/AndroidManifest.xml`:**
+```xml
+<uses-permission android:name="android.permission.RECORD_AUDIO" />
+<!-- For mic-mode foreground capture on API 34+ (playback capture is covered by the
+     existing mediaProjection FGS type): -->
+<uses-permission android:name="android.permission.FOREGROUND_SERVICE_MICROPHONE" />
+```
+The existing `CaptureService` already declares `foregroundServiceType="mediaProjection|specialUse"`
+and holds `FOREGROUND_SERVICE_MEDIA_PROJECTION`; add `microphone` to the type only if mic
+fallback is implemented.
+
+**Modify `MainActivity.kt`** — request `RECORD_AUDIO` at runtime alongside the existing
+`ensureNotificationPermission()` (POST_NOTIFICATIONS) flow, before starting capture. Capture
+proceeds without audio if denied (graceful degradation).
+
+---
+
+## Orchestration decision (the main trade-off)
+
+Desktop starts audio capture **on demand** when an audio-reactive source is acquired
+(`AudioCaptureManager.acquire`). On Android, PCM only flows if Kotlin has set up `AudioRecord`.
+
+- **MVP (recommended):** start `AudioCapture` when `CaptureService` starts (if `RECORD_AUDIO`
+  granted + MediaProjection mode + API ≥ 29) and push continuously; the bounded queue drops
+  frames when no audio source consumes them. Simplest; modest extra CPU.
+- **Future optimization:** on-demand start/stop signaled Python→Kotlin (Chaquopy can call
+  Kotlin, as `BleBridge`/`UsbSerialBridge` show) so `AudioRecord` runs only while an
+  audio-reactive source is active. Defer unless CPU/battery on low-end boxes warrants it.
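The MVP's "modest extra CPU" claim can be sized with simple arithmetic on the always-on push.
- The sketch assumes the defaults used elsewhere in this plan: 1024-frame stereo float32 chunks at 48 kHz and a `maxsize=8` queue.
- `continuous_push_budget` is a hypothetical helper.

```python
def chunk_ms(chunk_frames: int, sample_rate: int) -> float:
    """Time needed to fill one pushed chunk, in milliseconds."""
    return 1000.0 * chunk_frames / sample_rate


def continuous_push_budget(chunk_frames: int, sample_rate: int, channels: int,
                           queue_size: int) -> dict:
    """Cost of always-on pushing, independent of whether anything consumes the PCM."""
    per_chunk = chunk_ms(chunk_frames, sample_rate)
    return {
        "pushes_per_s": sample_rate / chunk_frames,   # Kotlin -> Python calls per second
        "chunk_ms": per_chunk,
        "max_backlog_ms": queue_size * per_chunk,     # rough age of the oldest chunk when full
        "max_queued_bytes": queue_size * chunk_frames * channels * 4,  # float32 = 4 bytes
    }


budget = continuous_push_budget(chunk_frames=1024, sample_rate=48000, channels=2, queue_size=8)
print({k: round(v, 2) for k, v in budget.items()})
```

With those defaults the push costs about 47 Kotlin→Python calls per second whether or not an audio source is active. At most 64 KiB of PCM is queued, and up to roughly 170 ms of stale audio can be waiting when a consumer first attaches.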
+
+---
+
+## What does NOT change
+
+- **Frontend / API** — audio engine + device selection, the music analyzer UI, and audio value
+  sources are engine-agnostic; the Android engine shows up via the existing device enumeration.
+- **`build.gradle.kts` / Chaquopy pip block** — no new Python packages.
+- **Audio analysis pipeline** — `AudioAnalyzer`, band filters, `ManagedAudioStream` untouched.
+
+---
+
+## Files
+
+**Create**
+- `server/src/ledgrab/core/audio/android_audio_engine.py`
+- `android/app/src/main/java/com/ledgrab/android/AudioCapture.kt`
+- `server/tests/core/audio/test_android_audio_engine.py`
+
+**Modify**
+- `server/src/ledgrab/core/audio/__init__.py` — guarded import + registry registration.
+- `android/app/src/main/java/com/ledgrab/android/PythonBridge.kt` — `configureAudio` + `pushAudio`.
+- `android/app/src/main/java/com/ledgrab/android/CaptureService.kt` — start/stop `AudioCapture`.
+- `android/app/src/main/java/com/ledgrab/android/MainActivity.kt` — request `RECORD_AUDIO`.
+- `android/app/src/main/AndroidManifest.xml` — `RECORD_AUDIO` (+ mic FGS if mic fallback).
+
+---
+
+## Tests (Python — run on desktop CI, no Android device needed)
+
+New `server/tests/core/audio/test_android_audio_engine.py`:
+
+- `configure()` then `push_samples()` → `read_chunk()` returns the same float32 samples;
+  queue drops oldest when full (push > maxsize).
+- `AndroidAudioEngine.is_available()` is `False` off Android (monkeypatch
+  `ledgrab.utils.platform.is_android`) and, on Android, `False` until `configure()` and `True` after.
+- `enumerate_devices()` returns exactly one loopback device when active, `[]` otherwise.
+- Integration: with `is_android()` patched true + `configure()`, `get_best_available_engine()`
+  returns `"android_playback"` (priority beats demo), and a stream created via
+  `AudioEngineRegistry.create_stream("android_playback", 0, True, {})` yields pushed chunks.
+- Registry isolation: use `AudioEngineRegistry.clear_registry()` / re-register in fixtures so
+  desktop engines aren't disturbed.
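Patching `ledgrab.utils.platform.is_android` works for the availability test because `AndroidAudioEngine.is_available()` imports `is_android` at call time. It therefore sees whatever attribute the module holds at that moment.

The self-contained sketch below reproduces this with a fake module and `unittest.mock.patch.object`. `fake_platform` and the module-level `is_available` are hypothetical stand-ins. In the real test, pytest's `monkeypatch.setattr("ledgrab.utils.platform.is_android", lambda: True)` plays the same role.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for ledgrab.utils.platform.
fake_platform = types.ModuleType("fake_platform")
fake_platform.is_android = lambda: False
sys.modules["fake_platform"] = fake_platform

_active = False


def is_available() -> bool:
    # Like AndroidAudioEngine.is_available(): resolve is_android at call time,
    # so a patched module attribute is picked up on the next call.
    from fake_platform import is_android

    return is_android() and _active


results = [is_available()]          # desktop, not configured
_active = True                      # what configure() does
results.append(is_available())      # configured, but not Android
with mock.patch.object(fake_platform, "is_android", lambda: True):
    results.append(is_available())  # Android + configured
results.append(is_available())      # patch undone on exit
print(results)  # [False, False, True, False]
```

If the engine instead imported `is_android` at module top level, it would keep its own reference to the original function. Patching the platform module would then have no effect, and the test would need to patch the name inside the engine module.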
+
+## Verification
+
+1. **Python:** `py -3.13 -m pytest tests/core/audio/test_android_audio_engine.py --no-cov -q`
+   (from `server/`), then the full suite.
+2. **Lint:** `ruff check src/ tests/ --fix` (from `server/`).
+3. **Android build:** `./gradlew :app:assembleDebug` (from `android/`).
+4. **On device/emulator (manual):** install APK → grant `RECORD_AUDIO` + screen-capture consent
+   → start capture → play non-DRM media (e.g. a local video / YouTube web) → create an
+   audio-reactive value source bound to a strip → confirm the LEDs react to the audio, and the
+   Android playback device appears in audio device enumeration.
+
+## Risks / notes
+
+- **DRM opt-out:** Netflix/Disney+/etc. set audio as non-capturable; `AudioPlaybackCapture`
+  yields silence for them. Works for non-DRM media and the device's own audio. Document in UI.
+- **API 29 minimum** for playback capture (minSdk is 24). API 24–28 and root mode (no
+  MediaProjection) → mic fallback only, or audio unsupported. Gate cleanly + log.
+- **`RECORD_AUDIO`** is a runtime "dangerous" permission — must be requested; capture must
+  degrade gracefully when denied.
+- **Format:** request `ENCODING_PCM_FLOAT` so Kotlin pushes float32 matching
+  `read_chunk()`'s contract (1-D interleaved float32, length = frames × channels). If a device
+  rejects float, capture 16-bit PCM and convert (`/32768.0`) before pushing.
+- **Latency/CPU:** small `chunkFrames` (e.g. 1024 @ 48 kHz ≈ 21 ms) keeps reactivity tight;
+  continuous capture (MVP) adds modest CPU on low-end boxes — see the orchestration trade-off.
+- **R8/ProGuard:** minify is disabled and the Python module is resolved by string from Kotlin;
+  no new keep-rules needed.
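The 16-bit fallback in the Format note is a single scaling step, sketched here in Python for testability. In the plan the conversion happens in Kotlin before `pushAudio`, and `pcm16_to_float32_le` is a hypothetical helper. Dividing by 32768 maps the full int16 range onto [-1.0, 1.0), so the pushed bytes match `read_chunk()`'s float32 contract.

```python
import struct


def pcm16_to_float32_le(pcm16_le: bytes) -> bytes:
    """Convert little-endian int16 PCM to little-endian float32 in [-1.0, 1.0)."""
    n = len(pcm16_le) // 2  # two bytes per int16 sample; a trailing odd byte is ignored
    samples = struct.unpack(f"<{n}h", pcm16_le[: 2 * n])
    return struct.pack(f"<{n}f", *(s / 32768.0 for s in samples))


pcm16 = struct.pack("<4h", -32768, 0, 16384, 32767)
print(struct.unpack("<4f", pcm16_to_float32_le(pcm16)))  # (-1.0, 0.0, 0.5, 0.999969482421875)
```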