ledgrab/ANDROID-REVIEW/android-audio-capture-plan.md
alexei.dolgolyov 4b2e8fc5ec docs(android): add audio-capture design + missing-functionality review
- android-audio-capture-plan.md — design behind the merged on-device audio
  capture feature (487259a).
- android-missing-functionality.md — Android missing-feature review notes.
2026-06-02 03:30:43 +03:00


Plan: Android on-device audio capture

Status: proposed plan (not yet approved). No code changes. Last updated 2026-06-01.

Context

LedGrab's audio-reactive features (music analyzer, audio value sources, band filters) depend on capturing an audio stream and running it through AudioAnalyzer (server/src/ledgrab/core/audio/analysis.py). On desktop this is fed by WASAPI (Windows) or Sounddevice/PortAudio (cross-platform). On the experimental Android-TV build neither is available — sounddevice has no Chaquopy wheel and PortAudio isn't bundled — so core/audio/__init__.py registers only DemoAudioEngine, and audio-reactive lighting is effectively dead on Android.

Android does not need PortAudio: the platform exposes AudioPlaybackCapture (API 29+), which captures system playback audio and takes a MediaProjection token — the very token the app already obtains for screen capture (ScreenCapture(projection, …)). This plan adds a push-based Android audio engine so the TV box can drive sound-reactive lighting from its own media playback, at parity with how desktop audio feeds the analyzer.

The design mirrors the working screen-capture bridge (mediaprojection_engine.py / ScreenCapture.kt / PythonBridge) and the existing audio engine abstraction (AudioCaptureEngine / AudioCaptureStreamBase / AudioEngineRegistry). No new Python dependencies (numpy is already bundled) → no Chaquopy / build.gradle.kts pip {} changes.


Approach

A new push-based audio engine registered in the existing AudioEngineRegistry:

  • Python: AndroidAudioEngine + AndroidAudioCaptureStream mirroring SounddeviceEngine, but read_chunk() pops PCM from a module-level queue that Kotlin fills (mirror of mediaprojection_engine.push_frame). High ENGINE_PRIORITY so AudioEngineRegistry.get_best_available_engine() selects it on Android. The existing ManagedAudioStream capture loop and AudioAnalyzer consume read_chunk() unchanged.
  • Android: an AudioCapture helper using AudioRecord + AudioPlaybackCaptureConfiguration (reusing CaptureService's MediaProjection), pushing float32 PCM to Python. Mic (AudioSource.MIC) fallback. Wired into CaptureService next to ScreenCapture.
[media playback] → AudioRecord (AudioPlaybackCapture, reuses MediaProjection)
   → AudioCapture.kt → PythonBridge.pushAudio(pcmFloat32, frames, channels)
   → android_audio_engine.push_samples()  [module-level queue]
   → AndroidAudioCaptureStream.read_chunk()  → ManagedAudioStream → AudioAnalyzer  [unchanged]

Part A — Python (server)

New file: server/src/ledgrab/core/audio/android_audio_engine.py — mirror mediaprojection_engine.py (queue + configure + push) and sounddevice_engine.py (engine/stream shape):

import queue
import numpy as np
from typing import Any, Dict, List
from ledgrab.core.audio.base import AudioCaptureEngine, AudioCaptureStreamBase, AudioDeviceInfo
from ledgrab.utils import get_logger

logger = get_logger(__name__)

_pcm_queue: "queue.Queue[np.ndarray]" = queue.Queue(maxsize=8)
_sample_rate = 48000
_channels = 2
_chunk_size = 1024
_active = False

def configure(sample_rate: int, channels: int, chunk_size: int) -> None:
    """Called from Kotlin before audio frames start flowing. Drains stale PCM."""
    global _sample_rate, _channels, _chunk_size, _active
    while not _pcm_queue.empty():
        try:
            _pcm_queue.get_nowait()
        except queue.Empty:
            break
    _sample_rate, _channels, _chunk_size = sample_rate, channels, chunk_size
    _active = True

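def push_samples(pcm_float32: bytes) -> None:
    """Push one interleaved float32 PCM chunk from Kotlin. Drops oldest if full."""
    # Copy defensively: the Kotlin side reuses its buffer, so a zero-copy view
    # could be overwritten while the chunk is still queued.
    samples = np.frombuffer(pcm_float32, dtype=np.float32).copy()
    try:
        _pcm_queue.put_nowait(samples)
    except queue.Full:
        # Queue is full: discard the oldest chunk, then retry once.
        try:
            _pcm_queue.get_nowait()
        except queue.Empty:
            pass
        try:
            _pcm_queue.put_nowait(samples)
        except queue.Full:
            pass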

def shutdown() -> None:
    global _active
    _active = False


class AndroidAudioCaptureStream(AudioCaptureStreamBase):
    @property
    def channels(self) -> int: return _channels
    @property
    def sample_rate(self) -> int: return _sample_rate
    @property
    def chunk_size(self) -> int: return _chunk_size
    def initialize(self) -> None:
        if not _active:
            raise RuntimeError("Android audio engine not configured (only valid in-app).")
        self._initialized = True
    def cleanup(self) -> None:
        self._initialized = False
    def read_chunk(self) -> np.ndarray | None:
        try:
            return _pcm_queue.get(timeout=0.1)  # 1-D float32 interleaved
        except queue.Empty:
            return None


class AndroidAudioEngine(AudioCaptureEngine):
    ENGINE_TYPE = "android_playback"
    ENGINE_PRIORITY = 100  # highest on Android (demo is lower)
    @classmethod
    def is_available(cls) -> bool:
        from ledgrab.utils.platform import is_android
        return is_android() and _active
    @classmethod
    def get_default_config(cls) -> Dict[str, Any]:
        return {"sample_rate": _sample_rate, "channels": _channels, "chunk_size": _chunk_size}
    @classmethod
    def enumerate_devices(cls) -> List[AudioDeviceInfo]:
        if not cls.is_available(): return []
        return [AudioDeviceInfo(index=0, name="Android playback (system audio)",
                                is_input=True, is_loopback=True,
                                channels=_channels, default_samplerate=float(_sample_rate))]
    @classmethod
    def create_stream(cls, device_index, is_loopback, config) -> AndroidAudioCaptureStream:
        return AndroidAudioCaptureStream(device_index, is_loopback, {**cls.get_default_config(), **config})

Modify server/src/ledgrab/core/audio/__init__.py — register behind a guarded import, matching the existing _has_wasapi / _has_sounddevice pattern:

try:
    from ledgrab.core.audio.android_audio_engine import AndroidAudioEngine
    _has_android_audio = True
except ImportError:
    _has_android_audio = False
...
if _has_android_audio:
    AudioEngineRegistry.register(AndroidAudioEngine)

Reused, unchanged: AudioEngineRegistry.get_best_available_engine() (picks by priority), ManagedAudioStream._capture_loop() (audio_capture.py), AudioAnalyzer, the audio value sources, and the device-enumeration endpoints. The Android engine appears as one loopback device named "Android playback (system audio)".
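
To illustrate why ENGINE_PRIORITY = 100 matters, here is a minimal sketch of priority-based selection. It is not the actual AudioEngineRegistry implementation: EngineEntry, pick_best, the "demo" type name and its priority of 0 are hypothetical stand-ins. Only the "android_playback" type and its priority of 100 come from this plan.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class EngineEntry:
    """Hypothetical stand-in for a registered engine class."""
    engine_type: str
    priority: int
    is_available: Callable[[], bool]


def pick_best(engines: List[EngineEntry]) -> Optional[str]:
    """Return the type of the highest-priority available engine, or None."""
    candidates = [e for e in engines if e.is_available()]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.priority).engine_type


# Mirrors the module-level _active flag that configure() sets.
state: Dict[str, bool] = {"active": False}

engines = [
    EngineEntry("demo", 0, lambda: True),  # name and priority are assumptions
    EngineEntry("android_playback", 100, lambda: state["active"]),
]

before = pick_best(engines)   # Android engine not configured yet
state["active"] = True        # what configure() would do
after = pick_best(engines)    # Android engine now outranks the demo engine
```

Under these assumptions the demo engine is selected until configure() has run, after which the higher-priority Android engine takes over.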


Part B — Android (Kotlin + manifest)

New file: android/app/src/main/java/com/ledgrab/android/AudioCapture.kt

Mirrors ScreenCapture.kt, taking the same MediaProjection:

class AudioCapture(
    private val projection: MediaProjection,
    private val bridge: PythonBridge,
    private val sampleRate: Int = 48000,
    private val channels: Int = 2,
    private val chunkFrames: Int = 1024,
)
  • start() (API 29+, MediaProjection mode):
    • Build the configuration with AudioPlaybackCaptureConfiguration.Builder(projection), calling addMatchingUsage() for USAGE_MEDIA, USAGE_GAME, USAGE_UNKNOWN (the capturable set).
    • AudioRecord.Builder().setAudioPlaybackCaptureConfig(cfg) with an AudioFormat of ENCODING_PCM_FLOAT, sampleRate, CHANNEL_IN_STEREO.
    • On a dedicated HandlerThread, loop audioRecord.read(floatBuf, …, READ_BLOCKING) → wrap into a little-endian float32 ByteArray (reusable buffer, like ScreenCapture's frameBuffer) → bridge.pushAudio(bytes, framesRead, channels).
  • stop(): stop/release AudioRecord, quit the thread.
  • Mic fallback (startMic()): AudioSource.MIC for root mode (no MediaProjection) or API < 29. Used only when playback capture is unavailable.
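
The byte layout that crosses the bridge can be sketched in plain Python. This is only an illustration of little-endian, sample-interleaved float32 as described above; pack_interleaved and unpack_interleaved are hypothetical helper names, not part of the plan.

```python
import struct
from typing import List, Tuple


def pack_interleaved(frames: List[Tuple[float, ...]]) -> bytes:
    """Pack (L, R, ...) frames as little-endian float32, sample-interleaved."""
    flat = [sample for frame in frames for sample in frame]
    return struct.pack(f"<{len(flat)}f", *flat)


def unpack_interleaved(pcm: bytes, channels: int) -> List[Tuple[float, ...]]:
    """Split a float32 payload back into per-frame tuples."""
    if len(pcm) % (4 * channels) != 0:
        raise ValueError("payload is not a whole number of frames")
    count = len(pcm) // 4  # 4 bytes per float32 sample
    flat = struct.unpack(f"<{count}f", pcm)
    return [tuple(flat[i:i + channels]) for i in range(0, count, channels)]


# Two stereo frames: 2 frames x 2 channels x 4 bytes = 16 bytes.
payload = pack_interleaved([(0.5, -0.5), (0.25, -0.25)])
frames = unpack_interleaved(payload, channels=2)
```

The payload length is always frames × channels × 4 bytes, which gives the receiving side a cheap sanity check on each chunk.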

Modify android/app/src/main/java/com/ledgrab/android/PythonBridge.kt — add the audio push path (same shape as pushFrame, with a cached PyObject handle):

@Volatile private var androidAudioEngine: PyObject? = null

fun configureAudio(sampleRate: Int, channels: Int, chunkFrames: Int) {
    val engine = Python.getInstance().getModule("ledgrab.core.audio.android_audio_engine")
    engine.callAttr("configure", sampleRate, channels, chunkFrames)
    androidAudioEngine = engine
}
fun pushAudio(pcmFloat32: ByteArray, frames: Int, channels: Int) {
    if (!running) return
    androidAudioEngine?.let {
        try { it.callAttr("push_samples", pcmFloat32) }
        catch (e: Exception) { Log.w(TAG, "pushAudio failed: ${e.message}") }
    }
}

Modify android/app/src/main/java/com/ledgrab/android/CaptureService.kt — in the MediaProjection start path (where ScreenCapture is created with the projection), if RECORD_AUDIO is granted and API ≥ 29, also bridge.configureAudio(...) and start an AudioCapture(projection, bridge). Stop/release it in onDestroy alongside ScreenCapture. Root path → optional mic fallback (or skip; see Risks).

Modify android/app/src/main/AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />
<!-- For mic-mode foreground capture on API 34+ (playback capture is covered by the
     existing mediaProjection FGS type): -->
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_MICROPHONE" />

The existing CaptureService already declares foregroundServiceType="mediaProjection|specialUse" and holds FOREGROUND_SERVICE_MEDIA_PROJECTION; add microphone to the type only if mic fallback is implemented.

Modify MainActivity.kt — request RECORD_AUDIO at runtime alongside the existing ensureNotificationPermission() (POST_NOTIFICATIONS) flow, before starting capture. Capture proceeds without audio if denied (graceful degradation).


Orchestration decision (the main trade-off)

Desktop starts audio capture on demand when an audio-reactive source is acquired (AudioCaptureManager.acquire). On Android, PCM only flows if Kotlin has set up AudioRecord.

  • MVP (recommended): start AudioCapture when CaptureService starts (if RECORD_AUDIO granted + MediaProjection mode + API ≥ 29) and push continuously; the bounded queue drops frames when no audio source consumes them. Simplest; modest extra CPU.
  • Future optimization: on-demand start/stop signaled Python→Kotlin (Chaquopy can call Kotlin, as BleBridge/UsbSerialBridge show) so AudioRecord runs only while an audio-reactive source is active. Defer unless CPU/battery on low-end boxes warrants it.
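
To put rough numbers on the MVP option, the following sketch computes chunk duration, the most audio the bounded queue can hold, and the raw data rate across the bridge. It uses the plan's defaults (48 kHz, stereo, float32, 1024-frame chunks, queue maxsize 8); the function names are illustrative only.

```python
def chunk_duration_ms(chunk_frames: int, sample_rate: int) -> float:
    """Duration of one chunk in milliseconds."""
    return 1000.0 * chunk_frames / sample_rate


def max_buffered_ms(queue_depth: int, chunk_frames: int, sample_rate: int) -> float:
    """Upper bound on audio held in a full queue of chunks, in milliseconds."""
    return queue_depth * chunk_duration_ms(chunk_frames, sample_rate)


def bytes_per_second(sample_rate: int, channels: int, bytes_per_sample: int = 4) -> int:
    """Raw PCM data rate pushed across the bridge."""
    return sample_rate * channels * bytes_per_sample


per_chunk = chunk_duration_ms(1024, 48000)    # about 21.3 ms
worst_case = max_buffered_ms(8, 1024, 48000)  # about 170.7 ms
data_rate = bytes_per_second(48000, 2)        # 384000 bytes/s
```

So with continuous capture, a consumer that starts reading late sees audio that is at most roughly 171 ms old, and the bridge carries about 375 KiB/s whether or not anything consumes it.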

What does NOT change

  • Frontend / API: audio engine + device selection, the music analyzer UI, and audio value sources are engine-agnostic; the Android engine shows up via the existing device enumeration.
  • build.gradle.kts / Chaquopy pip block: no new Python packages.
  • Audio analysis pipeline: AudioAnalyzer, band filters, ManagedAudioStream untouched.

Files

Create

  • server/src/ledgrab/core/audio/android_audio_engine.py
  • android/app/src/main/java/com/ledgrab/android/AudioCapture.kt
  • server/tests/core/audio/test_android_audio_engine.py

Modify

  • server/src/ledgrab/core/audio/__init__.py: guarded import + registry registration.
  • android/app/src/main/java/com/ledgrab/android/PythonBridge.kt: configureAudio + pushAudio.
  • android/app/src/main/java/com/ledgrab/android/CaptureService.kt: start/stop AudioCapture.
  • android/app/src/main/java/com/ledgrab/android/MainActivity.kt: request RECORD_AUDIO.
  • android/app/src/main/AndroidManifest.xml: RECORD_AUDIO (+ mic FGS if mic fallback).

Tests (Python — run on desktop CI, no Android device needed)

New server/tests/core/audio/test_android_audio_engine.py:

  • configure() then push_samples() → read_chunk() returns the same float32 samples; queue drops oldest when full (push > maxsize).
  • AndroidAudioEngine.is_available() is False before configure() has run and whenever is_android() is False (monkeypatch ledgrab.utils.platform.is_android); True once both conditions hold.
  • enumerate_devices() returns exactly one loopback device when active, [] otherwise.
  • Integration: with is_android() patched true + configure(), get_best_available_engine() returns "android_playback" (priority beats demo), and a stream created via AudioEngineRegistry.create_stream("android_playback", 0, True, {}) yields pushed chunks.
  • Registry isolation: use AudioEngineRegistry.clear_registry() / re-register in fixtures so desktop engines aren't disturbed.
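
As a rough shape for the drop-oldest case, the test below runs against a stand-in rather than the real module, since ledgrab is not importable outside the repo. push_drop_oldest copies the logic of push_samples() from this plan; drain is a hypothetical helper. A real test would call android_audio_engine.push_samples() and read_chunk() instead.

```python
import queue
from typing import Any, List


def push_drop_oldest(q: "queue.Queue[Any]", item: Any) -> None:
    """Stand-in for push_samples(): on overflow, drop the oldest item."""
    try:
        q.put_nowait(item)
    except queue.Full:
        try:
            q.get_nowait()
        except queue.Empty:
            pass
        try:
            q.put_nowait(item)
        except queue.Full:
            pass


def drain(q: "queue.Queue[Any]") -> List[Any]:
    """Hypothetical helper: pop everything currently queued, oldest first."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def test_queue_drops_oldest_when_full() -> None:
    q: "queue.Queue[int]" = queue.Queue(maxsize=8)
    for i in range(10):  # two more pushes than the queue can hold
        push_drop_oldest(q, i)
    # Items 0 and 1 were evicted; the newest eight remain in order.
    assert drain(q) == [2, 3, 4, 5, 6, 7, 8, 9]


test_queue_drops_oldest_when_full()
```

The same pattern extends to the round-trip case: push one known chunk and assert the next read returns identical samples.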

Verification

  1. Python: py -3.13 -m pytest tests/core/audio/test_android_audio_engine.py --no-cov -q (from server/), then the full suite.
  2. Lint: ruff check src/ tests/ --fix (from server/).
  3. Android build: ./gradlew :app:assembleDebug (from android/).
  4. On device/emulator (manual): install APK → grant RECORD_AUDIO + screen-capture consent → start capture → play non-DRM media (e.g. a local video / YouTube web) → create an audio-reactive value source bound to a strip → confirm the LEDs react to the audio, and the Android playback device appears in audio device enumeration.

Risks / notes

  • DRM opt-out: Netflix/Disney+/etc. set audio as non-capturable; AudioPlaybackCapture yields silence for them. Works for non-DRM media and the device's own audio. Document in UI.
  • API 29 minimum for playback capture (minSdk is 24). API 24–28 and root mode (no MediaProjection) → mic fallback only, or audio unsupported. Gate cleanly + log.
  • RECORD_AUDIO is a runtime "dangerous" permission — must be requested; capture must degrade gracefully when denied.
  • Format: request ENCODING_PCM_FLOAT so Kotlin pushes float32 matching read_chunk()'s contract (1-D interleaved float32, length = frames × channels). If a device rejects float, capture 16-bit PCM and convert (/32768.0) before pushing.
  • Latency/CPU: small chunkFrames (e.g. 1024 @ 48 kHz ≈ 21 ms) keeps reactivity tight; continuous capture (MVP) adds modest CPU on low-end boxes — see the orchestration trade-off.
  • R8/ProGuard: minify is disabled and the Python module is resolved by string from Kotlin; no new keep-rules needed.
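
The 16-bit fallback mentioned under Format comes down to a single scaling step. The sketch below shows that arithmetic in Python for clarity; in the plan the conversion would happen on the Kotlin side before pushing, and int16_le_to_float32_le is a hypothetical name.

```python
import struct


def int16_le_to_float32_le(pcm16: bytes) -> bytes:
    """Convert little-endian int16 PCM to little-endian float32 in [-1.0, 1.0)."""
    if len(pcm16) % 2 != 0:
        raise ValueError("int16 payload must have an even number of bytes")
    count = len(pcm16) // 2
    ints = struct.unpack(f"<{count}h", pcm16)
    # Dividing by 32768 maps -32768 to -1.0 and 32767 to just under 1.0.
    return struct.pack(f"<{count}f", *(s / 32768.0 for s in ints))


raw = struct.pack("<4h", -32768, 0, 16384, 32767)
converted = struct.unpack("<4f", int16_le_to_float32_le(raw))
```

Interleaving is unaffected by the conversion, so the result still satisfies read_chunk()'s contract of 1-D interleaved float32 with length = frames × channels.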