
The iOS Audio Stack Problem

Tags: ios, audio, design

An imperative API to a shared OS service is not the right API design.

The iOS audio stack — AVAudioSession, AVAudioEngine, Audio Units, and friends — still forces developers to micromanage a global mutable singleton that the entire operating system also mutates. We issue command after command, wire nodes manually, observe a dozen notifications, and pray the timing is perfect. The result is fragile code, cryptic crashes, and weeks of debugging that should never have existed in the first place.

The Current Imperative Reality

Apple’s audio APIs expect you to hold a complex mental model of the system’s hidden state at all times. You must:

  • Configure the shared AVAudioSession
  • Activate/deactivate it at the right moments
  • Manually build and maintain an AVAudioEngine node graph
  • React to every interruption, route change, and media server reset yourself

Here’s what a simple mic → speaker passthrough looks like today.

import AVFoundation

// 1. Imperative session setup
let session = AVAudioSession.sharedInstance()
do {
    try session.setCategory(.playAndRecord,
                            mode: .default,
                            options: [.defaultToSpeaker, .allowBluetooth])
    try session.setActive(true)
} catch {
    // Good luck
}

// 2. Engine and manual wiring
let engine = AVAudioEngine()
let input = engine.inputNode
let output = engine.outputNode

let format = input.inputFormat(forBus: 0)
engine.connect(input, to: output, format: format)

// 3. Optional tap for processing
input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, time in
    // Process microphone data
}

// 4. Start and hope nothing breaks
do {
    try engine.start()
} catch {
    // Engine refused to start; good luck diagnosing the opaque error
}

// 5. Manual observers (you need these forever)
NotificationCenter.default.addObserver(self,
    selector: #selector(handleInterruption),
    name: AVAudioSession.interruptionNotification,
    object: nil)

Every step is order-dependent. One wrong call, one missed notification, or one mishandled background transition, and your app crashes or goes silent.
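
For a sense of what that last observer commits you to, here is a rough sketch of the handleInterruption handler it points at. The notification keys are the real AVAudioSession ones; the pause-and-resume behavior shown is an assumption for illustration.

@objc func handleInterruption(_ notification: Notification) {
    guard let info = notification.userInfo,
          let typeValue = info[AVAudioSessionInterruptionTypeKey] as? UInt,
          let type = AVAudioSession.InterruptionType(rawValue: typeValue) else { return }

    switch type {
    case .began:
        // The system already stopped us (phone call, Siri, another app); mirror that in our own state
        engine.pause()
    case .ended:
        let optionValue = info[AVAudioSessionInterruptionOptionKey] as? UInt ?? 0
        if AVAudioSession.InterruptionOptions(rawValue: optionValue).contains(.shouldResume) {
            // Reactivate and restart, hoping the route and formats are still what we wired
            try? AVAudioSession.sharedInstance().setActive(true)
            try? engine.start()
        }
    @unknown default:
        break
    }
}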

Why This Design Is Fundamentally Broken

An imperative API to a shared OS service is not the right API design.

The audio hardware, Bluetooth stack, media server, and other apps all compete for the same scarce resource. An imperative model gives you too much responsibility and too little visibility. The system cannot reason about your intent — only about the sequence of calls you made. This leads to:

  • Brittle ordering dependencies
  • Race conditions with system events
  • Massive boilerplate and defensive code
  • Crashes that surface far away from the offending call
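
The route-change observer tells the same story. Here is a sketch of the handler most apps end up carrying; the notification and its keys are real, while the rewiring shown is just one plausible recovery path.

@objc func handleRouteChange(_ notification: Notification) {
    guard let info = notification.userInfo,
          let reasonValue = info[AVAudioSessionRouteChangeReasonKey] as? UInt,
          let reason = AVAudioSession.RouteChangeReason(rawValue: reasonValue) else { return }

    switch reason {
    case .oldDeviceUnavailable, .newDeviceAvailable, .categoryChange:
        // Headphones unplugged, AirPods paired, another app changed the category...
        // The node formats may now be stale, so tear down and rewire by hand
        engine.stop()
        engine.disconnectNodeOutput(engine.inputNode)
        let format = engine.inputNode.inputFormat(forBus: 0)
        engine.connect(engine.inputNode, to: engine.outputNode, format: format)
        try? engine.start()
    default:
        break
    }
}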

A Better Way: Declarative Intents + Delegate Callbacks

We should declare what we want to achieve and provide a delegate for how to handle I/O and lifecycle events. The system takes care of session management, engine lifecycle, node wiring, automatic reconnects after route changes, media server resets, and background behavior.

Example 1: Mic Input → Speaker Output (Passthrough)

// 1. Declare the intent once
let intent = AudioIntent.playAndRecord(
    allowsBackground: false,
    spatialPolicy: .auto,
    priority: .userInitiated
)

// 2. Configure with delegate
AudioSession.configure(with: intent, delegate: self)

// 3. Implement the delegate; the system drives these callbacks
extension MyAudioController: AudioIntentDelegate {

    func audioInputBufferReceived(_ buffer: AVAudioPCMBuffer, at time: AVAudioTime) {
        // Clean mic data arrives here — system already handles format and reconnection
    }

    func audioOutputBufferRequested(_ buffer: inout AVAudioPCMBuffer, at time: AVAudioTime) {
        // Fill output buffer (or passthrough mic data)
    }

    func audioSessionInterrupted(reason: AudioInterruptionReason, shouldResume: Bool) {
        // Simple, high-level handling
    }

    func audioRouteDidChange(newRoute: AVAudioSessionRouteDescription) {
        // System already reconnected everything
    }

    func audioEngineRecoveredFromError(_ error: Error) {
        // System handled recovery
    }
}

Example 2: Audio Playback (Music / Voice / Effects)

// Declare playback intent
let playbackIntent = AudioIntent.playback(
    source: .file(url: audioURL),           // or .streaming, .generator, etc.
    allowsBackground: true,
    spatialPolicy: .headTracked,
    interruptionPolicy: .pauseAndResume,
    priority: .userInitiated
)

AudioSession.configure(with: playbackIntent, delegate: self)

extension MyAudioController: AudioIntentDelegate {

    // Pull-based: system asks you for the next buffer when needed
    func audioOutputBufferRequested(_ buffer: inout AVAudioPCMBuffer, at time: AVAudioTime) {
        // Decode next chunk of audio file, generate tone, mix effects, etc.
        // System guarantees correct format and timing
    }

    func audioPlaybackFinished() {
        // Natural end of file or stream
    }

    func audioSessionInterrupted(reason: AudioInterruptionReason, shouldResume: Bool) {
        if shouldResume {
            // Optional: continue from saved position
        }
    }
}

No manual engine.start(), no node graph management, no separate notification center observers, and no fragile cleanup code. The system reconciles your declared intent with current device state and keeps everything alive.
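
None of this requires new hardware support. As a rough feasibility sketch (not Apple API, and built on the hypothetical AudioIntent and AudioIntentDelegate types from the examples above), a thin layer over today's AVFoundation could already own the reconcile-and-restart loop:

import AVFoundation

final class IntentAudioSession {
    private let engine = AVAudioEngine()
    private var delegate: AudioIntentDelegate?
    private var observers: [NSObjectProtocol] = []

    func configure(with intent: AudioIntent, delegate: AudioIntentDelegate) throws {
        self.delegate = delegate

        // Map the declared intent onto the shared session once.
        // (A real implementation would derive category/options/mode from `intent`.)
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.playAndRecord, mode: .default,
                                options: [.defaultToSpeaker, .allowBluetooth])
        try session.setActive(true)

        // Wire the graph; from here on, this layer owns the engine lifecycle.
        let input = engine.inputNode
        let format = input.inputFormat(forBus: 0)
        input.installTap(onBus: 0, bufferSize: 1024, format: format) { [weak self] buffer, time in
            self?.delegate?.audioInputBufferReceived(buffer, at: time)
        }
        engine.connect(input, to: engine.outputNode, format: format)

        // Absorb system notifications so the app never sees them raw.
        let token = NotificationCenter.default.addObserver(
            forName: AVAudioSession.routeChangeNotification,
            object: nil,
            queue: .main
        ) { [weak self] _ in
            self?.recoverFromRouteChange()
        }
        observers.append(token)

        try engine.start()
    }

    private func recoverFromRouteChange() {
        // Re-derive formats, restart, and surface only the high-level callback.
        engine.stop()
        try? engine.start()
        delegate?.audioRouteDidChange(newRoute: AVAudioSession.sharedInstance().currentRoute)
    }
}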

What Apple Should Ship

A new declarative layer (AudioIntent, AudioIntentDelegate, and supporting value types) that lives alongside the existing imperative APIs. Default intent presets for common use cases (music player, VoIP, live audio processing, games) would make 90% of apps dramatically simpler and more reliable.
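
Those presets could be nothing more than named intents; the names and parameter choices below are illustrative only:

// Hypothetical presets built from the intent values sketched above
extension AudioIntent {
    static let musicPlayer = AudioIntent.playback(
        source: .streaming,
        allowsBackground: true,
        spatialPolicy: .auto,
        interruptionPolicy: .pauseAndResume,
        priority: .userInitiated
    )

    static let liveProcessing = AudioIntent.playAndRecord(
        allowsBackground: false,
        spatialPolicy: .auto,
        priority: .userInitiated
    )
}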

Stop telling the system how to do its job and start telling it what you want. Apple, if you’re listening: give us the declarative audio stack we deserve.