More on AVAudioEngine + AirPods

• Chris Liscio

I found the cause of the most recent of my AVAudioEngine + AirPods issues, and an end-user workaround to go along with it. Sadly, this isn’t something I can easily patch with code.

There are a lot of moving parts here, so buckle in. I’m going to try and wave my hands around as many details as I can because this has already eaten far too much of my time over the past few days.

Bluetooth Headsets

My understanding is that Bluetooth headsets operate in two different modes: a high-quality, music-playing mode (output only), and a low-quality, call-making mode (for input and output).

On macOS, my AirPods expose two different devices to the system: A 2-channel output-only device for music playback, and a 1-channel input-only device for recording. The output-only device on my AirPods Pro operates at a native 48kHz sample rate, and the input-only device on my AirPods Pro operates at a native 16kHz sample rate.
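
If you’d like to verify those numbers on your own machine, here is a rough sketch that asks the CoreAudio HAL for every device and prints its name and current nominal sample rate. (It assumes a Big Sur-era SDK, where kAudioObjectPropertyElementMaster is still the current spelling, and it skips error handling for brevity.)

```swift
import CoreAudio
import Foundation

// Ask the HAL for its full device list.
var listAddress = AudioObjectPropertyAddress(
    mSelector: kAudioHardwarePropertyDevices,
    mScope: kAudioObjectPropertyScopeGlobal,
    mElement: kAudioObjectPropertyElementMaster)

var dataSize: UInt32 = 0
AudioObjectGetPropertyDataSize(AudioObjectID(kAudioObjectSystemObject),
                               &listAddress, 0, nil, &dataSize)

let deviceCount = Int(dataSize) / MemoryLayout<AudioDeviceID>.size
var devices = [AudioDeviceID](repeating: 0, count: deviceCount)
AudioObjectGetPropertyData(AudioObjectID(kAudioObjectSystemObject),
                           &listAddress, 0, nil, &dataSize, &devices)

for device in devices {
    // The device's human-readable name.
    var nameAddress = AudioObjectPropertyAddress(
        mSelector: kAudioObjectPropertyName,
        mScope: kAudioObjectPropertyScopeGlobal,
        mElement: kAudioObjectPropertyElementMaster)
    var name: CFString = "" as CFString
    var nameSize = UInt32(MemoryLayout<CFString>.size)
    AudioObjectGetPropertyData(device, &nameAddress, 0, nil, &nameSize, &name)

    // The device's current nominal sample rate.
    var rateAddress = AudioObjectPropertyAddress(
        mSelector: kAudioDevicePropertyNominalSampleRate,
        mScope: kAudioObjectPropertyScopeGlobal,
        mElement: kAudioObjectPropertyElementMaster)
    var rate: Float64 = 0
    var rateSize = UInt32(MemoryLayout<Float64>.size)
    AudioObjectGetPropertyData(device, &rateAddress, 0, nil, &rateSize, &rate)

    print("\(name): \(rate) Hz")
}
```

With AirPods paired, you should see the output-only and input-only devices listed separately, each reporting its own rate.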

As far as I can tell, the AirPods must drop their playback quality down to 16kHz whenever the microphone is activated. Whether that’s a Bluetooth limitation, an AirPods limitation, or not actually a hardware limitation is irrelevant right now. That’s just how things work.

Forcing the Problem

If you want to experience this for yourself, try doing the following with your AirPods paired to your Mac (running Big Sur 11.1 as of this writing):

  1. Start some audio playing on your Mac
  2. Open System Preferences, choose Sound, then select the Input tab
  3. Select the AirPods as your sound input device

If your Mac works like mine does, the quality of audio playback should approximate that of an AM radio.

If you leave the AirPods selected as your sound input device, and quit System Preferences, music playback quality should return to normal.

This behavior makes sense, because the Sound preferences UI has to enable the microphone in order to meter the current input level. Once that UI is closed, the microphone can be shut down safely, and the AirPods can return to their high-quality mode for the rest of the system’s audio playback.

The Bug

There are some subtleties here that I hope don’t get lost, so let’s lay out the following prerequisites:

  1. After completing the steps above, your AirPods should still be set as the default system output and input devices.
  2. You should still hear high-quality audio playing through your AirPods.
  3. The application you run must either (a) not be sandboxed, or (b) declare the use of Audio Input.

With all those in place, you create an instance of AVAudioEngine, then try to retrieve the engine’s outputNode. Boom—audio goes bad just like before.

The following Swift code reproduces this issue on my machine at the command-line:

  #!/usr/bin/env swift
  import Foundation
  import AVFoundation

  let ae = AVAudioEngine()
  let _ = ae.outputNode

  // Keep the script alive so you can hear the playback quality drop.
  RunLoop.main.run(
    until: Date(
      timeIntervalSinceNow: 10))

Why does this happen?

This is another area where we can easily get stuck in the weeds with details, so bear with me here.

Remember when I said there are two separate devices that independently handle your input and output, and they operate at two different sample rates? Well, an audio engine/graph—whether it uses the CoreAudio HAL, AUHAL, AUGraph, or whatever—can only really talk to a single device at a time.

In order to work around this, AVAudioEngine “smushes” these two devices together into an aggregate audio device that is capable of performing both input & output at the same time. Then, it automatically selects the aggregate device as the one it will use for its audio processing.
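
As a sketch of how you might observe this yourself, assuming that AUAudioUnit’s macOS-only deviceID property reflects the engine’s selection, you can ask the output node which HAL device it ended up bound to:

```swift
import AVFoundation
import CoreAudio

let engine = AVAudioEngine()

// Merely touching outputNode is enough to trigger the device selection
// (and, when input is permitted, the aggregate creation) described above.
let outputUnit = engine.outputNode.auAudioUnit

// On macOS, deviceID identifies the HAL device the unit is bound to.
let deviceID = outputUnit.deviceID

// Look up that device's human-readable name.
var address = AudioObjectPropertyAddress(
    mSelector: kAudioObjectPropertyName,
    mScope: kAudioObjectPropertyScopeGlobal,
    mElement: kAudioObjectPropertyElementMaster)
var name: CFString = "" as CFString
var size = UInt32(MemoryLayout<CFString>.size)
AudioObjectGetPropertyData(deviceID, &address, 0, nil, &size, &name)

print("Engine is using: \(name)")
```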

Sadly, and for reasons I don’t fully understand, this process appears to happen automatically when I ask for the AVAudioEngine’s outputNode, or its mainMixerNode.

But I didn’t want that!

AVAudioEngine doesn’t give me an opportunity to state my “intentions” for using the engine explicitly on macOS. There is no API that lets me specify that I am building an output-only engine, and don’t require its input abilities.

The previous workaround was to pull the AUAudioUnit instance out of the outputNode and set its deviceID to override the default device selection behavior. Unfortunately, this workaround no longer works because the aggregate device now appears to get created (activated?) when I call outputNode.
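
For reference, that older workaround looked roughly like the following sketch: fetch the system’s default output device, then pin the output unit to it. The trouble is that the aggregate device now seems to spring into existence the moment outputNode is evaluated, before setDeviceID(_:) ever runs.

```swift
import AVFoundation
import CoreAudio

// Look up the system's default *output* device.
var address = AudioObjectPropertyAddress(
    mSelector: kAudioHardwarePropertyDefaultOutputDevice,
    mScope: kAudioObjectPropertyScopeGlobal,
    mElement: kAudioObjectPropertyElementMaster)
var defaultOutput = AudioDeviceID(kAudioObjectUnknown)
var size = UInt32(MemoryLayout<AudioDeviceID>.size)
AudioObjectGetPropertyData(AudioObjectID(kAudioObjectSystemObject),
                           &address, 0, nil, &size, &defaultOutput)

let engine = AVAudioEngine()

do {
    // Accessing outputNode here is what (now) creates the aggregate
    // device, so this override arrives too late to prevent the quality drop.
    try engine.outputNode.auAudioUnit.setDeviceID(defaultOutput)
} catch {
    print("Failed to override device: \(error)")
}
```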

…or maybe that’s exactly what I wanted?

Here’s where things get tricky. Taken together, my prerequisites amount to this: I ran an application that is allowed to capture audio input, and then I asked AVAudioEngine for a node that implies both input and output.

You could make a pretty solid argument that I was asking for this behavior, and I wouldn’t have a good counter-argument. The API doesn’t allow me to opt out of the input capability explicitly.

Works as designed?

This is the crux of the problem I’m dealing with here, and why I don’t want anything to do with this API anymore. It’s simply too magical for my tastes.

The API is designed to mask a lot of complexity, and to abstract away a lot of the messy details of building an audio engine. On iOS, this is fine—you can only really have a single audio device active at any time, and you use the audio session API to “declare your intentions”.

Unfortunately, that simplicity doesn’t match up with the pro audio jungle that you’ll often encounter on macOS. On a Mac, you’re just as likely to be interacting with AirPods or a cheap-o USB podcasting microphone as you are with a 16-channel pro studio interface, an AVB interface, or a 188-channel behemoth.

Don’t get me wrong—the AVAudioEngine will totally behave itself in the presence of pro audio hardware. You just need to be OK with whatever magic it’s doing under the hood in your specific application. If your app needs to let the user override the device selection, buffer sizes, and so on, then you might be in for some trouble and/or painful workarounds.

But if your needs are that advanced, then you very likely aren’t even looking at AVAudioEngine to begin with, and already have your own solution in place. (You’re probably also laughing at my misfortune, and mumbling to yourself that I should have known better.)

So what’s the workaround, already?

If you’re experiencing this kind of trouble with your AirPods (and perhaps with other Bluetooth headsets) in Capo or other audio apps, then make absolutely sure that your Mac’s default input device is not set to your AirPods. Choose the built-in microphone or line-in if you have the option to do so.

If you’re using a Mac that doesn’t have a built-in input of its own (like the Mac mini, or the 2013 Mac Pro) that you can select instead, then you might be stuck. Using a software audio interface, like those created by Loopback or Soundflower, might give you a way out here, but YMMV.

A possible "patch" for Capo

I mentioned that microphone access was a requirement for this issue to pop up. It turns out that a sandboxed app that doesn’t allow microphone access doesn’t suffer from this issue.

If your sandboxed app can’t access the microphone, AVAudioEngine fails to create the aggregate device, and appears to operate in an output-only mode.

Capo is a sandboxed app, and while it doesn’t need microphone access, it does currently request an exception for audio input. But, why?

In the early days of sandboxing, and long before AVAudioEngine ever came into existence, the AUHAL audio unit that I use for Capo’s main audio output would fail to run without the microphone access exception. Even when I configured the AUHAL to disable the input bus, it just wouldn’t work without this exception.
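
Configuring the AUHAL that way looks roughly like this sketch, which disables I/O on the input bus (element 1) and enables it on the output bus (element 0):

```swift
import AudioToolbox

// Locate the AUHAL output unit.
var description = AudioComponentDescription(
    componentType: kAudioUnitType_Output,
    componentSubType: kAudioUnitSubType_HALOutput,
    componentManufacturer: kAudioUnitManufacturer_Apple,
    componentFlags: 0,
    componentFlagsMask: 0)

guard let component = AudioComponentFindNext(nil, &description) else {
    fatalError("AUHAL not found")
}

var maybeUnit: AudioUnit?
AudioComponentInstanceNew(component, &maybeUnit)
guard let unit = maybeUnit else { fatalError("Could not instantiate AUHAL") }

// Element 1 is the input bus, element 0 the output bus.
var disabled: UInt32 = 0
var enabled: UInt32 = 1
let flagSize = UInt32(MemoryLayout<UInt32>.size)

AudioUnitSetProperty(unit, kAudioOutputUnitProperty_EnableIO,
                     kAudioUnitScope_Input, 1, &disabled, flagSize)
AudioUnitSetProperty(unit, kAudioOutputUnitProperty_EnableIO,
                     kAudioUnitScope_Output, 0, &enabled, flagSize)
```

Even with the input bus disabled like this, the unit wouldn’t run in the sandbox without the audio-input exception back then.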

So there’s a possibility that this limitation in the AUHAL has long since been fixed, and I can shut off the audio input exception to patch the issue in the meantime—a code-free bug fix!
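
For the record, the exception in question is the com.apple.security.device.audio-input entitlement, which sits alongside the sandbox key in the app’s entitlements file:

```xml
<key>com.apple.security.app-sandbox</key>
<true/>
<!-- Removing this key is the proposed code-free fix: -->
<key>com.apple.security.device.audio-input</key>
<true/>
```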

We’re in the middle of QA for a (minor) 4.1 update as I write this, so I will try and slip this fix into the next build for further testing. Keep an eye out for that, but in the meantime you’ll have to settle for the workaround above.