It’s over between us, AVAudioEngine

I’ve been writing audio code for macOS since 2003, and it’s never been an easy task.

In the early days, I dealt with the CoreAudio HAL (Hardware Abstraction Layer) directly, and that was no fun at all. But the problems were purely related to the inherent complexity of audio and hardware.

Life got a lot simpler when the AUHAL Audio Unit appeared on the scene. It did a good job hiding most of the complexity, and with the AUGraph API I could snap Audio Units together like pieces of Lego.

Some years later, Apple introduced the AVAudioEngine API that looked very promising. On the surface, it had a nice-looking Objective-C API that hid even more complexity from its users. It eliminated dozens of lines of (awful) boilerplate code that were required just to get started.

In its first year or two, AVAudioEngine was very thin on functionality, but its capabilities grew year after year. When the replacement for the aging AudioUnit API appeared, it seemed that the audio API team(s) at Apple were trying to build a simple-yet-capable replacement for AUGraph. Sounds like a great idea to me!

When I developed Capo’s Audio Freezer feature, I thought it was a good opportunity for me to test the waters with AVAudioEngine in an isolated environment.

It was a great success! In the beginning, at least…

The first major headache

About a year after I shipped the Audio Freezer, I started receiving reports of Capo crashing on launch. Customers were unable to launch software that had been working perfectly for them a few days earlier, and we couldn’t figure out why, because we were unable to reproduce the issue on our end.

Looking at the crash logs, I noticed that invoking -[AVAudioEngine mainMixerNode] would fire some kind of internal exception. The API does not return any kind of error, nor is it documented that an exception could get raised. Theoretically, failure at this stage should be impossible!

Unfortunately for me, I had written the high-level engine management code in Swift (as was common/prescribed at the time), so I couldn’t even attempt to handle this exception to patch this behavior.

I decided to quickly re-write this code in Objective-C, and wrapped the problematic area with a try/catch block so I could return my own NSError back to a higher level and report what was going on. Sadly, this approach didn’t work in the field and the crashes kept rolling in.
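
For readers who haven’t had to do this, a wrapper of that kind might look roughly like the following. This is a minimal sketch, not Capo’s actual code: the function name `SafeMainMixerNode` and the error domain are hypothetical, and, as noted above, catching the exception did not stop the crashes in the field.

```objc
#import <AVFoundation/AVFoundation.h>

// Hypothetical error domain, used for illustration only.
static NSString * const ExampleAudioErrorDomain = @"ExampleAudioErrorDomain";

// Asks the engine for its main mixer node. If AVAudioEngine raises an
// Objective-C exception, it is converted into an NSError and nil is returned.
static AVAudioMixerNode *SafeMainMixerNode(AVAudioEngine *engine, NSError **outError)
{
    @try {
        return engine.mainMixerNode;
    }
    @catch (NSException *exception) {
        if (outError != NULL) {
            NSString *message = exception.reason ?: exception.name;
            *outError = [NSError errorWithDomain:ExampleAudioErrorDomain
                                            code:-1
                                        userInfo:@{ NSLocalizedDescriptionKey : message }];
        }
        return nil;
    }
}
```

Swift callers can then use the wrapper as an ordinary failable call, since Swift itself cannot catch an `NSException`.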

Finally, we got one report from a user that included a screenshot of their audio configuration. Upon further investigation, I realized that the mere presence of an Aggregate audio device in your system’s audio device list would cause the crash.

Again, this was a failure that occurred because I asked for the audio engine’s mainMixerNode. The audio engine didn’t fail to init, and the failure wasn’t deferred until the startAndReturnError: call. Terrible!

To make matters worse, this bug was introduced in a late update to macOS 10.13. I think it showed up in 10.13.4, but I can’t recall the details. Of course, users on 10.13.x never got a solution to this issue—it didn’t get “fixed” until 10.14, but more on that later.

To stop the crashes for these users on 10.13, I had to walk the user’s audio device list, and detect the presence of an aggregate device. If found, I presented an alert and shut off the audio freezer feature. Awful!
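
Walking the device list means going back to the CoreAudio HAL property API. The sketch below shows one way such a check could be written, assuming that an aggregate device is identified by its transport type (`kAudioDevicePropertyTransportType` equal to `kAudioDeviceTransportTypeAggregate`). The function name is hypothetical, and this is not necessarily how Capo does it.

```objc
#import <Foundation/Foundation.h>
#import <CoreAudio/CoreAudio.h>

// Returns YES if any audio device known to the HAL reports an aggregate
// transport type. Returns NO if the device list cannot be read.
static BOOL SystemHasAggregateAudioDevice(void)
{
    // kAudioObjectPropertyElementMaster was renamed to
    // kAudioObjectPropertyElementMain in the macOS 12 SDK.
    AudioObjectPropertyAddress devicesAddress = {
        kAudioHardwarePropertyDevices,
        kAudioObjectPropertyScopeGlobal,
        kAudioObjectPropertyElementMaster
    };

    // Ask how many bytes of AudioDeviceIDs the system object will return.
    UInt32 dataSize = 0;
    OSStatus status = AudioObjectGetPropertyDataSize(kAudioObjectSystemObject,
                                                     &devicesAddress, 0, NULL, &dataSize);
    if (status != noErr || dataSize == 0) {
        return NO;
    }

    NSMutableData *buffer = [NSMutableData dataWithLength:dataSize];
    status = AudioObjectGetPropertyData(kAudioObjectSystemObject,
                                        &devicesAddress, 0, NULL, &dataSize,
                                        buffer.mutableBytes);
    if (status != noErr) {
        return NO;
    }

    const AudioDeviceID *deviceIDs = buffer.bytes;
    NSUInteger deviceCount = dataSize / sizeof(AudioDeviceID);

    AudioObjectPropertyAddress transportAddress = {
        kAudioDevicePropertyTransportType,
        kAudioObjectPropertyScopeGlobal,
        kAudioObjectPropertyElementMaster
    };

    // Check each device's transport type; skip devices that fail to answer.
    for (NSUInteger i = 0; i < deviceCount; i++) {
        UInt32 transportType = 0;
        UInt32 size = sizeof(transportType);
        status = AudioObjectGetPropertyData(deviceIDs[i], &transportAddress,
                                            0, NULL, &size, &transportType);
        if (status == noErr && transportType == kAudioDeviceTransportTypeAggregate) {
            return YES;
        }
    }
    return NO;
}
```

An app could call this once before touching AVAudioEngine and disable the affected feature when it returns `YES`.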

Not quite fixed…

Remember I said that the bug was fixed in 10.14? Unfortunately, it wasn’t very long before we started seeing these crashes once again.

Just as last time, a 10.14.x update re-introduced the bug, but in a slightly different way. This time, the bug only popped up if the aggregate audio device was set as the system’s default audio output.

What. The. Hell?

A new headache

Around the same time that this was going on, we started receiving reports from AirPods users that their audio was blowing out on them and reverting down to a low quality. When I say audio was “blowing out”, I mean that they were being treated to an ear-piercingly loud blast of garbled noise before audio playback was restored—at a lower quality than before.

If I recall correctly, the problem was triggered once the AVAudioEngine was started, so it only affected users that interacted with Capo’s audio scrubber. Still, the noise was awful, and quite painful—an absolutely horrible failure mode for something so close to your ear drums.

I think that this all happened because the AirPods can offer input + output, but only at a lower quality, so the full-duplex-capable audio engine favours that configuration rather than the higher-quality one that is only capable of output.

The “fix” for this issue was as follows. Before the engine was started, I needed to dig into the AVAudioEngine’s outputUnit and manually override the AudioDeviceID.
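
In code, that kind of override goes through the output node’s underlying Audio Unit and its `kAudioOutputUnitProperty_CurrentDevice` property. The following is a rough sketch under stated assumptions: both function names are hypothetical, and pinning the engine to the system’s default output device is a guess about which device to choose, since the choice is not spelled out here.

```objc
#import <AVFoundation/AVFoundation.h>
#import <AudioToolbox/AudioToolbox.h>
#import <CoreAudio/CoreAudio.h>

// Returns the system's default output device, or kAudioObjectUnknown on failure.
static AudioDeviceID DefaultOutputDeviceID(void)
{
    // kAudioObjectPropertyElementMaster was renamed to
    // kAudioObjectPropertyElementMain in the macOS 12 SDK.
    AudioObjectPropertyAddress address = {
        kAudioHardwarePropertyDefaultOutputDevice,
        kAudioObjectPropertyScopeGlobal,
        kAudioObjectPropertyElementMaster
    };
    AudioDeviceID deviceID = kAudioObjectUnknown;
    UInt32 size = sizeof(deviceID);
    OSStatus status = AudioObjectGetPropertyData(kAudioObjectSystemObject,
                                                 &address, 0, NULL, &size, &deviceID);
    return (status == noErr) ? deviceID : kAudioObjectUnknown;
}

// Points the engine's output Audio Unit at a specific device.
// Intended to be called before the engine is started.
static BOOL PinEngineOutputToDevice(AVAudioEngine *engine, AudioDeviceID deviceID)
{
    AudioUnit outputUnit = engine.outputNode.audioUnit;
    if (outputUnit == NULL) {
        return NO;
    }
    OSStatus status = AudioUnitSetProperty(outputUnit,
                                           kAudioOutputUnitProperty_CurrentDevice,
                                           kAudioUnitScope_Global,
                                           0,
                                           &deviceID,
                                           sizeof(deviceID));
    return status == noErr;
}
```

A caller would run `PinEngineOutputToDevice(engine, DefaultOutputDeviceID())` and only then start the engine. Note that this sketch has to touch `outputNode` to reach the Audio Unit.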

Unfortunately for me, that “fix” was not very robust. It would work if the AirPods were already attached before Capo started the audio engine, but not if they were paired while Capo was running.

After a whole lot of monkey business, I had a working solution for this bug. Sadly, this code has proven to be quite delicate, and has generated some new crashes in the field.

The final (?) straw

Fast-forward to macOS 11, and the AirPods issue is back. In fact, it’s even worse now because there’s no workaround.

We had been testing the AirPods against the macOS 11 betas and hadn’t noticed these sound degradation issues. When we started receiving reports of problems recently, we thought it was specific to the AirPods Pro (which we didn’t have on hand for testing). That seemed plausible to me, because they all run different firmware and have new auto-pairing features.

Sure enough, when I acquired and set up my AirPods Pro to play my Mac’s audio, launching Capo caused the system’s audio to revert to the crappy, 16 kHz “Bluetooth headset mode” that sounds awful. I thought that my prior workarounds might have been incomplete, but I was doing everything right.

After lots of investigation, ~~I discovered that merely creating the AVAudioEngine causes the sound degradation. Like, simply calling [[AVAudioEngine alloc] init] destroys the quality of the Mac’s output audio.~~ It turns out I was wrong about this, and the new bug is that calling outputNode now triggers the issue. More details at the bottom of the post.

It looks something like this for our users: If you’re listening to the Music app, everything’s running at full quality. Then, you load a project in Capo and your already-playing music now sounds like garbage. In fact, everything you play from now on in any other app sounds like garbage until you quit Capo and re-start audio playback elsewhere.

Hey, Apple—guess who my customers will blame for this?

As a sucker^W consumer of this API, I don’t get any opportunity to prevent this from happening like I was able to do before. I need to construct the engine before I can override its output unit, but by that point the damage is already done.

Adding insult to injury, I re-tested this behavior with my old pair of AirPods and found that the issue is exactly the same. I don’t know whether this just showed up in 11.1 or we somehow missed it during testing (we’re human, after all). Given my past experience with AVAudioEngine, I wouldn’t be surprised if it’s a regression.

Now what?

Of the scenarios above that tripped up the AVAudioEngine API on macOS, not a single one caused trouble in my battle-worn audio engine, which is built on top of AUGraph and the DefaultOutput Audio Unit.

Capo’s main audio engine handles AirPods perfectly, it’s fine with aggregate audio devices, and it even gracefully handles audio hardware that disappears during audio playback.

AVAudioEngine seemed to deal with aggregate audio devices before 10.13.5 (or whatever), but then it broke mysteriously. Later, it seemed to completely lose its mind in the presence of AirPods.

Unfortunately, Apple has marked the AUGraph API as deprecated, and urges us to move that code to use AVAudioEngine instead to manage a graph of audio units.

No thanks, Apple. Call me when Logic or Final Cut are based on AVAudioEngine, and we’ll go for a skate together in Hell.

I guess I need to go back to doing things the hard way.

EDIT (20210127): It turns out that my prior debugging was flawed, and I was preventing a call to outputNode or mainMixerNode when I commented out the initialization of the engine.

Still, the previous workaround no longer works because calling outputNode now triggers the issue. I put together a sample project called CrappifyAudio that demonstrates the problem in a minimal way.