Audio Units, AudioKit, AVAudioEngine, Core Audio, Objective-C, Swift, TAAE

When I started iOS audio development in 2008, there was only one option for realtime, frame-accurate audio development – working directly with Apple’s RemoteIO audio unit. It was ugly. It involved a lot of verbose C code, the documentation was sparse, the errors were nebulous, and tons of low-level tasks had to be managed directly.

A few years later, The Amazing Audio Engine wrapped all of this nastiness up into an easy-to-use Objective-C interface, and I have never looked back at the old ways since. Now that it’s 2016, TAAE1 is soon to be deprecated, TAAE2 is soon to leave beta, AudioKit has emerged as a popular Swift framework, and Apple finally updated their audio unit API to “v3” for “modern” audio development.

It’s time to put these frameworks to the test by solving a simple synchronization problem on each one.

Problem Statement

The challenge is to play an audio file by streaming it from disk and then, the instant that file finishes playing, play another file starting on the exact next frame/sample. The two output signals must line up exactly, back to back.

To test this, I have created an audio file containing a sine wave test tone that starts and ends at the maximum point of its cycle. If there is a mismatch in timing, it will create a discontinuity that is both easy to hear as a click/pop and easy to find in a waveform plot.

 

[Figure: player alignment diagram]
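
The exact file used in the post isn’t reproduced here, but a tone like it can be generated in a few lines of AVFoundation. The 441 Hz frequency, mono format, and roughly 0.69-second length below are my own assumptions, chosen so the file holds a whole number of cycles:

import AVFoundation

// Generate a cosine test tone. The first sample sits at the waveform's maximum,
// and the length is an exact whole number of periods, so a second copy played on
// the very next frame continues the cycle with no discontinuity.
// 441 Hz divides evenly into 44,100 Hz, giving exactly 100 samples per period.
let sampleRate = 44100.0
let frequency = 441.0
let samplesPerPeriod = 100
let periods = 305                                      // ~0.69 seconds of audio
let frameCount = AVAudioFrameCount(periods * samplesPerPeriod)

let format = AVAudioFormat(standardFormatWithSampleRate: sampleRate, channels: 1)!
let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frameCount)!
buffer.frameLength = frameCount
for frame in 0..<Int(frameCount) {
    buffer.floatChannelData![0][frame] =
        Float(cos(2.0 * Double.pi * frequency * Double(frame) / sampleRate))
}

let toneURL = FileManager.default.temporaryDirectory.appendingPathComponent("TestTone.wav")
let toneFile = try AVAudioFile(forWriting: toneURL, settings: format.settings)
try toneFile.write(from: buffer)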

An important requirement is that the audio data streams from disk and is not fully loaded into memory before playing. Playing from active memory makes many tasks easier, but it’s not practical for files that are longer than just a few seconds: uncompressed stereo audio at 44.1 kHz in 32-bit float works out to roughly 20 MB per minute, and a large memory footprint will lead iOS to kill the app unpredictably.

I find this example instructive because it gets at a number of core questions:

  • How easy is it to integrate the framework and get it running?
  • Does the framework have a means of streaming an audio file?
  • Does the framework include synchronization functionality that is frame-perfect?

If a framework can satisfy these requirements, then it is likely to be well-suited for various common realtime and multitrack applications.

NOTE: The code in this post is written for brevity, and it’s not generally production-ready. Avoid force unwraps and do proper error catching when building your apps!
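
For reference, here is roughly what the force-unwrapped file loading used below looks like with real error handling (a generic sketch, not specific to any of the frameworks):

import AVFoundation

// Resolve the bundled file and open it, handling each failure explicitly
// instead of force unwrapping.
guard let fileURL = Bundle.main.url(forResource: "TestTone", withExtension: "wav") else {
    fatalError("TestTone.wav is missing from the app bundle")
}
do {
    let file = try AVAudioFile(forReading: fileURL)
    // ... hand `file` to whichever engine is in use ...
    _ = file
} catch {
    print("Could not open the audio file: \(error)")
}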

AVAudioEngine with AVAudioPlayerNode

AVAudioEngine is Apple’s relatively recent attempt at cleaning up the C-level clutter of the Audio Unit framework in order to make audio development on iOS more “modern”. It still doesn’t feel totally natural, but it is much more straightforward than the previous iteration.

import AVFoundation

// Create the engine
let engine = AVAudioEngine()

// Open the audio file
let fileURL = Bundle.main.url(forResource: "TestTone", withExtension: "wav")!
let file = try AVAudioFile(forReading: fileURL)

// Create player1 and attach it to the engine, then "schedule" the audio file
let player1 = AVAudioPlayerNode()
engine.attach(player1)
player1.scheduleFile(file, at: nil, completionHandler: nil)

// Repeat for player2
let player2 = AVAudioPlayerNode()
engine.attach(player2)
player2.scheduleFile(file, at: nil, completionHandler: nil)

// Connect the players to the engine's main mixer
let mixer = engine.mainMixerNode
engine.connect(player1, to: mixer, format: file.processingFormat)
engine.connect(player2, to: mixer, format: file.processingFormat)

// Start your engine!
try engine.start()

// Find the conversion factor from host ticks to seconds
let currentTimeTicks = mach_absolute_time()
var timebaseInfo = mach_timebase_info()
mach_timebase_info(&timebaseInfo)
let hostTimeToSecFactor = Double(timebaseInfo.numer) / Double(timebaseInfo.denom) / 1000000000.0

// Play player1 1 second from now, play player2 1.69299 seconds from now
player1.play(at: AVAudioTime(hostTime: currentTimeTicks + UInt64(1.0 / hostTimeToSecFactor)))
player2.play(at: AVAudioTime(hostTime: currentTimeTicks + UInt64((1.69299) / hostTimeToSecFactor)))

Phew! Lots of different parts to keep track of, and a few really verbose, low-level lines of code. Well, the good news is that it passes the synchronization test!
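
As an aside, the 1.69299-second offset above is presumably 1.0 second plus the duration of the test tone. Rather than hard-coding it, the start times can be derived from the file itself; this sketch reuses file, player1, player2, currentTimeTicks, and hostTimeToSecFactor from the snippet above in place of the last two play calls:

// Derive player2's start time from the file length instead of hard-coding it
let fileDurationSec = Double(file.length) / file.processingFormat.sampleRate
let start1 = 1.0                        // player1 starts one second from now
let start2 = start1 + fileDurationSec   // player2 starts the frame after player1's file ends

player1.play(at: AVAudioTime(hostTime: currentTimeTicks + UInt64(start1 / hostTimeToSecFactor)))
player2.play(at: AVAudioTime(hostTime: currentTimeTicks + UInt64(start2 / hostTimeToSecFactor)))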

AudioKit

AudioKit has grown quickly in popularity, and it is the only framework we’ve found whose higher layers are written natively in Swift. Judging by the provided examples and developer discussions, it appears to be more synthesis-oriented than playback-oriented, and it includes an impressive array of effects units and other tools.

AudioKit’s audio player class, AKAudioPlayer, does not appear to have any synchronization facilities such as a playAtTime method. It is worth noting, however, that it uses AVAudioPlayerNode internally, so I imagine that the developers of AudioKit could pretty quickly accommodate such a feature.

For this experiment, we attempted to find the most (seemingly) sanctioned approach, which was gleaned from this discussion thread. We will set up an AKSequencer that triggers two AKMIDISamplers that will play the audio file.

import AudioKit

// Create all of the components of our audio system
let midi = AKMIDI()
let mixer = AKMixer()
let sequence = AKSequencer()
let sampler1 = AKMIDISampler()
let sampler2 = AKMIDISampler()

// Connect the mixer to the output and start the engine
AudioKit.output = mixer
AudioKit.start()

// Create a track in the sequencer that will put out a single midi note
let track1 = sequence.newTrack()!
track1.setLength(AKDuration(beats: 120, tempo: 120))
track1.add(noteNumber: 60, velocity: 127, position: AKDuration(seconds: 1.0, sampleRate: 44100.0, tempo: 120.0), duration: AKDuration(beats: 16.0), channel: 1)

// Repeat for the second track
let track2 = sequence.newTrack()!
track2.setLength(AKDuration(beats: 120, tempo: 120))
track2.add(noteNumber: 60, velocity: 127, position: AKDuration(seconds: 1.69299, sampleRate: 44100.0, tempo: 120.0), duration: AKDuration(beats: 16.0), channel: 1)

// Hook track1 up to sampler1 so that the midi note will trigger the audio to play
sampler1.enableMIDI(midi.client, name: "midi_name")
track1.setMIDIOutput(sampler1.midiIn)
mixer.connect(sampler1)

// Repeat for track2 and sampler2
sampler2.enableMIDI(midi.client, name: "midi_name")
track2.setMIDIOutput(sampler2.midiIn)
mixer.connect(sampler2)

// Load the samplers with the audio file
sampler1.loadWav("TestTone")
sampler2.loadWav("TestTone")

// Kick off the sequencer
sequence.play()

 

Outside of the inline MIDI and duration declarations, it’s nice to see how concise AudioKit’s interface is. The MIDI routing is confusing, though, and it seems unnecessary to have to think about samplers and sequencers to achieve what we are after.

Ultimately, this method did not pass the test. Over five quick trials, the mismatch varied between player2 being 31 ms too early and 4 ms too late (at 44.1 kHz, 31 ms is roughly 1,400 samples). Perhaps this is due to MIDI’s limited timing precision, but it seems unlikely that MIDI alone would account for variation that large. As of this writing, I don’t know why the timing does not line up.
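
For what it’s worth, one way to quantify a mismatch like this (not necessarily how the numbers above were measured) is to record the engine’s output to a file and scan it for the largest sample-to-sample jump, which marks the discontinuity at the boundary:

import AVFoundation

// Find the biggest sample-to-sample jump in a recording of the output: that is
// where the click/pop at the file boundary shows up. Loading the whole file is
// fine here because the analysis recording is short.
func largestDiscontinuity(in url: URL) throws -> (frame: Int, jump: Float) {
    let file = try AVAudioFile(forReading: url)
    let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                  frameCapacity: AVAudioFrameCount(file.length))!
    try file.read(into: buffer)

    let samples = buffer.floatChannelData![0]          // channel 0
    var worst = (frame: 0, jump: Float(0))
    for frame in 1..<Int(buffer.frameLength) {
        let jump = abs(samples[frame] - samples[frame - 1])
        if jump > worst.jump {
            worst = (frame: frame, jump: jump)
        }
    }
    return worst
}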

The Amazing Audio Engine 1 and 2

Let’s start with TAAE1.

// Create the audio engine
let audioController: AEAudioController = AEAudioController(audioDescription: AEAudioController.nonInterleaved16BitStereoAudioDescription())

// Create the players based on the audio file
let fileURL = Bundle.main.url(forResource: "TestTone", withExtension: "wav")!
let player1 = try AEAudioFilePlayer(url: fileURL)
let player2 = try AEAudioFilePlayer(url: fileURL)

// Start the engine
try audioController.start()

// Find the conversion factor from host ticks to seconds
let currentTimeTicks = mach_absolute_time()
var timebaseInfo = mach_timebase_info()
mach_timebase_info(&timebaseInfo)
let hostTimeToSecFactor = Double(timebaseInfo.numer) / Double(timebaseInfo.denom) / 1000000000.0

// Tell the players to play at the exact time
player1.play(atTime: currentTimeTicks + UInt64(1.0 / hostTimeToSecFactor))
player2.play(atTime: currentTimeTicks + UInt64(1.69299 / hostTimeToSecFactor))

// Add the channels to the engine
audioController.addChannels([player1,player2])

The nice thing about TAAE1 is that it actually takes the fewest lines of code of all the frameworks we tried. The not-so-nice thing is that it fails the test – there are some weird glitchy artifacts at the boundary where the second file starts playing. Perhaps this will be fixed soon.

And now TAAE2.

// Create a renderer and an output unit that will render it out to the hardware
let renderer = AERenderer()
let output = AEAudioUnitOutput(renderer: renderer)!

// Create the players based on the audio file
let fileURL = Bundle.main.url(forResource: "TestTone", withExtension: "wav")!
let player1 = try AEAudioFilePlayerModule(renderer: renderer, url: fileURL)
let player2 = try AEAudioFilePlayerModule(renderer: renderer, url: fileURL)

// Find the conversion factor from host ticks to seconds
let currentTimeTicks = mach_absolute_time()
var timebaseInfo = mach_timebase_info()
mach_timebase_info(&timebaseInfo)
let hostTimeToSecFactor = Double(timebaseInfo.numer) / Double(timebaseInfo.denom) / 1000000000.0

// Tell the players to play at the exact time
player1.play(atTime: AETimeStampWithHostTicks(currentTimeTicks + AEHostTicks(1.0 / hostTimeToSecFactor)))
player2.play(atTime: AETimeStampWithHostTicks(currentTimeTicks + AEHostTicks(1.69299 / hostTimeToSecFactor)))

// Create a mixer based on the renderer and hook in the players
let mixer = AEMixerModule(renderer: renderer)!
mixer.modules = [player1,player2]

// Define the rendering process in the renderer, which renders the mixer and puts the result to the output
renderer.block = { (context: UnsafePointer<AERenderContext>) in
    AEModuleProcess(mixer, context)
    AERenderContextOutput(context, 1)
}

// Configure the AVAudioSession (for some reason this was finicky and required a few iterations to find settings that would work)
try AVAudioSession.sharedInstance().setCategory(AVAudioSessionCategoryPlayAndRecord, with: AVAudioSessionCategoryOptions.defaultToSpeaker)
try AVAudioSession.sharedInstance().setActive(true)

// Start the engine
try output.start()

The stack-based rendering of TAAE2 is less approachable and intuitive than TAAE1, but it does seem to add a level of transparency. Explicitly coding up the rendering process in the renderer block seems as though it could help with debugging. In any case, TAAE2 passes the synchronization test!

Final Results

Synchronization test results:

  • AVAudioEngine – pass
  • AudioKit (AKSequencer + AKMIDISampler) – fail
  • TAAE1 – fail
  • TAAE2 – pass

Overall, each framework seemed reasonably straightforward, and each seems like it could be made to work, even if some don’t in my current configuration. TAAE1 was the easiest to follow, while AVAudioEngine and possibly AudioKit were the hardest. In their current state, only TAAE2 and AVAudioEngine pass the test.

Possible Improvements

What would the ideal framework for synchronized audio look like? Looking at the pain points evident in each of these experiments, I drew the conclusion that an audio framework should

  • Be concise – use short, meaningful names and avoid encouraging verbose inline declarations of objects
  • Hide meaningless low-level details – I want to avoid working in host ticks or MIDI bytes unless absolutely necessary
  • Reduce the number of required function calls – not only does a long list of calls lead to more code clutter, but it makes working with the framework cumbersome when it’s not obvious what the required order of the calls should be
  • Auto-configure – just run at 16-bit stereo and don’t require the developer to explicitly state this. Leave other configurations as additional method calls the developer can seek out when they need them.
  • Make the multithreaded aspects of audio work as effortlessly as possible and make the code structure drive the developer towards good practices

With regard to that last point, I want to emphasize that it’s really easy to make timing mistakes like

player1.play(atTime: mach_absolute_time() + oneSecondInTicks)
player2.play(atTime: mach_absolute_time() + oneSecondInTicks)

In this case, the two players won’t play at the same time because some time has elapsed between the two calls. What might be better is a closure interface like

Engine.doActions(atTime: mach_absolute_time() + oneSecondInTicks) {
    player1.play()
    player2.play()
}

where play() will fail and output an error description to the console when it is not called within the proper doActions block. Closures like this can also discourage developers from modifying things in non-thread-safe ways and prevent mistakes like taking locks on the audio thread.
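
To make that concrete, here is a toy sketch of the shape such an API could take (ToyEngine and ToyPlayer are hypothetical names, not part of any existing framework):

// Every play() call inside the doActions block sees the same captured timestamp,
// so nothing can drift between the calls, and play() outside the block is rejected.
final class ToyEngine {
    fileprivate var pendingStartTicks: UInt64?

    func doActions(atTime hostTicks: UInt64, _ actions: () -> Void) {
        pendingStartTicks = hostTicks   // one timestamp for everything in the block
        actions()
        pendingStartTicks = nil
    }
}

final class ToyPlayer {
    private let engine: ToyEngine
    init(engine: ToyEngine) { self.engine = engine }

    func play() {
        guard let startTicks = engine.pendingStartTicks else {
            print("error: play() called outside of a doActions(atTime:) block")
            return
        }
        // A real implementation would hand startTicks to the underlying audio player.
        print("scheduled to start at host ticks \(startTicks)")
    }
}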

Best of all would be a framework that allows full control over the signal chain but still makes it straightforward to write something like

Engine.playFileFromBundle("TestTone.wav").thenPlayFileFromBundle("TestTone.wav")

Feel free to send feedback to jeremy [at] helloworldeng.com. Thanks!