Mixing Multiple Audio Tracks With AVFoundation

I’ve been working on a prototype that plays multiple audio tracks and provides an interface for mixing the volumes of those channels relative to one another.

The intention is to select media from the iPod library via MPMediaPickerController from the MediaPlayer framework. The MPMusicPlayerController class offers a lot of handy functionality and even lets you leverage the iPod music player itself, but it doesn’t meet the functional requirements: it can only play back a single audio track, period.

Fortunately, I was hoping for an opportunity to really delve into AVFoundation.

First Approach: Using AVComposition

Since I wanted to alter and play any number of media tracks, I started by looking at the AVComposition classes. An AVComposition is essentially a collection of tracks (AVCompositionTrack), each containing audio assets (AVAsset). Assets can be inserted and removed, and playback properties of a track, such as volume, can be modified.

We can extract AVAsset representations out of the MPMediaItem objects returned by the MPMediaPickerController.

#import <MediaPlayer/MediaPlayer.h>
#import <AVFoundation/AVFoundation.h>

// Note: the asset URL is nil for DRM-protected items
NSURL *url = [mediaItem valueForProperty:MPMediaItemPropertyAssetURL];
NSDictionary *options = [NSDictionary dictionaryWithObject:[NSNumber numberWithBool:YES]
                                                    forKey:AVURLAssetPreferPreciseDurationAndTimingKey];
AVURLAsset *asset = [AVURLAsset URLAssetWithURL:url options:options];

Once we have the asset, we create a composition and the tracks we need:

AVMutableComposition *composition = [AVMutableComposition composition];

AVMutableCompositionTrack *audioTrack = [composition addMutableTrackWithMediaType:AVMediaTypeAudio
                                                                  preferredTrackID:kCMPersistentTrackID_Invalid];

NSError *error = nil;
[audioTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, asset.duration)
                    ofTrack:[[asset tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0]
                     atTime:kCMTimeZero
                      error:&error];
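
Mixing multiple songs is just this step repeated: each additional asset gets its own composition track, inserted at kCMTimeZero so that everything plays in parallel. A sketch, assuming a secondAsset extracted from another MPMediaItem the same way (the name is illustrative):

AVMutableCompositionTrack *secondAudioTrack = [composition addMutableTrackWithMediaType:AVMediaTypeAudio
                                                                        preferredTrackID:kCMPersistentTrackID_Invalid];
// Same start time as the first track, so both assets play simultaneously
[secondAudioTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, secondAsset.duration)
                          ofTrack:[[secondAsset tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0]
                           atTime:kCMTimeZero
                            error:&error];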

Now all that’s needed is to queue up the composition in an AVPlayer:

AVPlayerItem *playerItem = [AVPlayerItem playerItemWithAsset:composition];
self.player = [[AVPlayer alloc] initWithPlayerItem:playerItem];
[self.player play];

During playback (or even before), we can apply an AVAudioMix to the composition track to update the volume:

// Set the parameters of the mix being applied
AVMutableAudioMixInputParameters *mixParameters = [AVMutableAudioMixInputParameters audioMixInputParametersWithTrack:audioTrack];
[mixParameters setVolume:volume atTime:kCMTimeZero];
[mixParameters setTrackID:[audioTrack trackID]];

// Add the parameters to the audio mix
AVMutableAudioMix *audioMix = [AVMutableAudioMix audioMix];
[audioMix setInputParameters:@[mixParameters]];

// Apply the new audio mix to the player's current item
[[player currentItem] setAudioMix:audioMix];

Results

Using AVComposition I was able to get multiple tracks playing simultaneously and control the playback properties in real-time, but there were drawbacks. There was no inherent mechanism for determining when one asset ended and another began; if I wanted to detect the end of a song, or skip forwards or backwards through the different assets in a given track, I would have to code that manually and rely on the asset duration.
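
One way to handle that bookkeeping is AVPlayer’s boundary time observer: sum the asset durations to find where each one ends, then register a callback at each of those times. A rough sketch of the idea, assuming the assets live in self.assets, a boundaryObserver property retains the returned token, and handleAssetBoundary is a hypothetical handler of my own:

// Build the cumulative end time of each asset in the track
NSMutableArray *boundaryTimes = [NSMutableArray array];
CMTime cursor = kCMTimeZero;
for (AVAsset *asset in self.assets) {
    cursor = CMTimeAdd(cursor, asset.duration);
    [boundaryTimes addObject:[NSValue valueWithCMTime:cursor]];
}

// Fire a block whenever playback crosses one of those boundaries
__weak typeof(self) weakSelf = self;
self.boundaryObserver = [self.player addBoundaryTimeObserverForTimes:boundaryTimes
                                                               queue:dispatch_get_main_queue()
                                                          usingBlock:^{
    [weakSelf handleAssetBoundary];  // e.g. update the "now playing" display
}];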

There was also a performance concern: creating the track appeared to buffer all of the assets into memory up front, which might not scale well, particularly on older devices.

It felt distinctly like this was a low-level and granular approach, so I started looking at an alternative…

Plan B: Using Multiple AVPlayers

AVPlayer is designed to handle a single AVPlayerItem that contains an asset. There’s a subclass called AVQueuePlayer that handles a collection of items, but one interesting side effect is that when it finishes playing an item, that item is removed from the queue. So with the queue-based player, I can only advance forward through my asset list.

Rather than keep a collection of multiple AVPlayer objects and swap assets around, I went with something that seemed to preserve the MVC pattern a little more simply: maintain an array of assets for each “channel” I want playing at the same time, and initialize a new AVPlayer on demand when the end of an asset is detected or when the user skips forwards or backwards.
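
A minimal sketch of one such channel, assuming a Channel class of my own (the class and its properties are illustrative, not framework API):

@interface Channel : NSObject
@property (nonatomic, copy) NSArray *assets;          // ordered AVAsset objects
@property (nonatomic, assign) NSUInteger currentIndex;
@property (nonatomic, strong) AVPlayer *player;
@end

@implementation Channel

- (void)playAssetAtIndex:(NSUInteger)index
{
    // Stop observing the previous item, if any
    [[NSNotificationCenter defaultCenter] removeObserver:self
                                                    name:AVPlayerItemDidPlayToEndTimeNotification
                                                  object:self.player.currentItem];

    self.currentIndex = index;
    AVPlayerItem *item = [AVPlayerItem playerItemWithAsset:self.assets[index]];

    // Get notified when this asset finishes so the next player can be spawned
    [[NSNotificationCenter defaultCenter] addObserver:self
                                             selector:@selector(itemDidFinish:)
                                                 name:AVPlayerItemDidPlayToEndTimeNotification
                                               object:item];

    self.player = [AVPlayer playerWithPlayerItem:item];
    [self.player play];
}

- (void)itemDidFinish:(NSNotification *)notification
{
    // Advance to the next asset in this channel, if there is one
    if (self.currentIndex + 1 < [self.assets count]) {
        [self playAssetAtIndex:self.currentIndex + 1];
    }
}

@end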

Success

With this setup, I was able to spawn as many players as I wanted, and I built a UI that provided a set of playback controls for each mixer channel of assets: skipping, scrubbing, time tracking, and a little “playlist” functionality.
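
The time display and scrubbing in those controls map onto AVPlayer’s time APIs. A rough sketch, assuming a timeLabel and a slider wired up per channel (those names, and the half-second interval, are illustrative):

// Update the channel's time readout twice a second while it plays
__weak typeof(self) weakSelf = self;
self.timeObserver = [self.player addPeriodicTimeObserverForInterval:CMTimeMakeWithSeconds(0.5, NSEC_PER_SEC)
                                                               queue:dispatch_get_main_queue()
                                                          usingBlock:^(CMTime time) {
    weakSelf.timeLabel.text = [NSString stringWithFormat:@"%.0fs", CMTimeGetSeconds(time)];
}];

// Scrubbing is the inverse: seek the channel's player to the slider's value
- (IBAction)scrubberMoved:(UISlider *)slider
{
    [self.player seekToTime:CMTimeMakeWithSeconds(slider.value, NSEC_PER_SEC)];
}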

The final prototype was the basis for a new app from Tactica Interactive for combining your favorite music with your daily podcasts. Go check it out in the App Store.

Footnote - An Elusive Issue

When I was testing on an iPhone 5s, the application would indicate the asset was loaded and the playback controls would operate. The time observer would even indicate the track was playing, but no sound came out.

As it turns out, AVAudioSession has a category named AVAudioSessionCategoryAmbient. On devices prior to the 5s you can still hear audio… on the 5s, no sound plays and no error is generated. When I switched to AVAudioSessionCategoryPlayback I had sound output. Weirdest issue to track down ever.
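
For reference, the fix is a one-time audio session setup, typically at launch:

// Ambient is silenced by the ring/silent switch; Playback keeps audio going
NSError *sessionError = nil;
AVAudioSession *session = [AVAudioSession sharedInstance];
[session setCategory:AVAudioSessionCategoryPlayback error:&sessionError];
[session setActive:YES error:&sessionError];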