I retweeted this already, but if you care about FCP, read Philip Hodgetts’ A new 64 bit Final Cut Pro? for some excellent analysis about what this could possibly be, given the respective capabilities and release timing of FCP, QTKit, AV Foundation, and Lion:

My biggest doubt was the timing. I believed a rewritten 64 bit Final Cut Pro would require a rewritten 64 bit QuickTime before it can be developed and clearly that wasn’t a valid assumption. Speculating wildly – to pull off a fully rewritten, 64 bit pure Cocoa Final Cut Pro – would require building on AVFoundation (the basis of iMovie for iPhone), which is coming to OS X in 10.7 Lion.

A quick update on upcoming conference talks…

  • Next Saturday, February 19, is MobiDevDay Detroit at the Compuware building downtown. I’ll be doing a talk on iOS Multimedia, a high-level overview of the various media frameworks (AV Foundation, Core Audio, OpenAL, Media Library, etc.), with an emphasis on the practical question of “which one do I pick for my application”. They’ve asked me to do the talk twice, once in the morning and once in the afternoon, so feel free to check out the other iOS talks from Dave Koziol, Chris Judd, Henry Balanon, et al.

  • In April, I’ll be in Seattle for another round of the Voices That Matter: iPhone Developer Conference. This time, I’m doing Advanced Media Manipulation with AV Foundation, which will cover advanced AV Foundation topics like capture-time media processing, editing with effects (and exporting them), sample-level access, etc. It’s kind of a follow-up to the AV Foundation intro I did at VTM in Philly in the Fall, except that we can’t assume that attendees were there for Philly, so I’ll probably start with an abbreviated AV Foundation intro before getting into the rough stuff. I’ve also told the conference organizers that I could do the intro talk if a spot opens up that they need to fill, though I don’t actually suspect that’ll happen.

  • I’m going to update the blog’s right column with a badge for the conference as soon as I post this entry, but in the meantime, here’s a registration code for you: SEASPK2. That’s good for $100. If combined with Early Bird pricing (ends Feb. 25), you’re in the door for $395. Which is, like, what, a quarter of what you’d pay for WWDC? Plus, hey, smaller crowds, indie speakers, Seattle (Shorty’s is two blocks from the conference hotel)… it’s just packed with win.

Annnnnd… now I need to get cracking on my slides for MobiDevDay in a week…

In a July blog entry, I showed a gruesome technique for getting raw PCM samples of audio from your iPod library, by means of an easily-overlooked metadata attribute in the Media Player framework, along with the export functionality of AV Foundation. The AV Foundation stuff was the gruesome part: with no direct means for sample-level access to the song “asset”, it required an intermediate export to .m4a, which was a lossy re-encode if the source was of a different format (like MP3), and then a subsequent conversion to PCM with Core Audio.

Please feel free to forget all about that approach… except for the Core Media timescale stuff, which you’ll surely see again before too long.

iOS 4.1 added a number of new classes to AV Foundation (indeed, these were among the most significant 4.1 API diffs) to provide an API for sample-level access to media. The essential classes are AVAssetReader and AVAssetWriter. Using these, we can dramatically simplify and improve the iPod converter.

I have an example project, VTM_AViPodReader.zip (70 KB) that was originally meant to be part of my session at the Voices That Matter iPhone conference in Philadelphia, but didn’t come together in time. I’m going to skip the UI stuff in this blog, and leave you to a screenshot and a simple description: tap “choose song”, pick something from your iPod library, tap “done”, and tap “Convert”.

Screenshot of VTM_AViPodReader

To do the conversion, we’ll use an AVAssetReader to read from the original song file, and an AVAssetWriter to perform the conversion and write to a new file in our application’s Documents directory.

Start, as in the previous example, by calling valueForProperty: with MPMediaItemPropertyAssetURL to get an NSURL representing the song in a format compatible with AV Foundation.



-(IBAction) convertTapped: (id) sender {
	// set up an AVAssetReader to read from the iPod Library
	NSURL *assetURL = [song valueForProperty:MPMediaItemPropertyAssetURL];
	AVURLAsset *songAsset =
		[AVURLAsset URLAssetWithURL:assetURL options:nil];

	NSError *assetError = nil;
	AVAssetReader *assetReader =
		[[AVAssetReader assetReaderWithAsset:songAsset
			   error:&assetError]
		  retain];
	if (assetError) {
		NSLog (@"error: %@", assetError);
		return;
	}

Sorry about the dangling retains. I’ll explain those in a little bit (and yes, you could use the alloc/init equivalents… I’m making a point here…). Anyways, it’s simple enough to take an AVAsset and make an AVAssetReader from it.

But what do you do with that? Contrary to what you might think, you don’t just read from it directly. Instead, you create another object, an AVAssetReaderOutput, which is able to produce samples from an AVAssetReader.


AVAssetReaderOutput *assetReaderOutput =
	[[AVAssetReaderAudioMixOutput
	  assetReaderAudioMixOutputWithAudioTracks:songAsset.tracks
				audioSettings: nil]
	retain];
if (! [assetReader canAddOutput: assetReaderOutput]) {
	NSLog (@"can't add reader output... die!");
	return;
}
[assetReader addOutput: assetReaderOutput];

AVAssetReaderOutput is abstract. Since we’re only interested in the audio from this asset, an AVAssetReaderAudioMixOutput will suit us fine. For reading samples from an audio/video file, like a QuickTime movie, we’d want AVAssetReaderVideoCompositionOutput instead. An important point here is that we set audioSettings to nil to get a generic PCM output. The alternative is to provide an NSDictionary specifying the format you want to receive; I ended up doing that later in the output step, so the default PCM here will be fine.

That’s all we need to worry about for now for reading from the song file. Now let’s start dealing with writing the converted file. We start by setting up an output file… the only important thing to know here is that AV Foundation won’t overwrite a file for you, so you should delete the exported.caf if it already exists.


NSArray *dirs = NSSearchPathForDirectoriesInDomains
				(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDirectoryPath = [dirs objectAtIndex:0];
NSString *exportPath = [[documentsDirectoryPath
				 stringByAppendingPathComponent:EXPORT_NAME]
				retain];
if ([[NSFileManager defaultManager] fileExistsAtPath:exportPath]) {
	[[NSFileManager defaultManager] removeItemAtPath:exportPath
		error:nil];
}
NSURL *exportURL = [NSURL fileURLWithPath:exportPath];

Yeah, there’s another spurious retain here. I’ll explain later. For now, let’s take exportURL and create the AVAssetWriter:


AVAssetWriter *assetWriter =
	[[AVAssetWriter assetWriterWithURL:exportURL
		  fileType:AVFileTypeCoreAudioFormat
			 error:&assetError]
	  retain];
if (assetError) {
	NSLog (@"error: %@", assetError);
	return;
}

OK, no sweat there, but the AVAssetWriter isn’t really the important part. Just as the reader is paired with “reader output” objects, so too is the writer connected to “writer input” objects, which is what we’ll be providing samples to, in order to write them to the filesystem.

To create the AVAssetWriterInput, we provide an NSDictionary describing the format and contents we want to create… this is analogous to a step we skipped earlier to specify the format we receive from the AVAssetReaderOutput. The dictionary keys are defined in AVAudioSettings.h and AVVideoSettings.h. You may find you need to dig through these header files for the value types to provide for these keys, and in some cases, they’ll point you to the Core Audio header files. Trial and error led me to ultimately specify all of the fields that would be encountered in an AudioStreamBasicDescription, along with an AudioChannelLayout structure, which needs to be wrapped in an NSData in order to be added to an NSDictionary.



AudioChannelLayout channelLayout;
memset(&channelLayout, 0, sizeof(AudioChannelLayout));
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
NSDictionary *outputSettings =
[NSDictionary dictionaryWithObjectsAndKeys:
	[NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
	[NSNumber numberWithFloat:44100.0], AVSampleRateKey,
	[NSNumber numberWithInt:2], AVNumberOfChannelsKey,
	[NSData dataWithBytes:&channelLayout length:sizeof(AudioChannelLayout)],
		AVChannelLayoutKey,
	[NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
	[NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
	[NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
	[NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
	nil];

With this dictionary describing 44.1 kHz, stereo, 16-bit, interleaved, little-endian integer PCM (AVLinearPCMIsNonInterleaved is NO, so the left and right samples alternate in a single stream), we can create an AVAssetWriterInput to encode and write samples in this format.
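
To get a feel for how big the exported file is going to be, here is a minimal sketch of the arithmetic behind those settings. It is plain C rather than anything from Core Audio; the helper names pcm_bytes_per_frame and pcm_bytes_per_second are made up for illustration, and it assumes integer linear PCM with a whole number of bytes per sample.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of Core Audio: bytes in one frame
   (one sample for every channel) of integer linear PCM. */
static uint32_t pcm_bytes_per_frame(uint32_t channels, uint32_t bits_per_channel) {
    assert(bits_per_channel % 8 == 0);   /* assumes whole-byte samples */
    return channels * (bits_per_channel / 8);
}

/* Hypothetical helper: bytes of sample data produced per second of audio. */
static uint64_t pcm_bytes_per_second(double sample_rate, uint32_t channels,
                                     uint32_t bits_per_channel) {
    return (uint64_t)(sample_rate * pcm_bytes_per_frame(channels, bits_per_channel));
}
```

For the settings in the dictionary, pcm_bytes_per_frame(2, 16) is 4 and pcm_bytes_per_second(44100.0, 2, 16) is 176400, so a three-minute song works out to a bit under 32 MB of sample data, not counting whatever the CAF container adds.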


AVAssetWriterInput *assetWriterInput =
	[[AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
				outputSettings:outputSettings]
	retain];
if ([assetWriter canAddInput:assetWriterInput]) {
	[assetWriter addInput:assetWriterInput];
} else {
	NSLog (@"can't add asset writer input... die!");
	return;
}
assetWriterInput.expectsMediaDataInRealTime = NO;

Notice that we’ve set the property assetWriterInput.expectsMediaDataInRealTime to NO. This will allow our transcode to run as fast as possible; of course, you’d set this to YES if you were capturing or generating samples in real-time.

Now that our reader and writer are ready, we signal that we’re ready to start moving samples around:


[assetWriter startWriting];
[assetReader startReading];
AVAssetTrack *soundTrack = [songAsset.tracks objectAtIndex:0];
CMTime startTime = CMTimeMake (0, soundTrack.naturalTimeScale);
[assetWriter startSessionAtSourceTime: startTime];
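
The CMTimeMake call above builds a time as a count of units plus a timescale (units per second), which is the “Core Media timescale stuff” again. As a rough sketch of that idea only, here it is in plain C. SketchTime and its functions are hypothetical stand-ins, not the real CMTime API, and the 44100 used in the note afterwards is just an assumed timescale for an audio track.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the idea behind Core Media's CMTime:
   a count of units plus the number of units per second. */
typedef struct {
    int64_t value;      /* how many units */
    int32_t timescale;  /* units per second */
} SketchTime;

static SketchTime sketch_time_make(int64_t value, int32_t timescale) {
    assert(timescale > 0);
    SketchTime t = { value, timescale };
    return t;
}

/* Convert to seconds for display or comparison. */
static double sketch_time_seconds(SketchTime t) {
    return (double)t.value / (double)t.timescale;
}

/* Re-express a time in another timescale, truncating toward zero. */
static SketchTime sketch_time_rescale(SketchTime t, int32_t new_timescale) {
    assert(new_timescale > 0);
    return sketch_time_make(t.value * new_timescale / t.timescale, new_timescale);
}
```

With a timescale of 44100, a value of 88200 is two seconds in, and rescaling it to a timescale of 600 gives a value of 1200. A start time of 0 is 0 seconds in any timescale, which is why the choice of timescale for startTime is not critical here.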

These calls will allow us to start reading from the reader and writing to the writer… but just how do we do that? The key is the AVAssetReaderOutput method copyNextSampleBuffer. This call produces a Core Media CMSampleBufferRef, which is what we need to provide to the AVAssetWriterInput’s appendSampleBuffer: method.

But this is where it starts getting tricky. We can’t just drop into a while loop and start copying buffers over. We have to be explicitly signaled that the writer is able to accept input. We do this by providing a block to the asset writer input’s requestMediaDataWhenReadyOnQueue:usingBlock: method. Once we do this, our code will continue on, while the block will be called asynchronously and periodically by Grand Central Dispatch. This explains the earlier retains… autoreleased variables created here in convertTapped: will soon be released, while we need them to still be around when the block is executed. So we need to take care that stuff we need is available inside the block: objects need to not be released, and local primitives that the block modifies need the __block modifier.


__block UInt64 convertedByteCount = 0;
dispatch_queue_t mediaInputQueue =
	dispatch_queue_create("mediaInputQueue", NULL);
[assetWriterInput requestMediaDataWhenReadyOnQueue:mediaInputQueue
										usingBlock: ^
 {

The block will be called repeatedly by GCD, but we still need to make sure that the writer input is able to accept new samples.


while (assetWriterInput.readyForMoreMediaData) {
	CMSampleBufferRef nextBuffer =
		[assetReaderOutput copyNextSampleBuffer];
	if (nextBuffer) {
		// append buffer
		[assetWriterInput appendSampleBuffer: nextBuffer];
		// update ui
		convertedByteCount +=
			CMSampleBufferGetTotalSampleSize (nextBuffer);
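		// convertedByteCount is a UInt64, so box it without truncating
		NSNumber *convertedByteCountNumber =
			[NSNumber numberWithUnsignedLongLong:convertedByteCount];
		[self performSelectorOnMainThread:@selector(updateSizeLabel:)
			withObject:convertedByteCountNumber
		waitUntilDone:NO];
		// copyNextSampleBuffer returns a buffer we own, so release it
		CFRelease (nextBuffer);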

What’s happening here is that while the writer input can accept more samples, we try to get a sample from the reader output. If we get one, appending it to the writer input is a one-line call. Updating the UI is another matter: since GCD has us running on an arbitrary thread, we have to use performSelectorOnMainThread for any updates to the UI, such as updating a label with the current total byte-count. We would also have to call out to the main thread to update the progress bar, currently unimplemented because I don’t have a good way to do it yet.
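
One possible approach to that progress bar, offered only as a sketch and not something that’s in the example project: compare the timestamp of the most recent sample buffer against the duration of the asset. The assumption is that the timestamp would come from CMSampleBufferGetPresentationTimeStamp and the duration from the asset’s duration property, each a value/timescale pair. The helper conversion_progress below is a hypothetical name, and it’s plain C so the arithmetic can be checked on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fraction of the song converted so far, given the
   timestamp of the latest buffer and the song's duration, each expressed
   as value/timescale in the manner of Core Media times. Clamped to 0..1. */
static float conversion_progress(int64_t buffer_value, int32_t buffer_timescale,
                                 int64_t duration_value, int32_t duration_timescale) {
    assert(buffer_timescale > 0 && duration_timescale > 0);
    if (duration_value <= 0) {
        return 0.0f;                 /* unknown or empty duration */
    }
    double done  = (double)buffer_value / buffer_timescale;
    double total = (double)duration_value / duration_timescale;
    double fraction = done / total;
    if (fraction < 0.0) fraction = 0.0;
    if (fraction > 1.0) fraction = 1.0;
    return (float)fraction;
}
```

A buffer stamped 30 seconds into a 60-second song (say 1323000/44100 against 36000/600) comes out as 0.5, which could then be handed to the main thread the same way as the byte count. Working from timestamps rather than byte counts sidesteps having to know the size of the finished file in advance.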

If the writer is ever unable to accept new samples, we fall out of the while and the block, though GCD will continue to re-run the block until we explicitly stop the writer.

How do we know when to do that? When we don’t get a sample from copyNextSampleBuffer, which means we’ve read all the data from the reader.


} else {
	// done!
	[assetWriterInput markAsFinished];
	[assetWriter finishWriting];
	[assetReader cancelReading];
	NSDictionary *outputFileAttributes =
		[[NSFileManager defaultManager]
			  attributesOfItemAtPath:exportPath
			  error:nil];
	// fileSize returns an unsigned long long, so log and box it as one
	NSLog (@"done. file size is %llu",
		    [outputFileAttributes fileSize]);
	NSNumber *doneFileSize = [NSNumber numberWithUnsignedLongLong:
			[outputFileAttributes fileSize]];
	[self performSelectorOnMainThread:@selector(updateCompletedSizeLabel:)
			withObject:doneFileSize
			waitUntilDone:NO];
	// release a lot of stuff
	[assetReader release];
	[assetReaderOutput release];
	[assetWriter release];
	[assetWriterInput release];
	[exportPath release];
	break;
}

Reaching the finish state requires us to tell the writer to finish up the file by sending finish messages to both the writer input and the writer itself. After we update the UI (again, with the song-and-dance required to do so on the main thread), we release all the objects we had to retain in order that they would be available to the block.

Finally, for those of you copy-and-pasting at home, I think I owe you some close braces:


		}
	 }];
	NSLog (@"bottom of convertTapped:");
}
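
With all the AV Foundation specifics stripped away, the control flow of that block is a pull loop: each time we’re called, move buffers while the writer has room, and finish the first time the reader comes up empty. Here’s a minimal plain-C sketch of just that shape. Every name in it (SketchReader, SketchWriter, sketch_on_ready, sketch_run) is a hypothetical stand-in, and the “room” counter is an invented simplification of readyForMoreMediaData.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the reader output and writer input. */
typedef struct { int buffers_left; } SketchReader;
typedef struct { int room; int written; bool finished; } SketchWriter;

/* Returns true if a buffer was produced, false at end of data
   (the role a NULL from copyNextSampleBuffer plays above). */
static bool sketch_read_next(SketchReader *r) {
    if (r->buffers_left == 0) return false;
    r->buffers_left--;
    return true;
}

/* One invocation of the "ready for more data" block. */
static void sketch_on_ready(SketchReader *r, SketchWriter *w) {
    while (w->room > 0) {            /* like readyForMoreMediaData */
        if (sketch_read_next(r)) {
            w->written++;            /* like appendSampleBuffer: */
            w->room--;
        } else {
            w->finished = true;      /* like markAsFinished / finishWriting */
            break;
        }
    }
}

/* Drives the block the way the queue would: give the writer some room,
   call the block, repeat until finished. Returns the number of calls. */
static int sketch_run(SketchReader *r, SketchWriter *w, int room_per_call) {
    assert(room_per_call > 0);
    int calls = 0;
    while (!w->finished) {
        w->room = room_per_call;
        sketch_on_ready(r, w);
        calls++;
    }
    return calls;
}
```

With 10 buffers to read and room for 4 per call, the block runs three times: 4, 4, then 2 followed by the empty read that triggers the finish. Note that falling out of the while because the writer is full is not the same as being done; only the empty read ends the session.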

Once you’ve run this code on the device (it won’t work in the Simulator, which doesn’t have an iPod Library) and performed a conversion, you’ll have converted PCM in an exported.caf file in your app’s Documents directory. In theory, your app could do something interesting with this file, like representing it as a waveform, or running it through a Core Audio AUGraph to apply some interesting effects. Just to prove that we actually have performed the desired conversion, use the Xcode Organizer to open up the “iPod Reader” application and drag its “Application Data” to your Mac:

Accessing app's documents with Xcode Organizer

The exported folder will have a Documents folder, in which you should find exported.caf. Drag it over to QuickTime Player or any other application that can show you the format of the file you’ve produced:

QuickTime Player inspector showing PCM format of exported.caf file

Hopefully this is going to work for you. It worked for most Amazon and iTunes albums I threw at it, but I found I had an iTunes Plus album, Ashtray Rock by the Joel Plaskett Emergency, whose songs throw an inexplicable error when opened, so I can’t presume to fully understand this API just yet:


2010-12-12 15:28:18.939 VTM_AViPodReader[7666:307] *** Terminating app
 due to uncaught exception 'NSInvalidArgumentException', reason:
 '*** -[AVAssetReader initWithAsset:error:] invalid parameter not
 satisfying: asset != ((void *)0)'

Still, the arrival of AVAssetReader and AVAssetWriter opens up a lot of new possibilities for audio and video apps on iOS. With the reader, you can inspect media samples, either in their original format or with a conversion to a form that suits your code. With the writer, you can supply samples that you receive by transcoding (as I’ve done here), by capture, or even samples you generate programmatically (such as a screen recorder class that just grabs the screen as often as possible and writes it to a movie file).

I wish I hadn’t been so crunched in the week leading up to the Voices That Matter: iPhone Developer Conference last weekend in Philadelphia, and had gotten a few of the super-advanced AV Foundation features working for demos, but since I went over my time by 10 minutes, I guess the talk was already chock full o’ content.

Anyways, I promised materials would be on my blog after a code clean-up, and here they are:

Title: Mastering Media with AV Foundation

  • Presentation slides (PDF)
  • VTM_Player.zip – Illustrates basic playback functionality, with local and remote files and streams (included URLs include .m4a, .mov, Shoutcast, and HTTP Live Streaming)
  • VTM_AVRecPlay.zip – Performs A/V capture from camera/mic and playback of captured movie
  • VTM_AVEditor.zip – Simple cuts-only editor for in/out editing, addition of audio track at export time.

Just a reminder, for those of you who don’t scroll all the way down the right column, that the Voices That Matter iPhone Developer’s Conference is coming up in a little over a month, October 16 & 17, in Philadelphia.

Why this matters right now:

  • Early Bird pricing ends tomorrow (September 10). Combine it with the speakers’ discount code PHRSPKR and you’re in for $395. Given the quality of the speakers, that’s a heck of a deal.
  • iOS 4.1 just came out yesterday, meaning we can now talk about new-in-4.1 APIs publicly. Aside from Game Center, one of the biggest changes in the SDK is the addition of AVAssetReader and AVAssetWriter to AV Foundation. These classes permit sample-level access to movie assets, enabling some new kinds of applications that weren’t possible before (can you say “ScreenFlow for iOS”?), as well as simplifying things like my music library PCM converter. I’m doing the talk on AV Foundation, and you can count on these new classes being covered.

So there you have it. See you in Philly. I’m going to try to make sure my travel plans get me there in time to do dinner at Ted’s on Friday night. Join me for bison burgers and Coke Zero… nom.

Philip Hodgetts e-mailed me yesterday, having found my recent CocoaHeads Ann Arbor talk on AV Foundation, and searching from there to find my blog. The first thing this brings up is that I’ve been slack about linking my various online identities and outlets… anyone who happens across my stuff should be able to get to the rest of it more easily. As a first step, behold the “More of This Stuff” box at the right, which links to my slideshare.net presentations and my Twitter feed. The former is updated less frequently than the latter, but also contains fewer obscenities and references to anime.

Philip co-hosts a podcast about digital media production, and their latest episode is chock-full of important stuff about QuickTime and QTKit that more people should know (frame rate doesn’t have to be constant!), along with wondering aloud about where the hell Final Cut stands given the QuickTime/QTKit schism on the Mac and the degree to which it is built atop the 32-bit legacy QuickTime API. FWIW, between reported layoffs on the Final Cut team and their key programmers working on iMovie for iPhone, I do not have a particularly good feeling about the future of FCP/FCE.

Philip, being a Mac guy and not an iOS guy, blogged that he was surprised my presentation wasn’t an NDA violation. Actually, AV Foundation has been around since 2.2, but only became a document-based audio/video editing framework in iOS 4. The only thing that’s NDA is what’s in iOS 4.1 (good stuff, BTW… hope we see it Wednesday, even though I might have to race out some code and a blog entry to revise this beastly entry).

He’s right in the podcast, though, that iPhone OS / iOS has sometimes kept some of its video functionality away from third-party developers. For example, Safari could embed a video, but through iPhone OS 3.1, the only video playback option was the MPMoviePlayerController, which takes over the entire screen when you play the movie. 3.2 provided the ability to get a separate view… but recall that 3.2 was iPad-only, and the iPad form factor clearly demands the ability to embed video in a view. In iOS 4, it may make more sense to ditch MPMoviePlayerController and leave MediaPlayer.framework for iPod library access, and instead do playback by getting an AVURLAsset and feeding it to an AVPlayer.

One slide Philip calls attention to in his blog is where I compare the class and method counts of AV Foundation, android.media, QTKit, and QuickTime for Java. A few notes on how I spoke to this slide when I gave my presentation:

  • First, notice that AV Foundation is already larger than QTKit. But also notice that while it has twice as many classes, it only has about 30% more methods. This is because AV Foundation had the option of starting fresh, rather than wrapping the old QuickTime API, and thus could opt for a more hierarchical class structure. AVAssets represent anything playable, while AVCompositions are movies that are being created and edited in-process. Many of the subclasses also split out separate classes for their mutable versions. By comparison, QTKit’s QTMovie class has over 100 methods; it just has to be all things to all people.

  • Not only is android.media smaller than AV Foundation, it also represents the alpha and omega of media on that platform, so while it’s mostly provided as a media player and capture API, it also includes everything else media-related on the platform, like ringtone synthesis and face recognition. While iOS doesn’t do these, keep in mind that on iOS, there are totally different frameworks for media library access (MediaPlayer.framework), low-level audio (Core Audio), photo library access (AssetsLibrary.framework), in-memory audio clips (System Sounds), etc. By this analysis, media support on iOS is many times more comprehensive than what’s currently available in Android.

  • Don’t read too much into my inclusion of QuickTime for Java. It was deprecated at WWDC 2008, after all. I put it in this chart because its use of classes and methods offered an apples-to-apples comparison with the other frameworks. Really, it’s there as a proxy for the old C-based QuickTime API. If you counted the number of functions in QuickTime, I’m sure you’d easily top 10,000. After all, QTJ represented Apple’s last attempt to wrap all of QuickTime with an OO layer. In QTKit, there’s no such ambition to be comprehensive. Instead, QTKit feels like a calculated attempt to include the stuff that most developers will need. This allows Apple to quietly abandon unneeded legacies like Wired Sprites and QuickTime VR. But quite a few babies are being thrown out with the bathwater: neither QTKit nor AV Foundation currently has equivalents for the “get next interesting time” functions (which could find edit points or individual samples), or the ability to read/write individual samples with GetMediaSample() / AddMediaSample().

One other point of interest is one of the last slides, which quotes a macro seen throughout AVFoundation and Core Media in iOS 4:


__OSX_AVAILABLE_STARTING(__MAC_10_7,__IPHONE_4_0);

Does this mean that AV Foundation will appear on Mac OS X 10.7 (or hell, does it mean that 10.7 work is underway)? IMHO, not enough to speculate, other than to say that someone was careful to leave the door open.

Update: Speaking of speaking on AV Foundation, I should mention again that I’m going to be doing a much more intense and detailed Introduction to AV Foundation at the Voices That Matter: iPhone Developer Conference in Philadelphia, October 16-17. $100 off with discount code PHRSPKR.
