Well, this was an unpleasant surprise:

Mac App Store rejects my purchase of Final Cut Pro X due to inadequate video card

I’ve never really given a lot of thought to my video card… I usually just get the default option for the Mac Pro when I buy it. It’s not like I do any gaming on my Mac (that’s what the Wii, PS2, iPhone, and iPad are for)

So here’s what System Profiler tells me I have:


ATI Radeon HD 2600 XT:

  Chipset Model:	ATI Radeon HD 2600
  Type:	GPU
  Bus:	PCIe
  Slot:	Slot-1
  PCIe Lane Width:	x16
  VRAM (Total):	256 MB
  Vendor:	ATI (0x1002)
  Device ID:	0x9588
  Revision ID:	0x0000
  ROM Revision:	113-B1480A-252
  EFI Driver Version:	01.00.252
  Displays:
SMEX2220:
  Resolution:	1920 x 1080 @ 60 Hz
  Pixel Depth:	32-Bit Color (ARGB8888)
  Main Display:	Yes
  Mirror:	Off
  Online:	Yes
  Rotation:	Supported
  Television:	Yes
Display Connector:
  Status:	No Display Connected

Guess I’m in the market for a better video card for an Early 2008 Mac Pro… what should I get, given that all I need it for is FCP?

I retweeted this already, but if you care about FCP, read Philip Hodgetts’ A new 64 bit Final Cut Pro? for some excellent analysis about what this could possibly be, given the respective capabilities and release timing of FCP, QTKit, AV Foundation, and Lion:

My biggest doubt was the timing. I believed a rewritten 64 bit Final Cut Pro would require a rewritten 64 bit QuickTime before it can be developed and clearly that wasn’t a valid assumption. Speculating wildly – to pull off a fully rewritten, 64 bit pure Cocoa Final Cut Pro – would require building on AVFoundation (the basis of iMovie for iPhone), which is coming to OS X in 10.7 Lion.

Philip Hodgetts e-mailed me yesterday, having found my recent CocoaHeads Ann Arbor talk on AV Foundation, and searching from there to find my blog. The first thing this brings up is that I’ve been slack about linking my various online identities and outlets… it should be easier for anyone who happens across my stuff to be able to get to it more easily. As a first step, behold the “More of This Stuff” box at the right, which links to my slideshare.net presentations and my Twitter feed. The former is updated less frequently than the latter, but also contains fewer obscenities and references to anime.

Philip co-hosts a podcast about digital media production, and their latest episode is chock-ful of important stuff about QuickTime and QTKit that more people should know (frame rate doesn’t have to be constant!), along with wondering aloud about where the hell Final Cut stands given the QuickTime/QTKit schism on the Mac and the degree to which it is built atop the 32-bit legacy QuickTime API. FWIW, between reported layoffs on the Final Cut team and their key programmers working on iMovie for iPhone, I do not have a particularly good feeling about the future of FCP/FCE.

Philip, being a Mac guy and not an iOS guy, blogged that he was surprised my presentation wasn’t an NDA violation. Actually, AV Foundation has been around since 2.2, but only became a document-based audio/video editing framework in iOS 4. The only thing that’s NDA is what’s in iOS 4.1 (good stuff, BTW… hope we see it Wednesday, even though I might have to race out some code and a blog entry to revise this beastly entry).

He’s right in the podcast, though, that iPhone OS / iOS has sometimes kept some of its video functionality away from third-party developers. For example, Safari could embed a video, but through iPhone OS 3.1, the only video playback option was the MPMoviePlayerController, which takes over the entire screen when you play the movie. 3.2 provided the ability to get a separate view… but recall that 3.2 was iPad-only, and the iPad form factor clearly demands the ability to embed video in a view. In iOS 4, it may make more sense to ditch MPMoviePlayerController and leave MediaPlayer.framework for iPod library access, and instead do playback by getting an AVURLAsset and feeding it to an AVPlayer.

One slide Philip calls attention to in his blog is where I compare the class and method counts of AV Foundation, android.media, QTKit, and QuickTime for Java. A few notes on how I spoke to this slide when I gave my presentation:

  • First, notice that AV Foundation is already larger than QTKit. But also notice that while it has twice as many classes, it only has about 30% more methods. This is because AV Foundation had the option of starting fresh, rather than wrapping the old QuickTime API, and thus could opt for a more hierarchical class structure. AVAssets represent anything playable, while AVCompositions are movies that are being created and edited in-process. Many of the subclasses also split out separate classes for their mutable versions. By comparison, QTKit’s QTMovie class has over 100 methods; it just has to be all things to all people.

  • Not only is android.media smaller than AV Foundation, it also represents the alpha and omega of media on that platform, so while it’s mostly provided as a media player and capture API, it also includes everything else media-related on the platform, like ringtone synthesis and face recognition. While iOS doesn’t do these, keep in mind that on iOS, there are totally different frameworks for media library access (MediaPlayer.framework), low-level audio (Core Audio), photo library access (AssetsLibrary.framework), in-memory audio clips (System Sounds), etc. By this analysis, media support on iOS is many times more comprehensive than what’s currently available in Android.

  • Don’t read too much into my inclusion of QuickTime for Java. It was deprecated at WWDC 2008, after all. I put it in this chart because its use of classes and methods offered an apples-to-apples comparison with the other frameworks. Really, it’s there as a proxy for the old C-based QuickTime API. If you counted the number of functions in QuickTime, I’m sure you’d easily top 10,000. After all, QTJ represented Apple’s last attempt to wrap all of QuickTime with an OO layer. In QTKit, there’s no such ambition to be comprehensive. Instead, QTKit feels like a calculated attempt to include the stuff that the most developers will need. This allows Apple to quietly abandon unneeded legacies like Wired Sprites and QuickTime VR. But quite a few babies are being thrown out with the bathwater — neither QTKit nor AV Foundation currently has equivalents for the “get next interesting time” functions (which could find edit points or individual samples), or the ability to read/write individual samples with GetMediaSample() / AddMediaSample().

One other point of interest is one of the last slides, which quotes a macro seen throughout AVFoundation and Core Media in iOS 4:


__OSX_AVAILABLE_STARTING(__MAC_10_7,__IPHONE_4_0);

Does this mean that AV Foundation will appear on Mac OS X 10.7 (or hell, does it mean that 10.7 work is underway)? IMHO, not enough to speculate, other than to say that someone was careful to leave the door open.

Update: Speaking of speaking on AV Foundation, I should mention again that I’m going to be doing a much more intense and detailed Introduction to AV Foundation at the Voices That Matter: iPhone Developer Conference in Philadelphia, October 16-17. $100 off with discount code PHRSPKR.

News.com.com.com.com, on evidence of a new Apple video format in iMovie 8.0.5

Dubbed iFrame, the new video format is based on industry standard technologies like H.264 video and AAC audio. As expected with H.264, iFrame produces much smaller file sizes than traditional video formats, while maintaining its high-quality video. Of course, the smaller file size increases import speed and helps with editing video files.

Saying smaller files are easier to edit is like saying cutting down the mightiest tree in the forest is easier with a haddock than with a chainsaw, as the former is lighter to hold.

The real flaw with this is that H.264, while a lovely end-user distribution format, uses heavy temporal compression, potentially employing both P-frames (“predicted” frames, meaning they require data from multiple earlier frames), and B-frames (“bidirectionally predicted” frames, meaning they require data from both earlier and subsequent frames). Scrubbing frame-by-frame through H.264 is therefore slowed by sometimes having to read in and decompress multiple frames of data in order to render the next one. And in my Final Cut experience, scrubbing backwards through H.264 is particularly slow; shuttle a few frames backwards and you literally have to let go of the wheel for a few seconds to let the computer catch up. For editing, you see a huge difference when you use a format with only I-frames (“intra” frames, meaning every frame has all the data it needs), such as M-JPEG or Pixlet.

You can use H.264 in an all-I-frame mode (which makes it more or less M-JPEG), but then you’re not getting small file-sizes meant for end-user distribution. I’ll bet that iFrame employs H.264 P- and B-frames, being aimed at the non-pro user whose editing consists of just a handful of cuts, and won’t mind the disk grinding as they identify the frame to cut on.

But for more sophisticated editing, having your source in H.264 would be painful.

This also speaks to a larger point of Apple seemingly turning its back on advanced media creatives in favor of everyday users with simpler needs. I’ve been surprised at CocoaHeads meetings to hear that I’m not the only one who bemoans the massive loss of functionality from the old 32-bit C-based QuickTime API to the easier-to-use but severely limited QTKit. That said, everyone else expects that we’ll see non-trivial editing APIs in QTKit eventually. I hope they’re right, but everything I see from Apple, including iFrame’s apparent use of H.264 as a capture-time and therefore edit-time format, makes me think otherwise.

First iPhone 3GS nit: refuses to charge when connected to the USB 2.0 port of the Bella USA Final Cut Keyboard:
Screenshot 2009.06.19 13.23.07

Can’t wait to see if it balks at connecting to the “Built for iPod / Works with iPhone” car radio I bought four months ago.

Back when we did the iPhone discussion on Late Night Cocoa, I made a point of distinguishing the iPhone’s media frameworks, specifically Core Audio and friends (Audio Queue Services, Audio Session, etc.), from “document-based” media frameworks like QuickTime.

This reflects some thinking I’ve been doing over the last few months, and I don’t think I’m done, but it does reflect a significant change in how I see things and invalidates some of what I’ve written in the past.

Let me explain the breakdown. In the past, I saw a dichotomy between simple media playback frameworks, and those that could do more: mix, record, edit, etc. While there are lots of media frameworks that could enlighten me (I’m admittedly pretty ignorant of both Flash and the Windows’ media frameworks), I’m now organizing things into three general classes of media framework:

  • Playback-only – this is what a lot of people expect when they first envision a media framework: they’ve got some kind of audio or audio/video source and they just care about rendering to screen and speakers. As generally implemented, the source is generally opaque, so you don’t have to care about the contents of the “thing” you’re playing (AVI vs. MOV? MP3 vs. AAC? DKDC!), but you also can’t generally do anything with the source other than play it. Your control may be limited to play (perhaps at a variable rate), stop, jump to a time, etc.

  • Stream-based – In this kind of API, you see the media as a stream of data, meaning that you act on the media as it’s being processed or played. You generally get the ability to mix multiple streams, and add your own custom processing, with the caveat that you’re usually acting in realtime, so anything you do has to finish quickly for fear you’ll drop frames. It makes a lot of sense to think of audio this way, and this model fits two APIs I’ve done significant work with: Java Sound and Core Audio. Conceptually, video can be handled the same way: you can have a stream of A/V data that can be composited, effected, etc. Java Media Framework wanted to be this kind of API, but it didn’t really stick. I suspect there are other examples of this that work; the Slashdot story NVIDIA Releases New Video API For Linux describes a stream-based video API in much the same terms: ‘The Video Decode and Presentation API for Unix (VDPAU) provides a complete solution for decoding, post-processing, compositing, and displaying compressed or uncompressed video streams. These video streams may be combined (composited) with bitmap content, to implement OSDs and other application user interfaces.’.

  • Document-based – No surprise, in this case I’m thinking of QuickTime, though I strongly suspect that a Flash presentation uses the same model. In this model, you use a static representation of media streams and their relationships to one another: rather than mixing live at playback time, you put information about the mix into the media document (this audio stream is this loud and panned this far to the left, that video stream is transformed with this matrix and here’s its layer number in the Z-axis), and then a playback engine applies that mix at playback time. The fact that so few people have worked with such a thing recalls my example of people who try to do video overlays by trying to hack QuickTime’s render pipeline rather than just authoring a multi-layer movie like an end-user would.

I used to insist that Java needed a media API that supported the concept of “media in a stopped state”… clearly that spoke to my bias towards document-based frameworks, specifically QuickTime. Having reached this mental three-way split, I can see that a sufficiently capable stream-based media API would be powerful enough to be interesting. If you had to have a document-based API, you could write one that would then use the stream API as its playback/recording engine. Indeed, this is how things are on the iPhone for audio: the APIs offer deep opportunities for mixing audio streams and for recording, but doing something like audio editing would be a highly DIY option (you’d basically need to store edits, mix data, etc., and then perform that by calling the audio APIs to play the selected file segments, mixed as described, etc.).

But I don’t think it’s enough anymore to have a playback-only API, at least on the desktop, for the simple reason that HTML5 and the <video> tag commoditizes video playback. On JavaPosse #217, the guys were impressed by a blog claiming that a JavaFX media player had been written in just 15 lines. I submit that it should take zero lines to write a JavaFX media player: since JavaFX uses WebKit, and WebKit supports the HTML5 <video> tag (at least on Mac and Windows), then you should be able to score video playback by just putting a web view and an appropriate bit of HTML5 in your app.

One other thing that grates on me is the maxim that playback is what matters the most because that’s all that the majority of media apps are going to use. You sort of see this thinking in QTKit, the new Cocoa API for QuickTime, which currently offers very limited access to QuickTime movies as documents: you can cut/copy/paste, but you can’t access samples directly, insert modifier tracks like effects and tweens, etc.

Sure, 95% of media apps are only going to use playback, but most of them are trivial anyways. If we have 99 hobbyist imitations of WinAmp and YouTube for every one Final Cut, does that justify ignoring the editing APIs? Does it really help the platform to optimize the API for the trivialities? They can just embed WebKit, after all, so make sure that playback story is solid for WebKit views, and then please Apple, give us grownups segment-level editing already!

So, anyways, that’s the mental model I’m currently working with: playback, stream, document. I’ll try to make this distinction clear in future discussions of real and hypothetical media APIs. Thanks for indulging the big think, if anyone’s actually reading.

Next Page »