glFlush(); goto Metal;

Metal is Apple’s new API, offering lower-level access to the GPU than OpenGL ES affords. As it stands, the only part it can target is the A7. It’d be foolish to assume that whatever chip is announced for the new iPhones and iPads later in the year won’t also have Metal support. Metal will be Apple’s lowest-level API for accessing their GPUs.

Metal is a very slim API and a departure from both OpenGL and Direct3D in a number of ways. Metal, as you might expect from the name, exposes how GPUs work today rather than how they worked back in the day.

Both OpenGL and Direct3D have been rebooted a couple of times to better address the realities of the hardware they were providing access to. Direct3D was horrible for the first couple of revisions but, since it was driven entirely by Microsoft, eventually gained the advantage. A willingness to obsolete old APIs and actively promote new ones served Microsoft well here. When customers looking to buy a video card are asking themselves if a part is DX9 or DX11 compatible then you’ve successfully sold your API as a standard.

OpenGL, on the other hand, has had a slower evolution but an arguably more dynamic history. The OpenGL specification (and the “portable subset”, OpenGL ES) is governed by The Khronos Group. Putting technical issues aside for a moment I think we can all agree that’s just a great name for an organization. OpenGL is extensible by vendors and always has been. Vendors can expose new functionality or GPU state accessors through OpenGL. The API provides a primitive mechanism to query which extensions are available, and the entry points for them can be looked up via the platform’s dynamic library lookup functions. Which is a long-winded way of saying that OpenGL extensions can be provided per vendor, per driver, for a particular piece of hardware. And a lot of them were. For a while OpenGL was well ahead of the game in terms of exposing the capabilities of various GPUs without needing to change the API entirely. The cost of that innovation landed on the people writing the end user software, who had to check for and use various vendor extensions in order to make the best use of the hardware they were running on.
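To make that cost concrete, here is a minimal sketch of the check application code ends up carrying around. It assumes the extension list arrives as one space-separated string, which is the format of the classic GL_EXTENSIONS string; in a real program that string would come from the driver and the entry points from the platform’s lookup function. The helper name has_extension is mine, not part of OpenGL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an OpenGL call): returns true if `name` appears
 * as a whole token in a space-separated extension list. A bare strstr() is
 * not enough, because one extension name can be a prefix of another. */
bool has_extension(const char *ext_list, const char *name)
{
    size_t name_len = strlen(name);
    if (name_len == 0 || strchr(name, ' ') != NULL)
        return false;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning of the list or after a space... */
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        /* ...and end at a space or at the end of the list. */
        char after = p[name_len];
        bool ends_token = (after == ' ') || (after == '\0');
        if (starts_token && ends_token)
            return true;
        p += name_len; /* safe: the matched span contains no spaces */
    }
    return false;
}
```

The whole-token comparison is the point. A naive substring search would happily report GL_EXT_texture as present on a device that only offers GL_EXT_texture_filter_anisotropic.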

With the passing of the Fixed Pipeline as the primary concern and the introduction of programmable stages in the rendering process, both OpenGL and Direct3D converged around a pretty standard core of functionality. The application provides geometric shapes (yes, I’m glossing over geometry shaders), and the API takes that input and first runs what’s called a Vertex Program (or Shader) against each vertex provided. There are a lot of vertices. That’s cool because the one thing GPUs are really good at is doing an incredible amount of relatively simple work in parallel. Then, after the vertices have been computed, the next programmable stage is the Fragment Program (or Pixel Shader). A Fragment, in OpenGL terms, is more or less the little dot that lights up on your screen. A pixel. The smallest possible visual expression. The Fragment Program runs for each individual pixel and is provided with, basically, values blended according to how close that pixel is to each of the vertices that define the triangle being rendered. With this information you can apply various gradients or blends to come up with the resulting output colour. That, in a very small nutshell, is how GPU APIs work in the modern era.
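The “how close is this pixel to each vertex” idea can be sketched on the CPU. This is a toy under my own assumptions, not how any driver implements it: a 2D triangle, one colour per vertex, and barycentric weights computed from signed areas. The type and function names (Vec2, Color, shade_fragment) are invented for the illustration.

```c
typedef struct { float x, y; } Vec2;
typedef struct { float r, g, b; } Color;

/* Twice the signed area of triangle (a, b, p). */
static float edge(Vec2 a, Vec2 b, Vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Toy "fragment program": if p lies inside the triangle v[0..2], blend the
 * three vertex colours by the barycentric weights of p, store the result in
 * *out and return 1. Returns 0 (leaving *out untouched) if p is outside or
 * the triangle is degenerate. */
int shade_fragment(const Vec2 v[3], const Color c[3], Vec2 p, Color *out)
{
    float area = edge(v[0], v[1], v[2]);
    if (area == 0.0f)
        return 0;

    /* Each weight is 1 at its own vertex and falls to 0 at the opposite edge.
     * Dividing by the signed area makes this work for either winding order. */
    float w0 = edge(v[1], v[2], p) / area;
    float w1 = edge(v[2], v[0], p) / area;
    float w2 = edge(v[0], v[1], p) / area;
    if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f)
        return 0;

    out->r = w0 * c[0].r + w1 * c[1].r + w2 * c[2].r;
    out->g = w0 * c[0].g + w1 * c[1].g + w2 * c[2].g;
    out->b = w0 * c[0].b + w1 * c[1].b + w2 * c[2].b;
    return 1;
}
```

A pixel sitting exactly on a vertex gets that vertex’s colour, and everything in between is a smooth mix. The GPU runs the equivalent of this for every covered pixel, in parallel.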

The issue with OpenGL is that an incredibly complicated state machine is being manipulated one function call at a time. By that I mean that each call into OpenGL requires a check to make sure the resulting state is valid for the device. OpenGL error reporting is such that no functions return an error code. Instead they set an error flag in the GL context that’s read back through the glGetError() function. The lack of immediate error returns, as well as other features of the API, derives from its earlier days when it was envisioned as a largely asynchronous affair. Commands would be buffered and fired off to the renderer and there would be only occasional synchronization points.
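The shape of that error model is easy to mimic. The sketch below is a toy, not OpenGL: every toy_ name is made up, and the validity rules are placeholders. What it does mirror is the behaviour described above, where calls return nothing, the first error is latched into a flag, and reading the flag clears it.

```c
typedef enum {
    TOY_NO_ERROR = 0,
    TOY_INVALID_ENUM,
    TOY_INVALID_VALUE
} ToyError;

/* One latched error flag, standing in for per-context GL state. */
static ToyError toy_error_flag = TOY_NO_ERROR;

/* Only the first error since the last query is kept. */
static void toy_record_error(ToyError e)
{
    if (toy_error_flag == TOY_NO_ERROR)
        toy_error_flag = e;
}

/* State-setting calls return void; bad input is silently ignored
 * apart from latching the error flag. */
void toy_line_width(float width)
{
    if (width <= 0.0f) {
        toy_record_error(TOY_INVALID_VALUE);
        return;
    }
    /* ...would update the line width state here... */
}

void toy_enable(int capability)
{
    if (capability < 0 || capability > 31) { /* placeholder range */
        toy_record_error(TOY_INVALID_ENUM);
        return;
    }
    /* ...would flip the capability switch here... */
}

/* Returns the latched error and resets the flag. */
ToyError toy_get_error(void)
{
    ToyError e = toy_error_flag;
    toy_error_flag = TOY_NO_ERROR;
    return e;
}
```

Call toy_line_width(-1.0f) and then toy_enable(9999), and the first query reports only the invalid value; the second query reports no error at all. Nothing tells you which call was at fault, which is why debugging this style of API tends to involve sprinkling error checks after every line.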

While that’s a laudable abstraction, that’s not what we’ve ended up with in the real world. Our ability to render 3D graphics is coming closer to the CPU rather than further away. Intel has been shipping respectable (yes, gamers, I know) parts for a couple of years that perform well. Meanwhile, Apple’s A7 has a tremendous number of capabilities that haven’t been, and can’t be, unlocked via the OpenGL API. Integrated memory and graphics processing are not what OpenGL was designed for. Which isn’t a knock against OpenGL. It’s just the facts.

Enter Metal.

A GPU is, in many ways, a giant state machine. Imagine a board of switches, dials and a giant button marked “Submit”. For each thing you’d want to draw with GL you’d set all the switches and dials to be exactly what you wanted. Each time you flipped a switch or diddled a dial the machine would crank away and decide if that was good for it or not. Then you’d hit “Submit” and in theory something would draw but if you’re new to this game, nope, you’d just get a black screen. No API makes it easier to write hundreds of lines of code and end up with a black screen than OpenGL.

Metal turns that upside down. Rather than making discrete state changes (flipping switches) directly in the driver, Metal allows you to order up the set of state that you’d want applied. If it can’t be done you’ll know. If it can, then that set of state is good and can be applied, without further checking, to other rendering operations and contexts. Metal turns a set of many tiny decisions into an opportunity to green light a plan.
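Here is the green-light-a-plan idea as a toy in C. To be clear about assumptions: none of this is Metal’s actual API (the real thing is Objective-C, built around objects such as MTLRenderPipelineDescriptor and MTLRenderPipelineState), and the toy_ names and validity rules below are invented. It only illustrates the shape: describe everything up front, validate once, then reuse the result without re-checking.

```c
#include <stdbool.h>

/* A description of the state you would like. Cheap to fill in and mutate. */
typedef struct {
    int  sample_count;
    bool depth_test;
    bool depth_write;
} ToyPipelineDesc;

/* A validated snapshot. Treated as immutable once built. */
typedef struct {
    ToyPipelineDesc desc;
} ToyPipelineState;

/* Counts how often the (notionally expensive) validation runs. */
static int toy_validations_run = 0;

/* Validates the whole description in one go. Returns false, leaving *out
 * untouched, if the combination is not supported. The rules are placeholders. */
bool toy_make_pipeline(const ToyPipelineDesc *desc, ToyPipelineState *out)
{
    toy_validations_run++;
    if (desc->sample_count != 1 && desc->sample_count != 4)
        return false;
    if (desc->depth_write && !desc->depth_test)
        return false;
    out->desc = *desc;
    return true;
}

/* Drawing with an already validated state does no checking at all.
 * Returns the number of whole triangles the vertex count describes. */
int toy_draw(const ToyPipelineState *state, int vertex_count)
{
    (void)state; /* a real backend would bind the prebuilt state here */
    return vertex_count / 3;
}
```

Build one state, draw with it a thousand times, and toy_validations_run is still 1. Contrast that with the switches-and-dials model, where every flip is a fresh chance for the driver to stop and think.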

This agreement-in-advance structure of the API affords many opportunities for optimization that weren’t available through the OpenGL API. (Nor Direct3D for that matter. Though AMD’s Mantle sits at a similar level.)

Metal treats the GPU as what it has grown up to be: a massively parallel computation device which is best served with giant batches of data and a coherent and concise command stream. What SGI started back in 1992 was terrific and forward-looking. The notion of the GPU being a hop away from the CPU carried the industry for years. It was a great design decision. Calling an API crufty after more than twenty years of service doesn’t do justice to the foresight of its developers or the Khronos Group.

The future of GPU API design will be closer to Metal (and Mantle and DirectX 12) than it is to the groundwork that OpenGL laid out years ago. The pragmatic future is that this kind of thing will matter to a steadily shrinking set of people. Between SpriteKit, Cocos2D, SceneKit, Unreal, Unity (and the list goes on), it’d be folly to concern yourself with this kind of detail if you’re making a game. Making games is hard. There’s enough to deal with just with the design and content pipelines without being concerned about whether you’re using the fanciest API access to iOS GPUs.

Here’s the rundown. SpriteKit (or Cocos2D) will serve most of your needs for casual games. SceneKit (now on iOS too) will serve many needs, especially for top-down games where culling isn’t a huge bottleneck. Failing that, OpenGL will solve 99% of your rendering needs. If you’re reading this article and hope to use it to convince a manager to go with Metal as your rendering API, you’ll be disappointed.

If you’ve read this far and have been entirely bored and know exactly what you want from a high performance GPU API then I think Metal is a really interesting direction and worth your time. If not, there are better, higher level solutions.