Debugging OpenGL Performance with Xcode Instruments
Back in January of 2012 I started working on an app experiment called “A Tidy World”. It’s a 2d world generator, complete with a landscape, weather, day/night cycle - everything you’d use to create a world for a 2d platform game. Its purpose was primarily as a learning tool for me: parallax scrolling effects, colour palette shifting, game loop/timer manipulation, etc.
I was looking for a 2d framework to get me rolling, and my friend and colleague Mark Smith pointed me towards a few that he’d heard positive things about: Cocos2d, Pixelwave, and Sparrow. After a bit of playing around with samples, poking the forums, and reading some of the API docs, I selected Sparrow; it had a simple design, an active community, and solid documentation, and it covered the basics while avoiding the kitchen sink.
As work on A Tidy World has been wrapping up, its performance has hit a brick wall. Previously I could hold a fairly steady frame-rate, but during stress-testing on older iOS devices that frame-rate plunged. I couldn’t understand why at first, so I cracked open Xcode’s profiling tools and started digging.
The first thing I noticed when I turned to the profiling tools was that my frames-per-second (FPS) varied wildly between 18 and 40 depending on what was happening both in the foreground (scene complexity) and background (analytics and ads). My memory usage was reasonably low, and my CPU usage was well below 50%. What about the GPU utilization? I’d been looking for an opportunity to test out a new addition to the Xcode toolset: the OpenGL ES Frame Analyzer.
For reference, Tiler represents the % of GPU time spent processing scene geometry, Renderer represents the % of GPU time spent on drawing pixels, and Device represents the % of GPU time spent on rendering in total.
Identifying the Culprit
The CPU and GPU frame times were both fairly high and close in value, yet my GPU utilization was fairly low across the board. I had already figured my scene complexity was fairly low and the utilization analysis confirmed it, leading me to suspect something in the code was causing the problem. So what did my call stack look like?
This is not the entire call stack; this is the stack excerpt from a single SPSprite object. I can’t paste the entire stack because there were over 250 of these. Almost immediately, I knew I’d found my problem: while I had loaded all my textures into an atlas, I wasn’t doing any batching of draw calls for my sprites.
Sprite Batching (v.) : The technique of combining draws of sprite objects that come from the same texture object in memory, enabling multiple sprites to be submitted for rendering in a single draw call to the GPU with the intent to improve performance.
The way Sparrow is designed and implemented as of this writing, each SPSprite instance generates a draw call. There is no support for batching at this time, though some components of Sparrow do perform batched draws: the particle system does this well, and there is a class for compiled sprites but those are largely immutable after they’ve been created - essentially a “static” sprite class.
Not having this technique available is a problem.
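To make the definition of sprite batching above concrete, here is a minimal, framework-agnostic sketch in Python. It is not Sparrow or Cocos2d code: the `Sprite` class, `texture_id` handle, and both `draw_*` functions are hypothetical names invented for illustration, and a "draw call" is modelled simply as a (texture, vertex list) pair rather than a real OpenGL submission. The count of 250 sprites mirrors the number of per-sprite stack excerpts mentioned earlier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sprite:
    texture_id: int   # hypothetical handle for the texture (or atlas) the sprite samples
    x: float
    y: float
    width: float
    height: float


def quad_vertices(sprite):
    """Return the four corner positions of the sprite's quad."""
    x0, y0 = sprite.x, sprite.y
    x1, y1 = sprite.x + sprite.width, sprite.y + sprite.height
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def draw_unbatched(sprites):
    """One simulated draw call per sprite, each carrying a single quad."""
    return [(s.texture_id, quad_vertices(s)) for s in sprites]


def draw_batched(sprites):
    """One simulated draw call per texture: quads sharing a texture are merged."""
    batches = defaultdict(list)
    for s in sprites:
        batches[s.texture_id].extend(quad_vertices(s))
    return list(batches.items())


# 250 sprites that all sample the same atlas texture.
sprites = [Sprite(texture_id=1, x=i * 32.0, y=0.0, width=32.0, height=32.0)
           for i in range(250)]

unbatched_calls = draw_unbatched(sprites)
batched_calls = draw_batched(sprites)

print(len(unbatched_calls), "draw calls unbatched")
print(len(batched_calls), "draw call(s) batched")
```

The sketch groups purely by texture, which is only safe when draw order between textures does not matter. A real batcher typically has to break a batch whenever the texture or blend state changes between consecutive sprites in z-order.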
How Big A Problem?
I wondered how dramatic the difference actually was. So I set up two skeleton projects, one in Cocos2d and one in Sparrow, and compared their performance when drawing 500 sprites to the screen, all using the same 32x32 texture.
This is a compelling difference. It gets even more so when you examine the call stacks comparatively:
I wanted to see if this performance scaled. So I set the draw to 1000 sprites:
What if we do 50000 sprites?
Reducing the texture size yielded an immense gain, and I’m able to dramatically increase the number of sprites based on that texture and still see great performance. Is there a linear tradeoff? Possibly, but I will pursue that another day.
My bottleneck was the result of a combination of two factors: the first was that some of my textures were fairly large (512x256), and the second was the number of sprites using those textures (even with batched drawing). When I started reducing the number of sprites, I saw immediate improvement. My biggest culprits were my clouds, which were 512x256 in size, and I’d created many of them to provide the appropriate sky coverage for given weather conditions. Obviously I’d need to optimize my scene at some point, but that would be the second step in remedying my performance issues.
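A rough back-of-the-envelope sketch shows why texture size matters so much alongside sprite count. It assumes each sprite is drawn unscaled at its texture's native size and is fully on screen, so the pixels it touches equal width times height; it ignores clipping, blending cost, and anything else the GPU does. The `pixels_covered` helper is a hypothetical name, and the only inputs are the sizes and the 500-sprite benchmark count given in the text.

```python
def pixels_covered(width, height, count=1):
    """Pixels touched when `count` sprites of width x height are drawn unscaled."""
    return width * height * count


small_sprite = pixels_covered(32, 32)      # the benchmark texture size
cloud_sprite = pixels_covered(512, 256)    # the cloud texture size
ratio = cloud_sprite // small_sprite

benchmark_total = pixels_covered(32, 32, count=500)
clouds_to_match = benchmark_total / cloud_sprite

print(f"one cloud covers as many pixels as {ratio} small sprites")
print(f"{clouds_to_match:.1f} clouds cover as many pixels as 500 small sprites")
```

Under these assumptions, a single 512x256 cloud covers as many pixels as 128 of the 32x32 sprites, so fewer than four clouds match the pixel coverage of the entire 500-sprite benchmark. That is consistent with the observation that trimming the number of large sprites gave an immediate improvement.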
The first step was to optimize my draws via sprite batching - something my currently chosen framework doesn’t allow.
The Fork In The Road
So now I am at a crossroads that many developers have faced with dread: whether to change frameworks. A year of my work is tied into a framework that has zero support for this optimization. Batched drawing is a feature that has been requested in the past, and it’s a feature I’m certain the authors would love to add. There’s been talk of Sparrow 2.0 for about a year now, but it’s been in a perpetual state of “coming soon”. With Starling being a funded project, it (sensibly) receives the primary share of time and energy. Consequently the younger sibling that is Sparrow will likely starve. One possible way to prevent that fate is for the original authors to publish all 2.0 development work into the development branch (which has been inactive for some time) and truly open the framework to the community for future development - there are sure to be developers who are capable, willing, and interested.
Ultimately my decision rests with the option that addresses not only current project needs, but future project needs as well. Sparrow’s longevity is in question, given the success of Starling, while Cocos2d remains under active development. Porting my app over to Cocos2d will take time, but I have more confidence in that path’s future than in remaining with Sparrow.