This example shows how to combine parallel computation on the CPU via GCD with
results processing and display on the GPU via OpenCL and OpenGL.
It computes escape-time fractals in parallel on the global concurrent GCD queue
and uses another GCD queue to upload results to the GPU for processing via two
OpenCL kernels. Calls to OpenCL and OpenGL for display are serialized with a
third GCD queue.
The fractal computation example (without the display) is also available as a
commandline tool, with flags to control the computation parameters available
in the GUI (see usage message).
- Mac OS X v10.6 or later
- Xcode 3.2
- Mac OS X v10.6 or later
- OpenCL-compliant GPU, e.g.
- NVIDIA GeForce 8600M, 8800 GS, 8800 GT, 9400M, 9600M GT
- ATI Radeon HD 4870
- GPUs known not to be supported yet by OpenCL:
- ATI Radeon HD 2600 Pro, NVIDIA GeForce 7300 GT
Source file details:
DispatchFractal.c: Parallel fractal computation engine via recursive square
subdivision. The 'subdivisions' parameter controls how many
times the initial square is subdivided, and thus the
resolution of the final fractal (i.e. subdivisions = 10 ->
resolution 1024 * 1024).
Parallel computation is performed by enqueuing blocks onto
the global GCD queue. The 'stride' parameter controls how
many subdivision steps are performed in each block, once
that limit is reached, the next subdivision step enqueues
four new blocks. This allows control of the per-block
workload, note how performance decreases when stride is
very small (too many blocks being enqueued and dispatch
overhead larger than useful workload).
The computation results in one float value per square in
the subdivision, this is stored lock-free into a global
buffer as a quadtree (i.e. every subdivision square has a
distinct result location in the buffer).
DFFractals.c: Computation blocks for the different fractals available.
Uses long double precision and -ffast-math. Roughly
estimates how many floating point operations are used for
DFView.m: OpenCL/OpenGL display of quadtree results buffer. During
fractal computation, a GCD queue asynchronously uploads the
results buffer to OpenCL and performs the 'quadtree' kernel
on it. The resulting OpenCL memory buffer is passed to a
separate GCD display queue, which performs the 'colorize'
kernel on it and copies the result to an OpenGL texture,
this is then drawn via VBO. The result of the 'quadtree'
kernel is double-buffered so that display can proceed
independently of results buffer upload and processing.
Display refresh rate can be controlled by CoreVideo
DisplayLink or by enabling vertical retrace sync (which
blocks the display queue in CGLFlushDrawable() until VBL),
otherwise the display is redrawn as fast as possible
(wasteful, only interesting for FPS measurement).
Also contains code to download the OpenGL texture from GPU
via CoreImage for image saving, and to interact with the
mouse and display a selection rectangle via an OpenGL
DispatchFractal.cl: OpenCL kernel sources. The 'quadtree' kernel assembles a
float buffer the size of the final texture. For every pixel
it traverses the quadtree results buffer from the bottom up
until a valid value is found.
The 'colorize' kernel transforms this float buffer into
BGRA8 color values. For every pixel it looks up a color in
a constant gradient buffer, according to a curve based on
the 'falloff' and 'cycle speed' parameters.
DFAppDelegate.m: Interaction of the fractal computation and display engines
with the GUI controls. Image saving via ImageKit/ImageIO.
DispatchFractalCLI.c: Interaction of the fractal computation engine with the
Copyright (C) 2009 Apple Inc. All rights reserved.