Developing OpenCL Programs Using Xcode
This chapter describes a streamlined process in which, using tools provided by OS X v10.7, you can include OpenCL kernels as resources in Xcode projects, compile them along with the rest of your application, and use Grand Central Dispatch as the queuing API for executing OpenCL commands and kernels on the CPU and GPU.
If you need to create OpenCL programs at run time, with source loaded as a string or from a file, or if you want API-level control over queuing, see The OpenCL Specification, available from the Khronos Group at http://www.khronos.org/registry/cl/.
In the OpenCL specification, computational processors are called devices. An OpenCL device has one or more compute units. A workgroup executes on a single compute unit. A compute unit is composed of one or more processing elements and local memory.
A Macintosh computer has a single CPU and one or more GPUs. The CPU on a Macintosh has multiple compute units, which is why it is called a multi-core CPU. The number of compute units in a CPU limits the number of workgroups that can execute concurrently.
CPUs commonly contain two to eight compute units, with the maximum increasing year-to-year. A graphics processing unit (GPU) typically contains many compute units—the GPUs in current Macintosh systems feature tens of compute units, and future GPUs may contain hundreds. As used by OpenCL, a CPU with eight compute units is considered a single device, as is a GPU with 100 compute units.
The OS X v10.7 implementation of the OpenCL API facilitates designing and coding data parallel programs to run on both CPU and GPU devices. In a data parallel program, the same program (or kernel) runs concurrently on different pieces of data. Each invocation is called a work item and is given a work item ID. The work item IDs are organized in up to three dimensions; the resulting index space is called an N-D range.
A kernel is essentially a function written in the OpenCL language, which allows it to be compiled for execution on any device that supports OpenCL. Although kernels are enqueued for execution by host applications written in C, C++, or Objective-C, a kernel must be compiled separately so that it can be customized for the device on which it is going to run. You can write your OpenCL kernel source code in a separate file or include it inline in your host application source code.
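As a rough illustration of the data parallel model described above, the following plain C sketch emulates a one-dimensional range of work items with an ordinary loop. It does not use the OpenCL API: the names square_work_item, run_1d_range, and flatten_2d_id are hypothetical, and the loop runs the invocations one after another, whereas a real device would run them concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body: processes exactly one element.
 * In real OpenCL the runtime supplies the work item ID; in this
 * emulation it is passed in as an ordinary argument. */
void square_work_item(size_t id, const float *in, float *out)
{
    out[id] = in[id] * in[id];
}

/* Sequential stand-in for a 1-D range of `count` work items.
 * A real device would run these invocations concurrently. */
void run_1d_range(size_t count, const float *in, float *out)
{
    assert(in != NULL && out != NULL);
    for (size_t id = 0; id < count; ++id) {
        square_work_item(id, in, out);
    }
}

/* Maps a 2-D work item ID (x, y) to a row-major array index,
 * one common way to address image-like data of a given width. */
size_t flatten_2d_id(size_t x, size_t y, size_t width)
{
    assert(x < width);
    return y * width + x;
}
```

The point of the sketch is that the kernel body never loops over the data itself. It handles only the element named by its ID, and the range determines how many invocations exist.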
OpenCL kernels can be:
Compiled at compile time, then run when queued by the host application
Compiled and then run at runtime when queued by the host application
Run from a previously-built binary
A work item is a parallel execution of a kernel on some data. It is analogous to a thread. A single kernel is often executed as hundreds of thousands of work items.
A workgroup is a set of work items. Each workgroup is executed on a compute unit.
Workgroup dimensions determine how the input is operated upon in parallel. The application usually specifies the dimensions based on the size of the input. There are constraints: for example, there may be a maximum number of work items that can be launched for a certain kernel on a certain device.
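The arithmetic behind workgroup sizing can be sketched in plain C. The helper names below are hypothetical and nothing here calls the OpenCL API. The sketch assumes a one-dimensional range and assumes, as OpenCL 1.x generally requires when a workgroup size is given explicitly, that the global size must be a multiple of the workgroup size.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds the input size up to the next multiple of the workgroup size,
 * so that the global size divides evenly into workgroups. */
size_t round_up_global_size(size_t input_size, size_t group_size)
{
    assert(group_size > 0);
    size_t remainder = input_size % group_size;
    return remainder == 0 ? input_size
                          : input_size + (group_size - remainder);
}

/* Number of workgroups needed to cover the (padded) input. */
size_t workgroup_count(size_t input_size, size_t group_size)
{
    return round_up_global_size(input_size, group_size) / group_size;
}

/* Index of the workgroup that a given global work item ID falls into. */
size_t group_id_of(size_t global_id, size_t group_size)
{
    assert(group_size > 0);
    return global_id / group_size;
}

/* Position of a global work item ID within its workgroup. */
size_t local_id_of(size_t global_id, size_t group_size)
{
    assert(group_size > 0);
    return global_id % group_size;
}
```

For example, with 1000 input elements and an assumed workgroup size of 64, round_up_global_size returns 1024 and workgroup_count returns 16. The 24 padding work items have no input element, so a kernel launched this way would need to compare its ID against the real input size before touching memory.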
The program that calls OpenCL functions to set up the context in which kernels run and enqueue the kernels for execution is known as the host application. The application is run by OS X on the CPU. The device on which the host application executes is known as the host device. Before kernels can be run, the host application typically completes the following steps:
Determine what compute devices are available, if necessary.
Select compute devices appropriate for the application.
Create dispatch queues for selected compute devices.
Allocate the memory objects needed by the kernels for execution. (This step may occur earlier in the process, as convenient.)
Note that the host device (the CPU) can itself be an OpenCL device and can be used to execute kernels.
The host application can enqueue commands to read from and write to memory objects. See “Creating and Managing Memory Objects in OS X OpenCL.” Memory objects are used to manipulate device memory. There are two types of memory objects used in OpenCL: buffer objects and image objects. Buffer objects can contain any type of data; image objects contain data organized into pixels in a given format.
Essential Development Tasks
In OS X v10.7, the OpenCL development process includes these major steps:
Identify the tasks to be parallelized.
Determining how to parallelize your program effectively is often the hardest part of developing an OpenCL program. See “Identifying Parallelizable Routines.”
In Xcode, write your kernel functions. See “Basic Kernel Code Sample.”
In Xcode, write the host code that will be calling the kernel(s). See “Basic Host Code Sample.”
Compile using Xcode. See “Creating An Application That Uses OpenCL In Xcode.”
Debug (if necessary). See “Debugging.”
Improve performance (if necessary). See “Improving Performance.”
© 2012 Apple Inc. All Rights Reserved. (Last updated: 2012-07-23)