Hello World!
Creating an OpenCL program in OS X v10.7 is easy because support for OpenCL is built into Xcode. This chapter describes, step by step, how to create an OpenCL project in Xcode. If you already have a working OpenCL project, you do not need to regenerate it, but this chapter still describes the OpenCL support that is now built into Xcode.
Creating An Application That Uses OpenCL In Xcode
To create a project that uses OpenCL in OS X v10.7:
- Create your OpenCL project in Xcode as a new OS X project (an empty project is fine).
- Place your kernel code in one or more .cl files in your Xcode project. You can put all of your kernels in a single .cl file, or you can split them across several files as you choose. Each .cl file can also contain non-kernel code that runs on the same OpenCL device as the kernels in that file.
By default, each .cl file is compiled into three files containing bitcode for the i386, x86_64, and gpu_32 architectures. (You can change this using the OpenCL Architectures build setting.)
At runtime, your host application discovers which kinds of devices are available and determines which of the compiled kernels to enqueue and execute.
Figure 2-1 A simple OpenCL kernel in Xcode
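As a rough illustration of the runtime selection described in this step, the following C sketch maps a discovered device kind to one of the three default bitcode architectures. The device_kind enum and the bitcode_arch_for helper are hypothetical names introduced here. In a real host program the device type would typically come from clGetDeviceInfo with CL_DEVICE_TYPE, and the assumption that CPU kernels follow the pointer size of the host process is ours, not something this chapter states.

```c
#include <stddef.h>

/* Hypothetical stand-in for the device type a host would query at runtime. */
typedef enum { DEVICE_KIND_CPU, DEVICE_KIND_GPU } device_kind;

/* Return the name of the default bitcode architecture that matches a device.
   Assumption: CPU kernels follow the pointer size of the host process. */
const char *bitcode_arch_for(device_kind kind, size_t host_pointer_bytes)
{
    if (kind == DEVICE_KIND_GPU) {
        return "gpu_32";
    }
    return (host_pointer_bytes == 8) ? "x86_64" : "i386";
}
```

A host could call bitcode_arch_for(DEVICE_KIND_CPU, sizeof(void *)) once per discovered device and use the result to decide which compiled kernel to enqueue.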
- You can set the following build settings for your kernel (.cl) files:
Figure 2-2 Build settings for kernel files
OpenCL Compiler Version
- Compiler Version
The OpenCL C compiler version supported by the platform. The default is OpenCL C 1.1. To set this parameter from the command line, use:
-cl-std=CL1.1
OpenCL - Architectures
- Valid Architectures
A StringList specifying the architectures for which the product will be built. This is usually set to a predefined build setting provided by the platform. By default, the product is built for all three architectures. To set this parameter from the command line, use one or more of:
-triple i386-applecl-darwin
-triple x86_64-applecl-darwin
-triple gpu_32-applecl-darwin
For example, to compile for the first two architectures, the command line would read:
-triple i386-applecl-darwin -triple x86_64-applecl-darwin
OpenCL - Preprocessing
- Preprocessor Macros
Space-separated list of preprocessor macros of the form "foo" or "foo=bar". To set this parameter from the command line, use: -D
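A macro supplied through the Preprocessor Macros setting is visible to the source like any other preprocessor definition. The sketch below is written in plain C so that it compiles anywhere, and shows one common pattern: the code supplies a fallback value and lets a -D definition override it. BLOCK_SIZE and padded_length are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Fallback used when the build does not pass -DBLOCK_SIZE=<n>. */
#ifndef BLOCK_SIZE
#define BLOCK_SIZE 64
#endif

/* Round a length up to the next multiple of BLOCK_SIZE. */
size_t padded_length(size_t n)
{
    return ((n + BLOCK_SIZE - 1) / BLOCK_SIZE) * BLOCK_SIZE;
}
```

With this pattern, setting the macro list to BLOCK_SIZE=128 would change the padding without any edit to the source file.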
OpenCL - Code Generation
- Use MAD
Boolean. If true, allow expressions of the type a * b + c to be replaced by a Multiply-Add (MAD) instruction. If MAD is enabled, multistep instructions in the form a * b + c are performed in a single step, but the accuracy of the results may be compromised. For example, to optimize performance, some OpenCL devices implement MAD by truncating the result of the a * b operation before adding it to c.
The default for this parameter is NO. To set this parameter from the command line, use: -cl-mad-enable
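To make the accuracy trade-off of the Use MAD setting concrete, here is a small host-side C sketch that emulates the truncating behavior mentioned above and compares it with an ordinary multiply followed by an add. This is only an emulation under our own assumptions: the function names are hypothetical, and no particular device is claimed to behave exactly this way.

```c
#include <stdint.h>
#include <string.h>

/* Convert a double to float by truncation (round toward zero).
   Assumes x is finite and within the normal float range. */
static float truncate_to_float(double x)
{
    float f = (float)x;                       /* rounds to nearest */
    double mag_f = f < 0.0f ? -(double)f : (double)f;
    double mag_x = x < 0.0 ? -x : x;
    if (mag_f > mag_x) {                      /* rounding moved away from zero */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        bits -= 1;                            /* step one ulp toward zero */
        memcpy(&f, &bits, sizeof f);
    }
    return f;
}

/* a * b + c with the product rounded to nearest before the add. */
float multiply_then_add(float a, float b, float c)
{
    volatile float product = a * b;           /* volatile blocks fusing into an FMA */
    return product + c;
}

/* Emulated MAD: the exact product is truncated before the add. */
float mad_truncating(float a, float b, float c)
{
    float product = truncate_to_float((double)a * (double)b);
    return product + c;
}
```

With a = 1 + 2^-13, b = 1 - 2^-13, and c = -1, the exact answer is -2^-26. multiply_then_add returns 0 and mad_truncating returns -2^-24, so the two paths disagree, and the truncating path lands further from the exact value.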
- Relax IEEE Compliance
Boolean. If true, allows optimizations for floating-point arithmetic that may violate the IEEE 754 standard and the OpenCL numerical compliance requirements defined in section 7.4 (single-precision floating-point), section 9.3.9 (double-precision floating-point), and section 7.5 (edge case behavior) of the OpenCL 1.1 specification.
This is intended to be a performance optimization.
This option causes the preprocessor macro __FAST_RELAXED_MATH__ to be defined in the OpenCL program. The default is NO. To set this parameter from the command line, use: -cl-fast-relaxed-math
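Because relaxed math defines __FAST_RELAXED_MATH__, source code can test for the macro and adapt, for example by loosening a comparison tolerance. The following sketch shows that idea in plain C. RESULT_TOLERANCE, nearly_equal, and both tolerance values are illustrative assumptions and do not come from the OpenCL specification.

```c
/* __FAST_RELAXED_MATH__ is defined by the OpenCL compiler when relaxed math
   is enabled. The tolerance values below are illustrative assumptions. */
#ifdef __FAST_RELAXED_MATH__
#define RESULT_TOLERANCE 1e-3f
#else
#define RESULT_TOLERANCE 1e-6f
#endif

/* Return 1 when a and b differ by no more than RESULT_TOLERANCE. */
int nearly_equal(float a, float b)
{
    float diff = a > b ? a - b : b - a;
    return diff <= RESULT_TOLERANCE;
}
```

Compiled by an ordinary C compiler, the macro is not defined, so the stricter tolerance applies.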
- Double as single
Boolean. If true, double-precision floating-point expressions are treated as single-precision floating-point expressions. This option is available for GPUs only. The default is NO. To set this parameter from the command line, use: -cl-double-as-single
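To get a feel for what treating double expressions as single precision can cost, the host-side C sketch below runs the same accumulation at both widths. It is an analogy under our own assumptions, not a reproduction of what the GPU compiler does, and the function names are hypothetical.

```c
/* Add `step` to 1.0 `count` times using double precision. */
double accumulate_double(double step, int count)
{
    double sum = 1.0;
    for (int i = 0; i < count; ++i) {
        sum = sum + step;
    }
    return sum;
}

/* The same loop with every value held in single precision. */
float accumulate_single(float step, int count)
{
    volatile float sum = 1.0f;    /* volatile keeps each partial sum at float width */
    for (int i = 0; i < count; ++i) {
        sum = sum + step;
    }
    return sum;
}
```

With a step of 1e-8 and 1000 iterations, the double version ends near 1.00001, while the single version stays at exactly 1.0 because each increment is smaller than half the spacing between adjacent float values near 1.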
- Flush denorms to zero
Boolean that controls how single-precision and double-precision denormalized numbers are handled. If specified as a build option, single-precision denormalized numbers may be flushed to zero; double-precision denormalized numbers may also be flushed to zero if the optional extension for double precision is supported. This is intended to be a performance hint, and the OpenCL compiler can choose not to flush denorms to zero if the device supports single-precision (or double-precision) denormalized numbers.
This option is ignored for single-precision numbers if the device does not support single-precision denormalized numbers, that is, if the CL_FP_DENORM bit is not set in CL_DEVICE_SINGLE_FP_CONFIG.
This option is ignored for double-precision numbers if the device does not support double precision, or if it does support double precision but not double-precision denormalized numbers, that is, if the CL_FP_DENORM bit is not set in CL_DEVICE_DOUBLE_FP_CONFIG.
This flag applies only to scalar and vector single-precision floating-point variables and to computations on these floating-point variables inside a program. It does not apply to reading from or writing to image objects.
The default is NO. To set this parameter from the command line, use: -cl-denorms-are-zero
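The effect of flushing can be pictured with a few lines of host-side C. The sketch below emulates flush-to-zero for single-precision values by replacing anything smaller in magnitude than FLT_MIN (the smallest normalized float) with zero. The function name is hypothetical, and keeping the sign of the flushed value is an assumption made here for illustration.

```c
#include <float.h>

/* Emulate flush-to-zero: denormalized inputs become a zero of the same sign,
   and every other value is returned unchanged. */
float flush_denorm_to_zero(float x)
{
    float magnitude = x < 0.0f ? -x : x;
    if (magnitude != 0.0f && magnitude < FLT_MIN) {
        return x < 0.0f ? -0.0f : 0.0f;
    }
    return x;
}
```

A value such as 1e-40f, which is representable only as a denormalized float, comes back as zero, while FLT_MIN itself and larger magnitudes pass through untouched.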
- Auto-vectorizer
Auto-vectorizes the OpenCL kernels for the CPU. This setting takes effect only for the CPU. It makes it possible to write a single kernel that is portable and performant across CPUs and GPUs.
The default is YES. To set this parameter from the command line, use: -cl-auto-vectorize-enable or -cl-autovectorize-disable
- Optimization Level
Specifies how the compiler optimizes the generated code: not at all, for speed (fast, faster, or fastest), or for smallest code size.
The default is fast (O1) optimization. To set this parameter from the command line, use:
-Os to optimize for smallest code size
-O or -O1 for fast
-O2 for faster
-O3 for fastest
-O0 for no optimization
- Place your host code in one or more .c files in your Xcode project.
Figure 2-3 OpenCL host code in Xcode
- Link to the OpenCL framework.
- Build.
- Run.
Figure 2-6 Results
See “Basic Programming Sample” for a line-by-line description of the host and kernel code in the Hello World sample project.
Compiling From the Command Line
To compile from the command line, call openclc.
Debugging
Here are a few hints to help you debug your OpenCL application:
- Run your kernel on the CPU first. There is no memory protection on GPUs. If an index goes out of bounds on the GPU, it is likely to take the whole system down. If an index goes out of bounds on the CPU, it may crash the program that’s running, but it will not take the whole system down.
- You can use the printf function from within your kernel.
- You can use the gdb debugger to look at the assembly code once you’ve built your program. See the GDB website.
- On the GPU, use explicit address range checks to look for out-of-range address accesses. (Remember: there is no memory protection on current GPUs.)
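The explicit range check from the last hint can be sketched as follows. To keep the example runnable without an OpenCL device, it is written as plain C: square_item stands in for the body of a hypothetical kernel, the gid parameter plays the role of the value a kernel would get from get_global_id(0), and run_square imitates a launch whose global size may exceed the buffer length. All of these names are illustrative assumptions.

```c
#include <stddef.h>

/* Body of a hypothetical "square" kernel for one work-item.
   Returns 1 when gid is out of range, so the caller can count rejections. */
static int square_item(const float *in, float *out, size_t count, size_t gid)
{
    if (gid >= count) {
        return 1;                 /* explicit address range check */
    }
    out[gid] = in[gid] * in[gid];
    return 0;
}

/* Host-side stand-in for launching global_size work-items.
   Returns how many work-items were rejected by the range check. */
size_t run_square(const float *in, float *out, size_t count, size_t global_size)
{
    size_t rejected = 0;
    for (size_t gid = 0; gid < global_size; ++gid) {
        rejected += (size_t)square_item(in, out, count, gid);
    }
    return rejected;
}
```

A nonzero return value from run_square indicates that some work-items would have touched memory outside the buffers. In a real kernel, the same guard could be paired with a printf call to report the offending index while debugging.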
© 2012 Apple Inc. All Rights Reserved. (Last updated: 2012-07-23)