Important: This document may not represent best practices for current development. Links to downloads and other resources may no longer be valid.
Preparing Vector-Based Code
This chapter is relevant only for those developers who want to start writing vector-based code or whose applications already directly use the AltiVec extension to the PowerPC instruction set. AltiVec instructions, because they are processor specific, must be replaced on Intel-based Macintosh computers. You can choose from these two options:
Use the Accelerate framework. This is the recommended option because the framework provides a layer of abstraction that lets you perform vector-based operations without needing to use low-level vector instructions yourself. See Accelerate Framework.
Port AltiVec code to the Intel instruction set architecture (ISA). This solution is available for developers who have performance needs that can’t be met by using the Accelerate framework. See Rewriting AltiVec Instructions.
Accelerate Framework
The Accelerate framework, introduced in Mac OS X v10.3 and expanded in v10.4, is a set of high-performance vector-accelerated libraries. You don’t need to be concerned with the architecture of the target machine because the routines in this framework abstract the low-level details. The system automatically invokes the appropriate instruction set for the architecture that your code runs on.
This framework contains the following libraries:
vImage is the Apple image processing framework that includes high-level functions for image manipulation—convolutions, geometric transformations, histogram operations, morphological transformations, and alpha compositing—as well as utility functions that convert formats and perform other operations. See vImage Programming Guide.
vDSP provides mathematical functions that perform digital signal processing (DSP) for applications such as speech, sound, audio, and video processing, diagnostic medical imaging, radar signal processing, seismic analysis, and scientific data processing. The vDSP functions operate on real and complex data types and include data type conversions, fast Fourier transforms (FFTs), and vector-to-vector and vector-to-scalar operations.
vMathLib contains vector-accelerated versions of all routines in the standard math library. See vecLib Framework Reference.
LAPACK is a linear algebra package that solves simultaneous sets of linear equations, tackles eigenvalue and singular value problems, and determines least-squares solutions for linear systems.
BLAS (Basic Linear Algebra Subprograms) performs basic vector and matrix computations.
vForce contains routines that take arrays as input and output arguments, rather than single values.
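As a rough sketch of what calling the framework looks like, the following C fragment adds two float arrays with vDSP_vadd from vDSP. The helper name add_arrays and the USE_ACCELERATE macro are assumptions made for this illustration and are not part of the framework; the scalar branch exists only so that the fragment also builds where the framework is unavailable.

```c
#include <stddef.h>

#if defined(__APPLE__) && defined(USE_ACCELERATE)
#include <Accelerate/Accelerate.h>
#endif

/* Hypothetical helper: element-wise c[i] = a[i] + b[i] for n floats. */
static void add_arrays(const float *a, const float *b, float *c, size_t n)
{
#if defined(__APPLE__) && defined(USE_ACCELERATE)
    /* Unit strides; the framework chooses the vector unit for the host. */
    vDSP_vadd(a, 1, b, 1, c, 1, (vDSP_Length)n);
#else
    /* Scalar fallback so the sketch builds without the framework. */
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
#endif
}
```

The calling code contains no AltiVec or SSE instructions. On a Mac, a build along the lines of `cc -DUSE_ACCELERATE example.c -framework Accelerate` would select the vDSP branch; the file name here is a placeholder.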
Rewriting AltiVec Instructions
Most of the tasks required to vectorize for AltiVec (restructuring data structures, designing parallel algorithms, eliminating branches, and so forth) are the same as those you’d need to perform for the Intel architecture. If you have AltiVec code, you have already completed the fundamental vectorization work needed to rewrite your application for the Intel architecture. In many cases the translation process will be smooth, involving direct or nearly direct substitution of AltiVec intrinsics with Intel equivalents.
The MMX, SSE, SSE2, and SSE3 extensions provide functionality analogous to AltiVec. Like the AltiVec unit, these extensions are fixed-size SIMD (Single Instruction, Multiple Data) vector units capable of a high degree of parallelism. Just as with AltiVec, code written to use these extensions typically runs many times faster than equivalent scalar code.
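To give a feel for a "nearly direct" substitution, the sketch below computes a * b + c on four floats. AltiVec has a single multiply-add intrinsic (vec_madd), while SSE needs a multiply followed by an add. The function name madd4 is hypothetical, and the sketch assumes that all four pointers are 16-byte aligned; the scalar branch is only there so the fragment compiles on other targets.

```c
#if defined(__ALTIVEC__)
#include <altivec.h>
#elif defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for i in 0..3.
   Assumption: a, b, c, and out are all 16-byte aligned. */
static void madd4(const float *a, const float *b, const float *c, float *out)
{
#if defined(__ALTIVEC__)
    /* AltiVec: one multiply-add instruction. */
    vector float va = vec_ld(0, a);
    vector float vb = vec_ld(0, b);
    vector float vc = vec_ld(0, c);
    vec_st(vec_madd(va, vb, vc), 0, out);
#elif defined(__SSE__)
    /* SSE: separate multiply and add. */
    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);
    __m128 vc = _mm_load_ps(c);
    _mm_store_ps(out, _mm_add_ps(_mm_mul_ps(va, vb), vc));
#else
    /* Scalar fallback for targets with neither extension. */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] * b[i] + c[i];
#endif
}
```

The data layout and the shape of the algorithm are identical in both vector branches; only the intrinsic names and the multiply-add step differ.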
Before you start rewriting AltiVec instructions for the Intel instruction set architecture, read AltiVec/SSE Migration Guide. It outlines the key differences between architectures in terms of vector-based programming, gives an overview of the SIMD extensions on x86, lists what you need to do to build your code, and provides an in-depth discussion on alignment and other relevant issues.
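Alignment is one place where the two architectures behave differently. The AltiVec vec_ld intrinsic ignores the low four bits of the address, whereas SSE provides an aligned load (_mm_load_ps) that requires a 16-byte-aligned address and a separate unaligned load (_mm_loadu_ps). The sketch below is not taken from the migration guide; the helper names is_aligned16 and sum_floats are hypothetical. It sums an array of any alignment by choosing the load at run time.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: sums n floats; src may have any alignment. */
static float sum_floats(const float *src, size_t n)
{
    float total = 0.0f;
    size_t i = 0;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    if (is_aligned16(src)) {
        /* Aligned load: valid only for 16-byte-aligned addresses. */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_load_ps(src + i));
    } else {
        /* Unaligned load: valid for any address. */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(src + i));
    }
    /* Fold the four partial sums into one scalar. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Leftover elements (all of them when SSE is unavailable). */
    for (; i < n; i++)
        total += src[i];
    return total;
}
```

Because the vector path adds elements in a different order than a plain loop, results for general floating-point data can differ slightly from a scalar sum.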
The following resources are relevant for rewriting AltiVec instructions for the Intel architecture:
Architecture-Independent Vector-Based Code shows how to write a fast matrix-multiplication function with a minimum of architecture-specific coding.
Intel software manuals describe the x86 vector extensions:
Perf-Optimization-dev is a list for discussions on analyzing and optimizing performance in Mac OS X. You can subscribe at:
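The idea of keeping architecture-specific code to a minimum, mentioned in the resources above, can be sketched as follows. This is not the sample from Architecture-Independent Vector-Based Code; it is a minimal illustration in which the names vfloat, VLOAD, VSTORE, VSPLAT, VMADD, VZERO, and matmul4x4 are all assumptions. Only the small block of definitions at the top depends on the instruction set; an AltiVec branch, omitted here, would supply the same five operations.

```c
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 vfloat;
#define VLOAD(p)       _mm_loadu_ps(p)
#define VSTORE(p, v)   _mm_storeu_ps((p), (v))
#define VSPLAT(x)      _mm_set1_ps(x)
#define VMADD(a, b, c) _mm_add_ps(_mm_mul_ps((a), (b)), (c))
#define VZERO()        _mm_setzero_ps()
#else
/* Scalar stand-in with the same five operations. */
typedef struct { float f[4]; } vfloat;
static vfloat VLOAD(const float *p)
{ vfloat v; memcpy(v.f, p, sizeof v.f); return v; }
static void VSTORE(float *p, vfloat v)
{ memcpy(p, v.f, sizeof v.f); }
static vfloat VSPLAT(float x)
{ vfloat v = {{x, x, x, x}}; return v; }
static vfloat VMADD(vfloat a, vfloat b, vfloat c)
{ for (int i = 0; i < 4; i++) c.f[i] += a.f[i] * b.f[i]; return c; }
static vfloat VZERO(void)
{ return VSPLAT(0.0f); }
#endif

/* Architecture-neutral part: c = a * b for row-major 4x4 matrices.
   Row i of c is the sum over k of a[i][k] times row k of b. */
static void matmul4x4(const float a[16], const float b[16], float c[16])
{
    vfloat b0 = VLOAD(b);
    vfloat b1 = VLOAD(b + 4);
    vfloat b2 = VLOAD(b + 8);
    vfloat b3 = VLOAD(b + 12);
    for (int i = 0; i < 4; i++) {
        vfloat row = VZERO();
        row = VMADD(VSPLAT(a[4 * i + 0]), b0, row);
        row = VMADD(VSPLAT(a[4 * i + 1]), b1, row);
        row = VMADD(VSPLAT(a[4 * i + 2]), b2, row);
        row = VMADD(VSPLAT(a[4 * i + 3]), b3, row);
        VSTORE(c + 4 * i, row);
    }
}
```

The multiplication routine itself never names an instruction set, so supporting another vector unit would mean adding one more branch of definitions rather than rewriting the algorithm.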