AltiVec and the G3 (and earlier processors)
Apple does not provide an AltiVec emulator for the G3 and earlier processors. If one of these processors encounters AltiVec instructions, it may take an illegal instruction exception. If your application will be running on older processors, you must check at run time for the presence of the AltiVec unit.
Checking for AltiVec
To check whether the user's machine supports AltiVec in Classic or CFM Carbon, use Apple's Gestalt Manager:
//Include only one of the following two headers.
//For Classic, use Gestalt.h
#include <Gestalt.h>
//For Carbon, use CoreServices.h instead
#include <CoreServices/CoreServices.h>

Boolean IsAltiVecAvailable( void )
{
    long cpuAttributes;
    Boolean hasAltiVec = false;
    OSErr err = Gestalt( gestaltPowerPCProcessorFeatures, &cpuAttributes );

    //Compare against 0 so the result does not depend on the bit
    //position fitting in a Boolean
    if( noErr == err )
        hasAltiVec = ( ( 1 << gestaltPowerPCHasVectorInstructions ) & cpuAttributes ) != 0;

    return hasAltiVec;
}
For Mach-O applications, use sysctl():
#include <stddef.h>
#include <sys/types.h>
#include <sys/sysctl.h>

//returns: 0 for scalar only, 1 for AltiVec
//Note: may return >1 in the future
int GetAltiVecTypeAvailable( void )
{
    int sels[2] = { CTL_HW, HW_VECTORUNIT };
    int vType = 0; //0 == scalar only
    size_t length = sizeof(vType);
    int error = sysctl( sels, 2, &vType, &length, NULL, 0 );

    if( 0 == error ) return vType;
    return 0;
}
Finally, for applications that must also work outside of Mac OS X, you may, as a last resort, install a signal handler for illegal instructions, attempt a vector instruction, and see whether the signal handler was called. This should work on any platform that supports signals:
#include <setjmp.h>
#include <signal.h>

volatile int gIsAltiVecPresent = -1;
static sigjmp_buf gEnv;

void sig_ill_handler( int sig )
{
    (void) sig;
    //Set our flag to 0 to indicate AltiVec is illegal
    gIsAltiVecPresent = 0;
    //long jump back to safety
    siglongjmp( gEnv, 1 );
}

int IsAltiVecAvailable( void )
{
    if( -1 == gIsAltiVecPresent )
    {
        sigset_t signame;
        struct sigaction sa_new, sa_old;

        //Set AltiVec to ON
        gIsAltiVecPresent = 1;

        //Set up the signal mask
        sigemptyset( &signame );
        sigaddset( &signame, SIGILL );

        //Set up the signal handler
        sa_new.sa_handler = sig_ill_handler;
        sa_new.sa_mask = signame;
        sa_new.sa_flags = 0;

        //Install the signal handler
        sigaction( SIGILL, &sa_new, &sa_old );

        //Attempt to use AltiVec. Passing 1 saves the signal mask, so that
        //siglongjmp() restores it and SIGILL is not left blocked.
        if( 0 == sigsetjmp( gEnv, 1 ) )
        {
#if defined( __GNUC__ )
            asm volatile ( "vor v0, v0, v0" );
#elif defined( __MWERKS__ )
            asm{ vor v0, v0, v0 }
#else
    #error unknown compiler
#endif
        }

        //Restore the old signal handler
        sigaction( SIGILL, &sa_old, NULL );
    }
    return gIsAltiVecPresent;
}
Please note that the signal based method relies on the compiler honoring the volatile declaration of gIsAltiVecPresent. In addition, inline assembly is used in an attempt to prevent the compiler from generating the vector stack save and restore instructions in the signal based IsAltiVecAvailable() function that the PowerPC AltiVec ABI normally requires. If the compiler decides to generate a stack frame for AltiVec anyway, the function will trigger an illegal instruction exception when executed on a G3 or earlier processor before it reaches the first signal system call, likely causing your application to terminate prematurely. Thus, correct operation of your application relies on implementation dependent compiler behavior. Depending on your tolerance for risk, it may be safer, at a minimum, to move the vor statement to a separate function and take steps to ensure that it is not inlined. (See below for further details.) Safer yet would be to write the function in PowerPC assembly. Note that some assemblers use "0" as the register name rather than "v0".
For these and other reasons, it is recommended that developers use either the Gestalt() or sysctl() methods instead of the signal method.
Precautions Necessary for Safe Conditional Execution
An mfspr vrsave instruction is likely to be automatically generated by the compiler in the prolog of any function that uses vector types. (This behavior is required for proper function on Mac OS.) The vrsave special purpose register is not present on G3 and earlier processors, so trying to read it with mfspr on those processors will cause problems. For this reason, a simple if statement such as the following is not in itself sufficient to prevent earlier processors from encountering vector code.
//FAILS because the presence of vector code in this block
//induces a mfspr vrsave in the (invisible) prolog to this
//C function!
if( IsAltiVecAvailable() )
{
    vector unsigned char a_constant = vec_splat_u8(0);
    ...
}
else
{
    unsigned char a_constant = 0;
    ...
}
You must make separate functions to hold the AltiVec and
scalar versions of the same code. Be careful of automatic
inlining by some compilers.
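As a minimal sketch of that separation, the following keeps every vector type inside one function and selects between the two versions through a function pointer. The names ZeroScalar, ZeroVector, SelectZero, ZeroFn and NOINLINE are hypothetical and chosen only for illustration. The sketch assumes a GCC-style compiler that defines __ALTIVEC__ when AltiVec code generation is enabled and that accepts __attribute__((noinline)); other toolchains will need their own equivalents. The AltiVec branch is compiled only when __ALTIVEC__ is defined.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ALTIVEC__)
#include <altivec.h>   /* GCC with -maltivec; other toolchains may differ */
#endif

/* GCC-style spelling (an assumption); other compilers need their own mechanism. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical common signature shared by both versions. */
typedef void (*ZeroFn)(unsigned char *dst, size_t n);

/* Scalar version: contains no vector types. */
static NOINLINE void ZeroScalar(unsigned char *dst, size_t n)
{
    size_t i;
    assert(dst != NULL || n == 0);
    for (i = 0; i < n; i++)
        dst[i] = 0;
}

#if defined(__ALTIVEC__)
/* AltiVec version: every vector type is confined to this function. */
static NOINLINE void ZeroVector(unsigned char *dst, size_t n)
{
    vector unsigned char zero = vec_splat_u8(0);
    size_t i = 0;
    assert(dst != NULL || n == 0);
    /* Scalar head until the address is 16-byte aligned. */
    while (i < n && ((size_t)(dst + i) & 15) != 0)
        dst[i++] = 0;
    /* Aligned 16-byte vector stores. */
    for (; i + 16 <= n; i += 16)
        vec_st(zero, 0, dst + i);
    /* Scalar tail. */
    while (i < n)
        dst[i++] = 0;
}
#endif

/* Choose a version from a function that itself uses no vector types. */
static ZeroFn SelectZero(int altivecAvailable)
{
#if defined(__ALTIVEC__)
    if (altivecAvailable)
        return ZeroVector;
#else
    (void)altivecAvailable;
#endif
    return ZeroScalar;
}
```

A caller would pass in the result of one of the detection routines above, once at startup, and keep the returned pointer. Compiling the AltiVec version in its own source file, with AltiVec code generation enabled only for that file, would be a stricter separation. This sketch keeps everything in one file for brevity.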
Tips for Writing Code that Runs on Both G3 and G4 / G5
If you intend to write an application that uses AltiVec
but which also must run on a G3 or earlier processor, here
are some tips to help make the process go more smoothly.
- For functions that you know will have two versions, one AltiVec accelerated and one not, write the AltiVec accelerated version first.
  - AltiVec has more stringent data alignment and organization requirements than the scalar units do.
  - Writing the AltiVec version first helps ensure that you avoid committing to data formats or software architectures that are not amenable to vectorization.
  - It is much easier to write scalar code to mirror vector code than the other way around. This is because, to a limited extent, the scalar units can be used as a vector unit:
    - A PowerPC has two or three scalar integer units that can be used in parallel. Formally, this is like a single 64 or 96 bit vector register. However, only one of the integer units can do multiplication and division.
    - The FPU is pipelined to a depth of several cycles. To avoid FPU data dependency stalls, you need multiple independent data "streams". An FPU with a 4 cycle pipeline can be thought of as a single (unpipelined) vector unit with registers that hold four elements.
    - For these reasons, design elements that work well for vector code (efficient cache usage, high-throughput function design, increased parallelism) also benefit scalar code.
  - In situations where writing for AltiVec first would be premature optimization, but an AltiVec version seems likely, spend a few minutes designing your probable vector approach before writing the scalar version. This reduces the probability that you will have to rewrite both later.
- Separate vector and scalar versions into code units that can be accessed polymorphically.
  - In C++, use a factory method to instantiate a scalar or vector class instance depending on the run time environment. Put the run time dependent code in virtual class methods.
  - In C, you can use function pointers to achieve the same result. It may be useful to move the vector and scalar code elements out into plug-ins, and load the appropriate plug-in at run time.
  - This sort of function pointer based interface helps guarantee that the compiler will not inline these functions where it should not.
- The best place to branch to scalar vs. AltiVec code is not in leaf functions. Usually, it is the lowest-level function that still knows how to address all of the data, not just a small subset of it. For example, if you are vectorizing PaintRect(), don't branch to vector code at the level of the function that draws each horizontal row of the rectangle, or an individual pixel. Branch at the level of the function in charge of drawing the whole rectangle.
  - AltiVec functions typically have high stack frame setup overhead and are easily data starved.
  - AltiVec functions perform best with complex calculations.
  - Knowing where all the data is makes it easier to correctly prefetch data into the caches.
  - Don't branch so high up that the calculation becomes so complex that the function runs out of vector registers.
- Use the scalar and vector versions of identical functions to check each other for correctness.
- Avoid lookup tables. If you must use them for scalar code, document thoroughly how each table was derived, so that it may be replaced with direct calculation in the vector unit later.
- Use vector-friendly data layouts for the scalar code.
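Two of the tips above, writing scalar code that mirrors vector code and using two versions of a function to check each other, can be sketched together. The function names SumSimple, SumFourLanes and CheckSumsAgree are hypothetical. The four accumulators are an assumption chosen to match the four 32-bit floats that fit in one 128-bit vector register. Because floating-point addition is not associative, the two versions may differ slightly in general, so the check takes a tolerance rather than demanding equality.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: a single accumulator, so every add waits on the previous one. */
static float SumSimple(const float *x, size_t n)
{
    float sum = 0.0f;
    size_t i;
    for (i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Vector-shaped scalar version: four independent accumulators, laid out
   like the four float lanes of one vector register. The four adds in the
   loop body do not depend on each other, so they can overlap in the FPU
   pipeline. */
static float SumFourLanes(const float *x, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    /* Leftover elements that do not fill a group of four. */
    for (; i < n; i++)
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}

/* Cross-check: the two versions must agree to within the given tolerance. */
static void CheckSumsAgree(const float *x, size_t n, float tolerance)
{
    float diff = SumSimple(x, n) - SumFourLanes(x, n);
    if (diff < 0.0f)
        diff = -diff;
    assert(diff <= tolerance);
}
```

An AltiVec version written later could take the place of SumFourLanes in the same check, since its partial sums would be grouped the same way.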
Many of these topics and their rationale will be covered in more detail in the sections that follow. Please see the G5 section for key differences between G4 and G5.