Apple Developer Connection

Translating Compare Operations

Testing Inequalities

Vector compares are done on SSE in substantially the same way as on AltiVec, and the same basic set of compare instructions (similar to vec_cmp*) is available. Each returns a vector of like-sized elements in which a true result is all ones (-1) and a false result is 0. The floating-point compares provide the full set that AltiVec provides (except vec_cmpb), and they add ordered and unordered compares and a != test. In addition, every vector floating-point compare comes in both a scalar and a packed version.
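The following sketch makes the all-ones/zero convention and the extra SSE predicates concrete. It uses the raw __m128 type rather than vFloat so that it compiles without Apple's headers, and the helper names mask_to_ints and demo_float_compares are hypothetical. Note that _mm_cmpneq_ps is true where either operand is NaN, while _mm_cmpeq_ps is false there.

```c
#include <emmintrin.h>  /* SSE2; also provides the SSE float intrinsics */
#include <math.h>       /* NAN */
#include <stdint.h>

/* Hypothetical helper: copy a compare result out as four 32-bit integers.
   Each lane of an SSE compare is all ones (-1) for true and 0 for false. */
static void mask_to_ints(__m128 mask, int32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, _mm_castps_si128(mask));
}

/* Hypothetical demo of three packed float predicates:
   ==, != (true when either operand is NaN), and unordered (NaN present). */
static void demo_float_compares(int32_t eq[4], int32_t neq[4], int32_t unord[4])
{
    /* _mm_setr_ps lists lanes in memory order: lane 0 first. */
    __m128 a = _mm_setr_ps(1.0f, 2.0f,  3.0f, 4.0f);
    __m128 b = _mm_setr_ps(NAN,  2.0f, -3.0f, 4.0f);

    mask_to_ints(_mm_cmpeq_ps(a, b), eq);       /* {  0, -1,  0, -1 } */
    mask_to_ints(_mm_cmpneq_ps(a, b), neq);     /* { -1,  0, -1,  0 } */
    mask_to_ints(_mm_cmpunord_ps(a, b), unord); /* { -1,  0,  0,  0 } */
}
```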

The integer compares test for equality and for greater-than. Less-than is the same test with the operands swapped. The greater-than tests work on signed integers only, because there is no unsigned compare-greater-than instruction. There are no compare instructions for 64-bit types.
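A common workaround for the missing unsigned compare is to flip the sign bit of both operands. This maps the unsigned range onto the signed range without changing the order of the values. The sketch below handles 32-bit elements and assumes SSE2. cmpgt_epu32 is a hypothetical helper name, not an SSE intrinsic. XOR with 0x80000000 gives the same result as adding 0x80000000 modulo 2^32.

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stdint.h>     /* INT32_MIN */

/* Hypothetical helper: unsigned 32-bit a > b using the signed PCMPGTD.
   XOR with 0x80000000 (INT32_MIN) flips each sign bit, mapping
   0..0xFFFFFFFF onto INT32_MIN..INT32_MAX in the same order. */
static __m128i cmpgt_epu32(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi32(INT32_MIN);
    return _mm_cmpgt_epi32(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
}
```

The same pattern works for 16-bit elements with _mm_set1_epi16(INT16_MIN) and _mm_cmpgt_epi16.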

Conditional Execution

Branching on the result of a compare, however, is handled differently from AltiVec. The AltiVec compares can set bits in the condition register (CR6), and the processor can branch on those bits directly. SSE compares set no analogous bits. Instead, use the MOVMSKPD, MOVMSKPS, or PMOVMSKB instruction to copy the top bit of each element. The instruction packs those bits into a 2-, 4-, or 16-bit integer (for double, float, and integer data respectively) in an integer register. You can then test that bit field to decide whether to branch. This example implements the SSE version of AltiVec's vec_any_eq intrinsic for vFloat:

int _mm_any_eq( vFloat a, vFloat b )
{
    //test a==b for each float in a & b
    vFloat mask = _mm_cmpeq_ps( a, b );
    //copy top bit of each result to maskbits
    int maskBits = _mm_movemask_ps( mask );
    return maskBits != 0;
}
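The same pattern extends to the other element types. The sketch below assumes SSE2 and uses hypothetical helper names. It shows an integer vec_all_eq built on PMOVMSKB, and a two-element "any greater than" test for doubles built on MOVMSKPD. Every byte of a true lane is 0xFF, so the PMOVMSKB byte mask also works after 16- and 32-bit integer compares.

```c
#include <emmintrin.h>  /* SSE2: PCMPEQB, PMOVMSKB, CMPPD, MOVMSKPD */

/* Hypothetical vec_all_eq for 16 x 8-bit integers: PCMPEQB sets 0xFF in
   each equal byte and PMOVMSKB packs the 16 top bits, so "all equal"
   means the 16-bit field is 0xFFFF. */
static int all_eq_epi8(__m128i a, __m128i b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}

/* Hypothetical "any greater than" for 2 doubles: MOVMSKPD packs the two
   top bits into a 2-bit field; any set bit means some a[i] > b[i]. */
static int any_gt_pd(__m128d a, __m128d b)
{
    return _mm_movemask_pd(_mm_cmpgt_pd(a, b)) != 0;
}
```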

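If you need to branch only on the result of comparing a single element, you can do the compare in one instruction with UCOMISD/UCOMISS or COMISD/COMISS. These instructions compare the low elements of their operands and set the EFLAGS register directly, so no MOVMSK step is needed before the branch.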
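In C, these instructions are reached through the _mm_comi*_ss and _mm_ucomi*_ss intrinsics (and the _sd versions for double), which return an int. The sketch below branches on a compare of element 0 only, and its helper name is hypothetical. COMISS raises the invalid-operation exception for any NaN operand, while UCOMISS raises it only for signaling NaNs. Otherwise the two behave alike.

```c
#include <xmmintrin.h>  /* SSE: COMISS/UCOMISS intrinsics */

/* Hypothetical helper: return limit[0] if v[0] > limit[0], else v[0].
   _mm_comigt_ss compiles to COMISS, which sets EFLAGS, so the result can
   be branched on without a MOVMSKPS step. Elements 1..3 are ignored. */
static float clamp_low_element(__m128 v, __m128 limit)
{
    if (_mm_comigt_ss(v, limit))      /* v[0] > limit[0] ? */
        return _mm_cvtss_f32(limit);  /* extract element 0 as a float */
    return _mm_cvtss_f32(v);
}
```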

Select

Branching is expensive on Intel, just as it is on PowerPC. Most of the time, developers on either platform choose not to branch on a test at all. Instead, they evaluate both sides of the branch and select the correct result based on the value of the test. In AltiVec, this looks like:

// if (a > 0 ) a += a;
vUInt32 mask = vec_cmpgt( a, zero );
vFloat twoA = vec_add( a, a);
a = vec_sel( a, twoA, mask );

SSE uses the same algorithm. However, SSE has no select instruction, so you must build one from AND, ANDNOT, and OR:

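vFloat _mm_sel_ps( vFloat a, vFloat b, vFloat mask )
{
    // keep the bits of b where mask is 1
    b = _mm_and_ps( b, mask );
    // keep the bits of a where mask is 0: (~mask) & a
    a = _mm_andnot_ps( mask, a );
    // merge the two halves, like vec_sel( a, b, mask )
    return _mm_or_ps( a, b );
}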
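The same AND/ANDNOT/OR construction works on integer vectors, where the masks come from the integer compares. In the sketch below (SSE2; sel_si128 and max_epi32_sse2 are hypothetical names), the select builds a signed 32-bit maximum, the counterpart of vec_max for vSInt32. SSE2 provides integer max only for unsigned bytes and signed 16-bit elements.

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */

/* Hypothetical integer counterpart of _mm_sel_ps: like vec_sel,
   take bits from b where mask is 1 and from a where mask is 0. */
static __m128i sel_si128(__m128i a, __m128i b, __m128i mask)
{
    return _mm_or_si128(_mm_andnot_si128(mask, a), _mm_and_si128(mask, b));
}

/* Signed 32-bit elementwise maximum built from compare + select. */
static __m128i max_epi32_sse2(__m128i a, __m128i b)
{
    return sel_si128(a, b, _mm_cmpgt_epi32(b, a));  /* b where b > a */
}
```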

Then, the SSE version of the above AltiVec code may be written:

// if (a > 0 ) a += a
vFloat mask = _mm_cmpgt_ps( a, zero );
vFloat twoA = _mm_add_ps( a, a);
a = _mm_sel_ps( a, twoA, mask );

In practice, we have found that a select can sometimes be replaced with a single simpler Boolean operation such as AND, OR, or XOR, especially in vector floating-point code. On AltiVec this is not a performance win (it's a wash). On SSE it replaces three instructions with one, which can be a large win for code that uses select frequently. Very occasionally, sleepy AltiVec programmers momentarily forget about vec_min and vec_max and use compare/select instead. Those are a nice win too, when you can find them.
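Here are two sketches of such replacements, using raw __m128 and hypothetical helper names. In the first, the "false" side of the select is 0.0f, whose bit pattern is all zeros. The ANDNOT/OR half of the select therefore contributes nothing, and a single AND remains. The second is a compare/select that is really a minimum. MINPS returns its second operand when either input is NaN, so _mm_min_ps(b, a) gives the same result as the compare/select it replaces.

```c
#include <xmmintrin.h>  /* SSE float intrinsics */

/* if (a > 0) r = a; else r = 0;
   The "false" value is +0.0f (all zero bits), so sel(zero, a, mask)
   reduces to a single AND with the mask. */
static __m128 keep_if_positive(__m128 a)
{
    __m128 mask = _mm_cmpgt_ps(a, _mm_setzero_ps());
    return _mm_and_ps(a, mask);
}

/* if (a > b) a = b;  is a minimum: one MINPS instead of compare + select.
   MINPS returns its second operand (here a) when either input is NaN or
   when the inputs compare equal, matching sel(a, b, cmpgt(a, b)). */
static __m128 clamp_to(__m128 a, __m128 b)
{
    return _mm_min_ps(b, a);
}
```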

Algorithms and Conversions

Here is a table for translating AltiVec vector compare and select operations to SSE:

Table 3-7  Converting Vector Compare and Select Operations from AltiVec to SSE

AltiVec           Type       SSE
vec_cmpeq(a,b)    vSInt8     _mm_cmpeq_epi8(a,b)
vec_cmpeq(a,b)    vUInt8     _mm_cmpeq_epi8(a,b)
vec_cmpeq(a,b)    vSInt16    _mm_cmpeq_epi16(a,b)
vec_cmpeq(a,b)    vUInt16    _mm_cmpeq_epi16(a,b)
vec_cmpeq(a,b)    vSInt32    _mm_cmpeq_epi32(a,b)
vec_cmpeq(a,b)    vUInt32    _mm_cmpeq_epi32(a,b)
vec_cmpeq(a,b)    vFloat     _mm_cmpeq_ps(a,b)
vec_cmpge(a,b)    vFloat     _mm_cmpge_ps(a,b)
vec_cmpgt(a,b)    vSInt8     _mm_cmpgt_epi8(a,b)
vec_cmpgt(a,b)    vUInt8     _mm_max_epu8(a,b) != b
vec_cmpgt(a,b)    vSInt16    _mm_cmpgt_epi16(a,b)
vec_cmpgt(a,b)    vUInt16    _mm_cmpgt_epi16(a+0x8000, b+0x8000)
vec_cmpgt(a,b)    vSInt32    _mm_cmpgt_epi32(a,b)
vec_cmpgt(a,b)    vUInt32    _mm_cmpgt_epi32(a+0x80000000, b+0x80000000)
vec_cmpgt(a,b)    vFloat     _mm_cmpgt_ps(a,b)
vec_cmple(a,b)    vFloat     _mm_cmple_ps(a,b)
vec_cmplt(a,b)    vSInt8     _mm_cmpgt_epi8(b,a)
vec_cmplt(a,b)    vUInt8     _mm_min_epu8(a,b) != b
vec_cmplt(a,b)    vSInt16    _mm_cmpgt_epi16(b,a)
vec_cmplt(a,b)    vUInt16    _mm_cmpgt_epi16(b+0x8000, a+0x8000)
vec_cmplt(a,b)    vSInt32    _mm_cmpgt_epi32(b,a)
vec_cmplt(a,b)    vUInt32    _mm_cmpgt_epi32(b+0x80000000, a+0x80000000)
vec_cmplt(a,b)    vFloat     _mm_cmplt_ps(a,b)

In the SSE column, "!= b" and "+0x8000" / "+0x80000000" are shorthand, not C operators on vectors. "!= b" means compare for equality with _mm_cmpeq_epi8 and then invert the result. Adding the bias to every element (with _mm_add_epi16/_mm_add_epi32 or, equivalently, _mm_xor_si128) flips each sign bit, so the signed compare orders the values as unsigned.
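The unsigned byte rows rely on a "not equal" that SSE2 does not provide for integers. The sketch below covers those two rows (assuming SSE2; the helper names are hypothetical). It computes the equality and then inverts it with an XOR against an all-ones vector. If the mask only feeds a select, you can skip the inversion by swapping the select's two data operands.

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */

/* Hypothetical helper: vec_cmpgt for vUInt8, per the table entry
   _mm_max_epu8(a,b) != b. max(a,b) == b exactly where a <= b,
   so invert that mask by XOR with all ones. */
static __m128i cmpgt_epu8(__m128i a, __m128i b)
{
    __m128i le = _mm_cmpeq_epi8(_mm_max_epu8(a, b), b);  /* a <= b */
    __m128i ones = _mm_cmpeq_epi8(le, le);               /* all bits set */
    return _mm_xor_si128(le, ones);                      /* a > b */
}

/* Hypothetical helper: vec_cmplt for vUInt8, per _mm_min_epu8(a,b) != b.
   min(a,b) == b exactly where a >= b; inverting gives a < b. */
static __m128i cmplt_epu8(__m128i a, __m128i b)
{
    __m128i ge = _mm_cmpeq_epi8(_mm_min_epu8(a, b), b);  /* a >= b */
    __m128i ones = _mm_cmpeq_epi8(ge, ge);
    return _mm_xor_si128(ge, ones);                      /* a < b */
}
```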





Last updated: 2005-09-08




Copyright © 2007 Apple Inc.
All rights reserved.