
Important: This document is part of the Legacy section of the ADC Reference Library. This information should not be used for new development.


PowerPC Compatibility and Performance Issues


This Technical Note discusses differences between the PowerPC 601 chip and future 603 and 604 chips, and how these differences affect application compatibility and performance.

[Sep 01 1994]






Introduction

The PowerPC 601 chip is a transitional CPU, bridging the new PowerPC architecture and the older POWER architecture from which it is descended. As such, it implements most of the POWER instruction set as well as the PowerPC instruction set. Subsequent PowerPC CPUs, such as the 603 and 604, implement only the PowerPC architecture. Implementation differences between the 601 and the 603/604 chips can also affect performance. This note discusses the implications of these differences for compatibility and performance.


Note:
These issues typically affect developers who are actually generating code for the PowerPC. Most application developers have little control over the code generated as that is the responsibility of the compiler. However, at the time this is being written, few compilers fully address these issues, so all developers should check with their tools developers to determine which versions of tools do (or will) address these issues.



POWER Instructions

Thirty-four instructions that were part of the POWER architecture have been eliminated from the PowerPC architecture. Most of them are nevertheless included in the PowerPC 601 implementation as part of the transition from POWER. This made it easier to bring up POWER code on PowerPC, but the instructions have been eliminated in subsequent implementations, such as the 603 and 604. A list of these instructions can be found in Table B-3 of the PowerPC 601 RISC Microprocessor User's Manual and, for convenience, is reproduced below.

Table 1. POWER Instructions Deleted from PowerPC Architecture

Mnemonic  Name                                  Mnemonic  Name
abs       absolute                              rrib      rotate right and insert bit
clcs      cache line compute size               sle       shift left extended
clf       cache line flush+                     sleq      shift left extended with MQ
cli       cache line invalidate+                sliq      shift left immediate with MQ
dclst     data cache line store+                slliq     shift left long immediate with MQ
div       divide                                sllq      shift left long with MQ
divs      divide short                          slq       shift left with MQ
doz       difference or zero                    sraiq     shift right algebraic immediate with MQ
dozi      difference or zero immediate          sraq      shift right algebraic with MQ
lscbx     load string and compare byte indexed  sre       shift right extended
maskg     mask generate                         srea      shift right extended algebraic
maskir    mask insert from register             sreq      shift right extended with MQ
mfsrin    move from segment register indirect   sriq      shift right immediate with MQ
mul       multiply                              srliq     shift right long immediate with MQ
nabs      negative absolute                     srlq      shift right long with MQ
rac       real address compute+                 srq       shift right with MQ
rlmi      rotate left then mask insert          svcx      supervisor call, with SA = 0+

+ Instructions not implemented in PowerPC 601

Most compilers designed for PowerPC do not emit these instructions. However, some compilers originally designed for POWER code generation may. Contact your compiler vendor for the latest information.

The IBM xlc C compiler and xlC C++ compilers have an option to suppress POWER code generation, -qarch=ppc. Anyone using these compilers must use this option to generate code that runs safely on 603/604 CPUs. For developers seeded by Apple with these compilers, this option was part of the recommended cmac stanza in the xlc.cfg file. Developers should verify that this option is in effect for all parts of their code.

The GNU gcc compiler is also a POWER compiler, but currently has no option for PowerPC-only code generation. Developers should be wary of code generated by this compiler.

Anyone writing PowerPC assembly should also take care not to use these instructions.

The latest prerelease MPW DumpPEF tool (version 2.0b1 from E.T.O. #15) has an option, -w601, to scan for PowerPC 601-specific instructions. It is one way to test whether your code is affected. Be aware, however, that DumpPEF cannot always distinguish between code and data and may flag POWER opcodes that are really data. You should check all warnings from DumpPEF to be sure they are not spurious. Even if the tool finds valid POWER opcodes, there is no guarantee the instructions are part of an executable code path. You should, of course, test your application on 603/604 hardware as soon as it is available.
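To make the code-versus-data caveat concrete, here is a minimal sketch of an opcode scan. It is not how DumpPEF works internally, which this note does not describe. The function names are hypothetical, the table covers only a few of the deleted instructions, and the opcode numbers are assumptions that should be verified against the 601 User's Manual before use.

```c
#include <stddef.h>
#include <stdint.h>

/* Partial, illustrative table of POWER-only encodings. The opcode values are
   assumptions to be checked against the PowerPC 601 User's Manual. */
typedef struct {
    const char *mnemonic;
    uint32_t    primary;   /* top 6 bits of the 32-bit instruction word */
    int32_t     extended;  /* 9-bit extended opcode, or -1 if unused */
} PowerOpcode;

static const PowerOpcode kPowerOpcodes[] = {
    { "dozi",  9,  -1 },
    { "mul",  31, 107 },
    { "doz",  31, 264 },
    { "div",  31, 331 },
    { "abs",  31, 360 },
    { "divs", 31, 363 },
    { "nabs", 31, 488 },
};

/* Returns the mnemonic if the word matches a table entry, otherwise NULL. */
static const char *match_power_opcode(uint32_t word)
{
    uint32_t primary  = word >> 26;
    int32_t  extended = (int32_t)((word >> 1) & 0x1FF);
    size_t   i;

    for (i = 0; i < sizeof kPowerOpcodes / sizeof kPowerOpcodes[0]; i++) {
        if (kPowerOpcodes[i].primary != primary)
            continue;
        if (kPowerOpcodes[i].extended < 0 || kPowerOpcodes[i].extended == extended)
            return kPowerOpcodes[i].mnemonic;
    }
    return NULL;
}

/* Counts matching words in a buffer of already-decoded 32-bit words.
   The scan cannot tell instructions from data, so every hit needs review. */
static size_t count_power_opcodes(const uint32_t *words, size_t count)
{
    size_t hits = 0, i;

    for (i = 0; i < count; i++)
        if (match_power_opcode(words[i]) != NULL)
            hits++;
    return hits;
}
```

A scan like this flags any word whose bit pattern matches, so a constant in a data section can produce exactly the kind of spurious warning described above.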


POWER register usage

In addition to the POWER instructions that are only implemented on 601, the 601 has internal registers that are unavailable on subsequent PowerPC processors. These are the multiply-quotient (MQ) and the real-time clock (RTC) registers.

The MQ register is generally accessed using POWER MQ instructions (see table above), and is covered in the previous section.

The RTC register can be useful for timing purposes, but is not accessible from high level languages. It would only be a problem if assembly language code was written to directly access the register.


Load/store string and load/store multiple word instructions

A variety of instructions can interfere with instruction pipelining. Most problematic for application code are the multicycle load/store string and load/store multiple word instructions, because many compilers make use of them. These instructions are referred to as completion serialized instructions because they cause all prior instructions to complete before they execute. This interferes with performance on more heavily pipelined implementations, such as the 604, where this can cause a 6-cycle delay before instruction execution.

The possible implementation limitations of these instructions were noted in PowerPC 601 documentation but compilers use them anyway for convenience and to reduce code expansion. For example, string instructions are often used to copy contiguous data in memory, such as when assigning a struct to a struct. Load/store multiple word instructions are often used for saving and restoring registers as part of function prolog/epilog code.
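As an illustration of the struct-copy case, the sketch below shows the kind of source that may lead a compiler to emit string instructions, next to a field-by-field alternative. The Record type and both function names are hypothetical. What either function compiles to depends entirely on the compiler and its options, so inspect the generated code rather than assuming.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only. */
typedef struct {
    short kind;
    long  values[6];
} Record;

/* Whole-struct assignment: the compiler chooses how to move the bytes, and
   some PowerPC compilers may select load/store string instructions here. */
static void copy_by_assignment(Record *dst, const Record *src)
{
    *dst = *src;
}

/* Field-by-field copy: more source text, but each move is expressed as an
   individual load and store that the compiler can schedule separately. */
static void copy_by_fields(Record *dst, const Record *src)
{
    size_t i;

    dst->kind = src->kind;
    for (i = 0; i < sizeof src->values / sizeof src->values[0]; i++)
        dst->values[i] = src->values[i];
}
```

Both functions produce the same result. The difference, if any, is only in the instruction sequence the compiler selects.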

Apple is working with compiler developers to establish guidelines for using these instructions appropriately. Developers should check with compiler vendors for the latest information on their tools and their use of these instructions.


Cache coherency

PowerPC 601 features a unified instruction/data cache, while 603 and 604 feature separate instruction and data caches. This leads to potential cache coherency problems analogous to those encountered when MC68040 machines were released. Fortunately, the PowerPC runtime architecture reduces this risk as much as possible.

Almost all code for PowerPC is loaded and prepared by the Code Fragment Manager. The CFM ensures that all such code is suitable for execution. If all your code is loaded by the Code Fragment Manager, you don't have to worry about cache coherency.

However, if you generate code in memory for execution, you should be concerned about this issue. This includes compilers that generate code for immediate execution and interpreters that compile an interpreted language into memory for execution.

You can eliminate the cache coherency problem by notifying the system that data is subject to execution. Use the call MakeDataExecutable, defined in OSUtils.h:

extern pascal void MakeDataExecutable(void *baseAddress, unsigned long length);

This call is currently implemented only for PowerPC, so you must conditionally compile it. It takes the address at which the data to be flushed starts and the length of that data. Be very careful about flushing the cache unnecessarily, as doing so adversely affects performance.
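One possible way to structure the conditional compilation is sketched below. The helper name install_generated_code and the HAVE_MAKE_DATA_EXECUTABLE macro are hypothetical. The test on __powerc and powerc is an assumption about the macros your PowerPC compiler predefines, so check your compiler's documentation.

```c
#include <string.h>

/* Assumption: the PowerPC compiler predefines __powerc or powerc. */
#if defined(__powerc) || defined(powerc)
#include <OSUtils.h>
#define HAVE_MAKE_DATA_EXECUTABLE 1
#else
#define HAVE_MAKE_DATA_EXECUTABLE 0
#endif

/* Hypothetical helper: copies freshly generated instructions into their
   final location, then notifies the system once for the whole range.
   Returns 1 if MakeDataExecutable was called, 0 if it was compiled out. */
static int install_generated_code(void *dest, const void *code,
                                  unsigned long length)
{
    memcpy(dest, code, length);
#if HAVE_MAKE_DATA_EXECUTABLE
    MakeDataExecutable(dest, length);
    return 1;
#else
    return 0;
#endif
}
```

Making one call per completed block of generated code, rather than one per instruction written, keeps the number of flushes down, in line with the performance warning above.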


Data alignment

Modern RISC designs generally prefer natural alignment for data (for example, shorts need only be 2-byte aligned, but doubles need to be 8-byte aligned). However, 680x0 code typically aligns data on 16-bit boundaries, and PowerPC was explicitly designed to support this kind of data access. The 601 does not suffer much of a performance penalty with most misaligned data accesses. This will no longer be true on later PowerPC CPUs. Furthermore, as the PowerPC processor family grows, it is likely that the performance hit for misaligned accesses will grow as well.

While the 603 and 604 designs support misaligned data access, an alignment exception is thrown under some conditions. These exceptions are handled silently by the nanokernel, and you will see no evidence of the exception other than a decrease in performance on 603 and 604 processors. Because these loads and stores are handled by the nanokernel instead of the hardware, a significant performance hit is taken for every misaligned access. As an example, take the following data structure:

struct ArrayWithHeader {
   short    BlockHeader;
   double    Elements[kSomeLargeNumber];
};

When compiled with 68K alignment, an iteration over the array elements (meaning a data access on halfword boundaries for each double) may take up to forty times longer on a 603 or 604 than on a 601. When compiled with PowerPC alignment (which inserts a halfword pad in the data structure, to guarantee that accesses to the array will be on word boundaries), an iteration completes in less time than on a 601.
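The layout difference can be checked with offsetof. The sketch below is written for a generic C compiler, not the MPW tools. It uses #pragma pack(push, 2) as a stand-in for 68K alignment and the compiler's default layout as a stand-in for PowerPC alignment. The struct tag PackedArrayWithHeader, the two helper functions, and the value given to kSomeLargeNumber are placeholders. The default offset varies by compiler and target.

```c
#include <stddef.h>

#define kSomeLargeNumber 1024  /* placeholder; the note leaves the value open */

/* Stand-in for 68K alignment: fields are packed to 2-byte boundaries. */
#pragma pack(push, 2)
struct PackedArrayWithHeader {
    short  BlockHeader;
    double Elements[kSomeLargeNumber];
};
#pragma pack(pop)

/* Compiler default layout, standing in for PowerPC alignment: padding is
   inserted after BlockHeader so the doubles do not start on a halfword. */
struct ArrayWithHeader {
    short  BlockHeader;
    double Elements[kSomeLargeNumber];
};

/* Byte offset of the first array element under each layout. */
static size_t packed_offset(void)
{
    return offsetof(struct PackedArrayWithHeader, Elements);
}

static size_t default_offset(void)
{
    return offsetof(struct ArrayWithHeader, Elements);
}
```

With 2-byte packing the array starts at offset 2, so every double in it straddles a halfword boundary. With the default layout the array starts on at least a word boundary.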

Note that this significant performance hit occurs only when accessing floats or doubles on non-word boundaries, but optimal performance for iterating over this array is achieved when its elements are accessed on 8-byte boundaries. On PowerPC, misaligned accesses of integer types do not cause an alignment exception, but there is still a performance loss compared to aligned integer accesses. If speed is important, make sure that every field of your data structures falls on its natural boundary, or at a minimum that the structures are compiled with PowerPC alignment.

While it is essential that you use 680x0 data alignment for data shared with 680x0 code (such as the Toolbox), you should always use PowerPC alignment for data used internally by your application. In particular, do not turn on global 680x0 data alignment for your PowerPC code. Use alignment pragmas to turn on 680x0 data alignment only when absolutely necessary.
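A sketch of limiting 680x0 alignment to just the structures that need it follows. The struct names and the helper function are hypothetical. The #pragma options align=mac68k and align=reset spellings are the ones used by the Macintosh PowerPC compilers, and the __powerc / powerc test is an assumption about your compiler's predefined macros, so confirm both against your tools.

```c
#include <stddef.h>

/* Turn on 680x0 alignment only around data shared with 680x0 code. */
#if defined(__powerc) || defined(powerc)
#pragma options align=mac68k
#endif
struct SharedWithToolbox {      /* hypothetical shared structure */
    short flags;
    long  refCon;
};
#if defined(__powerc) || defined(powerc)
#pragma options align=reset     /* back to the default for everything else */
#endif

/* Internal data is declared outside the pragma pair, so it keeps the
   compiler's default alignment. */
struct InternalSamples {        /* hypothetical internal structure */
    short  count;
    double samples[16];
};

/* Byte offset of the first double in the internal structure. */
static size_t internal_samples_offset(void)
{
    return offsetof(struct InternalSamples, samples);
}
```

Bracketing each shared declaration this way keeps the 680x0 layout from leaking into structures that are private to the application.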


References

PowerPC(TM) 601 RISC Microprocessor User's Manual

PowerPC(TM) 603 RISC Microprocessor User's Manual

The PowerPC(TM) Architecture

