Although 64-bit executables make it easier for you to manage large data sets (compared to memory mapping of large files in a 32-bit application), the use of 64-bit executables may raise other issues. Therefore you should transition your software to a 64-bit executable format only when the 64-bit environment offers a compelling advantage for your specific purposes.
This chapter explores some of the reasons you might or might not want to transition your software to a 64-bit executable format. Before you read this entire guide, read this chapter to decide whether your software will benefit from having a 64-bit executable format. When you have finished, if you are convinced that your software will benefit from a 64-bit executable format, you should read the remaining chapters in this document.
If some of the capabilities of a 64-bit environment would be helpful to you but you do not want to transition your software to a 64-bit executable, read the section “Alternatives to 64-Bit Computing” to learn techniques that offer many of the same benefits but let you remain in a 32-bit environment.
Before going further, it is important to dispel a few common misconceptions.
Myth #1
Myth: My software has to be 64-bit (or run on a 64-bit–capable computer) to use 64-bit data or do 64-bit math.
Fact: In 32-bit software, you can already use signed and unsigned 64-bit data types such as long long. Internally, operations on these 64-bit values use a pair of 32-bit registers by default. If your code needs to run only on 64-bit Macintosh computers, you can get better performance by enabling true 64-bit math in leaf functions. See “Alternatives to 64-Bit Computing” for more information.
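As a minimal sketch of this point, the following functions do 64-bit arithmetic using only standard fixed-width types, so they compile unchanged as either 32-bit or 64-bit code. The function names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Multiply two 32-bit values without losing the high half of the result.
 * Widening one operand to 64 bits forces a 64-bit multiply; in a 32-bit
 * build the compiler generates the multi-register sequence for you. */
uint64_t mul_u32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Sum an array of 32-bit samples into a 64-bit accumulator so that the
 * total cannot overflow a 32-bit integer. */
int64_t sum_samples(const int32_t *samples, uint32_t count)
{
    int64_t total = 0;
    for (uint32_t i = 0; i < count; i++) {
        total += samples[i];
    }
    return total;
}
```

Note that the cast must be applied to an operand before the multiplication; casting the finished 32-bit product would widen a value that has already been truncated.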
Myth #2
Myth: The kernel needs to be 64-bit in order to be fully optimized for 64-bit processors.
Fact: The kernel does not generally need to directly address more than 4 GB of RAM at once. The kernel is able to make larger amounts of memory available to 64-bit applications by using long long data types to keep track of mappings internally. See “Kernel Extensions and Drivers” for more information about how 64-bit architectures affect device drivers and kernel extensions.
There is a caveat, however, when dealing with configurations with large amounts of RAM. As the physical RAM grows, the size of data structures used to manage memory mappings also grows. Beyond a certain limit, it becomes impractical to keep them in a 32-bit address space. Thus, beginning in Snow Leopard, the kernel is moving to a 64-bit executable on hardware that supports such large memory configurations.
Myth #3
Myth: All of the system calls and corresponding C library functions have to change (or new ones have to be added) for 64-bit compatibility.
Fact: Most of the system call arguments changed to 64-bit many years ago. Some operating systems have separate 64-bit versions of these functions, such as llseek64. In Mac OS X, these variants are unnecessary because those functions are already 64-bit capable.
The notable exceptions are those functions related to memory management, such as mmap, malloc, and so on. Those functions have changed in terms of the size of data passed (because the size of size_t changed), but this change should be largely transparent to you as a programmer.
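To see the size_t change for yourself, you could compile a small check such as the following sketch for each architecture and compare the results. The helper names are hypothetical; the code relies only on sizeof and CHAR_BIT from the C standard.

```c
#include <limits.h>
#include <stddef.h>

/* Width in bits of size_t for the current compilation target. */
unsigned size_t_bits(void)
{
    return (unsigned)(sizeof(size_t) * CHAR_BIT);
}

/* Width in bits of a data pointer for the current compilation target. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Width in bits of long, which also grows in a 64-bit Mac OS X process. */
unsigned long_bits(void)
{
    return (unsigned)(sizeof(long) * CHAR_BIT);
}
```

In a 32-bit Mac OS X executable all three functions should report 32; in a 64-bit executable they should report 64, while int remains 32 bits wide.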
Myth #4
Myth: Every application needs the ability to work with more than 4 GB of RAM.
Fact: Most applications have relatively modest memory requirements (a gigabyte or less). Other applications need more, but many of these larger applications can support larger data sets without moving to a 64-bit address space. Some common examples are applications that work with large media files in a sequential manner, such as music and video playback applications.
Myth #5
Myth: My application will run much faster if it is a “native” 64-bit application.
Fact: Some 64-bit executables may run more slowly on 64-bit Intel and PowerPC architectures because of increased cache pressure.
On Intel-based Macintosh computers, you may see some performance improvement. Both the number of registers and the width of registers increase in 64-bit mode. Because of the increased number of registers, function call parameters can be passed in registers instead of on the stack. The increased register width makes certain performance optimizations possible in 64-bit mode that are not possible in 32-bit mode. These improvements will often (but not always) offset the performance impact caused by increased cache pressure.
The 32-bit PowerPC architecture is a 32-bit subset of a 64-bit architecture. The PowerPC architecture supports 64-bit arithmetic instructions in 32-bit mode (with some limitations). Since there are ample registers on 32-bit PowerPC, function call parameters on PowerPC have always been passed in registers. For these reasons, on PowerPC architectures, software does not generally become significantly faster (and may actually slow down) when compiled as a 64-bit executable.
A 64-bit executable can provide many benefits to users and to programmers, depending on the nature of your program. As a general rule, although a 32-bit application can provide the same functionality as a 64-bit application, a 64-bit application requires less effort to support large data sets.
Some applications can benefit significantly from 64-bit computing on both PowerPC and Intel. These include data mining, web caches and search engines, CAD/CAE/CAM software, large-scale 3D rendering (such as a movie studio might use, not a computer game), scientific computing, large database systems (for custom caching), and specialized image and data processing systems.
On Intel-based Macintosh computers, most applications will be somewhat faster when recompiled as a 64-bit executable. Whether this benefit justifies the porting effort depends largely on how important performance is to your particular application and whether your application would benefit from a larger address space.
There are a number of factors to consider when deciding whether to make your application run in 64-bit mode. These considerations are described in the sections that follow.
If you are writing a kernel extension, you must make it 64-bit-capable. Beginning in Snow Leopard, some hardware configurations use a 64-bit kernel by default. The 64-bit kernel cannot load 32-bit kernel extensions.
If your application is performance critical, you might want to recompile your application as a 64-bit executable, particularly on Intel-based Macintosh computers.
Here’s why. The 64-bit Intel architecture contains additional CPU registers that are not available when compiling a 32-bit Intel executable. For example, the 64-bit architecture has 16 general-purpose integer registers instead of 8. Because of the extra register space, the first few arguments are passed in registers instead of on the stack. Thus, by compiling some applications as 64-bit, you may improve performance because the code generates fewer memory accesses on function calls. As a general rule, 64-bit Intel executables run somewhat more quickly unless the increased code and data size interact badly (performance-wise) with the CPU cache.
By contrast, executables compiled for the 64-bit PowerPC architecture can access the same number of registers (32) as 32-bit PowerPC executables. As a general rule, 64-bit PowerPC executables will execute slightly more slowly unless they make significant use of 64-bit math. Thus, if your application does not require a 64-bit address space, you may want to ship your application as a 32-bit executable on PowerPC by default.
As with any complicated software system, it is difficult to predict the relative performance of recompiling a piece of software as a 64-bit executable. The only way to know for certain (on either architecture) is to compile for 64-bit and benchmark both versions of the application.
Here are some of the potential performance pitfalls:
Larger code and data size can result in increased cache and translation lookaside buffer (TLB) misses.
Larger code and data (both pointers and long integers) can require more memory to avoid paging.
The instruction sequence to get an address or constant into a register is longer for 64-bit code on PowerPC.
Multiply and divide operations are slower when performed on 64-bit quantities than on 32-bit quantities. Other operations take roughly the same amount of time as their 32-bit counterparts. Thus, if your code frequently multiplies values of type long, you will see a performance impact. (The reverse is true for type long long because 64-bit applications do not have to break 64-bit operations up into multiple 32-bit operations.)
When you use a 32-bit signed integer as an array index, if that number is not stored in a register, the CPU will spend extra time on each access to sign-extend the value.
For the most part, these potential performance impacts should be small, but if your application is performance critical, you should be aware of them.
If your application may need random access to exceptionally large data sets (larger than 2 GB), it is easier to support these data sets in a 64-bit environment. You can support large data sets in a 32-bit application using memory mapping, but doing so requires additional code. Thus, for new applications, you should carefully evaluate whether supporting such large data sets is required in the 32-bit version of your application.
Applications that use 64-bit integer math extensively may see performance gains on both PowerPC- and Intel-based Macintosh computers. In 32-bit applications, 64-bit integer math is performed by breaking the 64-bit integer into a pair of 32-bit quantities. It is possible to perform 64-bit computation in leaf functions in 32-bit applications, but this functionality generally offers only limited performance improvement.
If you are writing an application, any plug-ins used by your application must be compiled for the same processor architecture and address width as the running application. For this reason, if your application depends heavily upon plug-ins (audio applications, for example), you may want to ship it as 32-bit for now.
Alternatively, you might add a user-selectable install option for the 64-bit version and then glue the two binaries together using the lipo command in a postinstall script. Doing so will encourage plug-in developers to update their code for 64-bit execution and at the same time will minimize user complaints.
If you are writing a plug-in, you should begin transitioning your plug-in to 64-bit so that when 64-bit versions of the supporting application become available, your plug-in will not get left behind.
Beginning in Snow Leopard, Apple-developed applications (including key components of the OS) are transitioning to 64-bit executables. This means that users with 64-bit-capable computers will be running the 64-bit slice of these key system components. Any plug-ins (screen savers, printer dialog extensions, and so on) that need to load in these applications must be recompiled as 64-bit plug-ins.
As a special exception, the System Preferences application provides a 32-bit fallback mode. If the user selects a system preferences pane without a 64-bit slice, it relaunches itself as a 32-bit executable (after displaying a dialog box). To maximize your users’ experience, however, you should still transition these preference panes to 64-bit plug-ins at your earliest convenience.
The memory usage of a 64-bit application may be significantly larger than that of a 32-bit version of the same application. The difference in usage varies from application to application depending on what percentage of data structures contain data members that are larger in a 64-bit process. For this reason, on a computer with a small amount of memory, you may not want to run the 64-bit version of your application even if the computer can support it.
This concern is described in more detail in “Performance Optimization,” along with some tips for improving your memory usage in a 64-bit environment.
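The following sketch illustrates why the growth depends on the members of each data structure. Both structure layouts are hypothetical. Replacing the pointer with an array index is an assumption made for this illustration, not a technique prescribed by this chapter, and the fixed-width count member holds a smaller range than long does in a 64-bit process.

```c
#include <stddef.h>
#include <stdint.h>

/* Pointer-linked record. The pointer and the long both double in size in
 * a 64-bit process, and alignment padding may be added after id. */
struct node_ptr {
    struct node_ptr *next;
    long             count;
    int32_t          id;
};

/* Index-linked record built from fixed-width members. Its size is the
 * same in 32-bit and 64-bit processes. */
struct node_idx {
    uint32_t next;   /* index into an array of nodes instead of a pointer */
    int32_t  count;
    int32_t  id;
};

size_t node_ptr_size(void) { return sizeof(struct node_ptr); }
size_t node_idx_size(void) { return sizeof(struct node_idx); }
```

With typical compilers, node_ptr occupies 12 bytes in a 32-bit process and 24 bytes in a 64-bit process, whereas node_idx occupies 12 bytes in both.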
If you need your application to do 64-bit integer math, you can do so already in Mac OS X by using long long data types.
On PowerPC, if you compile your application using the -mcpu=G5 flag (to use G5-specific optimizations) and the -mpowerpc64 flag (to allow 64-bit math instructions), your 32-bit application can achieve 64-bit math performance comparable to that of a 64-bit application. This technique has some performance disadvantages, however, because nonleaf functions still work with 64-bit integer values in a pair of 32-bit registers due to the design of the 32-bit function call ABI.
Applications compiled with the -mcpu=G5 and -mpowerpc64 flags will not execute on non-G5 hardware. If you need to support G3 or G4 hardware, you can still do 64-bit math without these options with only a small performance penalty.
If your application accesses large files in a streaming fashion, such as an audio or video application, you can use existing Mac OS X file interfaces. Nearly all the file interfaces in Mac OS X are capable of handling 64-bit offsets even in 32-bit applications. However, Mac OS APIs that existed prior to HFS+ (such as QuickTime) may require you to use different functions for large file access. See the latest documentation for the APIs you are using for more specific information.
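As a brief sketch of such access, the following hypothetical helper positions a file descriptor at an arbitrary 64-bit offset using the standard lseek call. It assumes that off_t is 64 bits wide, which is the case on Mac OS X for both 32-bit and 64-bit executables; other systems may require additional build settings that are outside the scope of this guide.

```c
#include <stdio.h>      /* SEEK_SET */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* lseek */

/* Seek an open descriptor to an absolute offset, which may exceed 4 GB.
 * Returns the resulting offset, or -1 on failure.
 * Assumption: off_t is a 64-bit type on the target system. */
long long seek_to(int fd, long long offset)
{
    off_t result = lseek(fd, (off_t)offset, SEEK_SET);
    return (result == (off_t)-1) ? -1 : (long long)result;
}
```

A streaming application would follow such a seek with ordinary read calls, so only a small buffer of the file needs to be in the address space at any time.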
If you have a performance-critical application that would benefit from more than 4 GB of memory, you should read the section “Using mmap to Simulate a Large Address Space.”
As an alternative to using a large address space, you can simulate one in your application by creating your own pseudo-virtual-memory engine using the mmap system call. Instead of referring to data using pointers, use a data structure that contains a reference to a file and an offset into that file.
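One way such a reference might look is sketched below. The file_ref structure and the file_ref_read function are hypothetical names invented for this illustration. The function maps only the pages that contain the requested bytes, copies them out, and unmaps the region again. It assumes that the requested range lies within the file.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical replacement for a pointer: a file plus an offset into it. */
struct file_ref {
    int   fd;      /* descriptor of the backing file, kept open */
    off_t offset;  /* byte offset of the object within that file */
};

/* Copy len bytes located at ref into dst by mapping the pages that
 * contain them. The caller must ensure the range lies within the file.
 * Returns 0 on success, -1 on failure. */
int file_ref_read(struct file_ref ref, void *dst, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || ref.offset < 0 || len == 0) {
        return -1;
    }

    /* The file offset passed to mmap must be page-aligned. */
    off_t  base = ref.offset - (ref.offset % page);
    size_t skip = (size_t)(ref.offset - base);
    size_t span = skip + len;

    void *map = mmap(NULL, span, PROT_READ, MAP_PRIVATE, ref.fd, base);
    if (map == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    memcpy(dst, (const char *)map + skip, len);
    return munmap(map, span);
}
```

Mapping and unmapping on every access keeps the sketch short. A real engine would normally keep recently used regions mapped and reuse them.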
At first glance, this technique may seem incredibly inefficient, because you would expect the operating system to constantly move data into and out of memory. In practice, however, the Mac OS X VM system caches open files heavily. Thus, even though your application has only 4 GB of address space for use at any given time, your application can actually use far more than 4 GB of physical memory concurrently in the form of disk caches.
For this reason, if you do not close the file descriptor after you call mmap on the file, and if your computer’s RAM is large enough to hold your application’s entire data set, most of the memory mapping and unmapping operations should require little or no I/O. If the physical RAM is not large enough, your data ends up being paged to disk anyway; thus your performance is only marginally affected. Upon closing the file descriptor, these pages are released (after flushing dirty pages to disk).
    /* Report the amount of physical RAM using the hw.memsize sysctl. */
    #include <inttypes.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/sysctl.h>

    int main(void)
    {
        uint64_t mem_size = 0;
        size_t len = sizeof(mem_size);

        if (sysctlbyname("hw.memsize", &mem_size, &len, NULL, 0) != 0) {
            perror("sctest");
            return 1;
        } else {
            printf("RAM size in bytes is %" PRIu64 ".\n", mem_size);
        }
        return 0;
    }
When you need to access a piece of data, your in-application virtual memory code checks to see whether that information has already been mapped into memory. If not, it should map the data using mmap. If the mmap operation fails, your application has probably run out of usable virtual address space and must therefore choose a “victim” memory region and unmap it.
For optimal performance, a user-space VM system must use proper mapping granularity for the data. If the data divides neatly into fixed-size objects, these provide good units for mapping. Because the length of the mapped region always rounds up to the nearest page size boundary, you will usually find that performance improves if you map in groups of objects.
    /* Report the VM page size using the hw.pagesize sysctl. */
    #include <inttypes.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/sysctl.h>

    int main(void)
    {
        uint64_t page_size = 0;
        size_t len = sizeof(page_size);

        if (sysctlbyname("hw.pagesize", &page_size, &len, NULL, 0) != 0) {
            perror("sctest");
            return 1;
        } else {
            printf("Page size in bytes is %" PRIu64 ".\n", page_size);
        }
        return 0;
    }
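Because mapped lengths round up to a page boundary, the cost of mapping one small object is a whole page. The arithmetic can be sketched as follows. The function names are hypothetical, and the page size is passed as a parameter on the assumption that it is a power of 2 obtained at run time.

```c
#include <stddef.h>

/* Round len up to the next multiple of page.
 * Assumption: page is a power of 2. */
size_t round_up_to_page(size_t len, size_t page)
{
    return (len + page - 1) & ~(page - 1);
}

/* Number of whole fixed-size objects that fit in one mapping that is
 * pages_per_map pages long. */
size_t objects_per_mapping(size_t object_size, size_t page,
                           size_t pages_per_map)
{
    return (page * pages_per_map) / object_size;
}
```

For example, with a 4096-byte page, mapping a single 100-byte object still maps 4096 bytes, whereas a 25-page mapping holds exactly 1024 such objects with no partial object at the end.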
If your data doesn’t have convenient fixed-size objects, you may choose an arbitrary page size (no less than the underlying physical page size) and divide the data into pages of that size. (A power-of-2 boundary is particularly convenient because you can then calculate the page number and the offset into the page by using bit masks and shift operations.)
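The bit mask and shift calculation might look like the following sketch. The 64 KB logical page size is an arbitrary assumption chosen for illustration, as are the macro and function names; any power of 2 that is at least as large as the physical page size works the same way.

```c
#include <stdint.h>

/* Hypothetical logical page size of 64 KB (2 to the power 16). */
#define LOGICAL_PAGE_SHIFT 16u
#define LOGICAL_PAGE_SIZE  ((uint64_t)1 << LOGICAL_PAGE_SHIFT)
#define LOGICAL_PAGE_MASK  (LOGICAL_PAGE_SIZE - 1)

/* Index of the logical page that contains the given file offset. */
uint64_t page_number(uint64_t offset)
{
    return offset >> LOGICAL_PAGE_SHIFT;
}

/* Position of the given file offset within its logical page. */
uint64_t page_offset(uint64_t offset)
{
    return offset & LOGICAL_PAGE_MASK;
}
```

With these definitions, the file offset 0x12345678 falls in logical page 0x1234 at offset 0x5678 within that page.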
No matter how you map the data, unless you do a lot of access pattern profiling, you may find it difficult to guess a good mapping granularity for most applications. For this reason, you should design your code with proper abstraction so that you can more easily adjust the mapping granularity in the future.
The code sample in “Simulating a 64-Bit Address Space with mmap and munmap” demonstrates the use of mmap to map and unmap pieces of a large file.
Last updated: 2009-04-17