The example in this section is designed to show you why byte ordering matters. Take a look at the C data structure defined in Listing 3-1. It contains a four-byte integer, a character string, and a two-byte integer. The listing also initializes the structure.
Listing 3-1 A data structure that contains multibyte and single-byte data
typedef struct {
    uint32_t myOptions;
    char     myStringArray[7];
    short    myVariable;
} myDataStructure;

myDataStructure aStruct;
aStruct.myOptions = 0xfeedface;
strcpy(aStruct.myStringArray, "safari");
aStruct.myVariable = 0x1234;
Figure 3-1 compares how this data structure is stored in memory on big-endian and little-endian systems. In a big-endian system, memory is organized with the address of each data byte increasing from most significant to least significant. In a little-endian system, memory is organized with the address of each data byte increasing from the least significant to the most significant.
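One way to see which of the two layouts a given machine uses is to inspect the byte stored at the lowest address of a known value. The following is a minimal sketch rather than a recommended API. The function names lowest_address_byte and host_is_little_endian are hypothetical, and the code assumes the host is either purely big-endian or purely little-endian.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the byte stored at the lowest address
   of a 32-bit value, whatever the host byte order is. */
unsigned char lowest_address_byte(uint32_t value)
{
    unsigned char first;
    memcpy(&first, &value, 1);  /* copy only the lowest-addressed byte */
    return first;
}

/* Hypothetical helper: returns 1 on a little-endian host and 0 on a
   big-endian host. Mixed-endian hosts are not considered. */
int host_is_little_endian(void)
{
    /* 0x04 is the least significant byte of the probe value, so it sits
       at the lowest address only on a little-endian host. */
    return lowest_address_byte(0x01020304u) == 0x04;
}
```

On a big-endian host lowest_address_byte(0x01020304u) returns 0x01, the most significant byte. On a little-endian host it returns 0x04.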
As you look at Figure 3-1, note the following:
Multibyte data, such as the 32-bit and 16-bit variables shown in the figure, are stored differently between big-endian and little-endian systems. As you can see in the figure, big-endian systems store data in memory so that the most significant byte of the data is stored in the address with the lowest value. Little-endian systems store data in memory so that the most significant byte of the data is in the address with the highest value. Hence, the least significant byte of the myOptions variable (0xce) is stored in memory location 0x00000003 on the big-endian system while it is stored in memory location 0x00000000 on the little-endian system.
Single-byte data, such as the char values in the myStringArray character array, are stored in the same memory location on either system regardless of the byte ordering format of the system.
Each system pads bytes to maintain four-byte data alignment. Padded bytes in the figure are designated by a shaded box that contains an asterisk.
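The layout described in these points can be inspected directly by printing the structure one byte at a time. The sketch below reuses the structure from Listing 3-1. The helper names init_example and dump_bytes are hypothetical, and the exact offsets and amount of padding depend on the compiler and ABI, so the output may not match the figure on every platform.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint32_t myOptions;
    char     myStringArray[7];
    short    myVariable;
} myDataStructure;

/* Hypothetical helper: fills the structure with the values from
   Listing 3-1. Zeroing first usually leaves padding bytes as 0, although
   C does not guarantee their value after a member is assigned. */
void init_example(myDataStructure *s)
{
    memset(s, 0, sizeof *s);
    s->myOptions = 0xfeedface;
    strcpy(s->myStringArray, "safari");
    s->myVariable = 0x1234;
}

/* Hypothetical helper: prints one line per byte of the structure, in
   increasing address order, including any padding bytes. */
void dump_bytes(const myDataStructure *s)
{
    const unsigned char *p = (const unsigned char *)s;
    for (size_t i = 0; i < sizeof *s; i++)
        printf("offset %2zu: 0x%02x\n", i, p[i]);
}
```

Calling init_example and then dump_bytes on the same structure shows the bytes of myOptions and myVariable in opposite orders on big-endian and little-endian hosts, while the characters of "safari" appear at the same offsets on both.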
The byte ordering of multibyte data in memory matters if you are reading data written on one architecture from a system that uses a different architecture and you access the data on a byte-by-byte basis. For example, suppose your application is written to access the second byte of the myOptions variable, the byte at offset 1 from the start of the variable. On a big-endian system that byte is 0xed, the second most significant byte. When you read the same offset in data written by a system that uses the opposite byte ordering scheme, you retrieve 0xfa, the third most significant byte, instead.
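The difference between selecting a byte by its memory offset and selecting it by its significance can be sketched as follows. Both function names are hypothetical. The first function depends on the host byte order, and the second does not because shifts operate on the value rather than on its memory representation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the byte stored `offset` bytes from the
   start of the value. The result depends on the host byte order.
   `offset` must be in the range 0..3. */
unsigned char byte_at_offset(uint32_t value, size_t offset)
{
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);
    return bytes[offset];
}

/* Hypothetical helper: returns a byte chosen by significance, where
   index 0 is the most significant byte and index 3 the least.
   The result is the same on every host. */
unsigned char byte_by_significance(uint32_t value, unsigned index)
{
    return (unsigned char)((value >> (8u * (3u - index))) & 0xffu);
}
```

For the value 0xfeedface, byte_by_significance(value, 1) returns 0xed on any host, whereas byte_at_offset(value, 1) returns 0xed on a big-endian host and 0xfa on a little-endian host.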
Suppose the example data values that are initialized by the code shown in Listing 3-1 are generated on a little-endian system and saved to disk. Assume that the data is written to disk in byte-address order. When read from disk by a big-endian system, the data is again laid out in memory as shown in Figure 3-1. The problem is that the data is still in little-endian byte order even though it is interpreted on a big-endian system. This difference causes the values to be evaluated incorrectly. In this example, the value of the field myOptions should be 0xfeedface, but because of the incorrect byte ordering it is evaluated as 0xcefaedfe.
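The misreading described above, and one way to avoid it, can be illustrated with plain C. This is a sketch under the assumption that the file format is known to be little-endian. The names load_le32, load_be32, and swap32 are hypothetical and are not taken from any system library.

```c
#include <stdint.h>

/* Hypothetical helper: assembles a 32-bit value from four bytes stored
   least significant byte first (little-endian order). */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: assembles a 32-bit value from four bytes stored
   most significant byte first (big-endian order). */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}

/* Hypothetical helper: reverses the order of the four bytes in a value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u)
         | (v << 24);
}
```

The bytes that a little-endian system writes for myOptions are 0xce, 0xfa, 0xed, 0xfe. Interpreting them in big-endian order with load_be32 yields 0xcefaedfe, the incorrect value described above. Reading them with load_le32, or applying swap32 to the misread value, recovers 0xfeedface.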
Note: The terms big-endian and little-endian come from Jonathan Swift’s eighteenth-century satire Gulliver’s Travels. The subjects of the empire of Lilliput were divided into two factions: those who broke their eggs at the big end and those who broke them at the little end.
Last updated: 2007-02-26