Intel vs. ARM memory alignment

Question

Created Jun ’20

Does anyone know if Apple has posted a technical comparison or summary of the differences between Intel and ARM processor capabilities/limitations?

Specifically, I have code that de-serializes a custom stream of data. Right now, this can result in non-word-aligned fetches and stores of word and double word values.

68K CPUs used to choke on this, but all modern 64-bit Intel processors handle it just fine. What about ARM?

Also curious to know if 64-bit fetches and stores are always atomic.


Answer 1

Developer Tools Engineer OP

Apple

Jun ’20

Accepted Answer

ARM64 correctly handles unaligned loads and stores at most widths, but that rarely matters to programmers, because C requires pointers to be adequately aligned for their type regardless of the underlying processor, and programs that violate this rule have undefined behavior. Code that appears to work because the compiler happened to emit a normal load or store is still not allowed, and its behavior isn't guaranteed. You need to tell the compiler that you're accessing unaligned memory.

There are several ways to tell the compiler that, and the best approach depends on how you're doing the access. If you're reading from a struct, you can simply make the struct packed to tell the compiler that its fields aren't required to be aligned. If you're reading through a pointer, you can make sure that the pointer is to a typedef declared with __attribute__((aligned(1))), e.g.:

typedef uint32_t unaligned_int32 __attribute__((aligned(1)));
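
As a rough illustration of both approaches, here is a minimal sketch that assumes Clang or GCC, since packed and aligned are compiler extensions. The record_header layout and the function names are hypothetical and chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format: a 1-byte tag followed immediately by a 32-bit
 * length and a 64-bit payload, with no padding between the fields. */
struct __attribute__((packed)) record_header {
    uint8_t  tag;
    uint32_t length;   /* offset 1, so not 4-byte aligned */
    uint64_t payload;  /* offset 5, so not 8-byte aligned */
};

/* A 32-bit integer type whose required alignment is lowered to 1 byte. */
typedef uint32_t unaligned_u32 __attribute__((aligned(1)));

/* Read the length field through the packed struct. The compiler knows the
 * field may be misaligned and emits an access that is safe for that. */
uint32_t header_length(const unsigned char *bytes) {
    const struct record_header *h = (const struct record_header *)bytes;
    return h->length;
}

/* Read a 32-bit value at an arbitrary byte offset through the typedef. */
uint32_t read_u32_at(const unsigned char *bytes, size_t offset) {
    const unaligned_u32 *p = (const unaligned_u32 *)(bytes + offset);
    return *p;
}
```

Either function can be handed a pointer into the middle of a deserialization buffer; the values come back in host byte order, so any byte swapping the stream format needs is still up to the caller.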

Similarly, the processor guarantees that naturally aligned 64-bit loads and stores are done without tearing, but the compiler makes no such guarantee. If you need this guarantee, you should use relaxed atomics.
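
A minimal sketch of what relaxed atomics look like with C11's <stdatomic.h>; the variable and function names are hypothetical. Relaxed ordering asks only that each load or store be indivisible and imposes no ordering on surrounding memory operations.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared value that one thread writes and others read. */
static _Atomic uint64_t shared_value;

/* Store the whole 64-bit value in one indivisible operation. */
void publish_value(uint64_t v) {
    atomic_store_explicit(&shared_value, v, memory_order_relaxed);
}

/* Load the whole 64-bit value; the result is never a mix of two stores. */
uint64_t peek_value(void) {
    return atomic_load_explicit(&shared_value, memory_order_relaxed);
}
```

A reader calling peek_value() sees either the old value or the new one, never half of each. If the value guards other data, a stronger memory order is needed.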


Answer 2

Developer Tools Engineer OP

Apple

Jun ’20

In addition to the explanation above of the language's alignment requirements (all basic data types in the C language family have the same alignment requirements under the 64-bit Intel and 64-bit ARM ABIs), you can use UBSan to identify misaligned data accesses and fix them by either using memcpy to copy the bytes into an aligned data structure or adding __attribute__((packed)) to the types you are accessing.
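
The memcpy route might look something like the sketch below; the helper names are hypothetical. With Clang or GCC, building with -fsanitize=alignment (also enabled as part of -fsanitize=undefined) should report misaligned accesses at run time, which helps find the places that need this treatment.

```c
#include <stdint.h>
#include <string.h>

/* Copy four bytes from any address into a properly aligned local.
 * Compilers typically turn this into a single load instruction. */
uint32_t load_u32(const unsigned char *src) {
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* Copy a 32-bit value out to any address, aligned or not. */
void store_u32(unsigned char *dst, uint32_t value) {
    memcpy(dst, &value, sizeof value);
}
```

This form is plain standard C, so it does not depend on compiler attributes. As with the other approaches, the value is read and written in host byte order.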


Answer 3

dawn2dusk OP

Jun ’20

Thank you so much for this information. I was using alignment attributes with my pointers and structs, but wanted to make sure that ARM64 would cooperate.

I regularly use atomics (atomic_store, atomic_exchange, and so on), but I have a few cases where the code checks a pointer/value and isn't at all concerned about concurrency or race conditions ... but I *am* very concerned about loading only half a pointer/value ;) So that's good to know.

Thanks again for the great support.
