arm64 apple HFA alignment.

Hello, I am porting an app to Apple arm64, following Apple's documented ABI differences from standard arm64: https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms

However, I found that HFA arguments are aligned to 4 bytes on the stack, whereas the standard arm64 convention requires 8 bytes: developer.arm.com/documentation/ihi0055/latest
"If the argument is an HFA or an HVA then the NSRN is set to 8 and the size of the argument is rounded up to the nearest multiple of 8 bytes."
Code Block C
struct Vector3
{
float x;
float y;
float z;
};
float __stdcall testVector3(
Vector3 v1,
float f1,
float f2,
float f3,
float f4,
float f5,
float f6,
float f7,
Vector3 v2,
float f8,
float f9,
float f10,
float f11,
float f12,
float f13);

So for this function I expected f6 and the later arguments to be on the stack, with v2 occupying 16 bytes (per the arm64 ABI). However, I see that it takes 12 bytes and there is no padding between v2 and f8.
Code Block
thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
frame #0: 0x00000001000038f8 a.out`nativeCall_PInvoke_Vector3Arg_Unix(Vector3, float, float, float, float, float, float, float, Vector3, float, float, float, float, float, float)
a.out`nativeCall_PInvoke_Vector3Arg_Unix:
-> 0x1000038f8 <+0>: sub sp, sp, #0x80 ; =0x80
0x1000038fc <+4>: stp x29, x30, [sp, #0x70]
0x100003900 <+8>: add x29, sp, #0x70 ; =0x70
0x100003904 <+12>: ldr w8, [x29, #0x10]
(lldb) memory read -s4 -f float -c20 $sp
0x16fdff9b0: 6 // float f6
0x16fdff9b4: 7 // float f7
0x16fdff9b8: 4 // v2.x
0x16fdff9bc: 5 // v2.y
0x16fdff9c0: 6 // v2.z
0x16fdff9c4: 8 // float f8, where is padding?
0x16fdff9c8: 9 // float f9


Is this expected behavior? Is it documented somewhere?


Note that non-HFA structs do get padding:
Code Block
struct SmallStruct // takes 8 bytes on the stack
{
short s;
};
//__attribute__((noinline))
int __stdcall callWithSmallStruct(int i1, int i2, int i3, int i4, int i5, int i6, int i7, int i8, SmallStruct s, int i9, int i10, int i11)
{
if (i9 != 9 || i10 != 10 || i11 != 11)
{
printf("%d, %d, %d, %d, %d, %d, %d, %d, %d. %d, %d, %d\n", i1,i2,i3,i4,i5,i6,i7,i8,(int)s.s,i9,i10,i11);
return 101;
}
return 100;
}
struct BigStruct // takes 16 bytes
{
int x;
int y;
int z;
};
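
For reference, here is a minimal sketch (assuming a C11 compiler and the usual 4-byte float and int and 2-byte short) that checks the in-memory size and alignment of the three structs. The typedefs mirror the structs above, and nothing in it depends on the calling convention. Vector3 and BigStruct come out with the same size and alignment, so the different stack footprint I observe (12 bytes vs 16 bytes) is not explained by sizeof alone.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignof (C11) */

/* Same field layouts as the structs in the question. */
typedef struct { float x, y, z; } Vector3;   /* HFA: three floats */
typedef struct { short s; } SmallStruct;     /* not an HFA */
typedef struct { int x, y, z; } BigStruct;   /* not an HFA, same shape as Vector3 */

/* In-memory size and alignment, independent of how arguments are passed. */
static_assert(sizeof(Vector3) == 12 && alignof(Vector3) == 4,
              "Vector3: 12 bytes, 4-byte aligned");
static_assert(sizeof(SmallStruct) == 2 && alignof(SmallStruct) == 2,
              "SmallStruct: 2 bytes, 2-byte aligned");
static_assert(sizeof(BigStruct) == 12 && alignof(BigStruct) == 4,
              "BigStruct: 12 bytes, 4-byte aligned");
```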

Hi, I only just noticed this question, but the divergence from the AAPCS is documented on the Apple page you reference:

When passing arguments to functions, Apple platforms diverge from the ARM64 standard ABI in the following ways:

  • Function arguments may consume slots on the stack that are not multiples of 8 bytes. If the total number of bytes for stack-based arguments is not a multiple of 8 bytes, insert padding on the stack to maintain the 8-byte alignment requirements.
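
To make the difference concrete, here is a minimal sketch of the two stack layouts. It is not taken from Apple's or Arm's documentation: `StackArg`, `layout_aapcs64` and `layout_apple` are hypothetical names for illustration, and both models are simplifications. The AAPCS64-style model gives every argument an 8-byte-aligned slot with its size rounded up to 8. The Apple-style model assumes what your dump shows for floats and the HFA, namely natural size and alignment with only the total padded to 8 bytes. It does not model non-HFA structs such as your SmallStruct.

```c
#include <assert.h>
#include <stddef.h>

/* One stack-passed argument: size and natural alignment, in bytes. */
typedef struct {
    size_t size;
    size_t align;
} StackArg;

/* Round n up to a multiple of a; a must be a power of two. */
size_t round_up(size_t n, size_t a) {
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + a - 1) & ~(a - 1);
}

/* Simplified AAPCS64-style model: each slot is aligned to at least
 * 8 bytes and each size is rounded up to a multiple of 8.
 * Writes each argument's offset and returns the total bytes used. */
size_t layout_aapcs64(const StackArg *args, size_t n, size_t *offsets) {
    size_t next = 0;
    for (size_t i = 0; i < n; i++) {
        size_t a = args[i].align < 8 ? 8 : args[i].align;
        next = round_up(next, a);
        offsets[i] = next;
        next += round_up(args[i].size, 8);
    }
    return next;
}

/* Assumed Apple-style model, based on the dump in the question:
 * natural alignment and size per argument, total padded to 8 bytes. */
size_t layout_apple(const StackArg *args, size_t n, size_t *offsets) {
    size_t next = 0;
    for (size_t i = 0; i < n; i++) {
        next = round_up(next, args[i].align);
        offsets[i] = next;
        next += args[i].size;
    }
    return round_up(next, 8);
}
```

Feeding in the stack-passed arguments from your example (f6, f7, v2, then f8 through f13, so eight 4-byte floats and one 12-byte HFA with 4-byte alignment), the Apple-style model places v2 at offset 8 and f8 at offset 20, which matches your memory dump, for a total of 48 bytes. The AAPCS64-style model places them at offsets 16 and 32, for a total of 80 bytes.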