Hello, I am porting an app to Apple arm64, using Apple's list of ABI differences from the standard arm64 ABI: https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms
However, I found that HFA arguments are aligned to 4 bytes on the stack, whereas the standard arm64 convention requires 8 bytes: developer.arm.com/documentation/ihi0055/latest
"If the argument is an HFA or an HVA then the NSRN is set to 8 and the size of the argument is rounded up to the nearest multiple of 8 bytes."
struct Vector3
{
    float x;
    float y;
    float z;
};
float __stdcall testVector3(
    Vector3 v1,
    float   f1,
    float   f2,
    float   f3,
    float   f4,
    float   f5,
    float   f6,
    float   f7,
    Vector3 v2,
    float   f8,
    float   f9,
    float   f10,
    float   f11,
    float   f12,
    float   f13);
So for this function I expected f6 and all later arguments to be on the stack, with v2 occupying 16 bytes (according to the arm64 ABI). However, I see that v2 takes 12 bytes and there is no padding between v2 and f8.
thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
    frame #0: 0x00000001000038f8 a.out`nativeCall_PInvoke_Vector3Arg_Unix(Vector3, float, float, float, float, float, float, float, Vector3, float, float, float, float, float, float)
a.out`nativeCall_PInvoke_Vector3Arg_Unix:
->  0x1000038f8 <+0>:  sub    sp, sp, #0x80             ; =0x80
    0x1000038fc <+4>:  stp    x29, x30, [sp, #0x70]
    0x100003900 <+8>:  add    x29, sp, #0x70            ; =0x70
    0x100003904 <+12>: ldr    w8, [x29, #0x10]
(lldb) memory read -s4 -f float -c20	$sp
0x16fdff9b0: 6 // float f6
0x16fdff9b4: 7 // float f7
0x16fdff9b8: 4 // v2.x
0x16fdff9bc: 5 // v2.y
0x16fdff9c0: 6 // v2.z
0x16fdff9c4: 8 // float f8, where is the padding?
0x16fdff9c8: 9 // float f9
Is this expected behavior? Is it documented somewhere?