Possible memory model error in virtualized M1 CPUs

I'm currently tracking down a test failure in OpenSSL. Periodically we see a failure in a thread ******* test, for example: https://github.com/openssl/openssl/actions/runs/8437420903/job/23198617572

In that test, a counter is reported as going backwards, despite being updated monotonically using atomic loads and stores.

We are unable to reproduce this issue on bare-metal M1 hardware; we only see the problem on virtualized M1 hardware.

One thing I have found is that, with Apple's distributed version of clang 15, an atomic load is compiled to the ARM ldapr instruction. However, if I manually implement the atomic load using an ldar instruction, the problem consistently abates.
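For anyone who wants to poke at this, here is a minimal sketch of the kind of check involved. It is not the OpenSSL test itself: the names `count_regressions`, `writer` and `reader` are made up for illustration, and the real test goes through OpenSSL's RCU code rather than a bare counter. One thread publishes increasing values with release stores, another reads them with acquire loads and counts how often a value is lower than the previous one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins for the shared state in the real test. */
static uint64_t counter;   /* only ever increased by the writer */
static int writer_done;    /* set once the writer has finished  */

/* Publish 1..n in order using release stores. */
static void *writer(void *arg)
{
    uint64_t n = *(uint64_t *)arg;

    for (uint64_t i = 1; i <= n; i++)
        __atomic_store_n(&counter, i, __ATOMIC_RELEASE);
    __atomic_store_n(&writer_done, 1, __ATOMIC_RELEASE);
    return NULL;
}

/* Read with acquire loads and count every value that goes backwards. */
static void *reader(void *arg)
{
    uint64_t *regressions = arg;
    uint64_t last = 0;

    while (!__atomic_load_n(&writer_done, __ATOMIC_ACQUIRE)) {
        uint64_t now = __atomic_load_n(&counter, __ATOMIC_ACQUIRE);

        if (now < last)
            (*regressions)++;
        last = now;
    }
    return NULL;
}

/* Run one writer and one reader; return how many regressions were seen. */
uint64_t count_regressions(uint64_t iterations)
{
    pthread_t w, r;
    uint64_t regressions = 0;
    int rc;

    counter = 0;
    writer_done = 0;

    rc = pthread_create(&r, NULL, reader, &regressions);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer, &iterations);
    assert(rc == 0);
    (void)rc;

    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return regressions;
}
```

On a system where the loads and stores behave as expected, `count_regressions(100000)` should return 0. I have not confirmed that this reduced version reproduces the failure on the CI runners.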
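To make the workaround concrete, here is a rough sketch of forcing an ldar for a pointer-sized acquire load. The names `load_acquire_ldar` and `ldar_load_demo` are only for illustration, the inline assembly assumes a 64-bit AArch64 target with GCC/clang-style asm, and other targets fall back to the compiler builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Acquire-load a pointer, forcing ldar on AArch64 (hypothetical helper). */
static inline void *load_acquire_ldar(void **p)
{
#if defined(__aarch64__)
    void *ret;

    /* ldar: load-acquire register; the "memory" clobber stops the
     * compiler from moving other memory accesses across the load. */
    __asm__ volatile("ldar %0, [%1]" : "=r"(ret) : "r"(p) : "memory");
    return ret;
#else
    /* Not AArch64: let the compiler pick the instruction. */
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
#endif
}

/* Tiny single-threaded sanity check that the helper returns the stored pointer. */
int ldar_load_demo(void)
{
    int value = 42;
    void *slot = &value;
    void *seen = load_acquire_ldar(&slot);

    assert(seen == &value);
    return *(int *)seen;
}
```

This only shows the shape of the replacement load. It says nothing about why ldapr misbehaves under virtualization.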

I believe (though I'm not yet certain) that GitHub uses Apple's UTM virtualization on their CI runners. Is there an Apple developer here who can comment on how the UTM virtual machine handles the ldapr instruction? It appears (though again, I'm not certain) that the virtualized M1 CPU may not be honoring (or not configuring) the localised ordering regions which the ldapr instruction honors, leading to incorrect sequential reads and writes.

An update here.

When compiling with the native Apple clang 15 compiler on macOS, the code in question:

`return __atomic_load_n(p, __ATOMIC_ACQUIRE);`

produces the following assembly:

```
0000000100120488 <_ossl_rcu_uptr_deref>:
100120488: f8bfc000 ldapr x0, [x0]
10012048c: d65f03c0 ret
```

Whereas the Homebrew gcc-13 compiler produces:

```
0000000100143a40 <_ossl_rcu_uptr_deref>:
100143a40: c8dffc00 ldar x0, [x0]
100143a44: d65f03c0 ret
100143a48: d503201f nop
100143a4c: d503201f nop
```

Based on the ARM LDAPR instruction docs, it seems that safe use of the ldapr instruction depends on the configuration of the Local Ordering registers in the coprocessor. As such, it seems like a compiler bug to issue the instruction unconditionally.
