I'm currently tracking down a test error in OpenSSL. Periodically we see a failure in a thread ******* test, for example: https://github.com/openssl/openssl/actions/runs/8437420903/job/23198617572
In this failure, the counter in the test is reported as going backwards, despite atomic loads/stores being used to update it monotonically.
We are unable to reproduce this issue on bare-metal M1 hardware; it's only on virtualized M1 hardware that we see this problem.
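For context, the pattern the test relies on looks roughly like the sketch below. This is not the actual OpenSSL test code: the names `counter`, `limit`, `writer`, `reader`, and `count_regressions` are made up for illustration, and it assumes C11 atomics and pthreads. One thread stores increasing values with release semantics while another loads with acquire semantics and counts how often the value appears to go backwards.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names throughout; this only illustrates the pattern. */
static _Atomic uint64_t counter;   /* shared, only ever increased */
static uint64_t limit;             /* set before the threads start */

static void *writer(void *arg)
{
    (void)arg;
    /* Publish strictly increasing values. */
    for (uint64_t i = 1; i <= limit; i++)
        atomic_store_explicit(&counter, i, memory_order_release);
    return NULL;
}

static void *reader(void *arg)
{
    uint64_t *regressions = arg;
    uint64_t last = 0, now;

    /* Each observed value should be >= the previous one. */
    do {
        now = atomic_load_explicit(&counter, memory_order_acquire);
        if (now < last)
            (*regressions)++;
        last = now;
    } while (now < limit);
    return NULL;
}

/* Returns how many times the reader saw the counter go backwards.
 * Error handling for pthread_create is omitted for brevity. */
uint64_t count_regressions(uint64_t iterations)
{
    pthread_t w, r;
    uint64_t regressions = 0;

    limit = iterations;
    atomic_store(&counter, 0);
    pthread_create(&r, NULL, reader, &regressions);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return regressions;
}
```

Calling `count_regressions(100000)` should return 0 if the loads and stores behave as expected; a nonzero result corresponds to the kind of failure described above.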
One thing I have found is that, with Apple's distributed version of clang 15, an atomic load is compiled to the Arm ldapr instruction. However, if I manually implement the atomic load using an ldar instruction, the problem consistently abates.
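A manual ldar-based load can be written along these lines. This is only a sketch of the idea, not the exact code I used: the helper name `load_acquire_u64` is hypothetical, it assumes GCC/clang-style inline assembly, and on non-AArch64 targets it simply falls back to the compiler builtin.

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit acquire load that forces LDAR on AArch64
 * instead of letting the compiler pick LDAPR. */
static inline uint64_t load_acquire_u64(const uint64_t *p)
{
#if defined(__aarch64__)
    uint64_t v;

    /* LDAR is the load-acquire instruction; the "memory" clobber keeps
     * the compiler from moving other memory accesses across it. */
    __asm__ volatile("ldar %0, [%1]" : "=r"(v) : "r"(p) : "memory");
    return v;
#else
    /* Other targets: use the compiler's acquire load builtin. */
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
#endif
}
```

Swapping this in for the compiler-generated atomic load is the kind of change that made the failure stop for me.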
I believe (though I'm not yet certain) that GitHub uses Apple's UTM virtualization on their CI runners. Is there an Apple developer here who can comment on how the UTM virtual machine handles the ldapr instruction? It appears (though again, I'm not certain) that the virtualized M1 CPU may not be honoring (or not configuring) the localised ordering regions which the ldapr instruction honors, leading to incorrect sequential reads and writes.