Hello,
I am currently working on a research project under ENINCA Consulting, focused on advanced diagnostic tools for pseudorandom number generators (structural metrics, multi-seed stability, cross-architecture reproducibility, and complementary indicators to TestU01).
To validate this diagnostic framework, I prototyped a small non-linear 64-bit PRNG (not as a goal in itself, but simply as a vehicle to test the methodology).
During these evaluations, I observed something interesting on Apple Silicon (Mac M1):
• bit-exact reproducibility between M1 ARM CPU and M1 Metal GPU,
• full BigCrush pass on both CPU and Metal backends,
• excellent p-values,
• stable behaviour across multiple seeds and runs.
This was not the intended objective, the goal was mainly to validate the diagnostic concepts, but these results raised some questions about deterministic compute behaviour in Metal.
My question: Is there any official guidance on achieving (or expecting) deterministic RNG or compute behaviour across CPU ↔ Metal GPU on Apple Silicon? More specifically:
• Are deterministic compute kernels expected or guaranteed on Metal for scientific workloads?
• Are there recommended patterns or best practices to ensure reproducibility across GPU generations (M1 → M2 → M3 → M4)?
• Are there known Metal features that can introduce non-determinism?
I am not sharing the internal recurrence (this work is proprietary), but I can discuss the high-level diagnostic observations if helpful.
Thank you for any insight, very interested in how the Metal engineering team views deterministic compute patterns on Apple Silicon.
Pascal ENINCA Consulting