Problem: CoreML produces NaNs on the GPU (CPU-only execution is fine) when running transformer attention with a fused QKV projection on macOS 26.2.
Root cause: the `common::fuse_transpose_matmul` optimization pass triggers a Metal kernel bug when sliced tensors feed into `matmul(transpose_y=True)`.
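For context, here's a minimal sketch of the pattern that gets fused (class name, dims, and shapes are illustrative, not the actual BiRefNet code): the fused QKV projection is sliced into q/k/v, and k's transpose feeds a matmul, which the pass rewrites into `matmul(transpose_y=True)` with a sliced operand.

```python
import torch
import torch.nn as nn

class FusedQKVAttention(nn.Module):
    """Illustrative fused-QKV attention; names and sizes are placeholders."""
    def __init__(self, dim=64):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)  # one fused projection for Q, K, V

    def forward(self, x):                       # x: (batch, tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)  # q/k/v are slices of one tensor
        # transpose-then-matmul is what fuse_transpose_matmul rewrites into
        # matmul(transpose_y=True), with the sliced k as the transposed operand
        scores = q @ k.transpose(-2, -1) / (x.shape[-1] ** 0.5)
        return torch.softmax(scores, dim=-1) @ v
```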
Workaround: remove the offending pass from the default pipeline before converting:

```python
import coremltools as ct

pipeline = ct.PassPipeline.DEFAULT
pipeline.remove_passes(['common::fuse_transpose_matmul'])
mlmodel = ct.convert(model, ..., pass_pipeline=pipeline)
```
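If you want to confirm the pass was actually dropped before converting (assuming a recent coremltools, where `PassPipeline` exposes the active pass names via `passes`):

```python
# Sanity check: the pipeline should no longer contain the offending pass.
assert 'common::fuse_transpose_matmul' not in pipeline.passes
```

Removing just this one pass keeps the rest of the default optimizations, at the cost of leaving the transpose unfused in the graph.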
Minimal repro: https://github.com/imperatormk/coreml-birefnet/blob/main/apple_bug_repro.py
Affected: any ViT/Swin/transformer model with fused QKV attention (BiRefNet, etc.); a quick check for your own model is sketched below.
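If you're unsure whether your model is affected, comparing CPU-only against GPU-enabled predictions on the same input is a quick test (the model path, input name, shape, and multiarray outputs below are placeholders; substitute your own):

```python
import numpy as np
import coremltools as ct

# 'model.mlpackage', 'input', and the shape are placeholders for your model.
cpu = ct.models.MLModel('model.mlpackage', compute_units=ct.ComputeUnit.CPU_ONLY)
gpu = ct.models.MLModel('model.mlpackage', compute_units=ct.ComputeUnit.ALL)

x = {'input': np.random.rand(1, 3, 224, 224).astype(np.float32)}
out_cpu, out_gpu = cpu.predict(x), gpu.predict(x)

# NaNs (or a large mismatch) on the GPU path but not on CPU match this bug.
for name in out_cpu:
    has_nan = np.isnan(out_gpu[name]).any()
    max_diff = np.abs(out_cpu[name] - out_gpu[name]).max()
    print(f'{name}: NaN on GPU={has_nan}, max |cpu-gpu|={max_diff:.3g}')
```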
Has anyone else hit this? I've filed a Feedback (FB) report as well.