Silent FP16 Overflow in coremltools: 5 Numerical Failures Affecting ANE Inference (With Fixes)
Hi everyone,
With the announcements at WWDC26 regarding Core AI and "automatic stable decompositions," it is clear that managing mathematical stability in constrained FP16 environments is a major priority for the ecosystem.
To support developers maintaining existing models that cannot migrate to the newest architectures overnight, I have published a research paper and an open-source static analysis tool documenting 5 silent numerical failures in the standard coremltools pipeline.
Because the Apple Neural Engine (ANE) executes inference in FP16, the largest representable finite value is 65,504 ($\approx \exp(11.09)$). Intermediate values beyond that limit silently overflow to infinity, and sufficiently small ones collapse to zero, with no warning in either case.
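To make the limit concrete, here is a minimal sketch that emulates FP16 rounding on the CPU using Python's `struct` half-precision format. The helper name `to_fp16` is my own for illustration; it only models IEEE 754 half-precision rounding and does not reproduce how the ANE or the converter actually lowers an operation.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration only; not part of coremltools.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; model overflow as +/-inf.
        return math.copysign(math.inf, x)


# The largest finite FP16 value survives the round trip unchanged.
print(to_fp16(65504.0))

# ln(65504) is roughly 11.09, the exponent threshold quoted above.
print(math.log(65504.0))

# Just below the threshold exp(x) is still finite in FP16; just above it is inf.
print(to_fp16(math.exp(11.0)))
print(to_fp16(math.exp(11.1)))

# Very small results round to zero instead.
print(to_fp16(math.exp(-20.0)))
```

Any intermediate `exp` whose argument goes past about 11.09 therefore turns into `inf` before later operations get a chance to bring it back into range.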
Deployed Operations Currently Affected
- `softplus` (YOLOv5/v8): Outputs silently collapse to `0.0` at $x > 10.4$ on ANE.
- `logsumexp` (Attention mechanisms): Overflows at $x > 7.63$ for 32 channels. For vocabulary-sized reductions, the threshold drops below $5$.
- `log_softmax` (Classifiers like BERT, GPT, ViT): Softmax probabilities underflow to `0`, causing $\log(0) = -\infty$.
- `logcumsumexp` (CTC decoders): Overflows at $x > 11.09$.
- `mish` (YOLO variants): Inherits the `softplus` overflow limits.
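The reduction thresholds follow from simple arithmetic: summing $N$ terms of $\exp(x)$ exceeds the FP16 maximum once $x > \ln(65504 / N)$, which lands near 7.62 for 32 terms, close to the figure quoted above. The sketch below checks that and contrasts a naive `logsumexp` with the standard max-shift form under emulated FP16 rounding. The function names are hypothetical, the emulation is CPU-only, and the max-shift form is the textbook technique rather than necessarily what the submitted patches implement.

```python
import math
import struct

FP16_MAX = 65504.0


def to_fp16(x: float) -> float:
    """Round to the nearest half-precision value; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def overflow_threshold(n: int) -> float:
    """Uniform input above which the sum of n exp(x) terms exceeds FP16_MAX."""
    return math.log(FP16_MAX / n)


def naive_logsumexp_fp16(xs):
    """log(sum(exp(x))) with every intermediate rounded to FP16."""
    total = 0.0
    for x in xs:
        total = to_fp16(total + to_fp16(math.exp(x)))
    return to_fp16(math.log(total))


def stable_logsumexp_fp16(xs):
    """Max-shift form: m + log(sum(exp(x - m))), so each exp term is <= 1."""
    m = max(xs)
    total = 0.0
    for x in xs:
        total = to_fp16(total + to_fp16(math.exp(x - m)))
    return to_fp16(m + to_fp16(math.log(total)))


print(overflow_threshold(32))      # roughly 7.62
print(overflow_threshold(32000))   # well below 5 for a vocabulary-sized reduction

inputs = [8.0] * 32
print(naive_logsumexp_fp16(inputs))   # the running sum overflows to inf
print(stable_logsumexp_fp16(inputs))  # finite, close to 8 + ln(32)
```

With the shift applied, the largest term inside the sum is exactly 1, so the sum is bounded by the number of terms rather than by the magnitude of the inputs.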
The Immediate Safety Net: Algebraically Equivalent Reformulations
We can bypass these hardware limits entirely by rewriting the operations into mathematically stable forms. For example, rewriting softplus as:
$$\max(x, 0) + \log(1 + \exp(-|x|))$$
Because $-|x| \le 0$, $\exp(-|x|)$ always lies in the interval $(0, 1]$, so the argument of the logarithm stays in $(1, 2]$ and the only potentially large term is $\max(x, 0)$, which is bounded by the input itself. Overflow becomes mathematically impossible in any precision, yielding bit-identical outputs for all valid inputs. While PyTorch AMP traditionally classifies these operations as FP32-only, the ANE has no such fallback, which makes stable decomposition mandatory.
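As a rough illustration of the difference, the sketch below evaluates both forms of softplus with every intermediate rounded to FP16. It is a CPU emulation under my own assumptions (the helper and function names are hypothetical), so it shows the arithmetic overflow itself rather than the exact value the ANE returns.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest half-precision value; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def naive_softplus_fp16(x: float) -> float:
    """log(1 + exp(x)); exp(x) exceeds the FP16 range once x > ~11.09."""
    e = to_fp16(math.exp(x))
    return to_fp16(math.log(to_fp16(1.0 + e)))


def stable_softplus_fp16(x: float) -> float:
    """max(x, 0) + log(1 + exp(-|x|)); the exp term never exceeds 1."""
    e = to_fp16(math.exp(-abs(x)))
    log_term = to_fp16(math.log(to_fp16(1.0 + e)))
    return to_fp16(max(x, 0.0) + log_term)


for x in (0.0, 5.0, 12.0, 100.0):
    print(x, naive_softplus_fp16(x), stable_softplus_fp16(x))
```

In this emulation the naive form returns `inf` for the last two inputs, while the stable form returns values equal to the input, which is the correct softplus value at that magnitude to within FP16 precision.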
Tools & Patches Deployed Today
- The Paper: "Silent Numerical Failures in On-Device ML Converters: A Systematic Audit of FP16 Overflow in Apple Neural Engine Deployment." (Complete vulnerability census, discrepancy pattern analysis, formal proofs, and quantitative evaluation).
- The Tool (`ane-fp16-lint`): A CLI that scans `.mlpackage` files and flags FP16-unsafe operations before you push to production. It detects nine patterns and provides stable alternatives for each.
- The Fixes: We have submitted three Pull Requests to the official `apple/coremltools` repository implementing these stable decompositions, which are currently under review by Apple's Core ML team.
While Core AI introduces great automated stability for new architectures like the 20B AFM 3 Core Advanced, millions of deployed production models still need an immediate safety net.
Full technical paper, proofs, and the linting tool are available on GitHub: github.com/apple-f16-overflow-audit
Looking forward to hearing if anyone else has run into these unexpected discrepancy patterns in production!