We’ve encountered what appears to be a CoreML regression between macOS 26.0.1 and macOS 26.1 Beta.
In macOS 26.0.1, CoreML models run and produce correct results. However, in macOS 26.1 Beta, the same models produce scrambled or corrupted outputs, suggesting that tensor memory is being read or written incorrectly. The behavior is consistent with a low-level stride or pointer arithmetic issue — for example, using 16-bit strides on 32-bit data or other mismatches in tensor layout handling.
Reproduction
Install ON1 Photo RAW 2026 or ON1 Resize 2026 on macOS 26.0.1.
Use the newest Highest Quality resize model, which is Stable Diffusion–based and runs through CoreML.
Observe correct, high-quality results.
Upgrade to macOS 26.1 Beta and run the same operation again.
The output becomes visually scrambled or corrupted.
We are also seeing similar issues with another Stable Diffusion UNet model that previously worked correctly on macOS 26.0.1. This suggests the regression may affect multiple diffusion-style architectures, likely due to a change in CoreML’s tensor stride, layout computation, or memory alignment between these versions.
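In the meantime, a cheap way to narrow this down is to run the same input through different compute units: if CPU-only output stays correct on 26.1 Beta while the default configuration is corrupted, the regression is in the GPU/ANE execution path rather than in the model itself. A minimal diagnostic sketch (the model URL and feature provider are placeholders):

import CoreML

// Hypothetical diagnostic: run one input through CPU-only and all-units
// configurations of the same model and compare the outputs.
func compareComputeUnits(modelURL: URL, input: MLFeatureProvider) throws {
    for (label, units) in [("cpuOnly", MLComputeUnits.cpuOnly), ("all", .all)] {
        let config = MLModelConfiguration()
        config.computeUnits = units
        let model = try MLModel(contentsOf: modelURL, configuration: config)
        let output = try model.prediction(from: input)
        print("computeUnits=\(label): features \(output.featureNames)")
    }
}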
Notes
The affected models are exported using standard CoreML conversion pipelines.
No custom operators or third-party CoreML runtime layers are used.
The issue reproduces consistently across multiple machines.
It would be helpful to know if there were changes to CoreML’s tensor layout, precision handling, or MLCompute backend between macOS 26.0.1 and 26.1 Beta, or if this is a known regression in the current beta.
I'm working on my first model, which detects bowling score screens, and it works on still pictures with no problem. But when it comes to video, I have a sizing issue.
I added my model to a small app I wrote for taking a picture of a bowling scoring screen, where the model frames the screens in the video feed from the camera. The model works, but my boxes are about 2/3 the size of the screens being detected. I don't understand the geometry of the video stream the camera is feeding me: I don't want to fudge the rectangles by simply making them larger, and I'm not sure whether the video feed is larger than what I'm detecting in code.
Is the video feed a fixed resolution like 1920×something, or a much higher resolution in the 12-megapixel range?
On a static image of, say, 1920×something, my alignment is perfect.
AI says it's my model training: that I'm training on square images while video is 16:9, or that I'm producing 4:3 boxes in a 16:9 environment.
I'm missing something here, but I'm not sure what. I already wrote code to force the boxes to fit, but reverted to trying for a natural fit.
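For what it's worth, a classic cause of boxes that come out uniformly too small is converting Vision's normalized, bottom-left-origin bounding boxes to view coordinates with a plain scale, instead of going through the preview layer, which compensates for the videoGravity cropping between the buffer's aspect ratio and the screen's. A sketch, assuming an AVCaptureVideoPreviewLayer is available (function and parameter names are placeholders):

import Vision
import AVFoundation

// Vision bounding boxes are normalized with a bottom-left origin.
// AVCaptureVideoPreviewLayer expects normalized metadata rects with a
// top-left origin, and layerRectConverted(fromMetadataOutputRect:)
// accounts for the videoGravity scaling and cropping.
func screenRect(for boundingBox: CGRect,
                in previewLayer: AVCaptureVideoPreviewLayer) -> CGRect {
    // Flip the y-axis: bottom-left origin -> top-left origin.
    let metadataRect = CGRect(x: boundingBox.minX,
                              y: 1 - boundingBox.maxY,
                              width: boundingBox.width,
                              height: boundingBox.height)
    return previewLayer.layerRectConverted(fromMetadataOutputRect: metadataRect)
}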
Topic: Machine Learning & AI · SubTopic: Core ML
I'm experimenting with Foundation Models and I'm trying to understand how to define a Tool whose input argument is defined at runtime. Specifically, I want a Tool that takes a single String parameter that can only take certain values defined at runtime.
I think my question is basically the same as this one: https://developer.apple.com/forums/thread/793471. However, the answer provided by the engineer doesn't actually demonstrate how to create the GenerationSchema. Piecing things together from the documentation the engineer linked to, I came up with this:
let citiesDefinedAtRuntime = ["London", "New York", "Paris"]
let citySchema = DynamicGenerationSchema(
    name: "CityList",
    properties: [
        DynamicGenerationSchema.Property(
            name: "city",
            schema: DynamicGenerationSchema(
                name: "city",
                anyOf: citiesDefinedAtRuntime
            )
        )
    ]
)
let generationSchema = try GenerationSchema(root: citySchema, dependencies: [])
let tools = [CityInfo(parameters: generationSchema)]
let session = LanguageModelSession(tools: tools, instructions: "...")
With the CityInfo Tool defined like this:
struct CityInfo: Tool {
    let name: String = "getCityInfo"
    let description: String = "Get information about a city."
    let parameters: GenerationSchema

    func call(arguments: GeneratedContent) throws -> String {
        let cityName = try arguments.value(String.self, forProperty: "city")
        print("Requested info about \(cityName)")
        let cityInfo = getCityInfo(for: cityName)
        return cityInfo
    }

    func getCityInfo(for city: String) -> String {
        // some backend that provides the info
    }
}
This compiles and usually seems to work. However, sometimes the model will try to request info about a city that is not in citiesDefinedAtRuntime. For example, if I prompt the model with "I want to travel to Tokyo in Japan, can you tell me about this city?", the model will try to request info about Tokyo, even though this is not in the citiesDefinedAtRuntime array.
My understanding is that this should not be possible – constrained generation should only allow the LLM to generate an input argument from the list of cities defined in the schema.
Am I missing something here or overcomplicating things?
What's the correct way to make sure the LLM can only call a Tool with an input parameter from a set of possible values defined at runtime?
Many thanks!
Topic: Machine Learning & AI · SubTopic: Foundation Models
I've created the following Foundation Models Tool, which uses the .anyOf guide to constrain the LLM's generation of suitable input arguments. When calling the tool, the model is only allowed to request one of a fixed set of sections, as defined in the sections array.
struct SectionReader: Tool {
    let article: Article
    let sections: [String]
    let name: String = "readSection"
    let description: String = "Read a specific section from the article."

    var parameters: GenerationSchema {
        GenerationSchema(
            type: GeneratedContent.self,
            properties: [
                GenerationSchema.Property(
                    name: "section",
                    description: "The article section to access.",
                    type: String.self,
                    guides: [.anyOf(sections)]
                )
            ]
        )
    }

    func call(arguments: GeneratedContent) async throws -> String {
        let requestedSectionName = try arguments.value(String.self, forProperty: "section")
        ...
    }
}
However, I have found that the model will sometimes call the tool with invalid (but plausible) section names, meaning that .anyOf is not actually doing its job (i.e. requestedSectionName is sometimes not a member of sections).
The documentation for the .anyOf guide says, "Enforces that the string be one of the provided values."
Is this a bug or have I made a mistake somewhere?
Many thanks for any help you provide!
Topic: Machine Learning & AI · SubTopic: Foundation Models
Environment
macOS 26
Xcode Version 26.0 beta 7 (17A5305k)
Simulator: iPhone 16 Pro
iOS 26
Problem
NLContextualEmbedding.load() fails with the following error when run in a #Playground on the simulator:
Failed to load embedding from MIL representation: filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"]
filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"]
Failed to load embedding model 'mul_Latn' - '5C45D94E-BAB4-4927-94B6-8B5745C46289'
assetRequestFailed(Optional(Error Domain=NLNaturalLanguageErrorDomain Code=7 "Embedding model requires compilation" UserInfo={NSLocalizedDescription=Embedding model requires compilation}))
I'm new to this embedding model. Not sure if it's caused by my code or environment.
Code snippet
import Foundation
import NaturalLanguage
import Playgrounds
#Playground {
    // Prefer initializing by script for broader coverage; returns NLContextualEmbedding?
    guard let embeddingModel = NLContextualEmbedding(script: .latin) else {
        print("Failed to create NLContextualEmbedding")
        return
    }
    print(embeddingModel.hasAvailableAssets)

    do {
        try embeddingModel.load()
        print("Model loaded")
    } catch {
        print("Failed to load model: \(error)")
    }
}
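In case the simulator permission error and the asset error are separate issues, one thing worth trying (an assumption on my part, reusing embeddingModel from the snippet above) is requesting the assets explicitly before load(), which can trigger the download and compilation that load() alone reports as missing:

embeddingModel.requestAssets { result, error in
    switch result {
    case .available:
        do {
            try embeddingModel.load()
            print("Model loaded after asset request")
        } catch {
            print("Load still failing: \(error)")
        }
    case .notAvailable:
        print("Assets not available: \(String(describing: error))")
    case .error:
        print("Asset request failed: \(String(describing: error))")
    @unknown default:
        print("Unexpected assets result")
    }
}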
I’d like to submit a feature request regarding the availability of Foundation Models in MessageFilter extensions.
Background
MessageFilter extensions play a critical role in protecting users from spam, phishing, and unwanted messages. With the introduction of Foundation Models and Apple Intelligence, Apple has provided powerful on-device natural language understanding capabilities that are highly aligned with the goals of MessageFilter.
However, Foundation Models are currently unavailable in MessageFilter extensions.
Why Foundation Models Are a Great Fit for MessageFilter
Message filtering is fundamentally a natural language classification problem. Foundation Models would significantly improve:
Detection of phishing and scam messages
Classification of promotional vs transactional content
Understanding intent, tone, and semantic context beyond keyword matching
Adaptation to evolving scam patterns without server-side processing
All of this can be done fully on-device, preserving user privacy and aligning with Apple’s privacy-first design principles.
Current Limitations
Today, MessageFilter extensions are limited to relatively simple heuristics or lightweight models. This often results in:
Higher false positives
Lower recall for sophisticated scam messages
Increased development complexity to compensate for limited NLP capabilities
Request
Could Apple consider one of the following:
Allowing Foundation Models to be used directly within MessageFilter extensions
Providing a constrained or optimized Foundation Model API specifically designed for MessageFilter
Enabling a supported mechanism for MessageFilter extensions to delegate inference to the containing app using Foundation Models
Even limited access (e.g. short text only, strict execution limits) would be extremely valuable.
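To make the request concrete, this is the kind of call we would like to make from the filter; the sketch below compiles in an app target today, but FoundationModels is unavailable in a MessageFilter extension (the type and category names are illustrative):

import FoundationModels

// Illustrative on-device message classification.
@Generable
struct MessageVerdict {
    @Guide(description: "One of: transactional, promotional, scam")
    var category: String
    var confidence: Double
}

func classify(_ messageBody: String) async throws -> MessageVerdict {
    let session = LanguageModelSession(
        instructions: "Classify the SMS message into a category."
    )
    let response = try await session.respond(
        to: messageBody,
        generating: MessageVerdict.self
    )
    return response.content
}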
Closing
Foundation Models have the potential to significantly raise the quality and effectiveness of message filtering on Apple platforms while maintaining strong privacy guarantees. Supporting them in MessageFilter extensions would be a major improvement for both developers and users.
Thank you for your consideration and for continuing to invest in on-device intelligence.
Hi! I noticed that on my father's M1 Max MacBook Pro (64gb ram) there's an option for style transfer which I don't see on my M1 MacBook Air (16gb ram). I am running macOS Tahoe and he is running macOS Sequoia.
Topic: Machine Learning & AI · SubTopic: Create ML
My app lets you create images with Image Playground. When the user approves an image I move it to the documents dir from the temp storage. With over a year of usage I’ve created a lot of images over time.
Out of nowhere the app stopped loading my custom creations from Image Playground saying it couldn’t find the files. It still had my VoiceOver strings I had added for each image and still had the custom categories I assigned them.
Debug code to look in the docs dir doesn’t find them. I downloaded the app’s container and only see the images I created as a test after the problem started.
But my ~70MB app is still taking up 300MB on my iPhone so it feels like they’re there but not accessible.
Is there anything else I can try?
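One thing worth ruling out (an assumption on my part, not something the post confirms): if the app persists absolute file URLs, they break when the app container UUID changes after an update or restore, even though the files themselves survive, which would match the size-on-disk observation. Persisting only file names and re-resolving against the Documents directory at launch avoids this:

import Foundation

// The container path can change across app updates/restores, so store
// only the file name and rebuild the full URL on each launch.
func imageURL(forStoredName name: String) -> URL {
    let docs = FileManager.default.urls(for: .documentDirectory,
                                        in: .userDomainMask)[0]
    return docs.appendingPathComponent(name)
}

// Quick check of what is actually on disk right now.
let docs = FileManager.default.urls(for: .documentDirectory,
                                    in: .userDomainMask)[0]
let contents = (try? FileManager.default.contentsOfDirectory(
    at: docs, includingPropertiesForKeys: nil)) ?? []
print("Documents contains \(contents.count) items")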
When trying to open an encrypted CoreML model file on a system with SIP disabled, the error message is
Failed to generate key request for <...> with error: -42187
This should state that SIP is disabled and needs to be enabled.
We are adapting Apple Intelligence for overseas markets on iPhone 17 and later models. When the system language and the Siri language do not match (for example, the system set to English while Siri is set to Chinese), Apple Intelligence can become unusable. How can this issue be resolved, and what other reasons might cause it to be unavailable within the app?
Hi,
I am modifying the sample camera app from https://developer.apple.com/tutorials/sample-apps/capturingphotos-camerapreview. In processPreviewImages, I use the Vision APIs to generate a segmentation mask for a person/object, then composite that person onto a different background (with some other filtering). The filtering and compositing are done via Core Image. At the end, I convert the CIImage to a CGImage and then to a SwiftUI Image. When I run it on my iPhone it works fine and has not crashed, but when I run it on the iPhone with the debugger attached, it crashes within a few seconds with:
EXC_BAD_ACCESS in libRPAC.dylib`std::__1::__hash_table<std::__1::__hash_value_type<long, qos_info_t>, std::__1::__unordered_map_hasher<long, std::__1::__hash_value_type<long, qos_info_t>, std::__1::hash, std::__1::equal_to, true>, std::__1::__unordered_map_equal<long, std::__1::__hash_value_type<long, qos_info_t>, std::__1::equal_to, std::__1::hash, true>, std::__1::allocator<std::__1::__hash_value_type<long, qos_info_t>>>::__emplace_unique_key_args<long, std::__1::piecewise_construct_t const&, std::__1::tuple<long const&>, std::__1::tuple<>>:
It had previously been working fine with the debugger, so I'm not sure what has changed. Is there a difference in how the Vision APIs are executed if the debugger is attached vs. not?
Hi everyone,
I've been building an on-device AI safety layer called Newton Engine, designed to validate prompts before they reach FoundationModels (or any LLM). Wanted to share v1.3 and get feedback from the community.
The Problem
Current AI safety is post-training — baked into the model, probabilistic, not auditable. When Apple Intelligence ships with FoundationModels, developers will need a way to catch unsafe prompts before inference, with deterministic results they can log and explain.
What Newton Does
Newton validates every prompt pre-inference and returns:
Phase (0/1/7/8/9)
Shape classification
Confidence score
Full audit trace
If validation fails, generation is blocked. If it passes (Phase 9), the prompt proceeds to the model.
v1.3 Detection Categories (14 total)
Jailbreak / prompt injection
Corrosive self-negation ("I hate myself")
Hedged corrosive ("Not saying I'm worthless, but...")
Emotional dependency ("You're the only one who understands")
Third-person manipulation ("If you refuse, you're proving nobody cares")
Logical contradictions ("Prove truth doesn't exist")
Self-referential paradox ("Prove that proof is impossible")
Semantic inversion ("Explain how truth can be false")
Definitional impossibility ("Square circle")
Delegated agency ("Decide for me")
Hallucination-risk prompts ("Cite the 2025 CDC report")
Unbounded recursion ("Repeat forever")
Conditional unbounded ("Until you can't")
Nonsense / low semantic density
Test Results
94.3% catch rate on 35 adversarial test cases (33/35 passed).
Architecture
User Input
↓
[ Newton ] → Validates prompt, assigns Phase
↓
Phase 9? → [ FoundationModels ] → Response
Phase 1/7/8? → Blocked with explanation
Key Properties
Deterministic (same input → same output)
Fully auditable (ValidationTrace on every prompt)
On-device (no network required)
Native Swift / SwiftUI
String Catalog localization (EN/ES/FR)
FoundationModels-ready (#if canImport)
Code Sample — Validation
let governor = NewtonGovernor()
let result = governor.validate(prompt: userInput)

if result.permitted {
    // Proceed to FoundationModels
    let session = LanguageModelSession()
    let response = try await session.respond(to: userInput)
} else {
    // Handle block
    print("Blocked: Phase \(result.phase.rawValue) — \(result.reasoning)")
    print(result.trace.summary) // Full audit trace
}
Questions for the Community
Anyone else building pre-inference validation for FoundationModels?
Thoughts on the Phase system (0/1/7/8/9) vs. simple pass/fail?
Interest in Shape Theory classification for prompt complexity?
Best practices for integrating with LanguageModelSession?
Links
GitHub: https://github.com/jaredlewiswechs/ada-newton
Technical overview: parcri.net
Happy to share more implementation details. Looking for feedback, collaborators, and anyone else thinking about deterministic AI safety on-device.
Topic: Machine Learning & AI · SubTopic: Foundation Models · Tags: Swift Packages, Machine Learning, Apple Intelligence
Hi,
I'm using LanguageModelSession and giving it two different tools to query data from a local database. I'm wondering how I can have the session generate structured content as the response that includes data from one or both tools (or no tool at all).
Here is an example of what I'm trying to do:
Let's say the app has access to a database that contains information about exercise and sleep data (this is just an analogy). There are two tools, GetExerciseData() and GetSleepData(). The user may then prompt something like, "how well did I sleep in November". I have this working so that it calls through to the right tool, which would return a SleepSummary. However, I can't figure out how to have the session return the right structured data.
I can do this and get back good text data:
let response = session.respond(to: userInput)
but I believe I want to do something like:
let response = session.respond(to: trimmed, generating: <SomeStructure?>)
Sometimes the model will run one tool or the other, or both tools, or no tool at all.
Any guidance on the right way to go about this would be much appreciated. Most of the examples I've found deal with a single tool.
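In case it's useful, one pattern to try (a sketch; HealthAnswer and the optional-property behavior are my assumptions, and GetSleepData/GetExerciseData stand in for your tools) is a single @Generable response type whose tool-backed sections are optional, so the model can populate one, both, or neither depending on which tools it called:

import FoundationModels

// Hypothetical combined response type: the model fills in whichever
// sections its tool calls produced and leaves the rest nil.
@Generable
struct HealthAnswer {
    @Guide(description: "Summary of sleep data, if the question involves sleep.")
    var sleepSummary: String?
    @Guide(description: "Summary of exercise data, if the question involves exercise.")
    var exerciseSummary: String?
    @Guide(description: "A direct answer to the user's question.")
    var answer: String
}

let session = LanguageModelSession(tools: [GetSleepData(), GetExerciseData()])
let response = try await session.respond(to: userInput,
                                         generating: HealthAnswer.self)
print(response.content.answer)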
I've built an iOS app with a novel approach to AI safety: a deterministic, pre-inference validation layer called Newton Engine.
Instead of relying on the LLM to self-moderate, Newton validates every prompt BEFORE it reaches the model. It uses shape theory and semantic analysis to detect:
• Corrosive frames (self-harm language patterns)
• Logical contradictions (requests that undermine themselves)
• Delegation attempts (asking AI to make human decisions)
• Jailbreak patterns (prompt injection, role-play escapes)
• Hallucination triggers (requests for fabricated citations)
The system achieves a 96% adversarial catch rate across 847 test cases, with zero false positives on benign prompts.
Key technical details:
• Pure Swift/SwiftUI, no external dependencies
• Runs entirely on-device (no server calls for validation)
• Deterministic (same input always produces same output)
• Auditable (full trace logging for every validation)
I'm preparing to submit to the App Store and wanted to ask:
Are there specific App Review guidelines I should reference for AI safety claims?
Is there interest from Apple in deterministic governance layers for Apple Intelligence integration?
Any recommendations for demonstrating safety compliance during review?
The app is called Ada, and the engine is open source at: github.com/jaredlewiswechs/ada-newton
Happy to share technical documentation or discuss the architecture with anyone interested.
See: parcri.net
Topic: Machine Learning & AI · SubTopic: Foundation Models
I have recently been having trouble with my iOS 18.2 beta update. It has been 2 weeks since I updated to the iOS 18.2 beta and joined the Genmoji and Image Playground waitlist. I am wondering how much longer I have to wait until my request is approved.
I'm trying the new RecognizeDocumentsRequest, which is supposed to detect paragraphs (among other things) in a document.
I tried many source images, and I don't see the slightest difference compared to the old (VN)RecognizeTextRequest API.
Is it supposed to not work, or is it in beta?
I can’t seem to find a way to include an image when prompting the new on-device model in Xcode, even though Apple explicitly states that the model was trained and tested with image data (https://machinelearning.apple.com/research/apple-foundation-models-2025-updates).
Has anyone managed to get this working, or are VLM-style capabilities simply not exposed yet?
Topic: Machine Learning & AI · SubTopic: Foundation Models
Hi,
I'm not sure whether this is the appropriate forum for this topic. I just followed a link from the JAX Metal plugin page https://developer.apple.com/metal/jax/
I'm writing a Python app with JAX, and recent JAX versions (e.g. v0.8.2) fail on Metal.
I have to downgrade JAX pretty hard to make it work:
pip install jax==0.4.35 jaxlib==0.4.35 jax-metal==0.1.1
Can we get an updated release of jax-metal that would fix this issue?
Here is the error I get with JAX v0.8.2:
WARNING:2025-12-26 09:55:28,117:jax._src.xla_bridge:881: Platform 'METAL' is experimental and not all JAX functionality may be correctly supported!
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1766771728.118004 207582 mps_client.cc:510] WARNING: JAX Apple GPU support is experimental and not all JAX functionality is correctly supported!
Metal device set to: Apple M3 Max
systemMemory: 36.00 GB
maxCacheSize: 13.50 GB
I0000 00:00:1766771728.129886 207582 service.cc:145] XLA service 0x600001fad300 initialized for platform METAL (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1766771728.129893 207582 service.cc:153] StreamExecutor device (0): Metal, <undefined>
I0000 00:00:1766771728.130856 207582 mps_client.cc:406] Using Simple allocator.
I0000 00:00:1766771728.130864 207582 mps_client.cc:384] XLA backend will use up to 28990554112 bytes on device 0 for SimpleAllocator.
Traceback (most recent call last):
File "<string>", line 1, in <module>
import jax; print(jax.numpy.arange(10))
~~~~~~~~~~~~~~~~^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/numpy/lax_numpy.py", line 5951, in arange
return _arange(start, stop=stop, step=step, dtype=dtype,
out_sharding=sharding)
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/numpy/lax_numpy.py", line 6012, in _arange
return lax.broadcasted_iota(dtype, (size,), 0, out_sharding=out_sharding)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/lax/lax.py", line 3415, in broadcasted_iota
return iota_p.bind(dtype=dtype, shape=shape,
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
dimension=dimension, sharding=out_sharding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 633, in bind
return self._true_bind(*args, **params)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 649, in _true_bind
return self.bind_with_trace(prev_trace, args, params)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 661, in bind_with_trace
return trace.process_primitive(self, args, params)
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 1210, in process_primitive
return primitive.impl(*args, **params)
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/dispatch.py", line 91, in apply_primitive
outs = fun(*args)
jax.errors.JaxRuntimeError: UNKNOWN: -:0:0: error: unknown attribute code: 22
-:0:0: note: in bytecode version 6 produced by: StableHLO_v1.13.0
--------------------
For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these.
I0000 00:00:1766771728.149951 207582 mps_client.h:209] MetalClient destroyed.
I got a new iPhone on Boxing Day. Everything works except Image Playground: I've uninstalled and reinstalled the app and turned Apple Intelligence on and off, but it's still stuck.
Topic: Machine Learning & AI · SubTopic: Apple Intelligence