I watched this year's WWDC25 session "Read documents using the Vision framework". At the end of the video there is a mention of a new DetectHandPoseRequest model for hand pose detection in the Vision API.
I looked at the Apple documentation and I don't see a new revision. Moreover, the name in the video is probably a typo, because there are only DetectHumanHandPoseRequest (Swift) and
VNDetectHumanHandPoseRequest (Objective-C). Notice the missing "Human" prefix in the WWDC video.
The first one only has a revision that was added in iOS 18+:
https://developer.apple.com/documentation/vision/detecthumanhandposerequest/revision-swift.enum/revision1
The second one only has a revision that was added in iOS 14+:
https://developer.apple.com/documentation/vision/vndetecthumanhandposerequestrevision1
I don't see any new revision targeting iOS 26+.
In this online session, you can code along with us as we build generative AI features into a sample app live in Xcode. We'll guide you through implementing core features like basic text generation, as well as advanced topics like guided generation for structured data output, streaming responses for dynamic UI updates, and tool calling to retrieve data or take an action.
Check out these resources to get started:
Download the project files: https://developer.apple.com/events/re...
Explore the code along guide: https://developer.apple.com/events/re...
Join the live Q&A: https://developer.apple.com/videos/pl...
Agenda – All times PDT
10 a.m.: Welcome and Xcode setup
10:15 a.m.: Framework basics, guided generation, and building prompts
11 a.m.: Break
11:10 a.m.: UI streaming, tool calling, and performance optimization
11:50 a.m.: Wrap up
All are welcome to attend the session. To actively code along, you'll need a Mac with Apple silicon that supports Apple Intelligence running the latest release of macOS Tahoe 26 and Xcode 26.
If you have questions after the code along concludes please share a post here in the forums and engage with the community.
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Looking for J - is this a safe place to discuss full apps I've built but not submitted or shared? I have maybe over 100, but had been unaware that any assistance was provided.
Is there a formal process for submitting an app for review to improve the OS, other than during App Store review?
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
Tags:
Design
Developer Tools
iCloud Drive
Xcode
Does anyone know if ExecuTorch is officially supported or has been successfully used on visionOS? If so, are there any specific build instructions, example projects, or potential issues (like sandboxing or memory limitations) to be aware of when integrating it into an Xcode project for the Vision Pro?
While ExecuTorch has support for iOS, I can't find any official documentation or community examples specifically mentioning visionOS.
Thanks.
Hi everyone
I'm currently developing an object detection model that should identify up to seven classes in an image. While I usually do development with plain Python and the ultralytics library, I thought I would give Create ML a shot. The experience is actually very nice, except that the training does not seem to use the ANE or the GPU (MPS) for acceleration.
On https://developer.apple.com/machine-learning/create-ml/ it states: "On-device training Train models blazingly fast right on your Mac while taking advantage of CPU and GPU."
Am I doing something wrong?
I'm running the training on:
Apple M1 Pro 16GB
macOS 26.1 (Tahoe)
Xcode 26.1 (Build version 17B55)
It would be super nice to get some feedback or instructions.
Thank you in advance!
Hi,
I am new to developing on Apple’s platform, yet I want to familiarize myself with Core ML and Core ML Tools. I was watching the WWDC24 video "Bring your machine learning and AI models to Apple silicon" and was trying to follow along. After multiple attempts and much reading of the documentation, I am still unable to get a coherent script running that converts the Mistral model the host used into a valid Core ML model.
Here is a pastebin of what I have currently:
https://pastebin.com/04cVjF1v
If you require the output as well, please let me know.
Is the face and body detection service in the Vision framework a local model or a cloud model?
https://developer.apple.com/documentation/vision
I’m trying to follow Apple’s “WWDC24: Bring your machine learning and AI models to Apple Silicon” session to convert the Mistral-7B-Instruct-v0.2 model into a Core ML package, but I’ve run into a roadblock that I can’t seem to overcome. I’ve uploaded my full conversion script here for reference:
https://pastebin.com/T7Zchzfc
When I run the script, it progresses through tracing and MIL conversion but then fails at the backend_mlprogram stage with this error:
https://pastebin.com/fUdEzzKM
The core of the error is:
ValueError: Op "keyCache_tmp" (op_type: identity) Input x="keyCache" expects list, tensor, or scalar but got state[tensor[1,32,8,2048,128,fp16]]
I’ve registered my KV-cache buffers in a StatefulMistralWrapper subclass of nn.Module, matching the keyCache and valueCache state names in my ct.StateType definitions, but Core ML’s backend pass reports the state tensor as an invalid input. I’m using Core ML Tools 8.3.0 on Python 3.9.6, targeting iOS18, and forcing CPU conversion (MPS wasn’t available). Any pointers on how to satisfy the handle_unused_inputs pass or properly declare/cache state for GQA models in Core ML would be greatly appreciated!
Thanks in advance for your help,
Usman Khan
Topic:
Machine Learning & AI
SubTopic:
Core ML
Tags:
Metal
Metal Performance Shaders
Core ML
tensorflow-metal
Hardware: MacBook Pro M4, Nov 2024
Software: macOS Tahoe 26.0 & Xcode 26.0
Apple Intelligence is activated and the Image Playground macOS app works.
Running the following in Xcode throws ImagePlayground.ImageCreator.Error.creationFailed
Any suggestions on how to make this work?
import Foundation
import ImagePlayground
Task {
    let creator = try await ImageCreator()
    guard let style = creator.availableStyles.first else {
        print("No styles available")
        exit(1)
    }
    let images = creator.images(
        for: [.text("A cat wearing mittens.")],
        style: style,
        limit: 1)
    for try await image in images {
        print("Generated image: \(image)")
    }
    exit(0)
}
RunLoop.main.run()
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
The Foundation Models framework worked perfectly on macOS 26 Beta 2, but starting from Beta 3 and continuing through Beta 6 (latest), I get dyld symbol errors even with the exact code from Apple's documentation.
Environment:
macOS 26.0 Beta 6 (25A5351b)
Xcode 26 Beta 6
M4 Max MacBook Pro
Apple Intelligence enabled and downloaded
Error Details:
dyld[Process]: Symbol not found:
_$s16FoundationModels20LanguageModelSessionC5model10guardrails5tools12instructionsAcA06SystemcD0C_AC10GuardrailsVSayAA4Tool_pGAA12InstructionsVSgtcfC
Referenced from: /path/to/app.debug.dylib
Expected in: /System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels
Code Used (Exact from Documentation):
import FoundationModels
// This worked on Beta 2, crashes on Beta 3+
let model = SystemLanguageModel.default
let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "Hello")
What I've Verified:
FoundationModels.framework exists in /System/Library/Frameworks/
Framework is properly linked in Xcode project
Apple Intelligence is enabled and working
Same code works in older beta versions
Issue persists even with completely fresh Xcode projects
Analysis:
The dyld error suggests the LanguageModelSession(model:) constructor is missing. The symbol shows it's looking for a constructor with parameters (model:guardrails:tools:instructions:), but the documentation still shows the simple (model:) constructor.
Questions:
Has the LanguageModelSession API changed since Beta 2?
Should we now use the constructor with guardrails/tools/instructions parameters?
Is this a known issue with recent betas?
Are there updated code samples for the current API?
Additional Context:
This affects both basic SystemLanguageModel usage AND custom adapter loading. The same dyld symbol errors occur when trying to create SystemLanguageModel(adapter: adapter) as well.
Any guidance on the correct API usage for current betas would be greatly appreciated. The documentation appears to be out of sync with the actual framework implementation.
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Is the face and body detection service in the Vision framework a local model or a cloud model? Is there a performance report?
https://developer.apple.com/documentation/vision
As described in the title, the model I built works fine on iPhone 15 / A16 Bionic, but it does not run on iPhone 16 / A18 and fails with the following error message.
E5RT encountered an STL exception. msg = MILCompilerForANE error: failed to compile ANE model using ANEF. Error=_ANECompiler : ANECCompile() FAILED.
E5RT: MILCompilerForANE error: failed to compile ANE model using ANEF. Error=_ANECompiler : ANECCompile() FAILED (11)
Loading the model consumes 1.5 to 1.6 GB of RAM, after which consumption drops to less than 100 MB on both iPhone 15 and iPhone 16. After that, on iPhone 16 only, the above error appears in the Xcode log, memory consumption surges to 5 to 6 GB, and the system kills the app. It works well only on iPhone 15.
This model is built with Core ML Tools. So far I have tried deployment targets from iOS 16 to iOS 18 and the compute units CPU_AND_NE and ALL, but none of these solved the issue. What kind of fix should I apply?
minimum_deployment_target = ct.target.iOS18
compute_units = ct.ComputeUnit.ALL
compute_precision = ct.precision.FLOAT16
The developer tutorial for visual intelligence indicates that the method to detect and handle taps on a displayed entity from the Search section is via an "OpenIntent" associated with your entity.
However, running this intent executes code from within my app. If I have the perform() method display UI, it always displays UI from within my app.
I noticed that the Google app's integration with visual intelligence behaves differently: tapping on an entity does not take you to the Google app. Instead, a web view is presented sheet-style WITHIN the Visual Intelligence environment (see below).
How is that accomplished?
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
Due to our min iOS version, this is my first time using .xcstrings instead of .strings for AppShortcuts.
When using the migrate .strings to .xcstrings Xcode context menu option, an .xcstrings catalog is produced that, as expected, has each invocation phrase as a separate string key.
However, after compilation, the catalog changes to group all invocation phrases under the first phrase listed for each intent (see attached screenshot). It is possible to hover in blank space on the right and add more translations, but there is no 1:1 key matching requirement to the phrases on the left nor a requirement that there are the same number of keys in one language vs. another. (The lines just happen to align due to my window size.)
What does that mean, practically?
Do all sub-phrases in each language in AppShortcuts.xcstrings get processed during compilation, even if there isn't an equivalent phrase key declared in the AppShortcut (e.g., the ja translation has more phrases than the English)? (That makes some logical sense, as these phrases need not be 1:1 across languages.)
In the AppShortcut declaration, if I delete all but the top invocation phrase, does nothing change with Siri?
Is there something I'm doing incorrectly?
struct WatchShortcuts: AppShortcutsProvider {
    static var appShortcuts: [AppShortcut] {
        AppShortcut(
            intent: QuickAddWaterIntent(),
            phrases: [
                "\(.applicationName) log water",
                "\(.applicationName) log my water",
                "Log water in \(.applicationName)",
                "Log my water in \(.applicationName)",
                "Log a bottle of water in \(.applicationName)",
            ],
            shortTitle: "Log Water",
            systemImageName: "drop.fill"
        )
    }
}
JAX Metal shows 55x slower random number generation compared to NVIDIA CUDA on equivalent workloads. This makes Monte Carlo simulations and scientific computing impractical on Apple Silicon.
Performance Comparison
NVIDIA GPU: 0.475s for 12.6M random elements
M1 Max Metal: 26.3s for same workload
Performance gap: 55x slower
Environment
Apple M1 Max, 64GB RAM, macOS Sequoia Version 15.6.1
JAX 0.4.34, jax-metal latest
Backend: Metal
Reproduction Code
import time
import jax
import jax.numpy as jnp
from jax import random
key = random.PRNGKey(42)
start_time = time.time()
random_array = random.normal(key, (50000, 252))
duration = time.time() - start_time
print(f"Duration: {duration:.3f}s")
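One thing that may be worth ruling out before comparing the two numbers: a single timed call on a JIT-compiling, asynchronously dispatching backend can include one-time compilation and may not wait for the result. The following is only a minimal sketch of a warm-up-then-measure harness, not a claim about where the gap comes from. The names `benchmark`, `sync` and `repeats` are hypothetical and are not part of JAX, and the demo uses a plain CPU workload so it runs without JAX installed.

```python
import statistics
import time


def benchmark(fn, *, sync=lambda result: result, repeats=5):
    """Return (first_call_seconds, median_later_seconds) for fn().

    sync(result) is a hypothetical hook that should block until the
    result has actually been computed; the default does nothing.
    """
    # First call: on a JIT-compiling backend this may include compilation,
    # so it is reported separately and doubles as the warm-up.
    start = time.perf_counter()
    sync(fn())
    first_call = time.perf_counter() - start

    # Later calls: time each one and report the median.
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        sync(fn())
        samples.append(time.perf_counter() - start)
    return first_call, statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; replace with the real computation.
    first, later = benchmark(lambda: sum(i * i for i in range(100_000)))
    print(f"first call: {first:.4f}s, median of later calls: {later:.4f}s")
```

With the reproduction code above, one would pass `fn=lambda: random.normal(key, (50000, 252))` and `sync=lambda x: x.block_until_ready()`, then compare the two reported times on both machines. If the later calls are still far slower on Metal, the gap is not explained by warm-up effects.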
Hi everyone,
I'm trying to use VNDetectTextRectanglesRequest to detect text rectangles in an image. Here's my current code:
guard let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
    return
}
let textDetectionRequest = VNDetectTextRectanglesRequest { request, error in
    if let error = error {
        print("Text detection error: \(error)")
        return
    }
    guard let observations = request.results as? [VNTextObservation] else {
        print("No text rectangles detected.")
        return
    }
    print("Detected \(observations.count) text rectangles.")
    for observation in observations {
        print(observation.boundingBox)
    }
}
textDetectionRequest.revision = VNDetectTextRectanglesRequestRevision1
textDetectionRequest.reportCharacterBoxes = true
let handler = VNImageRequestHandler(cgImage: cgImage, orientation: .up, options: [:])
do {
    try handler.perform([textDetectionRequest])
} catch {
    print("Vision request error: \(error)")
}
The request completes without error, but no text rectangles are detected — the observations array is empty (count = 0). Here's a sample image I'm testing with:
I expected VNTextObservation results, but I'm not getting any. Is there something I'm missing in how this API works? Or could it be a limitation of this request or revision?
Thanks for any help!
I've created a "Transfer Learning BERT Embeddings" model with the default "Latin" language family and "Automatic" Language setting. This model performs exceptionally well against the test data set and functions as expected when I preview it in Create ML. However, when I add it to the Xcode project of the application to which I am deploying it, I am getting runtime errors that suggest it can't find the embedding resources:
Failed to locate assets for 'mul_Latn' - '5C45D94E-BAB4-4927-94B6-8B5745C46289' embedding model
Note, I am adding the model to the app project the same way that I added an earlier "Maximum Entropy" model. That model had no runtime issues. So it seems there is an issue getting hold of the embeddings at runtime.
For now, "runtime" means in the Simulator. I intend to deploy my application to iOS devices once GM 26 is released (the app also uses AFM).
I'm developing on Tahoe 26 beta, running on iOS 26 beta, using Xcode 26 beta.
Is this a known/expected issue? Are the embeddings expected to be a resource in the model? Is there a workaround?
I did try opening the model in Xcode and saving it as an mlpackage, then adding that to my app project, but that also didn't resolve the issue.
Hi, I'm currently creating a model to identify car plates (object detection). I use asitop to monitor my MacBook Pro, and I see that only the CPU is used for training. I'd like to know why.
WWDC25: Combine Metal 4 machine learning and graphics
This session demonstrated a way to run a neural network in the graphics pipeline directly from shaders, using texture compression as the example. However, it does not mention which ML technique is used to compress the texture.
Can anyone point me to some well-known models for the use case shown in this WWDC25 session?
Hello Apple Developer Community,
I'm investigating Core ML model loading behavior and noticed that even when the compiled model path remains unchanged after an app update, the first run still triggers an "uncached load". This seems to hurt the user experience with unnecessary delays.
Question: Does Core ML provide any public API to check whether a compiled model (from a specific .mlmodelc path) is already cached by the system?
If such an API exists, we'd like to use it in our pre-loading decision logic and only perform a background pre-load when the model isn't cached.
Has anyone encountered similar scenarios or found official solutions? Any insights would be greatly appreciated!