The What’s New in Create ML session at WWDC24 went into great depth on time-series forecasting models (beginning at 15:14) and mentioned these new models, capabilities, and tools for iOS 18. So far, all I can find is API documentation. I don’t see any other WWDC24 session covering these new time-series forecasting features in Create ML.
Is there more substantial documentation on how to use these with Create ML? Maybe I am looking in the wrong place, but I am fairly new to ML.
Is there any food truck / donut shop demo or sample code like in the video?
I would like to get ahead of the curve on this for business applications that could apply it to inventory and ordering data.
Hi,
I'm trying to use the new RecognizeDocumentsRequest from the Vision Framework to read a receipt. It looks very promising by being able to read paragraphs, lines and detect data. So far it unfortunately seems to read every line on the receipt as a paragraph and when there is more space on one line it creates two paragraphs.
Is there perhaps an Apple Engineer who knows if this is expected behaviour or if I should file a Feedback for this?
Code setup:
let request = RecognizeDocumentsRequest()
let observations = try await request.perform(on: image)
guard let document = observations.first?.document else {
return
}
for paragraph in document.paragraphs {
print(paragraph.transcript)
for data in paragraph.detectedData {
switch data.match.details {
case .phoneNumber(let data):
print("Phone: \(data)")
case .postalAddress(let data):
print("Postal: \(data)")
case .calendarEvent(let data):
print("Calendar: \(data)")
case .moneyAmount(let data):
print("Money: \(data)")
case .measurement(let data):
print("Measurement: \(data)")
default:
continue
}
}
}
See the attached image for an example of a receipt I'd like to parse. The top 3 lines are the name, street, and postal code + city. These are all separate paragraphs. Checking detectedData does identify the street (2nd line) as a PostalAddress, but not the complete address. Could that be location-related, since it's a Dutch address?
Lower on the receipt, it also treats the block with "Pomp 1 95 Ongelood" and the lines below it as separate paragraphs, first picking up the left side and then the right side. So the output is something like this:
*
Pomp 1
Volume
Prijs
€
TOTAAL
*
BTW
Netto
21.00 %
95 Ongelood
41,90 l
1.949/ 1
81.66
€
14.17
67.49
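One possible workaround while the paragraph grouping behaves this way is to regroup the recognized fragments by vertical position, so that a left-column label and its right-column value end up on the same row. The following is a minimal sketch in plain Swift, not Vision API: the `Fragment` type, its normalized `x`/`y` fields, the `tolerance` value, and the sample coordinates are all hypothetical stand-ins for whatever bounding-box data the observations expose.

```swift
// Hypothetical stand-in for one recognized text fragment and its position.
// `x` and `y` are assumed to be normalized (0...1), with y measured from the top.
struct Fragment {
    let text: String
    let x: Double
    let y: Double
}

// Groups fragments whose vertical position lies within `tolerance` of the
// first fragment in the current row, then orders each row left to right.
func groupIntoRows(_ fragments: [Fragment], tolerance: Double = 0.01) -> [[String]] {
    let sorted = fragments.sorted { $0.y < $1.y }
    var rows: [[Fragment]] = []
    for fragment in sorted {
        if let anchor = rows.last?.first, abs(fragment.y - anchor.y) <= tolerance {
            rows[rows.count - 1].append(fragment)
        } else {
            rows.append([fragment])
        }
    }
    return rows.map { row in row.sorted { $0.x < $1.x }.map(\.text) }
}

// Made-up coordinates for four of the fragments listed above.
let sample = [
    Fragment(text: "Pomp 1", x: 0.05, y: 0.40),
    Fragment(text: "Volume", x: 0.05, y: 0.45),
    Fragment(text: "95 Ongelood", x: 0.60, y: 0.401),
    Fragment(text: "41,90 l", x: 0.60, y: 0.452),
]
let rows = groupIntoRows(sample)
for row in rows { print(row.joined(separator: "  ")) }
```

The tolerance would need tuning against real receipts, and skewed photos would likely need a rotation correction first.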
I'm testing Foundation Models on my iPad Pro (5th gen) running iOS 26. Since late this morning, I can no longer load SystemLanguageModel.default. I'm not doing anything unusual; something as basic as the code below only ever reports unavailable, specifically with the reason modelNotReady.
let model = SystemLanguageModel.default
...
switch model.availability {
case .available:
print("LM available")
case .unavailable(let reason):
print("unavailable reason: ", String(describing: reason))
}
I also ran the FoundationModelsTripPlanner app, same thing. It was working yesterday, I have not modified that project either.
Why is the Model not ready? How do I fix this? Yes, I tried restarting both my laptop and iPad, no luck.
Topic: Machine Learning & AI
SubTopic: Foundation Models
I'm playing with the new Vision API for iOS 18, specifically the new CalculateImageAestheticsScoresRequest API.
When I try to perform the image observation request I get this error:
internalError("Error Domain=NSOSStatusErrorDomain Code=-1 \"Failed to create espresso context.\" UserInfo={NSLocalizedDescription=Failed to create espresso context.}")
The code is pretty straightforward:
if let image = image {
let request = CalculateImageAestheticsScoresRequest()
Task {
do {
let cgImg = image.cgImage!
let observations = try await request.perform(on: cgImg)
let description = observations.description
let score = observations.overallScore
print(description)
print(score)
} catch {
print(error)
}
}
}
I'm running it in the simulator on an M2 Mac.
Is it a bug? What's wrong?
It seems there was an undocumented change that made the Transcript.init(entries: [Transcript.Entry]) initializer private, which broke my application, which relies on (manual) reconstruction of Transcript entries.
It worked fine on beta 1; on beta 2 there's this error:
dyld[72381]: Symbol not found: _$s16FoundationModels10TranscriptV7entriesACSayAC5EntryOG_tcfC
Referenced from: <44342398-591C-3850-9889-87C9458E1440> /Users/mika/experiments/apple-on-device-ai/fm
Expected in: <66A793F6-CB22-3D1D-A560-D1BD5B109B0D> /System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels
Is this part of an API transition? If so, Apple, please update your documentation.
Topic: Machine Learning & AI
SubTopic: Foundation Models
Hi everyone,
I'm a Mac enthusiast experimenting with tensorflow-metal on my Mac Pro (2013). My question is about GPU selection in tensorflow-metal (v0.8.0), which still supports Intel-based Macs, including my machine.
I've noticed that when running TensorFlow with Metal, it automatically selects a GPU, regardless of what I specify using device indices like "gpu:0", "gpu:1", or "gpu:2". I'm wondering if there's a way to manually specify which GPU should be used via an environment variable or another method.
For reference, I’ve tried the example from TensorFlow’s guide on multi-GPU selection: https://www.tensorflow.org/guide/gpu#using_a_single_gpu_on_a_multi-gpu_system
My goal is to explore performance optimizations by using MirroredStrategy in TensorFlow to leverage multiple GPUs: https://www.tensorflow.org/guide/distributed_training#mirroredstrategy
Interestingly, I discovered that the metalcompute Python library (https://pypi.org/project/metalcompute/) lets me use manually selected GPUs on my system, allowing for proper multi-GPU computations. This makes me wonder:
Is there a hidden environment variable or setting that allows manual GPU selection in tensorflow-metal?
Has anyone successfully used MirroredStrategy on multiple GPUs with tensorflow-metal?
Would a bridge between metalcompute and tensorflow-metal be necessary for this use case, or is there a more direct approach?
I’d love to hear if anyone else has experimented with this or has insights on getting finer control over GPU selection. Any thoughts or suggestions would be greatly appreciated!
Thanks!
Hello, I'm using the VideoToolbox super-resolution API on macOS 26 (https://developer.apple.com/documentation/videotoolbox/vtsuperresolutionscalerconfiguration/downloadconfigurationmodel(completionhandler:)?language=objc). It works when called from Swift, but from Objective-C I get an error when downloading the model with downloadConfigurationModelWithCompletionHandler:
[Auto] MA-auto{_failedLockContent} | failure reported by server | error:[com.apple.MobileAssetError.AutoAsset:MissingReference(6111)]
[Auto] MA-auto{_failedLockContent} | failure reported by server | error:[com.apple.MobileAssetError.AutoAsset:UnderlyingError(6107)_1_com.apple.MobileAssetError.Download:47]
Download completion handler called with error: The operation couldn’t be completed. (VTFrameProcessorErrorDomain error -19743.)
I am calling into an app extension from a Safari Web Extension (sendNativeMessage, which in turn results in a call to NSExtensionRequestHandling’s beginRequest). My Safari extension aims to make use of the new foundation models for some of the features it provides.
In my testing, I hit the rate limit by sending 4 requests, waiting 30 seconds between each. This makes the FoundationModels framework (which would otherwise serve my use case perfectly well) unusable in this context, because the model is called in response to user input, and this rate of user input is perfectly plausible in a real world scenario.
The error thrown as a result of the rate limit is "Safety guardrail was triggered after consecutive failures during streaming.", but the system logs in Console.app show the rate limit as the real culprit.
My suggestions:
Please introduce sensible rate limits for app extensions, through an entitlement if need be. Even a limit of one request every couple of seconds would fix the issue for me.
Please document the rate limit.
Please make the thrown error reflect that it is the result of a rate limit and not a generic guardrail violation. IMPORTANT: please indicate in the thrown error when it is safe to try again.
Filed a feedback here: FB18332004
Topic: Machine Learning & AI
SubTopic: Foundation Models
Hi,
I have trained a basic adapter using the adapter training toolkit. I am trying a very basic example of loading it and running inference with it, but am getting the following error:
Passing along InferenceError::inferenceFailed::loadFailed::Error Domain=com.apple.TokenGenerationInference.E5Runner Code=0 "Failed to load model: ANE adapted model load failure: createProgramInstanceWithWeights:modelToken:qos:baseModelIdentifier:owningPid:numWeightFiles:error:: Program load new instance failure (0x170006)." UserInfo={NSLocalizedDescription=Failed to load model: ANE adapted model load failure: createProgramInstanceWithWeights:modelToken:qos:baseModelIdentifier:owningPid:numWeightFiles:error:: Program load new instance failure (0x170006).} in response to ExecuteRequest
Any ideas / direction?
For testing I am including the .fmadapter file inside the app bundle. This is where I load it:
@State private var session: LanguageModelSession? // = LanguageModelSession()
func loadAdapter() async throws {
if let assetURL = Bundle.main.url(forResource: "qasc---afm---4-epochs-adapter", withExtension: "fmadapter") {
print("Asset URL: \(assetURL)")
let adapter = try SystemLanguageModel.Adapter(fileURL: assetURL)
let adaptedModel = SystemLanguageModel(adapter: adapter)
session = LanguageModelSession(model: adaptedModel)
print("Loaded adapter and updated session")
} else {
print("Asset not found in the main bundle.")
}
}
This seems to work fine, as I get to the log "Loaded adapter and updated session". However, when the inference code below runs, I get the aforementioned error:
func sendMessage(_ msg: String) {
self.loading = true
if let session = session {
Task {
do {
let modelResponse = try await session.respond(to: msg)
DispatchQueue.main.async {
self.response = modelResponse.content
self.loading = false
}
} catch {
print("Error: \(error)")
DispatchQueue.main.async {
self.loading = false
}
}
}
}
}
Topic: Machine Learning & AI
SubTopic: Foundation Models
Hello Apple Developer Community,
I'm exploring the integration of Apple Intelligence features into my mobile application and have a couple of questions regarding the current and upcoming API capabilities:
Custom Prompt Support: Is there a way to pass custom prompts to Apple Intelligence to generate specific inferences? For instance, can we provide a unique prompt to the Writing Tools or Image Playground APIs to obtain tailored outputs?
Direct Inference Capabilities: Beyond the predefined functionalities like text rewriting or image generation, does Apple Intelligence offer APIs that allow for more generalized inference tasks based on custom inputs?
I understand that Apple has provided APIs such as Writing Tools, Image Playground, and Genmoji. However, I'm interested in understanding the extent of customization and flexibility these APIs offer, especially concerning custom prompts and generalized inference.
Additionally, are there any plans or timelines for expanding these capabilities, perhaps with the introduction of new SDKs or frameworks that allow deeper integration and customization?
Any insights, documentation links, or experiences shared would be greatly appreciated.
Thank you in advance for your assistance!
Topic: Machine Learning & AI
SubTopic: Apple Intelligence
I have rewatched the WWDC22 session a few times, but I still don't fully understand how to get a .mlmodel file from components.
The banana ripeness example is cool, but what needs to be added to actually output a .mlmodel? Is there full sample code somewhere for this type of modular project?
The code is from https://developer.apple.com/videos/play/wwdc2022/10019:
import CoreImage
import CreateMLComponents
struct ImageRegressor {
static let trainingDataURL = URL(fileURLWithPath: "~/Desktop/bananas")
static let parametersURL = URL(fileURLWithPath: "~/Desktop/parameters")
static func train() async throws -> some Transformer<CIImage, Float> {
let estimator = ImageFeaturePrint()
.appending(LinearRegressor())
// File name example: banana-5.jpg
let data = try AnnotatedFiles(labeledByNamesAt: trainingDataURL, separator: "-", index: 1, type: .image)
.mapFeatures(ImageReader.read)
.mapAnnotations({ Float($0)! })
let (training, validation) = data.randomSplit(by: 0.8)
let transformer = try await estimator.fitted(to: training, validateOn: validation)
try estimator.write(transformer, to: parametersURL)
return transformer
}
}
I have tried running it in a macOS command-line app and in a SwiftUI app, but the most I got as output was a .pkg containing pipeline.json, parameters, optimizer.json, and optimizer.
Hello, I was trying to test out Foundation Models, but it says model assets are unavailable. I got my M1 MacBook in China when I was living there. Is this due to a region lock?
Hello
I’m experimenting with Apple’s on‑device language model via the FoundationModels framework in Xcode (using LanguageModelSession in my code). I’d like to confirm a few points:
• Is the language model provided by FoundationModels designed and trained by Apple? Or is it based on an open‑source model?
• Is this on‑device model available on iOS (and iPadOS), or is it limited to macOS?
• When I write code in Xcode, is code completion powered by this same local model? If so, why isn’t the same model available in the left‑hand chat sidebar in Xcode (so that I can use it there instead of relying on ChatGPT)?
• Can I grant this local model access to my personal data (photos, contacts, SMS, emails) so it can answer questions based on that information? If yes, what APIs, permission prompts, and privacy constraints apply?
Thanks
I generate an array of random floats using the code shown below. However, I would like to do this with Double instead of Float. Are there any BNNS random number generators for double values, something like BNNSRandomFillUniformDouble? If not, is there a way I can convert BNNSNDArrayDescriptor from float to double?
import Accelerate
let n = 100_000_000
let result = Array<Float>(unsafeUninitializedCapacity: n) { buffer, initCount in
var descriptor = BNNSNDArrayDescriptor(data: buffer, shape: .vector(n))!
let randomGenerator = BNNSCreateRandomGenerator(BNNSRandomGeneratorMethodAES_CTR, nil)
BNNSRandomFillUniformFloat(randomGenerator, &descriptor, 0, 1)
initCount = n
}
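As a fallback that does not depend on BNNS, here is a minimal sketch of two plain-Swift options: widening an existing `[Float]` to `[Double]`, and filling a `[Double]` directly with the standard library generator. This swaps in `Double.random(in:using:)` in place of the BNNS generator and will likely be slower for 100 million values; the helper name `uniformDoubles` is hypothetical.

```swift
// Hypothetical helper: fills an array with uniform Doubles in [0, 1)
// using the standard library generator rather than BNNS.
func uniformDoubles(count: Int) -> [Double] {
    var generator = SystemRandomNumberGenerator()
    return (0..<count).map { _ in Double.random(in: 0..<1, using: &generator) }
}

let values = uniformDoubles(count: 1_000)
print(values.prefix(3))

// Widening Float results (for example, the BNNS output above) to Double.
// The widened values carry only Float precision.
let floatValues: [Float] = [0.25, 0.5, 0.75]
let widened = floatValues.map { Double($0) }
print(widened)
```

Widening keeps the BNNS generator in the pipeline, while the direct fill gives values with full Double precision.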
Hello!
I have a Swift program that tracks the location of a ball (through the back camera). It seems to be working fine, but the only issue is the run time, particularly in my concatenate, normalize, and argmax functions, which are meant to be a 1-to-1 copy of the PyTorch argmax function and the following Python lines:
imgs = np.concatenate((img, img_prev, img_preprev), axis=2)
imgs = imgs.astype(np.float32)/255.0
imgs = np.rollaxis(imgs, 2, 0)
inp = np.expand_dims(imgs, axis=0) # used to pass into model
However, I need my program to run in real time, and ideally much faster than real time. Below is a rundown of the run times that result from my code:
Starting model inference
Setup took: 0.0 seconds
Resize took: 0.03741896152496338 seconds
Concatenation took: 0.3359949588775635 seconds
Normalization took: 0.9906361103057861 seconds
Model prediction took: 0.3425499200820923 seconds
Argmax took: 28.17007803916931 seconds
Postprocess took: 0.054128050804138184 seconds
Model inference took 29.934185028076172 seconds
Here are the concatenateBuffers, normalizeBuffers, and argmax functions that I use:
func concatenateBuffers(_ buffers: [CVPixelBuffer?]) -> CVPixelBuffer? {
guard buffers.count == 3, let first = buffers[0] else { return nil }
let width = CVPixelBufferGetWidth(first)
let height = CVPixelBufferGetHeight(first)
let targetChannels = 9
var concatenated: CVPixelBuffer?
let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue] as CFDictionary
CVPixelBufferCreate(kCFAllocatorDefault, width, height, kCVPixelFormatType_32BGRA, attrs, &concatenated)
guard let output = concatenated else { return nil }
CVPixelBufferLockBaseAddress(output, [])
defer { CVPixelBufferUnlockBaseAddress(output, []) }
guard let outputData = CVPixelBufferGetBaseAddress(output) else { return nil }
let outputPtr = UnsafeMutablePointer<UInt8>(OpaquePointer(outputData))
// Lock all input buffers at once
buffers.forEach { buffer in
guard let buffer = buffer else { return }
CVPixelBufferLockBaseAddress(buffer, .readOnly)
}
defer {
buffers.forEach { CVPixelBufferUnlockBaseAddress($0!, .readOnly) }
}
// Process each input buffer
for (frameIdx, buffer) in buffers.enumerated() {
guard let buffer = buffer,
let inputData = CVPixelBufferGetBaseAddress(buffer) else { continue }
let inputPtr = UnsafePointer<UInt8>(OpaquePointer(inputData))
let bytesPerRow = CVPixelBufferGetBytesPerRow(buffer)
let totalPixels = width * height
// Process all pixels in one go for this frame
for i in 0..<totalPixels {
let y = i / width
let x = i % width
let inputOffset = y * bytesPerRow + x * 4
let outputOffset = i * targetChannels + frameIdx * 3
// BGR order to match numpy
outputPtr[outputOffset] = inputPtr[inputOffset + 2] // B
outputPtr[outputOffset + 1] = inputPtr[inputOffset + 1] // G
outputPtr[outputOffset + 2] = inputPtr[inputOffset] // R
}
}
return output
}
func normalizeBuffer(_ buffer: CVPixelBuffer?) -> MLMultiArray? {
guard let input = buffer else { return nil }
let width = CVPixelBufferGetWidth(input)
let height = CVPixelBufferGetHeight(input)
let channels = 9
CVPixelBufferLockBaseAddress(input, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(input, .readOnly) }
guard let inputData = CVPixelBufferGetBaseAddress(input) else { return nil }
let shape = [1, NSNumber(value: channels), NSNumber(value: height), NSNumber(value: width)]
guard let output = try? MLMultiArray(shape: shape, dataType: .float32) else { return nil }
let inputPtr = inputData.assumingMemoryBound(to: UInt8.self)
let bytesPerRow = CVPixelBufferGetBytesPerRow(input)
let ptr = UnsafeMutablePointer<Float>(OpaquePointer(output.dataPointer))
let totalSize = width * height
for c in 0..<channels {
for idx in 0..<totalSize {
let h = idx / width
let w = idx % width
let inputIdx = h * bytesPerRow + w * channels + c
ptr[c * totalSize + idx] = Float(inputPtr[inputIdx]) / 255.0
}
}
return output
}
func argmax(_ array: MLMultiArray) -> MLMultiArray? {
let shape = array.shape.map { $0.intValue }
guard shape.count == 3,
shape[0] == 1,
shape[1] == 256,
shape[2] == 230400 else {
return nil
}
guard let output = try? MLMultiArray(shape: [1, NSNumber(value: 230400)], dataType: .int32) else { return nil }
let ptr = UnsafePointer<Float>(OpaquePointer(array.dataPointer))
let outputPtr = UnsafeMutablePointer<Int32>(OpaquePointer(output.dataPointer))
let channelSize = 230400
for pos in 0..<230400 {
var maxValue = -Float.infinity
var maxIndex: Int32 = 0
for channel in 0..<256 {
let value = ptr[channel * channelSize + pos]
if value > maxValue {
maxValue = value
maxIndex = Int32(channel)
}
}
outputPtr[pos] = maxIndex
}
return output
}
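For reference, the NumPy steps quoted earlier (concatenate on the channel axis, scale by 1/255, move channels first) can be fused into a single pass per frame, which avoids the intermediate 9-channel pixel buffer. Note that `concatenateBuffers` above appears to write 9 bytes per pixel into a buffer allocated as 4-byte-per-pixel BGRA, which is another reason to go straight to a float array. The following is a minimal sketch in plain Swift, assuming each frame is already a tightly packed `[UInt8]` in height x width x 3 order with no row padding; the name `packFrames` and the flat `[Float]` output are hypothetical, and copying out of a `CVPixelBuffer` (with its `bytesPerRow` padding and alpha byte) is left out.

```swift
// Hypothetical helper mirroring np.concatenate(axis=2), /255.0, then channels-first.
// Each frame holds height * width * 3 bytes, interleaved per pixel.
// Returns a flat array in [channel][row][column] order with 3 * frames.count channels.
func packFrames(_ frames: [[UInt8]], width: Int, height: Int) -> [Float] {
    let pixels = width * height
    let channelsPerFrame = 3
    var output = [Float](repeating: 0, count: frames.count * channelsPerFrame * pixels)
    for (frameIndex, frame) in frames.enumerated() {
        precondition(frame.count == pixels * channelsPerFrame, "unexpected frame size")
        for c in 0..<channelsPerFrame {
            // Each destination plane is contiguous, so writes are sequential.
            let planeStart = (frameIndex * channelsPerFrame + c) * pixels
            for p in 0..<pixels {
                output[planeStart + p] = Float(frame[p * channelsPerFrame + c]) / 255.0
            }
        }
    }
    return output
}

// Tiny 2 x 1 image, three frames.
let packed = packFrames(
    [[255, 0, 0, 0, 255, 0], [0, 0, 0, 0, 0, 0], [51, 51, 51, 51, 51, 51]],
    width: 2, height: 1)
print(packed.count)
```

The resulting array could then be copied into an `MLMultiArray` of shape [1, 9, height, width] in one step.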
Are there any glaring inefficiencies that can be reduced to allow faster-than-real-time processing while following exactly the same logic as the Python code? Would using Objective-C speed things up for some reason? Are there any tools I can use so I don't have to write these functions myself?
Additionally, in the class's init function, I tried to check the compute units being used, since I feel 0.34 seconds for a single model prediction is also far too long, but no print statements are showing for some reason:
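One likely contributor to the argmax time is memory access order: the inner loop above reads `ptr[channel * channelSize + pos]`, jumping 230400 floats on every iteration. A minimal sketch of the same per-position argmax with the loops swapped, so memory is walked sequentially while a running best is kept per position, follows. It assumes the same contiguous channel-major layout as the original and uses a plain `[Float]` in place of `MLMultiArray`; the function name is hypothetical and the sketch has not been benchmarked.

```swift
// Sketch: per-position argmax over a flat, channel-major buffer
// (values[channel * positions + pos]), reading memory sequentially.
func argmaxOverChannels(_ values: [Float], channels: Int, positions: Int) -> [Int32] {
    precondition(values.count == channels * positions, "unexpected buffer size")
    var best = [Float](repeating: -.infinity, count: positions)
    var bestIndex = [Int32](repeating: 0, count: positions)
    values.withUnsafeBufferPointer { src in
        for channel in 0..<channels {
            let base = channel * positions
            for pos in 0..<positions {
                let value = src[base + pos]
                // Strict comparison keeps the lowest channel on ties,
                // as in the original loop.
                if value > best[pos] {
                    best[pos] = value
                    bestIndex[pos] = Int32(channel)
                }
            }
        }
    }
    return bestIndex
}

// 3 channels x 4 positions.
let scores: [Float] = [
    0.1, 0.9, 0.3, 0.0,   // channel 0
    0.5, 0.2, 0.3, 0.7,   // channel 1
    0.4, 0.1, 0.8, 0.7,   // channel 2
]
let winners = argmaxOverChannels(scores, channels: 3, positions: 4)
print(winners)
```

It may also be worth confirming that the timings come from an optimized build, since unoptimized Swift loops of this kind can be far slower than release builds.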
init() {
guard let loadedModel = try? BallTrackerModel() else {
fatalError("Could not load model")
}
let config = MLModelConfiguration()
config.computeUnits = .all
guard let configuredModel = try? BallTrackerModel(configuration: config) else {
fatalError("Could not configure model")
}
self.model = configuredModel
print("model loaded with compute units \(config.computeUnits.rawValue)")
}
Thanks!
Bear with me, please. Please make sure a highly skilled technical person reads and understands this.
I want to describe my vision for (AI/Algorithmically) Optimised Operating Systems. To explain it properly, I will describe the process to build it (pseudo).
Required Knowledge (no particular order): Processor Logic Circuits, LLM models, LLM tool usage, Python OO coding, Procedural vs OO, NLP fuzzy matching, benchmarking, canvas/artefacts/dynamic HTML interfaces, concepts of how AI models are vastly compressed and miniaturised forms of full data, Algorithmic vs AI.
First, take all OO Python code (example) on GitHub (example), then separate each function from each object into its own procedure (procedural logic) by making a logical procedural list of actions to perform only that function based on its entire dependency chain (i.e. all other objects it relies on). Relate all compiled functions using (for example) fuzzy matching on the name, or AI-based functional profiling to get multiple instances of each function.
Starting with the most used function, test each one against the others that perform the same task for bugs and completeness. Determine the fastest, most optimal version of that function (and every function). Add a single instance of each most optimal function to the centralised tool codebase, which will later be utilised by the language models. This ensures we rely only on the most optimised function for each and every use case — with every program using one shared instance of that function instead of compiling it separately.
In this way, we optimise all existing (Python OO) code based on maximum availability of certain resources. Use the optimised code to teach the AI how to map each user task directly to optimised code (i.e. each user-facing function in unoptimised program code to its optimised version). Later, teach the codebase how to optimise these functions directly to the logic circuits on the processor — as an Operating System — using essentially the same process. Eventually, use the most optimised form for tasks via a dynamic canvas/artefacts UI interface generated by the Start Menu using the language model.
Allow custom interfaces to emerge dynamically from the available functions based on language understanding. Grow and optimise the algorithmic codebase until the entire operating system — with versioning — evolves based on the AI-generated user interfaces (compressing them into new versions of the tools codebase).
Please don’t give up until you understand this concept completely and nothing in the description is beyond your full comprehension. It’s a highly significant step forward.
Apple's Image Playground primarily performs image generation on-device, but can use secure Private Cloud Compute for more complex requests that require larger models.
Private Cloud Compute (PCC)
For more complex tasks that require greater computational power than the device can provide, Image Playground leverages Apple's Private Cloud Compute. This system extends the privacy and security of the device to the cloud:
Secure Environment: PCC runs on Apple silicon servers and uses a secure enclave to protect data, ensuring requests are processed in a verified, secure environment.
No Data Storage: Data is never stored or made accessible to Apple when using PCC; it is used only to fulfill the specific request.
Independent Verification: Independent experts are able to inspect the code running on these servers to verify Apple's privacy promises.
Note: I posted this to Feedback Assistant but haven't gotten a response for 3 months =( FB13482199
I am trying to train a large image classifier. I have a training run for ~300000 images. Each image has a folder and the file names within the folders are somewhat random. 381 classes. I am on an M2 Pro, Sonoma 14.0 running CreateML Version 5.0 (121.1). I would prefer not to pursue the pytorch/HF -> coremltools route.
CreateML seems to consistently crash ~25000-30000 images in during the feature extraction phase with "Unexpected Error". It does not seem to be due to an out of memory issue. I am looking for some guidance since it seems impossible to debug why this is consistently crashing.
My initial assumption was that it could be due to blank/corrupt files. I do not think that is the case. I also checked whether there were any special characters in the data/folders. I wasn't able to go through all of them, but did try some programmatic regex checks. I don't think this is the case either.
I attached the sysdiagnose results in feedback assistant after the crash happened. I did notice when going into /var/logs there was some write issue saying that Mac had written too much to disk. Note: I also tried Xcode 15.2-beta this time and the associated CoreML version.
My questions:
How can I fix this?
How should I go about debugging CreateML errors in the future?
'Unexpected Error' - where can I find the exact Create ML logs on my device? This is far too broad an error statement.
Please let me know. As a note, I did successfully train a past model on ~100000 images. I am planning to 10-15x that if this run is successful. Please help, spent a lot of time gathering the extra data and to date have been an occasional power user of createml. Haven't heard back from Apple since December =/. I assume I'm not the only one with this problem, so looking for any instructions to hands on debug and help others. Thx!
Lately I am getting this error.
GenerativeModelsAvailability.Parameters: Initialized with invalid language code: en-GB. Expected to receive two-letter ISO 639 code. e.g. 'zh' or 'en'. Falling back to: en
Does anyone know what this is and how it can be resolved? The error does not crash the app.
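The message itself says a region-qualified identifier (`en-GB`) was passed where a bare two-letter ISO 639 language code was expected, and that the framework fell back to `en`. Whether app code can influence the value that gets passed is not clear from the log. Purely to illustrate the mapping the message describes, here is a minimal sketch; `baseLanguageCode` is a hypothetical helper, not a FoundationModels API.

```swift
// Hypothetical helper: reduces a locale-style identifier such as "en-GB" or
// "zh_Hans_CN" to its leading language subtag, lowercased.
func baseLanguageCode(_ identifier: String) -> String {
    let subtag = identifier.split(whereSeparator: { $0 == "-" || $0 == "_" }).first
    return subtag.map { $0.lowercased() } ?? identifier.lowercased()
}

print(baseLanguageCode("en-GB"))
print(baseLanguageCode("zh"))
```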
I was able to open a new project and play around with the Foundation Models framework, but when I dropped this class into a production app (with a lot of files), I started running into safety guardrail errors for this very small prompt. Specifically, it's "Safety guardrail was triggered after consecutive failures during streaming." Does it have something to do with the size of the app? I don't know what else to try to get it to work.
import FoundationModels
import Playgrounds
@available(iOS 26.0, *)
#Playground {
Task {
do {
let session = LanguageModelSession()
let prompt = "Write a short story about a talking cat."
let response = try await session.respond(to: prompt)
print(response)
} catch {
print("Error: \(error)")
}
}
}
Topic: Machine Learning & AI
SubTopic: Foundation Models