Integrate machine learning models into your app using Core ML.

Posts under Core ML tag

47 Posts

Post

Replies

Boosts

Views

Activity

How can I change the output dimensions of a CoreML model in Xcode when the outputs come from a NonMaximumSuppression layer?
After exerting a custom model with nms=True. In Xcode, the outputs show as: confidence: MultiArray (0 × 5) coordinates: MultiArray (0 × 4) I want to set fixed shapes (e.g., 100 × 5, 100 × 4), but Xcode does not allow editing—the shape fields are locked. The model graph shows both outputs come directly from a NonMaximumSuppression layer. Is it possible to set fixed output dimensions for NMS outputs in CoreML?
2
0
509
Mar ’26
SSC 2026 — Will unlisted .mlmodel cause build failure?
Hi, I submitted my Swift Student Challenge 2026 app and I'm worried about a build error I got when testing. I have both PlateClassifier_2.mlmodel and PlateClassifier_2.mlmodelc in my Sources folder. Only the .mlmodelc is listed in my Package.swift resources. When building I got: PlateClassifier_2.mlmodel: No predominant language detected. Set COREML_CODEGEN_LANGUAGE to preferred language. Build failed — 1 error Will judges hit this same error? Does having an unlisted .mlmodel alongside the .mlmodelc cause a hard build failure on other machines too, or is this specific to my setup (Xcode 26.2 beta, building to physical device)? Will this get me instantly disqualified? Any help appreciated.
0
0
588
Mar ’26
reinforcement learning from Apple?
I don't know if these forums are any good for rumors or plans, but does anybody know whether or not Apple plans to release a library for training reinforcement learning? It would be handy, implementing games in Swift, for example, to be able to train the computer players on the same code.
0
0
506
Mar ’26
Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions
Hi, I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline running on local hardware with local LLM inference. No middleware, pure Node.js native. Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon. Three technical questions: How production-ready is mlx-lm's OpenAI-compatible API server for long context generation (32K tokens)? What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra? MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon? GitHub: github.com/trgysvc/AutonomousNativeForge Any guidance appreciated.
0
0
438
Mar ’26
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
0
0
559
Mar ’26
AI framework usage without user session
We are evaluating various AI frameworks to use within our code, and are hoping to use some of the build-in frameworks in macOS including CoreML and Vision. However, we need to use these frameworks in a background process (system extension) that has no user session attached to it. (To be pedantic, we'll be using an XPC service that is spawned by the system extension, but neither would have an associated user session). Saying the daemon-safe frameworks list has not been updated in a while is an understatement, but it's all we have to go on. CoreGraphics isn't even listed--back then it part of ApplicationServices (I think?) and ApplicationServices is a no go. Vision does use CoreGraphics symbols and data types so I have doubts. We do have a POC that uses both frameworks and they seem to function fine but obviously having something official is better. Any Apple engineers that can comment on this?
6
0
1.2k
3w
Does using Vision API offline to label a custom dataset for Core ML training violate DPLA?
Hello everyone, I am currently developing a smart camera app for iOS that recommends optimal zoom and exposure values on-device using a custom Core ML model. I am still waiting for an official response from Apple Support, but I wanted to ask the community if anyone has experience with a similar workflow regarding App Review and the DPLA. Here is my training methodology: I gathered my own proprietary dataset of original landscape photos. I generated multiple variants of these photos with different zoom and exposure settings offline on my Mac. I used the CalculateImageAestheticsScoresRequest (Vision framework) via a local macOS command-line tool to evaluate and score each variant. Based on those scores, I labeled the "best" zoom and exposure parameters for each original photo. I used this labeled dataset to train my own independent neural network using PyTorch, and then converted it to a Core ML model to ship inside my app. Since the app uses my own custom model on-device and does not send any user data to a server, the privacy aspect is clear. However, I am curious if using the output of Apple's Vision API strictly offline to label my own dataset could be interpreted as "reverse engineering" or a violation of the Developer Program License Agreement (DPLA). Has anyone successfully shipped an app using a similar knowledge distillation or automated dataset labeling approach with Apple's APIs? Did you face any pushback during App Review? Any insights or shared experiences would be greatly appreciated!
1
0
355
Apr ’26
How can I change the output dimensions of a CoreML model in Xcode when the outputs come from a NonMaximumSuppression layer?
After exerting a custom model with nms=True. In Xcode, the outputs show as: confidence: MultiArray (0 × 5) coordinates: MultiArray (0 × 4) I want to set fixed shapes (e.g., 100 × 5, 100 × 4), but Xcode does not allow editing—the shape fields are locked. The model graph shows both outputs come directly from a NonMaximumSuppression layer. Is it possible to set fixed output dimensions for NMS outputs in CoreML?
Replies
2
Boosts
0
Views
509
Activity
Mar ’26
SSC 2026 — Will unlisted .mlmodel cause build failure?
Hi, I submitted my Swift Student Challenge 2026 app and I'm worried about a build error I got when testing. I have both PlateClassifier_2.mlmodel and PlateClassifier_2.mlmodelc in my Sources folder. Only the .mlmodelc is listed in my Package.swift resources. When building I got: PlateClassifier_2.mlmodel: No predominant language detected. Set COREML_CODEGEN_LANGUAGE to preferred language. Build failed — 1 error Will judges hit this same error? Does having an unlisted .mlmodel alongside the .mlmodelc cause a hard build failure on other machines too, or is this specific to my setup (Xcode 26.2 beta, building to physical device)? Will this get me instantly disqualified? Any help appreciated.
Replies
0
Boosts
0
Views
588
Activity
Mar ’26
reinforcement learning from Apple?
I don't know if these forums are any good for rumors or plans, but does anybody know whether or not Apple plans to release a library for training reinforcement learning? It would be handy, implementing games in Swift, for example, to be able to train the computer players on the same code.
Replies
0
Boosts
0
Views
506
Activity
Mar ’26
Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions
Hi, I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline running on local hardware with local LLM inference. No middleware, pure Node.js native. Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon. Three technical questions: How production-ready is mlx-lm's OpenAI-compatible API server for long context generation (32K tokens)? What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra? MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon? GitHub: github.com/trgysvc/AutonomousNativeForge Any guidance appreciated.
Replies
0
Boosts
0
Views
438
Activity
Mar ’26
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
Replies
0
Boosts
0
Views
559
Activity
Mar ’26
AI framework usage without user session
We are evaluating various AI frameworks to use within our code, and are hoping to use some of the build-in frameworks in macOS including CoreML and Vision. However, we need to use these frameworks in a background process (system extension) that has no user session attached to it. (To be pedantic, we'll be using an XPC service that is spawned by the system extension, but neither would have an associated user session). Saying the daemon-safe frameworks list has not been updated in a while is an understatement, but it's all we have to go on. CoreGraphics isn't even listed--back then it part of ApplicationServices (I think?) and ApplicationServices is a no go. Vision does use CoreGraphics symbols and data types so I have doubts. We do have a POC that uses both frameworks and they seem to function fine but obviously having something official is better. Any Apple engineers that can comment on this?
Replies
6
Boosts
0
Views
1.2k
Activity
3w
Does using Vision API offline to label a custom dataset for Core ML training violate DPLA?
Hello everyone, I am currently developing a smart camera app for iOS that recommends optimal zoom and exposure values on-device using a custom Core ML model. I am still waiting for an official response from Apple Support, but I wanted to ask the community if anyone has experience with a similar workflow regarding App Review and the DPLA. Here is my training methodology: I gathered my own proprietary dataset of original landscape photos. I generated multiple variants of these photos with different zoom and exposure settings offline on my Mac. I used the CalculateImageAestheticsScoresRequest (Vision framework) via a local macOS command-line tool to evaluate and score each variant. Based on those scores, I labeled the "best" zoom and exposure parameters for each original photo. I used this labeled dataset to train my own independent neural network using PyTorch, and then converted it to a Core ML model to ship inside my app. Since the app uses my own custom model on-device and does not send any user data to a server, the privacy aspect is clear. However, I am curious if using the output of Apple's Vision API strictly offline to label my own dataset could be interpreted as "reverse engineering" or a violation of the Developer Program License Agreement (DPLA). Has anyone successfully shipped an app using a similar knowledge distillation or automated dataset labeling approach with Apple's APIs? Did you face any pushback during App Review? Any insights or shared experiences would be greatly appreciated!
Replies
1
Boosts
0
Views
355
Activity
Apr ’26