Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Created

A Summary of the WWDC25 Group Lab - Machine Learning and AI Frameworks
At WWDC25 we launched a new type of Lab event for the developer community - Group Labs. A Group Lab is a panel Q&A designed for a large audience of developers. Group Labs are a unique opportunity for the community to submit questions directly to a panel of Apple engineers and designers. Here are the highlights from the WWDC25 Group Lab for Machine Learning and AI Frameworks. What are you most excited about in the Foundation Models framework? The Foundation Models framework provides access to an on-device Large Language Model (LLM), enabling entirely on-device processing for intelligent features. This allows you to build features such as personalized search suggestions and dynamic NPC generation in games. The combination of guided generation and streaming capabilities is particularly exciting for creating delightful animations and features with reliable output. The seamless integration with SwiftUI and the new design material Liquid Glass is also a major advantage. When should I still bring my own LLM via CoreML? It's generally recommended to first explore Apple's built-in system models and APIs, including the Foundation Models framework, as they are highly optimized for Apple devices and cover a wide range of use cases. However, Core ML is still valuable if you need more control or choice over the specific model being deployed, such as customizing existing system models or augmenting prompts. Core ML provides the tools to get these models on-device, but you are responsible for model distribution and updates. Should I migrate PyTorch code to MLX? MLX is an open-source, general-purpose machine learning framework designed for Apple Silicon from the ground up. It offers a familiar API, similar to PyTorch, and supports C, C++, Python, and Swift. MLX emphasizes unified memory, a key feature of Apple Silicon hardware, which can improve performance. It's recommended to try MLX and see if its programming model and features better suit your application's needs. MLX shines when working with state-of-the-art, larger models. Can I test Foundation Models in Xcode simulator or device? Yes, you can use the Xcode simulator to test Foundation Models use cases. However, your Mac must be running macOS Tahoe. You can test on a physical iPhone running iOS 18 by connecting it to your Mac and running Playgrounds or live previews directly on the device. Which on-device models will be supported? any open source models? The Foundation Models framework currently supports Apple's first-party models only. This allows for platform-wide optimizations, improving battery life and reducing latency. While Core ML can be used to integrate open-source models, it's generally recommended to first explore the built-in system models and APIs provided by Apple, including those in the Vision, Natural Language, and Speech frameworks, as they are highly optimized for Apple devices. For frontier models, MLX can run very large models. How often will the Foundational Model be updated? How do we test for stability when the model is updated? The Foundation Model will be updated in sync with operating system updates. You can test your app against new model versions during the beta period by downloading the beta OS and running your app. It is highly recommended to create an "eval set" of golden prompts and responses to evaluate the performance of your features as the model changes or as you tweak your prompts. Report any unsatisfactory or satisfactory cases using Feedback Assistant. Which on-device model/API can I use to extract text data from images such as: nutrition labels, ingredient lists, cashier receipts, etc? Thank you. The Vision framework offers the RecognizeDocumentRequest which is specifically designed for these use cases. It not only recognizes text in images but also provides the structure of the document, such as rows in a receipt or the layout of a nutrition label. It can also identify data like phone numbers, addresses, and prices. What is the context window for the model? What are max tokens in and max tokens out? The context window for the Foundation Model is 4,096 tokens. The split between input and output tokens is flexible. For example, if you input 4,000 tokens, you'll have 96 tokens remaining for the output. The API takes in text, converting it to tokens under the hood. When estimating token count, a good rule of thumb is 3-4 characters per token for languages like English, and 1 character per token for languages like Japanese or Chinese. Handle potential errors gracefully by asking for shorter prompts or starting a new session if the token limit is exceeded. Is there a rate limit for Foundation Models API that is limited by power or temperature condition on the iPhone? Yes, there are rate limits, particularly when your app is in the background. A budget is allocated for background app usage, but exceeding it will result in rate-limiting errors. In the foreground, there is no rate limit unless the device is under heavy load (e.g., camera open, game mode). The system dynamically balances performance, battery life, and thermal conditions, which can affect the token throughput. Use appropriate quality of service settings for your tasks (e.g., background priority for background work) to help the system manage resources effectively. Do the foundation models support languages other than English? Yes, the on-device Foundation Model is multilingual and supports all languages supported by Apple Intelligence. To get the model to output in a specific language, prompt it with instructions indicating the user's preferred language using the locale API (e.g., "The user's preferred language is en-US"). Putting the instructions in English, but then putting the user prompt in the desired output language is a recommended practice. Are larger server-based models available through Foundation Models? No, the Foundation Models API currently only provides access to the on-device Large Language Model at the core of Apple Intelligence. It does not support server-side models. On-device models are preferred for privacy and for performance reasons. Is it possible to run Retrieval-Augmented Generation (RAG) using the Foundation Models framework? Yes, it is possible to run RAG on-device, but the Foundation Models framework does not include a built-in embedding model. You'll need to use a separate database to store vectors and implement nearest neighbor or cosine distance searches. The Natural Language framework offers simple word and sentence embeddings that can be used. Consider using a combination of Foundation Models and Core ML, using Core ML for your embedding model.
1
0
1.5k
Jun ’25
After loading my custom model - unsupportedTokenizer error
In Oct25, using mlx_lm.lora I created an adapter and a fused model uploaded to Huggingface. I was able to incorporate this model into my SwiftUI app using the mlx package. MLX-libraries 2.25.8. My base LLM was mlx-community/Mistral-7B-Instruct-v0.3-4bit. Looking at LLMModelFactory.swift the current version 2.29.1 the only changes are the addition of a few models. The earlier model was called: pharmpk/pk-mistral-7b-v0.3-4bit The new model is called: pharmpk/pk-mistral-2026-03-29 The base model (mlx-community/Mistral-7B-Instruct-v0.3-4bit.) must still be available. Could the error 'unsupportedTokenizer' be related to changes in the mlx package? I noticed mention of splitting the package into two parts but don't see anything at github. Feeling rather lost. Does anone have any thoguths and/or suggestions. Thanks, David
1
0
128
1d
CoreML MLE5ProgramLibrary AOT recompilation hangs/crashes on iOS 26.4 — C++ exception in espresso IR compiler bypasses Swift error handling
Area: CoreML / Machine Learning Describe the issue: On iOS 26.4, calling MLModel(contentsOf:configuration:) to load an .mlpackage model hangs indefinitely and eventually kills the app via watchdog. The same model loads and runs inference successfully in under 1 second on iOS 26.3.1. The hang occurs inside eort_eo_compiler_compile_from_ir_program (espresso) during on-device AOT recompilation triggered by MLE5ProgramLibraryOnDeviceAOTCompilationImpl createProgramLibraryHandleWithRespecialization:error:. A C++ exception (__cxa_throw) is thrown inside libBNNS.dylib during the exception unwind, which then hangs inside __cxxabiv1::dyn_cast_slow and __class_type_info::search_below_dst. Swift's try/catch does not catch this — the exception originates in C++ and the process hangs rather than terminating cleanly. Setting config.computeUnits = .cpuOnly does not resolve the issue. MLE5ProgramLibrary initialises as shared infrastructure regardless of compute units. Steps to reproduce: Create an app with an .mlpackage CoreML model using the MLE5/espresso backend Call MLModel(contentsOf: modelURL, configuration: config) at runtime Run on a device on iOS 26.3.1 — loads successfully in <1 second Update device to iOS 26.4 — hangs indefinitely, app killed by watchdog after 60–745 seconds Expected behaviour: Model loads successfully, or throws a catchable Swift error on failure. Actual behaviour: Process hangs in MLE5ProgramLibrary.lazyInitQueue. App killed by watchdog. No Swift error thrown. Full stack trace at point of hang: Thread 1 Queue: com.apple.coreml.MLE5ProgramLibrary.lazyInitQueue (serial) frame 0: __cxxabiv1::__class_type_info::search_below_dst libc++abi.dylib frame 1: __cxxabiv1::(anonymous namespace)::dyn_cast_slow libc++abi.dylib frame 2: ___lldb_unnamed_symbol_23ab44dd4 libBNNS.dylib frame 23: eort_eo_compiler_compile_from_ir_program espresso frame 24: -[MLE5ProgramLibraryOnDeviceAOTCompilationImpl createProgramLibraryHandleWithRespecialization:error:] CoreML frame 25: -[MLE5ProgramLibrary _programLibraryHandleWithForceRespecialization:error:] CoreML frame 26: __44-[MLE5ProgramLibrary prepareAndReturnError:]_block_invoke CoreML frame 27: _dispatch_client_callout libdispatch.dylib frame 28: _dispatch_lane_barrier_sync_invoke_and_complete libdispatch.dylib frame 29: -[MLE5ProgramLibrary prepareAndReturnError:] CoreML frame 30: -[MLE5Engine initWithContainer:configuration:error:] CoreML frame 31: +[MLE5Engine loadModelFromCompiledArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] CoreML frame 32: +[MLLoader _loadModelWithClass:fromArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] CoreML frame 45: +[MLModel modelWithContentsOfURL:configuration:error:] CoreML frame 46: @nonobjc MLModel.__allocating_init(contentsOf:configuration:) GKPersonalV2 frame 47: MDNA_GaitEncoder_v1_3.__allocating_init(contentsOf:configuration:) frame 48: MDNA_GaitEncoder_v1_3.__allocating_init(configuration:) frame 50: GaitModelInference.loadModel() frame 51: GaitModelInference.init() iOS version: Reproduced on iOS 26.4. Works correctly on iOS 26.3.1. Xcode version: 26.2 Device: iPhone (model used in testing) Model format: .mlpackage
1
0
141
1d
Unable to use FoundationModels in older app?
Hi, I'm trying to add FoundationModels to an older project but always get the following error: "Unable to resolve 'dependency' 'FoundationModels' import FoundationModels" The error comes and goes while its compiling and then doesn't run the app. I have my target set to 26.0 (and can't go any higher) and am using Xcode 26 (17E192). Is anyone else having this issue? Thanks, Dan Uff
1
0
106
1d
iOS 26.4: Regressions in Foundation Models
After installing iOS 26.4 the Foundation Models instruction following and tool calling capabilities have been degraded significantly. The model is not usable anymore. Examples: This works: "Is the car plugged in?" This does not work: "Tell me if the car is plugged in" Anything with the work "frunk" (front trunk) triggers Guardrail Violation. Phrases like "Lock Pride" also trigger Guardrail Violation (Pride is the name of the car). Tool calling only works half the time for really obvious things.
3
0
403
4d
Plenty of LanguageModelSession.GenerationError.refusal errors after 26.4 update
Hello! After the 26.4 update I get a huge number of LanguageModelSession.GenerationError.refusal errors when using guided generation Generables for inexplicable reasons. Such errors also occur, if I want to cast a response to boolean by using 'generating: Bool.self'. The explanation generated on the grounds of the error always looks like this: Response(userPrompt: "", duration: 0.230917542, promptTokenCount: Optional(66), responseTokenCount: Optional(11), feedbackAttachment: nil, content: "I apologize, but I cannot fulfill this request.", rawContent: "I apologize, but I cannot fulfill this request.", transcriptEntries: ArraySlice([])) All the prompts and Generables I use are definitely not profane. Before 26.4 such errors on the same prompts and Generables never occurred. The 26.4 update rendered those features unusable to me. Is this a known bug or what am I doing wrong?
3
0
426
4d
Official One-Click Local LLM Deployment for 2019 Mac Pro (7,1) Dual W6900X
I am a professional user of the 2019 Mac Pro (7,1) with dual AMD Radeon Pro W6900X MPX modules (32GB VRAM each). This hardware is designed for high-performance compute, but it is currently crippled for modern local LLM/AI workloads under Linux due to Apple's EFI/PCIe routing restrictions. Core Issue: rocminfo reports "No HIP GPUs available" when attempting to use ROCm/amdgpu on Linux Apple's custom EFI firmware blocks full initialization of professional GPU compute assets The dual W6900X GPUs have 64GB combined VRAM and high-bandwidth Infinity Fabric Link, but cannot be fully utilized for local AI inference/training My Specific Request: Apple should provide an official, one-click deployable application that enables full utilization of dual W6900X GPUs for local large language model (LLM) inference and training under Linux. This application must: Fully initialize both W6900X GPUs via HIP/ROCm, establishing valid compute contexts Bypass artificial EFI/PCIe routing restrictions that block access to professional GPU resources Provide a stable, user-friendly one-click deployment experience (similar to NVIDIA's AI Enterprise or AMD's ROCm Hub) Why This Matters: The 2019 Mac Pro is Apple's flagship professional workstation, marketed for compute-intensive workloads. Its high-cost W6900X GPUs should not be locked down for modern AI/LLM use cases. An official one-click deployment solution would demonstrate Apple's commitment to professional AI and unlock significant value for professional users. I look forward to Apple's response and a clear roadmap for enabling this critical capability. #MacPro #Linux #ROCm #LocalLLM #W6900X #CoreML
0
0
113
5d
Request: Official One-Click Local LLM Deployment for 2019 Mac Pro (7,1) Dual W6900X
I am a professional user of the 2019 Mac Pro (7,1) with dual AMD Radeon Pro W6900X MPX modules (32GB VRAM each). This hardware is designed for high-performance compute, but it is currently crippled for modern local LLM/AI workloads under Linux due to Apple's EFI/PCIe routing restrictions. Core Issue: rocminfo reports "No HIP GPUs available" when attempting to use ROCm/amdgpu on Linux Apple's custom EFI firmware blocks full initialization of professional GPU compute assets The dual W6900X GPUs have 64GB combined VRAM and high-bandwidth Infinity Fabric Link, but cannot be fully utilized for local AI inference/training My Specific Request: Apple should provide an official, one-click deployable application that enables full utilization of dual W6900X GPUs for local large language model (LLM) inference and training under Linux. This application must: Fully initialize both W6900X GPUs via HIP/ROCm, establishing valid compute contexts Bypass artificial EFI/PCIe routing restrictions that block access to professional GPU resources Provide a stable, user-friendly one-click deployment experience (similar to NVIDIA's AI Enterprise or AMD's ROCm Hub) Why This Matters: The 2019 Mac Pro is Apple's flagship professional workstation, marketed for compute-intensive workloads. Its high-cost W6900X GPUs should not be locked down for modern AI/LLM use cases. An official one-click deployment solution would demonstrate Apple's commitment to professional AI and unlock significant value for professional users. I look forward to Apple's response and a clear roadmap for enabling this critical capability. #MacPro #Linux #ROCm #LocalLLM #W6900X #CoreML
0
0
79
5d
How does ARKit achieve low-latency and stable head tracking using only RGB camera ?
Hi, I’m working on a real-time head/face tracking pipeline using a standard 2D RGB camera, and I’m trying to better understand how ARKit achieves such stable and responsive results in comparable conditions. To clarify upfront: I’m specifically interested in RGB-only tracking and the underlying vision/ML pipeline. I’m not using TrueDepth or any depth/IR-based sensors, and I’d like to understand how similar stability and responsiveness can be achieved under those constraints. In my current setup, I estimate head pose from RGB frames (facial landmarks + PnP) and apply temporal filtering (e.g., exponential smoothing and Kalman filtering). This significantly reduces jitter, but introduces noticeable latency, especially during faster head movements. What stands out in ARKit is that it appears to maintain both: Very low jitter Very low perceived latency even when operating with camera input alone. I’m trying to understand what techniques might contribute to this behavior. In particular: Does ARKit use predictive tracking (e.g., velocity or acceleration-based pose extrapolation) to compensate for camera and processing delays in RGB-only scenarios? Are there recommended strategies for balancing temporal smoothing and responsiveness without introducing visible lag in camera-based pose estimation pipelines? Is the tracking pipeline internally decoupled from rendering (e.g., asynchronous processing with prediction applied at render time)? Are there general best practices for minimizing end-to-end latency in vision-based head tracking systems beyond standard filtering approaches? I understand that implementation details may not be public, but any high-level insights or pointers would be greatly appreciated. Thanks!
0
0
179
6d
AI framework usage without user session
We are evaluating various AI frameworks to use within our code, and are hoping to use some of the build-in frameworks in macOS including CoreML and Vision. However, we need to use these frameworks in a background process (system extension) that has no user session attached to it. (To be pedantic, we'll be using an XPC service that is spawned by the system extension, but neither would have an associated user session). Saying the daemon-safe frameworks list has not been updated in a while is an understatement, but it's all we have to go on. CoreGraphics isn't even listed--back then it part of ApplicationServices (I think?) and ApplicationServices is a no go. Vision does use CoreGraphics symbols and data types so I have doubts. We do have a POC that uses both frameworks and they seem to function fine but obviously having something official is better. Any Apple engineers that can comment on this?
1
0
174
1w
MPS SDPA Attention Kernel Regression on A14-class (M1) in macOS 26.3.1 — Works on A15+ (M2+)
Summary Since macOS 26, our Core ML / MPS inference pipeline produces incorrect results on Mac mini M1 (Macmini9,1, A14-class SoC). The same model and code runs correctly on M2 and newer (A15-class and up). The regression appears to be in the Scaled Dot-Product Attention (SDPA) kernel path in the MPS backend. Environment Affected Mac mini M1 — Macmini9,1 (A14-class) Not affected M2 and newer (A15-class and up) Last known good macOS Sequoia First broken macOS 26 (Tahoe) ? Confirmed broken on macOS 26.3.1 Framework Core ML + MPS backend Language C++ (via CoreML C++ API) Description We ship an audio processing application (VoiceAssist by NoiseWorks) that runs a deep learning model (based on Demucs architecture) via Core ML with the MPS compute unit. On macOS Sequoia this works correctly on all Apple Silicon Macs including M1. After updating to macOS 26 (Tahoe), inference on M1 Macs fails — either producing garbage output or crashing. The same binary, same .mlpackage, same inputs work correctly on M2+. Our Apple contact has suggested the root cause is a regression in the A14-specific MPS SDPA attention kernel, which may have broken when the Metal/MPS stack was updated in macOS 26. The model makes heavy use of attention layers, and the failure correlates precisely with the SDPA path being exercised on A14 hardware. Steps to Reproduce Load a Core ML model that uses Scaled Dot-Product Attention (e.g. a transformer or attention-based audio model) Run inference with MLComputeUnits::cpuAndGPU (MPS active) Run on Mac mini M1 (Macmini9,1) with macOS 26.3.1 Compare output to the same model running on M2 / macOS Sequoia Expected: Correct inference output, consistent with M2+ and macOS Sequoia behavior Actual: Incorrect / corrupted output (or crash), only on A14-class hardware running macOS 26+ Workaround Forcing MLComputeUnits::cpuOnly bypasses MPS entirely and produces correct output on M1, confirming the issue is in the MPS compute path. This is not acceptable as a shipping workaround due to performance impact. Additional Notes The failure is hardware-specific (A14 only) and OS-specific (macOS 26+), pointing to a kernel-level regression rather than a model or app bug We first became aware of this through a customer report Happy to provide a symbolicated crash log if helpful this text was summarized by AI and human verified
1
0
187
1w
Powermetrics GPU power vs system DC power discrepancy on M4 Max
While analyzing system power on an M4 Max under GPU-heavy compute workloads, I noticed that the the GPU power reported by powermetrics does not come anywhere close to total system DC power reported by the SMC counter PDTR (as used by utilities like mactop). For example, in a heavy GPU workload, powermetrics would report a 65W idle-load delta on the GPU, but at the same time system DC power would rise by 179W, leaving 114W or nearly 2/3 of total system DC power on a Mac Studio M4 Max unexplained. From measurements, the difference appears to correlate with the amount of on-chip data movement (for example, varying bytes-per-FLOP in the workload changes the observed gap). Using SMC and IOReport, I was able to reverse engineer an energy model for the GPU that explains almost all of the energy flow with less than 2% error on the workload I studied. The result is a simple two-term energy roofline model: P_GPU (GPU_combined term in the plot) ≈ a * bytes + b * FLOPs with: ~5 pJ/byte for SRAM movement ~2.7 pJ/FLOP for compute. Has anyone observed similar behavior, or is there guidance on how GPU power reported by IOReport/powermetrics should be interpreted relative to total system power? In particular, I’m interested in whether certain classes of GPU activity may not be attributed to the GPU component in IOReport. Full details with the methodology and results are available here: https://youtu.be/HKxIGgyeISM
0
0
60
1w
Programmatic image creation using ImageCreator
Hello, Could you please provide details for maximum string length of the prompt and the title when using ImageCreator and the method extracted(from:title:)? static func extracted( from text: String, title: String? = nil ) -> ImagePlaygroundConcept Any additional details or example of prompt and title would help. Additionally, are ImagePlaygroundStyle.animation, ImagePlaygroundStyle.illustration and ImagePlaygroundStyle.sketch all available when using extracted(from:title:)? I am trying to generate images programmatically and would appreciate your guidance. Thank you.
1
0
384
1w
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
0
0
383
1w
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
0
0
364
1w
Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions
Hi, I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline running on local hardware with local LLM inference. No middleware, pure Node.js native. Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon. Three technical questions: How production-ready is mlx-lm's OpenAI-compatible API server for long context generation (32K tokens)? What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra? MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon? GitHub: github.com/trgysvc/AutonomousNativeForge Any guidance appreciated.
0
0
240
2w
Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application
I am developing a macOS application using SwiftUI (with an iOS version as well). One feature we are exploring is displaying an avatar that reads or speaks dynamically generated text produced by an AI service. The basic flow would be: Text generated by an AI service Text converted to speech using a TTS engine An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech Ideally the avatar would render locally on the device. Questions: What Apple frameworks would be most appropriate for implementing a speaking avatar? SceneKit RealityKit SpriteKit (for 2D avatars) Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks? Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation? If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS? Are there examples of real-time character animation synchronized with speech on macOS/iOS? Any architectural guidance or references would be greatly appreciated.
0
0
527
2w
Provide spoken voice search string
Hello, My goal is to enable users to perform a freeform search request for any product I sell using a spoken phrase, for example, "Hey Siri, search GAMING CONSOLES on MyCatalogApp". The result would launch MyCatalogApp and navigate to a search results page displaying gaming consoles. I have defined a SearchIntent (using the .system.search schema) and a Shortcut to accomplish this. However, Siri doesn't seem to be able to correctly parse the spoken phrase, extract the search string, and provide it as the critiria term within SearchIntent. What am I doing wrong? Here is the SearchIntent. Note the print() statement outputs the search string--which in the scenario above would be "GAMING CONSOLES"--but it doesn't work. import AppIntents @available(iOS 17.2, *) @AppIntent(schema: .system.search) struct SearchIntent: ShowInAppSearchResultsIntent { static var searchScopes: [StringSearchScope] = [.general] @Parameter(title: "Criteria") var criteria: StringSearchCriteria static var title: LocalizedStringResource = "Search with MyCatalogApp" @MainActor func perform() async throws -> some IntentResult { let searchString = criteria.term print("**** Search String: \(searchString) ****") // tmp debugging try await MyCatalogSearchHelper.search(for: searchString) // fetch results via my async fetch API return .result() } } Here's the Shortcuts definition: import AppIntents @available(iOS 17.2, *) struct Shortcuts: AppShortcutsProvider { @AppShortcutsBuilder static var appShortcuts: [AppShortcut] { AppShortcut( intent: SearchIntent(), phrases: ["Search for \(\.$criteria) on \(.applicationName)."], shortTitle: "Search", systemImageName: "magnifyingglass" ) } } Thanks for any help!
1
0
586
2w
New project with new AppIntent throws build error
I opened a new project, iOS app, in XCode and then tabbed into the system_search snippet and built the project and got a build error. I can't imagine this was intended, at least not for new developers to the ecosystem like me. I solved it by tweaking a configuration I don't really understand advised here: https://github.com/apple/swift-openapi-generator/issues/796, hopefully that's a valid workaround
2
0
444
3w
AppIntent search schema opens app as only option
I am trying to use @AppIntent(schema: .system.search) to search in my app via a Siri voice command, but I want to be able to return a .result that does not open the app, yet still get the model training benefits from the schema. Very new to this, this is my first app, so I would appreciate some guidance. I haven't gotten to the voice part, I tested on Shortcuts. Do I need to do AppIntents without the schema and wait until there is a search schema that does not open the app, or should I be using a different schema? What am I missing?
2
0
575
3w
A Summary of the WWDC25 Group Lab - Machine Learning and AI Frameworks
At WWDC25 we launched a new type of Lab event for the developer community - Group Labs. A Group Lab is a panel Q&A designed for a large audience of developers. Group Labs are a unique opportunity for the community to submit questions directly to a panel of Apple engineers and designers. Here are the highlights from the WWDC25 Group Lab for Machine Learning and AI Frameworks. What are you most excited about in the Foundation Models framework? The Foundation Models framework provides access to an on-device Large Language Model (LLM), enabling entirely on-device processing for intelligent features. This allows you to build features such as personalized search suggestions and dynamic NPC generation in games. The combination of guided generation and streaming capabilities is particularly exciting for creating delightful animations and features with reliable output. The seamless integration with SwiftUI and the new design material Liquid Glass is also a major advantage. When should I still bring my own LLM via CoreML? It's generally recommended to first explore Apple's built-in system models and APIs, including the Foundation Models framework, as they are highly optimized for Apple devices and cover a wide range of use cases. However, Core ML is still valuable if you need more control or choice over the specific model being deployed, such as customizing existing system models or augmenting prompts. Core ML provides the tools to get these models on-device, but you are responsible for model distribution and updates. Should I migrate PyTorch code to MLX? MLX is an open-source, general-purpose machine learning framework designed for Apple Silicon from the ground up. It offers a familiar API, similar to PyTorch, and supports C, C++, Python, and Swift. MLX emphasizes unified memory, a key feature of Apple Silicon hardware, which can improve performance. It's recommended to try MLX and see if its programming model and features better suit your application's needs. MLX shines when working with state-of-the-art, larger models. Can I test Foundation Models in Xcode simulator or device? Yes, you can use the Xcode simulator to test Foundation Models use cases. However, your Mac must be running macOS Tahoe. You can test on a physical iPhone running iOS 18 by connecting it to your Mac and running Playgrounds or live previews directly on the device. Which on-device models will be supported? any open source models? The Foundation Models framework currently supports Apple's first-party models only. This allows for platform-wide optimizations, improving battery life and reducing latency. While Core ML can be used to integrate open-source models, it's generally recommended to first explore the built-in system models and APIs provided by Apple, including those in the Vision, Natural Language, and Speech frameworks, as they are highly optimized for Apple devices. For frontier models, MLX can run very large models. How often will the Foundational Model be updated? How do we test for stability when the model is updated? The Foundation Model will be updated in sync with operating system updates. You can test your app against new model versions during the beta period by downloading the beta OS and running your app. It is highly recommended to create an "eval set" of golden prompts and responses to evaluate the performance of your features as the model changes or as you tweak your prompts. Report any unsatisfactory or satisfactory cases using Feedback Assistant. Which on-device model/API can I use to extract text data from images such as: nutrition labels, ingredient lists, cashier receipts, etc? Thank you. The Vision framework offers the RecognizeDocumentRequest which is specifically designed for these use cases. It not only recognizes text in images but also provides the structure of the document, such as rows in a receipt or the layout of a nutrition label. It can also identify data like phone numbers, addresses, and prices. What is the context window for the model? What are max tokens in and max tokens out? The context window for the Foundation Model is 4,096 tokens. The split between input and output tokens is flexible. For example, if you input 4,000 tokens, you'll have 96 tokens remaining for the output. The API takes in text, converting it to tokens under the hood. When estimating token count, a good rule of thumb is 3-4 characters per token for languages like English, and 1 character per token for languages like Japanese or Chinese. Handle potential errors gracefully by asking for shorter prompts or starting a new session if the token limit is exceeded. Is there a rate limit for Foundation Models API that is limited by power or temperature condition on the iPhone? Yes, there are rate limits, particularly when your app is in the background. A budget is allocated for background app usage, but exceeding it will result in rate-limiting errors. In the foreground, there is no rate limit unless the device is under heavy load (e.g., camera open, game mode). The system dynamically balances performance, battery life, and thermal conditions, which can affect the token throughput. Use appropriate quality of service settings for your tasks (e.g., background priority for background work) to help the system manage resources effectively. Do the foundation models support languages other than English? Yes, the on-device Foundation Model is multilingual and supports all languages supported by Apple Intelligence. To get the model to output in a specific language, prompt it with instructions indicating the user's preferred language using the locale API (e.g., "The user's preferred language is en-US"). Putting the instructions in English, but then putting the user prompt in the desired output language is a recommended practice. Are larger server-based models available through Foundation Models? No, the Foundation Models API currently only provides access to the on-device Large Language Model at the core of Apple Intelligence. It does not support server-side models. On-device models are preferred for privacy and for performance reasons. Is it possible to run Retrieval-Augmented Generation (RAG) using the Foundation Models framework? Yes, it is possible to run RAG on-device, but the Foundation Models framework does not include a built-in embedding model. You'll need to use a separate database to store vectors and implement nearest neighbor or cosine distance searches. The Natural Language framework offers simple word and sentence embeddings that can be used. Consider using a combination of Foundation Models and Core ML, using Core ML for your embedding model.
Replies
1
Boosts
0
Views
1.5k
Activity
Jun ’25
After loading my custom model - unsupportedTokenizer error
In Oct25, using mlx_lm.lora I created an adapter and a fused model uploaded to Huggingface. I was able to incorporate this model into my SwiftUI app using the mlx package. MLX-libraries 2.25.8. My base LLM was mlx-community/Mistral-7B-Instruct-v0.3-4bit. Looking at LLMModelFactory.swift the current version 2.29.1 the only changes are the addition of a few models. The earlier model was called: pharmpk/pk-mistral-7b-v0.3-4bit The new model is called: pharmpk/pk-mistral-2026-03-29 The base model (mlx-community/Mistral-7B-Instruct-v0.3-4bit.) must still be available. Could the error 'unsupportedTokenizer' be related to changes in the mlx package? I noticed mention of splitting the package into two parts but don't see anything at github. Feeling rather lost. Does anone have any thoguths and/or suggestions. Thanks, David
Replies
1
Boosts
0
Views
128
Activity
1d
CoreML MLE5ProgramLibrary AOT recompilation hangs/crashes on iOS 26.4 — C++ exception in espresso IR compiler bypasses Swift error handling
Area: CoreML / Machine Learning Describe the issue: On iOS 26.4, calling MLModel(contentsOf:configuration:) to load an .mlpackage model hangs indefinitely and eventually kills the app via watchdog. The same model loads and runs inference successfully in under 1 second on iOS 26.3.1. The hang occurs inside eort_eo_compiler_compile_from_ir_program (espresso) during on-device AOT recompilation triggered by MLE5ProgramLibraryOnDeviceAOTCompilationImpl createProgramLibraryHandleWithRespecialization:error:. A C++ exception (__cxa_throw) is thrown inside libBNNS.dylib during the exception unwind, which then hangs inside __cxxabiv1::dyn_cast_slow and __class_type_info::search_below_dst. Swift's try/catch does not catch this — the exception originates in C++ and the process hangs rather than terminating cleanly. Setting config.computeUnits = .cpuOnly does not resolve the issue. MLE5ProgramLibrary initialises as shared infrastructure regardless of compute units. Steps to reproduce: Create an app with an .mlpackage CoreML model using the MLE5/espresso backend Call MLModel(contentsOf: modelURL, configuration: config) at runtime Run on a device on iOS 26.3.1 — loads successfully in <1 second Update device to iOS 26.4 — hangs indefinitely, app killed by watchdog after 60–745 seconds Expected behaviour: Model loads successfully, or throws a catchable Swift error on failure. Actual behaviour: Process hangs in MLE5ProgramLibrary.lazyInitQueue. App killed by watchdog. No Swift error thrown. Full stack trace at point of hang: Thread 1 Queue: com.apple.coreml.MLE5ProgramLibrary.lazyInitQueue (serial) frame 0: __cxxabiv1::__class_type_info::search_below_dst libc++abi.dylib frame 1: __cxxabiv1::(anonymous namespace)::dyn_cast_slow libc++abi.dylib frame 2: ___lldb_unnamed_symbol_23ab44dd4 libBNNS.dylib frame 23: eort_eo_compiler_compile_from_ir_program espresso frame 24: -[MLE5ProgramLibraryOnDeviceAOTCompilationImpl createProgramLibraryHandleWithRespecialization:error:] CoreML frame 25: -[MLE5ProgramLibrary _programLibraryHandleWithForceRespecialization:error:] CoreML frame 26: __44-[MLE5ProgramLibrary prepareAndReturnError:]_block_invoke CoreML frame 27: _dispatch_client_callout libdispatch.dylib frame 28: _dispatch_lane_barrier_sync_invoke_and_complete libdispatch.dylib frame 29: -[MLE5ProgramLibrary prepareAndReturnError:] CoreML frame 30: -[MLE5Engine initWithContainer:configuration:error:] CoreML frame 31: +[MLE5Engine loadModelFromCompiledArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] CoreML frame 32: +[MLLoader _loadModelWithClass:fromArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] CoreML frame 45: +[MLModel modelWithContentsOfURL:configuration:error:] CoreML frame 46: @nonobjc MLModel.__allocating_init(contentsOf:configuration:) GKPersonalV2 frame 47: MDNA_GaitEncoder_v1_3.__allocating_init(contentsOf:configuration:) frame 48: MDNA_GaitEncoder_v1_3.__allocating_init(configuration:) frame 50: GaitModelInference.loadModel() frame 51: GaitModelInference.init() iOS version: Reproduced on iOS 26.4. Works correctly on iOS 26.3.1. Xcode version: 26.2 Device: iPhone (model used in testing) Model format: .mlpackage
Replies
1
Boosts
0
Views
141
Activity
1d
Unable to use FoundationModels in older app?
Hi, I'm trying to add FoundationModels to an older project but always get the following error: "Unable to resolve 'dependency' 'FoundationModels' import FoundationModels" The error comes and goes while its compiling and then doesn't run the app. I have my target set to 26.0 (and can't go any higher) and am using Xcode 26 (17E192). Is anyone else having this issue? Thanks, Dan Uff
Replies
1
Boosts
0
Views
106
Activity
1d
iOS 26.4: Regressions in Foundation Models
After installing iOS 26.4 the Foundation Models instruction following and tool calling capabilities have been degraded significantly. The model is not usable anymore. Examples: This works: "Is the car plugged in?" This does not work: "Tell me if the car is plugged in" Anything with the work "frunk" (front trunk) triggers Guardrail Violation. Phrases like "Lock Pride" also trigger Guardrail Violation (Pride is the name of the car). Tool calling only works half the time for really obvious things.
Replies
3
Boosts
0
Views
403
Activity
4d
Plenty of LanguageModelSession.GenerationError.refusal errors after 26.4 update
Hello! After the 26.4 update I get a huge number of LanguageModelSession.GenerationError.refusal errors when using guided generation Generables for inexplicable reasons. Such errors also occur, if I want to cast a response to boolean by using 'generating: Bool.self'. The explanation generated on the grounds of the error always looks like this: Response(userPrompt: "", duration: 0.230917542, promptTokenCount: Optional(66), responseTokenCount: Optional(11), feedbackAttachment: nil, content: "I apologize, but I cannot fulfill this request.", rawContent: "I apologize, but I cannot fulfill this request.", transcriptEntries: ArraySlice([])) All the prompts and Generables I use are definitely not profane. Before 26.4 such errors on the same prompts and Generables never occurred. The 26.4 update rendered those features unusable to me. Is this a known bug or what am I doing wrong?
Replies
3
Boosts
0
Views
426
Activity
4d
Official One-Click Local LLM Deployment for 2019 Mac Pro (7,1) Dual W6900X
I am a professional user of the 2019 Mac Pro (7,1) with dual AMD Radeon Pro W6900X MPX modules (32GB VRAM each). This hardware is designed for high-performance compute, but it is currently crippled for modern local LLM/AI workloads under Linux due to Apple's EFI/PCIe routing restrictions. Core Issue: rocminfo reports "No HIP GPUs available" when attempting to use ROCm/amdgpu on Linux Apple's custom EFI firmware blocks full initialization of professional GPU compute assets The dual W6900X GPUs have 64GB combined VRAM and high-bandwidth Infinity Fabric Link, but cannot be fully utilized for local AI inference/training My Specific Request: Apple should provide an official, one-click deployable application that enables full utilization of dual W6900X GPUs for local large language model (LLM) inference and training under Linux. This application must: Fully initialize both W6900X GPUs via HIP/ROCm, establishing valid compute contexts Bypass artificial EFI/PCIe routing restrictions that block access to professional GPU resources Provide a stable, user-friendly one-click deployment experience (similar to NVIDIA's AI Enterprise or AMD's ROCm Hub) Why This Matters: The 2019 Mac Pro is Apple's flagship professional workstation, marketed for compute-intensive workloads. Its high-cost W6900X GPUs should not be locked down for modern AI/LLM use cases. An official one-click deployment solution would demonstrate Apple's commitment to professional AI and unlock significant value for professional users. I look forward to Apple's response and a clear roadmap for enabling this critical capability. #MacPro #Linux #ROCm #LocalLLM #W6900X #CoreML
Replies
0
Boosts
0
Views
113
Activity
5d
Request: Official One-Click Local LLM Deployment for 2019 Mac Pro (7,1) Dual W6900X
I am a professional user of the 2019 Mac Pro (7,1) with dual AMD Radeon Pro W6900X MPX modules (32GB VRAM each). This hardware is designed for high-performance compute, but it is currently crippled for modern local LLM/AI workloads under Linux due to Apple's EFI/PCIe routing restrictions. Core Issue: rocminfo reports "No HIP GPUs available" when attempting to use ROCm/amdgpu on Linux Apple's custom EFI firmware blocks full initialization of professional GPU compute assets The dual W6900X GPUs have 64GB combined VRAM and high-bandwidth Infinity Fabric Link, but cannot be fully utilized for local AI inference/training My Specific Request: Apple should provide an official, one-click deployable application that enables full utilization of dual W6900X GPUs for local large language model (LLM) inference and training under Linux. This application must: Fully initialize both W6900X GPUs via HIP/ROCm, establishing valid compute contexts Bypass artificial EFI/PCIe routing restrictions that block access to professional GPU resources Provide a stable, user-friendly one-click deployment experience (similar to NVIDIA's AI Enterprise or AMD's ROCm Hub) Why This Matters: The 2019 Mac Pro is Apple's flagship professional workstation, marketed for compute-intensive workloads. Its high-cost W6900X GPUs should not be locked down for modern AI/LLM use cases. An official one-click deployment solution would demonstrate Apple's commitment to professional AI and unlock significant value for professional users. I look forward to Apple's response and a clear roadmap for enabling this critical capability. #MacPro #Linux #ROCm #LocalLLM #W6900X #CoreML
Replies
0
Boosts
0
Views
79
Activity
5d
How does ARKit achieve low-latency and stable head tracking using only RGB camera ?
Hi, I’m working on a real-time head/face tracking pipeline using a standard 2D RGB camera, and I’m trying to better understand how ARKit achieves such stable and responsive results in comparable conditions. To clarify upfront: I’m specifically interested in RGB-only tracking and the underlying vision/ML pipeline. I’m not using TrueDepth or any depth/IR-based sensors, and I’d like to understand how similar stability and responsiveness can be achieved under those constraints. In my current setup, I estimate head pose from RGB frames (facial landmarks + PnP) and apply temporal filtering (e.g., exponential smoothing and Kalman filtering). This significantly reduces jitter, but introduces noticeable latency, especially during faster head movements. What stands out in ARKit is that it appears to maintain both: Very low jitter Very low perceived latency even when operating with camera input alone. I’m trying to understand what techniques might contribute to this behavior. In particular: Does ARKit use predictive tracking (e.g., velocity or acceleration-based pose extrapolation) to compensate for camera and processing delays in RGB-only scenarios? Are there recommended strategies for balancing temporal smoothing and responsiveness without introducing visible lag in camera-based pose estimation pipelines? Is the tracking pipeline internally decoupled from rendering (e.g., asynchronous processing with prediction applied at render time)? Are there general best practices for minimizing end-to-end latency in vision-based head tracking systems beyond standard filtering approaches? I understand that implementation details may not be public, but any high-level insights or pointers would be greatly appreciated. Thanks!
Replies
0
Boosts
0
Views
179
Activity
6d
AI framework usage without user session
We are evaluating various AI frameworks to use within our code, and are hoping to use some of the build-in frameworks in macOS including CoreML and Vision. However, we need to use these frameworks in a background process (system extension) that has no user session attached to it. (To be pedantic, we'll be using an XPC service that is spawned by the system extension, but neither would have an associated user session). Saying the daemon-safe frameworks list has not been updated in a while is an understatement, but it's all we have to go on. CoreGraphics isn't even listed--back then it part of ApplicationServices (I think?) and ApplicationServices is a no go. Vision does use CoreGraphics symbols and data types so I have doubts. We do have a POC that uses both frameworks and they seem to function fine but obviously having something official is better. Any Apple engineers that can comment on this?
Replies
1
Boosts
0
Views
174
Activity
1w
MPS SDPA Attention Kernel Regression on A14-class (M1) in macOS 26.3.1 — Works on A15+ (M2+)
Summary Since macOS 26, our Core ML / MPS inference pipeline produces incorrect results on Mac mini M1 (Macmini9,1, A14-class SoC). The same model and code runs correctly on M2 and newer (A15-class and up). The regression appears to be in the Scaled Dot-Product Attention (SDPA) kernel path in the MPS backend. Environment Affected Mac mini M1 — Macmini9,1 (A14-class) Not affected M2 and newer (A15-class and up) Last known good macOS Sequoia First broken macOS 26 (Tahoe) ? Confirmed broken on macOS 26.3.1 Framework Core ML + MPS backend Language C++ (via CoreML C++ API) Description We ship an audio processing application (VoiceAssist by NoiseWorks) that runs a deep learning model (based on Demucs architecture) via Core ML with the MPS compute unit. On macOS Sequoia this works correctly on all Apple Silicon Macs including M1. After updating to macOS 26 (Tahoe), inference on M1 Macs fails — either producing garbage output or crashing. The same binary, same .mlpackage, same inputs work correctly on M2+. Our Apple contact has suggested the root cause is a regression in the A14-specific MPS SDPA attention kernel, which may have broken when the Metal/MPS stack was updated in macOS 26. The model makes heavy use of attention layers, and the failure correlates precisely with the SDPA path being exercised on A14 hardware. Steps to Reproduce Load a Core ML model that uses Scaled Dot-Product Attention (e.g. a transformer or attention-based audio model) Run inference with MLComputeUnits::cpuAndGPU (MPS active) Run on Mac mini M1 (Macmini9,1) with macOS 26.3.1 Compare output to the same model running on M2 / macOS Sequoia Expected: Correct inference output, consistent with M2+ and macOS Sequoia behavior Actual: Incorrect / corrupted output (or crash), only on A14-class hardware running macOS 26+ Workaround Forcing MLComputeUnits::cpuOnly bypasses MPS entirely and produces correct output on M1, confirming the issue is in the MPS compute path. This is not acceptable as a shipping workaround due to performance impact. Additional Notes The failure is hardware-specific (A14 only) and OS-specific (macOS 26+), pointing to a kernel-level regression rather than a model or app bug We first became aware of this through a customer report Happy to provide a symbolicated crash log if helpful this text was summarized by AI and human verified
Replies
1
Boosts
0
Views
187
Activity
1w
Powermetrics GPU power vs system DC power discrepancy on M4 Max
While analyzing system power on an M4 Max under GPU-heavy compute workloads, I noticed that the the GPU power reported by powermetrics does not come anywhere close to total system DC power reported by the SMC counter PDTR (as used by utilities like mactop). For example, in a heavy GPU workload, powermetrics would report a 65W idle-load delta on the GPU, but at the same time system DC power would rise by 179W, leaving 114W or nearly 2/3 of total system DC power on a Mac Studio M4 Max unexplained. From measurements, the difference appears to correlate with the amount of on-chip data movement (for example, varying bytes-per-FLOP in the workload changes the observed gap). Using SMC and IOReport, I was able to reverse engineer an energy model for the GPU that explains almost all of the energy flow with less than 2% error on the workload I studied. The result is a simple two-term energy roofline model: P_GPU (GPU_combined term in the plot) ≈ a * bytes + b * FLOPs with: ~5 pJ/byte for SRAM movement ~2.7 pJ/FLOP for compute. Has anyone observed similar behavior, or is there guidance on how GPU power reported by IOReport/powermetrics should be interpreted relative to total system power? In particular, I’m interested in whether certain classes of GPU activity may not be attributed to the GPU component in IOReport. Full details with the methodology and results are available here: https://youtu.be/HKxIGgyeISM
Replies
0
Boosts
0
Views
60
Activity
1w
Programmatic image creation using ImageCreator
Hello, Could you please provide details for maximum string length of the prompt and the title when using ImageCreator and the method extracted(from:title:)? static func extracted( from text: String, title: String? = nil ) -> ImagePlaygroundConcept Any additional details or example of prompt and title would help. Additionally, are ImagePlaygroundStyle.animation, ImagePlaygroundStyle.illustration and ImagePlaygroundStyle.sketch all available when using extracted(from:title:)? I am trying to generate images programmatically and would appreciate your guidance. Thank you.
Replies
1
Boosts
0
Views
384
Activity
1w
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
Replies
0
Boosts
0
Views
383
Activity
1w
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
Replies
0
Boosts
0
Views
364
Activity
1w
Ideal and Largest RDMA Burst Width
In macOS Tahoe 26.2 an RDMA capability was added for Thunderbolt-5 interfaces. This has been demonstrated to significantly decrease the latency and maintain bandwidth for "clustered" Apple Silicon devices with TB5. What is the ideal and the maximum RDMA burst width for transfers over RDMA-enabled Thunderbolt-5 interfaces?
Replies
4
Boosts
0
Views
321
Activity
1w
Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions
Hi, I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline running on local hardware with local LLM inference. No middleware, pure Node.js native. Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon. Three technical questions: How production-ready is mlx-lm's OpenAI-compatible API server for long context generation (32K tokens)? What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra? MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon? GitHub: github.com/trgysvc/AutonomousNativeForge Any guidance appreciated.
Replies
0
Boosts
0
Views
240
Activity
2w
Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application
I am developing a macOS application using SwiftUI (with an iOS version as well). One feature we are exploring is displaying an avatar that reads or speaks dynamically generated text produced by an AI service. The basic flow would be: Text generated by an AI service Text converted to speech using a TTS engine An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech Ideally the avatar would render locally on the device. Questions: What Apple frameworks would be most appropriate for implementing a speaking avatar? SceneKit RealityKit SpriteKit (for 2D avatars) Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks? Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation? If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS? Are there examples of real-time character animation synchronized with speech on macOS/iOS? Any architectural guidance or references would be greatly appreciated.
Replies
0
Boosts
0
Views
527
Activity
2w
Provide spoken voice search string
Hello, My goal is to enable users to perform a freeform search request for any product I sell using a spoken phrase, for example, "Hey Siri, search GAMING CONSOLES on MyCatalogApp". The result would launch MyCatalogApp and navigate to a search results page displaying gaming consoles. I have defined a SearchIntent (using the .system.search schema) and a Shortcut to accomplish this. However, Siri doesn't seem to be able to correctly parse the spoken phrase, extract the search string, and provide it as the critiria term within SearchIntent. What am I doing wrong? Here is the SearchIntent. Note the print() statement outputs the search string--which in the scenario above would be "GAMING CONSOLES"--but it doesn't work. import AppIntents @available(iOS 17.2, *) @AppIntent(schema: .system.search) struct SearchIntent: ShowInAppSearchResultsIntent { static var searchScopes: [StringSearchScope] = [.general] @Parameter(title: "Criteria") var criteria: StringSearchCriteria static var title: LocalizedStringResource = "Search with MyCatalogApp" @MainActor func perform() async throws -> some IntentResult { let searchString = criteria.term print("**** Search String: \(searchString) ****") // tmp debugging try await MyCatalogSearchHelper.search(for: searchString) // fetch results via my async fetch API return .result() } } Here's the Shortcuts definition: import AppIntents @available(iOS 17.2, *) struct Shortcuts: AppShortcutsProvider { @AppShortcutsBuilder static var appShortcuts: [AppShortcut] { AppShortcut( intent: SearchIntent(), phrases: ["Search for \(\.$criteria) on \(.applicationName)."], shortTitle: "Search", systemImageName: "magnifyingglass" ) } } Thanks for any help!
Replies
1
Boosts
0
Views
586
Activity
2w
New project with new AppIntent throws build error
I opened a new project, iOS app, in XCode and then tabbed into the system_search snippet and built the project and got a build error. I can't imagine this was intended, at least not for new developers to the ecosystem like me. I solved it by tweaking a configuration I don't really understand advised here: https://github.com/apple/swift-openapi-generator/issues/796, hopefully that's a valid workaround
Replies
2
Boosts
0
Views
444
Activity
3w
AppIntent search schema opens app as only option
I am trying to use @AppIntent(schema: .system.search) to search in my app via a Siri voice command, but I want to be able to return a .result that does not open the app, yet still get the model training benefits from the schema. Very new to this, this is my first app, so I would appreciate some guidance. I haven't gotten to the voice part, I tested on Shortcuts. Do I need to do AppIntents without the schema and wait until there is a search schema that does not open the app, or should I be using a different schema? What am I missing?
Replies
2
Boosts
0
Views
575
Activity
3w