General

Explore the power of machine learning within apps. Discuss integrating machine learning features, share best practices, and explore the possibilities for your app.

Post

Replies

Boosts

Views

Activity

RecognizeDocumentsRequest not detecting paragraphs

I'm trying the new RecognizeDocumentsRequest supposed to detect paragraphs (among other things) in a document. I tried many source images, and I don't see the slightest difference compared to the old API (VN)RecognizedTextRequest Is it supposed to not work or is it in beta?

Machine Learning & AI General Vision

436

Jan ’26

Translation Framework: Code 16 "Offline models not available" despite status showing .installed

Hi everyone, I'm experiencing an inconsistent behavior with the Translation framework on iOS 18. The LanguageAvailability.status() API reports language models as .installed, but translation fails with Code 16. Setup: Using translationTask modifier with TranslationSession Batch translation with explicit source/target languages Languages: Portuguese→English, German→English Issue: let status = await LanguageAvailability().status(from: sourceLang, to: targetLang) // Returns: .installed // But translation fails: let responses = try await session.translations(from: requests) // Error: TranslationErrorDomain Code=16 "Offline models not available" Logs: Language model installed: pt -> en Language model installed: de -> en Starting translation: de -> en Error Domain=TranslationErrorDomain Code=16 "Translation failed"NSLocalizedFailureReason=Offline models not available for language pair What I've tried: Re-downloading languages in Settings Using source: nil for auto-detection Fresh TranslationSession.Configuration each time Questions: Is there a way to force model re-validation/re-download programmatically? Should translationTask show download popup when Code 16 occurs? Has anyone found a reliable workaround? I've seen similar reports in threads 791357 and 777113. Any guidance appreciated! Thanks!

Machine Learning & AI General

470

Jan ’26

Core Image for depth maps & segmentation masks: numeric fidelity issues when rendering CIImage to CVPixelBuffer (looking for Architecture suggestions)

Hello All, I’m working on a computer-vision–heavy iOS application that uses the camera, LiDAR depth maps, and semantic segmentation to reason about the environment (object identification, localization and measurement - not just visualization). Current architecture I initially built the image pipeline around CIImage as a unifying abstraction. It seemed like a good idea because: CIImage integrates cleanly with Vision, ARKit, AVFoundation, Metal, Core Graphics, etc. It provides a rich set of out-of-the-box transforms and filters. It is immutable and thread-safe, which significantly simplified concurrency in a multi-queue pipeline. The LiDAR depth maps, semantic segmentation masks, etc. were treated as CIImages, with conversion to CVPixelBuffer or MTLTexture only at the edges when required. Problem I’ve run into cases where Core Image transformations do not preserve numeric fidelity for non-visual data. Example: Rendering a CIImage-backed segmentation mask into a larger CVPixelBuffer can cause label values to change in predictable but incorrect ways. This occurs even when: using nearest-neighbor sampling disabling color management (workingColorSpace / outputColorSpace = NSNull) applying identity or simple affine transforms I’ve confirmed via controlled tests that: Metal → CVPixelBuffer paths preserve values correctly CIImage → CVPixelBuffer paths can introduce value changes when resampling or expanding the render target This makes CIImage unsafe as a source of numeric truth for segmentation masks and depth-based logic, even though it works well for visualization, and I should have realized this much sooner. Direction I’m considering I’m now considering refactoring toward more intent-based abstractions instead of a single image type, for example: Visual images: CIImage (camera frames, overlays, debugging, UI) Scalar fields: depth / confidence maps backed by CVPixelBuffer + Metal Label maps: segmentation masks backed by integer-preserving buffers (no interpolation, no transforms) In this model, CIImage would still be used extensively — but primarily for visualization and perceptual processing, not as the container for numerically sensitive data. Thread safety concern One of the original advantages of CIImage was that it is thread-safe by design, and that was my biggest incentive. For CVPixelBuffer / MTLTexture–backed data, I’m considering enforcing thread safety explicitly via: Swift Concurrency (actor-owned data, explicit ownership) Questions For those may have experience with CV / AR / imaging-heavy iOS apps, I was hoping to know the following: Is this separation of image intent (visual vs numeric vs categorical) a reasonable architectural direction? Do you generally keep CIImage at the heart of your pipeline, or push it to the edges (visualization only)? How do you manage thread safety and ownership when working heavily with CVPixelBuffer and Metal? Using actor-based abstractions, GCD, or adhoc? Are there any best practices or gotchas around using Core Image with depth maps or segmentation masks that I should be aware of? I’d really appreciate any guidance or experience-based advice. I suspect I’ve hit a boundary of Core Image’s design, and I’m trying to refactor in a way that doesn't involve too much immediate tech debt, remains robust and maintainable long-term. Thank you in advance!

Machine Learning & AI General Metal Vision Core Video Core Image

506

Feb ’26

Tensorflow metal: Issue using assign operation on MacBook M4

I get the following error when running this command in a Jupyter notebook: v = tf.Variable(initial_value=tf.random.normal(shape=(3, 1))) v[0, 0].assign(3.) Environment: python == 3.11.14 tensorflow==2.19.1 tensorflow-metal==1.2.0 { "name": "InvalidArgumentError", "message": "Cannot assign a device for operation ResourceStridedSliceAssign: Could not satisfy explicit device specification '/job:localhost/replica:0/task:0/device:GPU:0' because no supported kernel for GPU devices is available.\nColocation Debug Info:\nColocation group had the following types and supported devices: \nRoot Member(assigned_device_name_index_=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]\nResourceStridedSliceAssign: CPU \n_Arg: GPU CPU \n\nColocation members, user-requested devices, and framework assigned devices, if any:\n ref (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0\n ResourceStridedSliceAssign (ResourceStridedSliceAssign) /job:localhost/replica:0/task:0/device:GPU:0\n\nOp: ResourceStridedSliceAssign\n [...] [[{{node ResourceStridedSliceAssign}}]] [Op:ResourceStridedSliceAssign] name: strided_slice/_assign" } It seems like the ResourceStridedSliceAssign operation is not implemented for the GPU

Machine Learning & AI General tensorflow-metal

205

Feb ’26

Apple OCR framework seems to be holding on to allocations every time it is called.

Environment: macOS 26.2 (Tahoe) Xcode 16.3 Apple Silicon (M4) Sandboxed Mac App Store app Description: Repeated use of VNRecognizeTextRequest causes permanent memory growth in the host process. The physical footprint increases by approximately 3-15 MB per OCR call and never returns to baseline, even after all references to the request, handler, observations, and image are released. ` private func selectAndProcessImage() { let panel = NSOpenPanel() panel.allowedContentTypes = [.image] panel.allowsMultipleSelection = false panel.canChooseDirectories = false panel.message = "Select an image for OCR processing" guard panel.runModal() == .OK, let url = panel.url else { return } selectedImageURL = url isProcessing = true recognizedText = "Processing..." // Run OCR on a background thread to keep UI responsive let workItem = DispatchWorkItem { let result = performOCR(on: url) DispatchQueue.main.async { recognizedText = result isProcessing = false } } DispatchQueue.global(qos: .userInitiated).async(execute: workItem) } private func performOCR(on url: URL) -> String { // Wrap EVERYTHING in autoreleasepool so all ObjC objects are drained immediately let resultText: String = autoreleasepool { // Load image and convert to CVPixelBuffer for explicit memory control guard let imageData = try? Data(contentsOf: url) else { return "Error: Could not read image file." } guard let nsImage = NSImage(data: imageData) else { return "Error: Could not create image from file data." } guard let cgImage = nsImage.cgImage(forProposedRect: nil, context: nil, hints: nil) else { return "Error: Could not create CGImage." } let width = cgImage.width let height = cgImage.height // Create a CVPixelBuffer from the CGImage var pixelBuffer: CVPixelBuffer? let attrs: [String: Any] = [ kCVPixelBufferCGImageCompatibilityKey as String: true, kCVPixelBufferCGBitmapContextCompatibilityKey as String: true ] let status = CVPixelBufferCreate( kCFAllocatorDefault, width, height, kCVPixelFormatType_32ARGB, attrs as CFDictionary, &pixelBuffer ) guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return "Error: Could not create CVPixelBuffer (status: \(status))." } // Draw the CGImage into the pixel buffer CVPixelBufferLockBaseAddress(buffer, []) guard let context = CGContext( data: CVPixelBufferGetBaseAddress(buffer), width: width, height: height, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(buffer), space: CGColorSpaceCreateDeviceRGB(), bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue ) else { CVPixelBufferUnlockBaseAddress(buffer, []) return "Error: Could not create CGContext for pixel buffer." } context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height)) CVPixelBufferUnlockBaseAddress(buffer, []) // Run OCR let requestHandler = VNImageRequestHandler(cvPixelBuffer: buffer, options: [:]) let request = VNRecognizeTextRequest() request.recognitionLevel = .accurate request.usesLanguageCorrection = true do { try requestHandler.perform([request]) } catch { return "Error during OCR: \(error.localizedDescription)" } guard let observations = request.results, !observations.isEmpty else { return "No text found in image." } let lines = observations.compactMap { observation in observation.topCandidates(1).first?.string } // Explicitly nil out the pixel buffer before the pool drains pixelBuffer = nil return lines.joined(separator: "\n") } // Everything — Data, NSImage, CGImage, CVPixelBuffer, VN objects — released here return resultText } `

Machine Learning & AI General Vision

195

Feb ’26

Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3

Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3 To: Metal Developer Relations Hello, I am reporting a repeatable numerical saturation point encountered during sustained recursive high-order differential workloads on the Apple M3 (16 GB unified memory) using the JAX Metal backend. Workload Characteristics: Large-scale vector projections across multi-dimensional industrial datasets Repeated high-order finite-difference calculations Heavy use of jax.grad and lax.cond inside long-running loops Observation: Under these conditions, the Metal/MPS backend consistently enters a terminal quantization lock where outputs saturate at a fixed scalar value (2.0000), followed by system-wide NaN propagation. This appears to be a precision-limited boundary in the JAX-Metal bridge when handling high-order operations with cubic time-scale denominators. have identified the specific threshold where recursive high-order tensor derivatives exceed the numerical resolution of 32-bit consumer architectures, necessitating a migration to a dedicated 64-bit industrial stack. I have prepared a minimal synthetic test script (randomized vectors only, no proprietary logic) that reliably reproduces the allocator fragmentation and saturation behavior. Let me know if your team would like the telemetry for XLA/MPS optimization purposes. Best regards, Alex Severson Architect, QuantumPulse AI

Machine Learning & AI General ML Compute Machine Learning tensorflow-metal

373

Mar ’26

reinforcement learning from Apple?

I don't know if these forums are any good for rumors or plans, but does anybody know whether or not Apple plans to release a library for training reinforcement learning? It would be handy, implementing games in Swift, for example, to be able to train the computer players on the same code.

Machine Learning & AI General Core ML

487

Mar ’26

Provide spoken voice search string

Hello, My goal is to enable users to perform a freeform search request for any product I sell using a spoken phrase, for example, "Hey Siri, search GAMING CONSOLES on MyCatalogApp". The result would launch MyCatalogApp and navigate to a search results page displaying gaming consoles. I have defined a SearchIntent (using the .system.search schema) and a Shortcut to accomplish this. However, Siri doesn't seem to be able to correctly parse the spoken phrase, extract the search string, and provide it as the critiria term within SearchIntent. What am I doing wrong? Here is the SearchIntent. Note the print() statement outputs the search string--which in the scenario above would be "GAMING CONSOLES"--but it doesn't work. import AppIntents @available(iOS 17.2, *) @AppIntent(schema: .system.search) struct SearchIntent: ShowInAppSearchResultsIntent { static var searchScopes: [StringSearchScope] = [.general] @Parameter(title: "Criteria") var criteria: StringSearchCriteria static var title: LocalizedStringResource = "Search with MyCatalogApp" @MainActor func perform() async throws -> some IntentResult { let searchString = criteria.term print("**** Search String: \(searchString) ****") // tmp debugging try await MyCatalogSearchHelper.search(for: searchString) // fetch results via my async fetch API return .result() } } Here's the Shortcuts definition: import AppIntents @available(iOS 17.2, *) struct Shortcuts: AppShortcutsProvider { @AppShortcutsBuilder static var appShortcuts: [AppShortcut] { AppShortcut( intent: SearchIntent(), phrases: ["Search for \(\.$criteria) on \(.applicationName)."], shortTitle: "Search", systemImageName: "magnifyingglass" ) } } Thanks for any help!

Machine Learning & AI General Siri and Voice Shortcuts App Intents

798

Mar ’26

Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit

We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.

Machine Learning & AI General macOS Speech Core ML ScreenCaptureKit

522

Mar ’26

Powermetrics GPU power vs system DC power discrepancy on M4 Max

While analyzing system power on an M4 Max under GPU-heavy compute workloads, I noticed that the the GPU power reported by powermetrics does not come anywhere close to total system DC power reported by the SMC counter PDTR (as used by utilities like mactop). For example, in a heavy GPU workload, powermetrics would report a 65W idle-load delta on the GPU, but at the same time system DC power would rise by 179W, leaving 114W or nearly 2/3 of total system DC power on a Mac Studio M4 Max unexplained. From measurements, the difference appears to correlate with the amount of on-chip data movement (for example, varying bytes-per-FLOP in the workload changes the observed gap). Using SMC and IOReport, I was able to reverse engineer an energy model for the GPU that explains almost all of the energy flow with less than 2% error on the workload I studied. The result is a simple two-term energy roofline model: P_GPU (GPU_combined term in the plot) ≈ a * bytes + b * FLOPs with: ~5 pJ/byte for SRAM movement ~2.7 pJ/FLOP for compute. Has anyone observed similar behavior, or is there guidance on how GPU power reported by IOReport/powermetrics should be interpreted relative to total system power? In particular, I’m interested in whether certain classes of GPU activity may not be attributed to the GPU component in IOReport. Full details with the methodology and results are available here: https://youtu.be/HKxIGgyeISM

Machine Learning & AI General ML Compute Metal Performance Shaders Battery Life

176

Mar ’26

Request: Official One-Click Local LLM Deployment for 2019 Mac Pro (7,1) Dual W6900X

I am a professional user of the 2019 Mac Pro (7,1) with dual AMD Radeon Pro W6900X MPX modules (32GB VRAM each). This hardware is designed for high-performance compute, but it is currently crippled for modern local LLM/AI workloads under Linux due to Apple's EFI/PCIe routing restrictions. Core Issue: rocminfo reports "No HIP GPUs available" when attempting to use ROCm/amdgpu on Linux Apple's custom EFI firmware blocks full initialization of professional GPU compute assets The dual W6900X GPUs have 64GB combined VRAM and high-bandwidth Infinity Fabric Link, but cannot be fully utilized for local AI inference/training My Specific Request: Apple should provide an official, one-click deployable application that enables full utilization of dual W6900X GPUs for local large language model (LLM) inference and training under Linux. This application must: Fully initialize both W6900X GPUs via HIP/ROCm, establishing valid compute contexts Bypass artificial EFI/PCIe routing restrictions that block access to professional GPU resources Provide a stable, user-friendly one-click deployment experience (similar to NVIDIA's AI Enterprise or AMD's ROCm Hub) Why This Matters: The 2019 Mac Pro is Apple's flagship professional workstation, marketed for compute-intensive workloads. Its high-cost W6900X GPUs should not be locked down for modern AI/LLM use cases. An official one-click deployment solution would demonstrate Apple's commitment to professional AI and unlock significant value for professional users. I look forward to Apple's response and a clear roadmap for enabling this critical capability. #MacPro #Linux #ROCm #LocalLLM #W6900X #CoreML

Machine Learning & AI General

191

Mar ’26

Official One-Click Local LLM Deployment for 2019 Mac Pro (7,1) Dual W6900X

Machine Learning & AI General External Graphics Processors AppleScript Core Graphics Virtualization

1.1k

After loading my custom model - unsupportedTokenizer error

In Oct25, using mlx_lm.lora I created an adapter and a fused model uploaded to Huggingface. I was able to incorporate this model into my SwiftUI app using the mlx package. MLX-libraries 2.25.8. My base LLM was mlx-community/Mistral-7B-Instruct-v0.3-4bit. Looking at LLMModelFactory.swift the current version 2.29.1 the only changes are the addition of a few models. The earlier model was called: pharmpk/pk-mistral-7b-v0.3-4bit The new model is called: pharmpk/pk-mistral-2026-03-29 The base model (mlx-community/Mistral-7B-Instruct-v0.3-4bit.) must still be available. Could the error 'unsupportedTokenizer' be related to changes in the mlx package? I noticed mention of splitting the package into two parts but don't see anything at github. Feeling rather lost. Does anone have any thoguths and/or suggestions. Thanks, David

Machine Learning & AI General

438

How Is useful AI

I want to introduce how is usefully AI

Machine Learning & AI General

139

Will the upcomming Mac Book Pro M6 Max has at least 256GB RAM

Hi Guys, I want to use the newest Mac Book Pro M6 (Max or Ultra) with at least 256GB RAM for AI development. Will my wish may come true? What do you think? One of Apples most advantage here is unified memory and with the privacy first approach, i want to run local modells and show it to my customer just on the macbook. That has much more magic then first plug the power supply for a sparc, connect a network cable and fiddling around. The perfect match would be a Max Book Pro, M6 Ultra, 512GB. But I guess this is just a dream :-(. Please let me know what you think abou that. Thanks

Machine Learning & AI General

629

VNRecognizeTextRequest .accurate model failing to load

When I try to use VNRecognizeTextRequest in a simple program on apple silicon .accurate works, but when I add the same code to a helper process in a larger project, .accurate doesn’t return any results while only .fast works. This happens on apple silicon machines but not older intel ones. When I call VNRecognizeTextRequest I see the error [Espresso::handle_ex_plan] exception= in the logs along with (TextRecognition) Error loading network 0, -1. And when I catch the exception in lldb and print it I see Null bundleID. In the code, [[NSBundle mainBundle] returns null even though plutil -p on the helper process binary shows an embedded plist, as well as on the process that spawns the helper.

Machine Learning & AI General

379

MPS backend reports ~40 GiB 'other allocations' on 48 GB M5 Pro under macOS 26.4.1, blocking large tensor operations (PyTorch)

Product macOS Version macOS 26.4.1 (public release) Hardware Apple M5 Pro, 48 GB unified memory Summary On macOS 26.4.1, the MPS backend consistently reports approximately 40 GiB of “other allocations” on a 48 GB M5 Pro machine, even on a freshly rebooted system with minimal user applications running. This leaves insufficient memory for large GPU tensor operations that previously succeeded on earlier macOS versions. The failure manifests as: RuntimeError: MPS backend out of memory (MPS allocated: 17.60 GiB, other allocations: 40.17 GiB, max allowed: 63.65 GiB). Tried to allocate 7.63 GiB on private pool. The “other allocations: 40.17 GiB” value is consistent across reboots and does not change materially when user applications are quit. This suggests macOS 26.4.1 has increased its baseline GPU/unified memory consumption compared to prior releases in a way that is visible to the MPS allocator. Steps to Reproduce Fresh reboot of M5 Pro, 48 GB, macOS 26.4.1 Launch a PyTorch 2.11.0 application using MPS as the compute device Load a large model into MPS memory (~17 GiB, e.g. a VAE encoder in bfloat16) Attempt to allocate an additional ~7.6 GiB workspace tensor for a matrix multiplication operation (torch.bmm) Result: RuntimeError: MPS backend out of memory, with “other allocations” reported at ~40 GiB despite no large user processes holding GPU memory. Expected: The operation should succeed. 17.60 + 7.63 = 25.23 GiB, which is well within the 48 GiB physical memory of the machine. Additional Observations • vm_stat on a clean boot shows ~24 GB of free system RAM before the PyTorch application launches, consistent with normal OS usage. The 40 GiB figure reported by the MPS allocator as “other allocations” does not correspond to identifiable user processes. • The max allowed: 63.65 GiB ceiling reported by MPS exceeds the physical 48 GiB of the machine, suggesting MPS is using a memory limit calculation that does not account for actual physical constraints on unified memory architectures. • macOS 26.4 introduced a related regression (deterministic RuntimeError: MPSGraph does not support tensor dims larger than INT_MAX) in the same MPS buffer stride arithmetic path. That specific error was resolved in 26.4.1, but the OOM regression described here persists. • This operation succeeded on the same hardware under earlier macOS releases. The increased “other allocations” baseline appears to be specific to macOS 26.x. Impact Machine learning workloads that previously ran successfully on 48 GB Apple Silicon machines are failing on macOS 26.4.1 due to this increased baseline GPU memory consumption. Applications using PyTorch MPS, Core ML, and potentially Metal Performance Shaders directly may be affected. Workaround None identified. Reducing application model size or splitting operations into smaller chunks does not resolve the issue because the constraint is in the “other allocations” baseline, not in the application’s own allocations.

Machine Learning & AI General

1.2k

Does the new API: BNNSGraph support quantization

Hello, I spent some time going through the documentation and videos. I did not see how to implement quantized arithmetic for my neural network using BNNSGraph. Could someone please help me.

Machine Learning & AI General

620

RecognizeDocumentsRequest not detecting paragraphs

Machine Learning & AI General Vision

Replies: 0
Boosts: 0
Views: 436
Activity: Jan ’26

Translation Framework: Code 16 "Offline models not available" despite status showing .installed

Machine Learning & AI General

Replies: 1
Boosts: 0
Views: 470
Activity: Jan ’26

Core Image for depth maps & segmentation masks: numeric fidelity issues when rendering CIImage to CVPixelBuffer (looking for Architecture suggestions)

Machine Learning & AI General Metal Vision Core Video Core Image

Replies: 2
Boosts: 0
Views: 506
Activity: Feb ’26

Tensorflow metal: Issue using assign operation on MacBook M4

Machine Learning & AI General tensorflow-metal

Replies: 0
Boosts: 0
Views: 205
Activity: Feb ’26

Apple OCR framework seems to be holding on to allocations every time it is called.

Machine Learning & AI General Vision

Replies: 0
Boosts: 0
Views: 195
Activity: Feb ’26

Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3

Machine Learning & AI General ML Compute Machine Learning tensorflow-metal

Replies: 0
Boosts: 0
Views: 373
Activity: Mar ’26

reinforcement learning from Apple?

Machine Learning & AI General Core ML

Replies: 0
Boosts: 0
Views: 487
Activity: Mar ’26

Provide spoken voice search string

Machine Learning & AI General Siri and Voice Shortcuts App Intents

Replies: 1
Boosts: 0
Views: 798
Activity: Mar ’26

Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit

Machine Learning & AI General macOS Speech Core ML ScreenCaptureKit

Replies: 0
Boosts: 0
Views: 522
Activity: Mar ’26

Powermetrics GPU power vs system DC power discrepancy on M4 Max

Machine Learning & AI General ML Compute Metal Performance Shaders Battery Life

Replies: 0
Boosts: 0
Views: 176
Activity: Mar ’26

Request: Official One-Click Local LLM Deployment for 2019 Mac Pro (7,1) Dual W6900X

Machine Learning & AI General

Replies: 0
Boosts: 0
Views: 191
Activity: Mar ’26

Official One-Click Local LLM Deployment for 2019 Mac Pro (7,1) Dual W6900X

Machine Learning & AI General External Graphics Processors AppleScript Core Graphics Virtualization

Replies: 3
Boosts: 0
Views: 1.1k
Activity: 2w

After loading my custom model - unsupportedTokenizer error

Machine Learning & AI General

Replies: 3
Boosts: 0
Views: 438
Activity: 4w

How Is useful AI

I want to introduce how is usefully AI

Machine Learning & AI General

Replies: 0
Boosts: 0
Views: 139
Activity: 4w

Will the upcomming Mac Book Pro M6 Max has at least 256GB RAM

Machine Learning & AI General

Replies: 1
Boosts: 0
Views: 629
Activity: 2w

VNRecognizeTextRequest .accurate model failing to load

Machine Learning & AI General

Replies: 0
Boosts: 0
Views: 379
Activity: 1w

MPS backend reports ~40 GiB 'other allocations' on 48 GB M5 Pro under macOS 26.4.1, blocking large tensor operations (PyTorch)

Machine Learning & AI General

Replies: 1
Boosts: 0
Views: 1.2k
Activity: 1w

Does the new API: BNNSGraph support quantization

Hello, I spent some time going through the documentation and videos. I did not see how to implement quantized arithmetic for my neural network using BNNSGraph. Could someone please help me.

Machine Learning & AI General

Replies: 0
Boosts: 0
Views: 620
Activity: 1w