Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

MPS SDPA Attention Kernel Regression on A14-class (M1) in macOS 26.3.1 — Works on A15+ (M2+)
Summary Since macOS 26, our Core ML / MPS inference pipeline produces incorrect results on Mac mini M1 (Macmini9,1, A14-class SoC). The same model and code runs correctly on M2 and newer (A15-class and up). The regression appears to be in the Scaled Dot-Product Attention (SDPA) kernel path in the MPS backend. Environment Affected Mac mini M1 — Macmini9,1 (A14-class) Not affected M2 and newer (A15-class and up) Last known good macOS Sequoia First broken macOS 26 (Tahoe) ? Confirmed broken on macOS 26.3.1 Framework Core ML + MPS backend Language C++ (via CoreML C++ API) Description We ship an audio processing application (VoiceAssist by NoiseWorks) that runs a deep learning model (based on Demucs architecture) via Core ML with the MPS compute unit. On macOS Sequoia this works correctly on all Apple Silicon Macs including M1. After updating to macOS 26 (Tahoe), inference on M1 Macs fails — either producing garbage output or crashing. The same binary, same .mlpackage, same inputs work correctly on M2+. Our Apple contact has suggested the root cause is a regression in the A14-specific MPS SDPA attention kernel, which may have broken when the Metal/MPS stack was updated in macOS 26. The model makes heavy use of attention layers, and the failure correlates precisely with the SDPA path being exercised on A14 hardware. Steps to Reproduce Load a Core ML model that uses Scaled Dot-Product Attention (e.g. a transformer or attention-based audio model) Run inference with MLComputeUnits::cpuAndGPU (MPS active) Run on Mac mini M1 (Macmini9,1) with macOS 26.3.1 Compare output to the same model running on M2 / macOS Sequoia Expected: Correct inference output, consistent with M2+ and macOS Sequoia behavior Actual: Incorrect / corrupted output (or crash), only on A14-class hardware running macOS 26+ Workaround Forcing MLComputeUnits::cpuOnly bypasses MPS entirely and produces correct output on M1, confirming the issue is in the MPS compute path. This is not acceptable as a shipping workaround due to performance impact. Additional Notes The failure is hardware-specific (A14 only) and OS-specific (macOS 26+), pointing to a kernel-level regression rather than a model or app bug We first became aware of this through a customer report Happy to provide a symbolicated crash log if helpful this text was summarized by AI and human verified
1
0
188
1w
The answer of "apple" goes to guardrailViolation?
I have been using "apple" to test foundation models. I thought this is local, but today the answer changed - half way through explanation, suddenly guardrailViolation error was activated! And yesterday, all reference to "Apple II", "Apple III" now refers me to consult apple.com! Does foundation models connect to Internet for answer? Using beta 3.
3
0
180
Jul ’25
AppShortcuts.xcstrings does not translate each invocation phrase option separately, just the first
Due to our min iOS version, this is my first time using .xcstrings instead of .strings for AppShortcuts. When using the migrate .strings to .xcstrings Xcode context menu option, an .xcstrings catalog is produced that, as expected, has each invocation phrase as a separate string key. However, after compilation, the catalog changes to group all invocation phrases under the first phrase listed for each intent (see attached screenshot). It is possible to hover in blank space on the right and add more translations, but there is no 1:1 key matching requirement to the phrases on the left nor a requirement that there are the same number of keys in one language vs. another. (The lines just happen to align due to my window size.) What does that mean, practically? Do all sub-phrases in each language in AppShortcuts.xcstrings get processed during compilation, even if there isn't an equivalent phrase key declared in the AppShortcut (e.g., the ja translation has more phrases than the English)? (That makes some logical sense, as these phrases need not be 1:1 across languages.) In the AppShortcut declaration, if I delete all but the top invocation phrase, does nothing change with Siri? Is there something I'm doing incorrectly? struct WatchShortcuts: AppShortcutsProvider { static var appShortcuts: [AppShortcut] { AppShortcut( intent: QuickAddWaterIntent(), phrases: [ "\(.applicationName) log water", "\(.applicationName) log my water", "Log water in \(.applicationName)", "Log my water in \(.applicationName)", "Log a bottle of water in \(.applicationName)", ], shortTitle: "Log Water", systemImageName: "drop.fill" ) } }
0
0
325
Aug ’25
is it possible to let siri monitor phone calls, and notify me when a certain trigger happens?
the specific context is that i would like to build an agent that monitors my phone call (with a customer support for example), and simiply identify whether or not im still put on hold, and notify me when im not. currently after reading the doc, i dont think its possible yet, but im so annoyed by the customer support calls that im willing to go the distance and see if theres any way.
0
0
167
Jun ’25
Provide actionable feedback for the Foundation Models framework and the on-device LLM
We are really excited to have introduced the Foundation Models framework in WWDC25. When using the framework, you might have feedback about how it can better fit your use cases. Starting in macOS/iOS 26 Beta 4, the best way to provide feedback is to use #Playground in Xcode. To do so: In Xcode, create a playground using #Playground. Fore more information, see Running code snippets using the playground macro. Reproduce the issue by setting up a session and generating a response with your prompt. In the canvas on the right, click the thumbs-up icon to the right of the response. Follow the instructions on the pop-up window and submit your feedback by clicking Share with Apple. Another way to provide your feedback is to file a feedback report with relevant details. Specific to the Foundation Models framework, it’s super important to add the following information in your report: Language model feedback This feedback contains the session transcript, including the instructions, the prompts, the responses, etc. Without that, we can’t reason the model’s behavior, and hence can hardly take any action. Use logFeedbackAttachment(sentiment:issues:desiredOutput: ) to retrieve the feedback data of your current model session, as shown in the usage example, write the data into a file, and then attach the file to your feedback report. If you believe what you’d report is related to the system configuration, please capture a sysdiagnose and attach it to your feedback report as well. The framework is still new. Your actionable feedback helps us evolve the framework quickly, and we appreciate that. Thanks, The Foundation Models framework team
0
0
857
Aug ’25
Apple Intelligence Naughty Naughty
When doing some exploratory research into using Apple Intelligence in our aviation-focused application, I noticed that there were several times that key phases would be marked as inappropriate. I tried to stifle these using prompts and rules but couldn't get it to take hold. I was encouraged by an Apple employee to go ahead and post this so that the AI team can use the feedback. There were several terms that triggered this warning, but the two that were most prominent were: 'Tailwind' 'JFK' or 'KJFK' (NY airport ICAO/IATA codes)
2
0
575
3w
Converting TF2 object detection to CoreML
I've spent way too long today trying to convert an Object Detection TensorFlow2 model to a CoreML object classifier (with bounding boxes, labels and probability score) The 'SSD MobileNet v2 320x320' is here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md And I've been following all sorts of posts and ChatGPT https://apple.github.io/coremltools/docs-guides/source/tensorflow-2.html#convert-a-tensorflow-concrete-function https://developer.apple.com/videos/play/wwdc2020/10153/?time=402 To convert it. I keep hitting the same errors though, mostly around: NotImplementedError: Expected model format: [SavedModel | concrete_function | tf.keras.Model | .h5 | GraphDef], got <ConcreteFunction signature_wrapper(input_tensor) at 0x366B87790> I've had varying success including missing output labels/predictions. But I simply want to create the CoreML model with all the right inputs and outputs (including correct names) as detailed in the docs here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md It goes without saying I don't have much (any) experience with this stuff including Python so the whole thing's been a bit of a headache. If anyone is able to help that would be great. FWIW I'm not attached to any one specific model, but what I do need at minimum is a CoreML model that can detect objects (has to at least include lights and lamps) within a live video image, detecting where in the image the object is. The simplest script I have looks like this: import coremltools as ct import tensorflow as tf model = tf.saved_model.load("~/tf_models/ssd_mobilenet_v2_320x320_coco17_tpu-8/saved_model") concrete_func = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY] mlmodel = ct.convert( concrete_func, source="tensorflow", inputs=[ct.TensorType(shape=(1, 320, 320, 3))] ) mlmodel.save("YourModel.mlpackage", save_format="mlpackage")
1
0
496
Jul ’25
Is it possible to pass the streaming output of Foundation Models down a function chain
I am writing a custom package wrapping Foundation Models which provides a chain-of-thought with intermittent self-evaluation among other things. At first I was designing this package with the command line in mind, but after seeing how well it augments the models and makes them more intelligent I wanted to try and build a SwiftUI wrapper around the package. When I started I was using synchronous generation rather than streaming, but to give the best user experience (as I've seen in the WWDC sessions) it is necessary to provide constant feedback to the user that something is happening. I have created a super simplified example of my setup so it's easier to understand. First, there is the Reasoning conversation item, which can be converted to an XML representation which is then fed back into the model (I've found XML works best for structured input) public typealias ConversationContext = XMLDocument extension ConversationContext { public func toPlainText() -> String { return xmlString(options: [.nodePrettyPrint]) } } /// Represents a reasoning item in a conversation, which includes a title and reasoning content. /// Reasoning items are used to provide detailed explanations or justifications for certain decisions or responses within a conversation. @Generable(description: "A reasoning item in a conversation, containing content and a title.") struct ConversationReasoningItem: ConversationItem { @Guide(description: "The content of the reasoning item, which is your thinking process or explanation") public var reasoningContent: String @Guide(description: "A short summary of the reasoning content, digestible in an interface.") public var title: String @Guide(description: "Indicates whether reasoning is complete") public var done: Bool } extension ConversationReasoningItem: ConversationContextProvider { public func toContext() -> ConversationContext { // <ReasoningItem title="${title}"> // ${reasoningContent} // </ReasoningItem> let root = XMLElement(name: "ReasoningItem") root.addAttribute(XMLNode.attribute(withName: "title", stringValue: title) as! XMLNode) root.stringValue = reasoningContent return ConversationContext(rootElement: root) } } Then there is the generator, which creates a reasoning item from a user query and previously generated items: struct ReasoningItemGenerator { var instructions: String { """ <omitted for brevity> """ } func generate(from input: (String, [ConversationReasoningItem])) async throws -> sending LanguageModelSession.ResponseStream<ConversationReasoningItem> { let session = LanguageModelSession(instructions: instructions) // build the context for the reasoning item out of the user's query and the previous reasoning items let userQuery = "User's query: \(input.0)" let reasoningItemsText = input.1.map { $0.toContext().toPlainText() }.joined(separator: "\n") let context = userQuery + "\n" + reasoningItemsText let reasoningItemResponse = try await session.streamResponse( to: context, generating: ConversationReasoningItem.self) return reasoningItemResponse } } I'm not sure if returning LanguageModelSession.ResponseStream<ConversationReasoningItem> is the right move, I am just trying to imitate what session.streamResponse returns. Then there is the orchestrator, which I can't figure out. It receives the streamed ConversationReasoningItems from the Generator and is responsible for streaming those to SwiftUI later and also for evaluating each reasoning item after it is complete to see if it needs to be regenerated (to keep the model on-track). I want the users of the orchestrator to receive partially generated reasoning items as they are being generated by the generator. Later, when they finish, if the evaluation passes, the item is kept, but if it fails, the reasoning item should be removed from the stream before a new one is generated. So in-flight reasoning items should be outputted aggresively. I really am having trouble figuring this out so if someone with more knowledge about asynchronous stuff in Swift, or- even better- someone who has worked on the Foundation Models framework could point me in the right direction, that would be awesome!
0
0
287
Jul ’25
Core Model Editor and Params
Optimal Precision • Current Precision: Mixed (Float32, int32) • Optimal Precision: Not specified in the image, but typically involves using the most efficient data type for the model's operations to balance speed and memory usage without significant loss of accuracy. Comparison: • Mixed Precision: Utilizes both Float32 and int32 to optimize performance. Float32 provides high precision, while int32 reduces memory usage and increases computational speed. • Optimal Precision: Aimed at achieving the best trade-off between performance and accuracy, potentially using other data types like Float16 (bfloat16) for even greater efficiency in certain hardware environments. Operation Distribution • Current Distribution: • iOS18.mul: 168 • iOS18.transpose: 126 • iOS18.linear: 98 • iOS18.add: 97 • iOS18.sliceByIndex: 96 • iOS18.expandDims: 74 • iOS18.concat: 72 • iOS18.squeeze: 72 • iOS18.reshape: 67 • iOS18.layerNorm: 49 • iOS18.matmul: 48 • iOS18.gelu: 26 • iOS18.softmax: 24 • Split: 24 • conv: 1 • iOS18.conv: 1 Comparison: • Operation Count: Indicates how frequently each operation is executed. High counts for operations like mul, transpose, and linear suggest these are computationally intensive parts of the model. • Optimization Opportunities: Reducing the count of high-frequency operations or optimizing their execution can improve performance. This might involve pruning unnecessary operations, optimizing algorithms, or leveraging hardware acceleration. General Recommendations • Precision Tuning: Experiment with different precision levels to find the best balance for your specific hardware and accuracy requirements. • Operation Optimization: Focus on optimizing the most frequent operations. Techniques include using more efficient algorithms, parallelizing computations, or utilizing specialized hardware like GPUs or TPUs. • Benchmarking: Regularly benchmark the model to assess the impact of changes and ensure that optimizations lead to meaningful performance improvements. By focusing on these areas, you can potentially enhance the efficiency and performance of your ML model.
0
0
87
Feb ’26
jax-metal failing due to incompatibility with jax 0.5.1 or later.
Hello, I am interested in using jax-metal to train ML models using Apple Silicon. I understand this is experimental. After installing jax-metal according to https://developer.apple.com/metal/jax/, my python code fails with the following error JaxRuntimeError: UNKNOWN: -:0:0: error: unknown attribute code: 22 -:0:0: note: in bytecode version 6 produced by: StableHLO_v1.12.1 My issue is identical to the one reported here https://github.com/jax-ml/jax/issues/26968#issuecomment-2733120325, and is fixed by pinning to jax-metal 0.1.1., jax 0.5.0 and jaxlib 0.5.0. Thank you!
1
0
879
Feb ’26
Tensorflow metal: Issue using assign operation on MacBook M4
I get the following error when running this command in a Jupyter notebook: v = tf.Variable(initial_value=tf.random.normal(shape=(3, 1))) v[0, 0].assign(3.) Environment: python == 3.11.14 tensorflow==2.19.1 tensorflow-metal==1.2.0 { "name": "InvalidArgumentError", "message": "Cannot assign a device for operation ResourceStridedSliceAssign: Could not satisfy explicit device specification '/job:localhost/replica:0/task:0/device:GPU:0' because no supported kernel for GPU devices is available.\nColocation Debug Info:\nColocation group had the following types and supported devices: \nRoot Member(assigned_device_name_index_=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]\nResourceStridedSliceAssign: CPU \n_Arg: GPU CPU \n\nColocation members, user-requested devices, and framework assigned devices, if any:\n ref (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0\n ResourceStridedSliceAssign (ResourceStridedSliceAssign) /job:localhost/replica:0/task:0/device:GPU:0\n\nOp: ResourceStridedSliceAssign\n [...] [[{{node ResourceStridedSliceAssign}}]] [Op:ResourceStridedSliceAssign] name: strided_slice/_assign" } It seems like the ResourceStridedSliceAssign operation is not implemented for the GPU
0
0
176
Feb ’26
SpeechAnalyzer / AssetInventory and preinstalled assets
During testing the “Bringing advanced speech-to-text capabilities to your app” sample app demonstrating the use of iOS 26 SpeechAnalyzer, I noticed that the language model for the English locale was presumably already downloaded. Upon checking the documentation of AssetInventory, I found out that indeed, the language model can be preinstalled on the system. Can someone from the dev team share more info about what assets are preinstalled by the system? For example, can we safely assume that the English language model will almost certainly be already preinstalled by the OS if the phone has the English locale?
1
0
271
Jul ’25
Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application
I am developing a macOS application using SwiftUI (with an iOS version as well). One feature we are exploring is displaying an avatar that reads or speaks dynamically generated text produced by an AI service. The basic flow would be: Text generated by an AI service Text converted to speech using a TTS engine An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech Ideally the avatar would render locally on the device. Questions: What Apple frameworks would be most appropriate for implementing a speaking avatar? SceneKit RealityKit SpriteKit (for 2D avatars) Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks? Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation? If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS? Are there examples of real-time character animation synchronized with speech on macOS/iOS? Any architectural guidance or references would be greatly appreciated.
0
0
527
2w
Gazetteer encryption?
I have an app that uses a couple of mlmodels (word tagger and gazetteer) and I’m trying to encrypt them before publishing. The models are part of a package. I understand that Xcode can’t automatically handle the encryption for a model in a package the way it can within a traditional app structure. Given that, I’ve generated the Apple MLModel encryption key from Xcode and am encrypting via the command line with: xcrun coremlcompiler compile Gazetteer.mlmodel GazetteerENC.mlmodelc --encrypt Gazetteerkey.mlmodelkey In the package manifest, I’ve listed the encrypted models as .copy resources for my target and have verified the URL to that file is good. When I try to load the encrypted .mlmodelc file (on a physical device) with the line:
 gazetteer = try NLGazetteer(contentsOf: gazetteerURL!) I get the error: Failed to open file: /…/Scanner.bundle/GazetteerENC.mlmodelc/coremldata.bin. It is not a valid .mlmodelc file. So my questions are: Does the NLGazetteer class support encrypted MLModel files? Given that my models are in a package, do I have the right general approach? Thanks for any help or thoughts.
0
0
161
May ’25
Plenty of LanguageModelSession.GenerationError.refusal errors after 26.4 update
Hello! After the 26.4 update I get a huge number of LanguageModelSession.GenerationError.refusal errors when using guided generation Generables for inexplicable reasons. Such errors also occur, if I want to cast a response to boolean by using 'generating: Bool.self'. The explanation generated on the grounds of the error always looks like this: Response(userPrompt: "", duration: 0.230917542, promptTokenCount: Optional(66), responseTokenCount: Optional(11), feedbackAttachment: nil, content: "I apologize, but I cannot fulfill this request.", rawContent: "I apologize, but I cannot fulfill this request.", transcriptEntries: ArraySlice([])) All the prompts and Generables I use are definitely not profane. Before 26.4 such errors on the same prompts and Generables never occurred. The 26.4 update rendered those features unusable to me. Is this a known bug or what am I doing wrong?
3
0
432
2d
Apple ANE Peformance - throttling?
I can no longer achieve 100% ANE usage since upgrading to MacOS26 Beta 5. I used to be able to get 100%. Has Apple activated throttling or power saving features in the new Betas? Is there any new rate limiting on the API? I can hardly get above 3w or 40%. I have a M4 Pro mini (64GB) with High Power energy setting. MacOS 26 Beta 5.
2
0
341
Aug ’25
Official One-Click Local LLM Deployment for 2019 Mac Pro (7,1) Dual W6900X
I am a professional user of the 2019 Mac Pro (7,1) with dual AMD Radeon Pro W6900X MPX modules (32GB VRAM each). This hardware is designed for high-performance compute, but it is currently crippled for modern local LLM/AI workloads under Linux due to Apple's EFI/PCIe routing restrictions. Core Issue: rocminfo reports "No HIP GPUs available" when attempting to use ROCm/amdgpu on Linux Apple's custom EFI firmware blocks full initialization of professional GPU compute assets The dual W6900X GPUs have 64GB combined VRAM and high-bandwidth Infinity Fabric Link, but cannot be fully utilized for local AI inference/training My Specific Request: Apple should provide an official, one-click deployable application that enables full utilization of dual W6900X GPUs for local large language model (LLM) inference and training under Linux. This application must: Fully initialize both W6900X GPUs via HIP/ROCm, establishing valid compute contexts Bypass artificial EFI/PCIe routing restrictions that block access to professional GPU resources Provide a stable, user-friendly one-click deployment experience (similar to NVIDIA's AI Enterprise or AMD's ROCm Hub) Why This Matters: The 2019 Mac Pro is Apple's flagship professional workstation, marketed for compute-intensive workloads. Its high-cost W6900X GPUs should not be locked down for modern AI/LLM use cases. An official one-click deployment solution would demonstrate Apple's commitment to professional AI and unlock significant value for professional users. I look forward to Apple's response and a clear roadmap for enabling this critical capability. #MacPro #Linux #ROCm #LocalLLM #W6900X #CoreML
0
0
114
6d
Massive CoreML latency spike on live AVFoundation camera feed vs. offline inference (CPU+ANE)
Hello, I’m experiencing a severe performance degradation when running CoreML models on a live AVFoundation video feed compared to offline or synthetic inference. This happens across multiple models I've converted (including SCI, RTMPose, and RTMW) and affects multiple devices. The Environment OS: macOS 26.3, iOS 26.3, iPadOS 26.3 Hardware: Mac14,6 (M2 Max), iPad Pro 11 M1, iPhone 13 mini Compute Units: cpuAndNeuralEngine The Numbers When testing my SCI_output_image_int8.mlpackage model, the inference timings are drastically different: Synthetic/Offline Inference: ~1.34 ms Live Camera Inference: ~15.96 ms Preprocessing is completely ruled out as the bottleneck. My profiling shows total preprocessing (nearest-neighbor resize + feature provider creation) takes only ~0.4 ms in camera mode. Furthermore, no frames are being dropped. What I've Tried I am building a latency-critical app and have implemented almost every recommended optimization to try and fix this, but the camera-feed penalty remains: Matched the AVFoundation camera output format exactly to the model input (640x480 at 30/60fps). Used IOSurface-backed pixel buffers for everything (camera output, synthetic buffer, and resize buffer). Enabled outputBackings. Loaded the model once and reused it for all predictions. Configured MLModelConfiguration with reshapeFrequency = .frequent and specializationStrategy = .fastPrediction. Wrapped inference in ProcessInfo.processInfo.beginActivity(options: .latencyCritical, reason: "CoreML_Inference"). Set DispatchQueue to qos: .userInteractive. Disabled the idle timer and enabled iOS Game Mode. Exported models using coremltools 9.0 (deployment target iOS 26) with ImageType inputs/outputs and INT8 quantization. Reproduction To completely rule out UI or rendering overhead, I wrote a standalone Swift CLI script that isolates the AVFoundation and CoreML pipeline. The script clearly demonstrates the ~15ms latency on live camera frames versus the ~1ms latency on synthetic buffers. (I have attached camera_coreml_benchmark.swift and coreml model (very light low light enghancement model) to this repo on github https://github.com/pzoltowski/apple-coreml-camera-latency-repro). My Question: Is this massive overhead expected behavior for AVFoundation + Core ML on live feeds, or is this a framework/runtime bug? If expected, what is the Apple-recommended pattern to bypass this camera-only inference slowdown? One think found interesting when running in debug model was faster (not as fast as in performance benchmark but faster than 16ms. Also somehow if I did some dummy calculation on on different DispatchQueue also seems like model got slightly faster. So maybe its related to ANE Power State issues (Jitter/SoC Wake) and going to fast to sleep and taking a long time to wakeup? Doing dummy calculation in background thought is probably not a solution. Thanks in advance for any insights!
5
0
712
1w
Powermetrics GPU power vs system DC power discrepancy on M4 Max
While analyzing system power on an M4 Max under GPU-heavy compute workloads, I noticed that the the GPU power reported by powermetrics does not come anywhere close to total system DC power reported by the SMC counter PDTR (as used by utilities like mactop). For example, in a heavy GPU workload, powermetrics would report a 65W idle-load delta on the GPU, but at the same time system DC power would rise by 179W, leaving 114W or nearly 2/3 of total system DC power on a Mac Studio M4 Max unexplained. From measurements, the difference appears to correlate with the amount of on-chip data movement (for example, varying bytes-per-FLOP in the workload changes the observed gap). Using SMC and IOReport, I was able to reverse engineer an energy model for the GPU that explains almost all of the energy flow with less than 2% error on the workload I studied. The result is a simple two-term energy roofline model: P_GPU (GPU_combined term in the plot) ≈ a * bytes + b * FLOPs with: ~5 pJ/byte for SRAM movement ~2.7 pJ/FLOP for compute. Has anyone observed similar behavior, or is there guidance on how GPU power reported by IOReport/powermetrics should be interpreted relative to total system power? In particular, I’m interested in whether certain classes of GPU activity may not be attributed to the GPU component in IOReport. Full details with the methodology and results are available here: https://youtu.be/HKxIGgyeISM
0
0
60
1w
AI and ML
Hello. I am willing to hire game developer for cards game called baloot. My question is Can the developer implement an AI when the computer is playing and the computer on the same time the conputer improves his rises level without any interaction? 🌹
0
0
109
Jun ’25
MPS SDPA Attention Kernel Regression on A14-class (M1) in macOS 26.3.1 — Works on A15+ (M2+)
Summary Since macOS 26, our Core ML / MPS inference pipeline produces incorrect results on Mac mini M1 (Macmini9,1, A14-class SoC). The same model and code runs correctly on M2 and newer (A15-class and up). The regression appears to be in the Scaled Dot-Product Attention (SDPA) kernel path in the MPS backend. Environment Affected Mac mini M1 — Macmini9,1 (A14-class) Not affected M2 and newer (A15-class and up) Last known good macOS Sequoia First broken macOS 26 (Tahoe) ? Confirmed broken on macOS 26.3.1 Framework Core ML + MPS backend Language C++ (via CoreML C++ API) Description We ship an audio processing application (VoiceAssist by NoiseWorks) that runs a deep learning model (based on Demucs architecture) via Core ML with the MPS compute unit. On macOS Sequoia this works correctly on all Apple Silicon Macs including M1. After updating to macOS 26 (Tahoe), inference on M1 Macs fails — either producing garbage output or crashing. The same binary, same .mlpackage, same inputs work correctly on M2+. Our Apple contact has suggested the root cause is a regression in the A14-specific MPS SDPA attention kernel, which may have broken when the Metal/MPS stack was updated in macOS 26. The model makes heavy use of attention layers, and the failure correlates precisely with the SDPA path being exercised on A14 hardware. Steps to Reproduce Load a Core ML model that uses Scaled Dot-Product Attention (e.g. a transformer or attention-based audio model) Run inference with MLComputeUnits::cpuAndGPU (MPS active) Run on Mac mini M1 (Macmini9,1) with macOS 26.3.1 Compare output to the same model running on M2 / macOS Sequoia Expected: Correct inference output, consistent with M2+ and macOS Sequoia behavior Actual: Incorrect / corrupted output (or crash), only on A14-class hardware running macOS 26+ Workaround Forcing MLComputeUnits::cpuOnly bypasses MPS entirely and produces correct output on M1, confirming the issue is in the MPS compute path. This is not acceptable as a shipping workaround due to performance impact. Additional Notes The failure is hardware-specific (A14 only) and OS-specific (macOS 26+), pointing to a kernel-level regression rather than a model or app bug We first became aware of this through a customer report Happy to provide a symbolicated crash log if helpful this text was summarized by AI and human verified
Replies
1
Boosts
0
Views
188
Activity
1w
The answer of "apple" goes to guardrailViolation?
I have been using "apple" to test foundation models. I thought this is local, but today the answer changed - half way through explanation, suddenly guardrailViolation error was activated! And yesterday, all reference to "Apple II", "Apple III" now refers me to consult apple.com! Does foundation models connect to Internet for answer? Using beta 3.
Replies
3
Boosts
0
Views
180
Activity
Jul ’25
AppShortcuts.xcstrings does not translate each invocation phrase option separately, just the first
Due to our min iOS version, this is my first time using .xcstrings instead of .strings for AppShortcuts. When using the migrate .strings to .xcstrings Xcode context menu option, an .xcstrings catalog is produced that, as expected, has each invocation phrase as a separate string key. However, after compilation, the catalog changes to group all invocation phrases under the first phrase listed for each intent (see attached screenshot). It is possible to hover in blank space on the right and add more translations, but there is no 1:1 key matching requirement to the phrases on the left nor a requirement that there are the same number of keys in one language vs. another. (The lines just happen to align due to my window size.) What does that mean, practically? Do all sub-phrases in each language in AppShortcuts.xcstrings get processed during compilation, even if there isn't an equivalent phrase key declared in the AppShortcut (e.g., the ja translation has more phrases than the English)? (That makes some logical sense, as these phrases need not be 1:1 across languages.) In the AppShortcut declaration, if I delete all but the top invocation phrase, does nothing change with Siri? Is there something I'm doing incorrectly? struct WatchShortcuts: AppShortcutsProvider { static var appShortcuts: [AppShortcut] { AppShortcut( intent: QuickAddWaterIntent(), phrases: [ "\(.applicationName) log water", "\(.applicationName) log my water", "Log water in \(.applicationName)", "Log my water in \(.applicationName)", "Log a bottle of water in \(.applicationName)", ], shortTitle: "Log Water", systemImageName: "drop.fill" ) } }
Replies
0
Boosts
0
Views
325
Activity
Aug ’25
is it possible to let siri monitor phone calls, and notify me when a certain trigger happens?
the specific context is that i would like to build an agent that monitors my phone call (with a customer support for example), and simiply identify whether or not im still put on hold, and notify me when im not. currently after reading the doc, i dont think its possible yet, but im so annoyed by the customer support calls that im willing to go the distance and see if theres any way.
Replies
0
Boosts
0
Views
167
Activity
Jun ’25
Provide actionable feedback for the Foundation Models framework and the on-device LLM
We are really excited to have introduced the Foundation Models framework in WWDC25. When using the framework, you might have feedback about how it can better fit your use cases. Starting in macOS/iOS 26 Beta 4, the best way to provide feedback is to use #Playground in Xcode. To do so: In Xcode, create a playground using #Playground. Fore more information, see Running code snippets using the playground macro. Reproduce the issue by setting up a session and generating a response with your prompt. In the canvas on the right, click the thumbs-up icon to the right of the response. Follow the instructions on the pop-up window and submit your feedback by clicking Share with Apple. Another way to provide your feedback is to file a feedback report with relevant details. Specific to the Foundation Models framework, it’s super important to add the following information in your report: Language model feedback This feedback contains the session transcript, including the instructions, the prompts, the responses, etc. Without that, we can’t reason the model’s behavior, and hence can hardly take any action. Use logFeedbackAttachment(sentiment:issues:desiredOutput: ) to retrieve the feedback data of your current model session, as shown in the usage example, write the data into a file, and then attach the file to your feedback report. If you believe what you’d report is related to the system configuration, please capture a sysdiagnose and attach it to your feedback report as well. The framework is still new. Your actionable feedback helps us evolve the framework quickly, and we appreciate that. Thanks, The Foundation Models framework team
Replies
0
Boosts
0
Views
857
Activity
Aug ’25
Apple Intelligence Naughty Naughty
When doing some exploratory research into using Apple Intelligence in our aviation-focused application, I noticed that there were several times that key phases would be marked as inappropriate. I tried to stifle these using prompts and rules but couldn't get it to take hold. I was encouraged by an Apple employee to go ahead and post this so that the AI team can use the feedback. There were several terms that triggered this warning, but the two that were most prominent were: 'Tailwind' 'JFK' or 'KJFK' (NY airport ICAO/IATA codes)
Replies
2
Boosts
0
Views
575
Activity
3w
Converting TF2 object detection to CoreML
I've spent way too long today trying to convert an Object Detection TensorFlow2 model to a CoreML object classifier (with bounding boxes, labels and probability score) The 'SSD MobileNet v2 320x320' is here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md And I've been following all sorts of posts and ChatGPT https://apple.github.io/coremltools/docs-guides/source/tensorflow-2.html#convert-a-tensorflow-concrete-function https://developer.apple.com/videos/play/wwdc2020/10153/?time=402 To convert it. I keep hitting the same errors though, mostly around: NotImplementedError: Expected model format: [SavedModel | concrete_function | tf.keras.Model | .h5 | GraphDef], got <ConcreteFunction signature_wrapper(input_tensor) at 0x366B87790> I've had varying success including missing output labels/predictions. But I simply want to create the CoreML model with all the right inputs and outputs (including correct names) as detailed in the docs here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md It goes without saying I don't have much (any) experience with this stuff including Python so the whole thing's been a bit of a headache. If anyone is able to help that would be great. FWIW I'm not attached to any one specific model, but what I do need at minimum is a CoreML model that can detect objects (has to at least include lights and lamps) within a live video image, detecting where in the image the object is. The simplest script I have looks like this: import coremltools as ct import tensorflow as tf model = tf.saved_model.load("~/tf_models/ssd_mobilenet_v2_320x320_coco17_tpu-8/saved_model") concrete_func = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY] mlmodel = ct.convert( concrete_func, source="tensorflow", inputs=[ct.TensorType(shape=(1, 320, 320, 3))] ) mlmodel.save("YourModel.mlpackage", save_format="mlpackage")
Replies
1
Boosts
0
Views
496
Activity
Jul ’25
Is it possible to pass the streaming output of Foundation Models down a function chain
I am writing a custom package wrapping Foundation Models which provides a chain-of-thought with intermittent self-evaluation among other things. At first I was designing this package with the command line in mind, but after seeing how well it augments the models and makes them more intelligent I wanted to try and build a SwiftUI wrapper around the package. When I started I was using synchronous generation rather than streaming, but to give the best user experience (as I've seen in the WWDC sessions) it is necessary to provide constant feedback to the user that something is happening. I have created a super simplified example of my setup so it's easier to understand. First, there is the Reasoning conversation item, which can be converted to an XML representation which is then fed back into the model (I've found XML works best for structured input) public typealias ConversationContext = XMLDocument extension ConversationContext { public func toPlainText() -> String { return xmlString(options: [.nodePrettyPrint]) } } /// Represents a reasoning item in a conversation, which includes a title and reasoning content. /// Reasoning items are used to provide detailed explanations or justifications for certain decisions or responses within a conversation. @Generable(description: "A reasoning item in a conversation, containing content and a title.") struct ConversationReasoningItem: ConversationItem { @Guide(description: "The content of the reasoning item, which is your thinking process or explanation") public var reasoningContent: String @Guide(description: "A short summary of the reasoning content, digestible in an interface.") public var title: String @Guide(description: "Indicates whether reasoning is complete") public var done: Bool } extension ConversationReasoningItem: ConversationContextProvider { public func toContext() -> ConversationContext { // <ReasoningItem title="${title}"> // ${reasoningContent} // </ReasoningItem> let root = XMLElement(name: "ReasoningItem") root.addAttribute(XMLNode.attribute(withName: "title", stringValue: title) as! XMLNode) root.stringValue = reasoningContent return ConversationContext(rootElement: root) } } Then there is the generator, which creates a reasoning item from a user query and previously generated items: struct ReasoningItemGenerator { var instructions: String { """ <omitted for brevity> """ } func generate(from input: (String, [ConversationReasoningItem])) async throws -> sending LanguageModelSession.ResponseStream<ConversationReasoningItem> { let session = LanguageModelSession(instructions: instructions) // build the context for the reasoning item out of the user's query and the previous reasoning items let userQuery = "User's query: \(input.0)" let reasoningItemsText = input.1.map { $0.toContext().toPlainText() }.joined(separator: "\n") let context = userQuery + "\n" + reasoningItemsText let reasoningItemResponse = try await session.streamResponse( to: context, generating: ConversationReasoningItem.self) return reasoningItemResponse } } I'm not sure if returning LanguageModelSession.ResponseStream<ConversationReasoningItem> is the right move, I am just trying to imitate what session.streamResponse returns. Then there is the orchestrator, which I can't figure out. It receives the streamed ConversationReasoningItems from the Generator and is responsible for streaming those to SwiftUI later and also for evaluating each reasoning item after it is complete to see if it needs to be regenerated (to keep the model on-track). I want the users of the orchestrator to receive partially generated reasoning items as they are being generated by the generator. Later, when they finish, if the evaluation passes, the item is kept, but if it fails, the reasoning item should be removed from the stream before a new one is generated. So in-flight reasoning items should be outputted aggresively. I really am having trouble figuring this out so if someone with more knowledge about asynchronous stuff in Swift, or- even better- someone who has worked on the Foundation Models framework could point me in the right direction, that would be awesome!
Replies
0
Boosts
0
Views
287
Activity
Jul ’25
Core Model Editor and Params
Optimal Precision • Current Precision: Mixed (Float32, int32) • Optimal Precision: Not specified in the image, but typically involves using the most efficient data type for the model's operations to balance speed and memory usage without significant loss of accuracy. Comparison: • Mixed Precision: Utilizes both Float32 and int32 to optimize performance. Float32 provides high precision, while int32 reduces memory usage and increases computational speed. • Optimal Precision: Aimed at achieving the best trade-off between performance and accuracy, potentially using other data types like Float16 (bfloat16) for even greater efficiency in certain hardware environments. Operation Distribution • Current Distribution: • iOS18.mul: 168 • iOS18.transpose: 126 • iOS18.linear: 98 • iOS18.add: 97 • iOS18.sliceByIndex: 96 • iOS18.expandDims: 74 • iOS18.concat: 72 • iOS18.squeeze: 72 • iOS18.reshape: 67 • iOS18.layerNorm: 49 • iOS18.matmul: 48 • iOS18.gelu: 26 • iOS18.softmax: 24 • Split: 24 • conv: 1 • iOS18.conv: 1 Comparison: • Operation Count: Indicates how frequently each operation is executed. High counts for operations like mul, transpose, and linear suggest these are computationally intensive parts of the model. • Optimization Opportunities: Reducing the count of high-frequency operations or optimizing their execution can improve performance. This might involve pruning unnecessary operations, optimizing algorithms, or leveraging hardware acceleration. General Recommendations • Precision Tuning: Experiment with different precision levels to find the best balance for your specific hardware and accuracy requirements. • Operation Optimization: Focus on optimizing the most frequent operations. Techniques include using more efficient algorithms, parallelizing computations, or utilizing specialized hardware like GPUs or TPUs. • Benchmarking: Regularly benchmark the model to assess the impact of changes and ensure that optimizations lead to meaningful performance improvements. By focusing on these areas, you can potentially enhance the efficiency and performance of your ML model.
Replies
0
Boosts
0
Views
87
Activity
Feb ’26
jax-metal failing due to incompatibility with jax 0.5.1 or later.
Hello, I am interested in using jax-metal to train ML models using Apple Silicon. I understand this is experimental. After installing jax-metal according to https://developer.apple.com/metal/jax/, my python code fails with the following error JaxRuntimeError: UNKNOWN: -:0:0: error: unknown attribute code: 22 -:0:0: note: in bytecode version 6 produced by: StableHLO_v1.12.1 My issue is identical to the one reported here https://github.com/jax-ml/jax/issues/26968#issuecomment-2733120325, and is fixed by pinning to jax-metal 0.1.1., jax 0.5.0 and jaxlib 0.5.0. Thank you!
Replies
1
Boosts
0
Views
879
Activity
Feb ’26
Tensorflow metal: Issue using assign operation on MacBook M4
I get the following error when running this command in a Jupyter notebook: v = tf.Variable(initial_value=tf.random.normal(shape=(3, 1))) v[0, 0].assign(3.) Environment: python == 3.11.14 tensorflow==2.19.1 tensorflow-metal==1.2.0 { "name": "InvalidArgumentError", "message": "Cannot assign a device for operation ResourceStridedSliceAssign: Could not satisfy explicit device specification '/job:localhost/replica:0/task:0/device:GPU:0' because no supported kernel for GPU devices is available.\nColocation Debug Info:\nColocation group had the following types and supported devices: \nRoot Member(assigned_device_name_index_=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]\nResourceStridedSliceAssign: CPU \n_Arg: GPU CPU \n\nColocation members, user-requested devices, and framework assigned devices, if any:\n ref (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0\n ResourceStridedSliceAssign (ResourceStridedSliceAssign) /job:localhost/replica:0/task:0/device:GPU:0\n\nOp: ResourceStridedSliceAssign\n [...] [[{{node ResourceStridedSliceAssign}}]] [Op:ResourceStridedSliceAssign] name: strided_slice/_assign" } It seems like the ResourceStridedSliceAssign operation is not implemented for the GPU
Replies
0
Boosts
0
Views
176
Activity
Feb ’26
SpeechAnalyzer / AssetInventory and preinstalled assets
During testing the “Bringing advanced speech-to-text capabilities to your app” sample app demonstrating the use of iOS 26 SpeechAnalyzer, I noticed that the language model for the English locale was presumably already downloaded. Upon checking the documentation of AssetInventory, I found out that indeed, the language model can be preinstalled on the system. Can someone from the dev team share more info about what assets are preinstalled by the system? For example, can we safely assume that the English language model will almost certainly be already preinstalled by the OS if the phone has the English locale?
Replies
1
Boosts
0
Views
271
Activity
Jul ’25
Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application
I am developing a macOS application using SwiftUI (with an iOS version as well). One feature we are exploring is displaying an avatar that reads or speaks dynamically generated text produced by an AI service. The basic flow would be: Text generated by an AI service Text converted to speech using a TTS engine An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech Ideally the avatar would render locally on the device. Questions: What Apple frameworks would be most appropriate for implementing a speaking avatar? SceneKit RealityKit SpriteKit (for 2D avatars) Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks? Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation? If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS? Are there examples of real-time character animation synchronized with speech on macOS/iOS? Any architectural guidance or references would be greatly appreciated.
Replies
0
Boosts
0
Views
527
Activity
2w
Gazetteer encryption?
I have an app that uses a couple of mlmodels (word tagger and gazetteer) and I’m trying to encrypt them before publishing. The models are part of a package. I understand that Xcode can’t automatically handle the encryption for a model in a package the way it can within a traditional app structure. Given that, I’ve generated the Apple MLModel encryption key from Xcode and am encrypting via the command line with: xcrun coremlcompiler compile Gazetteer.mlmodel GazetteerENC.mlmodelc --encrypt Gazetteerkey.mlmodelkey In the package manifest, I’ve listed the encrypted models as .copy resources for my target and have verified the URL to that file is good. When I try to load the encrypted .mlmodelc file (on a physical device) with the line:
 gazetteer = try NLGazetteer(contentsOf: gazetteerURL!) I get the error: Failed to open file: /…/Scanner.bundle/GazetteerENC.mlmodelc/coremldata.bin. It is not a valid .mlmodelc file. So my questions are: Does the NLGazetteer class support encrypted MLModel files? Given that my models are in a package, do I have the right general approach? Thanks for any help or thoughts.
Replies
0
Boosts
0
Views
161
Activity
May ’25
Plenty of LanguageModelSession.GenerationError.refusal errors after 26.4 update
Hello! After the 26.4 update I get a huge number of LanguageModelSession.GenerationError.refusal errors when using guided generation Generables for inexplicable reasons. Such errors also occur, if I want to cast a response to boolean by using 'generating: Bool.self'. The explanation generated on the grounds of the error always looks like this: Response(userPrompt: "", duration: 0.230917542, promptTokenCount: Optional(66), responseTokenCount: Optional(11), feedbackAttachment: nil, content: "I apologize, but I cannot fulfill this request.", rawContent: "I apologize, but I cannot fulfill this request.", transcriptEntries: ArraySlice([])) All the prompts and Generables I use are definitely not profane. Before 26.4 such errors on the same prompts and Generables never occurred. The 26.4 update rendered those features unusable to me. Is this a known bug or what am I doing wrong?
Replies
3
Boosts
0
Views
432
Activity
2d
Apple ANE Peformance - throttling?
I can no longer achieve 100% ANE usage since upgrading to MacOS26 Beta 5. I used to be able to get 100%. Has Apple activated throttling or power saving features in the new Betas? Is there any new rate limiting on the API? I can hardly get above 3w or 40%. I have a M4 Pro mini (64GB) with High Power energy setting. MacOS 26 Beta 5.
Replies
2
Boosts
0
Views
341
Activity
Aug ’25
Official One-Click Local LLM Deployment for 2019 Mac Pro (7,1) Dual W6900X
I am a professional user of the 2019 Mac Pro (7,1) with dual AMD Radeon Pro W6900X MPX modules (32GB VRAM each). This hardware is designed for high-performance compute, but it is currently crippled for modern local LLM/AI workloads under Linux due to Apple's EFI/PCIe routing restrictions. Core Issue: rocminfo reports "No HIP GPUs available" when attempting to use ROCm/amdgpu on Linux Apple's custom EFI firmware blocks full initialization of professional GPU compute assets The dual W6900X GPUs have 64GB combined VRAM and high-bandwidth Infinity Fabric Link, but cannot be fully utilized for local AI inference/training My Specific Request: Apple should provide an official, one-click deployable application that enables full utilization of dual W6900X GPUs for local large language model (LLM) inference and training under Linux. This application must: Fully initialize both W6900X GPUs via HIP/ROCm, establishing valid compute contexts Bypass artificial EFI/PCIe routing restrictions that block access to professional GPU resources Provide a stable, user-friendly one-click deployment experience (similar to NVIDIA's AI Enterprise or AMD's ROCm Hub) Why This Matters: The 2019 Mac Pro is Apple's flagship professional workstation, marketed for compute-intensive workloads. Its high-cost W6900X GPUs should not be locked down for modern AI/LLM use cases. An official one-click deployment solution would demonstrate Apple's commitment to professional AI and unlock significant value for professional users. I look forward to Apple's response and a clear roadmap for enabling this critical capability. #MacPro #Linux #ROCm #LocalLLM #W6900X #CoreML
Replies
0
Boosts
0
Views
114
Activity
6d
Massive CoreML latency spike on live AVFoundation camera feed vs. offline inference (CPU+ANE)
Hello, I’m experiencing a severe performance degradation when running CoreML models on a live AVFoundation video feed compared to offline or synthetic inference. This happens across multiple models I've converted (including SCI, RTMPose, and RTMW) and affects multiple devices. The Environment OS: macOS 26.3, iOS 26.3, iPadOS 26.3 Hardware: Mac14,6 (M2 Max), iPad Pro 11 M1, iPhone 13 mini Compute Units: cpuAndNeuralEngine The Numbers When testing my SCI_output_image_int8.mlpackage model, the inference timings are drastically different: Synthetic/Offline Inference: ~1.34 ms Live Camera Inference: ~15.96 ms Preprocessing is completely ruled out as the bottleneck. My profiling shows total preprocessing (nearest-neighbor resize + feature provider creation) takes only ~0.4 ms in camera mode. Furthermore, no frames are being dropped. What I've Tried I am building a latency-critical app and have implemented almost every recommended optimization to try and fix this, but the camera-feed penalty remains: Matched the AVFoundation camera output format exactly to the model input (640x480 at 30/60fps). Used IOSurface-backed pixel buffers for everything (camera output, synthetic buffer, and resize buffer). Enabled outputBackings. Loaded the model once and reused it for all predictions. Configured MLModelConfiguration with reshapeFrequency = .frequent and specializationStrategy = .fastPrediction. Wrapped inference in ProcessInfo.processInfo.beginActivity(options: .latencyCritical, reason: "CoreML_Inference"). Set DispatchQueue to qos: .userInteractive. Disabled the idle timer and enabled iOS Game Mode. Exported models using coremltools 9.0 (deployment target iOS 26) with ImageType inputs/outputs and INT8 quantization. Reproduction To completely rule out UI or rendering overhead, I wrote a standalone Swift CLI script that isolates the AVFoundation and CoreML pipeline. The script clearly demonstrates the ~15ms latency on live camera frames versus the ~1ms latency on synthetic buffers. (I have attached camera_coreml_benchmark.swift and coreml model (very light low light enghancement model) to this repo on github https://github.com/pzoltowski/apple-coreml-camera-latency-repro). My Question: Is this massive overhead expected behavior for AVFoundation + Core ML on live feeds, or is this a framework/runtime bug? If expected, what is the Apple-recommended pattern to bypass this camera-only inference slowdown? One think found interesting when running in debug model was faster (not as fast as in performance benchmark but faster than 16ms. Also somehow if I did some dummy calculation on on different DispatchQueue also seems like model got slightly faster. So maybe its related to ANE Power State issues (Jitter/SoC Wake) and going to fast to sleep and taking a long time to wakeup? Doing dummy calculation in background thought is probably not a solution. Thanks in advance for any insights!
Replies
5
Boosts
0
Views
712
Activity
1w
Powermetrics GPU power vs system DC power discrepancy on M4 Max
While analyzing system power on an M4 Max under GPU-heavy compute workloads, I noticed that the the GPU power reported by powermetrics does not come anywhere close to total system DC power reported by the SMC counter PDTR (as used by utilities like mactop). For example, in a heavy GPU workload, powermetrics would report a 65W idle-load delta on the GPU, but at the same time system DC power would rise by 179W, leaving 114W or nearly 2/3 of total system DC power on a Mac Studio M4 Max unexplained. From measurements, the difference appears to correlate with the amount of on-chip data movement (for example, varying bytes-per-FLOP in the workload changes the observed gap). Using SMC and IOReport, I was able to reverse engineer an energy model for the GPU that explains almost all of the energy flow with less than 2% error on the workload I studied. The result is a simple two-term energy roofline model: P_GPU (GPU_combined term in the plot) ≈ a * bytes + b * FLOPs with: ~5 pJ/byte for SRAM movement ~2.7 pJ/FLOP for compute. Has anyone observed similar behavior, or is there guidance on how GPU power reported by IOReport/powermetrics should be interpreted relative to total system power? In particular, I’m interested in whether certain classes of GPU activity may not be attributed to the GPU component in IOReport. Full details with the methodology and results are available here: https://youtu.be/HKxIGgyeISM
Replies
0
Boosts
0
Views
60
Activity
1w
AI and ML
Hello. I am willing to hire game developer for cards game called baloot. My question is Can the developer implement an AI when the computer is playing and the computer on the same time the conputer improves his rises level without any interaction? 🌹
Replies
0
Boosts
0
Views
109
Activity
Jun ’25