Metal Performance Shaders


Optimize graphics and compute performance with kernels that are fine-tuned for the unique characteristics of each Metal GPU family using Metal Performance Shaders.

Posts under the Metal Performance Shaders tag

23 Posts

Post | Replies | Boosts | Views | Activity

CoreML memory allocation logic
Hello, I have a question about Core ML. I loaded a Core ML model in my project and set the compute units to CPU and GPU. When I profiled it with Instruments, I found a "prepare GPU request" overhead before every inference. I also checked the freezing point graph and found that memory was allocated frequently. Is this expected? Is there a way to avoid repeating the prepare step on every inference? I have tried a few approaches, such as sharing memory for the input parameters passed to the predict interface, but they don't seem to help.
Replies: 0 | Boosts: 0 | Views: 148 | Activity: May ’25
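The allocate-once, reuse-every-call pattern the poster is after can be sketched in plain Python. BufferPool below is a hypothetical helper, not part of Core ML. In the Swift Core ML API the closest facility appears to be MLPredictionOptions.outputBackings (iOS 16 / macOS 13 and later), which accepts preallocated output buffers; whether that also removes the "prepare GPU request" overhead seen in Instruments is untested here.

```python
class BufferPool:
    """Hypothetical helper: hands out preallocated buffers keyed by size and
    takes them back, so a steady-state inference loop stops allocating."""

    def __init__(self):
        self._free = {}        # size in bytes -> list of idle buffers
        self.allocations = 0   # how many times a brand-new buffer was created

    def acquire(self, nbytes):
        bucket = self._free.get(nbytes)
        if bucket:
            # Reuse an idle buffer of the right size instead of allocating.
            return bucket.pop()
        self.allocations += 1
        return bytearray(nbytes)

    def release(self, buf):
        # Return the buffer so the next call of the same size can reuse it.
        self._free.setdefault(len(buf), []).append(buf)


pool = BufferPool()
for _ in range(100):
    out = pool.acquire(1024)   # stand-in for an inference output buffer
    pool.release(out)
```

After the loop, only one buffer has ever been allocated, no matter how many iterations run.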
CoreML Model Conversion Help
I’m trying to follow Apple’s “WWDC24: Bring your machine learning and AI models to Apple Silicon” session to convert the Mistral-7B-Instruct-v0.2 model into a Core ML package, but I’ve run into a roadblock that I can’t seem to overcome. I’ve uploaded my full conversion script here for reference: https://pastebin.com/T7Zchzfc

When I run the script, it progresses through tracing and MIL conversion but then fails at the backend_mlprogram stage with this error: https://pastebin.com/fUdEzzKM

The core of the error is:

ValueError: Op "keyCache_tmp" (op_type: identity) Input x="keyCache" expects list, tensor, or scalar but got state[tensor[1,32,8,2048,128,fp16]]

I’ve registered my KV-cache buffers in a StatefulMistralWrapper subclass of nn.Module, matching the keyCache and valueCache state names in my ct.StateType definitions, but Core ML’s backend pass reports the state tensor as an invalid input. I’m using Core ML Tools 8.3.0 on Python 3.9.6, targeting iOS 18, and forcing CPU conversion (MPS wasn’t available).

Any pointers on how to satisfy the handle_unused_inputs pass or properly declare/cache state for GQA models in Core ML would be greatly appreciated!

Thanks in advance for your help,
Usman Khan
Replies: 0 | Boosts: 0 | Views: 294 | Activity: May ’25
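To make the state shape in the error concrete, here is a small sketch of the memory one such cache occupies and of how grouped-query attention (GQA) shares KV heads. It assumes the dimensions are (batch, layers, KV heads, context length, head dim) and that the model uses 32 query heads over 8 KV heads, as the Mistral-7B configuration does; neither assumption is read from the poster's script. The helper names are illustrative only.

```python
from math import prod

# Shape reported in the error; the meaning of each dimension is an assumption:
# (batch, layers, kv_heads, context_length, head_dim)
STATE_SHAPE = (1, 32, 8, 2048, 128)
FP16_BYTES = 2  # fp16, as in the error message


def state_nbytes(shape, bytes_per_element=FP16_BYTES):
    """Bytes needed to hold one state tensor of the given shape."""
    return prod(shape) * bytes_per_element


def kv_head_for_query_head(q_head, num_q_heads=32, num_kv_heads=8):
    """GQA: each consecutive group of query heads reads the same KV head."""
    group_size = num_q_heads // num_kv_heads  # 4 query heads per KV head
    return q_head // group_size


per_cache = state_nbytes(STATE_SHAPE)
print(per_cache / 2**20, "MiB per cache")  # 128.0 MiB per cache
```

With both keyCache and valueCache declared, the two states together take about 256 MiB. The KV-head dimension (8) is what makes the cache four times smaller than a full multi-head cache would be.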
Regarding Smoothing in Spectrogram using Metal
Hey, I need to know how to use texture mapping to render a spectrogram in Metal, because I need to smooth the spectrogram. My current project uses a vertex-based approach, which produces blocky transitions between quads. I need the values to blend smoothly across each quad so that the result forms a continuous gradient.
Replies: 0 | Boosts: 0 | Views: 145 | Activity: Jun ’25
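Texture sampling smooths the result because a sampler with linear filtering (in the Metal Shading Language, for example, constexpr sampler s(filter::linear, address::clamp_to_edge);) blends the four nearest texels instead of returning one constant value per cell. The CPU sketch below spells out that bilinear lookup, assuming the spectrogram is stored as a 2D grid of magnitudes (rows = y, columns = x), normalized coordinates in [0, 1], and texel centers at half-integer positions. sample_bilinear is a hypothetical helper, not a Metal function.

```python
import math


def sample_bilinear(grid, u, v):
    """Sample a 2D grid at normalized coords (u, v), mimicking a texture
    sampler with linear filtering and clamp-to-edge addressing."""
    height = len(grid)
    width = len(grid[0])

    # Map normalized coords to texel space; texel i's center sits at i + 0.5.
    x = u * width - 0.5
    y = v * height - 0.5
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx = x - x0  # horizontal blend weight
    fy = y - y0  # vertical blend weight

    def texel(ix, iy):
        # clamp_to_edge: out-of-range lookups reuse the nearest edge texel.
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return grid[iy][ix]

    # Blend horizontally on the two rows, then vertically between them.
    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


spectrogram = [[0.0, 1.0],
               [2.0, 3.0]]
print(sample_bilinear(spectrogram, 0.5, 0.5))  # 1.5, the average of all four cells
```

Sampling exactly at a texel center returns that cell's value, and points in between blend smoothly, which is what removes the blocky edges. On the GPU, one common approach is to draw a single quad covering the whole spectrogram and sample the magnitude texture in the fragment shader, rather than coloring one quad per bin.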