Metal Performance Shaders


Use Metal Performance Shaders to optimize graphics and compute performance with kernels fine-tuned for the unique characteristics of each Metal GPU family.

Metal Performance Shaders Documentation

Posts under Metal Performance Shaders tag

41 Posts
Post not yet marked as solved
1 Reply
475 Views
I have the following MTLBuffer. How can I copy INPUTVALUE into the memINPUT buffer? I need to do this repeatedly in Objective-C.

// header file
@property id<MTLBuffer> memINPUT;

// main file
int length = 1000;
...
memINPUT = [_device newBufferWithLength:(sizeof(float)*length) options:0];
...
float INPUTVALUE[length];
for (int i = 0; i < length; i++) {
    INPUTVALUE[i] = (float)i;
}
// How do I copy INPUTVALUE into memINPUT?
...

The following is the Swift version; I am looking for the Objective-C equivalent.

memINPUT.contents().copyMemory(from: INPUTVALUE, byteCount: length * MemoryLayout<Float>.stride)
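A minimal sketch of the direct Objective-C equivalent, assuming the buffer uses shared storage (options:0 above maps to MTLResourceStorageModeShared): the contents method returns a CPU-visible pointer, so a plain memcpy does the copy.

// Hedged sketch: copy the array into the buffer's CPU-visible storage.
// With managed storage on macOS you would also call didModifyRange: afterward.
memcpy(self.memINPUT.contents, INPUTVALUE, sizeof(float) * length);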
Posted by oh1226. Last updated.
Post not yet marked as solved
2 Replies
1.2k Views
Hi, I am training an adversarial autoencoder using PyTorch 2.0.0 on an Apple M2 (Ventura 13.1), with conda 23.1.0 as the package manager. I encountered this error:

/AppleInternal/Library/BuildRoots/5b8a32f9-5db2-11ed-8aeb-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayConvolutionA14.mm:3967: failed assertion `destination kernel width and filter kernel width mismatch'
/Users/vk/miniconda3/envs/betavae/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

To my knowledge, the code broke down when running self.manual_backward(loss["g_loss"]) in this block:

g_opt.zero_grad()
self.manual_backward(loss["g_loss"])
g_opt.step()

The same code runs without problems on a Linux distribution. Any thoughts on how to fix it are highly appreciated!
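Until the underlying kernel issue is fixed, one commonly suggested workaround is PyTorch's MPS CPU-fallback flag; a sketch follows (PYTORCH_ENABLE_MPS_FALLBACK is a real PyTorch environment variable, but whether it avoids this particular assertion is an assumption):

# Hedged workaround sketch: route unsupported or failing ops to the CPU.
# The flag must be set before torch is imported.
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # import only after setting the flag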
Posted by RayXC. Last updated.
Post marked as solved
1 Reply
721 Views
In the video here, the speaker refers to MPSGraphTool, which is supposed to convert from CoreML and other formats to the new MPSGraphPackage format. Searching for MPSGraphTool on Google returns only that video, and there is no mention of it on the forums here or elsewhere. When can we expect the tool to be released? How can we find out more information about it? My use case is that the ANECompilerService that runs on the Mac / iOS devices to compile CoreML Models / Programs is extremely slow and unreliable for large models. It often crashes entirely, sitting at 100% CPU usage forever and never completing the task at hand, meaning the user is stuck in a loading state. This also applies in Xcode when running a performance test. I would really like to compile the graph once and just run it on device directly.
Posted by ephemer. Last updated.
Post not yet marked as solved
0 Replies
426 Views
Hello, I am doing ray tracing and plan to do multiple intersection tests with different rays in one kernel (a shading loop). It works fine when I have two intersection tests, but the GPU crashes when there are three. Are there any rules I need to obey? Thanks.
Posted by QuintonQ. Last updated.
Post not yet marked as solved
0 Replies
552 Views
I am following https://developer.apple.com/documentation/metal/performing_calculations_on_a_gpu to build a Metal app that performs a GPU calculation, but I cannot figure out how to build and execute the project from the command line. Any help on how to build a main.m file using xcrun would be useful. I have tried xcrun -sdk macosx clang MetalComputeBasic/main.m but it doesn't work.
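A sketch of one build sequence that should work for that sample; the file names are assumptions based on the sample's layout, and the bare clang invocation fails mainly because the frameworks aren't linked (MTLCreateSystemDefaultDevice on macOS is documented to require linking CoreGraphics). Whether newDefaultLibrary finds default.metallib next to a command-line executable is worth verifying.

# Compile the shader to an intermediate .air file, then package it as the
# default library the sample loads at runtime.
xcrun -sdk macosx metal -c MetalComputeBasic/add.metal -o add.air
xcrun -sdk macosx metallib add.air -o default.metallib

# Compile and link the Objective-C sources against the needed frameworks.
xcrun -sdk macosx clang -fobjc-arc -framework Metal -framework Foundation \
    -framework CoreGraphics MetalComputeBasic/main.m MetalComputeBasic/MetalAdder.m -o compute

# Run with default.metallib sitting next to the executable.
./compute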
Posted by arunppsg. Last updated.
Post not yet marked as solved
0 Replies
738 Views
I am learning from the "Accelerating ray tracing using Metal" sample. The area light has its own struct in this sample code, but I want to sample rays directly from the LightMesh. Can I get the instances and geometry of lightMesh without using the resources buffer? It seems the geometry is already loaded on the GPU, because Metal 3 is able to do the intersection test. However, I can only get primitive_data during the intersection, and cannot get that information when I try to do sampling. Thanks a lot!
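For what it's worth: the acceleration structure is opaque outside of intersection queries, so to sample the light geometry directly you generally have to bind the same vertex/index buffers to your shading kernel yourself (via the resources buffer or ordinary buffer bindings). A minimal sketch of just the sampling math, assuming the vertices are bound somewhere (all names here are hypothetical):

#include <metal_stdlib>
using namespace metal;

// Uniformly sample a point on one triangle of the light mesh, given two
// random numbers u in [0,1)^2 (standard barycentric warping).
inline float3 samplePointOnTriangle(float3 v0, float3 v1, float3 v2, float2 u) {
    float su = sqrt(u.x);
    return v0 * (1.0f - su) + v1 * (su * (1.0f - u.y)) + v2 * (su * u.y);
}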
Posted by QuintonQ. Last updated.
Post not yet marked as solved
3 Replies
1.1k Views
I have been experimenting with different rendering approaches in Metal and am hitting a wall when it comes to reconciling "bindless" or GPU-driven approaches* with a dynamic scene where meshes can be added, removed, and changed. All the examples I have found of such approaches use fixed scenes, where all the data is baked before the first draw call into something like a MeshBuffer that holds all scene geometry as Mesh objects (for instance). While I can assume that recreating a MeshBuffer from scratch each frame would be possible but completely undesirable, and that there may be some clever tricks with pointers to update a MeshBuffer as needed, I would like to know if there is an established or optimal solution to this problem, or if these approaches are simply incompatible with dynamic geometry. Any example projects that do what I am asking that I may have missed would be appreciated, too.

* I know these are not the same, but they seem to share some common characteristics, namely providing your entire geometry to the GPU at once. Looping over an array of meshes and calling drawIndexedPrimitives from the CPU does not pose any such obstacles, but it also precludes some of the benefits of offloading work to the GPU, or having access to all geometry on the GPU for things like path tracing.
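I don't know of a single established answer, but one pattern that comes up (a sketch under assumptions, not an Apple-documented API; all names here are hypothetical) is to suballocate meshes from one large MTLBuffer and track free ranges, so adding or removing a mesh becomes a copy plus a table update instead of a full rebuild:

import Metal

// Suballocates byte ranges for meshes from one shared vertex buffer.
final class MeshPool {
    let buffer: MTLBuffer
    private var nextOffset = 0
    private var freeRanges: [Range<Int>] = []

    init(device: MTLDevice, capacity: Int) {
        buffer = device.makeBuffer(length: capacity, options: .storageModeShared)!
    }

    // First-fit reuse of freed ranges, falling back to bump allocation.
    func allocate(bytes: Int) -> Int {
        if let i = freeRanges.firstIndex(where: { $0.count >= bytes }) {
            let r = freeRanges.remove(at: i)
            if r.count > bytes {
                freeRanges.append((r.lowerBound + bytes)..<r.upperBound)
            }
            return r.lowerBound
        }
        precondition(nextOffset + bytes <= buffer.length, "pool exhausted")
        defer { nextOffset += bytes }
        return nextOffset
    }

    func free(offset: Int, bytes: Int) {
        freeRanges.append(offset..<(offset + bytes))
    }
}

Per-mesh draw parameters (offsets, counts) then live in a small table the GPU indexes, which you update when meshes change; an MTLHeap can play a similar role on the resource side.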
Posted by spamheat. Last updated.
Post not yet marked as solved
1 Reply
790 Views
It seems that Apple silicon devices (even M1/M2 Max) can only use a certain percentage of their total unified memory for GPU/Metal work. This appears to be a limitation related to recommendedMaxWorkingSetSize. This is quite odd, because even an M1 Mac mini or MacBook Air runs totally fine with 8 GB of total memory for both the OS and GPU, so why limit this in the first place? It also seems like false advertising to me from Apple, by not clearly stating this limitation. I am asking this in regard to the following open source project (though of course more software is affected by the same limitation): https://github.com/ggerganov/llama.cpp/pull/1826 Another resource I've found: https://developer.apple.com/videos/play/tech-talks/10580/?time=546 If anyone has ideas on how these limitations can be overcome, and how to get apps to use more memory for GPU (Metal) work, I (and the open source community) would be truly grateful! Thanks in advance!
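For anyone landing here, the limit in question can at least be inspected at runtime; recommendedMaxWorkingSetSize and hasUnifiedMemory are real MTLDevice properties (this only reports the limit, it does not raise it):

import Metal

// Print the GPU working-set limit on the default device.
if let device = MTLCreateSystemDefaultDevice() {
    let gib = Double(device.recommendedMaxWorkingSetSize) / 1_073_741_824
    print("recommendedMaxWorkingSetSize: \(gib) GiB")
    print("hasUnifiedMemory: \(device.hasUnifiedMemory)")
}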
Posted by Jake88. Last updated.
Post marked as solved
1 Reply
1.2k Views
I'm trying to use the randomTensor function from MPSGraph to initialize the weights of a fully connected layer. I can create the graph and run inference using the randomly initialized values, but when I try to train and update these randomly initialized weights, I'm hitting a crash:

Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 578.

I can train the graph if I instead initialize the weights myself on the CPU, but I thought using the randomTensor functions would be faster / allow initialization to occur on the GPU. Here's my code for building the graph, including both methods of weight initialization:

func buildGraph(variables: inout [MPSGraphTensor]) -> (MPSGraphTensor, MPSGraphTensor, MPSGraphTensor, MPSGraphTensor) {
    let inputPlaceholder = graph.placeholder(shape: [2], dataType: .float32, name: nil)
    let labelPlaceholder = graph.placeholder(shape: [1], name: nil)

    // This works for inference but not training
    let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)!
    let weightTensor = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil)

    // This works for inference and training
    // let weights = [Float](repeating: 1, count: 2)
    // let weightTensor = graph.variable(with: Data(bytes: weights, count: 2 * MemoryLayout<Float32>.size), shape: [2, 1], dataType: .float32, name: nil)

    variables += [weightTensor]

    let output = graph.matrixMultiplication(primary: inputPlaceholder, secondary: weightTensor, name: nil)
    let loss = graph.softMaxCrossEntropy(output, labels: labelPlaceholder, axis: -1, reuctionType: .sum, name: nil)
    return (inputPlaceholder, labelPlaceholder, output, loss)
}

And to run the graph I have the following in my sample view controller:

override func viewDidLoad() {
    super.viewDidLoad()

    var variables: [MPSGraphTensor] = []
    let (inputPlaceholder, labelPlaceholder, output, loss) = buildGraph(variables: &variables)
    let gradients = graph.gradients(of: loss, with: variables, name: nil)
    let learningRate = graph.constant(0.001, dataType: .float32)

    var updateOps: [MPSGraphOperation] = []
    for (key, value) in gradients {
        let updates = graph.stochasticGradientDescent(learningRate: learningRate, values: key, gradient: value, name: nil)
        let assign = graph.assign(key, tensor: updates, name: nil)
        updateOps += [assign]
    }

    let commandBuffer = MPSCommandBuffer(commandBuffer: Self.commandQueue.makeCommandBuffer()!)

    let executionDesc = MPSGraphExecutionDescriptor()
    executionDesc.completionHandler = { (resultsDictionary, _) in
        for (key, value) in resultsDictionary {
            var output: [Float] = [0]
            value.mpsndarray().readBytes(&output, strideBytes: nil)
            print(output)
        }
    }

    let inputDesc = MPSNDArrayDescriptor(dataType: .float32, shape: [2])
    let input = MPSNDArray(device: Self.device, descriptor: inputDesc)
    var inputArray: [Float] = [1, 2]
    input.writeBytes(&inputArray, strideBytes: nil)
    let source = MPSGraphTensorData(input)

    let labelMPSArray = MPSNDArray(device: Self.device, descriptor: MPSNDArrayDescriptor(dataType: .float32, shape: [1]))
    var labelArray: [Float] = [1]
    labelMPSArray.writeBytes(&labelArray, strideBytes: nil)
    let label = MPSGraphTensorData(labelMPSArray)

    // This runs inference and works
    // graph.encode(to: commandBuffer, feeds: [inputPlaceholder: source], targetTensors: [output], targetOperations: [], executionDescriptor: executionDesc)
    // commandBuffer.commit()
    // commandBuffer.waitUntilCompleted()

    // This trains but does not work
    graph.encode(
        to: commandBuffer,
        feeds: [inputPlaceholder: source, labelPlaceholder: label],
        targetTensors: [],
        targetOperations: updateOps,
        executionDescriptor: executionDesc)
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
}

And a few other relevant variables are created at the class scope:

let graph = MPSGraph()
static let device = MTLCreateSystemDefaultDevice()!
static let commandQueue = device.makeCommandQueue()!

How can I use these randomTensor functions on MPSGraph to randomly initialize weights for training?
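One workaround to consider, as a hedged sketch: since the random op apparently cannot be used as a trainable variable directly, run a small separate graph once to materialize the random values, then create the training graph's variable from those bytes. The MPSGraph methods used here are real, but whether this is the intended pattern is an assumption.

import MetalPerformanceShadersGraph

// Materialize random initial weights once, on the GPU.
let initGraph = MPSGraph()
let desc = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)!
let rand = initGraph.randomTensor(withShape: [2, 1], descriptor: desc, seed: 2, name: nil)
let results = initGraph.run(feeds: [:], targetTensors: [rand], targetOperations: nil)

var initial = [Float](repeating: 0, count: 2)
results[rand]!.mpsndarray().readBytes(&initial, strideBytes: nil)

// A variable created from these bytes trains fine, per the post above
// ("graph" is the training graph from the class scope).
let weightTensor = graph.variable(with: Data(bytes: &initial, count: initial.count * MemoryLayout<Float>.size),
                                  shape: [2, 1], dataType: .float32, name: nil)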
Posted. Last updated.
Post not yet marked as solved
0 Replies
550 Views
Hello 👋 Kindly looking for your support, as I'm facing an issue with my MacBook Pro 2019 (16 GB RAM, 4 GB GPU). The main problem: I'm getting a kernel panic 3-4 times a week; it only happens on sleep or even shutdown. My MacBook shows the message "your computer restarted because of a problem", and the details give me the following log:

panic(cpu 2 caller 0xffffff7f95957747): GPU Panic: [3:0:0][PPLIB] Failed to send PPLIB IRI to Accelerator. : 23308iE1_ : mux-regs 5 3 3 7f 1f 0 0 switch-state 0 IG FBs 0 EG FBs 0:1f power-state 0 3D idle HDA idle system-state 0 power-level 20:20 power-retry 0:0 connect-change 0 WS-ready 0
Panicked task 0xffffffa056441698: 11 threads: pid 170: WindowServer
Backtrace (CPU 2), panicked thread: 0xffffffa0564b2598, Frame : Return Address
0xffffffc4b5e53330 : 0xffffff8000c6f4fd
0xffffffc4b5e53380 : 0xffffff8000dc3c54
0xffffffc4b5e533c0 : 0xffffff8000db36d9
0xffffffc4b5e53420 : 0xffffff8000c0f951
0xffffffc4b5e53440 : 0xffffff8000c6f7dd
0xffffffc4b5e53530 : 0xffffff8000c6ee87
0xffffffc4b5e53590 : 0xffffff80013dceab
0xffffffc4b5e53680 : 0xffffff7f95957747
0xffffffc4b5e53760 : 0xffffff7f95957217
0xffffffc4b5e53850 : 0xffffff7f8db94c40
0xffffffc4b5e53960 : 0xffffff7f8db56d0d
0xffffffc4b5e539a0 : 0xffffff7f8db93b5e
0xffffffc4b5e539f0 : 0xffffff7f8db5b5f8
0xffffffc4b5e53a50 : 0xffffff7f9648bd56
0xffffffc4b5e53ae0 : 0xffffff7f8db5b584
0xffffffc4b5e53b00 : 0xffffff7f96482156
0xffffffc4b5e53b90 : 0xffffff80012e4e6c
0xffffffc4b5e53bf0 : 0xffffff8001353304
0xffffffc4b5e53c70 : 0xffffff8000d6fd5b
0xffffffc4b5e53cc0 : 0xffffff8000c4975a
0xffffffc4b5e53d60 : 0xffffff8000c604a2
0xffffffc4b5e53dd0 : 0xffffff8000c60b27
0xffffffc4b5e53ef0 : 0xffffff8000d98f53
0xffffffc4b5e53fa0 : 0xffffff8000c0fdb6
Kernel Extensions in backtrace:
com.apple.iokit.IOGraphicsFamily(597.0)[718E01CF-8B05-3042-88F4-DE3441395D00]@0xffffff7f96471000->0xffffff7f9649ffff
dependency: com.apple.iokit.IOPCIFamily(2.9)[6E72A292-C2AA-3F1B-8141-213720748CBD]@0xffffff8003691000->0xffffff80036c2fff
com.apple.kext.AMDRadeonX6000Framebuffer(4.1.2)[C2C59945-AFF0-33C6-BB68-6FF2576066AE]@0xffffff7f8db46000->0xffffff7f8ddcffff
dependency: com.apple.AppleGraphicsDeviceControl(7.1.18)[B22B74AE-08E9-3D23-8F7A-EAD3C39EE7AD]@0xffffff7f95903000->0xffffff7f95906fff
dependency: com.apple.iokit.IOACPIFamily(1.4)[D342E754-A422-3F44-BFFB-DEE93F6723BC]@0xffffff8003221000->0xffffff8003222fff
dependency: com.apple.iokit.IOGraphicsFamily(597)[718E01CF-8B05-3042-88F4-DE3441395D00]@0xffffff7f96471000->0xffffff7f9649ffff
dependency: com.apple.iokit.IOPCIFamily(2.9)[6E72A292-C2AA-3F1B-8141-213720748CBD]@0xffffff8003691000->0xffffff80036c2fff
com.apple.driver.AppleMuxControl2(7.1.18)[66B94BCE-5211-3273-BF5A-FDAA39C8E758]@0xffffff7f95947000->0xffffff7f95959fff
dependency: com.apple.AppleGraphicsDeviceControl(7.1.18)[B22B74AE-08E9-3D23-8F7A-EAD3C39EE7AD]@0xffffff7f95903000->0xffffff7f95906fff
dependency: com.apple.driver.AppleGraphicsControl(7.1.18)[43869897-FA64-3461-B5EE-9594DA6B29CA]@0xffffff7f958eb000->0xffffff7f958ebfff
dependency: com.apple.iokit.IOACPIFamily(1.4)[D342E754-A422-3F44-BFFB-DEE93F6723BC]@0xffffff8003221000->0xffffff8003222fff
dependency: com.apple.iokit.IOGraphicsFamily(597)[718E01CF-8B05-3042-88F4-DE3441395D00]@0xffffff7f96471000->0xffffff7f9649ffff
dependency: com.apple.iokit.IOPCIFamily(2.9)[6E72A292-C2AA-3F1B-8141-213720748CBD]@0xffffff8003691000->0xffffff80036c2fff
Kernel version: Darwin Kernel Version 22.5.0: Mon Apr 24 20:51:50 PDT 2023;
root:xnu-8796.121.2~5/RELEASE_X86_64 Kernel UUID: 7E997BC9-2104-3D4D-9AAE-17BD7A3FEC2D roots installed: 0 KernelCache slide: 0x0000000000800000 KernelCache base: 0xffffff8000a00000 Kernel slide: 0x00000000008dc000 Kernel text base: 0xffffff8000adc000 __HIB text base: 0xffffff8000900000 System model name: MacBookPro16,1 (Mac-E1008331FDC96864) System shutdown begun: NO Hibernation exit count: 0 !AActuatorDriver 6440.7 !AMultitouchDriver 6440.7 !AInputDeviceSupport 6440.8 @kext.AMDRadeonX6100HWLibs 1.0 @kext.AMDRadeonX6000HWServices 4.1.2 !UAudio 540.8 !AAudioClockLibs 240.1 !ASMBusPCI 1.0.14d1 !A!ILpssUARTv1 3.0.60 !A!ILpssUARTCommon 3.0.60 !AOnboardSerial 1.0 !AGraphicsControl 7.1.18 !AHDA!C 440.2 |IOHDA!F 440.2 |IOAudio!F 440.2 @vecLib.kext 1.2.0 @kext.AMDRadeonX6000Framebuffer 4.1.2 @kext.AMDSupport 4.1.2 @kext.triggers 1.0 IOHIDPowerSource 1 !ACallbackPowerSource 1 |IO!BSerialManager 9.0.0 |IO!BPacketLogger 9.0.0 |IO!BHost!CUSBTransport 9.0.0 |IO!BHost!CUARTTransport 9.0.0 |IO!BHost!CTransport 9.0.0 IO!BHost!CPCIeTransport 9.0.0 |CSR!BHost!CUSBTransport 9.0.0 |Broadcom!BHost!CUSBTransport 9.0.0 |Broadcom!B20703USBTransport 9.0.0 !ARSMChannel 1 |IORSM!F 1 !AIPAppender 1.0 @!AGPUWrangler 7.1.18 |IOSlowAdaptiveClocking!F 1.0.0 !ABacklightExpert 1.1.0 |IONDRVSupport 597 |IOAccelerator!F2 475.40.6 @!AGraphicsDeviceControl 7.1.18 IOPlatformPluginLegacy 1.0.0 X86PlatformPlugin 1.0.0 !AThunderboltEDMSink 5.0.3 !AThunderboltDPOutAdapter 8.5.1 IOPlatformPlugin!F 6.0.0d8 driverkit.serial 6.0.0 |IOGraphics!F 597 !ASMBus!C 1.0.18d1 usb.IOUSBHostHIDDevice 1.2 usb.cdc.ecm 5.0.0 usb.cdc.ncm 5.0.0 usb.cdc 5.0.0 usb.networking 5.0.0 usb.!UHostCompositeDevice 1.2 !AThunderboltDPInAdapter 8.5.1 !AThunderboltDPAdapter!F 8.5.1 !AThunderboltPCIDownAdapter 4.1.1 !AHPM 3.4.4 !A!ILpssI2C!C 3.0.60 !A!ILpssI2C 3.0.60 !A!ILpssDmac 3.0.60 !ABSDKextStarter 3 |IOSurface 336.50.1 @filesystems.hfs.encodings.kext 1 !ASyntheticGame!C 10.6.3 !AXsanScheme 3 !AThunderboltNHI 7.2.81 |IOThunderbolt!F 9.3.3 usb.!UVHCIBCE 1.2 usb.!UVHCICommonBCE 1.0 usb.!UVHCI 1.2 usb.!UVHCICommon 1.0 !AEffaceableNOR 1.0 |IOBufferCopy!C 1.1.0 |IOBufferCopyEngine!F 1 |IONVMe!F 2.1.0 !ABCMWLANCoreMac 1.0.0 |IO80211!F 1200.13.0 IOImageLoader 1.0.0 !AOLYHALMac 1 |IOSerial!F 11 corecapture 1.0.4 usb.!UXHCIPCI 1.2 usb.!UXHCI 1.2 usb.!UHostPacketFilter 1.0 |IOUSB!F 900.4.2 !AEFINVRAM 2.1 !AEFIRuntime 2.1 !ASMCRTC 1.0 |IOSMBus!F 1.1 |IOHID!F 2.0.0 |IOTimeSync!F 1150.2 |IOSkywalk!F 1.0 mDNSOffloadUserClient 1.0.1b8 |IONetworking!F 3.4 DiskImages 493.0.0 |IO!B!F 9.0.0 |IOReport!F 47 $quarantine 4 $sandbox 300.0 @kext.!AMatch 1.0.0d1 !ASSE 1.0 !AKeyStore 2 !UTDM 554 |IOUSBMass!SDriver 235.100.2 |IOSCSIBlockCommandsDevice 482.120.2 |IO!S!F 2.1 |IOSCSIArchitectureModel!F 482.120.2 !AFDEKeyStore 28.30 !AEffaceable!S 1.0 !ACyrus 1 !AMobileFileIntegrity 1.0.5 $!AImage4 5.0.0 @kext.CoreTrust 1 !ACredentialManager 1.0 |CoreAnalytics!F 1 KernelRelayHost 1 |IOUSBHost!F 1.2 !UHostMergeProperties 1.2 usb.!UCommon 1.0 !ABusPower!C 1.0 !ASEPManager 1.0.1 IOSlaveProcessor 1 !AACPIPlatform 6.1 !ASMC 3.1.9 |IOPCI!F 2.9 |IOACPI!F 1.4 watchdog 1 @kec.pthread 1 @kec.Libm 1 @kec.corecrypto 12.0
Posted by ShahdReda. Last updated.
Post marked as solved
2 Replies
838 Views
From the manual https://keith.github.io/xcode-man-pages/MetalValidation.1.html, I figured out that when I launch my app with those environment variables set, it may work. These are my current environment variables, and I really do get the API validation message in the logs. In another test, I did not set Metal_DEBUG_ERROR_MODE, METAL_ERROR_MODE, or METAL_ERROR_CHECK_EXTENDED_MODE, just leaving them at their default values, and then I got these results: 1. the app triggers a crash when it hits an API validation failure; 2. the failure message is too simple to fix the bug. Here is my problem: can I both trigger a crash when API validation fails and get the detailed API validation message by setting environment variables?
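If it helps, the variables I recall that man page documenting for this include MTL_DEBUG_LAYER and MTL_DEBUG_LAYER_ERROR_MODE, where an error mode of assert should turn validation failures into crashes with the detailed message. A sketch from memory; verify the exact names and values against MetalValidation(1):

# Hedged sketch; double-check variable names/values against the man page.
MTL_DEBUG_LAYER=1 MTL_DEBUG_LAYER_ERROR_MODE=assert /path/to/YourApp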
Posted. Last updated.
Post not yet marked as solved
1 Reply
1.8k Views
In my game project, there is a functions.data file at /AppData/Library/Caches/[bundleID]/com.apple.metal/functions.data. When we reboot and launch the game, this file is reset to about 40 KB; normally it is about 30 MB. This is done by Metal. Is there any way to avoid it?
Posted. Last updated.
Post not yet marked as solved
5 Replies
3.8k Views
Hello guys. With the release of the M1 Pro and M1 Max in particular, the Mac has become a platform that could become very interesting for games in the future. However, since some features are still missing in Metal, it could be problematic for some developers to port their games to Metal. Especially with Unreal Engine 5 you can already see a tendency in this direction, since e.g. Nanite and Lumen are unfortunately not available on the Mac. As a Vulkan developer, I wanted to inquire about some features that are not yet available in Metal. These features are very interesting if you want to write a GPU-driven renderer for a modern game engine. Furthermore, these features could be used to emulate D3D12 on the Mac via MoltenVK, which would result in more games being available on the Mac.

Buffer device address: This feature allows the application to query a 64-bit buffer device address value for a buffer. It is very useful for D3D12 emulation and for compatibility with Vulkan, e.g. to implement ray tracing on MoltenVK.

DrawIndirectCount: This feature allows an application to source the number of draws for indirect drawing calls from a buffer. Also very useful in many GPU-driven situations.

Only 500,000 resources per argument buffer: Metal has a limit of 500,000 resources per argument buffer. To be equivalent to D3D12 Resource Binding Tier 2, you would need 1 million. This is also very important, as many DirectX 12 game engines could then be ported to Metal more easily.

Mesh shader / task shader: Two interesting new shader stages to optimize the rendering pipeline.

Are there any plans to implement these features in the future? Is there a roadmap for Metal? Is there a website where I can suggest features to the Metal developers? I hope to see at least the first three features in Metal in the future, and I think many developers feel the same way. Best regards, Marlon
Posted by zmxrlxn. Last updated.
Post not yet marked as solved
2 Replies
1.4k Views
Hi Apple, we have purchased Apple silicon Macs at our studio and intend to use Unreal Engine with them. Epic Games said it's left to you to allow this on the hardware, and I have also learnt that the M chips support ray tracing. Are you in talks with Epic Games? Should we expect this feature soon, and if so, how soon? Or will it not be possible on the present Apple silicon Macs? An answer would really point our company in the right direction. Thank you very much.
Posted by Qunik. Last updated.
Post not yet marked as solved
1 Reply
1k Views
failed assertion `Completed handler provided after commit call'. How can I clear this error? When I run on the CPU I get a storage error, so I tried the GPU. Partial code (underscores in __init__ were eaten by the forum formatting; restored below):

# PositionalEncoding
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len, dropout_prob=0.1):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout_prob)
        # Create positional encoding matrix
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        # Pad div_term with zeros if necessary
        div_term_padded = torch.zeros(d_model)
        div_term_padded[:div_term.size(0)] = div_term
        pe[:, 0::2] = torch.sin(position * div_term_padded[0::2])
        pe[:, 1::2] = torch.cos(position * div_term_padded[1::2])
        pe = pe.unsqueeze(0).transpose(0, 1)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:x.size(0), :]
        return self.dropout(x)

# TransformerModel class
class TransformerModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, d_model, num_heads, dropout_prob, output_size, device, max_len):
        super(TransformerModel, self).__init__()
        self.device = device
        self.hidden_size = hidden_size
        self.d_model = d_model
        self.num_heads = num_heads
        # self.embedding = nn.Embedding(input_size, d_model).to(device)
        self.embedding = nn.Linear(input_size, d_model).to(device)
        self.pos_encoder = PositionalEncoding(d_model, max_len, dropout_prob).to(device)
        self.transformer_encoder_layer = nn.TransformerEncoderLayer(d_model, num_heads, hidden_size, dropout_prob).to(device)
        self.transformer_encoder = nn.TransformerEncoder(self.transformer_encoder_layer, num_layers).to(device)
        self.decoder = nn.Linear(d_model, output_size).to(device)
        self.to(device)  # Ensure the model is on the correct device

    def forward(self, x):
        # x = x.long()
        x = x.transpose(0, 1)  # Transpose the input to match the shape the transformer expects
        x = x.squeeze()        # Remove the extra dimension from the input tensor
        x = self.embedding(x)  # Apply the input embedding
        x = self.pos_encoder(x)  # Add positional encoding
        x = self.transformer_encoder(x)  # Apply the transformer encoder
        x = self.decoder(x[:, -1, :])  # Decode the last time step's output to get the final prediction
        return x

# Train transformer model
def train_transformer_model(train_X_scaled, train_y, input_size, d_model, hidden_size, num_layers, output_size, learning_rate, num_epochs, num_heads, dropout_prob, device, n_accumulation_steps=32):
    train_X_tensor = torch.from_numpy(train_X_scaled).float().to(device)
    train_y_tensor = torch.from_numpy(train_y).float().unsqueeze(1).to(device)

    # Create the dataset and DataLoader
    train_data = TensorDataset(train_X_tensor, train_y_tensor)
    train_loader = DataLoader(train_data, batch_size=8, shuffle=True)

    # Compute the maximum length of the input sequences
    max_len = train_X_tensor.size(1)

    # Create the model
    model = TransformerModel(input_size, hidden_size, num_layers, d_model, num_heads, dropout_prob, output_size, device, max_len).to(device)

    q = 0.5
    criterion = lambda y_pred, y_true: quantile_loss(q, y_true, y_pred)
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    for epoch in range(1, num_epochs + 1):
        model.train()
        print(f"Transformer inputs shape: {train_X_tensor.shape}, targets shape: {train_y_tensor.shape}")

    for epoch in range(1, num_epochs + 1):
        model.train()
        print(f"transformer Epoch {epoch}/{num_epochs}")
        for i, (batch_X, batch_y) in enumerate(train_loader):
            batch_X = batch_X.to(device)
            print("transformer batch_X shape:", batch_X.shape)
            batch_y = batch_y.to(device)
            print("transformer batch_Y shape:", batch_y.shape)
            optimizer.zero_grad()
            batch_X = batch_X.transpose(0, 1)
            train_pred = model(batch_X.squeeze(0)).to(device)
            print("train_pred=", train_pred)
            loss = criterion(train_pred, batch_y).to(device)
            loss.backward()
            # Gradient accumulation
            if (i + 1) % n_accumulation_steps == 0:
                optimizer.step()
                optimizer.zero_grad()
            print(f"transformer Epoch {epoch}/{num_epochs}, Step {i+1}/{len(train_loader)}, Loss: {loss.item():.6f}")

    return model
Posted. Last updated.
Post not yet marked as solved
4 Replies
952 Views
I just upgraded to Xcode 14.3, and I have started seeing the following debug message in my console; I am not sure why:

[GPUDebug] Null texture access executing kernel function "ColorCorrection" encoder: "Keyframing.SurfaceWarpFuser.InverseWarpKeyframe", dispatch: 2

It seems Metal-related, but I am very confused by it. My project uses a very minimal amount of Metal, only to get depth data from ARKit and to draw points, and I definitely do NOT have any kernel function named ColorCorrection or an encoder named "Keyframing.SurfaceWarpFuser.InverseWarpKeyframe". I haven't changed any of my Metal code, so I don't know if this is something bigger I should be concerned about. Sincerely, Stan
Posted. Last updated.
Post not yet marked as solved
3 Replies
1.2k Views
I've looked in multiple places online, including here in the forums where a somewhat similar question was asked (and never answered :( ), but I'm going to ask anyway: vImage, Metal Performance Shaders, and Core Image have a big overlap in the kinds of operations they perform on image data, yet none of the supporting materials (documentation, WWDC session videos, help) seem to acknowledge the existence of the others. For example, Core Image talks about how efficient and fast it is. MPS talks about everything being "hand rolled" to be optimized for the hardware it's running on, which also means fast and efficient. And vImage talks about being fast and, yup, energy-saving. So I and others have very little to go on as to when vImage makes sense over MPS, or over Core Image. If I have a large set of images and I want to get the mean color value of each image, equalize or adjust the histogram of each, or perform some other color operation on each in the set, for example, which is best? I hope someone from Apple, preferably people from the multiple teams that work on these technologies, can help clear some of this up.
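As one concrete data point for the comparison, here is a hedged Core Image sketch of the mean-color case (CIAreaAverage is a real built-in filter; this says nothing about which framework is fastest for a large batch):

import CoreImage

// Reduce an image to its average color with CIAreaAverage, then read back
// the single resulting pixel.
func meanColor(of image: CIImage, context: CIContext) -> [UInt8] {
    let filter = CIFilter(name: "CIAreaAverage", parameters: [
        kCIInputImageKey: image,
        kCIInputExtentKey: CIVector(cgRect: image.extent)
    ])!
    var pixel = [UInt8](repeating: 0, count: 4)
    context.render(filter.outputImage!, toBitmap: &pixel, rowBytes: 4,
                   bounds: CGRect(x: 0, y: 0, width: 1, height: 1),
                   format: .RGBA8, colorSpace: nil)
    return pixel  // RGBA, 0-255
}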
Posted. Last updated.
Post not yet marked as solved
1 Reply
1.2k Views
Hello Apple Developer Community, I'm experiencing an issue when using PyTorch in combination with Metal Performance Shaders (MPS) on an A14 device. During the execution of the backward() function, I encounter the following error message:

/AppleInternal/Library/BuildRoots/9941690d-bcf7-11ed-a645-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayConvolutionA14.mm:4332: failed assertion `destination datatype must be fp32'

I have already verified that both the input tensors and gradient tensors are of float32 datatype before the backward() function is called. However, the error seems to be originating from the MPS code, specifically within the MPSNDArrayConvolutionA14.mm file. Could you provide any guidance or recommendations on how to resolve this issue? Is there any specific constraint or requirement that I should be aware of when using MPS with PyTorch on A14 devices? I would greatly appreciate any help or suggestions. Thank you in advance for your support. Best regards, kiyotaka86
Posted. Last updated.
Post not yet marked as solved
2 Replies
564 Views
MacBook Pro M2 Max, 96 GB
macOS 13.3
tensorflow-macos 2.9.0
tensorflow-metal 0.5.0

Here's the reproducible test case (numpy and tqdm imports added for completeness):

import numpy as np
from tqdm import tqdm
from transformers import AutoTokenizer, TFDistilBertForSequenceClassification
from datasets import load_dataset

imdb = load_dataset('imdb')
sentences = imdb['train']['text'][:500]
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-cased')

for i, sentence in tqdm(enumerate(sentences)):
    inputs = tokenizer(sentence, truncation=True, return_tensors='tf')
    output = model(inputs).logits
    pred = np.argmax(output.numpy(), axis=1)
    if i % 100 == 0:
        print(f"len(input_ids): {inputs['input_ids'].shape[-1]}")

I monitored GPU utilization as it slowly decayed from 50% to 10%. It is excruciatingly slow towards the end. The printed output also confirms this:

Metal device set to: Apple M2 Max
systemMemory: 96.00 GB
maxCacheSize: 36.00 GB
3it [00:00, 10.87it/s] len(input_ids): 391
101it [00:13, 6.38it/s] len(input_ids): 215
201it [00:34, 4.78it/s] len(input_ids): 237
301it [00:55, 4.26it/s] len(input_ids): 256
401it [01:54, 1.12it/s] len(input_ids): 55
500it [03:40, 2.27it/s]

I have found no evidence yet that this is a heat-throttling issue, because after the huge drop in GPU utilization, other processes overtake the GPU (at around 2%). I wonder what's going on? Are there any profiling tips that could help investigate? I am aware I can "fix" this by doing batch inference, but seeing this GPU utilization decay is unsettling, since it could also happen in a training session (which runs far longer).
Posted by kechan. Last updated.
Post marked as solved
2 Replies
802 Views
Hello, I'm trying to add ray-traced shadows to my deferred rendering engine and I've been stuck for days now, using MPSRayIntersector and following Apple's example at https://developer.apple.com/documentation/metalperformanceshaders/animating_and_denoising_a_raytraced_scene. Here's my scene with 2 models. Please help, thank you!
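In case it unblocks you, here is a hedged sketch of the usual shadow-ray encoding with MPSRayIntersector (the types, properties, and the encode method are real MPS API; the buffer setup around it is assumed, and in real code you would create the intersector once and reuse it):

import Metal
import MetalPerformanceShaders

// For shadow rays you usually only care *whether* anything is hit, so the
// .any intersection type and the compact .distance result layout are typical.
func encodeShadowRays(device: MTLDevice,
                      commandBuffer: MTLCommandBuffer,
                      rayBuffer: MTLBuffer,
                      intersectionBuffer: MTLBuffer,
                      rayCount: Int,
                      accelerationStructure: MPSTriangleAccelerationStructure) {
    let intersector = MPSRayIntersector(device: device)
    intersector.rayDataType = .originMinDistanceDirectionMaxDistance
    intersector.intersectionDataType = .distance
    intersector.encodeIntersection(commandBuffer: commandBuffer,
                                   intersectionType: .any,
                                   rayBuffer: rayBuffer,
                                   rayBufferOffset: 0,
                                   intersectionBuffer: intersectionBuffer,
                                   intersectionBufferOffset: 0,
                                   rayCount: rayCount,
                                   accelerationStructure: accelerationStructure)
}

A non-negative distance in the result means the shadow ray hit occluding geometry.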
Posted. Last updated.