ML Compute


Accelerate training and validation of neural networks using the CPU and GPUs.

ML Compute Documentation

Posts under ML Compute tag

45 Posts
Post not yet marked as solved
1 Reply
476 Views
When attempting to load an mlmodel and run it on the CPU/GPU, I pass the compute unit I'd like to use when creating the model with:

    model = ct.models.MLModel('mymodel.mlmodel', ct.ComputeUnit.CPU_ONLY)

Documentation for coremltools v7.0 says:

compute_units: coremltools.ComputeUnit
coremltools.ComputeUnit.ALL: Use all compute units available, including the neural engine.
coremltools.ComputeUnit.CPU_ONLY: Limit the model to only use the CPU.
coremltools.ComputeUnit.CPU_AND_GPU: Use both the CPU and GPU, but not the neural engine.
coremltools.ComputeUnit.CPU_AND_NE: Use both the CPU and neural engine, but not the GPU. Available only for macOS >= 13.0.

coremltools 7.0 (and the previous versions I've tried) now seems to ignore that hint and only runs my models on the ANE. The same model, when loaded into Xcode and run through a performance test with "CPU only" selected in the Xcode performance tool, runs happily on the CPU. Is there a way in Python to get our models to run on different compute units?
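A minimal sketch of the loading call, assuming coremltools 7.x and the same mymodel.mlmodel: compute_units is a keyword argument of the MLModel constructor, so passing the enum positionally may not have the intended effect.

    import coremltools as ct

    # Restrict execution to the CPU; pass compute_units by name.
    model_cpu = ct.models.MLModel('mymodel.mlmodel',
                                  compute_units=ct.ComputeUnit.CPU_ONLY)

    # Other documented options.
    model_all = ct.models.MLModel('mymodel.mlmodel',
                                  compute_units=ct.ComputeUnit.ALL)
    model_cpu_gpu = ct.models.MLModel('mymodel.mlmodel',
                                      compute_units=ct.ComputeUnit.CPU_AND_GPU)

If the hint is still ignored when passed as a keyword, filing a bug report with the model attached would be the next step.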
Posted by pzagacki. Last updated.
Post not yet marked as solved
0 Replies
139 Views
I hope this message finds you well. I recently had the opportunity to watch the insightful session titled "Improve Core ML Integration with Async Prediction" and was thoroughly impressed by the depth of information and the practical demonstration provided. The session offered valuable insights that I believe would greatly benefit my ongoing projects and my understanding of Core ML integration. As I am keen on implementing the demonstrated workflows and techniques within my own work, I am reaching out to kindly request access to the source code and any related material presented during the session. Having access to the code would enable me to better understand the concepts discussed and apply them more effectively in real-world scenarios. I believe that being able to review and experiment with the actual code would significantly enhance my learning experience and the implementation efficiency of my projects. It would also serve as a valuable resource for referencing best practices in Core ML integration and async prediction techniques. Thank you very much for considering my request. I greatly appreciate the effort that went into creating such an informative session and am looking forward to potentially exploring the material in greater depth. Best regards, Fabio G.
Posted by fguzman82. Last updated.
Post not yet marked as solved
1 Reply
160 Views
Hi, could you add a new feature to Pages and Numbers that uses AI to apply a style from a PDF or template to a document? The AI would arrange footers, headers, fonts, page breaks, and page numbers to match the PDF or template, so we could auto-format documents to a desired standard look, in Numbers as well. Then, starting from raw text, we could upload a PDF of another document or report and get a document in that style for export to PDF or print. Best regards,
Posted by Isain. Last updated.
Post not yet marked as solved
0 Replies
132 Views
NLEmbedding.wordEmbedding is not available in Korean. This is a very serious issue for any service that caters to Korean users; please fix it quickly. We have added the sample code below.

    import UIKit
    import CoreML
    import NaturalLanguage

    class MLTextViewController: UIViewController {
        override func viewDidLoad() {
            super.viewDidLoad()
            execute()
        }

        func execute() {
            if let embedding = NLEmbedding.wordEmbedding(for: .korean) {
                let word = "bicycle"
                if let vector = embedding.vector(for: word) {
                    print(vector)
                }
                let specificDistance = embedding.distance(between: word, and: "motorcycle")
                print("✅ \(specificDistance.description)")
                embedding.enumerateNeighbors(for: word, maximumCount: 5) { neighbor, distance in
                    print("\(neighbor): \(distance.description)")
                    return true
                }
            }
        }
    }
Posted by karrotman. Last updated.
Post not yet marked as solved
0 Replies
199 Views
I cannot find the bug ... but when I run this code (Python), the torch device mps:0 is slower than cpu:0 or cpu:1 ... where is the bug? Or does it run on the Neural Engine with cpu:1? You need a setup like this:

    #!/bin/bash
    export HOMEBREW_BREW_GIT_REMOTE="https://github.com/Homebrew/brew"  # put your Git mirror of Homebrew/brew here
    export HOMEBREW_CORE_GIT_REMOTE="https://github.com/Homebrew/homebrew-core"  # put your Git mirror of Homebrew/homebrew-core here
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
    eval "$(/opt/homebrew/bin/brew shellenv)"
    brew update --force --quiet
    chmod -R go-w "$(brew --prefix)/share/zsh"
    export OPENBLAS=$(/opt/homebrew/bin/brew --prefix openblas)
    export CFLAGS="-falign-functions=8 ${CFLAGS}"
    brew install wget
    brew install unzip
    conda init --all
    conda create -n torch-gpu python=3.10
    conda activate torch-gpu
    conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 -c pytorch
    conda install -c conda-forge jupyter jupyterlab
    python3 -m pip install --upgrade pip
    python3 -m pip install insightface==0.2.1 onnx imageio scikit-learn scikit-image moviepy
    python3 -m pip install googledrivedownloader
    python3 -m pip install imageio==2.4.1
    python3 -m pip install Cython
    python3 -m pip install --no-use-pep517 numpy
    python3 -m pip install torch
    python3 -m pip install image
    python3 -m pip install timm
    python3 -m pip install Pillow
    python3 -m pip install h5py
    for i in `seq 1 6`; do
        python3 test.py
    done
    conda deactivate
    exit 0

test.py:

    import torch
    import math

    # this ensures that the current macOS version is at least 12.3+
    print(torch.backends.mps.is_available())
    # this ensures that the current PyTorch installation was built with MPS activated
    print(torch.backends.mps.is_built())

    dtype = torch.float
    device = torch.device("cpu", 0)
    # device = torch.device("cpu", 1)
    # device = torch.device("mps", 0)

    # Create random input and output data
    x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
    y = torch.sin(x)

    # Randomly initialize weights
    a = torch.randn((), device=device, dtype=dtype)
    b = torch.randn((), device=device, dtype=dtype)
    c = torch.randn((), device=device, dtype=dtype)
    d = torch.randn((), device=device, dtype=dtype)

    learning_rate = 1e-6
    for t in range(2000):
        # Forward pass: compute predicted y
        y_pred = a + b * x + c * x ** 2 + d * x ** 3

        # Compute and print loss
        loss = (y_pred - y).pow(2).sum().item()
        if t % 100 == 99:
            print(t, loss)

        # Backprop to compute gradients of a, b, c, d with respect to loss
        grad_y_pred = 2.0 * (y_pred - y)
        grad_a = grad_y_pred.sum()
        grad_b = (grad_y_pred * x).sum()
        grad_c = (grad_y_pred * x ** 2).sum()
        grad_d = (grad_y_pred * x ** 3).sum()

        # Update weights using gradient descent
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
        c -= learning_rate * grad_c
        d -= learning_rate * grad_d

    print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
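A minimal timing sketch, assuming a recent PyTorch build with MPS support, for comparing the same fit on cpu and mps. For a tiny 1-D problem like this, per-operation dispatch overhead on the GPU can dominate, so the CPU being faster is not necessarily a bug.

    import math
    import time
    import torch

    def fit_polynomial(device: torch.device, steps: int = 2000) -> float:
        # Same third-degree polynomial fit as test.py, timed end to end.
        dtype = torch.float
        x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
        y = torch.sin(x)
        a, b, c, d = (torch.randn((), device=device, dtype=dtype) for _ in range(4))
        lr = 1e-6
        start = time.perf_counter()
        for _ in range(steps):
            y_pred = a + b * x + c * x ** 2 + d * x ** 3
            grad_y_pred = 2.0 * (y_pred - y)
            a = a - lr * grad_y_pred.sum()
            b = b - lr * (grad_y_pred * x).sum()
            c = c - lr * (grad_y_pred * x ** 2).sum()
            d = d - lr * (grad_y_pred * x ** 3).sum()
        if device.type == "mps" and hasattr(torch, "mps"):
            torch.mps.synchronize()  # wait for queued GPU work before stopping the clock
        return time.perf_counter() - start

    print("cpu :", fit_polynomial(torch.device("cpu")))
    if torch.backends.mps.is_available():
        print("mps :", fit_polynomial(torch.device("mps")))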
Posted by Smiril. Last updated.
Post not yet marked as solved
0 Replies
291 Views
Hi, I am looking for a routine to perform complex-valued linear algebra on the GPU in Python for scientific programming, in particular quantum physics simulations. At the moment I am looking for a routine for complex-valued matrix multiplication. I found that MLX has a routine for float matrix multiplication, but it does not directly work for complex-valued matrices. I figured out a work-around by splitting the complex-valued matrix into real and imaginary parts and working with the pair, but it makes it cumbersome to integrate with the remainder of the code. I was hoping for a library-based implementation similar to CuPy. I also tried the TensorFlow linear algebra routines, but I couldn't get them to run on the GPU so far. Specifically, a test file with a tensorflow.keras.applications.ResNet50 routine runs on the GPU, but the routines from tensorflow.linalg and tensorflow.math that I tested (matmul, expm, eigh) were not running on the GPU. Any advice on how to make linear algebra calculations work on Mac GPUs is highly appreciated! For my application the unified memory might be especially beneficial. Thank you!
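A minimal sketch of the real/imaginary split work-around mentioned above, assuming MLX (mlx.core) is installed; it uses the identity (A_r + i A_i)(B_r + i B_i) = (A_r B_r - A_i B_i) + i (A_r B_i + A_i B_r) and checks the result against NumPy.

    import numpy as np
    import mlx.core as mx

    def complex_matmul(a_real, a_imag, b_real, b_imag):
        # (A_r + i*A_i) @ (B_r + i*B_i), each part kept as a float MLX array.
        real = mx.matmul(a_real, b_real) - mx.matmul(a_imag, b_imag)
        imag = mx.matmul(a_real, b_imag) + mx.matmul(a_imag, b_real)
        return real, imag

    # Two random 256x256 complex matrices, checked against NumPy's complex matmul.
    a = np.random.rand(256, 256) + 1j * np.random.rand(256, 256)
    b = np.random.rand(256, 256) + 1j * np.random.rand(256, 256)

    # .real/.imag are non-contiguous views, so make contiguous copies before wrapping.
    ar, ai = np.ascontiguousarray(a.real), np.ascontiguousarray(a.imag)
    br, bi = np.ascontiguousarray(b.real), np.ascontiguousarray(b.imag)

    real, imag = complex_matmul(mx.array(ar), mx.array(ai), mx.array(br), mx.array(bi))
    mx.eval(real, imag)  # force MLX's lazy evaluation before comparing

    ref = a @ b
    print(np.allclose(np.array(real), ref.real, rtol=1e-4, atol=1e-4),
          np.allclose(np.array(imag), ref.imag, rtol=1e-4, atol=1e-4))

Keeping the real and imaginary parts as a pair of float32 arrays is what the identity requires; wrapping the pair in a small helper class can keep the rest of the code readable.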
Posted by MG607. Last updated.
Post not yet marked as solved
0 Replies
497 Views
In theory, sending signals from iPhone apps to and from the brain with non-invasive technology could be achieved through a combination of brain-computer interface (BCI) technologies, machine learning algorithms, and mobile app development.

Brain-Computer Interface (BCI): BCI technology can be used to record brain signals and translate them into commands that can be understood by a computer or a mobile device. Non-invasive BCIs, such as electroencephalography (EEG), can track brain activity using sensors placed on or near the head[6]. For instance, a portable, non-invasive, mind-reading AI developed by UTS uses an AI model called DeWave to translate EEG signals into words and sentences[3].

Machine Learning Algorithms: Machine learning algorithms can be used to analyze and interpret the brain signals recorded by the BCI. These algorithms can learn from large quantities of EEG data to translate brain signals into specific commands[3].

Mobile App Development: A mobile app can be developed to receive these commands and perform specific actions on the iPhone. The app could also potentially send signals back to the brain using technologies like transcranial magnetic stimulation (TMS), which can deliver information to the brain[5].

However, it's important to note that while this technology is theoretically possible, it's still in the early stages of development and faces significant technical and ethical challenges. Current non-invasive BCIs do not have the same level of fidelity as invasive devices, and the practical application of these systems is still limited[1][3]. Furthermore, ethical considerations around privacy, consent, and the potential for misuse of this technology must also be addressed[13].

Sources
[1] You can now use your iPhone with your brain after a major breakthrough | Semafor https://www.semafor.com/article/11/01/2022/you-can-now-use-your-iphone-with-your-brain
[2] ! Are You A Robot? https://www.sciencedirect.com/science/article/pii/S1110866515000237
[3] Portable, non-invasive, mind-reading AI turns thoughts into text https://techxplore.com/news/2023-12-portable-non-invasive-mind-reading-ai-thoughts.html
[4] Elon Musk's Neuralink implants brain chip in first human https://www.reuters.com/technology/neuralink-implants-brain-chip-first-human-musk-says-2024-01-29/
[5] BrainNet: A Multi-Person Brain-to-Brain Interface for Direct Collaboration Between Brains - Scientific Reports https://www.nature.com/articles/s41598-019-41895-7
[6] Brain-computer interfaces and the future of user engagement https://www.fastcompany.com/90802262/brain-computer-interfaces-and-the-future-of-user-engagement
[7] Mobile App + Wearable For Neurostimulation - Accion Labs https://www.accionlabs.com/mobile-app-wearable-for-neurostimulation
[8] Signal Generation, Acquisition, and Processing in Brain Machine Interfaces: A Unified Review https://www.frontiersin.org/articles/10.3389/fnins.2021.728178/full
[9] Mind-reading technology has arrived https://www.vox.com/future-perfect/2023/5/4/23708162/neurotechnology-mind-reading-brain-neuralink-brain-computer-interface
[10] Synchron Brain Implant - Breakthrough Allows You to Control Your iPhone With Your Mind - Grit Daily News https://gritdaily.com/synchron-brain-implant-controls-tech-with-the-mind/
[11] Mind uploading - Wikipedia https://en.wikipedia.org/wiki/Mind_uploading
[12] BirgerMind - Express your thoughts loudly https://birgermind.com
[13] Elon Musk wants to merge humans with AI. How many brains will be damaged along the way? https://www.vox.com/future-perfect/23899981/elon-musk-ai-neuralink-brain-computer-interface
[14] Models of communication and control for brain networks: distinctions, convergence, and future outlook https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7655113/
[15] Mind Control for the Masses—No Implant Needed https://www.wired.com/story/nextmind-noninvasive-brain-computer-interface/
[16] Elon Musk unveils Neuralink’s plans for brain-reading ‘threads’ and a robot to insert them https://www.theverge.com/2019/7/16/20697123/elon-musk-neuralink-brain-reading-thread-robot
[17] Essa and Kotte https://arxiv.org/pdf/2201.04229.pdf
[18] Synchron's Brain Implant Breakthrough Lets Users Control iPhones And iPads With Their Mind https://hothardware.com/news/brain-implant-breakthrough-lets-you-control-ipad-with-your-mind
[19] An Apple Watch for Your Brain https://www.thedeload.com/p/an-apple-watch-for-your-brain
[20] Toward an information theoretical description of communication in brain networks https://direct.mit.edu/netn/article/5/3/646/97541/Toward-an-information-theoretical-description-of
[21] A soft, wearable brain–machine interface https://news.ycombinator.com/item?id=28447778
[22] Portable neurofeedback App https://www.psychosomatik.com/en/portable-neurofeedback-app/
[23] Intro to Brain Computer Interface http://learn.neurotechedu.com/introtobci/
Posted by ztick. Last updated.
Post marked as solved
1 Reply
416 Views
Hello, My understanding of the paper below is that iOS ships with a MobileNetv3-based ML model backbone, which then uses different heads for specific tasks in iOS. I understand that this backbone is accessible for various uses through the Vision framework, but I was wondering if it is also accessible for on-device fine-tuning for other purposes. Just as an example, if I want a model to detect some unique object in a photo, can I use the built-in backbone, or do I have to include my own in the app? Thanks very much for any advice, and apologies if I didn't understand something correctly. Source: https://machinelearning.apple.com/research/on-device-scene-analysis
Posted by Sark. Last updated.
Post not yet marked as solved
0 Replies
418 Views
I am currently facing a performance issue while using Core ML on iOS 16+ devices to run a simple grid_sample model. When profiling the model using the Xcode profiler, I noticed that before each NPU computation there is a significant delay caused by the "input copy" and "neural engine-data copy" operations. I have specified that both the input and output of the model are of type float16, so there shouldn't be any data type conversion. I would appreciate any insights or suggestions regarding the reasons behind this delay and possible solutions. My simple model is:

    import os

    import numpy as np
    import torch
    import torch.nn.functional as F
    import coremltools
    import coremltools as ct

    class GridSample(torch.nn.Module):
        def __init__(self):
            super().__init__()

        def forward(self, input: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
            output = F.grid_sample(
                input,
                grid.to(input),
                mode='nearest',
                padding_mode='zeros',
                align_corners=True,
            )
            return output

    tr_input = torch.randn((8, 64, 512, 512))
    tr_grid = torch.randn((8, 256, 256, 2))

    simple_model = GridSample()
    simple_model.eval()
    traced_model = torch.jit.trace(simple_model, [tr_input, tr_grid])

    coreml_input = [
        coremltools.TensorType(name="image_input", shape=tr_input.shape, dtype=np.float16),
        coremltools.TensorType(name="warp_grid", shape=tr_grid.shape, dtype=np.float16),
    ]
    mlmodel = coremltools.converters.convert(
        traced_model,
        inputs=coreml_input,
        convert_to="mlprogram",
        minimum_deployment_target=coremltools.target.iOS16,
        compute_units=coremltools.ComputeUnit.ALL,
        compute_precision=coremltools.precision.FLOAT16,
        outputs=[ct.TensorType(name="x0", dtype=np.float16)],
        debug=False,
    )
    mlmodel.save("./grid_sample.mlpackage")
    os.system("xcrun coremlcompiler compile './grid_sample.mlpackage' './'")
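For reference, a minimal sketch (assuming the grid_sample.mlpackage produced above and a Mac for running predictions from Python) of timing the converted model under different compute units, which can help separate the copy overhead seen in Instruments from the conversion settings:

    import time

    import numpy as np
    import coremltools as ct

    def time_prediction(compute_units, runs=10):
        model = ct.models.MLModel("grid_sample.mlpackage", compute_units=compute_units)
        image = np.random.rand(8, 64, 512, 512).astype(np.float16)
        grid = np.random.rand(8, 256, 256, 2).astype(np.float16)
        model.predict({"image_input": image, "warp_grid": grid})  # warm-up
        start = time.perf_counter()
        for _ in range(runs):
            model.predict({"image_input": image, "warp_grid": grid})
        return (time.perf_counter() - start) / runs

    for cu in (ct.ComputeUnit.CPU_ONLY, ct.ComputeUnit.CPU_AND_GPU, ct.ComputeUnit.ALL):
        print(cu, time_prediction(cu))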
Posted by jwyyy. Last updated.
Post not yet marked as solved
0 Replies
333 Views
I have a neural network that should run on my device with 3 different input shapes. When I convert it to an mlmodel or mlpackage file with a fixed input size, it runs on the ANE. But when I convert it with EnumeratedShapes, it runs only on the CPU. Why? I think the problematic layer is the slice (which is converted in the flexible model to SliceStatic), but I don't understand why, or whether there is any way to solve it and run the enumerated model on the ANE. Here is my code:

    import torch
    import coremltools as ct

    class TestModel(torch.nn.Module):
        def __init__(self):
            super(TestModel, self).__init__()
            self.dw1 = torch.nn.Conv2d(in_channels=641, out_channels=641, kernel_size=(5, 4), groups=641)
            self.pw1 = torch.nn.Conv2d(in_channels=641, out_channels=512, kernel_size=(1, 1))
            self.relu = torch.nn.ReLU()
            self.pw2 = torch.nn.Conv2d(in_channels=512, out_channels=641, kernel_size=(1, 1))
            self.dw2 = torch.nn.Conv2d(in_channels=641, out_channels=641, kernel_size=(5, 1), groups=641)
            self.pw3 = torch.nn.Conv2d(in_channels=641, out_channels=512, kernel_size=(1, 1))
            self.block1_dw = torch.nn.Conv2d(in_channels=512, out_channels=512, kernel_size=(5, 1), groups=512)
            self.block1_pw = torch.nn.Conv2d(in_channels=512, out_channels=512, kernel_size=(1, 1))

        def forward(self, inputs):
            x = self.dw1(inputs)
            x = self.pw1(x)
            x = self.relu(x)
            x = self.pw2(x)
            x = self.dw2(x)
            x = self.pw3(x)
            x = self.relu(x)
            y = self.block1_dw(x)
            y = self.block1_pw(y)
            y = self.relu(y)
            z = x[:, :, 4:, :] + y
            return z

    ex_input = torch.rand(1, 641, 44, 4)
    traced_model = torch.jit.trace(TestModel().eval(), [ex_input])

    # enum_shape: a ct.EnumeratedShapes covering the 3 input shapes (see the sketch below)
    ct_enum_inputs = [ct.TensorType(name='inputs', shape=enum_shape)]
    ct_outputs = [ct.TensorType(name='out')]
    mlmodel_enum = ct.convert(traced_model, inputs=ct_enum_inputs, outputs=ct_outputs,
                              convert_to="neuralnetwork")
    mlmodel_enum.save(...)

Thanks.
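A minimal sketch of how enum_shape could be defined, assuming three illustrative input heights (the actual shapes are not given in the post); ct.EnumeratedShapes lists the allowed shapes plus an optional default:

    import coremltools as ct

    # Hypothetical set of three input shapes; substitute the real ones used on device.
    enum_shape = ct.EnumeratedShapes(
        shapes=[[1, 641, 44, 4], [1, 641, 54, 4], [1, 641, 64, 4]],
        default=[1, 641, 44, 4],
    )
    ct_enum_inputs = [ct.TensorType(name='inputs', shape=enum_shape)]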
Posted by yanivz. Last updated.
Post not yet marked as solved
0 Replies
314 Views
I created a new environment in Conda and then installed TensorFlow using the command "pip install tensorflow" on my Mac M1 Pro machine, but TensorFlow is not working.
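A minimal sketch, assuming a recent TensorFlow plus the tensorflow-metal plugin installed per Apple's Metal plugin instructions, for checking whether TensorFlow can see the Apple GPU at all; if the device list is empty, the Metal plugin is likely missing from the environment.

    # In the conda environment, something like:
    #   python -m pip install tensorflow tensorflow-metal
    import tensorflow as tf

    print("TensorFlow version:", tf.__version__)
    # Should list a GPU device when the Metal plugin is installed and loaded.
    print("GPU devices:", tf.config.list_physical_devices("GPU"))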
Posted. Last updated.
Post marked as solved
1 Reply
469 Views
Hello Apple Developer community, I hope this message finds you well. I am currently facing an issue with Create ML in Xcode, and I am seeking assistance from the knowledgeable members of this forum. Any help or guidance would be greatly appreciated. Problem Description: I am encountering an unexpected issue when attempting to create a classification model for images using Create ML in Xcode. Upon opening Create ML, the application closes unexpectedly when I choose to create a new image classification model. Steps I Have Taken: I have already tried the following steps to troubleshoot the issue: Updated Xcode and macOS to the latest versions. Restarted Xcode and my computer. Created a new sample project to isolate the issue. Despite these efforts, the problem persists. System Information: Xcode Version: 15.2 macOS Version: Sonoma 14.0 I am on a tight deadline for a project, and resolving this issue quickly is crucial. Your help is invaluable, and I thank you in advance for any support you can provide. Best regards.
Posted by JuanLos. Last updated.
Post not yet marked as solved
2 Replies
563 Views
Hello, I followed the instructions provided here: https://developer.apple.com/metal/tensorflow-plugin/ and while trying to run the example I am getting the following error:

    NotFoundError: dlopen(/Users/nedimhadzic/venv-metal/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Symbol not found: __ZN10tensorflow16TensorShapeProtoC1ERKS0_
    Referenced from: <C62E0AB4-567E-3E14-8F96-9F07A746C4DC> /Users/nedimhadzic/venv-metal/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib
    Expected in: <FFF31651-3926-3E79-A442-143B7156FB13> /Users/nedimhadzic/venv-metal/lib/python3.11/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so

tensorflow: 2.15.0
tensorflow-metal: 1.0.0
macOS: 14.2.1
Intel CPU and AMD Radeon Pro 5500M

Any idea? Regards, Nedim
Posted by nedo99. Last updated.
Post not yet marked as solved
0 Replies
387 Views
When the input dimension is 6,000,000, the operator runs on the ANE. But when the input size is 1,000,000 or 2,000,000, the operator can only run on the CPU. The data dimension has decreased, yet it does not run on the ANE. What is the reason for this, and what are the ways to avoid it?
Posted by zhouzheng. Last updated.
Post not yet marked as solved
1 Reply
539 Views
Hello, I'm trying to train an MLImageClassifier in Swift using the function MLImageClassifier.train. The dataset size doesn't matter (I have the same problem with a smaller one): when the training reaches 9 out of 10 completedUnitCount, even though CPU usage is still high, a soft lock seems to occur that never brings the model to completion (or to an error). The dataset is made of jpg images, and no problem appears during training when using the Create ML app. Is there any known issue with the Create ML training APIs about part 9 of the process? Is there any information about this part of the training job? Thank you
Posted. Last updated.
Post not yet marked as solved
1 Reply
645 Views
I'm trying to create an updatable model, but this seems possible only by creating a neural network model from scratch and then, using the NeuralNetworkBuilder, calling the make_updatable method. But I ran into a lot of problems along the way. In this example I try to open a converted ML model (neural network) using the NeuralNetworkBuilder:

    import coremltools

    model = coremltools.models.MLModel("SimpleImageClassifier.mlpackage")
    spec = model.get_spec()
    builder = coremltools.models.neural_network.NeuralNetworkBuilder(spec=spec)
    builder.inspect_layers()

But I get this error on the line that creates the builder instance:

    AttributeError: 'NoneType' object has no attribute 'layers'

I also tried to define a neural network using the NeuralNetworkBuilder, but then what do I have to do with this object? I didn't find a way to save it or convert it. The result I want is simple: the possibility to train the model further on the user's device to meet their needs. However, the way to obtain an updatable model seems incomprehensible. In my case, the model should be an image classifier. What approach should I follow to achieve this result? Thank you
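A sketch of one possible path, based on the coremltools updatable-model workflow and assuming the source model can be (re)converted to the older neuralnetwork format; an .mlpackage normally holds an ML program, whose spec has no neural-network layers, which would explain the AttributeError above. The layer name "dense_1" and the softmax output name "output_probabilities" are hypothetical placeholders.

    import coremltools as ct
    from coremltools.models.neural_network import NeuralNetworkBuilder, SgdParams

    # Re-convert the traced source model to the neuralnetwork format, since
    # make_updatable is only supported there, e.g.:
    # mlmodel = ct.convert(traced_model, inputs=[...], convert_to="neuralnetwork")
    mlmodel = ct.models.MLModel("SimpleImageClassifier.mlmodel")  # hypothetical neuralnetwork model

    spec = mlmodel.get_spec()
    builder = NeuralNetworkBuilder(spec=spec)
    builder.inspect_layers(last=5)  # list the last layers to find trainable layer names

    # Mark the last layer(s) as updatable and describe how on-device training runs.
    builder.make_updatable(["dense_1"])  # hypothetical layer name
    builder.set_categorical_cross_entropy_loss(name="loss", input="output_probabilities")
    builder.set_sgd_optimizer(SgdParams(lr=0.01, batch=8))
    builder.set_epochs(10)

    ct.models.MLModel(builder.spec).save("UpdatableImageClassifier.mlmodel")

On the device, the saved updatable model can then be trained further with Core ML's MLUpdateTask.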
Posted. Last updated.
Post not yet marked as solved
2 Replies
594 Views
In Instruments' CPU Profiling tool I've noticed that a significant portion (22.5%) of the CPU-side overhead related to MPS matrix multiplication (GEMM) is in a call to getenv(). Please see the attached screenshot. It seems unnecessary to perform this same check over and over; whatever hack needs this should be able to perform the getenv() only once and cache the result for future use.
Posted by jacobgorm. Last updated.
Post not yet marked as solved
1 Reply
476 Views
I've been running TensorFlow with Python 3.9 to train a CNN model, and this process is accelerated by the GPU. After 80 epochs the process went to sleep (status S) and its GPU usage dropped to 0 percent. I am wondering if this training process crashed the GPU, or if the OS is mandating that the process go to sleep because it takes up too much GPU time? Thanks a lot!
Posted by chaoyi240. Last updated.