Machine Learning


Create intelligent features and enable new experiences for your apps by leveraging powerful on-device machine learning.

Posts under Machine Learning tag

101 Posts
Post not yet marked as solved
0 Replies
22 Views
I'm referring to this talk: https://developer.apple.com/videos/play/wwdc2021/10152 I was wondering if the code for the "Image composition" project he demonstrates at the end of the talk (around 24:00) is available somewhere? Would much appreciate any help.
Posted by kapsystk. Last updated.
Post marked as solved
2 Replies
90 Views
Is it possible to create an updatable sound classifier model that uses Apple's built-in MLSoundClassifier (available via Create ML) and can be trained/personalized on device using Core ML? I've looked in quite a few places for a while now. I know that when on-device training was first announced in 2019, updatable models were restricted to non-built-in classifiers, but any additional information that may have come out since 2019 has been hard to find.
Posted by mspattan. Last updated.
Post not yet marked as solved
1 Reply
71 Views
I want to know how the preview function is implemented. I have an mlmodel for object detection. I found that when I open the model in Xcode, Xcode provides a preview function: I put a photo into it and get the predicted bounding box drawn on the image. I would like to know how this visualization is implemented. At present, I can only get the Label, Confidence, and BoundingBox values in a playground, and drawing the prediction box still requires me to write the code myself.

import UIKit
import Vision

func performObjectDetection() {
    do {
        let model = try VNCoreMLModel(for: court().model)
        let request = VNCoreMLRequest(model: model) { (request, error) in
            if let error = error {
                print("Failed to perform request: \(error)")
                return
            }
            guard let results = request.results as? [VNRecognizedObjectObservation] else {
                print("No results found")
                return
            }
            for result in results {
                print("Label: \(result.labels.first?.identifier ?? "No label")")
                print("Confidence: \(result.labels.first?.confidence ?? 0.0)")
                // boundingBox is a normalized rect (0...1, origin at the lower left);
                // it has to be converted to image coordinates, e.g. with VNImageRectForNormalizedRect, before drawing.
                print("BoundingBox: \(result.boundingBox)")
            }
        }
        guard let image = UIImage(named: "nbaPics.jpeg"),
              let ciImage = CIImage(image: image) else {
            print("Failed to load image")
            return
        }
        let handler = VNImageRequestHandler(ciImage: ciImage, orientation: .up, options: [:])
        try handler.perform([request])
    } catch {
        print("Failed to load model: \(error)")
    }
}

performObjectDetection()

These are my code and results.
Posted. Last updated.
Post not yet marked as solved
1 Reply
110 Views
We have Core ML models in our app, each encrypted with a separate key generated in Xcode. After an app update we are receiving the following error:

[coreml] Could not create persistent key blob for EFD428E8-CDE7-4E0A-B379-FC169E50DE4D : error=Error Domain=com.apple.CoreML Code=8 "Fetching decryption key from server failed." UserInfo={NSLocalizedDescription=Fetching decryption key from server failed., NSUnderlyingError=0x281d80ab0 {Error Domain=CKErrorDomain Code=6 "CKInternalErrorDomain: 2022" UserInfo={NSDebugDescription=CKInternalErrorDomain: 2022, RequestUUID=D5CF13CF-6A10-436B-AB93-4C5C04859FFE, NSLocalizedDescription=Request failed with http status code 503, CKErrorDescription=Request failed with http status code 503, CKRetryAfter=35, NSUnderlyingError=0x281d80000 {Error Domain=CKInternalErrorDomain Code=2022 "Request failed with http status code 503" UserInfo={CKRetryAfter=35, CKHTTPStatus=503, CKErrorDescription=Request failed with http status code 503, RequestUUID=D5CF13CF-6A10-436B-AB93-4C5C04859FFE, NSLocalizedDescription=Request failed with http status code 503}}, CKHTTPStatus=503}}}

We tried deleting the app and restarting the device, but nothing works. The app was released on the App Store earlier and was working fine; it stopped working after the update. Any help is appreciated.
Posted by appmast. Last updated.
Post not yet marked as solved
11 Replies
1.8k Views
Hello, I'm new to Core ML and I'm trying to build a test app with the models that already exist. I'm getting the following error when classifying an image: [coreml] Failed to get the home directory when checking model path. I would appreciate any help in solving this error. Thanks.
Posted. Last updated.
Post marked as solved
1 Reply
140 Views
When I run the performance test on a Core ML model, it shows predictions are 834% faster on the Neural Engine than on the GPU. It also shows that 100% of the model can run on the Neural Engine (the performance report screenshots for the all-compute-units and GPU-only cases are not included here). But when I set the compute units to all:

let config = MLModelConfiguration()
config.computeUnits = .all

and profile, it shows that the Neural Engine isn't used at all, other than for loading the model, which takes 25 seconds when the Neural Engine is allowed versus less than a second when it is not. The difference in speed is the difference between the app being too slow to even release and quite reasonable performance. I have a lot of work invested in this, so I am really hoping that I can get it to run on the Neural Engine. Why isn't it actually running on the Neural Engine when the report shows that it is supported and I have the compute units set to allow it?
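One way to double-check outside of Xcode whether the model really maps to the Neural Engine is to load it with coremltools on the Mac and time predictions under different compute-unit settings. This is only a rough diagnostic sketch; "MyModel.mlpackage" and sample_input are placeholders for the actual model path and an input dictionary of numpy arrays keyed by the model's input names.

import time
import coremltools as ct

sample_input = {}  # placeholder: fill with numpy arrays keyed by the model's input names

for units in (ct.ComputeUnit.CPU_ONLY, ct.ComputeUnit.CPU_AND_GPU, ct.ComputeUnit.CPU_AND_NE):
    m = ct.models.MLModel("MyModel.mlpackage", compute_units=units)
    m.predict(sample_input)                      # warm-up / first-load compilation
    start = time.perf_counter()
    m.predict(sample_input)
    print(units, time.perf_counter() - start)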
Posted by 3DTOPO. Last updated.
Post not yet marked as solved
0 Replies
145 Views
Hey guys, I converted a T5-base (encoder/decoder) model to a Core ML model using https://github.com/huggingface/exporters (which uses coremltools under the hood). When creating a performance report for the decoder model within Xcode, it shows that all compute units are mapped to the CPU. This is also the experience I have when profiling the model (the GPU and ANE are not used). I was under the impression that Core ML would divide up the layers and run those that can run on the GPU / ANE, but maybe I misunderstood. Is there anything I can do to get this to not run on the CPU exclusively?
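In case it helps anyone narrow this down, the conversion settings can influence the compute-unit mapping: as far as I know, float16 precision and a recent minimum deployment target make it more likely that layers are eligible for the GPU/ANE. A minimal sketch, where traced_decoder and the input description are placeholders for whatever the exporters pipeline actually produces:

import numpy as np
import coremltools as ct

# traced_decoder is a placeholder for the torch.jit-traced T5 decoder.
mlmodel = ct.convert(
    traced_decoder,
    inputs=[ct.TensorType(name="decoder_input_ids", shape=(1, 64), dtype=np.int32)],  # hypothetical input spec
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT16,        # reduced precision, generally needed for ANE-resident ops
    minimum_deployment_target=ct.target.iOS16,
    compute_units=ct.ComputeUnit.ALL,
)
mlmodel.save("T5Decoder.mlpackage")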
Posted by seboslaw. Last updated.
Post not yet marked as solved
1 Reply
614 Views
Hi everyone, I'm a Machine Learning Engineer, and I'm planning to buy the MacBook Pro M2 Max in the 38-core GPU variant. I'm uncertain about whether to choose the 32GB or 64GB RAM option. Based on my research and use case, it seems that 32GB should be sufficient for most tasks, including the 4K video rendering I occasionally do. However, I'm concerned about the longevity of the device, as I'd like to keep the MacBook up to date for at least five years. Additionally, considering the 38-core GPU, I wonder if 32GB of unified memory might be insufficient, particularly when I need to train machine learning models or run Docker or even a Kubernetes cluster. I don't have any budget constraints, as the additional $400 cost isn't an issue, but I want to make a wise decision. I would appreciate any advice on this matter. Thanks in advance!
Posted by Aditya-ai. Last updated.
Post not yet marked as solved
4 Replies
1.1k Views
I'm on a recent version of macOS and I recently trained a Style Transfer model using Create ML. I used the preview tab of Create ML to preview my model with a video (as well as an image); however, when I press the button to export or share the result from the neural network, nothing is exported. The modal window appears but doesn't save anything after the progress bar for the conversion shows up. I tried converting the Core ML model file into a Core ML package, but when I tried exporting the preview it crashed and switched tabs to the package information section. I've been having this issue with all three export buttons in the model preview section of both the Create ML application and Xcode. Is this happening to anyone else? I've also tried using the coremltools package for Python to extract a preview, but documentation for Style Transfer networks doesn't cover loading videos with that package. The style transfer network only takes images as input, so it's unclear where a video file can be loaded.
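For anyone trying the same workaround, here is a rough per-frame sketch using coremltools in Python. The file names, the 512x512 size, and the "image" / "stylizedImage" feature names are assumptions and should be checked against the model's input_description and output_description:

import coremltools as ct
from PIL import Image

# Load the Create ML style transfer model (path is a placeholder).
model = ct.models.MLModel("StyleTransfer.mlmodel")

# Stylize one extracted video frame; repeat per frame to rebuild a preview video.
frame = Image.open("frame_0001.png").resize((512, 512))   # assumed input size
out = model.predict({"image": frame})                      # input name is an assumption
stylized = out["stylizedImage"]                            # output name is an assumption
stylized.save("stylized_0001.png")                         # image-typed outputs come back as PIL images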
Posted by trzroy. Last updated.
Post not yet marked as solved
1 Reply
189 Views
I'm getting a failed assertion: `Completed handler provided after commit call'. How can I clear this error? When I run on the CPU I get a storage error, so I tried the GPU. Partial code:

# PositionalEncoding
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len, dropout_prob=0.1):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout_prob)
        # Create positional encoding matrix
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        # Pad div_term with zeros if necessary
        div_term_padded = torch.zeros(d_model)
        div_term_padded[:div_term.size(0)] = div_term
        pe[:, 0::2] = torch.sin(position * div_term_padded[0::2])
        pe[:, 1::2] = torch.cos(position * div_term_padded[1::2])
        pe = pe.unsqueeze(0).transpose(0, 1)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:x.size(0), :]
        return self.dropout(x)

# TransformerModel class
class TransformerModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, d_model, num_heads, dropout_prob, output_size, device, max_len):
        super(TransformerModel, self).__init__()
        self.device = device
        self.hidden_size = hidden_size
        self.d_model = d_model
        self.num_heads = num_heads
        # self.embedding = nn.Embedding(input_size, d_model).to(device)
        self.embedding = nn.Linear(input_size, d_model).to(device)
        self.pos_encoder = PositionalEncoding(d_model, max_len, dropout_prob).to(device)
        self.transformer_encoder_layer = nn.TransformerEncoderLayer(d_model, num_heads, hidden_size, dropout_prob).to(device)
        self.transformer_encoder = nn.TransformerEncoder(self.transformer_encoder_layer, num_layers).to(device)
        self.decoder = nn.Linear(d_model, output_size).to(device)
        self.to(device)  # Ensure the model is on the correct device

    def forward(self, x):
        # x = x.long()
        x = x.transpose(0, 1)  # Transpose the input tensor to match the expected shape for the transformer
        x = x.squeeze()  # Remove the extra dimension from the input tensor
        x = self.embedding(x)  # Apply the input embedding
        x = self.pos_encoder(x)  # Add positional encoding
        x = self.transformer_encoder(x)  # Apply the transformer encoder
        x = self.decoder(x[:, -1, :])  # Decode the last time step's output to get the final prediction
        return x

# train transformer model
def train_transformer_model(train_X_scaled, train_y, input_size, d_model, hidden_size, num_layers, output_size,
                            learning_rate, num_epochs, num_heads, dropout_prob, device, n_accumulation_steps=32):
    train_X_tensor = torch.from_numpy(train_X_scaled).float().to(device)
    train_y_tensor = torch.from_numpy(train_y).float().unsqueeze(1).to(device)

    # Create the dataset and DataLoader
    train_data = TensorDataset(train_X_tensor, train_y_tensor)
    train_loader = DataLoader(train_data, batch_size=8, shuffle=True)

    # Compute the maximum length of the input sequences
    max_len = train_X_tensor.size(1)

    # Create the model
    model = TransformerModel(input_size, hidden_size, num_layers, d_model, num_heads, dropout_prob, output_size, device, max_len).to(device)

    q = 0.5
    criterion = lambda y_pred, y_true: quantile_loss(q, y_true, y_pred)
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    for epoch in range(1, num_epochs + 1):
        model.train()
        print(f"Transformer inputs shape: {train_X_tensor.shape}, targets shape: {train_y_tensor.shape}")

    for epoch in range(1, num_epochs + 1):
        model.train()
        print(f"transformer Epoch {epoch}/{num_epochs}")
        for i, (batch_X, batch_y) in enumerate(train_loader):
            batch_X = batch_X.to(device)
            print("transformer batch_X shape:", batch_X.shape)
            batch_y = batch_y.to(device)
            print("transformer batch_Y shape:", batch_y.shape)
            optimizer.zero_grad()
            batch_X = batch_X.transpose(0, 1)
            train_pred = model(batch_X.squeeze(0)).to(device)
            print("train_pred=", train_pred)
            loss = criterion(train_pred, batch_y).to(device)
            loss.backward()
            # Gradient accumulation
            if (i + 1) % n_accumulation_steps == 0:
                optimizer.step()
                optimizer.zero_grad()
            print(f"transformer Epoch {epoch}/{num_epochs}, Step {i+1}/{len(train_loader)}, Loss: {loss.item():.6f}")

    return model
Posted. Last updated.
Post not yet marked as solved
0 Replies
155 Views
This issue has already been raised a few times in the coremltools repo (here, here, and here). I'm reposting here because this may be an issue in CoreML itself. In short, converting Huggingface's Bert implementation from PyTorch to CoreML results in significantly different model outputs. This test was originally posted in one of the linked issues:

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
import coremltools as ct

MODEL_NAME = "bert-base-uncased"
sentences = ["This is a test."]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, torchscript=True).eval()
encoded_input = tokenizer(sentences, return_tensors='pt')
traced_model = torch.jit.trace(model, tuple(encoded_input.values()))
scripted_model = torch.jit.script(traced_model)

model = ct.convert(scripted_model,
                   source="pytorch",
                   inputs=[ct.TensorType(name="input_ids", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32),
                           ct.TensorType(name="token_type_ids", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32),
                           ct.TensorType(name="attention_mask", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32)],
                   convert_to="mlprogram",
                   compute_units=ct.ComputeUnit.CPU_ONLY)

with torch.no_grad():
    pt_out = scripted_model(**encoded_input)

cml_inputs = {k: v.to(torch.int32).numpy() for k, v in encoded_input.items()}
pred_coreml = model.predict(cml_inputs)

np.testing.assert_allclose(pt_out[0].detach().numpy(), pred_coreml["hidden_states"], atol=1e-5, rtol=1e-4)

Running this shows that the model outputs are highly divergent:

Max absolute difference: 7.901174
Max relative difference: 3424.6594

By contrast, running the same test with Huggingface's Distilbert implementation (distilbert-base-uncased) shows a much smaller difference in output:

Max absolute difference: 0.00523943
Max relative difference: 45.603153

Again, I'm not totally sure that this is an issue in CoreML, but it would be great to be able to run Bert based models with CoreML!
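For what it's worth, a follow-up experiment that might help localize the problem is to convert with fixed shapes and explicit float32 precision and re-run the same comparison, to see whether flexible shapes or reduced precision explains the divergence. This is only a sketch that reuses the variables from the snippet above and assumes the converted model keeps the "hidden_states" output name:

seq_len = encoded_input["input_ids"].shape[1]
fixed_model = ct.convert(scripted_model,
                         source="pytorch",
                         inputs=[ct.TensorType(name="input_ids", shape=(1, seq_len), dtype=np.int32),
                                 ct.TensorType(name="token_type_ids", shape=(1, seq_len), dtype=np.int32),
                                 ct.TensorType(name="attention_mask", shape=(1, seq_len), dtype=np.int32)],
                         convert_to="mlprogram",
                         compute_precision=ct.precision.FLOAT32,
                         compute_units=ct.ComputeUnit.CPU_ONLY)

pred_fixed = fixed_model.predict(cml_inputs)
print("max abs difference:", np.abs(pred_fixed["hidden_states"] - pt_out[0].detach().numpy()).max())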
Posted. Last updated.
Post not yet marked as solved
1 Reply
491 Views
I am trying to train an image classification network in Keras with tensorflow-metal. The training freezes after the first 2-3 epochs if image augmentation layers are used (RandomFlip, RandomContrast, RandomBrightness). The system appears to use both the GPU and the CPU (as indicated by Activity Monitor). Also, warnings appear in both Jupyter and Terminal (see below). When the image augmentation layers are removed (i.e. we only rebuild the head and feed images from disk), the CPU appears to be idle, no warnings appear, and training completes successfully.

Versions: python 3.8, tensorflow-macos 2.11.0, tensorflow-metal 0.7.1

Sample code:

img_augmentation = Sequential(
    [
        layers.RandomFlip(),
        layers.RandomBrightness(factor=0.2),
        layers.RandomContrast(factor=0.2)
    ],
    name="img_augmentation",
)

inputs = layers.Input(shape=(384, 384, 3))
x = img_augmentation(inputs)
model = tf.keras.applications.EfficientNetV2S(include_top=False, input_tensor=x, weights='imagenet')
model.trainable = False

x = tf.keras.layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
x = tf.keras.layers.BatchNormalization()(x)
top_dropout_rate = 0.2
x = tf.keras.layers.Dropout(top_dropout_rate, name="top_dropout")(x)
outputs = tf.keras.layers.Dense(179, activation="softmax", name="pred")(x)
newModel = Model(inputs=model.input, outputs=outputs, name="EfficientNet_DF20M_species")

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', factor=0.9, patience=2, verbose=1, min_lr=0.000001)
optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.01, momentum=0.9)
newModel.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

history = newModel.fit(x=train_ds, validation_data=val_ds, epochs=30, verbose=2, callbacks=[reduce_lr])

During training with image augmentation, Jupyter prints the following warnings while training the first epoch:

WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformFullIntV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomGetKeyCounter cause there is no registered converter for this op.
...

During training with image augmentation, Terminal keeps spamming the following warning:

2023-02-21 23:13:38.958633: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.958920: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.959071: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.959115: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.959359: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
...

Any suggestions?
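One workaround worth trying (a sketch, not a confirmed fix) is to move the random augmentation into the tf.data pipeline and pin it to the CPU, so the stateless random ops never reach the Metal plugin. This assumes train_ds yields (images, labels) batches; img_augmentation, newModel, val_ds and reduce_lr are the objects from the code above, and the augmentation layer should then be removed from the model itself:

import tensorflow as tf

def augment(images, labels):
    # Run the Keras augmentation layers on the CPU only.
    with tf.device("/CPU:0"):
        images = img_augmentation(images, training=True)
    return images, labels

train_ds_aug = train_ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE).prefetch(tf.data.AUTOTUNE)

history = newModel.fit(x=train_ds_aug, validation_data=val_ds, epochs=30, verbose=2, callbacks=[reduce_lr])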
Posted by Cardu6lis. Last updated.
Post not yet marked as solved
0 Replies
199 Views
We are developing an app that requires background location tracking at intervals... we are, somewhat obviously in designing the app, finding the correct balance between the benefits to the user of more accurate tracking, given our use case, and the costs of battery usage. We have, I believe, found a satisfactory compromise, as have a few other apps we have seen which are already live... IN STEPS APPLE'S MACHINE LEARNING!!! The app will work for a couple of weeks, and then machine learning will step in for a few days and throttle the frequency of background checks... now get this... there seems to be A STRONG CORRELATION between these periods and the battery drain attributed to the app via the phone analytics increasing A LOT... i.e. 10x. Apple's machine learning is, I presume, designed to protect users from too much battery drain from background tasks... So what we have when this occurs is this machine learning function apparently draining the phone of much more battery than the original function it is seeking to improve, and in the process the performance of the original function is greatly reduced. Without machine learning interfering, battery drain appears to be sustainable and insignificant. Has anyone else encountered this? Apple, do you have any comment?
Posted. Last updated.
Post not yet marked as solved
0 Replies
198 Views
Hi everyone! I'm trying to train an activity classification model with 3 classes. The problem is that only one class has precision and recall > 0 after training. Even with 2 classes the result is the same. At first I thought there was a problem with my data, but when I switched the "left" label to "right" and vice versa, the results were the same: only "left"-labeled data gets non-zero precision and recall.
Posted by corle. Last updated.
Post not yet marked as solved
1 Reply
196 Views
In the ml-ane-transformers repo, there is a custom LayerNorm implementation for the Neural Engine-optimized shape of (B,C,1,S). The coremltools documentation makes it sound like the layer_norm MIL op would support this natively. In fact, the following code works on CPU:

B, C, S = 1, 768, 512
g, b = 1, 0

@mb.program(input_specs=[mb.TensorSpec(shape=(B, C, 1, S)),])
def ln_prog(x):
    gamma = (torch.ones((C,), dtype=torch.float32) * g).tolist()
    beta = (torch.ones((C), dtype=torch.float32) * b).tolist()
    return mb.layer_norm(x=x, axes=[1], gamma=gamma, beta=beta, name="y")

However, it fails when run on the Neural Engine, giving results that are scaled by an incorrect value. Should this work on the Neural Engine?
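A quick side-by-side check (only a sketch) is to convert the same MIL program twice with different compute units and compare the outputs on one random input; this assumes the converted model keeps the input name "x" and the output name "y":

import numpy as np
import coremltools as ct

x_val = np.random.rand(B, C, 1, S).astype(np.float32)

cpu_model = ct.convert(ln_prog, convert_to="mlprogram", compute_units=ct.ComputeUnit.CPU_ONLY)
ane_model = ct.convert(ln_prog, convert_to="mlprogram", compute_units=ct.ComputeUnit.CPU_AND_NE)

out_cpu = cpu_model.predict({"x": x_val})["y"]
out_ane = ane_model.predict({"x": x_val})["y"]
print("max abs difference:", np.abs(out_cpu - out_ane).max())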
Posted by smpanaro. Last updated.
Post not yet marked as solved
1 Reply
406 Views
Hello Apple Developer Community, I'm experiencing an issue when using PyTorch in combination with Metal Performance Shaders (MPS) on an A14 device. During the execution of the backward() function, I encounter the following error message: /AppleInternal/Library/BuildRoots/9941690d-bcf7-11ed-a645-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayConvolutionA14.mm:4332: failed assertion `destination datatype must be fp32' I have already verified that both the input tensors and gradient tensors are of float32 datatype before the backward() function is called. However, the error seems to be originating from the MPS code, specifically within the MPSNDArrayConvolutionA14.mm file. Could you provide any guidance or recommendations on how to resolve this issue? Is there any specific constraint or requirement that I should be aware of when using MPS with PyTorch on A14 devices? I would greatly appreciate any help or suggestions. Thank you in advance for your support. Best regards, kiyotaka86
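Two things that are sometimes worth ruling out (this is a sketch, not a confirmed fix, and model / inputs stand in for the actual network and batch): making sure every parameter and input really is float32 on the MPS device, and letting unsupported ops fall back to the CPU via the PYTORCH_ENABLE_MPS_FALLBACK environment variable, which has to be set before torch is imported:

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"   # must be set before importing torch
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# model and inputs are placeholders for the actual network and batch.
model = model.to(device).float()                  # force fp32 parameters and buffers
inputs = inputs.to(device, dtype=torch.float32)

# Verify nothing in the graph is fp16/fp64 before calling backward().
for name, p in model.named_parameters():
    if p.dtype != torch.float32:
        print("non-fp32 parameter:", name, p.dtype)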
Posted. Last updated.
Post not yet marked as solved
0 Replies
223 Views
I'm trying to use the randomTensor function from MPSGraph to initialize the weights of a fully connected layer. I can create the graph and run inference using the randomly initialized values, but when I try to train and update these randomly initialized weights, I'm hitting a crash:

Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 578.

I can train the graph if I instead initialize the weights myself on the CPU, but I thought using the randomTensor functions would be faster/allow initialization to occur on the GPU. Here's my code for building the graph, including both methods of weight initialization:

func buildGraph(variables: inout [MPSGraphTensor]) -> (MPSGraphTensor, MPSGraphTensor, MPSGraphTensor, MPSGraphTensor) {
    let inputPlaceholder = graph.placeholder(shape: [2], dataType: .float32, name: nil)
    let labelPlaceholder = graph.placeholder(shape: [1], name: nil)

    // This works for inference but not training
    let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)!
    let weightTensor = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil)

    // This works for inference and training
    // let weights = [Float](repeating: 1, count: 2)
    // let weightTensor = graph.variable(with: Data(bytes: weights, count: 2 * MemoryLayout<Float32>.size), shape: [2, 1], dataType: .float32, name: nil)

    variables += [weightTensor]

    let output = graph.matrixMultiplication(primary: inputPlaceholder, secondary: weightTensor, name: nil)
    let loss = graph.softMaxCrossEntropy(output, labels: labelPlaceholder, axis: -1, reductionType: .sum, name: nil)
    return (inputPlaceholder, labelPlaceholder, output, loss)
}

And to run the graph I have the following in my sample view controller:

override func viewDidLoad() {
    super.viewDidLoad()

    var variables: [MPSGraphTensor] = []
    let (inputPlaceholder, labelPlaceholder, output, loss) = buildGraph(variables: &variables)
    let gradients = graph.gradients(of: loss, with: variables, name: nil)
    let learningRate = graph.constant(0.001, dataType: .float32)

    var updateOps: [MPSGraphOperation] = []
    for (key, value) in gradients {
        let updates = graph.stochasticGradientDescent(learningRate: learningRate, values: key, gradient: value, name: nil)
        let assign = graph.assign(key, tensor: updates, name: nil)
        updateOps += [assign]
    }

    let commandBuffer = MPSCommandBuffer(commandBuffer: Self.commandQueue.makeCommandBuffer()!)

    let executionDesc = MPSGraphExecutionDescriptor()
    executionDesc.completionHandler = { (resultsDictionary, error) in
        for (key, value) in resultsDictionary {
            var output: [Float] = [0]
            value.mpsndarray().readBytes(&output, strideBytes: nil)
            print(output)
        }
    }

    let inputDesc = MPSNDArrayDescriptor(dataType: .float32, shape: [2])
    let input = MPSNDArray(device: Self.device, descriptor: inputDesc)
    var inputArray: [Float] = [1, 2]
    input.writeBytes(&inputArray, strideBytes: nil)
    let source = MPSGraphTensorData(input)

    let labelMPSArray = MPSNDArray(device: Self.device, descriptor: MPSNDArrayDescriptor(dataType: .float32, shape: [1]))
    var labelArray: [Float] = [1]
    labelMPSArray.writeBytes(&labelArray, strideBytes: nil)
    let label = MPSGraphTensorData(labelMPSArray)

    // This runs inference and works
    // graph.encode(to: commandBuffer, feeds: [inputPlaceholder: source], targetTensors: [output], targetOperations: [], executionDescriptor: executionDesc)
    // commandBuffer.commit()
    // commandBuffer.waitUntilCompleted()

    // This trains but does not work
    graph.encode(
        to: commandBuffer,
        feeds: [inputPlaceholder: source, labelPlaceholder: label],
        targetTensors: [],
        targetOperations: updateOps,
        executionDescriptor: executionDesc)
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
}

And a few other relevant variables are created at the class scope:

let graph = MPSGraph()
static let device = MTLCreateSystemDefaultDevice()!
static let commandQueue = device.makeCommandQueue()!

How can I use these randomTensor functions on MPSGraph to randomly initialize weights for training?
Posted. Last updated.
Post not yet marked as solved
0 Replies
179 Views
Hello, Is there an API available for "Visual Look Up"? https://support.apple.com/en-gb/guide/iphone/iph21c29a1cf/ios
Posted. Last updated.
Post not yet marked as solved
0 Replies
205 Views
Hi, I am training an adversarial autoencoder using PyTorch 2.0.0 on Apple M2 (Ventura 13.1), with conda 23.1.0 as the package manager. I encountered this error:

/AppleInternal/Library/BuildRoots/5b8a32f9-5db2-11ed-8aeb-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayConvolutionA14.mm:3967: failed assertion `destination kernel width and filter kernel width mismatch'
/Users/vk/miniconda3/envs/betavae/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

As far as I can tell, the code breaks down when running self.manual_backward(loss["g_loss"]) in this block:

g_opt.zero_grad()
self.manual_backward(loss["g_loss"])
g_opt.step()

The same code runs without problems on a Linux distribution. Any thoughts on how to fix this are highly appreciated!
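A quick isolation step (just a sketch) is to run the same generator step once on the CPU: if it succeeds there, the failure is specific to the MPS convolution kernels and is worth reporting together with the PyTorch and macOS versions. The names below (model, batch, compute_losses) are placeholders for the objects in the actual training module:

import torch
import platform

# Placeholders: model is the adversarial autoencoder, batch is one training batch,
# and compute_losses() stands in for whatever builds the loss dict containing "g_loss".
model_cpu = model.to("cpu")
loss = compute_losses(model_cpu, batch.to("cpu"))["g_loss"]
loss.backward()   # if this works on the CPU, the assertion is specific to the MPS backend

print(torch.__version__, platform.mac_ver()[0], torch.backends.mps.is_built())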
Posted by RayXC. Last updated.
Post not yet marked as solved
0 Replies
260 Views
I'm interested in using CatBoost and XGBoost for some machine learning projects on my Mac, and I was wondering if it's possible to run these algorithms on my GPU(s) to speed up training times. I have a Mac with an AMD Radeon Pro 5600M and an Intel UHD Graphics 630 GPU, and I'm running macOS Ventura 13.2.1. I've read that both CatBoost and XGBoost support GPU acceleration, but I'm not sure if this is possible on my system. Can anyone point me in the right direction for getting started with GPU-accelerated CatBoost/XGBoost on macOS? Are there any specific drivers or tools I need to install, or any other considerations I should be aware of? Thank you.
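As far as I know, the GPU training paths in both XGBoost and CatBoost are CUDA-only, so they won't use an AMD Radeon or Intel GPU on macOS; the practical option is fast multi-threaded CPU training. A minimal sketch, assuming X_train and y_train are already loaded:

import xgboost as xgb
from catboost import CatBoostClassifier

# Histogram-based tree building is the fastest CPU method in XGBoost.
xgb_model = xgb.XGBClassifier(tree_method="hist", n_jobs=-1)
xgb_model.fit(X_train, y_train)

# CatBoost on the CPU, using all available cores.
cb_model = CatBoostClassifier(task_type="CPU", thread_count=-1, verbose=False)
cb_model.fit(X_train, y_train)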
Posted. Last updated.