Machine Learning

How to monitor Neural Engine usage on M1 macs?

I'm now running Tensorflow models on my Macbook Air 2020 M1, but I can't find a way to monitor the Neural Engine 16 cores usage to fine tune my ML tasks. The Activity Monitor only reports CPU% and GPU% and I can't find any APIs available on Mach include files in the MacOSX 11.1 sdk or documentation available so I can slap something together from scratch in C. Could anyone point me in some direction as to get a hold of the API for Neural Engine usage. Any indicator I could grab would be a start. It looks like this has been omitted from all sdk documentation and general userland, I've only found a ledger_tag_neural_footprint attribute, which looks memory related, and that's it.

Posted

by

rgolive

Supported classes in the built-in Sound Classifier model

Where can I find a comprehensive list of all the classes that the built in Sound Classifier model supports?

Posted

by

jacobbmoncur

Memoji, AI

I wish there was a tool to create a Memoji from a photo using AI 📸➡️👨 It is a pity there are no tools for artists

Machine Learning

Posted

by

Lebizhor

Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.

Device: MacBook Pro 16 M1 Max, 64GB running MacOS 12.0.1. I tried setting up GPU Accelerated TensorFlow on my Mac using the following steps: Setup: XCode CLI / Homebrew/ Miniforge Conda Env: Python 3.9.5 conda install -c apple tensorflow-deps python -m pip install tensorflow-macos python -m pip install tensorflow-metal brew install libjpeg conda install -y matplotlib jupyterlab In Jupyter Lab, I try to execute this code: from tensorflow.keras import layers from tensorflow.keras import models model = models.Sequential() model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))) model.add(layers.MaxPooling2D((2, 2))) model.add(layers.Conv2D(64, (3, 3), activation='relu')) model.add(layers.MaxPooling2D((2, 2))) model.add(layers.Conv2D(64, (3, 3), activation='relu')) model.add(layers.Flatten()) model.add(layers.Dense(64, activation='relu')) model.add(layers.Dense(10, activation='softmax')) model.summary() The code executes, but I get this warning, indicating no GPU Acceleration can be used as it defaults to a 0MB GPU. Error: Metal device set to: Apple M1 Max 2021-10-27 08:23:32.872480: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support. 2021-10-27 08:23:32.872707: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>) Anyone has any idea how to fix this? I came across a bunch of posts around here related to the same issue but with no solid fix. I created a new question as I found the other questions less descriptive of the issue, and wanted to comprehensively depict it. Any fix would be of much help.

Posted

by

suprateembanerjee

Sklearn is unstable on Apple Silicon

Hi, I installed skearn successfully and ran the MINIST toy example successfully. then I started to run my project. The finning thing everything seems good at the start point (at least no ImportError occurs). but when I made some changes of my code and try to run all cells (I use jupyter lab) again, ImportError occurs..... ImportError: dlopen(/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/liblapack.3.dylib Referenced from: /Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so Reason: tried: '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/usr/local/lib/liblapack.3.dylib' (no such file), '/usr/lib/liblapack.3.dylib' (no such file) then I have to uninstall scipy, sklearn, etc and reinstall all of them. and my code can be ran again..... Magically I hate to say, anyone knows how to permanently solve this problem? make skearn more stable?

Posted

by

dkjdjdfdskln

M1 GPU is extremely slow, how can I enable CPU to train my NNs?

Hi everyone, I found that the performance of GPU is not good as I expected (as slow as a turtle), I wanna switch from GPU to CPU. but mlcompute module cannot be found, so wired. The same code ran on colab and my computer (jupyter lab) take 156s vs 40 minutes per epoch, respectively. I only used a small dataset (a few thousands of data points), and each epoch only have 20 baches. I am so disappointing and it seems like the "powerful" GPU is a joke. I am using 12.0.1 macOS and the version of tensorflow-macos is 2.6.0 Can anyone tell me why this happens?

Posted

by

dkjdjdfdskln

Why i enabled Metal API in `encode` function but my Coreml custom layer still run on CPU

I implement a custom pytorch layer on both CPU and GPU following [Hollemans amazing blog] (https://machinethink.net/blog/coreml-custom-layers ). The cpu version works good, but when i implemented this op on GPU it cannot activate "encode" function. Always run on CPU. I have checked the coremltools.convert() options with compute_units=coremltools.ComputeUnit.CPU_AND_GPU, but it still not work. This problem also mentioned in https://stackoverflow.com/questions/51019600/why-i-enabled-metal-api-but-my-coreml-custom-layer-still-run-on-cpu and https://developer.apple.com/forums/thread/695640. Any idea on help this would be grateful. System Information mac OS: 11.6.1 Big Sur xcode: 12.5.1 coremltools: 5.1.0 test device: iphone 11

Posted

by

stx-000

[Apple M1]: I got No registered 'AddN' OpKernel for 'GPU' devices compatible with node while training my model

Hi! GPU acceleration lacks of M1 GPU support (only with this specific model), getting this message when trying to run a trained model on GPU: NotFoundError: Graph execution error: No registered 'AddN' OpKernel for 'GPU' devices compatible with node {{node model_3/keras_layer_3/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/roberta_pack_inputs/StatefulPartitionedCall/RaggedConcat/ArithmeticOptimizer/AddOpsRewrite_Leaf_0_add_2}} (OpKernel was found, but attributes didn't match) Requested Attributes: N=2, T=DT_INT64, _XlaHasReferenceVars=false, _grappler_ArithmeticOptimizer_AddOpsRewriteStage=true, _device="/job:localhost/replica:0/task:0/device:GPU:0" . Registered: device='XLA_CPU_JIT'; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, 16534343205130372495, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_VARIANT] device='GPU'; T in [DT_FLOAT] device='DEFAULT'; T in [DT_INT32] device='CPU'; T in [DT_UINT64] device='CPU'; T in [DT_INT64] device='CPU'; T in [DT_UINT32] device='CPU'; T in [DT_UINT16] device='CPU'; T in [DT_INT16] device='CPU'; T in [DT_UINT8] device='CPU'; T in [DT_INT8] device='CPU'; T in [DT_INT32] device='CPU'; T in [DT_HALF] device='CPU'; T in [DT_BFLOAT16] device='CPU'; T in [DT_FLOAT] device='CPU'; T in [DT_DOUBLE] device='CPU'; T in [DT_COMPLEX64] device='CPU'; T in [DT_COMPLEX128] device='CPU'; T in [DT_VARIANT] [[model_3/keras_layer_3/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/roberta_pack_inputs/StatefulPartitionedCall/RaggedConcat/ArithmeticOptimizer/AddOpsRewrite_Leaf_0_add_2]] [Op:__inference_train_function_300451]

Posted

by

sm_96

Error loading a model of CoreML

Hello, I'm new using CoreML and I'm trying to do a test app with the models that already exist. I'm having next error at the moment to classifier the image: [coreml] Failed to get the home directory when checking model path. I would like to receive your help to solve this error. Thanks.

Posted

by

Alejoortga

Export of model preview broken? CreateML and Xcode

Im on the recent version of MacOs and I recently trained a Style Transfer model using CreateML. I used the preview tab of CreateML to preview my model with a video (as well as an image), however when I press the button to export or share the result from the neural network none are exported. The modal window appears but doesnt save after the progress bar shows up for the conversion I tried converting the CoreML model file into a CoreML package, however when I tried exporting the preview it crashed and switched tabs to the package information section. I've been having this issue with all three export buttons on the model preview section of both the CreateML application and Xcode. Is this happening to anyone else? Ive also tried using the coremltools package for Python to extract a preview, however documentation for Style Transfer networks doesnt exist for loading videos with that package. The style transfer network only takes an input of images, so its unclear where a video file can be loaded.

Posted

by

trzroy

Keras with tensorflow-metal freezes during training with image augmentation

I am trying to train an image classification network in Keras with tensorflow-metal. The training freezes after the first 2-3 epochs if image augmentation layers are used (RandomFlip, RandomContrast, RandomBrightness) The system appears to use both GPU as well as CPU (as indicated by Activity Monitor). Also, warnings appear both in Jupyter and Terminal (see below). When the image augmentation layers are removed (i.e. we only rebuild the head and feed images from disk), CPU appears to be idle, no warnings appear, and training completes successfully. Versions: python 3.8, tensorflow-macos 2.11.0, tensorflow-metal 0.7.1 Sample code: img_augmentation = Sequential( [ layers.RandomFlip(), layers.RandomBrightness(factor=0.2), layers.RandomContrast(factor=0.2) ], name="img_augmentation", ) inputs = layers.Input(shape=(384, 384, 3)) x = img_augmentation(inputs) model = tf.keras.applications.EfficientNetV2S(include_top=False, input_tensor=x, weights='imagenet') model.trainable = False x = tf.keras.layers.GlobalAveragePooling2D(name="avg_pool")(model.output) x = tf.keras.layers.BatchNormalization()(x) top_dropout_rate = 0.2 x = tf.keras.layers.Dropout(top_dropout_rate, name="top_dropout")(x) outputs = tf.keras.layers.Dense(179, activation="softmax", name="pred")(x) newModel = Model(inputs=model.input, outputs=outputs, name="EfficientNet_DF20M_species") reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', factor=0.9, patience=2, verbose=1, min_lr=0.000001) optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.01, momentum=0.9) newModel.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy']) history = newModel.fit(x=train_ds, validation_data=val_ds, epochs=30, verbose=2, callbacks=[reduce_lr]) During training with image augmentation, Jupyter prints the following warnings while training the first epoch: WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformFullIntV2 cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting StatelessRandomGetKeyCounter cause there is no registered converter for this op. ... During training with image augmentation, Terminal keeps spamming the following warning: 2023-02-21 23:13:38.958633: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.958920: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.959071: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.959115: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.959359: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. ... Any suggestions?

Posted

by

Cardu6lis

Failed assertion `destination datatype must be fp32' when using PyTorch with Metal Performance Shaders on A14 devices

Hello Apple Developer Community, I'm experiencing an issue when using PyTorch in combination with Metal Performance Shaders (MPS) on an A14 device. During the execution of the backward() function, I encounter the following error message: /AppleInternal/Library/BuildRoots/9941690d-bcf7-11ed-a645-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayConvolutionA14.mm:4332: failed assertion `destination datatype must be fp32' I have already verified that both the input tensors and gradient tensors are of float32 datatype before the backward() function is called. However, the error seems to be originating from the MPS code, specifically within the MPSNDArrayConvolutionA14.mm file. Could you provide any guidance or recommendations on how to resolve this issue? Is there any specific constraint or requirement that I should be aware of when using MPS with PyTorch on A14 devices? I would greatly appreciate any help or suggestions. Thank you in advance for your support. Best regards, kiyotaka86

Posted

by

kiyotaka86

Can I run CatBoost/XGBoost on my GPU(s) on my Mac?

I'm interested in using CatBoost and XGBoost for some machine learning projects on my Mac, and I was wondering if it's possible to run these algorithms on my GPU(s) to speed up training times. I have a Mac with an AMD Radeon Pro 5600M and an Intel UHD Graphics 630 GPUs, and I'm running macOS Ventura 13.2.1. I've read that both CatBoost and XGBoost support GPU acceleration, but I'm not sure if this is possible on my system. Can anyone point me in the right direction for getting started with GPU-accelerated CatBoost/XGBoost on macOS? Are there any specific drivers or tools I need to install, or any other considerations I should be aware of? Thank you.

Posted

by

Yesterdays

MPSNDArrayConvolutionA14.mm:3967: failed assertion `destination kernel width and filter kernel width mismatch'

Hi, I am training an adversarial auto encoder using PyTorch 2.0.0 on Apple M2 (Ventura 13.1), with conda 23.1.0 as manager. I encountered this error: /AppleInternal/Library/BuildRoots/5b8a32f9-5db2-11ed-8aeb-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayConvolutionA14.mm:3967: failed assertion `destination kernel width and filter kernel width mismatch' /Users/vk/miniconda3/envs/betavae/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown To my knowledge, the code broke down when running self.manual_backward(loss["g_loss"]) this block: g_opt.zero_grad() self.manual_backward(loss["g_loss"]) g_opt.step() The same code run without problems on linux distribution. Any thoughts on how to fix it are highly appreciated!

Posted

by

RayXC

Visual Look Up API

Hello, Is there an API available for "Visual Look Up"? https://support.apple.com/en-gb/guide/iphone/iph21c29a1cf/ios

Posted

by

lkjfkdfdsfs

MPSGraph randomTensor works for inference but crashes when training

I'm trying to use the randomTensor function from MPS graph to initialize the weights of a fully connected layer. I can create the graph and run inference using the randomly initialized values, but when I try to train and update these randomly initialized weights, I'm hitting a crash: Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 578. I can train the graph if I instead initialize the weights myself on the CPU, but I thought using the randomTensor functions would be faster/allow initialization to occur on the GPU. Here's my code for building the graph including both methods of weight initialization: func buildGraph(variables: inout [MPSGraphTensor]) -> (MPSGraphTensor, MPSGraphTensor, MPSGraphTensor, MPSGraphTensor) { let inputPlaceholder = graph.placeholder(shape: [2], dataType: .float32, name: nil) let labelPlaceholder = graph.placeholder(shape: [1], name: nil) // This works for inference but not training let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)! let weightTensor = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil) // This works for inference and training // let weights = [Float](repeating: 1, count: 2) // let weightTensor = graph.variable(with: Data(bytes: weights, count: 2 * MemoryLayout<Float32>.size), shape: [2, 1], dataType: .float32, name: nil) variables += [weightTensor] let output = graph.matrixMultiplication(primary: inputPlaceholder, secondary: weightTensor, name: nil) let loss = graph.softMaxCrossEntropy(output, labels: labelPlaceholder, axis: -1, reuctionType: .sum, name: nil) return (inputPlaceholder, labelPlaceholder, output, loss) } And to run the graph I have the following in my sample view controller: override func viewDidLoad() { super.viewDidLoad() var variables: [MPSGraphTensor] = [] let (inputPlaceholder, labelPlaceholder, output, loss) = buildGraph(variables: &variables) let gradients = graph.gradients(of: loss, with: variables, name: nil) let learningRate = graph.constant(0.001, dataType: .float32) var updateOps: [MPSGraphOperation] = [] for (key, value) in gradients { let updates = graph.stochasticGradientDescent(learningRate: learningRate, values: key, gradient: value, name: nil) let assign = graph.assign(key, tensor: updates, name: nil) updateOps += [assign] } let commandBuffer = MPSCommandBuffer(commandBuffer: Self.commandQueue.makeCommandBuffer()!) let executionDesc = MPSGraphExecutionDescriptor() executionDesc.completionHandler = { (resultsDictionary, nil) in for (key, value) in resultsDictionary { var output: [Float] = [0] value.mpsndarray().readBytes(&output, strideBytes: nil) print(output) } } let inputDesc = MPSNDArrayDescriptor(dataType: .float32, shape: [2]) let input = MPSNDArray(device: Self.device, descriptor: inputDesc) var inputArray: [Float] = [1, 2] input.writeBytes(&inputArray, strideBytes: nil) let source = MPSGraphTensorData(input) let labelMPSArray = MPSNDArray(device: Self.device, descriptor: MPSNDArrayDescriptor(dataType: .float32, shape: [1])) var labelArray: [Float] = [1] labelMPSArray.writeBytes(&labelArray, strideBytes: nil) let label = MPSGraphTensorData(labelMPSArray) // This runs inference and works // graph.encode(to: commandBuffer, feeds: [inputPlaceholder: source], targetTensors: [output], targetOperations: [], executionDescriptor: executionDesc) // // commandBuffer.commit() // commandBuffer.waitUntilCompleted() // This trains but does not work graph.encode( to: commandBuffer, feeds: [inputPlaceholder: source, labelPlaceholder: label], targetTensors: [], targetOperations: updateOps, executionDescriptor: executionDesc) commandBuffer.commit() commandBuffer.waitUntilCompleted() } And a few other relevant variables are created at the class scope: let graph = MPSGraph() static let device = MTLCreateSystemDefaultDevice()! static let commandQueue = device.makeCommandQueue()! How can I use these randomTensor functions on MPSGraph to randomly initialize weights for training?

Posted

by

noahmartin

ANE-Optimized Layer Norm Fails on ANE

In the ml-ane-transformers repo, there is a custom LayerNorm implementation for the Neural Engine-optimized shape of (B,C,1,S). The coremltools documentation makes it sound like the layer_norm MIL op would support this natively. In fact, the following code works on CPU: B,C,S = 1,768,512 g,b = 1, 0 @mb.program(input_specs=[mb.TensorSpec(shape=(B,C,1,S)),]) def ln_prog(x): gamma = (torch.ones((C,), dtype=torch.float32) * g).tolist() beta = (torch.ones((C), dtype=torch.float32) * b).tolist() return mb.layer_norm(x=x, axes=[1], gamma=gamma, beta=beta, name="y") However it fails when run on the Neural Engine, giving results that are scaled by an incorrect value. Should this work on the Neural Engine?

Posted

by

smpanaro

Create ML activity training problem

Hi everyone! I’m trying to train an activity classification model with 3 classes. The problem is that only one class has precision and recall > 0 after training. Even with 2 classes result is the same First I’d thought that there is a problem with my data but when I switched “left” label to “right” and vice versa the results were the same: only “left”-labeled data get non-zero precision and recall.

Posted

by

corle

Apple - you machine learning appears to do the opposite of what it is intended to...

We are developing an app that requires background location tracking at intervals... we are, somewhat obviously in designing the app, finding the correct balance between benefits to the user of more accurate tracking, given our user case, against the costs of battery usage. We have I believe, as have a few other Apps we have seen which are already live, found a satisfactory compromise... IN STEPS APPLE'S MACHINE LEARNING!!! The app will work for a couple of weeks, and then machine learning will step in for a few days and throttle the frequency of background checks... now get this... There seems to be A STRONG CORRELATION between these periods and battery draining attributed to the app via the phone analytics... increasing A LOT.... ie 10x Apple's machine learning is I presume designed to protect the users from too much battery drainage... from background tasks... So what we have when this occurs is this machine learning function apparently draining the phone of much more battery than the original function it is seeking to improve... and in the process the performance of the original function is greatly reduced. Without Machine Learning interfering... battery draining appears to be sustainable and insignificant. Has anyone else encountered this. Apple do you have any comment?

Posted

by

AngusHulme

-[_MTLCommandBuffer addCompletedHandler:]:867:

failed assertion `Completed handler provided after commit call'. how to clear this error any. when i run with cpu i am getting storage error so i tried with GPU. partial code #PositionalEncoding class PositionalEncoding(nn.Module): def init(self, d_model, max_len, dropout_prob=0.1): super(PositionalEncoding, self).init() self.dropout = nn.Dropout(p=dropout_prob) # Create positional encoding matrix pe = torch.zeros(max_len, d_model) position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1) div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)) # Pad div_term with zeros if necessary div_term_padded = torch.zeros(d_model) div_term_padded[:div_term.size(0)] = div_term pe[:, 0::2] = torch.sin(position * div_term_padded[0::2]) pe[:, 1::2] = torch.cos(position * div_term_padded[1::2]) pe = pe.unsqueeze(0).transpose(0, 1) self.register_buffer('pe', pe) def forward(self, x): x = x + self.pe[:x.size(0), :] return self.dropout(x) #transformermodel class class TransformerModel(nn.Module): def init(self, input_size, hidden_size, num_layers, d_model, num_heads, dropout_prob, output_size, device, max_len): super(TransformerModel, self).init() self.device = device self.hidden_size = hidden_size self.d_model = d_model self.num_heads = num_heads #self.embedding = nn.Embedding(input_size, d_model).to(device) self.embedding = nn.Linear(input_size, d_model).to(device) self.pos_encoder = PositionalEncoding(d_model, max_len, dropout_prob).to(device) self.transformer_encoder_layer = nn.TransformerEncoderLayer(d_model, num_heads, hidden_size, dropout_prob).to(device) self.transformer_encoder = nn.TransformerEncoder(self.transformer_encoder_layer, num_layers).to(device) self.decoder = nn.Linear(d_model, output_size).to(device) self.to(device) # Ensure the model is on the correct device def forward(self, x): #x = x.long() x = x.transpose(0, 1) # Transpose the input tensor to match the expected shape for the transformer x = x.squeeze() # Remove the extra dimension from the input tensor x = self.embedding(x) # Apply the input embedding x = self.pos_encoder(x) # Add positional encoding x = self.transformer_encoder(x) # Apply the transformer encoder x = self.decoder(x[:, -1, :]) # Decode the last time step's output to get the final prediction return x #train transformer model class def train_transformer_model(train_X_scaled, train_y, input_size, d_model, hidden_size, num_layers, output_size, learning_rate, num_epochs, num_heads, dropout_prob, device, n_accumulation_steps=32): train_X_tensor = torch.from_numpy(train_X_scaled).float().to(device) train_y_tensor = torch.from_numpy(train_y).float().unsqueeze(1).to(device) # Create the dataset and DataLoader train_data = TensorDataset(train_X_tensor, train_y_tensor) train_loader = DataLoader(train_data, batch_size=8, shuffle=True) # Compute the maximum length of the input sequences max_len = train_X_tensor.size(1) # Create the model model = TransformerModel(input_size, hidden_size, num_layers, d_model, num_heads, dropout_prob, output_size, device, max_len).to(device) q = 0.5 criterion = lambda y_pred, y_true: quantile_loss(q, y_true, y_pred) optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate) for epoch in range(1, num_epochs + 1): model.train() print(f"Transformer inputs shape: {train_X_tensor.shape}, targets shape: {train_y_tensor.shape}") for epoch in range(1, num_epochs + 1): model.train() print(f"transformer Epoch {epoch}/{num_epochs}") for i, (batch_X, batch_y) in enumerate(train_loader): batch_X = batch_X.to(device) print("transformer batch_X shape:", batch_X.shape) batch_y = batch_y.to(device) print("transformer batch_Y shape:", batch_y.shape) optimizer.zero_grad() batch_X = batch_X.transpose(0, 1) train_pred = model(batch_X.squeeze(0)).to(device) print("train_pred=",train_pred) loss = criterion(train_pred, batch_y).to(device) loss.backward() # Gradient accumulation if (i + 1) % n_accumulation_steps == 0: optimizer.step() optimizer.zero_grad() print(f"transformer Epoch {epoch}/{num_epochs}, Step {i+1}/{len(train_loader)}, Loss: {loss.item():.6f}") return model

Posted

by

sugumar0107

Posts under Machine Learning tag