
Post marked as solved
6 Replies
0 Views
Hi,

tensorflow-macos 2.9.2
tensorflow-metal 0.5.0
macOS Monterey 12.4 (patched and up to date)
Machine: iMac Retina 5K, 27-inch, 2020, 3.8 GHz 8-Core Intel Core i7, 128 GB 2667 MHz DDR4, AMD Radeon Pro 5500 XT 8 GB graphics

Command to run (as per the documentation):

    python3 train.py -c config/stm32f415_tinyaes.json

When running on the GPU, the slowdown occurs at exactly the same epoch (19). As a test, I disabled the GPU in a duplicate script: whilst it took considerably longer, it got past epoch 19. As you can see in the GPU-enabled log, the ETA at epoch 19 has gone up to 122:06:17.

Command to run (CPU only, with the slight script modification shown below):

    python3 train_cpu.py -c config/stm32f415_tinyaes.json

Script modification to disable the GPU (I have left in the original line before and the original line after the change so the placement can be identified; otherwise the script is identical):

    from scaaml.utils import tf_cap_memory

    try:
        # Disable all GPUs
        tf.config.set_visible_devices([], 'GPU')
        visible_devices = tf.config.get_visible_devices()
        for device in visible_devices:
            assert device.device_type != 'GPU'
    except:
        # Invalid device or cannot modify virtual devices once initialized.
        pass

    def train_model(config):

CPU ONLY

    2048/2048 [==============================] - 5014s 2s/step - loss: 1.3966 - acc: 0.4811 - val_loss: 1.5574 - val_acc: 0.4297
    Epoch 25/30
    1502/2048 [=====================>........] - ETA: 22:02 - loss: 1.3701 - acc: 0.4919

GPU ENABLED

    2022-07-05 14:43:20.822168: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
    WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 46). These functions will not be directly callable after loading.
    2048/2048 [==============================] - 516s 252ms/step - loss: 1.9292 - acc: 0.3521 - val_loss: 1.9108 - val_acc: 0.3503
    Epoch 18/30
    2048/2048 [==============================] - ETA: 0s - loss: 1.8986 - acc: 0.3598
    2022-07-05 14:52:39.447402: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
    2022-07-05 14:52:39.450685: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
    2048/2048 [==============================] - 546s 267ms/step - loss: 1.8986 - acc: 0.3598 - val_loss: 2.0514 - val_acc: 0.3303
    Epoch 19/30
    741/2048 [=========>....................] - ETA: 122:06:17 - loss: 1.8543 - acc: 0.3750
    /Users/alan/.pyenv/versions/3.9.5/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker

I have run the same code on an external Linux-based system with GPUs, and it runs without problems.

This is blocking my research project (MSc). Whilst I can still use CPU mode, the idea is to compare and baseline against various platforms and functionalities (whilst also using my own traces), so it is important to be able to use all the features available on the host system (the GPUs in this case).

Hope this helps and you can offer a solution.

Regards,
alz0r
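To put the epoch 19 ETA in perspective, the sketch below turns a Keras progress-bar line into an implied time per step. It assumes the Keras `Progbar` ETA format (h:mm:ss above an hour, m:ss above a minute, otherwise a number of seconds) and that the ETA is roughly the remaining steps multiplied by the average step time so far in the epoch, so the result is only an estimate. The helper names are hypothetical; the 267 ms/step baseline is the figure reported at the end of epoch 18 in the GPU log.

```python
import re

# Step counter and ETA from a Keras progress line, e.g.
# "741/2048 [=========>.....] - ETA: 122:06:17 - loss: 1.8543 - acc: 0.3750"
PROGRESS_RE = re.compile(r"(\d+)/(\d+) \[[=>.]*\] - ETA: (\S+)")


def eta_to_seconds(eta: str) -> int:
    """Convert a Keras ETA string ("h:mm:ss", "m:ss" or "Ns") to seconds."""
    if eta.endswith("s"):
        return int(eta[:-1])
    seconds = 0
    for part in eta.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def implied_seconds_per_step(line: str) -> float:
    """Average step time implied by the ETA shown on a progress line."""
    match = PROGRESS_RE.search(line)
    if match is None:
        raise ValueError(f"no step counter/ETA found in: {line!r}")
    current, total = int(match.group(1)), int(match.group(2))
    remaining = total - current
    if remaining <= 0:
        raise ValueError("epoch already finished; no remaining steps")
    return eta_to_seconds(match.group(3)) / remaining


if __name__ == "__main__":
    epoch_19 = ("741/2048 [=========>....................] - ETA: 122:06:17"
                " - loss: 1.8543 - acc: 0.3750")
    epoch_18_step = 0.267  # seconds per step reported at the end of epoch 18
    slow = implied_seconds_per_step(epoch_19)
    print(f"{slow:.1f} s/step, about {slow / epoch_18_step:.0f}x epoch 18")
    # prints: 336.3 s/step, about 1260x epoch 18
```

In other words, 122:06:17 is 439,577 s spread over the 1,307 remaining steps, roughly 336 s per step, compared with 267 ms per step in the previous epoch.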
After running the same code on the same samples, it happened again:

    Epoch 19/30
    504/2048 [======>.......................] - ETA: 21:47:06 - loss: 1.8561 - acc: 0.371

I don't think it is a coincidence.
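Since the stall reproduces in the same epoch, it may help to record exactly which batch it starts at on each run. A minimal sketch, assuming `train.py` trains with Keras `model.fit` (as the progress bars suggest): the hypothetical `StepTimeMonitor` below times each training batch and records any batch that takes more than `factor` times the median of the first `warmup` batches. The defaults of 20x and 10 batches are arbitrary choices, not values from scaaml or tensorflow-metal. The method names match the `on_epoch_begin`, `on_train_batch_begin` and `on_train_batch_end` hooks of `tf.keras.callbacks.Callback`, so the same bodies could be moved into a `Callback` subclass passed via `model.fit(callbacks=[...])`; here the class is plain Python so the logic can be checked without TensorFlow.

```python
import statistics
import time
from typing import Callable, List, Optional, Tuple


class StepTimeMonitor:
    """Records training batches that are much slower than a warm-up baseline.

    Hypothetical helper: hook names mirror tf.keras.callbacks.Callback, but
    nothing here imports TensorFlow.
    """

    def __init__(self, factor: float = 20.0, warmup: int = 10,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self.factor = factor    # a batch slower than factor * baseline is a stall
        self.warmup = warmup    # number of initial batches used for the baseline
        self.clock = clock
        self.baseline: Optional[float] = None
        self.epoch = 0
        self.stalls: List[Tuple[int, int, float]] = []  # (epoch, batch, seconds)
        self._warmup_durations: List[float] = []
        self._start = 0.0

    def on_epoch_begin(self, epoch: int, logs=None) -> None:
        self.epoch = epoch + 1  # Keras passes a 0-based index; the log shows 1-based

    def on_train_batch_begin(self, batch: int, logs=None) -> None:
        self._start = self.clock()

    def on_train_batch_end(self, batch: int, logs=None) -> None:
        duration = self.clock() - self._start
        if self.baseline is None:
            # Still warming up: collect durations, then fix the baseline as their
            # median so a single slow first batch does not skew it.
            self._warmup_durations.append(duration)
            if len(self._warmup_durations) == self.warmup:
                self.baseline = statistics.median(self._warmup_durations)
            return
        if duration > self.factor * self.baseline:
            self.stalls.append((self.epoch, batch, duration))
            print(f"epoch {self.epoch}, batch {batch}: {duration:.1f}s "
                  f"(baseline {self.baseline:.3f}s)")


if __name__ == "__main__":
    # Simulated (hypothetical) batch timings driven by a fake clock.
    now = [0.0]
    monitor = StepTimeMonitor(warmup=3, clock=lambda: now[0])
    monitor.on_epoch_begin(18)
    for batch, seconds in enumerate([0.25, 0.27, 0.26, 0.27, 336.0]):
        monitor.on_train_batch_begin(batch)
        now[0] += seconds
        monitor.on_train_batch_end(batch)
    print(monitor.stalls)
```

The baseline is fixed after warm-up rather than updated continuously, so a gradual slowdown would not hide itself by dragging the baseline upward; the trade-off is that a legitimately slower later phase (for example, a different batch size) would also be flagged.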