tensorflow-metal

TensorFlow accelerates machine learning model training with Metal on Mac GPUs.

tensorflow-metal Documentation

Posts under tensorflow-metal tag

202 Posts
Post not yet marked as solved
1 Reply
49 Views
Intel GPU here. I just installed jax-metal, and it runs fine. However, when I try the following code, it returns the RuntimeError you see below:

jax.device_put(jnp.ones(1), device=jax.devices('gpu')[0])

RuntimeError: Unknown backend: 'gpu' requested, but no platforms that are instances of gpu are present. Platforms present are: interpreter,cpu
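A note on the error: the message lists only "interpreter,cpu" as registered platforms, so JAX has no GPU backend to hand out at all. A minimal diagnostic sketch, assuming jax-metal is installed in the same environment (on supported machines the Metal device usually appears in jax.devices() rather than under the 'gpu' platform name; if only CPU shows up, the plugin did not register, which matches the error above):

import jax
import jax.numpy as jnp

# List every device the installed backends expose; with a working
# jax-metal install this typically includes a METAL device.
print(jax.devices())

# Place the array on the first available device instead of requesting
# the 'gpu' platform by name.
x = jax.device_put(jnp.ones(1), device=jax.devices()[0])
print(x)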
Posted by dkga. Last updated.
Post not yet marked as solved
0 Replies
66 Views
Testing out https://developer.apple.com/metal/jax/ mainly for trying to run whisper-jax (https://github.com/sanchit-gandhi/whisper-jax/tree/main) on my M2. The jax-metal plugin seems to install without issues, and the basic test code runs fine. However, the whisper-jax code fails when trying to encode a file with the following error:

error: failed to legalize operation 'mhlo.convolution'
/Users/pere/jax-metal/lib/python3.10/site-packages/whisper_jax/layers.py:1236:0: note: see current operation: %111 = "mhlo.convolution"(%110, <<UNKNOWN SSA VALUE>>) {batch_group_count = 1 : i64, dimension_numbers = #mhlo.conv<[b, 0, f]x[0, i, o]->[b, 0, f]>, feature_group_count = 1 : i64, lhs_dilation = dense<1> : tensor<1xi64>, padding = dense<1> : tensor<1x2xi64>, precision_config = [#mhlo<precision DEFAULT>, #mhlo<precision DEFAULT>], rhs_dilation = dense<1> : tensor<1xi64>, window_reversal = dense<false> : tensor<1xi1>, window_strides = dense<1> : tensor<1xi64>} : (tensor<1x3000x80xf32>, tensor<3x80x384xf32>) -> tensor<1x3000x384xf32>
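The error means the Metal plugin has no lowering for this convolution pattern, so compilation fails rather than falling back. A workaround sketch, assuming the goal is just to get whisper-jax running at all (on CPU) until the op is supported; depending on the JAX version, the older JAX_PLATFORM_NAME variable may be needed instead of JAX_PLATFORMS:

import os

# Force JAX onto the CPU backend before jax is imported, so ops the
# Metal plugin cannot legalize still execute.
os.environ["JAX_PLATFORMS"] = "cpu"

import jax
print(jax.devices())  # should now report CPU devices only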
Posted by PerEK. Last updated.
Post not yet marked as solved
0 Replies
37 Views
I initially raised this issue in the tensorflow forum, and they directed me back here since this is a tf-macos specific problem [see https://github.com/tensorflow/tensorflow/issues/60673]. When calling Model.compile() with the AdamW optimizer, a warning is thrown saying that v2.11+ optimizers have a known slowdown on M1/M2 devices, and so the backend attempts to fall back to a legacy version. However, no legacy version of the AdamW optimizer exists. In a previous tf-macos version, 2.12, this led to an error during Model.compile() [see issue https://github.com//issues/60652 and https://developer.apple.com/forums/thread/729732]. In the current nightly, this error is not thrown; however, after calling model.compile(), the attribute model.optimizer is set to the string 'adamw' instead of an optimizer object. Later, when we call model.fit(), this leads to an AttributeError, because model.optimizer.minimize() does not exist when model.optimizer is a string.

Expected behaviour: correctly compile the model with either a v2.11+ optimizer without slowdown, or a legacy-compatible implementation of the AdamW optimizer. Then the model will train correctly with a valid AdamW optimizer when calling model.fit(). Note: a warning message suggests using the optimizer located at tf.keras.optimizers.legacy.AdamW, but this does not exist.

It would be nice to be able to either use modern optimizers, or have a legacy-compatible version of AdamW, since weight decay is an important tool in modern ML research and currently cannot be used on Mac.

Standalone code to reproduce the issue:

##===========##
##  Imports  ##
##===========##

import sys
import tensorflow as tf
import numpy as np

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import AdamW

##===================##
##  Report versions  ##
##===================##
#
# Expected outputs:
# Python version is: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:01:19) [Clang 14.0.6 ]
# TF version is: 2.14.0-dev20230523
# Numpy version is: 1.23.2
#
print(f"Python version is: {sys.version}")
print(f"TF version is: {tf.__version__}")
print(f"Numpy version is: {np.__version__}")

##==============================##
##  Create a very simple model  ##
##==============================##
#
# Expected outputs:
# Model: "model_1"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #
# =================================================================
#  Layer_in (InputLayer)       [(None, 2)]               0
#  Layer_hidden (Dense)        (None, 10)                30
#  Layer_out (Dense)           (None, 2)                 22
# =================================================================
# Total params: 52 (208.00 Byte)
# Trainable params: 52 (208.00 Byte)
# Non-trainable params: 0 (0.00 Byte)
# _________________________________________________________________
#
x_in = Input(2 , dtype=tf.float32, name="Layer_in")
x = x_in
x = Dense(10, dtype=tf.float32, name="Layer_hidden", activation="relu")(x)
x = Dense(2 , dtype=tf.float32, name="Layer_out" , activation="linear")(x)
model = Model(x_in, x)
model.summary()

##===================================================##
##  Compile model with MSE loss and AdamW optimizer  ##
##===================================================##
#
# Expected outputs:
# WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.AdamW` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.AdamW`.
# WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.AdamW`.
#
model.compile(
    loss      = "mse",
    optimizer = AdamW(learning_rate=1e-3, weight_decay=1e-2)
)

##===========================##
##  Generate some fake data  ##
##===========================##
#
# Expected outputs:
# X shape is (100, 2), Y shape is (100, 2)
#
dataset_size = 100
X = np.random.normal(size=(dataset_size, 2))
X = tf.constant(X, dtype=tf.float32)
Y = np.random.normal(size=(dataset_size, 2))
Y = tf.constant(Y, dtype=tf.float32)
print(f"X shape is {X.shape}, Y shape is {Y.shape}")

##===================================##
##  Fit model to data for one epoch  ##
##===================================##
#
# Expected outputs:
# ---------------------------------------------------------------------------
# AttributeError                            Traceback (most recent call last)
# Cell In[9], line 51
#       1 ##===================================##
#       2 ## Fit model to data for one epoch  ##
#       3 ##===================================##
#    (...)
#      48 #   • mask=None
#      49 #
# ---> 51 model.fit(X, Y, epochs=1)
#
# File ~/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
#      67   filtered_tb = _process_traceback_frames(e.__traceback__)
#      68   # To get the full stack trace, call:
#      69   # `tf.debugging.disable_traceback_filtering()`
# ---> 70   raise e.with_traceback(filtered_tb) from None
#      71 finally:
#      72   del filtered_tb
#
# File /var/folders/6_/gprzxt797d5098h8dtk22nch0000gn/T/__autograph_generated_filezzqv9k36.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
#      13 try:
#      14   do_return = True
# ---> 15   retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
#      16 except:
#      17   do_return = False
#
# AttributeError: in user code:
#
#     File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1338, in train_function *
#         return step_function(self, iterator)
#     File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1322, in step_function **
#         outputs = model.distribute_strategy.run(run_step, args=(data,))
#     File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1303, in run_step **
#         outputs = model.train_step(data)
#     File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1084, in train_step
#         self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
#
#     AttributeError: 'str' object has no attribute 'minimize'
#
model.fit(X, Y, epochs=1)
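Since the report is ultimately about getting a usable decoupled-weight-decay optimizer on Apple silicon, one interim possibility is the AdamW implementation in TensorFlow Addons, which does not go through the Keras v2.11+ fallback path. A stopgap sketch, assuming tensorflow-addons installs cleanly in the same environment (a suggestion, not an official fix):

import tensorflow_addons as tfa

# tfa's AdamW implements decoupled weight decay and compiles to a real
# optimizer object, so model.optimizer is not replaced by a string.
opt = tfa.optimizers.AdamW(learning_rate=1e-3, weight_decay=1e-2)
# model.compile(loss="mse", optimizer=opt)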
Posted by smenary. Last updated.
Post not yet marked as solved
1 Reply
125 Views
I have the same problem in my LSTM model on an Apple M2. I followed the guide at https://developer.apple.com/metal/tensorflow-plugin/ to set up my environment, and the model runs extremely slowly. How can I fix it? Also, I got this response while running the model: Failed to get CPU frequency: 0 Hz
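A minimal timing sketch for narrowing this down, using a hypothetical small LSTM rather than the poster's model: run the same forward pass on CPU and GPU and compare. If the GPU path is dramatically slower, the issue is likely Metal dispatch overhead on small recurrent steps rather than the environment setup. (The "Failed to get CPU frequency: 0 Hz" line is a log message, not the cause of the slowdown.)

import time
import tensorflow as tf

x = tf.random.normal((128, 50, 64))  # (batch, timesteps, features), arbitrary sizes

for dev in ("/CPU:0", "/GPU:0"):
    with tf.device(dev):
        layer = tf.keras.layers.LSTM(64)
        layer(x)  # build and warm up on this device
        t0 = time.time()
        for _ in range(10):
            layer(x)
        print(f"{dev}: {time.time() - t0:.2f}s for 10 calls")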
Posted. Last updated.
Post not yet marked as solved
3 Replies
1k Views
Hello Everyone! I recently tried installing TensorFlow following this guide: https://developer.apple.com/metal/tensorflow-plugin/ on my M1 Pro MacBook running macOS Monterey 12.4. However, I'm faced with the following error message when importing:

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tensorflow/python/framework/dtypes.py:29, in <module>
     26 from tensorflow.python.lib.core import _pywrap_bfloat16
     27 from tensorflow.python.util.tf_export import tf_export
---> 29 _np_bfloat16 = _pywrap_bfloat16.TF_bfloat16_type()

     32 @tf_export("dtypes.DType", "DType")
     33 class DType(_dtypes.DType):
     34   """Represents the type of the elements in a `Tensor`.
     35
     36   `DType`'s are used to specify the output data type for operations which
    (...)
     46   See `tf.dtypes` for a complete list of `DType`'s defined.
     47   """

TypeError: Unable to convert function return value to a Python type! The signature was () -> handle

I've checked that my tensorflow-deps has version 2.9.0, tensorflow-macos 2.9.2, and tensorflow-metal 0.5.0, with numpy at its latest version of 1.22.4, all in my env. Anyone know what's up?
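One thing worth checking, offered as a guess rather than a confirmed diagnosis: the traceback path is under /Library/Frameworks/Python.framework (the system-wide Python installer), while tensorflow-deps is a conda package, so the failing import may be running in a different interpreter than the conda env the guide sets up. The TypeError itself usually points at a numpy C-API mismatch with the compiled _pywrap_bfloat16 extension. A quick check sketch:

import sys
import numpy as np

# Confirm which interpreter is actually executing, then compare the
# numpy version against what tensorflow-deps pinned for this release
# (1.22.x for the 2.9 deps). Reinstalling numpy at the pinned version
# inside that exact environment is a common first step, e.g.:
#   python -m pip install "numpy==1.22.*" --force-reinstall
print(sys.executable)
print(np.__version__)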
Posted by bckhm. Last updated.
Post not yet marked as solved
0 Replies
92 Views
I am performing a grid search over a parameter grid and training the model with different combinations of hyperparameters. I am receiving the following warning:

W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz

Why is that, and what can I do to fix it? Thank you very much. Here is the code:

def grid_search(model_name):
    ...
    elif model_name == 'LSTM':
        def build_model(units, activation, dropout, layers):
            model = Sequential()
            model.add(LSTM(units=units, kernel_initializer="normal", activation=activation,
                           return_sequences=True, input_shape=(2, 1152), recurrent_dropout=0))
            model.add(Dropout(dropout))
            for i in range(layers):
                if i != layers-1:
                    model.add(LSTM(units=units, kernel_initializer="normal", activation=activation,
                                   return_sequences=True, recurrent_dropout=0))
                    model.add(Dropout(dropout))
                elif i == (layers-1):
                    model.add(LSTM(units=units, kernel_initializer="normal", activation=activation,
                                   recurrent_dropout=0))
                    model.add(Dropout(dropout))
            model.add(Dense(units=6, kernel_initializer="normal", activation=activation))
            model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
            return model

        param_grid = {'units': [200, 300, 400],
                      'activation': ['tanh'],
                      'dropout': [0, 0.2, 0.4, 0.6],
                      'layers': [0, 5]}
        group_kfold = GroupKFold(n_splits=len(np.unique(groups_train)))
        model = KerasClassifier(model=build_model,
                                units=param_grid['units'],
                                activation=param_grid['activation'],
                                dropout=param_grid['dropout'],
                                layers=param_grid['layers'])
        grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=group_kfold)
        X_test, X_train, y_test, y_train = raw_dataassigner(model_name)
        (X_train, y_train) = shuffle(X_train, y_train)
        with tf.device('/cpu:0'):
            grid_result = grid_search.fit(X_train, y_train, groups=groups_train)
        print(f'Best score ({grid_search.best_score_}) for {model_name} model achieved with parameters: ',
              grid_search.best_params_)
        means = grid_result.cv_results_['mean_test_score']
        stds = grid_result.cv_results_['std_test_score']
        params = grid_result.cv_results_['params']
        for mean, stdev, param in zip(means, stds, params):
            print("%f (%f) with: %r" % (mean, stdev, param))

grid_search('LSTM')
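As far as I can tell this warning is informational: on Apple silicon the CPU-frequency query in TensorFlow's platform code is not implemented, and it does not affect the grid-search results. A sketch for quieting it, assuming the goal is just cleaner logs (the variable must be set before tensorflow is imported):

import os

# Level "2" hides native INFO and WARNING messages, which usually
# includes the cpu_utils.cc "Failed to get CPU frequency: 0 Hz" line.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import tensorflow as tf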
Posted by hefl99. Last updated.
Post not yet marked as solved
1 Reply
208 Views
Hi, apologies if this is not the right place for bug reports in tf-macos; I am happy to move this issue elsewhere if so! The problem occurs in tensorflow-macos v2.12.0 when attempting to call model.compile() with the AdamW optimiser. A warning is thrown, telling us that there is a known slowdown when using v2.11+ optimizers, and the backend attempts to fall back to a legacy version. However, AdamW does not exist in legacy versions, which eventually leads to an "unknown optimizer" error. See below for a MWE. I am running on a MacBook M1 Pro. Cheers, Ste

##
##  Imports
##

import sys
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import AdamW

##
##  Report versions
##

print(f"Python version is: {sys.version}")
## --> Python version is: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:01:19) [Clang 14.0.6 ]
print(f"TF version is: {tf.__version__}")
## --> TF version is: 2.12.0
print(f"Keras version is: {tf.keras.__version__}")
## --> Keras version is: 2.12.0

##
##  Create a very simple model
##

x_in = Input(1)
x = Dense(10)(x_in)
model = Model(x_in, x)

##
##  Compile model with AdamW optimizer
##

model.compile(optimizer=AdamW(learning_rate=1e-3, weight_decay=1e-2))

which outputs the "unknown optimizer" error described above. It is worth noting that tf.keras.optimizers.legacy.AdamW does not exist, and so we cannot simply use this.
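A small verification sketch for the claim that there is nothing to fall back to (an illustration added here, not part of the original report):

import tensorflow as tf

# On tf-macos 2.12 this is expected to print False, which is why the
# automatic fallback to tf.keras.optimizers.legacy.AdamW fails.
print(hasattr(tf.keras.optimizers.legacy, "AdamW"))
print([n for n in dir(tf.keras.optimizers.legacy) if not n.startswith("_")])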
Posted by smenary. Last updated.
Post not yet marked as solved
7 Replies
1.9k Views
I am comparing my M1 MBA with my 2019 16" Intel MBP. The M1 MBA has tensorflow-metal, while the Intel MBP has TF directly from Google. Generally, the same programs run 2-5 times FASTER on the Intel MBP, which presumably has no GPU acceleration. Is there anything I could have done wrong on the M1? Here is the start of the Metal run:

Metal device set to: Apple M1

systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

2022-01-19 04:43:50.975025: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-01-19 04:43:50.975291: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
2022-01-19 04:43:51.216306: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
Epoch 1/10
2022-01-19 04:43:51.298428: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
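For a comparison that is independent of any particular model, a fixed-workload timing sketch (sizes here are made up, not from the post) can separate raw device throughput from input-pipeline effects. Small models and small batches are often faster on a desktop-class CPU than on the M1 GPU because of per-op dispatch overhead:

import time
import tensorflow as tf

n = 4000
a = tf.random.normal((n, n))
b = tf.random.normal((n, n))
tf.matmul(a, b).numpy()  # warm-up: forces compilation and device transfer

t0 = time.time()
for _ in range(10):
    c = tf.matmul(a, b)
_ = c.numpy()  # block until the (possibly asynchronous) GPU work finishes
print(f"10 matmuls of {n}x{n}: {time.time() - t0:.2f}s")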
Posted. Last updated.
Post not yet marked as solved
6 Replies
1.8k Views
Hi, I have started experimenting with using my MBP with M1 Pro (10 CPU cores / 16 GPU cores) for TensorFlow. Two things were odd/noteworthy:

First, I've compared training models in a tensorflow environment with tensorflow-metal, running the code either with tf.device('gpu:0'): or with tf.device('cpu:0'):, as well as in an environment without the tensorflow-metal plugin. Specifying the device as CPU in tf-metal almost always leads to much longer training times compared to specifying the GPU, but also compared to running in the standard (non-metal) environment. Also, the GPU was running at quite high power despite TF being told to use the CPU. Is this an intended or expected behaviour? It would otherwise be preferable to use the non-metal environment when not benefitting from a GPU.

Secondly, at small batch sizes, the GPU power in system stats increases with the batch size, as expected. However, when changing the batch size from 9 to 10 (this appears to be a hard step specifically at this number), GPU power drops by about half, and training time doubles. Increasing the batch size beyond 10 again leads to a gradual increase in GPU power; on my model, the GPU power seen at batchsize=9 is reached again only at about batchsize=50, making GPU acceleration at batch sizes from 10 to about 50 rather useless. I've noticed this behavior on several models, which makes me wonder whether this is a general tf-metal behaviour. As a result, I've only been able to benefit from GPU acceleration at a batch size of 9 and above 100. Once again, is this intended or to be expected?
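A reproduction sketch for the batch-size step, using a deliberately generic toy model (all sizes are invented, not taken from the post): time one epoch per batch size and look for the discontinuity between 9 and 10.

import time
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x = tf.random.normal((2048, 64))
y = tf.random.normal((2048, 1))

for bs in (8, 9, 10, 16, 32, 64):
    t0 = time.time()
    model.fit(x, y, batch_size=bs, epochs=1, verbose=0)
    print(f"batch_size={bs}: {time.time() - t0:.2f}s per epoch")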
Posted by iDron. Last updated.
Post not yet marked as solved
4 Replies
1.3k Views
I tried installing tensorflow by following the guideline at https://developer.apple.com/metal/tensorflow-plugin/ inside of a conda environment. The installation works fine, but when trying to run the verification script at the end of the guide, it fails at import tensorflow as tf and I get the following error:

RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
ImportError: numpy.core._multiarray_umath failed to import
ImportError: numpy.core.umath failed to import
Traceback (most recent call last):
  File "/Users/username/Desktop/test1.py", line 5, in <module>
    import tensorflow as tf
...

The version of numpy installed by running conda install -c apple tensorflow-deps is 1.22.3, but the error seems to suggest that this is not compatible with tensorflow. Any advice on how to fix that would be much appreciated!
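Reading the error literally: the extension was compiled against numpy C-API version 0x10 while the installed numpy (1.22.3) provides 0xf, i.e. the installed numpy is older than what the wheels expect. A sketch of the usual remedy, hedged because upgrading may conflict with the tensorflow-deps pin and should be verified in place:

import numpy as np

# Confirm the version in the failing environment, then upgrade numpy in
# that same environment and re-run the verification script, e.g.:
#   python -m pip install --force-reinstall "numpy>=1.23"
print(np.__version__)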
Posted by Mac4Ever. Last updated.
Post marked as solved
19 Replies
28k Views
Following the instructions at https://developer.apple.com/metal/tensorflow-plugin/ I got as far as

python -m pip install tensorflow-macos

and it responded:

ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
ERROR: No matching distribution found for tensorflow-macos

I'd be grateful for any suggestions.
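This error usually means no published tensorflow-macos wheel matches the running interpreter. A diagnostic sketch (checks only, no fix implied) to confirm the Python being used is an arm64 build of a supported version, rather than, say, an x86_64 interpreter running under Rosetta:

import platform
import sys

print(platform.machine())     # expect 'arm64' on Apple silicon
print(platform.mac_ver()[0])  # macOS version the wheels are tagged against
print(sys.version)            # interpreter version pip matches wheels against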
Posted. Last updated.
Post not yet marked as solved
8 Replies
4.2k Views
I know you already support TensorFlow via two packages: tensorflow-macos and tensorflow-metal. However, to work with NLP, users also need the tensorflow-text package. Could the developers build a dedicated tensorflow-text-macos version to support this demand? Best regards,
Posted by SuperBo. Last updated.
Post not yet marked as solved
18 Replies
3.5k Views
Comparison between a Mac Studio M1 Ultra (20c, 64c, 128GB RAM) and a 2017 Intel i5 MBP (16GB RAM) for the subject matter, i.e. memory leakage while using tf.keras.models.predict() for a saved model on both machines:

MBP-2017: first prediction takes around 10MB, and subsequent calls ~0-1MB.
MACSTUDIO-2022: first prediction takes around 150MB, and subsequent calls ~70-80MB.

After, say, 10,000 such calls to predict(), while my MBP memory usage stays under 10GB, the Mac Studio climbs to ~80GB (and keeps counting up for higher numbers of calls). Even using keras.backend.clear_session() after each call on the Mac Studio did not help. Can anyone with insight into tensorflow-metal and/or Mac M1 machines help? Thanks, Bapi
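A commonly suggested mitigation sketch for repeated small predictions (an assumption that it applies here, not a verified fix for this particular leak): call the model directly through a compiled tf.function instead of Model.predict(), which builds extra iterator and callback machinery on every call.

import tensorflow as tf

model = tf.keras.models.load_model("path/to/saved_model")  # hypothetical path

@tf.function
def infer(x):
    # Direct call avoids the per-call predict() overhead.
    return model(x, training=False)

# y = infer(batch)  # reuse this for every call instead of model.predict(batch)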
Posted by karbapi. Last updated.
Post marked as solved
2 Replies
315 Views
TensorFlow 2.12 supports Python 3.11. However, the tensorflow-macos package does not provide a build for Python 3.11. I assume this is an oversight in the build pipeline. From https://pypi.org/project/tensorflow-macos/#files the current builds are:

tensorflow_macos-2.12.0-cp310-cp310-macosx_12_0_x86_64.whl
tensorflow_macos-2.12.0-cp310-cp310-macosx_12_0_arm64.whl
tensorflow_macos-2.12.0-cp39-cp39-macosx_12_0_x86_64.whl
tensorflow_macos-2.12.0-cp39-cp39-macosx_12_0_arm64.whl
tensorflow_macos-2.12.0-cp38-cp38-macosx_12_0_x86_64.whl
tensorflow_macos-2.12.0-cp38-cp38-macosx_12_0_arm64.whl

I would like to request that:

tensorflow_macos-2.12.0-cp311-cp311-macosx_12_0_x86_64.whl
tensorflow_macos-2.12.0-cp311-cp311-macosx_12_0_arm64.whl

are added.
Posted. Last updated.
Post not yet marked as solved
10 Replies
4.2k Views
Hello, I cannot predict with my model on Apple M1. I get this error:

Traceback (most recent call last):
  File "/Users/martin/Documents/Projects/rl-toolkit/rl_toolkit/__main__.py", line 154, in <module>
    agent.run()
  File "/Users/martin/Documents/Projects/rl-toolkit/rl_toolkit/training.py", line 213, in run
    losses = self._train(sample)
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3023, in __call__
    return graph_function._call_flat(
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].

Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceApplyAdamWithAmsgrad: CPU
ReadVariableOp: GPU CPU
_Arg: GPU CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_2_adam_update_6_resourceapplyadamwithamsgrad_m (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_2_adam_update_6_resourceapplyadamwithamsgrad_v (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_2_adam_update_6_resourceapplyadamwithamsgrad_vhat (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  ReadVariableOp (ReadVariableOp)
  Exp/ReadVariableOp (ReadVariableOp)
  ReadVariableOp_1 (ReadVariableOp)
  actor/ReadVariableOp (ReadVariableOp)
  actor/Exp/ReadVariableOp (ReadVariableOp)
  actor/ReadVariableOp_1 (ReadVariableOp)
  actor_critic/actor/ReadVariableOp (ReadVariableOp)
  actor_critic/actor/Exp/ReadVariableOp (ReadVariableOp)
  actor_critic/actor/ReadVariableOp_1 (ReadVariableOp)
  Adam_2/Adam/update_6/ResourceApplyAdamWithAmsgrad (ResourceApplyAdamWithAmsgrad) /job:localhost/replica:0/task:0/device:GPU:0

[[{{node ReadVariableOp}}]] [Op:__inference__train_4206]
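The debug info says ResourceApplyAdamWithAmsgrad is only supported on CPU in this setup, while its variables were placed on the Metal GPU. A workaround sketch (an educated guess based on the error, not a confirmed fix): allow soft device placement so ops without a GPU kernel fall back to CPU instead of raising a colocation error; set it before building the model.

import tensorflow as tf

# With soft placement, an op with no kernel for the requested device is
# silently placed on a device that does support it.
tf.config.set_soft_device_placement(True)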
Posted. Last updated.
Post not yet marked as solved
1 Replies
673 Views
Hi everyone, I'm a Machine Learning Engineer, and I'm planning to buy the MacBook Pro M2 Max with the 38-core GPU variant. I'm uncertain whether to choose the 32GB RAM or 64GB RAM option. Based on my research and use case, it seems that 32GB should be sufficient for most tasks, including the 4K video rendering I occasionally do. However, I'm concerned about the longevity of the device, as I'd like to keep the MacBook up to date for at least five years. Additionally, considering the 38-core GPU, I wonder if 32GB of unified memory might be insufficient, particularly when I need to train machine learning models or run Docker or even a Kubernetes cluster. I don't have any budget constraints, as the additional $400 cost isn't an issue, but I want to make a wise decision. I would appreciate any advice on this matter. Thanks in advance!
Posted by Aditya-ai. Last updated.
Post marked as solved
2 Replies
212 Views
System information
OS Platform and Distribution: macOS 13.3.1 (MacBook Pro M2)
Python Version (default): Python 3.11.3
Official Installation Guide: https://developer.apple.com/metal/tensorflow-plugin/

I just followed the installation guide.

(base) haesunglee@MacBookPro14-haesunglee ~ % python3 -c 'import tensorflow as tf; print(tf.__version__)'

RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
ImportError: numpy.core._multiarray_umath failed to import
ImportError: numpy.core.umath failed to import
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
ImportError: numpy.core._multiarray_umath failed to import
ImportError: numpy.core.umath failed to import
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
ImportError: numpy.core._multiarray_umath failed to import
ImportError: numpy.core.umath failed to import
Traceback (most recent call last):
  File "/Users/haesunglee/tensorflow_metal.py", line 1, in <module>
    import tensorflow as tf
  File "/Users/haesunglee/miniconda/lib/python3.10/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/Users/haesunglee/miniconda/lib/python3.10/site-packages/tensorflow/python/__init__.py", line 42, in <module>
    from tensorflow.python import data
  File "/Users/haesunglee/miniconda/lib/python3.10/site-
    from tensorflow.python.data.experimental.ops.data_service_ops import distribute
  File "/Users/haesunglee/miniconda/lib/python3.10/site-packages/tensorflow/python/data/experimental/ops/data_service_ops.py", line 22, in
...
...
...
  File ".../packages/tensorflow/python/framework/sparse_tensor.py", line 25, in <module>
    from tensorflow.python.framework import constant_op
  File "/Users/haesunglee/miniconda/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 25, in <module>
    from tensorflow.python.eager import execute
  File "/Users/haesunglee/miniconda/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 21, in <module>
    from tensorflow.python.framework import dtypes
  File "/Users/haesunglee/miniconda/lib/python3.10/site-packages/tensorflow/python/framework/dtypes.py", line 37, in <module>
    _np_bfloat16 = _pywrap_bfloat16.TF_bfloat16_type()
TypeError: Unable to convert function return value to a Python type! The signature was () -> handle
Posted. Last updated.
Post not yet marked as solved
5 Replies
1k Views
We run into an issue where a more complex model fails to converge on the M1 Max GPU, while it converges on its CPU and on non-M1-based machines. Performance is the same for CPU and GPU for models with a single RNN, but once we use two RNNs, the GPU fails to converge. That said, the example below is based on nonsensical data for the model architecture used, but we can observe the same behavior here as in our production models (which, for obvious reasons, we cannot share). Mainly:

The loss goes down to the bottom of the e-06 precision level in all cases except double RNN on GPU; during training we often reach the e-07 level. For the double RNN with GPU condition, the results do not go that low, sometimes reaching only the e-05 level.
For our production data, the double RNN on GPU results in a loss of 1.0 that basically stays the same from the first epoch, while the other conditions often reach the 0.2 level with a clear learning curve.
In the production model, increasing the LSTM_Cells number made the divergence more visible (in this synthetic data it does not happen).
The more complex the model is (after the RNN layers), the more visible the issue.

Suspected issues:

Different precision used in CPU and GPU training; we had to decrease the data values a lot to make the effect visible (if you work with raw data, all approaches seem to produce comparable results).
Somehow the vanishing-gradient problem is more pronounced on GPU, as indicated by worse performance as the complexity of the model increases.

Please let me know if you need any further details.

Software stack: macOS 12.1, tf 2.7, metal 0.3; also tested on tf 2.8.

Sample syntax:

# TEST CONDITIONS (conditions with issue: gpu=1, model_size=2)
gpu = 1         # 0 CPU, 1 GPU
model_size = 2  # 1 single RNN, 2 double RNN

# PARAMETERS
LSTM_Cells = 64
epochs = 300
batch = 128

import numpy as np
import pandas as pd
import sys
from sklearn import preprocessing

#"""
if 'tensorflow' in sys.modules:
    print("tensorflow uploaded")
    del sys.modules["tensorflow"]
    #del tf
    import tensorflow as tf
else:
    print("tensorflow not uploaded")
    import tensorflow as tf

if gpu == 1:
    pass
else:
    tf.config.set_visible_devices([], 'GPU')

#print("GPUs:", tf.config.list_physical_devices('GPU'))
print("GPUs:", tf.config.list_logical_devices('GPU'))
#print("CPUs:", tf.config.list_physical_devices('CPU'))
print("CPUs:", tf.config.list_logical_devices('CPU'))
#"""

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Displacement', 'Horsepower', 'Weight']
dataset = pd.read_csv(url, names=column_names, na_values='?',
                      comment='\t', sep=' ', skipinitialspace=True).dropna()

scaler = preprocessing.StandardScaler().fit(dataset)
X_scaled = scaler.transform(dataset)
X_scaled = X_scaled * 0.001

# Large values
#x_train = np.array(dataset[['Horsepower', 'Weight']]).reshape(-1,2,2)
#y_train = np.array(dataset[['MPG','Displacement']]).reshape(-1,2,2)
# Small values
x_train = np.array(X_scaled[:,2:]).reshape(-1,2,2)
y_train = np.array(X_scaled[:,:2]).reshape(-1,2,2)

#print(dataset)
print(x_train.shape)
print(y_train.shape)
# print(weight.shape)  # `weight` is not defined in this snippet; commented out

train_data = tf.data.Dataset.from_tensor_slices((x_train[:,:,:8], y_train)) \
    .cache().shuffle(x_train.shape[0]).batch(batch).repeat() \
    .prefetch(tf.data.experimental.AUTOTUNE)

if model_size == 2:
    # MINIMAL NOT WORKING (double RNN)
    encoder_inputs = tf.keras.Input(shape=(x_train.shape[1], x_train.shape[2]))
    encoder_l1 = tf.keras.layers.LSTM(LSTM_Cells, return_sequences=True, return_state=True)
    encoder_l1_outputs = encoder_l1(encoder_inputs)
    encoder_l2 = tf.keras.layers.LSTM(LSTM_Cells, return_state=True)
    encoder_l2_outputs = encoder_l2(encoder_l1_outputs[0])
    dense_1 = tf.keras.layers.Dense(128, activation='relu')(encoder_l2_outputs[0])
    dense_2 = tf.keras.layers.Dense(64, activation='relu')(dense_1)
    dense_3 = tf.keras.layers.Dense(32, activation='relu')(dense_2)
    dense_4 = tf.keras.layers.Dense(16, activation='relu')(dense_3)
    flat = tf.keras.layers.Flatten()(dense_2)
    dense_5 = tf.keras.layers.Dense(2*2)(flat)  # reconstructed as 2*2 (= 4) so Reshape([2,2]) below is valid
    reshape_output = tf.keras.layers.Reshape([2,2])(dense_5)
    model = tf.keras.models.Model(encoder_inputs, reshape_output)
else:
    # WORKING (single RNN)
    encoder_inputs = tf.keras.Input(shape=(x_train.shape[1], x_train.shape[2]))
    encoder_l1 = tf.keras.layers.LSTM(LSTM_Cells, return_sequences=True, return_state=True)
    encoder_l1_outputs = encoder_l1(encoder_inputs)
    dense_1 = tf.keras.layers.Dense(128, activation='relu')(encoder_l1_outputs[0])
    dense_2 = tf.keras.layers.Dense(64, activation='relu')(dense_1)
    dense_3 = tf.keras.layers.Dense(32, activation='relu')(dense_2)
    dense_4 = tf.keras.layers.Dense(16, activation='relu')(dense_3)
    flat = tf.keras.layers.Flatten()(dense_2)
    dense_5 = tf.keras.layers.Dense(2*2)(flat)  # reconstructed as 2*2 (= 4) so Reshape([2,2]) below is valid
    reshape_output = tf.keras.layers.Reshape([2,2])(dense_5)
    model = tf.keras.models.Model(encoder_inputs, reshape_output)

print(model.summary())

loss_tf = tf.keras.losses.MeanSquaredError()
model.compile(optimizer='adam', loss=loss_tf, run_eagerly=True)
model.fit(train_data, epochs=epochs, steps_per_epoch=3)
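To probe the suspected precision difference directly, here is a small comparison sketch (independent of the repro above; shapes are arbitrary): run an identical LSTM forward pass with identical weights on CPU and on GPU and inspect the largest elementwise difference.

import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
x = tf.random.normal((4, 2, 2)) * 0.001  # small values, as in the repro

# Build one reference layer to generate a fixed set of weights.
reference = tf.keras.layers.LSTM(64, return_sequences=True)
reference.build(x.shape)
weights = reference.get_weights()

outputs = {}
for dev in ("/CPU:0", "/GPU:0"):
    with tf.device(dev):
        layer = tf.keras.layers.LSTM(64, return_sequences=True)
        layer.build(x.shape)
        layer.set_weights(weights)  # identical weights on both devices
        outputs[dev] = layer(x).numpy()

print("max |CPU - GPU|:", np.max(np.abs(outputs["/CPU:0"] - outputs["/GPU:0"])))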
Posted by sebtac. Last updated.
Post not yet marked as solved
1 Reply
500 Views
I am trying to train an image classification network in Keras with tensorflow-metal. The training freezes after the first 2-3 epochs if image augmentation layers are used (RandomFlip, RandomContrast, RandomBrightness). The system appears to use both the GPU and the CPU (as indicated by Activity Monitor). Also, warnings appear both in Jupyter and Terminal (see below). When the image augmentation layers are removed (i.e. we only rebuild the head and feed images from disk), the CPU appears to be idle, no warnings appear, and training completes successfully.

Versions: python 3.8, tensorflow-macos 2.11.0, tensorflow-metal 0.7.1

Sample code:

img_augmentation = Sequential(
    [
        layers.RandomFlip(),
        layers.RandomBrightness(factor=0.2),
        layers.RandomContrast(factor=0.2)
    ],
    name="img_augmentation",
)
inputs = layers.Input(shape=(384, 384, 3))
x = img_augmentation(inputs)
model = tf.keras.applications.EfficientNetV2S(include_top=False, input_tensor=x, weights='imagenet')
model.trainable = False
x = tf.keras.layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
x = tf.keras.layers.BatchNormalization()(x)
top_dropout_rate = 0.2
x = tf.keras.layers.Dropout(top_dropout_rate, name="top_dropout")(x)
outputs = tf.keras.layers.Dense(179, activation="softmax", name="pred")(x)
newModel = Model(inputs=model.input, outputs=outputs, name="EfficientNet_DF20M_species")

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', factor=0.9,
                                                 patience=2, verbose=1, min_lr=0.000001)
optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.01, momentum=0.9)
newModel.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

history = newModel.fit(x=train_ds, validation_data=val_ds, epochs=30, verbose=2, callbacks=[reduce_lr])

During training with image augmentation, Jupyter prints the following warnings while training the first epoch:

WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformFullIntV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomGetKeyCounter cause there is no registered converter for this op.
...

During training with image augmentation, Terminal keeps spamming the following warning:

2023-02-21 23:13:38.958633: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.958920: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
(repeated many times)
...

Any suggestions?
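One workaround sketch, on the assumption that the freeze is tied to the in-model random-augmentation ops the warnings point at: perform the augmentation in the tf.data pipeline with tf.image ops instead of Keras preprocessing layers, so the stateless random ops never enter the GPU training graph. The factor ranges below are illustrative, not tuned equivalents of the original layers.

import tensorflow as tf

def augment(image, label):
    # Replacements in spirit for RandomFlip / RandomBrightness / RandomContrast.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image, label

# train_ds = train_ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE)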
Posted by Cardu6lis. Last updated.