tensorflow-metal


TensorFlow accelerates machine learning model training with Metal on Mac GPUs.

tensorflow-metal Documentation

Posts under tensorflow-metal tag

249 Posts
Post marked as solved
2 Replies
233 Views
Hello everyone. I found a problem in the TF built-in function tf.signal.stft: when I run the code below, it fails. The device is a MacBook Pro with an M1 Pro chip, running in JupyterLab. The same code does not fail on Linux with CUDA. Does anyone know how to fix this? Thanks.

Code:

    import numpy as np
    import tensorflow as tf

    random_waveform = np.random.normal(size=(16000))
    tf_waveform = tf.constant(random_waveform)
    tf_stft_waveform = tf.signal.stft(tf_waveform, frame_length=255, frame_step=128)

Error message:

    InvalidArgumentError                      Traceback (most recent call last)
    Input In [1], in <cell line: 6>()
          4 random_waveform = np.random.normal(size=(16000))
          5 tf_waveform = tf.constant(random_waveform)
    ----> 6 tf_stft_waveform = tf.signal.stft(tf_waveform, frame_length=255, frame_step=128)
    File ~/miniconda3/envs/AI/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
        151 except Exception as e:
        152   filtered_tb = _process_traceback_frames(e.__traceback__)
    --> 153   raise e.with_traceback(filtered_tb) from None
        154 finally:
        155   del filtered_tb
    File ~/miniconda3/envs/AI/lib/python3.9/site-packages/tensorflow/python/framework/ops.py:7164, in raise_from_not_ok_status(e, name)
       7162 def raise_from_not_ok_status(e, name):
       7163   e.message += (" name: " + name if name is not None else "")
    -> 7164   raise core._status_to_exception(e) from None
    InvalidArgumentError: Multiple Default OpKernel registrations match NodeDef '{{node ZerosLike}}': 'op: "ZerosLike" device_type: "DEFAULT" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } host_memory_arg: "y"' and 'op: "ZerosLike" device_type: "DEFAULT" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } host_memory_arg: "y"' [Op:ZerosLike]
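As a CPU-only cross-check of what tf.signal.stft computes (framing, a Hann window, then a real FFT per frame), here is a NumPy sketch. It follows TensorFlow's documented defaults for the frame arithmetic and the power-of-two fft_length, but note two assumptions: NumPy's np.hanning is a symmetric window while TensorFlow's default is periodic, so values differ slightly, and this is an approximation for comparison, not a drop-in replacement.

```python
import numpy as np

def stft(signal, frame_length=255, frame_step=128, fft_length=256):
    """Frame the signal, apply a Hann window, and take an rFFT per frame.

    fft_length=256 mirrors tf.signal.stft's default of padding each frame
    to the next power of two above frame_length.
    """
    num_frames = 1 + (len(signal) - frame_length) // frame_step
    window = np.hanning(frame_length)  # symmetric; TF's default Hann is periodic
    frames = np.stack([
        signal[i * frame_step : i * frame_step + frame_length]
        for i in range(num_frames)
    ])
    return np.fft.rfft(frames * window, n=fft_length)

waveform = np.random.normal(size=16000)
spectrogram = stft(waveform)
print(spectrogram.shape)  # (124, 129): 124 frames, 256 // 2 + 1 = 129 bins
```

Running the same input through this and through tf.signal.stft on a working (CPU/CUDA) setup is one way to confirm the Metal failure is in the op registration rather than in the input.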
Post marked as solved
3 Replies
286 Views
Hi, I am reliably able to get the following results after running pip install tensorflow-metal. Note I did not cull anything (including some device registration messages that only appear the first time you use TensorFlow); hopefully not too distracting, but I thought it would provide helpful context about my environment in case something is fishy.

    Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14) [Clang 12.0.1 ] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>> tf.config.list_physical_devices('GPU')
    [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
    >>> tf.zeros_like([1])
    Metal device set to: Apple M1
    systemMemory: 8.00 GB
    maxCacheSize: 2.67 GB
    2022-06-05 18:54:29.515755: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2022-06-05 18:54:29.516007: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/homebrew/Caskroom/miniforge/base/envs/ml/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/opt/homebrew/Caskroom/miniforge/base/envs/ml/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 7164, in raise_from_not_ok_status
        raise core._status_to_exception(e) from None  # pylint: disable=protected-access
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Multiple Default OpKernel registrations match NodeDef '{{node ZerosLike}}': 'op: "ZerosLike" device_type: "DEFAULT" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } host_memory_arg: "y"' and 'op: "ZerosLike" device_type: "DEFAULT" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } host_memory_arg: "y"' [Op:ZerosLike]

Whereas after uninstalling tensorflow-metal (pip uninstall tensorflow-metal), the same commands produce:

    Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14) [Clang 12.0.1 ] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>> tf.config.list_physical_devices('GPU')
    []
    >>> tf.zeros_like([1])
    <tf.Tensor: shape=(1,), dtype=int32, numpy=array([0], dtype=int32)>

It looks like a simple double-registration issue, but I've only just found out about the PluggableDevice API, so I don't know if it has recommendations for resolving multiple registrations. If I had to guess, it is unexpected in the extreme for a pluggable device extension to contain default device op registrations, but without being able to see the code I cannot guess further about what might be wrong.
Post not yet marked as solved
4 Replies
465 Views
First of all, as I understand this is a problem related to TensorFlow Addons, I've been in contact with the tfa developers (https://github.com/tensorflow/addons/issues/2578); the issue only happens on M1, so they think it has to do with Apple's tensorflow-metal. I've been getting spurious errors while doing model.fit with the Lookahead optimizer (I'm doing fine-tuning with big datasets, and my code just breaks while fitting to different files, in a non-reproducible way, i.e. each time I run it, it breaks on a different file and on different operations). I can see that these errors are undoubtedly related to the Lookahead optimizer. Let me try to explain this new info in a clear manner. I've tried with two different versions of tf + tf-addons (conda environments), but I got the same type of errors, probably more frequent with the pylast conda environment:

    pylast: tensorflow-macos 2.9.0, tensorflow-metal 0.5.0, tensorflow-addons 0.17.0
    py39deps26-source: tensorflow-macos 2.6.0, tensorflow-metal 0.2.0, tensorflow-addons 0.15.0.dev0

The base code is always the same. I use tf.config.set_soft_device_placement(True) and also with tf.device('/cpu:0'): in every call to TensorFlow, otherwise I get errors. As explained before, in my code I just load a model and fine-tune it to each file of a dataset.
Here are a pair of example error outputs (obtained with the pylast conda environment):

    File "/Users/machine/Projects/finetune-asp/src/finetune_IMR2020.py", line 138, in finetune_dataset_db
      history = model.fit(ft, steps_per_epoch=len(ft), epochs=ft_cfg["num_epochs"], shuffle=True,
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
      raise e.with_traceback(filtered_tb) from None
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
      tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

    Detected at node 'Lookahead/Lookahead/update_64/mul_11' defined at (most recent call last):
      File "/Users/machine/Projects/finetune-asp/src/finetune_IMR2020.py", line 138, in finetune_dataset_db
        history = model.fit(ft, steps_per_epoch=len(ft), epochs=ft_cfg["num_epochs"], shuffle=True,
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
        return fn(*args, **kwargs)
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 1409, in fit
        tmp_logs = self.train_function(iterator)
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 1051, in train_function
        return step_function(self, iterator)
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 1040, in step_function
        outputs = model.distribute_strategy.run(run_step, args=(data,))
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 1030, in run_step
        outputs = model.train_step(data)
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 893, in train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 539, in minimize
        return self.apply_gradients(grads_and_vars, name=name)
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/tensorflow_addons/optimizers/lookahead.py", line 104, in apply_gradients
        return super().apply_gradients(grads_and_vars, name, **kwargs)
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 678, in apply_gradients
        return tf.__internal__.distribute.interim.maybe_merge_call(
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 723, in _distributed_apply
        update_op = distribution.extended.update(
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 706, in apply_grad_to_update_var
        update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/tensorflow_addons/optimizers/lookahead.py", line 130, in _resource_apply_dense
        train_op = self._optimizer._resource_apply_dense(
      File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/tensorflow_addons/optimizers/rectified_adam.py", line 249, in _resource_apply_dense
        coef["r_t"] * m_corr_t / (v_corr_t + coef["epsilon_t"]),
    Node: 'Lookahead/Lookahead/update_64/mul_11'
    Incompatible shapes: [0] vs. [5,40,20]
      [[{{node Lookahead/Lookahead/update_64/mul_11}}]] [Op:__inference_train_function_30821]

and another error output.
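The "Incompatible shapes: [0] vs. [5,40,20]" message means an elementwise multiply was handed an empty tensor for one operand and a full gradient-shaped tensor for the other; my reading (an assumption, not a confirmed diagnosis) is that an optimizer slot variable ended up with zero elements on one device. A minimal NumPy illustration of why exactly that pair of shapes fails:

```python
import numpy as np

empty = np.zeros((0,))        # shape [0], e.g. a slot variable that lost its elements
grad = np.zeros((5, 40, 20))  # shape [5, 40, 20], e.g. the gradient of one kernel

try:
    # Broadcasting compares trailing dims: 0 vs 20, neither is 1, so this fails.
    empty * grad
except ValueError as e:
    print(e)
```

The same broadcasting rule governs TensorFlow's mul op, which is why the error names a multiply node rather than the optimizer itself.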
Post not yet marked as solved
2 Replies
366 Views
I ran the following notebook with tensorflow-metal 0.5.0:

    import tensorflow as tf
    from tensorflow import keras
    import numpy as np

    (X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
    X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
    y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

    X_mean = X_train.mean(axis=0, keepdims=True)
    X_std = X_train.std(axis=0, keepdims=True) + 1e-7
    X_train = (X_train - X_mean) / X_std
    X_valid = (X_valid - X_mean) / X_std
    X_test = (X_test - X_mean) / X_std

    X_train = X_train[..., np.newaxis]
    X_valid = X_valid[..., np.newaxis]
    X_test = X_test[..., np.newaxis]

    from functools import partial

    DefaultConv2D = partial(keras.layers.Conv2D, kernel_size=3, activation='relu', padding="SAME")

    input_ = keras.layers.Input(shape=[28, 28, 1])
    conv0 = DefaultConv2D(filters=64, kernel_size=7)(input_)
    pool1 = keras.layers.MaxPooling2D(pool_size=2)(conv0)
    conv1 = DefaultConv2D(filters=128)(pool1)
    conv2 = DefaultConv2D(filters=128)(conv1)
    pool2 = keras.layers.MaxPooling2D(pool_size=2)(conv2)
    conv3 = DefaultConv2D(filters=256)(pool2)
    conv4 = DefaultConv2D(filters=256)(conv3)
    pool3 = keras.layers.MaxPooling2D(pool_size=2)(conv4)
    flatten = keras.layers.Flatten()(conv4)
    hidden1 = keras.layers.Dense(units=128, activation='relu')(flatten)
    dropout1 = keras.layers.Dropout(0.5)(hidden1)
    hidden2 = keras.layers.Dense(units=64, activation='relu')(dropout1)
    dropout2 = keras.layers.Dropout(0.5)(hidden2)
    output = keras.layers.Dense(units=10, activation='softmax')(dropout2)
    model = keras.Model(inputs=[input_], outputs=[output])

    model.compile(loss="sparse_categorical_crossentropy", optimizer="nadam", metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=20, validation_data=(X_valid, y_valid))

However, in the 3rd cell I got the error message "The kernel appears to have died. It will restart automatically." I also ran this Python script from the terminal one line at a time, and I got the error message I attached above when I tried to run the line conv0 = DefaultConv2D(filters=64, kernel_size=7)(input_). With tensorflow-metal uninstalled, this code runs without any error messages.
Post not yet marked as solved
5 Replies
338 Views
I am noticing huge memory usage with TensorFlow. The memory usage keeps increasing, reaching 36 GB after only one epoch. The following is the dataset preprocessing code:

    with tf.device('CPU: 0'):
        data_augmentation = keras.Sequential([
            keras.layers.experimental.preprocessing.RandomFlip("horizontal"),
            keras.layers.experimental.preprocessing.RandomRotation(0.2),
            keras.layers.experimental.preprocessing.RandomHeight(0.2),
            keras.layers.experimental.preprocessing.RandomWidth(0.2),
            keras.layers.experimental.preprocessing.RandomZoom(0.2),
        ], name="data_augmentation")

        train_data = train_data.map(map_func=lambda x, y: (data_augmentation(x), y),
                                    num_parallel_calls=tf.data.AUTOTUNE).prefetch(buffer_size=tf.data.AUTOTUNE)
        test_data = test_data.prefetch(buffer_size=tf.data.AUTOTUNE)

And the following is the model I used:

    base_model = keras.applications.EfficientNetB0(include_top=False)
    base_model.trainable = False

    inputs = keras.layers.Input(shape=(224, 224, 3), name='input_layer')
    x = base_model(inputs, training=False)
    x = keras.layers.GlobalAveragePooling2D(name='global_average_pooling')(x)
    outputs = keras.layers.Dense(101, activation='softmax', name='output_layer')(x)
    model = keras.Model(inputs, outputs)

    # Compile
    model.compile(loss="categorical_crossentropy",
                  optimizer=tf.keras.optimizers.Adam(),  # use Adam with default settings
                  metrics=["accuracy"])

    from tqdm.keras import TqdmCallback
    tqdm_callback = TqdmCallback()

    # Fit, saving the best model weights to file via checkpoint_callback
    history_all_classes_10_percent = model.fit(train_data,
                                               verbose=0,
                                               epochs=5,
                                               validation_data=test_data,
                                               validation_steps=int(0.15 * len(test_data)),
                                               callbacks=[checkpoint_callback, tqdm_callback])
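To pin down whether the growth comes from the input pipeline or from the training step itself, one low-tech option is logging peak resident memory after each epoch using only the standard library. This is a sketch of my own (the peak_rss_mb name is mine, not from the post); note the platform quirk that ru_maxrss is reported in bytes on macOS but kilobytes on Linux:

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS ("darwin") but kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor

# Could be called from e.g. a Keras on_epoch_end callback:
#   print(f"epoch {epoch}: peak RSS {peak_rss_mb():.0f} MB")
print(f"peak RSS so far: {peak_rss_mb():.0f} MB")
```

If the number jumps during the first epoch and then plateaus, caching/prefetch buffers are the likely cause; steady growth across epochs points at a leak instead.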
Post not yet marked as solved
5 Replies
478 Views
Not only does upgrading tensorflow-macos and tensorflow-metal break Conv2D with the groups arg, it also makes training unable to finish. Today, after upgrading tensorflow-macos to 2.9.0 and tensorflow-metal to 0.5.0, my notebook can no longer make progress after training for around 16 minutes. I tested 4 times. It could happily run around 17 to 18 epochs, each epoch taking around 55 seconds. After that, it just stopped making progress. I checked Activity Monitor; both CPU and GPU usage were 0 at that point. I accidentally found that there are a lot of kernel faults in the Console app. The last one before I force-killed the process:

    IOReturn IOGPUDevice::new_resource(IOGPUNewResourceArgs *, struct IOGPUNewResourceReturnData *, IOByteCount, uint32_t *): PID 68905 likely leaking IOGPUResource (count=200000)

The PID 68905 is in fact the training process. I have observed this kind of issue for several months, but it was not as frequent, and I could restart my notebook and train successfully. No luck today. I hope Apple engineers can find the cause and fix it.
Post not yet marked as solved
1 Replies
252 Views
Today I upgraded tensorflow-macos to 2.9.0 and tensorflow-metal to 0.5.0, and found my old notebook failed to run. It ran well with tensorflow-macos 2.8.0 and tensorflow-metal 0.4.0. Specifically, I found that the groups arg of the Conv2D layer was the cause. Here is a demo:

    import tensorflow as tf
    from tensorflow import keras as tfk

    # tf.config.set_visible_devices([], 'GPU')

    Xs = tf.random.normal((32, 64, 48, 4))
    ys = tf.random.normal((32,))

    tf.random.set_seed(0)
    model = tfk.Sequential([
        tfk.layers.Conv2D(
            filters=16,
            kernel_size=(4, 3),
            groups=4,  # groups arg
            activation='relu',
        ),
        tfk.layers.Flatten(),
        tfk.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(
        loss=tfk.losses.BinaryCrossentropy(),
        metrics=[
            tfk.metrics.BinaryAccuracy(),
        ],
    )
    model.fit(Xs, ys, epochs=2, verbose=1)

The error is:

    W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at xla_ops.cc:296 : UNIMPLEMENTED: Could not find compiler for platform METAL: NOT_FOUND: could not find registered compiler for platform METAL -- check target linkage

Removing the groups arg makes the code run again. Training on CPU, by uncommenting line 4, gives a different error:

    'apple-m1' is not a recognized processor for this target (ignoring processor)
    LLVM ERROR: 64-bit code requested on a subtarget that doesn't support it!

And removing the groups arg also makes training on CPU work. However, I didn't test training on CPU before the upgrade. My device is a MacBook Pro 14" running macOS 12.4.
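For context on what groups=4 changes (and hence what the Metal backend would need a dedicated kernel for): the input channels are split into 4 independent groups, so the kernel tensor shrinks from (kh, kw, in_ch, filters) to (kh, kw, in_ch // groups, filters). A quick sanity check of the weight counts for the layer in the demo above:

```python
# Dimensions from the demo layer: kernel (4, 3), 4 input channels,
# 16 filters, groups=4.
kh, kw, in_ch, filters, groups = 4, 3, 4, 16, 4

dense_weights = kh * kw * in_ch * filters                # ordinary Conv2D kernel
grouped_weights = kh * kw * (in_ch // groups) * filters  # with groups=4

print(dense_weights, grouped_weights)  # 768 192
```

The grouped layer is not expressible as a plain Conv2D with reshaped weights, which is consistent with the framework routing it through a separate (here XLA) compilation path that the METAL platform does not register.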
Post not yet marked as solved
2 Replies
392 Views
I'm trying to get TensorFlow with Metal support running on my iMac (2017, Radeon 580 Pro) following these instructions. However, simply importing TensorFlow (import tensorflow) results in the following error, with the Python console crashing:

    2022-05-27 11:46:12.419950: F tensorflow/c/experimental/stream_executor/stream_executor.cc:808] Non-OK-status: stream_executor::MultiPlatformManager::RegisterPlatform( std::move(cplatform)) status: INTERNAL: platform is already registered with name: "METAL"
    Abort trap: 6

Versions: macOS 12.3, Python 3.8.13, tensorflow-macos 2.9.0, tensorflow-metal 0.5.0
Post not yet marked as solved
1 Replies
186 Views
I completed everything up to the step of installing the TensorFlow dependencies, which itself raised many errors, but when I try to run python -m pip install tensorflow-metal or python -m pip install tensorflow-macos, I get the following errors:

    ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none)
    ERROR: No matching distribution found for tensorflow-metal
    ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
    ERROR: No matching distribution found for tensorflow-macos

What am I supposed to do now to install TensorFlow?
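"No matching distribution found" for these packages usually means pip is resolving for an interpreter the wheels do not cover, for example an x86_64 (Rosetta) Python, or a Python version outside the supported range. A quick environment check worth running first (treating the expected values as my assumptions about the wheel listings, not an official compatibility statement):

```python
import platform
import sys

# On Apple silicon with a native interpreter this should say 'arm64';
# 'x86_64' usually means the Python binary runs under Rosetta, and pip
# will then look for x86_64 wheels that may not exist for these packages.
print("machine:", platform.machine())
print("python:", sys.version_info[:3])
print("executable:", sys.executable)
```

If the machine string or Python version is off, recreating the env with a native arm64 Miniforge Python is the usual next step.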
Post not yet marked as solved
1 Replies
337 Views
Hi,

OS: macOS 12.4
CPU: Apple M1

I cannot import the new TensorFlow 2.9.0 on Apple M1. I got an error (traceback condensed to the failing frames):

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /Users/martin/Documents/Projects/Solar-Transformer/Testing.ipynb Cell 3' in <cell line: 1>()
    ----> 1 from tensorflow.keras.layers import Add, Dense, Dropout, Layer, LayerNormalization, MultiHeadAttention, Normalization
          2 from tensorflow.keras.models import Model
          3 from tensorflow.keras.initializers import TruncatedNormal
    File ~/miniforge3/lib/python3.9/site-packages/tensorflow/__init__.py:37, in <module>
    ---> 37 from tensorflow.python.tools import module_util as _module_util
    File ~/miniforge3/lib/python3.9/site-packages/tensorflow/python/__init__.py:42, in <module>
    ---> 42 from tensorflow.python import data
    File ~/miniforge3/lib/python3.9/site-packages/tensorflow/python/data/__init__.py:21, in <module>
    ---> 21 from tensorflow.python.data import experimental
    File ~/miniforge3/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py:22, in <module>
    ---> 22 from tensorflow.python.data.util import nest
    File ~/miniforge3/lib/python3.9/site-packages/tensorflow/python/data/util/nest.py:36, in <module>
    ---> 36 from tensorflow.python.framework import sparse_tensor as _sparse_tensor
    File ~/miniforge3/lib/python3.9/site-packages/tensorflow/python/framework/sparse_tensor.py:24, in <module>
    ---> 24 from tensorflow.python.framework import constant_op
    File ~/miniforge3/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:25, in <module>
    ---> 25 from tensorflow.python.eager import execute
    File ~/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:23, in <module>
    ---> 23 from tensorflow.python.framework import dtypes
    File ~/miniforge3/lib/python3.9/site-packages/tensorflow/python/framework/dtypes.py:29, in <module>
         26 from tensorflow.python.lib.core import _pywrap_bfloat16
         27 from tensorflow.python.util.tf_export import tf_export
    ---> 29 _np_bfloat16 = _pywrap_bfloat16.TF_bfloat16_type()

Example code:

    from tensorflow.keras.layers import Add, Dense, Dropout, Layer, LayerNormalization, MultiHeadAttention, Normalization
    from tensorflow.keras.models import Model
    from tensorflow.keras.initializers import TruncatedNormal
    from tensorflow.keras.utils import timeseries_dataset_from_array
    import tensorflow as tf
    import tensorflow_probability as tfp
    import numpy as np
Post not yet marked as solved
2 Replies
332 Views
So I'm trying to install TensorFlow on my M1 MacBook Air using the guidelines here: https://developer.apple.com/metal/tensorflow-plugin/. I'm able to download and install the conda env (at least I believe so), but when I try to install the TensorFlow dependencies, I get the error:

    PackagesNotFoundError: The following packages are not available from current channels:
      - tensorflow-deps

Would really appreciate any help.
Post marked as solved
1 Replies
339 Views
I followed the steps at https://developer.apple.com/metal/tensorflow-plugin/ and installed TensorFlow. But when I tried to import tensorflow in Python in my shell, I got errors. Not installing TF to the base environment, uninstalling numpy, reinstalling numpy — none of these methods worked. Appreciate your patience.

    ImportError                               Traceback (most recent call last)
    File ~/miniforge3/lib/python3.9/site-packages/numpy/core/__init__.py:22, in <module>
         21 try:
    ---> 22     from . import multiarray
         23 except ImportError as exc:
    File ~/miniforge3/lib/python3.9/site-packages/numpy/core/multiarray.py:12, in <module>
         10 import warnings
    ---> 12 from . import overrides
         13 from . import _multiarray_umath
    File ~/miniforge3/lib/python3.9/site-packages/numpy/core/overrides.py:7, in <module>
          5 import textwrap
    ----> 7 from numpy.core._multiarray_umath import (
          8     add_docstring, implement_array_function, _get_implementing_args)
          9 from numpy.compat._inspect import getargspec
    ImportError: dlopen(/Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/libcblas.3.dylib
      Referenced from: /Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so
      Reason: tried: '/Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/myname/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/Users/myname/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/usr/local/lib/libcblas.3.dylib' (no such file), '/usr/lib/libcblas.3.dylib' (no such file)

    During handling of the above exception, another exception occurred:

    ImportError                               Traceback (most recent call last)
    Input In [1], in <cell line: 1>()
    ----> 1 import tensorflow as tf
    File ~/miniforge3/lib/python3.9/site-packages/tensorflow/__init__.py:37, in <module>
    ---> 37 from tensorflow.python.tools import module_util as _module_util
    File ~/miniforge3/lib/python3.9/site-packages/tensorflow/python/__init__.py:37, in <module>
         36 from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
    ---> 37 from tensorflow.python.eager import context
    File ~/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/context.py:26, in <module>
         25 from absl import logging
    ---> 26 import numpy as np
         27 import six
    File ~/miniforge3/lib/python3.9/site-packages/numpy/__init__.py:150, in <module>
        148 from . import _distributor_init
    --> 150 from . import core
        151 from .core import *
    File ~/miniforge3/lib/python3.9/site-packages/numpy/core/__init__.py:48, in <module>
    ---> 48     raise ImportError(msg)
    ImportError:

    IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

    Importing the numpy C-extensions failed. This error can happen for many reasons, often due to issues with your setup or how NumPy was installed.
    We have compiled some common reasons and troubleshooting tips at:
        https://numpy.org/devdocs/user/troubleshooting-importerror.html
    Please note and check the following:
      * The Python version is: Python3.9 from "/Users/myname/miniforge3/bin/python"
      * The NumPy version is: "1.21.6"
    and make sure that they are the versions you expect. Please carefully study the documentation linked above for further help.
    Original error was: dlopen(/Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/libcblas.3.dylib
      Referenced from: /Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so
      Reason: tried: '/Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/myname/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/Users/myname/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/usr/local/lib/libcblas.3.dylib' (no such file), '/usr/lib/libcblas.3.dylib' (no such file)
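The dlopen failure means this NumPy build was linked against a libcblas that the environment never provided; the usual remedy (my suggestion, not from the post) is reinstalling NumPy from the same channel as the rest of the env, e.g. conda install -c conda-forge numpy. To confirm which NumPy a given interpreter actually loads, and which BLAS it was built against, a small diagnostic once import succeeds:

```python
import numpy as np

# Where NumPy was imported from; a path outside the active conda env
# (e.g. ~/.local or a pip install into base) is the usual culprit.
print(np.__version__)
print(np.__file__)

# Which BLAS/LAPACK libraries this build was linked against.
np.show_config()
```

If np.__file__ points outside the env you think is active, remove that copy first; otherwise the dynamic loader will keep picking up the broken build.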
Post not yet marked as solved
2 Replies
648 Views
I have followed all the instructions to install TensorFlow on my M1 Mac from https://developer.apple.com/metal/tensorflow-plugin/. Despite showing a successful installation, there is an error when I am trying to import the tensorflow library.

    ---------------------------------------------------------------------------
    OSError                                   Traceback (most recent call last)
    Input In [1], in <cell line: 1>()
    ----> 1 import tensorflow as tf
          2 tf.__version__
    File ~/.local/lib/python3.9/site-packages/tensorflow/__init__.py:37, in <module>
    ---> 37 from tensorflow.python.tools import module_util as _module_util
    File ~/.local/lib/python3.9/site-packages/tensorflow/python/__init__.py:36, in <module>
    ---> 36 from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
         37 from tensorflow.python.eager import context
    File ~/.local/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py:24, in <module>
         23 # Perform pre-load sanity checks in order to produce a more actionable error.
    ---> 24 self_check.preload_check()
    File ~/.local/lib/python3.9/site-packages/tensorflow/python/platform/self_check.py:65, in preload_check()
         59 # Load a library that performs CPU feature guard checking as a part of its
         60 # static initialization. Doing this here as a preload check makes it more
         61 # likely that we detect any CPU feature incompatibilities before we trigger
         62 # them (which would typically result in SIGILL).
         63 cpu_feature_guard_library = os.path.join(
         64     os.path.dirname(__file__), "../../core/platform/_cpu_feature_guard.so")
    ---> 65 ctypes.CDLL(cpu_feature_guard_library)
    File ~/miniforge3/envs/tfm1/lib/python3.9/ctypes/__init__.py:374, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    --> 374     self._handle = _dlopen(self._name, mode)
    OSError: dlopen(/Users/k_krishna/.local/lib/python3.9/site-packages/tensorflow/python/platform/../../core/platform/_cpu_feature_guard.so, 0x0006): tried: '/Users/k_krishna/.local/lib/python3.9/site-packages/tensorflow/python/platform/../../core/platform/_cpu_feature_guard.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e')), '/Users/k_krishna/.local/lib/python3.9/site-packages/tensorflow/core/platform/_cpu_feature_guard.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))

I have tried Stack Overflow, but none of the solutions worked. Please help me resolve the issue.
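The error says the installed wheel contains x86_64 code while the arm64 interpreter needs arm64 code; note the package lives in ~/.local, outside the conda env, which typically means a pip install once ran under Rosetta, so removing that copy and reinstalling inside the env is the usual fix. Purely as an illustration of the check the loader performs, the architecture of a Mach-O file can be sniffed from its header's magic and CPU-type fields (constants below are from the Mach-O headers; the macho_arch helper name is mine):

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF   # 64-bit Mach-O, read little-endian
CPU_ARM64 = 0x0100000C     # CPU_TYPE_ARM | CPU_ARCH_ABI64
CPU_X86_64 = 0x01000007    # CPU_TYPE_X86 | CPU_ARCH_ABI64

def macho_arch(header: bytes) -> str:
    """Decode the CPU architecture from the first 8 bytes of a Mach-O file."""
    magic, cputype = struct.unpack("<II", header[:8])
    if magic != MH_MAGIC_64:
        return "not a 64-bit little-endian Mach-O"
    return {CPU_ARM64: "arm64", CPU_X86_64: "x86_64"}.get(cputype, hex(cputype))

# e.g. macho_arch(open(path_to_so, "rb").read(8)) on the offending .so
print(macho_arch(struct.pack("<II", MH_MAGIC_64, CPU_X86_64)))  # x86_64
```

On macOS the same answer comes from `file _cpu_feature_guard.so` or `lipo -info`; the point is that no amount of reinstalling into the conda env helps while the stale x86_64 copy in ~/.local shadows it on sys.path.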
Post not yet marked as solved
0 Replies
213 Views
We are developing a simple GAN, and when training it, the convergence behavior of the discriminator is different if we use the GPU than when using only the CPU, or even executing in Colab. We've read a lot, but this is the only post that seems to describe similar behavior. Unfortunately, after updating to the 0.4 version, the problem persists.

My hardware/software: MacBook Pro. Model: MacBookPro18,2. Chip: Apple M1 Max. Cores: 10 (8 performance and 2 efficiency). Memory: 64 GB. Firmware: 7459.101.3. OS: Monterey 12.3.1. OS version: 7459.101.3.

Python version 3.8 and libraries (the most relevant) from !pip freeze:

    keras==2.8.0
    Keras-Preprocessing==1.1.2
    ....
    tensorboard==2.8.0
    tensorboard-data-server==0.6.1
    tensorboard-plugin-wit==1.8.1
    tensorflow-datasets==4.5.2
    tensorflow-docs @ git+https://github.com/tensorflow/docs@7d5ea2e986a4eae7573be3face00b3cccd4b8b8b
    tensorflow-macos==2.8.0
    tensorflow-metadata==1.7.0
    tensorflow-metal==0.4.0

CODE TO REPRODUCE: the code does not fit in the max space in this message, so I've shared a Google Colab notebook at: https://colab.research.google.com/drive/1oDS8EV0eP6kToUYJuxHf5WCZlRL0Ypgn?usp=sharing

You can easily see that the loss goes to 0 after 1 or 2 epochs when the GPU is enabled, but if the GPU is disabled everything is OK.
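One difference worth ruling out between the CPU and Metal paths is arithmetic precision in the loss and epsilon terms — this is an assumption on my part about a possible cause, not a confirmed diagnosis of this bug. The general hazard is easy to demonstrate: small stabilizing constants that are fine in float32 vanish entirely at reduced precision, which can drive a discriminator loss to exact 0.

```python
import numpy as np

eps = 1e-8  # a typical epsilon added inside losses/optimizers for stability

print(np.float32(eps))  # representable in float32
# The smallest float16 subnormal is 2**-24 ~ 6e-8, so 1e-8 rounds to zero:
print(np.float16(eps))  # 0.0
```

If inserting explicit tf.cast(..., tf.float32) around the loss computation (or forcing the discriminator to CPU) changes the behavior, that would point at a precision or kernel issue on the Metal path rather than at the model.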
Posted
by
Post not yet marked as solved
1 Reply
220 Views
class RankingModel(tf.keras.Model):
  def __init__(self):
    super().__init__()
    embedding_dimension = 32

    # Compute embeddings for users.
    self.user_embeddings = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
        vocabulary=unique_user_ids, mask_token=None),
      tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
    ])

    # Compute embeddings for movies.
    self.movie_embeddings = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
        vocabulary=unique_movie_titles, mask_token=None),
      tf.keras.layers.Embedding(len(unique_movie_titles) + 1, embedding_dimension)
    ])

    # Compute predictions.
    self.ratings = tf.keras.Sequential([
      # Learn multiple dense layers.
      tf.keras.layers.Dense(256, activation="relu"),
      tf.keras.layers.Dense(64, activation="relu"),
      # Make rating predictions in the final layer.
      tf.keras.layers.Dense(1)
    ])

  def call(self, inputs):
    user_id, movie_title = inputs
    user_embedding = self.user_embeddings(user_id)
    movie_embedding = self.movie_embeddings(movie_title)
    return self.ratings(tf.concat([user_embedding, movie_embedding], axis=1))

task = tfrs.tasks.Ranking(
  loss=tf.keras.losses.MeanSquaredError(),
  metrics=[tf.keras.metrics.RootMeanSquaredError()]
)

class MovielensModel(tfrs.models.Model):
  def __init__(self):
    super().__init__()
    self.ranking_model: tf.keras.Model = RankingModel()
    self.task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
      loss=tf.keras.losses.MeanSquaredError(),
      metrics=[tf.keras.metrics.RootMeanSquaredError()]
    )

  def call(self, features: Dict[str, tf.Tensor]) -> tf.Tensor:
    return self.ranking_model(
      (features["user_id"], features["movie_title"]))

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
    labels = features.pop("user_rating")
    rating_predictions = self(features)
    # The task computes the loss and the metrics.
    return self.task(labels=labels, predictions=rating_predictions)

model = MovielensModel()
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()
model.fit(cached_train, epochs=3)

InvalidArgumentError Traceback (most recent call last)
Input In [40], in <cell line: 5>()
      1 #physical_devices = tf.config.list_physical_devices('GPU')
      2 #tf.config.set_visible_devices(physical_devices[0], 'GPU')
      3 #with tf.device("GPU"):
      4 #model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
----> 5 model.fit(cached_train, epochs=3)

File /opt/homebrew/Caskroom/miniforge/base/envs/mlp/lib/python3.8/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     65 except Exception as e:  # pylint: disable=broad-except
     66   filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
     68 finally:
     69   del filtered_tb

File /opt/homebrew/Caskroom/miniforge/base/envs/mlp/lib/python3.8/site-packages/tensorflow/python/eager/execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     52 try:
     53   ctx.ensure_initialized()
---> 54 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     55     inputs, attrs, num_outputs)
     56 except core._NotOkStatusException as e:
     57   if name is not None:

InvalidArgumentError: Cannot assign a device for operation movielens_model_1/ranking_model_3/sequential_9/embedding_6/embedding_lookup: Could not satisfy explicit device specification '' because the node {{colocation_node movielens_model_1/ranking_model_3/sequential_9/embedding_6/embedding_lookup}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[])
ResourceSparseApplyAdagradV2: CPU
UnsortedSegmentSum: GPU CPU
StridedSlice: GPU CPU
Const: GPU CPU
Shape: GPU CPU
_Arg: GPU CPU
Unique: GPU CPU
Identity: GPU CPU
ResourceGather: GPU CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  movielens_model_1_ranking_model_3_sequential_9_embedding_6_embedding_lookup_4370 (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adagrad_adagrad_update_resourcesparseapplyadagradv2_accum (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  movielens_model_1/ranking_model_3/sequential_9/embedding_6/embedding_lookup (ResourceGather)
  movielens_model_1/ranking_model_3/sequential_9/embedding_6/embedding_lookup/Identity (Identity)
  Adagrad/Adagrad/update/Unique (Unique)  /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/Shape (Shape)  /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice/stack (Const)  /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice/stack_1 (Const)  /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice/stack_2 (Const)  /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice (StridedSlice)  /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/UnsortedSegmentSum (UnsortedSegmentSum)  /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/ResourceSparseApplyAdagradV2 (ResourceSparseApplyAdagradV2)  /job:localhost/replica:0/task:0/device:GPU:0

[[{{node movielens_model_1/ranking_model_3/sequential_9/embedding_6/embedding_lookup}}]] [Op:__inference_train_function_4593]
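The debug info shows the root of the failure: ResourceSparseApplyAdagradV2 (Adagrad's sparse embedding update) only has a CPU kernel, while the embedding variables it must be colocated with were placed on the Metal GPU. One possible workaround (an assumption based on the error, not a confirmed fix) is to enable soft device placement so TensorFlow may fall back to the CPU for ops without a GPU kernel instead of failing the colocation constraint:

```python
# Allow TensorFlow to place an op on a supported device when its
# requested device has no kernel for it. Must be set before the model
# and optimizer are built. Guarded import so the snippet loads even
# where TensorFlow is not installed.
try:
    import tensorflow as tf
    tf.config.set_soft_device_placement(True)
    soft = tf.config.get_soft_device_placement()
except ImportError:
    soft = None  # TensorFlow not available in this environment
```

Alternatives worth trying under the same reasoning: a dense-update optimizer such as SGD, or pinning the optimizer step to the CPU with `with tf.device('/CPU:0'):`.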
Posted
by
Post not yet marked as solved
0 Replies
148 Views
I installed the tensorflow-deps package on my MacBook Pro and noticed that it installed Python 3.10. Unfortunately, there is no tensorflow-macos wheel for Python 3.10 on PyPI, so I had to downgrade to Python 3.9 before I could install tensorflow-macos and tensorflow-metal.
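This kind of mismatch can be caught before installing by checking the interpreter version against the wheel range published on PyPI. A minimal sketch; the 3.8–3.9 bounds below match the situation described above but are an assumption that should be verified against PyPI for the tensorflow-macos release you want (later releases added newer Python versions):

```python
import sys

# Assumed supported CPython range for the tensorflow-macos release in
# question; check PyPI before relying on these bounds.
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 9)

def wheel_available(version_info=sys.version_info):
    """Return True if a tensorflow-macos wheel is assumed to exist for
    this interpreter's major.minor version."""
    return SUPPORTED_MIN <= tuple(version_info[:2]) <= SUPPORTED_MAX

print(wheel_available((3, 9, 12)))   # True: a 3.9 wheel exists
print(wheel_available((3, 10, 4)))   # False: no 3.10 wheel at the time
```

Running a check like this at the top of a setup script fails fast with a clear message instead of a confusing "no matching distribution" error from pip.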
Posted
by