tensorflow-metal


TensorFlow accelerates machine learning model training with Metal on Mac GPUs.

tensorflow-metal Documentation

Posts under tensorflow-metal tag

220 results found
Post not yet marked as solved
6.6k Views

Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.

Device: MacBook Pro 16 M1 Max, 64 GB, running macOS 12.0.1. I tried setting up GPU-accelerated TensorFlow on my Mac using the following steps:

Setup: Xcode CLI tools / Homebrew / Miniforge
Conda env: Python 3.9.5

conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal
brew install libjpeg
conda install -y matplotlib jupyterlab

In JupyterLab, I try to execute this code:

from tensorflow.keras import layers
from tensorflow.keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.summary()

The code executes, but I get this warning, which seems to indicate that no GPU acceleration can be used because the device defaults to a 0 MB GPU:

Metal device set to: Apple M1 Max
2021-10-27 08:23:32.872480: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-27 08:23:32.872707: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

Does anyone have any idea how to fix this? I came across several posts here about the same issue, but none with a solid fix. I created a new question because I found the other questions less descriptive of the issue and wanted to depict it comprehensively. Any fix would be of much help.
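A minimal sanity check (a sketch, assuming the environment above): the "Could not identify NUMA node" and "0 MB memory" lines are info-level log messages from the PluggableDevice path and do not necessarily mean the GPU is unusable, so it is worth confirming whether the Metal device actually executes work:

import tensorflow as tf

# The Metal GPU should show up here if the plugin loaded correctly.
print(tf.config.list_physical_devices('GPU'))

# Run a small op pinned to the GPU to confirm it actually executes there.
with tf.device('/GPU:0'):
    a = tf.random.normal((1024, 1024))
    b = tf.random.normal((1024, 1024))
    print(tf.reduce_sum(tf.matmul(a, b)).numpy())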
Asked Last updated
.
Post marked as solved
100 Views

Tensorflow on MacBook Pro M1 doesn't work

I followed the steps at https://developer.apple.com/metal/tensorflow-plugin/ and installed TensorFlow. But when I try to import tensorflow in Python from my shell, I get errors. Not installing TF into the base environment, uninstalling numpy, reinstalling numpy — none of these methods worked. I appreciate your patience.

ImportError                               Traceback (most recent call last)
File ~/miniforge3/lib/python3.9/site-packages/numpy/core/__init__.py:22, in
     21 try:
---> 22     from . import multiarray
     23 except ImportError as exc:

File ~/miniforge3/lib/python3.9/site-packages/numpy/core/multiarray.py:12, in
     10 import warnings
---> 12 from . import overrides
     13 from . import _multiarray_umath

File ~/miniforge3/lib/python3.9/site-packages/numpy/core/overrides.py:7, in
      5 import textwrap
----> 7 from numpy.core._multiarray_umath import (
      8     add_docstring, implement_array_function, _get_implementing_args)
      9 from numpy.compat._inspect import getargspec

ImportError: dlopen(/Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/libcblas.3.dylib
  Referenced from: /Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so
  Reason: tried: '/Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/myname/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/Users/myname/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/usr/local/lib/libcblas.3.dylib' (no such file), '/usr/lib/libcblas.3.dylib' (no such file)

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
Input In [1], in <cell line: 1>()
----> 1 import tensorflow as tf

File ~/miniforge3/lib/python3.9/site-packages/tensorflow/__init__.py:37, in
     34 import sys as _sys
     35 import typing as _typing
---> 37 from tensorflow.python.tools import module_util as _module_util
     38 from tensorflow.python.util.lazy_loader import LazyLoader as _LazyLoader
     40 # Make sure code inside the TensorFlow codebase can use tf2.enabled() at import.

File ~/miniforge3/lib/python3.9/site-packages/tensorflow/python/__init__.py:37, in
     29 # We aim to keep this file minimal and ideally remove completely.
     30 # If you are adding a new file with @tf_export decorators,
     31 # import it in modules_with_exports.py instead.
     33 # go/tf-wildcard-import
     34 # pylint: disable=wildcard-import,g-bad-import-order,g-import-not-at-top
     36 from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
---> 37 from tensorflow.python.eager import context
     39 # pylint: enable=wildcard-import
     41 # Bring in subpackages.
     42 from tensorflow.python import data

File ~/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/context.py:26, in
     23 import threading
     25 from absl import logging
---> 26 import numpy as np
     27 import six
     29 from tensorflow.core.framework import function_pb2

File ~/miniforge3/lib/python3.9/site-packages/numpy/__init__.py:150, in
    147 # Allow distributors to run custom init code
    148 from . import _distributor_init
--> 150 from . import core
    151 from .core import *
    152 from . import compat

File ~/miniforge3/lib/python3.9/site-packages/numpy/core/__init__.py:48, in
     24 import sys
     25 msg = """
     27 IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
    (...)
     46 """ % (sys.version_info[0], sys.version_info[1], sys.executable,
     47        __version__, exc)
---> 48 raise ImportError(msg)
     49 finally:
     50     for envkey in env_added:

ImportError:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for many reasons, often due to issues with your setup or how NumPy was installed. We have compiled some common reasons and troubleshooting tips at: https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:
The Python version is: Python 3.9 from "/Users/myname/miniforge3/bin/python"
The NumPy version is: "1.21.6"
and make sure that they are the versions you expect. Please carefully study the documentation linked above for further help.

Original error was: dlopen(/Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/libcblas.3.dylib
  Referenced from: /Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so
  Reason: tried: '/Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/myname/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/myname/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/Users/myname/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/usr/local/lib/libcblas.3.dylib' (no such file), '/usr/lib/libcblas.3.dylib' (no such file)
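The dlopen failure says the NumPy build in this environment expects a BLAS library (libcblas.3.dylib) that is not present on any of the searched paths. A small diagnostic sketch (treating sys.prefix as the place where @rpath normally resolves for conda packages is an assumption on my part, not something stated in the post):

import os, sys, glob

print(sys.executable)                      # confirm which Python / conda env is actually running
env_lib = os.path.join(sys.prefix, "lib")  # typical location of conda-provided dylibs
print(glob.glob(os.path.join(env_lib, "libcblas*.dylib")))
print(glob.glob(os.path.join(env_lib, "liblapack*.dylib")))

If the globs come back empty, the BLAS/LAPACK libraries this NumPy build was linked against are simply missing from the environment.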
Asked
by jbingl.
Last updated
.
Post not yet marked as solved
142 Views

Unable to install TensorFlow on my M1 MacBook Pro.

I have followed all the instructions to install TensorFlow for my M1 Mac from https://developer.apple.com/metal/tensorflow-plugin/. Despite the installation reporting success, there is an error when I try to import the tensorflow library.

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Input In [1], in <cell line: 1>()
----> 1 import tensorflow as tf
      2 tf.__version__

File ~/.local/lib/python3.9/site-packages/tensorflow/__init__.py:37, in <module>
     34 import sys as _sys
     35 import typing as _typing
---> 37 from tensorflow.python.tools import module_util as _module_util
     38 from tensorflow.python.util.lazy_loader import LazyLoader as _LazyLoader
     40 # Make sure code inside the TensorFlow codebase can use tf2.enabled() at import.

File ~/.local/lib/python3.9/site-packages/tensorflow/python/__init__.py:36, in <module>
     27 import traceback
     29 # We aim to keep this file minimal and ideally remove completely.
     30 # If you are adding a new file with @tf_export decorators,
     31 # import it in modules_with_exports.py instead.
     33 # go/tf-wildcard-import
     34 # pylint: disable=wildcard-import,g-bad-import-order,g-import-not-at-top
---> 36 from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
     37 from tensorflow.python.eager import context
     39 # pylint: enable=wildcard-import
     41 # Bring in subpackages.

File ~/.local/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py:24, in <module>
     21 from tensorflow.python.platform import self_check
     23 # Perform pre-load sanity checks in order to produce a more actionable error.
---> 24 self_check.preload_check()
     26 # pylint: disable=wildcard-import,g-import-not-at-top,unused-import,line-too-long
     28 try:
     29   # This import is expected to fail if there is an explicit shared object
     30   # dependency (with_framework_lib=true), since we do not need RTLD_GLOBAL.

File ~/.local/lib/python3.9/site-packages/tensorflow/python/platform/self_check.py:65, in preload_check()
     58 else:
     59   # Load a library that performs CPU feature guard checking as a part of its
     60   # static initialization. Doing this here as a preload check makes it more
     61   # likely that we detect any CPU feature incompatibilities before we trigger
     62   # them (which would typically result in SIGILL).
     63   cpu_feature_guard_library = os.path.join(
     64       os.path.dirname(__file__), "../../core/platform/_cpu_feature_guard.so")
---> 65   ctypes.CDLL(cpu_feature_guard_library)

File ~/miniforge3/envs/tfm1/lib/python3.9/ctypes/__init__.py:374, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    371 self._FuncPtr = _FuncPtr
    373 if handle is None:
--> 374     self._handle = _dlopen(self._name, mode)
    375 else:
    376     self._handle = handle

OSError: dlopen(/Users/k_krishna/.local/lib/python3.9/site-packages/tensorflow/python/platform/../../core/platform/_cpu_feature_guard.so, 0x0006): tried: '/Users/k_krishna/.local/lib/python3.9/site-packages/tensorflow/python/platform/../../core/platform/_cpu_feature_guard.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e')), '/Users/k_krishna/.local/lib/python3.9/site-packages/tensorflow/core/platform/_cpu_feature_guard.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))

I have tried Stack Overflow, but none of the solutions worked. Please help me resolve the issue.
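The "incompatible architecture (have 'x86_64', need 'arm64e')" message suggests an Intel build of TensorFlow installed under ~/.local is being picked up instead of the arm64 one in the conda environment (note the traceback mixes ~/.local/... and ~/miniforge3/envs/tfm1/... paths). A quick check, as a sketch only:

import platform, site, sys

print(platform.machine())          # expect 'arm64' on Apple silicon ('x86_64' means a Rosetta Python)
print(sys.executable)              # the interpreter actually in use
print(site.getusersitepackages())  # packages here (~/.local/...) can shadow the conda environment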
Asked
by K_Krishna.
Last updated
.
Post not yet marked as solved
149 Views

I can't install TensorFlow on my Mac M1

Hey, I can't install TensorFlow on my M1 Mac; some screenshots are attached.
Asked Last updated
.
Post not yet marked as solved
604 Views

Kernel Died Importing Keras on Mac M1

Hi, I am trying to import Keras as follows on my M1 Mac running macOS 11.6, but the kernel dies during the process.

from keras.models import Sequential
from keras.layers import Dense

The error message in the terminal:

[I 15:44:39.003 NotebookApp] Kernel started: e0bec9a2-cde8-42ed-be9d-747ea2841818, name: python3
2022-01-27 15:44:43.203092: F tensorflow/c/c_api_experimental.cc:739] Non-OK-status: tensorflow::RegisterPluggableDevicePlugin(lib_handle->lib_handle) status: FAILED_PRECONDITION: 'host_callback' field in SP_StreamExecutor must be set.

Not sure if it's related, but I have read the documentation at https://developer.apple.com/metal/tensorflow-plugin/ and the OS requirement is macOS 12.0+. Is there any way to solve this issue on my current OS?
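Since the documentation cited above lists macOS 12.0+ as the requirement, a first step (just a sketch, not an official compatibility matrix) is to confirm the macOS version and the exact tensorflow-macos / tensorflow-metal versions in the environment; on macOS 11 an older pairing of those packages may be required:

import platform
from importlib.metadata import version

print(platform.mac_ver()[0])        # e.g. '11.6'
print(version("tensorflow-macos"))
print(version("tensorflow-metal"))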
Asked
by Zen26.
Last updated
.
Post not yet marked as solved
50 Views

Tensorflow error

Hi, would you help me install TensorFlow on my M1?
Asked Last updated
.
Post not yet marked as solved
66 Views

Tensorflow M1 Max Metal 0.4 convergence problems

We are developing a simple GAN, and when training it, the convergence behavior of the discriminator is different when we use the GPU than when using only the CPU or even executing in Colab. We've read a lot, but this is the only post that seems to describe similar behavior. Unfortunately, after updating to version 0.4 the problem persists.

My hardware/software: MacBook Pro, model MacBookPro18,2. Chip: Apple M1 Max. Cores: 10 (8 performance and 2 efficiency). Memory: 64 GB. Firmware: 7459.101.3. OS: Monterey 12.3.1. OS version: 7459.101.3. Python 3.8 and libraries (the most relevant ones from pip freeze):

keras==2.8.0
Keras-Preprocessing==1.1.2
....
tensorboard==2.8.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow-datasets==4.5.2
tensorflow-docs @ git+https://github.com/tensorflow/docs@7d5ea2e986a4eae7573be3face00b3cccd4b8b8b
tensorflow-macos==2.8.0
tensorflow-metadata==1.7.0
tensorflow-metal==0.4.0

CODE TO REPRODUCE: the code does not fit within this message's size limit, so I've shared a Google Colab notebook at https://colab.research.google.com/drive/1oDS8EV0eP6kToUYJuxHf5WCZlRL0Ypgn?usp=sharing. You can easily see that the loss goes to 0 after 1 or 2 epochs when the GPU is enabled, but if the GPU is disabled everything is OK.
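For A/B comparisons like this, one simple sketch is to hide the Metal GPU at the very top of the script so the identical code trains CPU-only, which isolates whether the divergence follows the device rather than the model or the data:

import tensorflow as tf

# Hide the GPU before any ops are created; the rest of the script stays unchanged.
tf.config.set_visible_devices([], 'GPU')
print(tf.config.list_logical_devices())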
Asked Last updated
.
Post not yet marked as solved
776 Views

TensorFlow model predictions are incorrect on M1 GPU

I have a TensorFlow 2.x object detection model (SSD ResNet50 v1) that was trained on an Ubuntu 20.04 box with a GPU. The model's predictions perform as expected on Linux CPU & GPU, Windows 10 CPU & GPU, the Intel MacBook Air CPU, and the M1 MacBook Air CPU. However, when I install the tensorflow-metal plugin on the M1, I can see the GPU is being used, but the predictions are garbage. I followed these install instructions: https://developer.apple.com/metal/tensorflow-plugin/, which gives me tensorflow-macos 2.6.0, tensorflow-metal 0.2.0, and Python 3.9.5. Does anyone have insight into what the problem may be? The M1 Air is running the public release of Monterey.
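One way to narrow this down (a rough sketch; detect_fn, the input shape, and the output key are placeholder assumptions, not details from the post) is to push the same input through the model pinned to the CPU and to the GPU and compare a numeric output directly:

import numpy as np
import tensorflow as tf

x = tf.constant(np.random.rand(1, 640, 640, 3).astype(np.float32))  # placeholder input

with tf.device('/CPU:0'):
    cpu_out = detect_fn(x)   # detect_fn: the loaded SavedModel signature (assumed name)
with tf.device('/GPU:0'):
    gpu_out = detect_fn(x)

# Compare one representative tensor from each output dict.
print(np.max(np.abs(cpu_out['detection_scores'] - gpu_out['detection_scores'])))

A large discrepancy here would point at specific GPU kernels rather than at the model export or the input pipeline.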
Asked
by AdkPete.
Last updated
.
Post not yet marked as solved
469 Views

Error: command buffer exited with error status.

Experimenting with the TensorFlow text_classification example from https://www.tensorflow.org/tutorials/keras/text_classification, I constantly get the following error when increasing the batch size to 512:

Epoch 2/10
5/40 [==>...........................] - ETA: 5s - loss: 0.6887 - binary_accuracy: 0.7086
Error: command buffer exited with error status.
  The Metal Performance Shaders operations encoded on it may not have completed.
  Error: (null) Internal Error (0000000e:Internal Error)
  <AGXG13XFamilyCommandBuffer: 0x2e1897c10> label = <none> device = <AGXG13XDevice: 0x119460c00> name = Apple M1 Max
  commandQueue = <AGXG13XFamilyCommandQueue: 0x11946e400> label = <none> device = <AGXG13XDevice: 0x119460c00> name = Apple M1 Max
  retainedReferences = 1

With other experiments (which work on other GPUs/systems) I get the same error. How should it be interpreted? Are there workarounds?

Setup: TensorFlow 2.6.0 (installed as described here), Apple M1 Max, 64 GB, Monterey 12.0.1
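As a workaround sketch (not a confirmed fix), keeping the batch size below the value that triggers the error is the most direct mitigation; assuming the raw_train_ds dataset from the tutorial, it can be rebatched smaller without changing anything else:

import tensorflow as tf

# 512 triggered the command-buffer error above; try a smaller batch instead.
train_ds = raw_train_ds.unbatch().batch(128).cache().prefetch(tf.data.AUTOTUNE)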
Asked
by joergw.
Last updated
.
Post not yet marked as solved
81 Views

InvalidArgumentError: Cannot assign a device for operation

class RankingModel(tf.keras.Model):
  def __init__(self):
    super().__init__()
    embedding_dimension = 32
    # Compute embeddings for users.
    self.user_embeddings = tf.keras.Sequential([
        tf.keras.layers.StringLookup(vocabulary=unique_user_ids, mask_token=None),
        tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
    ])
    # Compute embeddings for movies.
    self.movie_embeddings = tf.keras.Sequential([
        tf.keras.layers.StringLookup(vocabulary=unique_movie_titles, mask_token=None),
        tf.keras.layers.Embedding(len(unique_movie_titles) + 1, embedding_dimension)
    ])
    # Compute predictions.
    self.ratings = tf.keras.Sequential([
        # Learn multiple dense layers.
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        # Make rating predictions in the final layer.
        tf.keras.layers.Dense(1)
    ])

  def call(self, inputs):
    user_id, movie_title = inputs
    user_embedding = self.user_embeddings(user_id)
    movie_embedding = self.movie_embeddings(movie_title)
    return self.ratings(tf.concat([user_embedding, movie_embedding], axis=1))

task = tfrs.tasks.Ranking(
    loss=tf.keras.losses.MeanSquaredError(),
    metrics=[tf.keras.metrics.RootMeanSquaredError()]
)

class MovielensModel(tfrs.models.Model):
  def __init__(self):
    super().__init__()
    self.ranking_model: tf.keras.Model = RankingModel()
    self.task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
        loss=tf.keras.losses.MeanSquaredError(),
        metrics=[tf.keras.metrics.RootMeanSquaredError()]
    )

  def call(self, features: Dict[str, tf.Tensor]) -> tf.Tensor:
    return self.ranking_model((features["user_id"], features["movie_title"]))

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
    labels = features.pop("user_rating")
    rating_predictions = self(features)
    # The task computes the loss and the metrics.
    return self.task(labels=labels, predictions=rating_predictions)

model = MovielensModel()
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()
model.fit(cached_train, epochs=3)

InvalidArgumentError                      Traceback (most recent call last)
Input In [40], in <cell line: 5>()
      1 #physical_devices = tf.config.list_physical_devices('GPU')
      2 #tf.config.set_visible_devices(physical_devices[0], 'GPU')
      3 #with tf.device("GPU"):
      4 #model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
----> 5 model.fit(cached_train, epochs=3)

File /opt/homebrew/Caskroom/miniforge/base/envs/mlp/lib/python3.8/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback..error_handler(*args, **kwargs)
     65 except Exception as e:  # pylint: disable=broad-except
     66   filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67   raise e.with_traceback(filtered_tb) from None
     68 finally:
     69   del filtered_tb

File /opt/homebrew/Caskroom/miniforge/base/envs/mlp/lib/python3.8/site-packages/tensorflow/python/eager/execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     52 try:
     53   ctx.ensure_initialized()
---> 54   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     55                                       inputs, attrs, num_outputs)
     56 except core._NotOkStatusException as e:
     57   if name is not None:

InvalidArgumentError: Cannot assign a device for operation movielens_model_1/ranking_model_3/sequential_9/embedding_6/embedding_lookup: Could not satisfy explicit device specification '' because the node {{colocation_node movielens_model_1/ranking_model_3/sequential_9/embedding_6/embedding_lookup}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[])
ResourceSparseApplyAdagradV2: CPU
UnsortedSegmentSum: GPU CPU
StridedSlice: GPU CPU
Const: GPU CPU
Shape: GPU CPU
_Arg: GPU CPU
Unique: GPU CPU
Identity: GPU CPU
ResourceGather: GPU CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  movielens_model_1_ranking_model_3_sequential_9_embedding_6_embedding_lookup_4370 (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adagrad_adagrad_update_resourcesparseapplyadagradv2_accum (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  movielens_model_1/ranking_model_3/sequential_9/embedding_6/embedding_lookup (ResourceGather)
  movielens_model_1/ranking_model_3/sequential_9/embedding_6/embedding_lookup/Identity (Identity)
  Adagrad/Adagrad/update/Unique (Unique) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/Shape (Shape) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice/stack (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice/stack_1 (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice/stack_2 (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice (StridedSlice) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/UnsortedSegmentSum (UnsortedSegmentSum) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/ResourceSparseApplyAdagradV2 (ResourceSparseApplyAdagradV2) /job:localhost/replica:0/task:0/device:GPU:0

[[{{node movielens_model_1/ranking_model_3/sequential_9/embedding_6/embedding_lookup}}]] [Op:__inference_train_function_4593]
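The colocation failure arises because ResourceSparseApplyAdagradV2 (the sparse Adagrad update for the embedding tables) only supports CPU here, while the rest of its group was placed on the Metal GPU. Two hedged workarounds to sketch, neither confirmed by the post and both dependent on the plugin version:

import tensorflow as tf

# Option 1: allow ops without a GPU kernel to fall back to the CPU
# (this may or may not satisfy the colocation constraint above).
tf.config.set_soft_device_placement(True)

# Option 2: try an optimizer other than Adagrad, whose sparse update
# might be placeable on the Metal GPU (a change to the original recipe).
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1))
model.fit(cached_train, epochs=3)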
Asked
by Vishal89.
Last updated
.
Post not yet marked as solved
715 Views

Tensorflow on M1 Mac - Symbol not found: _OBJC_CLASS_$_MPSGraphCompilationDescriptor

Hello. I'm trying to install TensorFlow on my M1 Mac. Here is how I install it:

chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
source ~/miniforge3/bin/activate
conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal

When I run:

python
import tensorflow as tf

I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/floriane/miniforge3/lib/python3.9/site-packages/tensorflow/__init__.py", line 449, in <module>
    _ll.load_library(_plugin_dir)
  File "/Users/floriane/miniforge3/lib/python3.9/site-packages/tensorflow/python/framework/load_library.py", line 155, in load_library
    py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/floriane/miniforge3/lib/python3.9/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 6): Symbol not found: _OBJC_CLASS_$_MPSGraphCompilationDescriptor
  Referenced from: /Users/floriane/miniforge3/lib/python3.9/site-packages/tensorflow-plugins/libmetal_plugin.dylib (which was built for Mac OS X 12.0)
  Expected in: /System/Library/Frameworks/MetalPerformanceShadersGraph.framework/Versions/A/MetalPerformanceShadersGraph

I tried to find related issues on the forum but didn't find any. Can somebody help me? Thank you.
Asked
by FloEss.
Last updated
.
Post not yet marked as solved
449 Views

M1 Max GPU fails to converge in more complex models

We've run into an issue where a more complex model fails to converge on the M1 Max GPU, while it converges on its CPU and on non-M1 machines. Performance is the same on CPU and GPU for models with a single RNN, but once we use two RNNs the GPU fails to converge. That said, the example below is based on nonsensical data for the model architecture used, but we can observe the same behavior here as in our production models (which, for obvious reasons, we cannot share). Mainly:

the loss goes down to the bottom of the e-06 precision range in all cases except when we use two RNNs on GPU.
during training we often reach the e-07 precision level; for the double-RNN-with-GPU condition the results do not go that low, sometimes only reaching the e-05 level.
for our production data, double RNN with GPU results in a loss of 1.0 that basically stays the same from the first epoch; for the other conditions it often reaches the 0.2 level with a clear learning curve.
in the production model, increasing LSTM_Cells made the divergence more visible (with this synthetic data it does not happen).
the more complex the model is (after the RNN layers), the more visible the issue.

Suspected issues:

different precision used in CPU and GPU training — we had to decrease the data values a lot to make the effect visible (if you work with the raw data, all approaches seem to produce comparable results).
somehow the vanishing-gradient problem is more pronounced on GPU, as indicated by worse performance as the complexity of the model increases.

Please let me know if you need any further details.

Software stack: macOS 12.1, tf 2.7, metal 0.3; also tested on tf 2.8.

Sample syntax:

# TEST CONDITIONS (conditions with issue: gpu=1, model_size=2)
gpu = 1          # 0 CPU, 1 GPU
model_size = 2   # 1 single RNN, 2 double RNN

# PARAMETERS
LSTM_Cells = 64
epochs = 300
batch = 128

import numpy as np
import pandas as pd
import sys
from sklearn import preprocessing

if 'tensorflow' in sys.modules:
    print("tensorflow uploaded")
    del sys.modules["tensorflow"]
    import tensorflow as tf
else:
    print("tensorflow not uploaded")
    import tensorflow as tf

if gpu == 1:
    pass
else:
    tf.config.set_visible_devices([], 'GPU')

print("GPUs:", tf.config.list_logical_devices('GPU'))
print("CPUs:", tf.config.list_logical_devices('CPU'))

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Displacement', 'Horsepower', 'Weight']
dataset = pd.read_csv(url, names=column_names, na_values='?', comment='\t',
                      sep=' ', skipinitialspace=True).dropna()

scaler = preprocessing.StandardScaler().fit(dataset)
X_scaled = scaler.transform(dataset)
X_scaled = X_scaled * 0.001

# Large values
#x_train = np.array(dataset[['Horsepower', 'Weight']]).reshape(-1,2,2)
#y_train = np.array(dataset[['MPG','Displacement']]).reshape(-1,2,2)

# Small values
x_train = np.array(X_scaled[:,2:]).reshape(-1,2,2)
y_train = np.array(X_scaled[:,:2]).reshape(-1,2,2)

print(x_train.shape)
print(y_train.shape)
# print(weight.shape)  # 'weight' is not defined in this snippet

train_data = tf.data.Dataset.from_tensor_slices((x_train[:,:,:8], y_train)).cache().shuffle(x_train.shape[0]).batch(batch).repeat().prefetch(tf.data.experimental.AUTOTUNE)

if model_size == 2:
    # MINIMAL, NOT WORKING (double RNN)
    encoder_inputs = tf.keras.Input(shape=(x_train.shape[1], x_train.shape[2]))
    encoder_l1 = tf.keras.layers.LSTM(LSTM_Cells, return_sequences=True, return_state=True)
    encoder_l1_outputs = encoder_l1(encoder_inputs)
    encoder_l2 = tf.keras.layers.LSTM(LSTM_Cells, return_state=True)
    encoder_l2_outputs = encoder_l2(encoder_l1_outputs[0])
    dense_1 = tf.keras.layers.Dense(128, activation='relu')(encoder_l2_outputs[0])
    dense_2 = tf.keras.layers.Dense(64, activation='relu')(dense_1)
    dense_3 = tf.keras.layers.Dense(32, activation='relu')(dense_2)
    dense_4 = tf.keras.layers.Dense(16, activation='relu')(dense_3)
    flat = tf.keras.layers.Flatten()(dense_2)
    dense_5 = tf.keras.layers.Dense(22)(flat)
    reshape_output = tf.keras.layers.Reshape([2,2])(dense_5)
    model = tf.keras.models.Model(encoder_inputs, reshape_output)
else:
    # WORKING (single RNN)
    encoder_inputs = tf.keras.Input(shape=(x_train.shape[1], x_train.shape[2]))
    encoder_l1 = tf.keras.layers.LSTM(LSTM_Cells, return_sequences=True, return_state=True)
    encoder_l1_outputs = encoder_l1(encoder_inputs)
    dense_1 = tf.keras.layers.Dense(128, activation='relu')(encoder_l1_outputs[0])
    dense_2 = tf.keras.layers.Dense(64, activation='relu')(dense_1)
    dense_3 = tf.keras.layers.Dense(32, activation='relu')(dense_2)
    dense_4 = tf.keras.layers.Dense(16, activation='relu')(dense_3)
    flat = tf.keras.layers.Flatten()(dense_2)
    dense_5 = tf.keras.layers.Dense(22)(flat)
    reshape_output = tf.keras.layers.Reshape([2,2])(dense_5)
    model = tf.keras.models.Model(encoder_inputs, reshape_output)

print(model.summary())
loss_tf = tf.keras.losses.MeanSquaredError()
model.compile(optimizer='adam', loss=loss_tf, run_eagerly=True)
model.fit(train_data, epochs=epochs, steps_per_epoch=3)
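To reduce run-to-run noise when comparing the CPU and GPU conditions, one small addition (an assumption on my side, not part of the original experiment) is to fix all random seeds before building the model, so that any remaining difference between conditions comes from the device rather than initialization or shuffling:

import tensorflow as tf

# Available in recent TF releases (2.7+); seeds Python, NumPy, and TensorFlow at once.
tf.keras.utils.set_random_seed(1234)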
Asked
by sebtac.
Last updated
.
Post marked as solved
16k Views

I can't install TensorFlow-macos and TensorFlow-metal

Dear all, I am a data scientist and have been waiting for GPU acceleration for years, so I was thrilled when Apple announced it would come with macOS 12. I updated my OS to the Monterey beta and tried to install tensorflow-metal a few days ago. However, none of the installation commands worked at all. After that, I looked on pypi.org and found that there are wheel files for tensorflow-macos and tensorflow-metal, so I tried to pip install both wheels. Yet nothing worked again. Here is the screenshot of the installation. I would very much appreciate it if you could help me solve this issue. Sincerely,
Asked
by hawkiyc.
Last updated
.
Post not yet marked as solved
1.7k Views

Sklearn is unstable on Apple Silicon

Hi, I installed sklearn successfully and ran the MNIST toy example successfully. Then I started running my own project. The funny thing is that everything seems fine at first (at least no ImportError occurs), but when I make some changes to my code and try to run all cells again (I use JupyterLab), an ImportError occurs:

ImportError: dlopen(/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/liblapack.3.dylib
  Referenced from: /Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so
  Reason: tried: '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/usr/local/lib/liblapack.3.dylib' (no such file), '/usr/lib/liblapack.3.dylib' (no such file)

Then I have to uninstall scipy, sklearn, etc. and reinstall all of them, and my code runs again... magically, I hate to say. Does anyone know how to permanently solve this problem and make sklearn more stable?
Asked Last updated
.