Machine Learning

bus error

When I use tensorflow to write maskrcnn, bus error will be reported. It is not a matter of memory, but a problem of the system itself. I hope it can be solved

Posted

by

xulinqing.

Last updated

.

Training custom data set model using mask_rcnn_inception from tensorflow model zoo

Running into GPU related error while working with latest tensorflow ( 2.13 ) . Please note the test model training provided on tensorflow-metal page to verify my setup works fine. PLEASE ADVISE - tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_18_device_/job:localhost/replica:0/task:0/device:GPU:0}} indices[0] = 0 is not in [0, 0) [[{{node GatherV2_7}}]] [[MultiDeviceIteratorGetNextFromShard]] [[RemoteCall]] [Op:IteratorGetNext] name: The above are the last lines of the error message. below is the full log from the model training script https://stackoverflow.com/questions/77076602/training-custom-data-set-model-using-mask-rcnn-inception-from-tensorflow-model-z I went to SO since I cant share the full log here due to length restrictions. Please help.

Posted

by

spookydumb.

Last updated

.

How to monitor Neural Engine usage on M1 macs?

I'm now running Tensorflow models on my Macbook Air 2020 M1, but I can't find a way to monitor the Neural Engine 16 cores usage to fine tune my ML tasks. The Activity Monitor only reports CPU% and GPU% and I can't find any APIs available on Mach include files in the MacOSX 11.1 sdk or documentation available so I can slap something together from scratch in C. Could anyone point me in some direction as to get a hold of the API for Neural Engine usage. Any indicator I could grab would be a start. It looks like this has been omitted from all sdk documentation and general userland, I've only found a ledger_tag_neural_footprint attribute, which looks memory related, and that's it.

Posted

by

rgolive.

Last updated

.

ANE-Optimized Layer Norm Fails on ANE

In the ml-ane-transformers repo, there is a custom LayerNorm implementation for the Neural Engine-optimized shape of (B,C,1,S). The coremltools documentation makes it sound like the layer_norm MIL op would support this natively. In fact, the following code works on CPU: B,C,S = 1,768,512 g,b = 1, 0 @mb.program(input_specs=[mb.TensorSpec(shape=(B,C,1,S)),]) def ln_prog(x): gamma = (torch.ones((C,), dtype=torch.float32) * g).tolist() beta = (torch.ones((C), dtype=torch.float32) * b).tolist() return mb.layer_norm(x=x, axes=[1], gamma=gamma, beta=beta, name="y") However it fails when run on the Neural Engine, giving results that are scaled by an incorrect value. Should this work on the Neural Engine?

Posted

by

smpanaro.

Last updated

.

AdamW crashes on tf-macos (tf-nightly-macos 2.14.0-dev20230523)

I initially raised this issue in the tensorflow forum, and they directed me back here since this is a tf-macos specific problem [see https://github.com/tensorflow/tensorflow/issues/60673]. When calling Model.compile() with the AdamW optimizer, a warning is thrown saying that v2.11+ optimizers have a known slowdown on M1/M2 devices, and so the backend attempts to fallback to a legacy version. However, no legacy version of the AdamW optimizer exists. In a previous tf-macos version 2.12, this lead to an error during Model.compile() [see issue https://github.com//issues/60652 and https://developer.apple.com/forums/thread/729732]. In the current nightly, this error is not thrown - however, after calling model.compile(), the attribute model.optimizer is set to string 'adamw' instead of an optimizer object. Later, when we call model.fit(), this leads to an AttributeError, because model.optimizer.minimize() does not exist when model.optimizer is a string. Expected behaviour: correctly compile the model with either a v2.11+ optimiser without slowdown, or a legacy-compatible implementation of the AdamW optimizer. Then the model will train correctly with a valid AdamW optimizer when calling model.fit(). Note: a warning message suggests using the optimizer located at tf.keras.optimizers.legacy.AdamW, but this does not exist It would be nice to be able to either use modern optimizers, or have a legacy-compatible version of AdamW, since weight-decay is an important tool in modern ML research, and currently cannot be used on mac. Standalone code to reproduce the issue ##===========## ## Imports ## ##===========## import sys import tensorflow as tf import numpy as np from tensorflow.keras.models import Model from tensorflow.keras.layers import Input, Dense from tensorflow.keras.optimizers import AdamW ##===================## ## Report versions ## ##===================## # # Expected outputs: # Python version is: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:01:19) [Clang 14.0.6 ] # TF version is: 2.14.0-dev20230523 # Numpy version is: 1.23.2 # print(f"Python version is: {sys.version}") print(f"TF version is: {tf.__version__}") print(f"Numpy version is: {np.__version__}") ##==============================## ## Create a very simple model ## ##==============================## # # Expected outputs: # Model: "model_1" # _________________________________________________________________ # Layer (type) Output Shape Param # # ================================================================= # Layer_in (InputLayer) [(None, 2)] 0 # # Layer_hidden (Dense) (None, 10) 30 # # Layer_out (Dense) (None, 2) 22 # # ================================================================= # Total params: 52 (208.00 Byte) # Trainable params: 52 (208.00 Byte) # Non-trainable params: 0 (0.00 Byte) # _________________________________________________________________ # x_in = Input(2 , dtype=tf.float32, name="Layer_in" ) x = x_in x = Dense(10, dtype=tf.float32, name="Layer_hidden", activation="relu" )(x) x = Dense(2 , dtype=tf.float32, name="Layer_out" , activation="linear")(x) model = Model(x_in, x) model.summary() ##===================================================## ## Compile model with MSE loss and AdamW optimizer ## ##===================================================## # # Expected outputs: # WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.AdamW` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.AdamW`. # WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.AdamW`. # model.compile( loss = "mse", optimizer = AdamW(learning_rate=1e-3, weight_decay=1e-2) ) ##===========================## ## Generate some fake data ## ##===========================## # # Expected outputs: # X shape is (100, 2), Y shape is (100, 2) # dataset_size = 100 X = np.random.normal(size=(dataset_size, 2)) X = tf.constant(X, dtype=tf.float32) Y = np.random.normal(size=(dataset_size, 2)) Y = tf.constant(Y, dtype=tf.float32) print(f"X shape is {X.shape}, Y shape is {Y.shape}") ##===================================## ## Fit model to data for one epoch ## ##===================================## # # Expected outputs: # --------------------------------------------------------------------------- # AttributeError Traceback (most recent call last) # Cell In[9], line 51 # 1 ##===================================## # 2 ## Fit model to data for one epoch ## # 3 ##===================================## # (...) # 48 # • mask=None # 49 # # ---> 51 model.fit(X, Y, epochs=1) # File ~/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs) # 67 filtered_tb = _process_traceback_frames(e.__traceback__) # 68 # To get the full stack trace, call: # 69 # `tf.debugging.disable_traceback_filtering()` # ---> 70 raise e.with_traceback(filtered_tb) from None # 71 finally: # 72 del filtered_tb # File /var/folders/6_/gprzxt797d5098h8dtk22nch0000gn/T/__autograph_generated_filezzqv9k36.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator) # 13 try: # 14 do_return = True # ---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope) # 16 except: # 17 do_return = False # AttributeError: in user code: # File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1338, in train_function * # return step_function(self, iterator) # File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1322, in step_function ** # outputs = model.distribute_strategy.run(run_step, args=(data,)) # File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1303, in run_step ** # outputs = model.train_step(data) # File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1084, in train_step # self.optimizer.minimize(loss, self.trainable_variables, tape=tape) # AttributeError: 'str' object has no attribute 'minimize' model.fit(X, Y, epochs=1)

Posted

by

smenary.

Last updated

.

Error with GPU JIT function with GPU tensor UNIMPLEMENTED: DefaultDeviceAssignment not supported for Metal Client.

Hi everyone, I'm trying to test some functionality of jax-metal and got this error. Any help please? import jax import jax.numpy as jnp import numpy as np def f(x): y1=x+x*x+3 y2=x*x+x*x.T return y1*y2 x = np.random.randn(3000,3000).astype('float32') jax_x_gpu = jax.device_put(jnp.array(x), jax.devices('METAL')[0]) jax_x_cpu = jax.device_put(jnp.array(x), jax.devices('cpu')[0]) jax_f_gpu = jax.jit(f, backend='METAL') jax_f_gpu(jax_x_gpu) --------------------------------------------------------------------------- XlaRuntimeError Traceback (most recent call last) Cell In[1], line 17 13 jax_x_cpu = jax.device_put(jnp.array(x), jax.devices('cpu')[0]) 15 jax_f_gpu = jax.jit(f, backend='METAL') ---> 17 jax_f_gpu(jax_x_gpu) [... skipping hidden 5 frame] File ~/.virtualenvs/jax-metal/lib/python3.11/site-packages/jax/_src/pjit.py:817, in _create_sharding_with_device_backend(device, backend) 814 elif backend is not None: 815 assert device is None 816 out = SingleDeviceSharding( --> 817 xb.get_backend(backend).get_default_device_assignment(1)[0]) 818 return out XlaRuntimeError: UNIMPLEMENTED: DefaultDeviceAssignment not supported for Metal Client.

Posted

by

lytkarinskiy.

Last updated

.

mpsgraphtool doesn't work

Hi there, I'm trying to convert my CoreML model (it's actually .mlpackage) to .mpsgraphpackage so I can test the performance of my model with MPSGraph API. I run the code you provide in terminal but it just does nothing (command execute forever). In Activity Monitor terminal uses 0.0% of CPU. I My XCode version 15.0 beta 6 (15A5219j) and running in OS Sonoma 14.0 Beta (23A5312d)

Posted

by

ohnatiuk.

Last updated

.

AudioFeaturePrint Create ML Components Property Limits

Hey, Are there any limits to the windowDuration property of the AudioFeaturePrint transformer such as the minimum value or maximum value? If we create a model with the Create ML UI App, upon selecting the AudioFeaturePrint as the feature extractor, we cannot go below 0.5 seconds for the window duration. Is the limit same if we programmatically create a model using the AudioFeaturePrint?

Posted

by

mspattan.

Last updated

.

Updatable model using built-in Create ML classifiers

Is it possible to create an updatable sound classifier model which uses Apple's built in MLSoundClassifier available via Create ML that can be trained/personalized on device using Core ML? I tried to look up in quite a few places for a long while, however, I know that when on-device training was initially announced in 2019, updatable models were only restricted to non built-in classifiers, but any additional information that may have come out after 2019 in this regard has been hard to find.

Posted

by

mspattan.

Last updated

.

Create ML activity training problem

Hi everyone! I’m trying to train an activity classification model with 3 classes. The problem is that only one class has precision and recall > 0 after training. Even with 2 classes result is the same First I’d thought that there is a problem with my data but when I switched “left” label to “right” and vice versa the results were the same: only “left”-labeled data get non-zero precision and recall.

Posted

by

corle.

Last updated

.

[Apple M1]: I got No registered 'AddN' OpKernel for 'GPU' devices compatible with node while training my model

Hi! GPU acceleration lacks of M1 GPU support (only with this specific model), getting this message when trying to run a trained model on GPU: NotFoundError: Graph execution error: No registered 'AddN' OpKernel for 'GPU' devices compatible with node {{node model_3/keras_layer_3/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/roberta_pack_inputs/StatefulPartitionedCall/RaggedConcat/ArithmeticOptimizer/AddOpsRewrite_Leaf_0_add_2}} (OpKernel was found, but attributes didn't match) Requested Attributes: N=2, T=DT_INT64, _XlaHasReferenceVars=false, _grappler_ArithmeticOptimizer_AddOpsRewriteStage=true, _device="/job:localhost/replica:0/task:0/device:GPU:0" . Registered: device='XLA_CPU_JIT'; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, 16534343205130372495, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_VARIANT] device='GPU'; T in [DT_FLOAT] device='DEFAULT'; T in [DT_INT32] device='CPU'; T in [DT_UINT64] device='CPU'; T in [DT_INT64] device='CPU'; T in [DT_UINT32] device='CPU'; T in [DT_UINT16] device='CPU'; T in [DT_INT16] device='CPU'; T in [DT_UINT8] device='CPU'; T in [DT_INT8] device='CPU'; T in [DT_INT32] device='CPU'; T in [DT_HALF] device='CPU'; T in [DT_BFLOAT16] device='CPU'; T in [DT_FLOAT] device='CPU'; T in [DT_DOUBLE] device='CPU'; T in [DT_COMPLEX64] device='CPU'; T in [DT_COMPLEX128] device='CPU'; T in [DT_VARIANT] [[model_3/keras_layer_3/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/roberta_pack_inputs/StatefulPartitionedCall/RaggedConcat/ArithmeticOptimizer/AddOpsRewrite_Leaf_0_add_2]] [Op:__inference_train_function_300451]

Posted

by

sm_96.

Last updated

.

CoreML Converter Missing Tensorflow Package

I am trying to convert my Tensorflow 2.0 model to a CoreML model so I can deploy it to a mobile app. However, I continually get the error: ValueError: Converter was called with source="tensorflow", but missing tensorflow package I am working in a virtual environment with Python 3.7, Tensorflow 2.11, and Coremltools 5.3.1. I had saved the Tensorflow model by using tensorflow.saved_model.save and was attempting to convert the model with the following: import coremltools as ct image_input = ct.ImageType(shape=(1, 250, 250, 3,), bias=[-1,-1,-1], scale=1/255) classifier_config = ct.ClassifierConfig(['Billy','Not_Billy']) core_model = ct.convert( <path_to_saved_model>, convert_to='mlprogram', inputs=[image_input], classifier_config=classifier_config, source='tensorflow' ) I keep receiving this error: --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /var/folders/7n/vj_bf6q122bg43h_xm957hp80000gn/T/ipykernel_11024/1565729572.py in 6 inputs=[image_input], 7 classifier_config=classifier_config, ----> 8 source='tensorflow' 9 ) ~/Documents/Python/.venv/lib/python3.7/site-packages/coremltools/converters/_converters_entry.py in convert(model, source, inputs, outputs, classifier_config, minimum_deployment_target, convert_to, compute_precision, skip_model_load, compute_units, package_dir, debug, pass_pipeline) 466 _validate_conversion_arguments(model, exact_source, inputs, outputs_as_tensor_or_image_types, 467 classifier_config, compute_precision, --> 468 exact_target, minimum_deployment_target) 469 470 if pass_pipeline is None: ~/Documents/Python/.venv/lib/python3.7/site-packages/coremltools/converters/_converters_entry.py in _validate_conversion_arguments(model, exact_source, inputs, outputs, classifier_config, compute_precision, convert_to, minimum_deployment_target) 722 if exact_source == "tensorflow" and not _HAS_TF_1: 723 raise ValueError( --> 724 'Converter was called with source="tensorflow", but missing ' "tensorflow package" 725 ) 726 ValueError: Converter was called with source="tensorflow", but missing tensorflow package

Posted

by

cofcmsds.

Last updated

.

Comparing Performance: Inference CoreML Model with CoreMLTools in Python VS CoreML with Swift in a MacOS Application

Hello fellow developers, I am currently developing an application involving machine learning models, specifically CoreML models, and I have encountered an intriguing issue that I am hoping to get some insights on. In my current scenario, I'm planning to create a simple application with minimal UI, possibly using PyQT or similar tools. Therefore, I'm seeking a way to utilize NeuralEngine and GPU for CoreML model inference in Python. I discovered the 'predict' API in CoreMLTools which allows for model inference, but I'm unsure if its performance is on par with that of a properly built MacOS application using Swift and Neural Engine. Can anyone provide insights into whether there's a considerable difference in inference performance between these two methods? Is the performance of CoreMLTools 'predict' API comparable to that of a full-fledged Swift MacOS application leveraging the Neural Engine? Any clarification or guidance on this matter would be greatly appreciated. Thanks!

Posted

by

beeble123.

Last updated

.

wwdc21-10040: Detect people, faces, and poses using Vision

Can you share the source code for the demo of the Vision Face Detector with the metrics (roll, yaw and pitch) displayed? You provide some code online but not for this portion of the presentation.

Posted

by

rovingshoe.

Last updated

.

Keras with tensorflow-metal freezes during training with image augmentation

I am trying to train an image classification network in Keras with tensorflow-metal. The training freezes after the first 2-3 epochs if image augmentation layers are used (RandomFlip, RandomContrast, RandomBrightness) The system appears to use both GPU as well as CPU (as indicated by Activity Monitor). Also, warnings appear both in Jupyter and Terminal (see below). When the image augmentation layers are removed (i.e. we only rebuild the head and feed images from disk), CPU appears to be idle, no warnings appear, and training completes successfully. Versions: python 3.8, tensorflow-macos 2.11.0, tensorflow-metal 0.7.1 Sample code: img_augmentation = Sequential( [ layers.RandomFlip(), layers.RandomBrightness(factor=0.2), layers.RandomContrast(factor=0.2) ], name="img_augmentation", ) inputs = layers.Input(shape=(384, 384, 3)) x = img_augmentation(inputs) model = tf.keras.applications.EfficientNetV2S(include_top=False, input_tensor=x, weights='imagenet') model.trainable = False x = tf.keras.layers.GlobalAveragePooling2D(name="avg_pool")(model.output) x = tf.keras.layers.BatchNormalization()(x) top_dropout_rate = 0.2 x = tf.keras.layers.Dropout(top_dropout_rate, name="top_dropout")(x) outputs = tf.keras.layers.Dense(179, activation="softmax", name="pred")(x) newModel = Model(inputs=model.input, outputs=outputs, name="EfficientNet_DF20M_species") reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', factor=0.9, patience=2, verbose=1, min_lr=0.000001) optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.01, momentum=0.9) newModel.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy']) history = newModel.fit(x=train_ds, validation_data=val_ds, epochs=30, verbose=2, callbacks=[reduce_lr]) During training with image augmentation, Jupyter prints the following warnings while training the first epoch: WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformFullIntV2 cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting StatelessRandomGetKeyCounter cause there is no registered converter for this op. ... During training with image augmentation, Terminal keeps spamming the following warning: 2023-02-21 23:13:38.958633: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.958920: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.959071: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.959115: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.959359: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. ... Any suggestions?

Posted

by

Cardu6lis.

Last updated

.

Implementation of some core functions of jax-metal

It appears that some of the jax core functions (in pjit, mlir) are not supported. Is this something to be supported in the future? For example, when I tested a diffrax example, from diffrax import diffeqsolve, ODETerm, Dopri5 import jax.numpy as jnp def f(t, y, args): return -y term = ODETerm(f) solver = Dopri5() y0 = jnp.array([2., 3.]) solution = diffeqsolve(term, solver, t0=0, t1=1, dt0=0.1, y0=y0) It generates an error saying EmitPythonCallback is not supported in metal. File ~/anaconda3/envs/jax-metal-0410/lib/python3.10/site-packages/jax/_src/interpreters/mlir.py:1787 in emit_python_callback raise ValueError( ValueError: `EmitPythonCallback` not supported on METAL backend. I uderstand that, currently, no M1 or M2 chips have multiple devices or can be arranged like that. Therefore, it may not be necessary to fully implement p*** functions (pmap, pjit, etc). But some powerful libraries use them. So, it would be great if at least some workaround for core functions are implemented. Or is there any easy fix for this?

Posted

by

sungsoo.

Last updated

.

MPSNDArrayConvolutionA14.mm:3967: failed assertion `destination kernel width and filter kernel width mismatch'

Hi, I am training an adversarial auto encoder using PyTorch 2.0.0 on Apple M2 (Ventura 13.1), with conda 23.1.0 as manager. I encountered this error: /AppleInternal/Library/BuildRoots/5b8a32f9-5db2-11ed-8aeb-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayConvolutionA14.mm:3967: failed assertion `destination kernel width and filter kernel width mismatch' /Users/vk/miniconda3/envs/betavae/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown To my knowledge, the code broke down when running self.manual_backward(loss["g_loss"]) this block: g_opt.zero_grad() self.manual_backward(loss["g_loss"]) g_opt.step() The same code run without problems on linux distribution. Any thoughts on how to fix it are highly appreciated!

Posted

by

RayXC.

Last updated

.

Code for Extract document data using Vision

I am looking for the examples demo'd by Frank in session wwdc21-10041. I don't seem to find it anywhere. Any lead is appreciated.

Posted

by

Behy.

Last updated

.

how to convert coreML model into tenser flow model with `.tflite` extension

I am trying to convert a core ML model created by turicreate into a formate that's compatible to use in android app. is there a direct way to do that? or should I create a new model and copy the current model weights over.

Posted

by

SohaAhmed.

Last updated

.

Parallelizing/MultiProcessing Vision API

First of all this vision api is amazing. the OCR is very accurate. I've been looking to multiprocess using the vision API. I have about 2 million PDFs I want to OCR, and I want to run multiple threads/run parallel processing to OCR each. I tried pyobjc but it does not work so well. Any suggestions on tackling this problem?

Posted

by

jsunghop.

Last updated

.

Posts under Machine Learning tag