It doesn't matter if I install miniforge or mamba, directly or through brew, when I try to fit the sample model from https://developer.apple.com/metal/tensorflow-plugin/, even with a simple sequential model, I always get this error.
Is there any workaround on this? I'll appreciate any help, thanks!
2022-12-10 11:18:19.941623: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-12-10 11:18:20.427283: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-10 11:18:21.222950: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.223003: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.363366: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.364757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388739: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
NotFoundError Traceback (most recent call last)
Cell In[25], line 2
1 model = create_model()
----> 2 history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);
File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback..error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.traceback)
68 # To get the full stack trace, call:
69 # tf.debugging.disable_traceback_filtering()
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
50 try:
51 ctx.ensure_initialized()
---> 52 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
53 inputs, attrs, num_outputs)
54 except core._NotOkStatusException as e:
55 if name is not None:
NotFoundError: Graph execution error:
Detected at node 'StatefulPartitionedCall_4' defined at (most recent call last):
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in
app.launch_new_instance()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/traitlets/config/application.py", line 992, in launch_instance
app.start()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 711, in start
self.io_loop.start()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 215, in start
self.asyncio_loop.run_forever()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 1899, in _run_once
handle._run()
...
File "/var/folders/f9/bp40pn0d401d974fy48dxm8h0000gn/T/ipykernel_63636/3393788193.py", line 2, in <module>
history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
tmp_logs = self.train_function(iterator)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
return step_function(self, iterator)
......
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
outputs = model.train_step(data)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
self.apply_gradients(grads_and_vars)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
return super().apply_gradients(grads_and_vars, name=name)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
iteration = self._internal_apply_gradients(grads_and_vars)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
return tf.__internal__.distribute.interim.maybe_merge_call(
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
distribution.extended.update(
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_4'
could not find registered platform with id: 0x28edf1f90
[[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1241]
Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.
Post
Replies
Boosts
Views
Activity
I have tried many times. When I change the file or re-create it, it shows 404 error
{
"code": 400,
"message": "InvalidArgumentError: Unable to unzip MLArchive",
"reason": "There was a problem with your request.",
"detailedMessage": "InvalidArgumentError: Unable to unzip MLArchive",
"requestUuid": "699afb97-8328-4a83-b186-851f797942aa"
}
I am trying to train an image classification network in Keras with tensorflow-metal.
The training freezes after the first 2-3 epochs if image augmentation layers are used (RandomFlip, RandomContrast, RandomBrightness)
The system appears to use both GPU as well as CPU (as indicated by Activity Monitor). Also, warnings appear both in Jupyter and Terminal (see below).
When the image augmentation layers are removed (i.e. we only rebuild the head and feed images from disk), CPU appears to be idle, no warnings appear, and training completes successfully.
Versions: python 3.8, tensorflow-macos 2.11.0, tensorflow-metal 0.7.1
Sample code:
img_augmentation = Sequential(
[
layers.RandomFlip(),
layers.RandomBrightness(factor=0.2),
layers.RandomContrast(factor=0.2)
],
name="img_augmentation",
)
inputs = layers.Input(shape=(384, 384, 3))
x = img_augmentation(inputs)
model = tf.keras.applications.EfficientNetV2S(include_top=False, input_tensor=x, weights='imagenet')
model.trainable = False
x = tf.keras.layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
x = tf.keras.layers.BatchNormalization()(x)
top_dropout_rate = 0.2
x = tf.keras.layers.Dropout(top_dropout_rate, name="top_dropout")(x)
outputs = tf.keras.layers.Dense(179, activation="softmax", name="pred")(x)
newModel = Model(inputs=model.input, outputs=outputs, name="EfficientNet_DF20M_species")
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', factor=0.9, patience=2, verbose=1, min_lr=0.000001)
optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.01, momentum=0.9)
newModel.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
history = newModel.fit(x=train_ds, validation_data=val_ds, epochs=30, verbose=2, callbacks=[reduce_lr])
During training with image augmentation, Jupyter prints the following warnings while training the first epoch:
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformFullIntV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomGetKeyCounter cause there is no registered converter for this op.
...
During training with image augmentation, Terminal keeps spamming the following warning:
2023-02-21 23:13:38.958633: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.958920: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.959071: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.959115: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.959359: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
...
Any suggestions?
In the ml-ane-transformers repo, there is a custom LayerNorm implementation for the Neural Engine-optimized shape of (B,C,1,S).
The coremltools documentation makes it sound like the layer_norm MIL op would support this natively. In fact, the following code works on CPU:
B,C,S = 1,768,512
g,b = 1, 0
@mb.program(input_specs=[mb.TensorSpec(shape=(B,C,1,S)),])
def ln_prog(x):
gamma = (torch.ones((C,), dtype=torch.float32) * g).tolist()
beta = (torch.ones((C), dtype=torch.float32) * b).tolist()
return mb.layer_norm(x=x, axes=[1], gamma=gamma, beta=beta, name="y")
However it fails when run on the Neural Engine, giving results that are scaled by an incorrect value.
Should this work on the Neural Engine?
Hi everyone!
I’m trying to train an activity classification model with 3 classes. The problem is that only one class has precision and recall > 0 after training.
Even with 2 classes result is the same
First I’d thought that there is a problem with my data but when I switched “left” label to “right” and vice versa the results were the same: only “left”-labeled data get non-zero precision and recall.
does anyone know if the CreateML app has a way to build Support Vector Machine models for tabular regression? I see only the attached options. xcode14.2
Is it possible to create an updatable sound classifier model which uses Apple's built in MLSoundClassifier available via Create ML that can be trained/personalized on device using Core ML?
I tried to look up in quite a few places for a long while, however, I know that when on-device training was initially announced in 2019, updatable models were only restricted to non built-in classifiers, but any additional information that may have come out after 2019 in this regard has been hard to find.
Testing out https://developer.apple.com/metal/jax/ mainly for trying to run whisper-jax (https://github.com/sanchit-gandhi/whisper-jax/tree/main) on my M2.
The jax-metal plugin seems to install without issues, and the basic test code runs fine. However, the jax-whisper code fails when trying to encode a file with the following error:
error: failed to legalize operation 'mhlo.convolution'
/Users/pere/jax-metal/lib/python3.10/site-packages/whisper_jax/layers.py:1236:0: note: see current operation: %111 = "mhlo.convolution"(%110, <<UNKNOWN SSA VALUE>>) {batch_group_count = 1 : i64, dimension_numbers = #mhlo.conv<[b, 0, f]x[0, i, o]->[b, 0, f]>, feature_group_count = 1 : i64, lhs_dilation = dense<1> : tensor<1xi64>, padding = dense<1> : tensor<1x2xi64>, precision_config = [#mhlo<precision DEFAULT>, #mhlo<precision DEFAULT>], rhs_dilation = dense<1> : tensor<1xi64>, window_reversal = dense<false> : tensor<1xi1>, window_strides = dense<1> : tensor<1xi64>} : (tensor<1x3000x80xf32>, tensor<3x80x384xf32>) -> tensor<1x3000x384xf32>
Hey, Are there any limits to the windowDuration property of the AudioFeaturePrint transformer such as the minimum value or maximum value?
If we create a model with the Create ML UI App, upon selecting the AudioFeaturePrint as the feature extractor, we cannot go below 0.5 seconds for the window duration. Is the limit same if we programmatically create a model using the AudioFeaturePrint?
In the video of Explore Natural Language multilingual models https://developer.apple.com/videos/play/wwdc2023/10042/, it's said at 6:24 that there are three models.
I wonder if it is possible to find semantic similairity between models? For example English and Japanese belong to different models(Latin and CJK), can we compare the vector produced from the different models to find out if two sentences have similar meanings?
I am using SFSpeechRecognizer to perform speech recognition, but I am getting the following error.
[SpeechFramework] -[SFSpeechRecognitionTask localSpeechRecognitionClient:speechRecordingDidFail:]_block_invoke Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"
Setting requiresOnDeviceRecognition to False works correctly, but previously it worked with True with no error.
The value of supportsOnDeviceRecognition was True, so the device is recognizing that it supports speech recognition.
iPad Pro 11inch iOS 16.5.
Is this expected behavior?
I need a simple text-to-speech avatar in my iOS app. iOS already has Memojis ready to go - but I cannot find anywhere in the dev docs on how to access Memojis to use in as a tool in app development. Am I missing something? Also - can anyone point me to any resources besides the Apple docs for using AVSpeechSynthesis?
I want to add shortcut and Siri support using the new AppIntents framework. Running my intent using shortcuts or from spotlight works fine, as the touch based UI for the disambiguation is shown. However, when I ask Siri to perform this action, she gets into a loop of asking me the question to set the parameter.
My AppIntent is implemented as following:
struct StartSessionIntent: AppIntent {
static var title: LocalizedStringResource = "start_recording"
@Parameter(title: "activity", requestValueDialog: IntentDialog("which_activity"))
var activity: ActivityEntity
@MainActor
func perform() async throws -> some IntentResult & ProvidesDialog {
let activityToSelect: ActivityEntity = self.activity
guard let selectedActivity = Activity[activityToSelect.name] else {
return .result(dialog: "activity_not_found")
}
...
return .result(dialog: "recording_started \(selectedActivity.name.localized())")
}
}
The ActivityEntity is implemented like this:
struct ActivityEntity: AppEntity {
static var typeDisplayRepresentation = TypeDisplayRepresentation(name: "activity")
typealias DefaultQuery = MobilityActivityQuery
static var defaultQuery: MobilityActivityQuery = MobilityActivityQuery()
var id: String
var name: String
var icon: String
var displayRepresentation: DisplayRepresentation {
DisplayRepresentation(title: "\(self.name.localized())", image: .init(systemName: self.icon))
}
}
struct MobilityActivityQuery: EntityQuery {
func entities(for identifiers: [String]) async throws -> [ActivityEntity] {
Activity.all()?.compactMap({ activity in
identifiers.contains(where: { $0 == activity.name }) ? ActivityEntity(id: activity.name, name: activity.name, icon: activity.icon) : nil
}) ?? []
}
func suggestedEntities() async throws -> [ActivityEntity] {
Activity.all()?.compactMap({ activity in
ActivityEntity(id: activity.name, name: activity.name, icon: activity.icon)
}) ?? []
}
}
Has anyone an idea what might be causing this and how I can fix this behavior? Thanks in advance
Hi,
I've found a memory leak issue when using the tensorFlow-metal plugin for running a deep learning model on a Mac with the M1 chip. Here are the details of my system:
System Information
MacOS version: 13.4
TensorFlow (macos) version: 2.12.0, 2.13.0-rc1, tf-nightly==2.14.0.dev20230616
TensorFlow-Metal Plugin Version: 0.8, 1.0.0, 1.0.1
Model Details
I've implemented a custom model architecture using TensorFlow's Keras API. The model has a dynamic Input, which I resize the images in a Resizing layer. Moreover, the data is passed to the model through a data generator class, using model.fit().
Problem Description
When I train this model using the GPU on M1 Mac, I observe a continuous increase in memory usage, leading to a memory leak. This memory increase is more prominent with larger image inputs. For smaller images or average sizes (1024x128), the increase is smaller, but continuous, leading to a memory leak after several epochs.
On the other hand, when I switch to using the CPU for training (tf.config.set_visible_devices([], 'GPU')), the memory leak issue is resolved, and I observe normal memory usage. In addition, I've tested the model with different sizes of images and various layer configurations. The memory leak appears to be present only when using the GPU, indeed.
I hope this information is helpful in identifying and resolving the issue. If you need any further details, please let me know. The project code is private, but I can try to provide it with pseudocode if necessary.
System Information
MacOS version: 13.4
TensorFlow (macos) version: tf-nightly==2.14.0.dev20230616
TensorFlow-Metal Plugin Version: 1.0.1
Problem Description
I'm trying to compute the CTC Loss using TensorFlow's tf.nn.ctc_loss on M1 Mac, but an error is thrown indicating that no OpKernel was registered to support the CTCLossV2 operation. However, when using the CPU or even tf.keras.backend.ctc_batch_cost, it works fine. The error stack trace is as follows:
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node CTCLossV2 defined at (most recent call last):
<stack traces unavailable>
No OpKernel was registered to support Op 'CTCLossV2' used by {{node CTCLossV2}} with these attrs: [ctc_merge_repeated=true, preprocess_collapse_repeated=false, ignore_longer_outputs_than_inputs=false]
Registered devices: [CPU, GPU]
Registered kernels:
<no registered kernels>
[[CTCLossV2]]
[[ctc_loss_func/PartitionedCall]] [Op:__inference_train_function_13095]
I see a lot of crashes on iOS 17 beta regarding some problem of "Text To Speech". Does anybody has a clue why TTS crashes? Anybody else seeing the same problem?
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x000000037f729380
Exception Codes: 0x0000000000000001, 0x000000037f729380
VM Region Info: 0x37f729380 is not in any region. Bytes after previous region: 3748828033 Bytes before following region: 52622617728
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
MALLOC_NANO 280000000-2a0000000 [512.0M] rw-/rwx SM=PRV
---> GAP OF 0xd20000000 BYTES
commpage (reserved) fc0000000-1000000000 [ 1.0G] ---/--- SM=NUL ...(unallocated)
Termination Reason: SIGNAL 11 Segmentation fault: 11
Terminating Process: exc handler [36389]
Triggered by Thread: 9
.....
Thread 9 name:
Thread 9 Crashed:
0 libobjc.A.dylib 0x000000019eeff248 objc_retain_x8 + 16
1 AudioToolboxCore 0x00000001b2da9d80 auoop::RenderPipeUser::~RenderPipeUser() + 112 (AUOOPRenderPipePool.mm:400)
2 AudioToolboxCore 0x00000001b2e110b4 -[AUAudioUnit_XPC internalDeallocateRenderResources] + 92 (AUAudioUnit_XPC.mm:904)
3 AVFAudio 0x00000001bfa4cc04 AUInterfaceBaseV3::Uninitialize() + 60 (AUInterface.mm:524)
4 AVFAudio 0x00000001bfa894bc AVAudioEngineGraph::PerformCommand(AUGraphNodeBaseV3&, AVAudioEngineGraph::ENodeCommand, void*, unsigned int) const + 772 (AVAudioEngineGraph.mm:3317)
5 AVFAudio 0x00000001bfa93550 AVAudioEngineGraph::_Uninitialize(NSError**) + 132 (AVAudioEngineGraph.mm:1469)
6 AVFAudio 0x00000001bfa4b50c AVAudioEngineImpl::Stop(NSError**) + 396 (AVAudioEngine.mm:1081)
7 AVFAudio 0x00000001bfa4b094 -[AVAudioEngine stop] + 48 (AVAudioEngine.mm:193)
8 TextToSpeech 0x00000001c70b3c5c __55-[TTSSynthesisProviderAudioEngine renderSpeechRequest:]_block_invoke + 1756 (TTSSynthesisProviderAudioEngine.m:613)
9 libdispatch.dylib 0x00000001ae4b0740 _dispatch_call_block_and_release + 32 (init.c:1519)
10 libdispatch.dylib 0x00000001ae4b2378 _dispatch_client_callout + 20 (object.m:560)
11 libdispatch.dylib 0x00000001ae4b990c _dispatch_lane_serial_drain + 748 (queue.c:3885)
12 libdispatch.dylib 0x00000001ae4ba470 _dispatch_lane_invoke + 432 (queue.c:3976)
13 libdispatch.dylib 0x00000001ae4c5074 _dispatch_root_queue_drain_deferred_wlh + 288 (queue.c:6913)
14 libdispatch.dylib 0x00000001ae4c48e8 _dispatch_workloop_worker_thread + 404 (queue.c:6507)
...
Thread 9 crashed with ARM Thread State (64-bit):
x0: 0x0000000283309360 x1: 0x0000000000000000 x2: 0x0000000000000000 x3: 0x00000002833093c0
x4: 0x00000002833093c0 x5: 0x0000000101737740 x6: 0x0000000000000013 x7: 0x00000000ffffffff
x8: 0x0000000283309360 x9: 0x3c788942d067009a x10: 0x0000000101547000 x11: 0x0000000000000000
x12: 0x00000000000007fb x13: 0x00000000000007fd x14: 0x000000001ee24020 x15: 0x0000000000000020
x16: 0x0000b1037f729360 x17: 0x000000037f729360 x18: 0x0000000000000000 x19: 0x0000000000000000
x20: 0x00000001016a8de8 x21: 0x0000000283e21d00 x22: 0x0000000283b3f1f8 x23: 0x0000000283098000
x24: 0x00000001bfb4fc35 x25: 0x00000001bfb4fc43 x26: 0x000000028033a688 x27: 0x0000000280c93090
x28: 0x0000000000000000 fp: 0x000000016fc86490 lr: 0x00000001b2da9d80
sp: 0x000000016fc863e0 pc: 0x000000019eeff248 cpsr: 0x1000
esr: 0x92000006 (Data Abort) byte read Translation fault
I have a custom model that runs just fine on the CPU under tensorflow-2.13.0rc1. I'd go to the stable 1.12.0, however there's no pip version that can be installed on an ARM-based computer.
If I install tensorflow-metal 1.0.1 the same model crashes in what appears to be a protobuf initialization:
0 libtensorflow_framework.2.dylib 0x31e103724 google::protobuf::Message::InitializationErrorString() const + 88
1 libarrow.600.dylib 0x168dc95c0 google::protobuf::MessageLite::ParseFromArray(void const*, int) + 276
2 libmetal_plugin.dylib 0x371c0b400 metal_plugin::P_Optimize(void*, TF_Buffer const*, TF_GrapplerItem const*, TF_Buffer*, TSL_Status*) + 88
3 libtensorflow_cc.2.dylib 0x359f5196c tensorflow::grappler::CGraphOptimizer::Optimize(tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem const&, tensorflow::GraphDef*) + 116
This is on an M2 Ultra with macOS 13.4.1.
Can anyone from the tensorflow-metal team look at what's going on?
Thanks!
XlaRuntimeError Traceback (most recent call last)
Cell In[49], line 4
1 arr = jnp.array( [7, 8, 9])
3 # Find indices where the condition is True
----> 4 indices = jnp.where(arr > 1)
6 print(indices)
XlaRuntimeError: UNKNOWN
my personal python version is 3.8.6
pip install --upgrade tensorflow
ERROR: Could not find a version that satisfies the requirement t ensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow
how to fix it?
Hello everyone, I am new to using Create ML, but am running up against a problem where the error is not descriptive, and I can not figure out what might be causing it. I am fairly sure my data is formatted properly, as in the CreateML software, it detects the images and shows me a bar graph of how many images belong to each label. But when it comes to actually training, the moment I press the "Train" button, it shows an error with the message: "Unexpected Error".
I have also attempted to create and train the model programmatically, and that actually works! The framework requires that the JSON be named "annotations.json" instead of "annotation.json" and that the key representing the name of image be changed to "filename" from "image", but other than that, the data is the same. I tried to use the software with the changes I made to the JSON for use in the framework, but if I try that, it won't even parse the data, so I am fairly sure that my data is formatted correctly.
I would prefer to use the app rather than do everything programmatically, because it presents the data in a much more digestible way.
Has anyone else come up against this issue or a similar issue. I should note that I am running the latest Beta of MacOS Sonoma and Xcode.