Cannot assign a device for operation ReadVariableOp

Hello, I cannot predict with my model on Apple M1. I get an error:

Traceback (most recent call last):
  File "/Users/martin/Documents/Projects/rl-toolkit/rl_toolkit/__main__.py", line 154, in <module>
    agent.run()
  File "/Users/martin/Documents/Projects/rl-toolkit/rl_toolkit/training.py", line 213, in run
    losses = self._train(sample)
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3023, in __call__
    return graph_function._call_flat(
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 

Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceApplyAdamWithAmsgrad: CPU 
ReadVariableOp: GPU CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_2_adam_update_6_resourceapplyadamwithamsgrad_m (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_2_adam_update_6_resourceapplyadamwithamsgrad_v (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_2_adam_update_6_resourceapplyadamwithamsgrad_vhat (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  ReadVariableOp (ReadVariableOp) 
  Exp/ReadVariableOp (ReadVariableOp) 
  ReadVariableOp_1 (ReadVariableOp) 
  actor/ReadVariableOp (ReadVariableOp) 
  actor/Exp/ReadVariableOp (ReadVariableOp) 
  actor/ReadVariableOp_1 (ReadVariableOp) 
  actor_critic/actor/ReadVariableOp (ReadVariableOp) 
  actor_critic/actor/Exp/ReadVariableOp (ReadVariableOp) 
  actor_critic/actor/ReadVariableOp_1 (ReadVariableOp) 
  Adam_2/Adam/update_6/ResourceApplyAdamWithAmsgrad (ResourceApplyAdamWithAmsgrad) /job:localhost/replica:0/task:0/device:GPU:0

	 [[{{node ReadVariableOp}}]] [Op:__inference__train_4206]

  • I have the same problem. tensorflow.config.set_soft_device_placement(True) should solve such a problem, but it did not.


Replies

Hi @markub3327, the issue is a colocation error in TensorFlow caused by the ResourceApplyAdamWithAmsgrad operation, which is missing from the Metal plugin. Thanks for providing a reproducible case; we will take a look and post an update here.


ResourceApplyAdamWithAmsgrad: CPU       <== this Op is currently not supported in Metal plugin
ReadVariableOp: GPU CPU 
_Arg: GPU CPU 
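
The annotated excerpt above can be read mechanically: an op type whose device list lacks GPU is the one that cannot join a colocation group pinned to the GPU. Here is a minimal sketch for spotting such ops in a pasted error message, assuming the debug text uses the `OpType: DEVICES` layout shown in this thread. The helper name cpu_only_ops is hypothetical and not part of TensorFlow.

```python
import re

# Matches entries such as "ResourceApplyAdam: CPU" or "ReadVariableOp: GPU CPU".
# Device strings like "device:GPU:0" are skipped because no space follows the colon.
_OP_DEVICES = re.compile(r"(\w+):\s+((?:(?:GPU|CPU)\b\s*)+)")


def cpu_only_ops(debug_text):
    """Return op types listed in the colocation debug info without GPU support."""
    ops = []
    for match in _OP_DEVICES.finditer(debug_text):
        op_type, devices = match.group(1), match.group(2).split()
        if "GPU" not in devices and op_type not in ops:
            ops.append(op_type)
    return ops


sample = """\
ResourceApplyAdamWithAmsgrad: CPU
ReadVariableOp: GPU CPU
_Arg: GPU CPU
"""
print(cpu_only_ops(sample))
```

The same helper also works on the single-line form of the message, since it scans the whole text rather than line by line.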

Hi - I have the same issue when applying certain Data Augmentation layers - RandomFlip works but RandomZoom does not. I'm using tf.keras.layers.experimental.preprocessing.*** for my layers.

Is there any update on a fix?

Iain

  • Did you ever find a solution for the data augmentation layers?


Hi! I get the same error when trying to fine-tune EfficientNetB7. Are there any updates on this issue?

M1 Max 32GB, Monterey 12.1

InvalidArgumentError: Cannot assign a device for operation sequential/efficientnetb7/stem_conv/Conv2D/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node sequential/efficientnetb7/stem_conv/Conv2D/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceApplyAdaMax: CPU 
ReadVariableOp: GPU CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_efficientnetb7_stem_conv_conv2d_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adamax_adamax_update_resourceapplyadamax_m (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adamax_adamax_update_resourceapplyadamax_v (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential/efficientnetb7/stem_conv/Conv2D/ReadVariableOp (ReadVariableOp) 
  Adamax/Adamax/update/ResourceApplyAdaMax (ResourceApplyAdaMax) /job:localhost/replica:0/task:0/device:GPU:0

	 [[{{node sequential/efficientnetb7/stem_conv/Conv2D/ReadVariableOp}}]] [Op:__inference_train_function_56534]

M1 Max 64GB, Monterey 12.0.1

This is a strange error. To resolve it, I added the expected input shape to the first layer.

# Failing:
# Possibly unrelated: the `with` block was added because the following code didn't run on the GPU.
import tensorflow as tf

with tf.device('/cpu:0'):
  data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.2),
  ])

The above caused the following error:

# The following error was received - but after adding the input shape it worked fine
# Colocation group had the following types and supported devices: 
# Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
# RngReadAndSkip: CPU 
# _Arg: GPU CPU 

# Colocation members, user-requested devices, and framework assigned devices, if any:
#   model_sequential_2_random_flip_2_stateful_uniform_full_int_rngreadandskip_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
#   model/sequential_2/random_flip_2/stateful_uniform_full_int/RngReadAndSkip (RngReadAndSkip) 

#          [[{{node model/sequential_2/random_flip_2/stateful_uniform_full_int/RngReadAndSkip}}]] [Op:__inference_train_function_12915]

Adding input_shape seems to have resolved the issue, and the model is now training.

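# Working: the first layer now declares the expected input shape.
# IMG_SIZE is assumed to be a (height, width) tuple defined earlier in the script.
with tf.device('/cpu:0'):
  data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal', input_shape=(IMG_SIZE[0], IMG_SIZE[1], 3)),
    tf.keras.layers.RandomRotation(0.2),
  ])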

I also had this configuration:

tf.config.set_soft_device_placement(True) 
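
To see where TensorFlow actually places each op, device-placement logging can be switched on alongside soft placement. The sketch below is a diagnostic aid only, not a fix: an earlier comment in this thread reports that soft placement alone did not resolve the error. The tensor values are arbitrary.

```python
import tensorflow as tf

# Let TensorFlow fall back to another device when the requested one
# has no kernel for an op (the setting quoted above).
tf.config.set_soft_device_placement(True)

# Log the device chosen for every op as it executes.
tf.debugging.set_log_device_placement(True)

# Any small computation now reports its placement in the log output.
x = tf.constant([1.0, 2.0, 3.0])
y = tf.reduce_sum(x * 2.0)
print(float(y))
```

Running the failing training step with logging enabled shows which ops land on GPU:0 and which stay on the CPU.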

@Frameworks Engineer

Is there any update on when tensorflow-metal will implement these missing operations?

I am using:

tensorflow-macos         2.8.0
tensorflow-metal         0.4.0

I am getting the following error which I assume is the same issue as above.

Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ReadVariableOp: GPU CPU 
ResourceApplyAdam: CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_dense_matmul_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_adam_update_resourceapplyadam_m (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_adam_update_resourceapplyadam_v (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential/dense/MatMul/ReadVariableOp (ReadVariableOp) 
  Adam/Adam/update/ResourceApplyAdam (ResourceApplyAdam) /job:localhost/replica:0/task:0/device:GPU:0

	 [[{{node sequential/dense/MatMul/ReadVariableOp}}]] [Op:__inference_train_function_809]
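
Since the behaviour depends on which plugin version is installed, it helps to quote exact package versions when reporting this error, as the post above does. Here is a minimal sketch for collecting them from the active environment using only the standard library. The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names as quoted in this thread.
for name in ("tensorflow-macos", "tensorflow-metal"):
    print(f"{name:<20} {installed_version(name)}")
```

A result of None for tensorflow-metal means the Metal plugin is not installed in that environment.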

Hi, I'm using an Apple M1 Pro running Monterey 12.3.1, with

tensorflow-macos         2.8.0
tensorflow-metal         0.4.0

I get the same problem when I add an augmentation layer to my model.

It cost me about 3 hours to figure out.

My solution is to manually assign the GPU using with tf.device("/gpu:0"):, like this:

# Compile and fit inside an explicit GPU device scope.
with tf.device("/gpu:0"):
    model.compile(loss="categorical_crossentropy", optimizer='adam', metrics=['accuracy'])
    history = model.fit(train_ds, epochs=epochs, validation_data=val_ds, use_multiprocessing=True)

No other code needs to be changed or added; it is just one extra line.

It worked like a charm for me, so give it a try!

I hope this helps anyone who comes across this strange problem.

A workaround is to uninstall tensorflow-metal: pip uninstall tensorflow-metal. TensorFlow then runs on the CPU only.
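
A less drastic variant of this workaround is to keep the plugin installed and hide the GPU for a single run with tf.config.set_visible_devices. This sketch assumes that falling back to the CPU is acceptable. It is not from the thread, and the call has to run before any op or model is created.

```python
import tensorflow as tf

# Hide every GPU from TensorFlow so all ops, including the optimizer
# update, are placed on the CPU. Must run before the runtime initializes.
tf.config.set_visible_devices([], "GPU")

# Confirm which GPUs remain visible to the runtime.
print(tf.config.get_visible_devices("GPU"))
```

Removing the call restores GPU execution without reinstalling anything.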

history=model.fit(
    train_ds,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    verbose=1,
    validation_data=val_ds,
    use_multiprocessing=True
)

I used this code and got this error on my MacBook Air M1 (2020):

InvalidArgumentError                      Traceback (most recent call last)
Input In [57], in <cell line: 1>()
----> 1 history=model.fit(
      2     train_ds,
      3     epochs=EPOCHS,
      4     batch_size=BATCH_SIZE,
      5     verbose=1,
      6     validation_data=val_ds,
      7     use_multiprocessing=True
      8 )

File ~/opt/miniconda3/envs/tensorflow/lib/python3.9/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     65 except Exception as e:  # pylint: disable=broad-except
     66   filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67   raise e.with_traceback(filtered_tb) from None
     68 finally:
     69   del filtered_tb

File ~/opt/miniconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     52 try:
     53   ctx.ensure_initialized()
---> 54   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     55                                       inputs, attrs, num_outputs)
     56 except core._NotOkStatusException as e:
     57   if name is not None:

InvalidArgumentError: Cannot assign a device for operation sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int/RngReadAndSkip: Could not satisfy explicit device specification '' because the node {{colocation_node sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int/RngReadAndSkip}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
RngReadAndSkip: CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_5_sequential_4_random_flip_1_stateful_uniform_full_int_rngreadandskip_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int/RngReadAndSkip (RngReadAndSkip) 
  sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int_1/RngReadAndSkip (RngReadAndSkip) 

	 [[{{node sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int/RngReadAndSkip}}]] [Op:__inference_train_function_11336]