Cannot assign a device for operation ReadVariableOp

Question

Created Aug ’21

Hello, I cannot predict with my model on Apple M1. I get an error:

Traceback (most recent call last):
  File "/Users/martin/Documents/Projects/rl-toolkit/rl_toolkit/__main__.py", line 154, in <module>
    agent.run()
  File "/Users/martin/Documents/Projects/rl-toolkit/rl_toolkit/training.py", line 213, in run
    losses = self._train(sample)
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3023, in __call__
    return graph_function._call_flat(
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceApplyAdamWithAmsgrad: CPU 
ReadVariableOp: GPU CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_2_adam_update_6_resourceapplyadamwithamsgrad_m (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_2_adam_update_6_resourceapplyadamwithamsgrad_v (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_2_adam_update_6_resourceapplyadamwithamsgrad_vhat (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  ReadVariableOp (ReadVariableOp) 
  Exp/ReadVariableOp (ReadVariableOp) 
  ReadVariableOp_1 (ReadVariableOp) 
  actor/ReadVariableOp (ReadVariableOp) 
  actor/Exp/ReadVariableOp (ReadVariableOp) 
  actor/ReadVariableOp_1 (ReadVariableOp) 
  actor_critic/actor/ReadVariableOp (ReadVariableOp) 
  actor_critic/actor/Exp/ReadVariableOp (ReadVariableOp) 
  actor_critic/actor/ReadVariableOp_1 (ReadVariableOp) 
  Adam_2/Adam/update_6/ResourceApplyAdamWithAmsgrad (ResourceApplyAdamWithAmsgrad) /job:localhost/replica:0/task:0/device:GPU:0

	 [[{{node ReadVariableOp}}]] [Op:__inference__train_4206]

Answer 1

roefer

Aug ’21

I have the same problem. tensorflow.config.set_soft_device_placement(True) should solve such a problem, but it did not.

Answer 2

Systems Engineer

Apple

Aug ’21

Hi @markub3327, the issue is a colocation error in TensorFlow caused by the ResourceApplyAdamWithAmsgrad operation, which is missing from the Metal plugin. Thanks for providing a reproducible case; we will take a look and provide an update here.


ResourceApplyAdamWithAmsgrad: CPU       <== this Op is currently not supported in Metal plugin

ReadVariableOp: GPU CPU 

_Arg: GPU CPU
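A minimal plain-Python sketch of the rule behind this error, assuming the usual colocation semantics: every member of a colocation group must run on the same device, so the group is limited to the device types that all of its members support. The supported-device lists are copied from the log above; the function names are hypothetical, and this only illustrates the check, it is not TensorFlow's placer.

```python
# Illustration only: models the colocation check, not TensorFlow's placer code.

def allowed_device_types(supported):
    """Intersect the supported device types of every op in a colocation group."""
    return set.intersection(*(set(types) for types in supported.values()))


def colocation_conflict(supported, resource_device_type):
    """True if the group cannot run on the device type that holds the resource."""
    return resource_device_type not in allowed_device_types(supported)


# Supported device types copied from the Colocation Debug Info above.
group = {
    "ResourceApplyAdamWithAmsgrad": {"CPU"},  # no Metal (GPU) kernel
    "ReadVariableOp": {"GPU", "CPU"},
    "_Arg": {"GPU", "CPU"},
}

print(allowed_device_types(group))        # {'CPU'}
print(colocation_conflict(group, "GPU"))  # True
```

The intersection is {CPU}, while the resource arguments were already assigned to GPU:0, which matches the "required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'" part of the message.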

Answer 3

IainRatherThanIan

Sep ’21

Hi, I have the same issue when applying certain data augmentation layers: RandomFlip works but RandomZoom does not. I'm using tf.keras.layers.experimental.preprocessing.xxx for my layers.

Is there any update on a fix?

Iain

Answer 4

bellerofonte

Jan ’22

Hi! I get the same error when trying to fine-tune EfficientNetB7. Any updates on this issue?

M1 Max 32 GB, Monterey 12.1

InvalidArgumentError: Cannot assign a device for operation sequential/efficientnetb7/stem_conv/Conv2D/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node sequential/efficientnetb7/stem_conv/Conv2D/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceApplyAdaMax: CPU 
ReadVariableOp: GPU CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_efficientnetb7_stem_conv_conv2d_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adamax_adamax_update_resourceapplyadamax_m (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adamax_adamax_update_resourceapplyadamax_v (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential/efficientnetb7/stem_conv/Conv2D/ReadVariableOp (ReadVariableOp) 
  Adamax/Adamax/update/ResourceApplyAdaMax (ResourceApplyAdaMax) /job:localhost/replica:0/task:0/device:GPU:0

	 [[{{node sequential/efficientnetb7/stem_conv/Conv2D/ReadVariableOp}}]] [Op:__inference_train_function_56534]
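In each of these logs, the op without a Metal kernel is the one listed with only CPU under "Colocation group had the following types and supported devices" (ResourceApplyAdaMax here; ResourceApplyAdamWithAmsgrad, ResourceApplyAdam and RngReadAndSkip elsewhere in the thread). A hypothetical stdlib-only helper that pulls those names out of a pasted log, assuming the line layout shown in this thread:

```python
import re

# Matches "OpType: DEV DEV" lines, e.g. "ResourceApplyAdaMax: CPU ".
# Assumes op types are identifier-like and device types are upper-case words.
_TYPE_LINE = re.compile(r"^\s*([A-Za-z_]\w*):((?:\s+[A-Z]+)+)\s*$")


def supported_devices(debug_info):
    """Map each op type in the debug info to its set of supported device types."""
    result = {}
    for line in debug_info.splitlines():
        match = _TYPE_LINE.match(line)
        if match:
            result[match.group(1)] = set(match.group(2).split())
    return result


def ops_without_gpu_kernel(debug_info):
    """Sorted op types whose supported devices do not include GPU."""
    return sorted(op for op, devices in supported_devices(debug_info).items()
                  if "GPU" not in devices)


# Trimmed excerpt of the log above (the Root Member line is shortened).
log = """\
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 supported_device_types_=[CPU] possible_devices_=[]
ResourceApplyAdaMax: CPU
ReadVariableOp: GPU CPU
_Arg: GPU CPU
"""

print(ops_without_gpu_kernel(log))  # ['ResourceApplyAdaMax']
```

Header lines and colocation-member lines do not match the pattern because their first word is not immediately followed by a colon.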

Answer 5

tsalama

Jan ’22

M1 Max 64GB, Monterey 12.0.1

This is a strange error. To resolve it, I added the expected input shape to the first layer.

import tensorflow as tf

# Failing:
# Possibly unrelated: the `with` block was added as the following code didn't run on GPU:
with tf.device('/cpu:0'):
  data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.2),
    ])

The above caused the following error:

# The following error was received - but after adding the input shape it worked fine
# Colocation group had the following types and supported devices: 
# Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
# RngReadAndSkip: CPU 
# _Arg: GPU CPU 

# Colocation members, user-requested devices, and framework assigned devices, if any:
#   model_sequential_2_random_flip_2_stateful_uniform_full_int_rngreadandskip_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
#   model/sequential_2/random_flip_2/stateful_uniform_full_int/RngReadAndSkip (RngReadAndSkip) 

#          [[{{node model/sequential_2/random_flip_2/stateful_uniform_full_int/RngReadAndSkip}}]] [Op:__inference_train_function_12915]

Adding the input_shape seems to have resolved the issue, and the model is now training.

# Working
# IMG_SIZE is the (height, width) of the input images, defined elsewhere
with tf.device('/cpu:0'):
  data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal', input_shape=(IMG_SIZE[0], IMG_SIZE[1], 3)),
    tf.keras.layers.RandomRotation(0.2),
    ])

I also had this configuration:

tf.config.set_soft_device_placement(True)

Answer 6

thomashan

Apr ’22

@Frameworks Engineer

Is there any update on when tensorflow-metal will implement these missing operations?

I am using:

tensorflow-macos         2.8.0
tensorflow-metal         0.4.0

I am getting the following error, which I assume is the same issue as above:

Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ReadVariableOp: GPU CPU 
ResourceApplyAdam: CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_dense_matmul_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_adam_update_resourceapplyadam_m (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_adam_update_resourceapplyadam_v (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential/dense/MatMul/ReadVariableOp (ReadVariableOp) 
  Adam/Adam/update/ResourceApplyAdam (ResourceApplyAdam) /job:localhost/replica:0/task:0/device:GPU:0

	 [[{{node sequential/dense/MatMul/ReadVariableOp}}]] [Op:__inference_train_function_809]
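For reports like the version list above, a small stdlib-only sketch that gathers the same details: Python and macOS versions, CPU architecture, and the installed tensorflow-macos and tensorflow-metal versions. It assumes both packages were installed with pip under those distribution names; the helper names are hypothetical. Running pip show tensorflow-macos tensorflow-metal gives similar package information.

```python
import platform
from importlib import metadata


def installed_version(dist_name):
    """Installed version of a pip distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report():
    """Collect the details that posts in this thread usually include."""
    return {
        "python": platform.python_version(),
        "macos": platform.mac_ver()[0] or None,  # empty string off macOS
        "machine": platform.machine(),  # 'arm64' for a native Python on Apple silicon
        "tensorflow-macos": installed_version("tensorflow-macos"),
        "tensorflow-metal": installed_version("tensorflow-metal"),
    }


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name:<18} {value}")
```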

Answer 7

VictorC_

Apr ’22

Hi, I'm using an Apple M1 Pro on macOS Monterey 12.3.1 with

tensorflow-macos         2.8.0
tensorflow-metal         0.4.0

I get the same problem after adding an augmentation layer to my model.

It cost me about 3 hours to figure it out.

My solution is to assign the GPU manually with a tf.device("/gpu:0") block, like this:

with tf.device("/gpu:0"):
    model.compile(loss="categorical_crossentropy", optimizer='adam', metrics=['accuracy'])
    history = model.fit(train_ds, epochs=epochs, validation_data=val_ds, use_multiprocessing=True)

There is no need to change or add any other code; it is just one extra line.

It worked like a charm for me, so you can try it too!

I hope it is helpful to anyone who comes across this strange problem.

Answer 8

wenzhang

Jul ’22

A workaround is to uninstall tensorflow-metal: pip uninstall tensorflow-metal

Answer 9

SACHIN_24063

Jul ’22

history = model.fit(
    train_ds,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    verbose=1,
    validation_data=val_ds,
    use_multiprocessing=True,
)

I used this code and am getting this error on my MacBook Air M1 (2020):

InvalidArgumentError                      Traceback (most recent call last)
Input In [57], in <cell line: 1>()
----> 1 history=model.fit(
      2     train_ds,
      3     epochs=EPOCHS,
      4     batch_size=BATCH_SIZE,
      5     verbose=1,
      6     validation_data=val_ds,
      7     use_multiprocessing=True
      8 )

File ~/opt/miniconda3/envs/tensorflow/lib/python3.9/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     65 except Exception as e:  # pylint: disable=broad-except
     66   filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67   raise e.with_traceback(filtered_tb) from None
     68 finally:
     69   del filtered_tb

File ~/opt/miniconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     52 try:
     53   ctx.ensure_initialized()
---> 54   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     55                                       inputs, attrs, num_outputs)
     56 except core._NotOkStatusException as e:
     57   if name is not None:

InvalidArgumentError: Cannot assign a device for operation sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int/RngReadAndSkip: Could not satisfy explicit device specification '' because the node {{colocation_node sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int/RngReadAndSkip}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
RngReadAndSkip: CPU
_Arg: GPU CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_5_sequential_4_random_flip_1_stateful_uniform_full_int_rngreadandskip_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int/RngReadAndSkip (RngReadAndSkip)
  sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int_1/RngReadAndSkip (RngReadAndSkip)

	 [[{{node sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int/RngReadAndSkip}}]] [Op:__inference_train_function_11336]

Answer 10

PlatiniumOwl

Dec ’22

Solved: A simple workaround is to uninstall tensorflow-metal. It is not perfect for M1 Macs and is still under development. It may also speed up your processing, since tensorflow-metal is not crafted to utilise the full capacity of the GPU. The errors you are getting are due to shortcomings of the tensorflow-metal plugin.

pip uninstall tensorflow-metal

And don't forget to update tensorflow-macos to the latest version (2.11.0 at the time of writing):

pip install --upgrade tensorflow-macos

Prerequisites:

Python 3.8-3.10
macOS version 12.0 or later
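The prerequisites above can be checked from Python itself. A small stdlib-only sketch, assuming the bounds exactly as stated in this post (Python 3.8 to 3.10, macOS 12.0 or later); other tensorflow-macos releases may have different requirements, and meets_prerequisites is a hypothetical helper name:

```python
import platform
import sys


def macos_major(version):
    """Major version from a string like '12.3.1', or None if it cannot be parsed."""
    head = version.split(".")[0]
    return int(head) if head.isdigit() else None


def meets_prerequisites(python_version, macos_version):
    """Python 3.8-3.10 and macOS 12.0 or later, as listed in the post."""
    py_ok = (3, 8) <= tuple(python_version[:2]) <= (3, 10)
    major = macos_major(macos_version)
    mac_ok = major is not None and major >= 12
    return py_ok and mac_ok


if __name__ == "__main__":
    # platform.mac_ver()[0] is an empty string when not running on macOS.
    ok = meets_prerequisites(sys.version_info[:2], platform.mac_ver()[0])
    print("prerequisites met" if ok else "prerequisites not met")
```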

Hope it helps!!

Answer 11

AmbABC

May ’23

I had the same error on an M2 Mac, but none of the above suggestions worked for me. The error has to do with the data_augmentation layer of the Keras Sequential model.

Sample code producing the error:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

data_augmentation = keras.Sequential(
    [
        layers.experimental.preprocessing.RandomFlip("horizontal"),
        layers.experimental.preprocessing.RandomRotation(0.1),
        layers.experimental.preprocessing.RandomZoom(0.1),
    ]
)
model_1 = tf.keras.models.Sequential([
  data_augmentation,
  tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3).......
.......
.......
.....

Instead, I moved the data augmentation to the image generator:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255, # scale pixel values to [0, 1]
                                   rotation_range=0.2, # in degrees: rotates by at most 0.2 degrees
                                   zoom_range=0.2, # zoom in or out by up to 20%
                                   width_shift_range=0.2, # shift horizontally by up to 20% of the width
                                   height_shift_range=0.2, # shift vertically by up to 20% of the height
                                   shear_range=0.2, # shear angle in degrees (at most 0.2 degrees here)
                                   horizontal_flip=True # randomly flip the image horizontally
                                   )

Building the model:

model_1 = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(filters=32, 
                         kernel_size=(3,3)....
.......
......
.....

Answer 12

drclearly

Jun ’23

Another Slant

I encountered this problem when I was trying to run example code from a notebook on Tensorflow-Probability: https://www.tensorflow.org/probability/examples/Probabilistic_Layers_Regression

Specifically, Case 5: Functional Uncertainty includes an item not previously seen in the notebook:

tf.keras.backend.set_floatx('float64')

If I change that to

tf.keras.backend.set_floatx('float32')

Then the notebook runs. Not well: the numerical instability that prompted the authors to use float64 does occur. But it runs.

I don't know what this means, but it seems the missing ResourceApplyAdamWithAmsgrad is not the issue here.
