The Metal Performance Shaders operations encoded on it may not have completed.

TensorFlow Metal was working on my MacBook Pro (M3 Pro) until yesterday, when my code started freezing. I ran the test script from https://developer.apple.com/metal/tensorflow-plugin/ and it now crashes. This used to work fine, but all of a sudden it does not. The results are shown below. Has anyone seen anything like this? Could this be a hardware problem?

MacBook-Pro-3: carl$ python mac_tensorflow_test.py 
Epoch 1/5
  1/782 [..............................] - ETA: 51:53 - loss: 6.0044 - accuracy: 0.0312Error: command buffer exited with error status.
	The Metal Performance Shaders operations encoded on it may not have completed.
	Error: 
	(null)
	Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
	<AGXG15XFamilyCommandBuffer: 0x1172515e0>
    label = <none> 
    device = <AGXG15SDevice: 0x1588e6000>
        name = Apple M3 Pro 
    commandQueue = <AGXG15XFamilyCommandQueue: 0x17427e400>
        label = <none> 
        device = <AGXG15SDevice: 0x1588e6000>
            name = Apple M3 Pro 
    retainedReferences = 1
Error: command buffer exited with error status.
	The Metal Performance Shaders operations encoded on it may not have completed.
	Error: 
	(null)
	Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
	<AGXG15XFamilyCommandBuffer: 0x117257b40>
    label = <none> 
    device = <AGXG15SDevice: 0x1588e6000>
        name = Apple M3 Pro 
    commandQueue = <AGXG15XFamilyCommandQueue: 0x17427e400>
        label = <none> 
        device = <AGXG15SDevice: 0x1588e6000>
            name = Apple M3 Pro 
    retainedReferences = 1

Many more rows of similar printouts follow.

Replies

I have fixed this with two changes:

  • Python 3.8 rather than 3.9 (specifically 3.8.18, which is the latest 3.8 release at this time)
  • pandas 1.5.3 rather than 2.x

As a result of this I'm on the following TensorFlow package versions:

  • tensorboard==2.13.0
  • tensorboard-data-server==0.7.2
  • tensorflow==2.13.0
  • tensorflow-datasets==4.9.2
  • tensorflow-estimator==2.13.0
  • tensorflow-macos==2.13.0
  • tensorflow-metadata==1.14.0
  • tensorflow-metal==1.0.1
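
If you want to compare your own environment against these versions, here is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8). The EXPECTED table and the helper names installed_version and find_mismatches are my own for illustration; the version numbers are the ones from this post. It only reports differences and does not install or change anything.

```python
from importlib import metadata  # standard library in Python 3.8+

# Versions reported as working in this post (a subset; extend as needed).
EXPECTED = {
    "tensorflow": "2.13.0",
    "tensorflow-macos": "2.13.0",
    "tensorflow-metal": "1.0.1",
    "tensorflow-estimator": "2.13.0",
    "tensorboard": "2.13.0",
    "pandas": "1.5.3",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Run it inside the same virtual environment you use for TensorFlow. No output means every listed package matches.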

With these versions everything works. I still have no idea why Python 3.9 stopped working after working fine for months, but I wasn't particularly attached to it.