Dear All Developers,
It is great that we finally have TF-macOS and TF-Metal for GPU/NPU acceleration. After some tests, everything appears to work well.
So I am wondering whether it is possible to solve NLP tasks with HuggingFace using TF-Metal for GPU acceleration. To find out, I installed all the packages needed and ran the test code.
Here is what I got. So far so good, right?
However, an error pops up when I attempt to fine-tune a BERT model.
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
RealDiv: GPU CPU
Sqrt: GPU CPU
UnsortedSegmentSum: CPU
AssignVariableOp: GPU CPU
AssignSubVariableOp: GPU CPU
ReadVariableOp: GPU CPU
StridedSlice: GPU CPU
NoOp: GPU CPU
Mul: GPU CPU
Shape: GPU CPU
_Arg: GPU CPU
ResourceScatterAdd: GPU CPU
Unique: CPU
AddV2: GPU CPU
ResourceGather: GPU CPU
Const: GPU CPU
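Most ops in the list above support both GPU and CPU, but a couple list CPU only, which may be relevant to the placement failure. A minimal sketch in plain Python (no TensorFlow needed) that picks those ops out of the pasted debug info; the helper name `cpu_only_ops` and the `DEBUG_INFO` constant are made up for illustration:

```python
import re

# Op list copied from the colocation debug info above.
DEBUG_INFO = (
    "RealDiv: GPU CPU Sqrt: GPU CPU UnsortedSegmentSum: CPU "
    "AssignVariableOp: GPU CPU AssignSubVariableOp: GPU CPU "
    "ReadVariableOp: GPU CPU StridedSlice: GPU CPU NoOp: GPU CPU "
    "Mul: GPU CPU Shape: GPU CPU _Arg: GPU CPU ResourceScatterAdd: GPU CPU "
    "Unique: CPU AddV2: GPU CPU ResourceGather: GPU CPU Const: GPU CPU"
)


def cpu_only_ops(debug_text):
    """Return the op names whose supported device list is exactly CPU."""
    # Each entry looks like "<OpName>: <device> [<device> ...]".
    pattern = r"(\w+): ((?:GPU|CPU)(?: (?:GPU|CPU))*)"
    ops = []
    for match in re.finditer(pattern, debug_text):
        name, devices = match.group(1), match.group(2).split()
        if devices == ["CPU"]:
            ops.append(name)
    return ops


print(cpu_only_ops(DEBUG_INFO))  # → ['UnsortedSegmentSum', 'Unique']
```

So, going only by this log, `UnsortedSegmentSum` and `Unique` are the two ops in the group without a GPU entry.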
It looks like the GPU is not assigned correctly, so I checked whether TensorFlow detects the GPU. Here is the GPU info from TensorFlow.
WARNING:tensorflow:From <ipython-input-2-17bb7203622b>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
WARNING:tensorflow:From <ipython-input-2-17bb7203622b>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-06-29 01:56:25.862829: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-06-29 01:56:25.862893: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Out[2]: True
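As the deprecation warning suggests, the same check can be done with `tf.config.list_physical_devices('GPU')`. A minimal sketch, assuming TensorFlow is installed; the import guard and the `gpus` variable are only there for illustration, so that the snippet still runs on a machine without TensorFlow:

```python
try:
    import tensorflow as tf
except ImportError:  # assumption: degrade gracefully if TensorFlow is missing
    tf = None

# Non-deprecated replacement for is_gpu_available(),
# as recommended by the warning above.
gpus = tf.config.list_physical_devices('GPU') if tf is not None else []

print("GPU detected:", len(gpus) > 0)
for gpu in gpus:
    # Each entry is a PhysicalDevice with a name and a device type.
    print(gpu.name, gpu.device_type)
```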
Obviously, the problem comes from HuggingFace. I know that Apple is not responsible for packages other than TF-macOS and TF-Metal; I am just curious whether anyone here has a solution.
Sincerely,
hawkiyc