TensorFlow-macos on M1 leads to a memory leak

I prepared a Python 3.9 environment with tensorflow-macos (2.9.0, 2.9.1) and tensorflow-metal (0.4.0, 0.5.1) to enable M1 GPU acceleration. The program architecture is a tf.keras model trained inside a Celery (5.2.7) worker created with the gevent (21.8.0) pool. After training, the model is converted to a Core ML model (coremltools 6.0) and a TFLite model. Based on experiments, there are two kinds of memory leak on two different M1 Macs.
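For context, the worker task is structured roughly like this. This is only a minimal sketch: the task name train_and_convert, the toy Dense model, the random data, and the Redis broker URL are placeholders, not the real code.

```python
import numpy as np
import tensorflow as tf
import coremltools as ct
from celery import Celery

app = Celery("trainer", broker="redis://localhost:6379/0")  # broker URL is assumed

@app.task
def train_and_convert(epochs: int = 100) -> None:
    # Tiny stand-in model and data; the real task trains a much larger tf.keras model.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    x = np.random.rand(1024, 32).astype("float32")
    y = np.random.rand(1024, 1).astype("float32")
    model.fit(x, y, epochs=epochs, verbose=0)

    # Convert to Core ML and TFLite after training, as described above.
    ct.convert(model).save("model.mlmodel")
    tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()
    with open("model.tflite", "wb") as f:
        f.write(tflite_bytes)
```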

1. Memory grew steadily from 7GB to 80GB during a 300-epoch training run.

2. Memory grew slowly from 7GB to 11GB during a 100-epoch training run, then grew to 13GB after model conversion. We added tf.keras.backend.clear_session() at the end of each task, expecting the memory to be released after the Celery worker task finished, but the memory was still retained when the next task was published to the worker (see the sketch after this list).
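For reference, the cleanup we added at the end of each task looks roughly like this (a minimal sketch; _release_tf_memory is just a placeholder name, and the explicit gc.collect() is our addition):

```python
import gc
import tensorflow as tf

def _release_tf_memory() -> None:
    # Drop Keras' global graph/session state and force a garbage-collection
    # pass. Called at the end of every Celery task, but on tensorflow-macos
    # + tensorflow-metal the resident memory is still not returned to the OS.
    tf.keras.backend.clear_session()
    gc.collect()
```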

I have seen some posts about memory leaks on M1. Are there any suggestions for dealing with this leak? I know a workaround may be to run the Celery worker with the prefork pool and set a GPU memory growth limit, but my code cannot be moved over directly because of GPU resource context issues. Is it practical to run tf.keras training in threads on the M1?
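In case it helps the discussion, the workaround I am referring to would look roughly like the following. This is only a sketch under my assumptions: the prefork pool with --max-tasks-per-child recycles the worker process after each task, and set_memory_growth is the standard TensorFlow setting; I have not verified how tensorflow-metal behaves with it.

```python
# Run the worker with the prefork pool so each task can get a fresh process:
#   celery -A trainer worker --pool=prefork --concurrency=1 --max-tasks-per-child=1
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory on demand instead of up front.
# This must run before any model or tensor touches the GPU.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```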

  • I also met a similar issue, using Python 3.9, tensorflow-macos 2.9.0 and tensorflow-metal 0.4. When training a YOLOv3 model, memory rises to 80GB.
