Huge memory leakage issue with tf.keras.models.predict()

Comparison between a Mac Studio M1 Ultra (20-core CPU, 64-core GPU, 128GB RAM) and a 2017 Intel i5 MacBook Pro (16GB RAM) on the subject issue, i.e. memory leakage while using tf.keras.models.predict() on a saved model, on both machines:

MBP-2017: The first prediction takes around 10MB; subsequent calls ~0-1MB each.

MACSTUDIO-2022: The first prediction takes around 150MB; subsequent calls ~70-80MB each.

After, say, 10000 such calls to predict(), my MBP's memory usage stays under 10GB, while the Mac Studio climbs to ~80GB (and keeps climbing with more calls).

Even calling keras.backend.clear_session() after each predict() call on the Mac Studio did not help.
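For reference, the loop in question looks roughly like this (the tiny model and random data below are stand-ins for the actual saved model and inputs):

```python
import numpy as np
import tensorflow as tf

# Stand-in model; the real workload loads a saved model instead.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
data = np.random.rand(1, 4).astype("float32")

for _ in range(3):  # the real workload runs ~10000 iterations
    preds = model.predict(data, verbose=0)
    # Resets Keras's global graph state; per the thread, this did
    # not stop the per-call memory growth on the Mac Studio.
    tf.keras.backend.clear_session()
```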

Can anyone with insight into tensorflow-metal and/or the Mac M1 machines help?

Thanks, Bapi

  • Moreover, when I turn on multiprocessing in the predict function as below, it only engages 4 cores (seen in Activity Monitor's CPU history). The rest of the cores are idle!

    model.predict(data, max_queue_size=10, workers=8, use_multiprocessing=True)
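Note that `workers`/`use_multiprocessing` only parallelize data loading when the input is a generator or `keras.utils.Sequence`; they do not spread the forward pass itself across cores, which would explain why most cores stay idle. Separately, the `Model.predict` docs suggest that for small inputs in a tight loop, calling the model directly is lighter-weight than `predict()`. A minimal sketch of that workaround (the model and data here are placeholders):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
x = np.random.rand(1, 4).astype("float32")

# model(x) skips predict()'s per-call data pipeline and callback
# machinery, which is the usual recommendation for repeated
# single-batch inference.
y = model(x, training=False).numpy()
```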


Replies

I am facing the same issue. A huge memory leak (20GB+) quickly builds up when fitting a model under the default settings. The leak only goes away when the device is set to CPU. That is not a viable workaround, though - the GPU is an important part of the value proposition of Apple silicon. Please address this ASAP!

  • When I started this thread almost 3 months ago, I thought they would address the issue (that seemed apparent from the enthusiastic comments of a dev engineer). Now it looks like either they do not have the engineering resources to fix it, or they quickly realised that managing TensorFlow is not their cup of tea (getting to the level of Google's TF engineers is a mammoth task). Grossly disappointed after spending ~$8K on an M1 Ultra machine (apparently the hype does not always hold up) for TF workloads.

  • Update from me!

    I am fed up with tensorflow-macos/tensorflow-metal and have migrated to PyTorch 1.13 (also tried a 1.14 dev build) in a Python 3.9/3.10 env. At least my training now runs with MUCH, MUCH less memory while using the GPU (60-75% utilisation depending on the data) on my M1 Ultra machine with the 64-core GPU. I will try Python 3.11 once PyTorch supports it and update you all.

    Thanks, Bapi


I noticed the memory increase comes mainly from swap. I compared keras.fit across different input image sizes: memory barely grew for small images but grew a lot for larger ones. Maybe it is related to how macOS manages swap?

For me, the leak on M2 was magically solved by:

import tensorflow as tf

tf.config.set_visible_devices([], 'GPU')

  • Hi, could you be more specific about the device you are using? Laptop (MBP/MBA) or Mac mini? And the detailed spec, if possible? Thanks

  • I think this only "solves" the leak because it disables GPU acceleration rather than fixing anything. Try it, then check your GPU usage to make sure it is actually doing what you expect.
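One way to check, as suggested, is to list the logical devices TensorFlow can actually see after hiding the GPU (a minimal sketch):

```python
import tensorflow as tf

# Hide all GPUs from TensorFlow; this must run before the GPUs
# are initialized (i.e. before any ops or device listing).
tf.config.set_visible_devices([], 'GPU')

# Confirm what TensorFlow can actually see now.
visible = tf.config.list_logical_devices('GPU')
print(visible)  # an empty list means TensorFlow will run on CPU only
```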
