Memory „Leak“ when using cpu+gpu

My app allows the user to select different Stable Diffusion models, and I noticed a very strange issue concerning memory management. When using the StableDiffusionPipeline (https://github.com/apple/ml-stable-diffusion) with cpu+gpu, around 1.5 GB of memory is not properly released after generateImages is called and the pipeline is released. When generating more images with a new StableDiffusionPipeline object, that memory is reused and stays stable at around 1.5 GB after inference is complete. Everything, in particular the MLModels, is released properly; my guess is that MLModel creates some kind of persistent cache.

Here is the problem: when a different MLModel is used afterwards, another 1.5 GB is not released and stays resident. With a third model, this totals 4.5 GB of unreleased, persistent memory.

At first I thought this was a bug in the StableDiffusionPipeline – but I was able to reproduce the behaviour in a very minimal Objective-C sample without ARC:

#import <CoreML/CoreML.h>

NSError *error = nil;

// Batch provider wrapping a single (valid) feature provider for the model
MLArrayBatchProvider *batchProvider = [[MLArrayBatchProvider alloc] initWithFeatureProviderArray:@[<VALID FEATURE PROVIDER>]];

// Run on CPU + GPU, the configuration that shows the issue
MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
config.computeUnits = MLComputeUnitsCPUAndGPU;

// Load the compiled model (the factory method returns an autoreleased object, hence the retain)
MLModel *model = [[MLModel modelWithContentsOfURL:[NSURL fileURLWithPath:<VALID PATH TO .mlmodelc SD 1.5 FILE>] configuration:config error:&error] retain];

// Run a single batch prediction
id<MLBatchProvider> returnProvider = [model predictionsFromBatch:batchProvider error:&error];

[model release];
[config release];
[batchProvider release];

After running this minimal code, 1.5 GB of persistent memory remains and is never released during the lifetime of the app. This only happens on macOS 14(.1) Sonoma and on iOS 17(.1), but not on macOS 13 Ventura. On Ventura, everything works as expected: the memory is released once predictionsFromBatch: is done and the model is released.
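One way to quantify the residual memory is to read the process's physical footprint (roughly what Xcode's memory gauge reports) right before creating the batch provider and again after the final release. Here is a minimal helper for that, as a sketch using the Mach task_info API (TASK_VM_INFO) with error handling kept to a minimum:

#import <mach/mach.h>

// Returns the current physical memory footprint of the process in bytes,
// or -1 if the Mach call fails.
static int64_t currentPhysFootprint(void) {
    task_vm_info_data_t info;
    mach_msg_type_number_t count = TASK_VM_INFO_COUNT;
    kern_return_t kr = task_info(mach_task_self(), TASK_VM_INFO,
                                 (task_info_t)&info, &count);
    return (kr == KERN_SUCCESS) ? (int64_t)info.phys_footprint : -1;
}

Calling this once before the snippet above and once after the final release should show the roughly 1.5 GB delta on Sonoma, while on Ventura the footprint returns close to the baseline.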

Some observations:

  • This only happens with cpu+gpu, not with cpu+ane (since that memory is allocated out of process) and not with cpu-only
  • It does not matter which Stable Diffusion model is used; I tried custom SD-derived models as well as the Apple-provided SD 1.5 models (see the sketch below this list)
  • I reproduced the issue on a MBP 16" M1 Max with macOS 14.1, an iPhone 12 mini with iOS 17.0.3 and an iPad Pro M2 with iPadOS 17.1
  • The memory that "leaks" consists mostly of huge malloc blocks of 100-500 MB in size, or of IOSurfaces
  • This memory is allocated during predictionsFromBatch:, not while loading the model
  • Loading and unloading a model does not leak memory – only when predictionsFromBatch: is called is the huge memory chunk allocated, and it is never freed during the lifetime of the app
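To make the per-model accumulation concrete, here is a sketch of a driver that switches between two models, as referenced in the list above. It assumes the same non-ARC setting as the sample, reuses the currentPhysFootprint() helper from above, and uses placeholder model paths and a placeholder feature provider; the function names are only for illustration. On Sonoma each distinct .mlmodelc should leave its own ~1.5 GB behind, while repeating the same model does not grow the footprint further.

#import <CoreML/CoreML.h>

// currentPhysFootprint() is the helper sketched further up in this post.

static void runOneBatchPrediction(NSString *modelPath, id<MLFeatureProvider> features) {
    NSError *error = nil;

    MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
    config.computeUnits = MLComputeUnitsCPUAndGPU;

    MLModel *model = [[MLModel modelWithContentsOfURL:[NSURL fileURLWithPath:modelPath]
                                        configuration:config
                                                error:&error] retain];

    MLArrayBatchProvider *batch = [[MLArrayBatchProvider alloc] initWithFeatureProviderArray:@[features]];
    [model predictionsFromBatch:batch error:&error];

    [batch release];
    [model release];
    [config release];
}

static void reproduceAccumulation(id<MLFeatureProvider> features) {
    NSArray<NSString *> *modelPaths = @[<PATH TO FIRST .mlmodelc>, <PATH TO SECOND .mlmodelc>];
    for (NSString *path in modelPaths) {
        // Drain any autoreleased objects before measuring, to rule them out
        @autoreleasepool {
            runOneBatchPrediction(path, features);
        }
        NSLog(@"footprint after %@: %.1f MB",
              path.lastPathComponent, currentPhysFootprint() / (1024.0 * 1024.0));
    }
}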

Does anybody have any clue what is going on? I highly suspect that I am missing something crucial, but my colleagues and I have looked everywhere trying to find a way to release this leaked/cached memory.

To further illustrate the issue, here is an annotated screenshot of the memory graph:

I am just bumping this because the issue is still present on macOS 14.3, and I really need a solution. Wasting 1.5-6 GB of memory each time a model is changed makes diffusion models impractical, but as the OP said, this worked fine on macOS 13.
