Memory Leak Using TensorFlow-Metal

Hi,

I've found a memory leak when using the tensorflow-metal plugin to run a deep learning model on a Mac with the M1 chip. Here are the details of my system:

System Information

  • MacOS version: 13.4

  • TensorFlow (macos) version: 2.12.0, 2.13.0-rc1, tf-nightly==2.14.0.dev20230616

  • TensorFlow-Metal Plugin Version: 0.8, 1.0.0, 1.0.1

Model Details

I've implemented a custom model architecture using TensorFlow's Keras API. The model has a dynamic input, and the images are resized by a Resizing layer. The data is passed to the model through a data generator class via model.fit().
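
To illustrate what is meant here by a data generator class, below is a minimal sketch along the lines of a tf.keras.utils.Sequence subclass. The actual generator is part of the private project code, so every name and detail in this sketch is only illustrative.

import numpy as np
import tensorflow as tf

# Illustrative sketch of a "data generator class": a keras.utils.Sequence
# that yields batches of variable-sized images as ragged tensors.
# The real generator is private; names and details here are made up.
class ImageSequence(tf.keras.utils.Sequence):
    def __init__(self, images, labels, batch_size):
        self.images, self.labels, self.batch_size = images, labels, batch_size

    def __len__(self):
        # Number of batches per epoch.
        return int(np.ceil(len(self.images) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        batch = tf.ragged.constant(self.images[sl], dtype=tf.float32)
        return batch, np.asarray(self.labels[sl], dtype=np.float32)

# e.g. model.fit(ImageSequence(train_images, train_labels, batch_size=8), epochs=10)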

Problem Description

When I train this model on the GPU of the M1 Mac, I observe a continuous increase in memory usage, i.e. a memory leak. The increase is more pronounced with larger image inputs. For smaller or average-sized images (1024x128), the increase is smaller but still continuous, and it still leads to a memory leak after several epochs.

On the other hand, when I switch to the CPU for training (tf.config.set_visible_devices([], 'GPU')), the memory leak disappears and I observe normal memory usage. I've also tested the model with different image sizes and various layer configurations; the leak appears only when using the GPU.

I hope this information is helpful in identifying and resolving the issue. If you need any further details, please let me know. The project code is private, but I can try to provide pseudocode if necessary.

Hi @arthurflor23!

Based on the description, my hunch would be that this is related to the dynamic input size interacting with the caching behaviour of the GPU kernels used to implement the ops. More concretely, some op may end up being cached separately for each input size, so the memory footprint keeps expanding over the runtime of the program. Another possibility is that something unexpected is going on in the data generator class implementation, but I would need more details on that to dig further.
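
To make the caching hypothesis concrete: if kernels are cached per input shape, then bounding the number of distinct shapes the model ever sees, for example by padding each image up to one of a few fixed bucket sizes before feeding it in, should also bound the cache. The sketch below is only illustrative; BUCKET_SIZES and pad_to_bucket are made-up names, not part of the tensorflow-metal plugin.

import tensorflow as tf

# Illustrative workaround sketch: pad every image up to one of a few fixed
# "bucket" sizes so the GPU kernels only ever see a bounded set of shapes.
# BUCKET_SIZES and pad_to_bucket are hypothetical names for this sketch.
BUCKET_SIZES = [128, 256, 512, 1024]

def pad_to_bucket(image):
    """Zero-pad an [H, W, C] image up to the smallest bucket that fits it."""
    h = tf.shape(image)[0]
    w = tf.shape(image)[1]
    buckets = tf.constant(BUCKET_SIZES)
    # Longer side of the image, clipped to the largest bucket.
    longest = tf.minimum(tf.maximum(h, w), BUCKET_SIZES[-1])
    # Smallest bucket edge that still fits the longer side.
    target = tf.reduce_min(tf.boolean_mask(buckets, buckets >= longest))
    return tf.image.pad_to_bounding_box(image, 0, 0, target, target)

print(pad_to_bucket(tf.ones([300, 500, 1])).shape)  # -> (512, 512, 1)
# e.g. dataset = dataset.map(lambda img, label: (pad_to_bucket(img), label))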

If I understand correctly, the shapes in the layers of your model after the resizing layer stay constant throughout the runtime of the program? If the caching is to blame, that would point to the behaviour of the resize op. We'll start looking from there, but it would help us confirm that we have solved your issue if you can reproduce this behaviour with a small dummy script using randomly generated data that demonstrates the creeping memory usage you mentioned. It would be great if the example also included the data generator, in case the generator performs some preprocessing that might explain the unexpected memory usage.

Hey,

Thank you for your detailed response.

I've prepared two standalone scripts that try to replicate the issue using randomly generated data. In both scripts, I generate synthetic data of varying sizes to mirror the dynamic input size scenario in my actual project. This synthetic data is then passed through a Keras model that includes a tf.keras.layers.Resizing layer.

The first script uses tf.data.Dataset to feed the model, while the second one utilizes a generator function to yield the data in batches.

Interestingly, the memory issue seems to occur only in the script that uses tf.data.Dataset (memory fills up) and does not seem to occur with the generator (~1.5 GB of memory). However, in my actual code, where I use the generator approach, I do observe the memory issue (usage grows beyond the available memory). Furthermore, the issue is absent when using the CPU or an Nvidia GPU (via Google Colab), which both stay below 1.5 GB of memory. You can find the two scripts below.

Script using tf.data.Dataset:

import numpy as np
import tensorflow as tf

# # Use cpu test memory
# tf.config.set_visible_devices([], 'GPU')


def generate_data(num_samples, max_size):
    """Generate synthetic data of varying sizes"""
    data = []
    labels = []
    for _ in range(num_samples):
        size = np.random.randint(1, max_size+1)
        data.append(np.ones((size, size)) * 255)  # Example of image
        labels.append(np.random.randint(0, 2))  # binary classification for simplicity
    return data, labels

class DynamicResizeModel(tf.keras.Model):
    """A model that includes a resizing layer"""
    def __init__(self, target_size):
        super().__init__()
        self.target_size = target_size
        self.expand_dims = tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, -1))
        self.resize = tf.keras.layers.Resizing(*target_size)
        self.flatten = tf.keras.layers.Flatten()
        self.dense = tf.keras.layers.Dense(1, activation='sigmoid')

    def call(self, inputs):
        x = self.expand_dims(inputs)
        x = self.resize(x)
        x = self.flatten(x)
        return self.dense(x)


# Generate training data
train_data, train_labels = generate_data(100, 1024) # you can adjust these parameters as needed

# Convert the variable-sized data to ragged tensors
train_data = tf.ragged.constant(train_data)
train_labels = tf.constant(train_labels)

# Prepare a dataset
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_labels))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(8) # adjust batch size as needed

# Create and train the model
model = DynamicResizeModel(target_size=(128, 32)) # resize all inputs to 128x32
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_dataset, epochs=1000)

Script using a generator:

import numpy as np
import tensorflow as tf

# # Use cpu test memory
# tf.config.set_visible_devices([], 'GPU')


def generate_data(num_samples, max_size):
    """Generate synthetic data of varying sizes"""
    data = []
    labels = []
    for _ in range(num_samples):
        size = np.random.randint(1, max_size+1)
        data.append(np.ones((size, size)) * 255)  # Example of image
        labels.append(np.random.randint(0, 2))  # binary classification for simplicity
    return data, labels

def data_generator(data, labels, batch_size):
    """Create a generator that returns batches of data"""
    num_samples = len(data)
    indices = np.arange(num_samples)
    while True:
        for i in range(0, num_samples, batch_size):
            batch_indices = indices[i:i+batch_size]
            batch_data = tf.ragged.constant([data[idx] for idx in batch_indices], dtype=tf.float32)
            batch_labels = np.array([labels[idx] for idx in batch_indices], dtype=np.float32)
            yield batch_data, batch_labels
        np.random.shuffle(indices)

class DynamicResizeModel(tf.keras.Model):
    """A model that includes a resizing layer"""
    def __init__(self, target_size):
        super().__init__()
        self.target_size = target_size
        self.expand_dims = tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, -1))
        self.resize = tf.keras.layers.Resizing(*target_size)
        self.flatten = tf.keras.layers.Flatten()
        self.dense = tf.keras.layers.Dense(1, activation='sigmoid')

    def call(self, inputs):
        x = self.expand_dims(inputs)
        x = self.resize(x)
        x = self.flatten(x)
        return self.dense(x)

# Generate training data
num_samples = 100  # Total number of samples in your dataset
max_size = 1024  # Maximum size of matrix
train_data, train_labels = generate_data(num_samples, max_size)

# Create and train the model
model = DynamicResizeModel(target_size=(1024, 128))  # resize all inputs to 1024, 128
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Set the parameters
batch_size = 8  # Number of samples per batch

# Create a generator
train_generator = data_generator(train_data, train_labels, batch_size)

# Use fit to train the model
model.fit(train_generator, steps_per_epoch=num_samples // batch_size, epochs=1000)
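
For reference, the memory figures quoted above were read from the system monitor. A small callback along these lines could log the process's resident memory per epoch in either script; it assumes psutil is installed and is not part of the original scripts.

import os
import psutil  # assumed available; used only for logging
import tensorflow as tf

class MemoryLogger(tf.keras.callbacks.Callback):
    """Log the resident memory of the training process after every epoch."""
    def on_epoch_end(self, epoch, logs=None):
        rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3
        print(f"epoch {epoch}: resident memory ~{rss_gb:.2f} GB")

# e.g. model.fit(train_dataset, epochs=1000, callbacks=[MemoryLogger()])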

Given these findings, I would like to understand if this behavior is expected with the tensorflow-metal plugin or if it is indeed an anomaly. If it's the former, could you provide guidance on optimizing my code to prevent the memory issue while using tensorflow-metal?

Looking forward to your insights.

Just to reinforce: training currently starts at about 7 GB of memory usage, and by the end of training it hits 100 GB+ (getting slower and slower because of swap). On Google Colab (Nvidia GPU), it worked perfectly without this excessive memory usage.

I have the same problem with my M2 chip. I also use TensorFlow's cleanup function and explicit garbage collection from Python, but my memory usage grows completely uncontrolled and it crashes after 2000 iterations. This is super frustrating: my training gets interrupted and there is no way to make it work, even if I try to optimize the code. I think Apple should fix these compatibility issues with TensorFlow :(

I have the same problem. Apple M2, macOS 13.4.1 (c), tensorflow 2.14.0, tensorflow-metal 1.1.0.

Memory seems to grow by 10 GB per iteration even if I release everything on each iteration and call gc.collect() afterwards, using this model: https://github.com/google-research/frame-interpolation

The memory is released when the process is terminated, but this makes it impossible to run large models using tensorflow-metal. Everything works fine when using the CPU.
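
For reference, the cleanup pattern mentioned above is roughly the following; the model and data here are small stand-ins rather than the frame-interpolation model, and per the reports this pattern does not stop the memory growth under tensorflow-metal.

import gc
import numpy as np
import tensorflow as tf

# Minimal, self-contained stand-in for the cleanup pattern described above:
# everything created in an iteration goes out of scope, Keras' global state is
# cleared, and Python garbage collection is forced between iterations.
def run_once(batch):
    model = tf.keras.Sequential([tf.keras.layers.Dense(8)])
    return model(batch, training=False).numpy()

for step in range(10):
    batch = np.random.rand(4, 16).astype(np.float32)
    _ = run_once(batch)               # per-iteration objects go out of scope here
    tf.keras.backend.clear_session()  # reset Keras' global graph/state
    gc.collect()                      # force Python garbage collection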

