M1 MBA tensorflow-metal LSTM Model Training Extremely Slow, Fails to Learn

I'm training a basic model on an M1 MacBook Air with tensorflow-metal 0.7.0 and tensorflow-macos 2.11 installed, using Python 3.10 on macOS 13.2.1.
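
For reference, the exact versions can be confirmed from within Python. A quick sketch, assuming the pip distribution names tensorflow-macos and tensorflow-metal:

import platform
from importlib.metadata import version
import tensorflow as tf

# Report the installed package versions plus the interpreter/OS versions.
print('tensorflow-macos:', version('tensorflow-macos'))
print('tensorflow-metal:', version('tensorflow-metal'))
print('tf.__version__:', tf.__version__)
print('Python:', platform.python_version(), 'macOS:', platform.mac_ver()[0])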

CPU-based training runs as expected with about 10 s/epoch on this model.

However, GPU-based training is orders of magnitude slower and doesn't learn.
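
A minimal sketch of how the CPU-only baseline can be obtained, assuming the Metal GPU is simply hidden from TensorFlow before any ops are created:

import tensorflow as tf

# Hide the Metal GPU so the identical script runs CPU-only.
tf.config.set_visible_devices([], 'GPU')
print(tf.config.get_visible_devices())  # should now list only the CPU device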

Here's a model that generates Irish poetry, based on the example at https://github.com/susanli2016/Natural-Language-Processing-in-TensorFlow/blob/master/Irish%20Lyrics%20generated%20poetry.ipynb.

As noted, CPU training on this dataset takes about 10 s/epoch. With GPU training, the ETA is over 2.5 hours at a batch size of 32, many minutes at a batch size of 2048, and about 20 s when the batch size equals the length of the training data. Worse, GPU training does not actually learn: accuracy never increases.
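
The ETA figures come from the Keras progress bar (verbose=1). A small callback along these lines (illustrative only; EpochTimer is my own name and not part of the script below) gives the same numbers as wall-clock seconds per epoch:

import time
import tensorflow as tf

class EpochTimer(tf.keras.callbacks.Callback):
    # Print wall-clock seconds per epoch so CPU and GPU runs can be compared directly.
    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        print(f'epoch {epoch}: {time.perf_counter() - self._start:.1f} s')

# usage: model.fit(xs, ys, epochs=100, batch_size=batch_size, callbacks=[EpochTimer()])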

import numpy as np
import os
import platform
import subprocess
import tensorflow as tf
from textwrap import wrap

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import regularizers
import tensorflow.keras.utils as ku 

from tensorflow.python.framework.ops import disable_eager_execution, enable_eager_execution

# disable eager execution (graph mode); intended to make training use the GPU
disable_eager_execution()
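
# Optional sanity check (not in the original notebook): confirm the Metal GPU is visible.
print('GPUs:', tf.config.list_physical_devices('GPU'))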

irish_lyrics_file = '/tmp/irish-lyrics-eof.txt'
irish_lyrics_url = 'https://raw.githubusercontent.com/AliAkbarBadri/nlp-tf/master/irish-lyrics-eof.txt'

if not os.path.isfile(irish_lyrics_file):
    subprocess.run(["curl", "-L", irish_lyrics_url, "-o", irish_lyrics_file])

with open(irish_lyrics_file, 'r') as fd:
    data = fd.read()
corpus = data.lower().split('\n')

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1

# create input sequences using list of tokens
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[: i+1]
        input_sequences.append(n_gram_sequence)

# pad sequences 
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen = max_sequence_len, padding='pre'))

# Create predictors and label
xs, labels = input_sequences[:, :-1], input_sequences[:,-1]
ys = ku.to_categorical(labels, num_classes=total_words)
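# xs: padded n-gram prefixes, shape (num_sequences, max_sequence_len - 1)
# ys: one-hot next-word targets, shape (num_sequences, total_words)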

xs = tf.convert_to_tensor(xs)
ys = tf.convert_to_tensor(ys)

model = Sequential()
model.add(Embedding(total_words, 100, input_length=max_sequence_len-1))
model.add(Bidirectional(LSTM(150)))
model.add(Dense(total_words, activation='softmax'))
adam = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics = ['accuracy'])

batch_size = 32
steps_per_epoch = int(np.ceil(xs.shape[0]/batch_size))

history = model.fit(xs, ys, epochs=100, batch_size=batch_size, steps_per_epoch=steps_per_epoch, verbose=1)

ku.plot_model(model, show_shapes=True)
model.summary()

import matplotlib.pyplot as plt

def plot_graphs(history, string):
    plt.plot(history.history[string])
    plt.xlabel('Epochs')
    plt.ylabel(string)
    plt.show()

plot_graphs(history, 'accuracy');

index_word_dict = {index: word for word, index in tokenizer.word_index.items()}

seed_text = 'A poor emigrants daughter'
next_words = 100

# Greedy decoding: repeatedly predict the most probable next word and append it to the seed text.
for _ in range(next_words):
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
    predicted = np.argmax(model.predict(token_list, verbose=0), axis=-1).item()
    
    if predicted in index_word_dict:
        seed_text += ' ' + index_word_dict[predicted]

print('\n'.join(wrap(seed_text)))

Did you find any solution? @essandess
