Some resource has been exhausted. For example, this error might be raised if a per-user quota is exhausted, or perhaps the entire file system is out of space. @@__init__ 2 root error(s) found. (0) RESOURCE_EXHAUSTED: OOM when allocating

Question

dbl001 OP

Created Dec ’21

In a tensorflow-metal virtual environment on macOS 12.1:

tensorboard                  2.6.0
tensorboard-data-server      0.6.1
tensorboard-plugin-profile   2.5.0
tensorboard-plugin-wit       1.8.0
tensorflow                   2.6.0
tensorflow-addons            0.14.0
tensorflow-consciousness     0.1
tensorflow-datasets          4.4.0
tensorflow-estimator         2.7.0
tensorflow-gan               2.1.0
tensorflow-hub               0.12.0
tensorflow-io-gcs-filesystem 0.22.0
tensorflow-macos             2.7.0
tensorflow-metadata          1.2.0
tensorflow-metal             0.3.0
tensorflow-probability       0.14.1
tensorflow-similarity        0.13.45
tensorflow-text              2.7.3

Running the Top2vec model: https://github.com/ddangelov/Top2Vec

import numpy as np
import pandas as pd
import json
import os
import ipywidgets as widgets
from IPython.display import clear_output, display
from top2vec import Top2Vec

papers_prepared_df = pd.read_feather("/Users/davidlaxer/Downloads/archive/covid19_papers_processed.feather")
top2vec_trained = Top2Vec(
    documents=papers_prepared_df.text.tolist(),
    embedding_model="universal-sentence-encoder",
    use_embedding_model_tokenizer=True,
    embedding_model_path="/Users/davidlaxer/Downloads/universal-sentence-encoder_4/",
    workers=4,
)
 
2021-12-20 06:30:52,188 - top2vec - INFO - Pre-processing documents for training
/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.
  warnings.warn(msg, category=FutureWarning)
2021-12-20 06:31:57,351 - top2vec - INFO - Loading universal-sentence-encoder model at /Users/davidlaxer/Downloads/universal-sentence-encoder_4
2021-12-20 06:31:57.488459: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-20 06:31:57.489288: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-12-20 06:31:57.489490: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Metal device set to: AMD Radeon Pro 5700 XT
2021-12-20 06:31:59.447260: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-12-20 06:32:00,841 - top2vec - INFO - Creating joint document/word embedding
2021-12-20 06:32:00.923838: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
 
Some resource has been exhausted.
 
  For example, this error might be raised if a per-user quota is
  exhausted, or perhaps the entire file system is out of space.
 
  @@__init__
  
2 root error(s) found.
  (0) RESOURCE_EXHAUSTED:  OOM when allocating tensor with shape[114389,320] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator Simple allocator
	 [[{{node EncoderDNN/EmbeddingLookup/EmbeddingLookupUnique/GatherV2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
 
	 [[StatefulPartitionedCall/StatefulPartitionedCall/EncoderDNN/EmbeddingLookup/EmbeddingLookupUnique/Reshape_1/_188]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
 
  (1) RESOURCE_EXHAUSTED:  OOM when allocating tensor with shape[114389,320] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator Simple allocator
	 [[{{node EncoderDNN/EmbeddingLookup/EmbeddingLookupUnique/GatherV2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
 
...

I tried adjusting the batch size (e.g., 500, 100, 50, 10, 5).
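Since the error persists even at a batch size of 5, the length of individual documents may matter more than how many go into each batch. A minimal sketch for checking that, assuming a whitespace word count is a rough enough proxy for the encoder's own tokenization (it is not the same tokenizer). Both helper names and the max_words limit are hypothetical:

```python
def longest_documents(docs, top_k=5):
    """Return (index, word_count) pairs for the top_k longest documents.

    str.split() is only a rough stand-in for the encoder's tokenizer.
    """
    counts = [(i, len(doc.split())) for i, doc in enumerate(docs)]
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts[:top_k]


def truncate_documents(docs, max_words=2000):
    """Keep at most max_words words per document (hypothetical limit).

    Note: re-joining with single spaces also collapses newlines and tabs.
    """
    return [" ".join(doc.split()[:max_words]) for doc in docs]
```

For example, longest_documents(papers_prepared_df.text.tolist()) shows whether a few papers are far longer than the rest; passing truncate_documents(...) as documents= to Top2Vec is one way to test whether shorter inputs avoid the OOM.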

Answer 1

dbl001 OP

Dec ’21

The exception is raised while building the list of document vectors from the input documents (i.e., while embedding them, not during model training), on this line: document_vectors.append(self.embed(train_corpus[current:current + batch_size]))

The Python 3.8 process grows to 100 GB of memory and then raises the OOM exception.

def _embed_documents(self, train_corpus):

    self._check_import_status()
    self._check_model_status()

    # embed documents in fixed-size batches
    batch_size = 5
    document_vectors = []

    current = 0
    batches = len(train_corpus) // batch_size
    extra = len(train_corpus) % batch_size

    for ind in range(0, batches):
        try:
            # the OOM is raised on this call
            document_vectors.append(self.embed(train_corpus[current:current + batch_size]))
        except Exception as e:
            print(e.__doc__)
            # str(e) works for any exception; .message is defined on
            # TensorFlow's OpError but not on Python 3 exceptions in general
            print(e)
            # re-raise so a failed batch is not silently skipped, which would
            # leave document_vectors misaligned with train_corpus
            raise
        current += batch_size

    if extra > 0:
        document_vectors.append(self.embed(train_corpus[current:current + extra]))

    document_vectors = self._l2_normalize(np.array(np.vstack(document_vectors)))

    return document_vectors
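To narrow down which input triggers the OOM, one option is to embed the documents of a failing batch one at a time. A minimal sketch: find_failing_documents is a hypothetical helper, and embed stands in for the model's self.embed (any callable that takes a list of strings). In the real case the error of interest would be TensorFlow's ResourceExhaustedError; the broad except is deliberate for a diagnostic pass.

```python
def find_failing_documents(docs, embed):
    """Embed each document on its own; return the indices that raise."""
    failing = []
    for i, doc in enumerate(docs):
        try:
            embed([doc])  # a batch of exactly one document
        except Exception as e:  # broad on purpose: diagnostic only
            failing.append(i)
            print(f"document {i} ({len(doc.split())} words) failed: {type(e).__name__}")
    return failing
```

If a single document still fails on its own, the problem is that document's size rather than the batch size.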

Answer 2

ker2x

Dec ’21

shape[114389,320]? Are you sure you're not doing something wrong here?
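For scale, a float32 tensor of shape [114389, 320] needs 114389 × 320 × 4 bytes. The node name (EmbeddingLookupUnique/GatherV2) suggests, though the thread does not confirm it, that the first dimension counts the unique lookup IDs in the batch, so it would grow with document length rather than with the number of documents. A quick check of the arithmetic:

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size of a dense tensor in bytes; float32 uses 4 bytes per element."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


size = tensor_bytes([114389, 320])
print(size)                    # 146417920
print(round(size / 2**20, 1))  # 139.6 (MiB)
```

The failing allocation itself is therefore only about 140 MiB, far below the 100 GB the process reaches, which hints that the memory growth comes from elsewhere or from many such allocations accumulating.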
