I am getting a segmentation violation in Tensorflow running the Top2vec model.
https://github.com/ddangelov/Top2Vec
Here's my code snippet:
import sys import faulthandler import numpy as np import pandas as pd import json import os import ipywidgets as widgets from IPython.display import clear_output, display from top2vec import Top2Vec faulthandler.enable(file=sys.stderr, all_threads=True) def filter_short(papers_df): papers_df["token_counts"] = papers_df["text"].str.split().map(len) papers_df = papers_df[papers_df.token_counts > 50].reset_index(drop=True) papers_df.drop('token_counts', axis=1, inplace=True) return papers_df papers_prepared_df = pd.read_feather("/Users/davidlaxer/Downloads/archive/covid19_papers_processed.feather") papers_feathered_filtered_df = filter_short(papers_prepared_df) top2vec_trained = Top2Vec(documents=papers_feathered_filtered_df.text.tolist(), embedding_model="universal-sentence-encoder", use_embedding_model_tokenizer=True, embedding_model_path="/Users/davidlaxer/Downloads/universal-sentence-encoder_4/", workers=4)
Here's the stack trace:
/Users/davidlaxer/tensorflow-metal/bin/python /Users/davidlaxer/Top2Vec/notebooks/test1.py 2021-12-24 08:50:04,782 - top2vec - INFO - Pre-processing documents for training /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead. warnings.warn(msg, category=FutureWarning) 2021-12-24 08:51:37,701 - top2vec - INFO - Loading universal-sentence-encoder model at /Users/davidlaxer/Downloads/universal-sentence-encoder_4/ 2021-12-24 08:51:37.837087: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2021-12-24 08:51:37.838170: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support. 2021-12-24 08:51:37.838380: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>) Metal device set to: AMD Radeon Pro 5700 XT systemMemory: 128.00 GB maxCacheSize: 7.99 GB 2021-12-24 08:51:39.764518: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled. 2021-12-24 08:51:41,157 - top2vec - INFO - Creating joint document/word embedding 2021-12-24 08:51:41.241496: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled. Fatal Python error: Segmentation fault Thread 0x000000010c32a600 (most recent call first): File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 58 in quick_execute File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 598 in call File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1959 in _call_flat File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3130 in __call__ File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 949 in _call File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 910 in __call__ File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 150 in error_handler File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 701 in _call_attribute File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/top2vec/Top2Vec.py", line 538 in _embed_documents File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/top2vec/Top2Vec.py", line 344 in __init__ File "/Users/davidlaxer/Top2Vec/notebooks/test1.py", line 24 in <module> Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
This is likely related to: https://github.com/ddangelov/Top2Vec/issues/232
However, SIGSEGV indicates some other problem(s).
% python --version Python 3.8.5 % pip show tensorflow_macos Name: tensorflow-macos Version: 2.7.0 Summary: TensorFlow is an open source machine learning framework for everyone. Home-page: https://www.tensorflow.org/ Author: Google Inc. Author-email: packages@tensorflow.org License: Apache 2.0 Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, keras-preprocessing, libclang, numpy, opt-einsum, protobuf, six, tensorboard, tensorflow-estimator, tensorflow-io-gcs-filesystem, termcolor, typing-extensions, wheel, wrapt Required-by: % pip show tensorflow_metal Name: tensorflow-metal Version: 0.3.0 Summary: TensorFlow acceleration for Mac GPUs. Home-page: https://developer.apple.com/metal/tensorflow-plugin/ Author: Author-email: License: MIT License. Copyright © 2020-2021 Apple Inc. All rights reserved. Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages Requires: six, wheel Required-by: