Models - Machine Learning - Apple Developer

image

FastViT

Image Classification

A Fast Hybrid Vision Transformer architecture trained to classify the dominant object in a camera frame or image.

Model Info

Summary

FastViT is a general-purpose, hybrid vision transformer model, trained on the ImageNet dataset, that provides a state-of-the-art accuracy/latency trade-off.

The model's high performance, low latency, and robustness against out-of-distribution samples result from three novel architectural strategies:

Structural reparameterization
Linear training-time overparameterization
Use of large kernel convolutions
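
As a rough illustration of the first strategy, the sketch below shows the general idea of structural reparameterization on a toy 1D convolution. A training-time block with a convolution branch plus an identity skip connection is folded into a single convolution kernel for inference. This is not FastViT's actual implementation; the function names (`conv1d_same`, `train_time_block`, `fold_skip_into_kernel`) and the toy values are assumptions made for this example.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation, zero-padded so the output length equals the input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[k] * padded[i + k] for k in range(len(kernel)))
        for i in range(len(signal))
    ]


def train_time_block(signal, kernel):
    """Training-time form: a convolution branch plus an identity (skip) branch."""
    conv_out = conv1d_same(signal, kernel)
    return [c + s for c, s in zip(conv_out, signal)]


def fold_skip_into_kernel(kernel):
    """Inference-time form: add 1 to the center tap so one convolution does both jobs."""
    folded = list(kernel)
    folded[len(kernel) // 2] += 1.0
    return folded


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, -1.0, 0.25]  # hypothetical weights; odd length so a center tap exists

two_branch = train_time_block(signal, kernel)
one_branch = conv1d_same(signal, fold_skip_into_kernel(kernel))
```

Both forms produce the same output, but the folded form needs only one convolution at inference time, which is the kind of saving that reparameterization aims for.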

FastViT consistently outperforms competing robust architectures on mobile and desktop GPU platforms across a wide range of computer vision tasks such as image classification, object detection, semantic segmentation, and 3D mesh regression.

Use Cases

Image classification, object detection, semantic segmentation, 3D mesh regression

Variants

Model Name	Size
FastViTMA36F16.mlpackage	88.3MB
FastViTT8F16.mlpackage	8.2MB
FastViTMA36F16Headless.mlpackage	85.8MB
FastViTT8F16Headless.mlpackage	6.5MB

Variant	Parameters	Size (MB)	Weight Precision	Activation Precision
T8	3.6M	7.8	Float16	Float16
MA36	42.7M	84	Float16	Float16
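
The size figures can be sanity-checked from the parameter count and the weight precision. The sketch below is a back-of-the-envelope estimate, assuming 2 bytes per Float16 weight, 4 bytes per Float32 weight, and 1 MB = 1,000,000 bytes. The helper name `estimate_weight_size_mb` is hypothetical, and real model packages also carry metadata, so the estimates only approximate the listed sizes.

```python
# Assumed storage cost per weight for each precision.
BYTES_PER_WEIGHT = {"Float16": 2, "Float32": 4}


def estimate_weight_size_mb(parameters, precision):
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return parameters * BYTES_PER_WEIGHT[precision] / 1_000_000


# Parameter counts taken from the table above.
t8_mb = estimate_weight_size_mb(3_600_000, "Float16")
ma36_mb = estimate_weight_size_mb(42_700_000, "Float16")
```

Under these assumptions the estimates come out at about 7.2 MB for T8 and 85.4 MB for MA36, within roughly 10% of the listed values.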

Inference Time

Variant	Device	OS	Inference Time (ms)	Compute Unit
T8 F16	iPhone 15 Pro Max	17.6	0.67	All
T8 F16	iPhone 15 Plus	17.6	0.73	All
T8 F16	iPhone 14 Plus	17.6	0.82	All
T8 F16	iPhone 13 Pro Max	17.6	0.83	All
T8 F16	MacBook Pro M3 Max	14.4	0.62	All
MA36 F16	iPhone 15 Pro Max	17.6	3.33	All
MA36 F16	iPhone 15 Plus	17.6	3.47	All
MA36 F16	iPhone 14 Plus	17.6	4.56	All
MA36 F16	iPhone 13 Pro Max	17.6	4.47	All
MA36 F16	MacBook Pro M2 Max	15.0	2.94	All
MA36 F16	MacBook Pro M1 Max	15.0	4	All
MA36 F16	iPad Pro 5th Gen	17.5	3.35	All
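
To relate these latencies to a camera frame rate, the sketch below converts an inference time into a theoretical upper bound on frames per second. It assumes inferences run back to back with no preprocessing, postprocessing, or scheduling overhead, which real apps will not achieve; the function names are hypothetical.

```python
def max_frames_per_second(inference_time_ms):
    """Upper bound on throughput if inference were the only cost per frame."""
    return 1000.0 / inference_time_ms


def fits_frame_budget(inference_time_ms, target_fps):
    """True if a single inference fits inside one frame interval at target_fps."""
    return inference_time_ms <= 1000.0 / target_fps


# Latencies taken from the table above (iPhone 15 Pro Max rows).
t8_fps = max_frames_per_second(0.67)
ma36_fps = max_frames_per_second(3.33)
ma36_ok_at_60 = fits_frame_budget(3.33, 60)
```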

Example Projects

Classifying Images with Vision and Core ML
Preprocess photos using the Vision framework and classify them with a Core ML model.

image

Depth Anything V2

Depth Estimation

The Depth Anything model performs monocular depth estimation.

Model Info

Summary

Depth Anything v2 is a foundation model for monocular depth estimation. It maintains the strengths and rectifies the weaknesses of the original Depth Anything by refining the powerful data curation engine and teacher-student pipeline.

To train a teacher model, Depth Anything v2 uses purely synthetic, computer-generated images. This avoids the noisy annotations and low resolution of labeled real images, which can limit the performance of monocular depth-estimation models. The teacher model then predicts depth on unlabeled real images, and only that pseudo-labeled data is used to train a student model, which helps avoid distribution shift between synthetic and real images.
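
The three stages of this pipeline can be sketched with a deliberately tiny stand-in. In the toy example below, the "models" are one-dimensional least-squares line fits rather than neural networks, and the data is made up. All names (`fit_line`, `predict`, the sample lists) are hypothetical and are not part of Depth Anything's training code.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def predict(model, xs):
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# Step 1: train the teacher on synthetic data with exact labels.
synthetic_inputs = [0.0, 1.0, 2.0, 3.0]
synthetic_depths = [1.0, 3.0, 5.0, 7.0]  # toy ground truth: depth = 2x + 1
teacher = fit_line(synthetic_inputs, synthetic_depths)

# Step 2: the teacher pseudo-labels unlabeled "real" inputs.
real_inputs = [0.5, 1.5, 2.5, 4.0]
pseudo_depths = predict(teacher, real_inputs)

# Step 3: the student trains only on the pseudo-labeled real data.
student = fit_line(real_inputs, pseudo_depths)
```

The point of the sketch is the data flow: the student never sees the synthetic samples directly, only real inputs paired with the teacher's predictions.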

On the depth estimation task, Depth Anything v2 outperforms v1, especially in robustness, inference speed, and the handling of fine-grained details, transparent objects, reflections, and complex scenes. Its refined data curation approach yields competitive performance on standard datasets (including KITTI, NYU-D, Sintel, ETH3D, and DIODE) and an accuracy improvement of more than 9% over v1 and other community models on the new DA-2k evaluation set built for depth estimation.

Depth Anything v2 comes in a range of model scales, with correspondingly different inference efficiency, to support a wide range of applications, and it can be fine-tuned for downstream tasks. It can be used in any application requiring depth estimation, such as 3D reconstruction, navigation, autonomous driving, and image or video generation.

Use Cases

Depth estimation, semantic segmentation

Variants

Model Name	Size
DepthAnythingV2SmallF16.mlpackage	49.8MB
DepthAnythingV2SmallF16P6.mlpackage	19MB

Variant	Parameters	Size (MB)	Weight Precision	Activation Precision
F32	24.8M	99.2	Float32	Float32
F16	24.8M	49.8	Float16	Float16

Inference Time

Variant	Device	OS	Inference Time (ms)	Compute Unit
Small F16	iPhone 15 Pro Max	17.4	33.90	All
Small F16	MacBook Pro M1 Max	15.0	33.48	All
Small F16	MacBook Pro M1 Max	15.0	32.78	GPU

image

DETR Resnet50 Semantic Segmentation

Semantic Segmentation

The DEtection TRansformer (DETR) model, trained for object detection and panoptic segmentation, configured to return semantic segmentation masks.

Model Info

Summary

The DETR model is an encoder/decoder transformer with a convolutional backbone trained on the COCO 2017 dataset. It blends a set of proven ML strategies to detect and classify objects in images more elegantly than standard object detectors can, while matching their performance.

The model is trained with a loss function that performs bipartite matching between predicted and ground-truth objects. At inference time, DETR applies self-attention to an image globally to predict all objects at once. Thanks to global attention, the model outperforms standard object detectors on large objects but underperforms on small objects. Despite this limitation, DETR demonstrates accuracy and run-time performance on par with other highly optimized architectures when evaluated on the challenging COCO dataset.
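
The bipartite matching step can be illustrated with a small sketch. The example below pairs each ground-truth object with a distinct prediction so that the total pairwise cost is minimal. It is a simplification under stated assumptions: objects are reduced to a single scalar "center", the cost is an absolute difference, and an exhaustive search over permutations stands in for a proper assignment solver. The names `match_cost` and `bipartite_match` are hypothetical.

```python
from itertools import permutations


def match_cost(prediction, target):
    """Toy pairwise cost: absolute difference between two scalar box centers."""
    return abs(prediction - target)


def bipartite_match(predictions, targets):
    """Return (assignment, total_cost); assignment[i] is the prediction index for target i.

    Exhaustive search, so only practical for tiny sets.
    Requires len(predictions) >= len(targets).
    """
    best_assignment, best_cost = None, float("inf")
    for candidate in permutations(range(len(predictions)), len(targets)):
        cost = sum(
            match_cost(predictions[p], targets[t]) for t, p in enumerate(candidate)
        )
        if cost < best_cost:
            best_assignment, best_cost = candidate, cost
    return best_assignment, best_cost


predictions = [0.9, 0.1, 0.5]  # three predicted centers (made-up values)
targets = [0.2, 0.8]           # two ground-truth centers (made-up values)
assignment, total_cost = bipartite_match(predictions, targets)
```

Here the first target is paired with the second prediction and the second target with the first prediction, leaving the third prediction unmatched. Because each prediction can be matched at most once, duplicate detections of the same object are penalized during training.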

DETR can be easily reproduced in any framework that contains standard CNN and transformer classes. It can also be easily generalized to accommodate more complex tasks, such as panoptic segmentation and other tasks requiring a simple segmentation head trained on top of a pre-trained DETR.

DETR avoids clunky surrogate tasks and hand-designed components that traditional architectures require to achieve acceptable performance and instead provides a conceptually simple, easily reproducible approach that streamlines the object detection pipeline.

Use Cases

Object detection, panoptic segmentation

Variants

Model Name	Size
DETRResnet50SemanticSegmentationF16.mlpackage	85.5MB
DETRResnet50SemanticSegmentationF16P8.mlpackage	43.1MB

Variant	Parameters	Size (MB)	Weight Precision	Activation Precision
F32	43M	171	Float32	Float32
F16	43M	86	Float16	Float16

Inference Time

Variant	Device	OS	Inference Time (ms)	Compute Unit
F16	iPhone 15 Pro Max	17.6	39	All
F16	iPhone 15 Plus	17.6	43	All
F16	iPhone 14 Plus	17.6	50	All
F16	iPhone 14	17.5	51	All
F16	iPhone 13 Pro Max	17.6	51	All
F16	MacBook Pro M1 Max	15.0	117	All
F16	MacBook Pro M1 Max	15.0	43	GPU
F16P8	iPhone 15 Plus	18.0	40.73	All
F16P8	iPhone 13 Pro Max	17.6	51.53	All
F16P8	MacBook Pro M1 Max	15.0	36.52	All
F16P8	MacBook Pro M1 Max	15.0	33.14	GPU
F16P8	iPad Pro 5th Generation	18.0	62.49	All
F16P8	iPad Pro 4th Generation	18.0	1224	All

Example Projects

Using Core ML for Semantic Image Segmentation
Identify multiple objects in an image by using the DEtection TRansformer image-segmentation model.

text

BERT-SQuAD

Question Answering

Find answers to questions about paragraphs of text.

Model Info

Variants

Model Name	Size
BERTSQUADFP16.mlmodel	217.8MB

Example Projects

Finding Answers to Questions in a Text Document
Locate relevant passages in a document by asking the Bidirectional Encoder Representations from Transformers (BERT) model a question.

image

DeepLabV3

Image Segmentation

Segment the pixels of a camera frame or image into a predefined set of classes.

Model Info

Variants

Model Name	Size
DeepLabV3.mlmodel	8.6MB
DeepLabV3FP16.mlmodel	4.3MB
DeepLabV3Int8LUT.mlmodel	2.3MB

image

MNIST

Drawing Classification

Classify a single handwritten digit (supports digits 0-9).

Model Info

Links

Source dataset

Variants

Model Name	Size
MNISTClassifier.mlmodel	395KB

image

MobileNetV2

Image Classification

The MobileNetV2 architecture trained to classify the dominant object in a camera frame or image.

Model Info

Variants

Model Name	Size
MobileNetV2.mlmodel	24.7MB
MobileNetV2FP16.mlmodel	12.4MB
MobileNetV2Int8LUT.mlmodel	6.3MB

Example Projects

Classifying Images with Vision and Core ML
Preprocess photos using the Vision framework and classify them with a Core ML model.

image

Resnet50

Image Classification

A Residual Neural Network that will classify the dominant object in a camera frame or image.

Model Info

Variants

Model Name	Size
Resnet50.mlmodel	102.6MB
Resnet50FP16.mlmodel	51.3MB
Resnet50Int8LUT.mlmodel	25.8MB
Resnet50Headless.mlmodel	94.4MB

Example Projects

Classifying Images with Vision and Core ML
Preprocess photos using the Vision framework and classify them with a Core ML model.

image

UpdatableDrawingClassifier

Drawing Classification

A drawing classifier that learns to recognize new drawings using a k-nearest neighbors (KNN) model.
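
As a minimal sketch of how k-nearest neighbors classification works in general, the example below labels a query point by majority vote among its closest stored examples. The two-dimensional feature vectors, the labels, and the function name `knn_classify` are made up for illustration and do not reflect the internals of this model.

```python
import math
from collections import Counter


def knn_classify(examples, query, k=3):
    """Return the majority label among the k stored examples nearest to query.

    examples is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


examples = [
    ([0.0, 0.1], "star"),
    ([0.1, 0.0], "star"),
    ([1.0, 0.9], "heart"),
    ([0.9, 1.0], "heart"),
]

# In this sketch, "learning" a new drawing just means storing another labeled example.
examples.append(([0.2, 0.2], "star"))

label = knn_classify(examples, [0.15, 0.1], k=3)
```

Because adding knowledge only requires storing new examples, with no retraining pass, a KNN classifier is a natural fit for personalization on device.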

Model Info

Links

Source code and documentation

Variants

Model Name	Size
UpdatableDrawingClassifier.mlmodel	382KB

Example Projects

Personalizing a Model with On-Device Updates
Learn to map drawings from a user to custom stickers by updating a drawing classification model on device.

image

YOLOv3

Object Detection

Locate and classify 80 different types of objects present in a camera frame or image.

Model Info

Variants

Model Name	Size
YOLOv3.mlmodel	248.4MB
YOLOv3FP16.mlmodel	124.2MB
YOLOv3Int8LUT.mlmodel	62.2MB
YOLOv3Tiny.mlmodel	35.4MB
YOLOv3TinyFP16.mlmodel	17.7MB
YOLOv3TinyInt8LUT.mlmodel	8.9MB

Example Projects

Recognizing Objects in Live Capture
Apply Vision algorithms to identify objects in real-time video.
