Get started with Foundation Models adapter training

Teach the on-device language model new skills specific to your app by training a custom adapter. This toolkit contains a Python training workflow and utilities to package adapters for use with the Foundation Models framework.

    Overview

    While the on-device system language model is powerful, it may not handle every specialized task well. Adapters are an advanced technique for adapting a large language model (LLM) to new skills or domains. With the adapter training toolkit, you can train adapters that specialize the on-device system LLM's abilities, and then use your adapter in apps with the Foundation Models framework. On this page you can download the toolkit and learn about the adapter training process. To use custom adapters in your app, see the framework guide Loading and using a custom adapter with Foundation Models.

    The adapter training toolkit contains:

    • Python sample code for each adapter training step
    • Model assets that match a specific system model version
    • Utilities to export an .fmadapter package
    • Utilities to bundle adapters as asset packs for Background Assets

    Foundation Models Framework Adapter Entitlement

    When you’re ready to deploy adapters in your app, the Account Holder of a membership in the Apple Developer Program will need to request the Foundation Models Framework Adapter Entitlement. You don't need this entitlement to train or locally test adapters.

    Get entitlement

    Download toolkit

    To download any adapter training toolkit version, you must be a member of the Apple Developer Program and first agree to the toolkit's terms and conditions.

    Get toolkit

    Remember that you may need to download multiple toolkit versions. Each version contains the unique model assets compatible with a specific OS version range. To support people using your app on different OS versions, you must train an adapter with each version of the toolkit.

    Version | Changes | OS Compatibility
    Beta 0.1.0 (removed) | Initial release. | macOS 26, iOS 26, iPadOS 26, visionOS 26
    Beta 0.2.0 (removed) | Updates for new base model version. Updated data schema with support for tool calling. New data schema utility for guided generation. | macOS 26, iOS 26, iPadOS 26, visionOS 26
    26.0.0 | First full toolkit version. New support for custom data transforms in training pipeline. Updated guided generation transform utility. | macOS 26, iOS 26, iPadOS 26, visionOS 26

    When do new versions come out? A new toolkit will be released for every system model update. The system model is shared across iOS, macOS, and visionOS, and system model updates will occur as part of those platforms’ OS updates (though not every OS update will have a model update). Be sure to install and use the latest beta software releases so that you have time to train a new adapter before people start using your app with the new system model version. Additionally, with the Foundation Models Framework Adapter Entitlement, the Account Holder of your membership in the Apple Developer Program will get an email update when a new toolkit version is available. Otherwise, when a new beta comes out, check here for any new toolkit versions.

    How to train adapters

    This guide provides a conceptual walkthrough of the steps to train an adapter. Each toolkit version also includes an end-to-end sample Jupyter notebook in ./examples.

    Requirements

    • Mac with Apple silicon and at least 32 GB of memory, or a Linux machine with a GPU
    • Python 3.11 or later

    1. When to consider an adapter

    Adapters are an effective way to teach the model specialized tasks, but they have steep requirements to train (and re-train for OS updates), so adapters aren’t suitable for all situations. Before considering adapters, try to get the most out of the system model using prompt engineering or tool calling. With the Foundation Models framework, tool calling is an effective way to give the system model access to outside knowledge sources or services.

    Adapter training is worth considering if you have a dataset suitable for use with an LLM, or if your app is already using a fine-tuned server-based LLM and you want to try replicating that functionality with the on-device LLM for reduced costs. Other reasons to use an adapter include:

    • You need the model to become a subject-matter expert.
    • You need the model to adhere to a specific style, format, or policy.
    • Prompt engineering isn’t achieving the required accuracy or consistency for your task.
    • You want lower latency at inference. If your prompt-engineered solution requires lengthy prompts with examples for every call, an adapter specialized for that task needs only minimal prompting.

    Take into consideration that you will need:

    • A dataset of prompt and response pairs that demonstrate your target skill
    • A process for evaluating the quality of your adapters
    • A process to load your adapters into your app from a server

    Each adapter takes approximately 160 MB of storage space. Like other large assets, adapters shouldn't be part of your app's main bundle; with multiple adapter versions, your app would become too big for people to install. Instead, host your adapters on a server so that each person using your app downloads just the one adapter compatible with their device, using the Background Assets framework. For more on how, see the documentation guide Loading and using a custom adapter with Foundation Models.

    2. Set up virtual environment

    Once you’ve downloaded the toolkit, set up a Python virtual environment using an environment manager like conda or venv:

    conda create -n adapter-training python=3.11
    conda activate adapter-training
    cd /path/to/toolkit
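
    The conda commands above are one option. If you prefer Python's built-in venv module instead, an equivalent setup might look like this (assuming a python3.11 binary is on your PATH):

    python3.11 -m venv .venv
    source .venv/bin/activate
    cd /path/to/toolkit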

    3. Install dependencies

    Next, use pip to install all the packages required by the toolkit:

    pip install -r requirements.txt

    Finally, start running the toolkit’s walkthrough Jupyter notebook to finish setup:

    jupyter notebook ./examples/end_to_end_example.ipynb

    4. Test generation

    Verify that your setup is ready by loading and running inference with the system base model assets in the assets folder. The Jupyter notebook in examples demonstrates how to run inference, or you can run examples/generate.py from the command line:

    python -m examples.generate --prompt "Prompt here"

    5. Prepare a dataset

    To train an adapter, you’ll need to prepare a dataset in the JSON Lines (jsonl) format expected by the model. As a rough estimate of how much data you’ll need, consider:

    • 100 to 1,000 samples to teach the model basic tasks
    • 5,000+ samples to teach the model complex tasks

    The full expected data schema, including special fields you need to support guided generation and improve AI safety, can be found in the toolkit in Schema.md. The most basic schema is a list of prompt and response pairs:

    [{"role": "user", "content": "PROMPT"}, {"role": "assistant", "content": "RESPONSE"}]

    Here "role" identifies who is providing the content. The role "user" can refer to any entity providing the input prompt, such as you the developer, people using your app, or a mix of sources. The role "assistant" always refers to the model. Replace the "content" values above with your prompt and response, which can be text written in any language supported by Apple Intelligence.

    Utilities to help you prepare your data, including options for specifying language and locale, can be found in examples/data.py.

    After formatting, split your data into train and eval sets. The train set is used to optimize the adapter parameters during training. The eval set is used to monitor performance during training, for example to identify overfitting, and provides feedback to help you tune hyper-parameters.
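
    For example, here's a minimal sketch of writing samples in the basic schema to jsonl files and splitting them into train and eval sets. The sample data and file names are illustrative; see examples/data.py for the toolkit's own utilities:

    import json
    import random

    # Each sample is one line of the jsonl file: a list of user/assistant messages.
    samples = [
        [{"role": "user", "content": "Summarize: The cat sat on the mat."},
         {"role": "assistant", "content": "A cat sat on a mat."}],
        [{"role": "user", "content": "Summarize: Rain fell all day."},
         {"role": "assistant", "content": "It rained all day."}],
        # ... hundreds to thousands more samples for a real dataset
    ]

    random.seed(0)
    random.shuffle(samples)
    split = int(len(samples) * 0.9)  # 90% train, 10% eval

    def write_jsonl(path, rows):
        with open(path, "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")  # one JSON array per line

    write_jsonl("train.jsonl", samples[:split])
    write_jsonl("valid.jsonl", samples[split:])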

    6. Start adapter training

    Adapter training is faster and less memory-intensive than fine-tuning an entire large language model. This is because the system model uses a parameter-efficient fine-tuning (PEFT) approach known as LoRA (Low-Rank Adaptation). In LoRA, the original model weights are frozen, and small trainable weight matrices called “adapters” are inserted throughout the model’s network. During training, only the adapter weights are updated, significantly reducing the number of parameters to train. This approach also allows the base system model to be shared across many different use cases and apps, each with its own specialized adapter.
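
    To make the idea concrete, here's a toy NumPy sketch of a LoRA-style linear layer. It's purely illustrative, not the toolkit's actual implementation:

    import numpy as np

    d, r = 512, 8  # hidden size and LoRA rank (r much smaller than d)

    W = np.random.randn(d, d)          # frozen base weight: never updated
    A = np.random.randn(r, d) * 0.01   # trainable low-rank factor
    B = np.zeros((d, r))               # trainable; zero-init so the adapter starts as a no-op

    def forward(x):
        # Base projection plus the low-rank adapter update: (W + B @ A) @ x
        return W @ x + B @ (A @ x)

    # Only A and B are trained: 2 * d * r = 8,192 parameters here,
    # versus d * d = 262,144 for the full weight matrix.
    y = forward(np.random.randn(d))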

    Start training by running the walkthrough Jupyter notebook in examples, or the sample code in examples/train_adapter.py. You can modify and customize the training sample code to meet your use case’s needs. For convenience, examples/train_adapter.py can be run from the command line:

    python -m examples.train_adapter \
    --train-data /path/to/train.jsonl \
    --eval-data /path/to/valid.jsonl \
    --epochs 5 \
    --learning-rate 1e-3 \
    --batch-size 4 \
    --checkpoint-dir /path/to/my_checkpoints/

    Use the data you prepared for train-data and eval-data. The additional training arguments are:

    • epochs is the number of passes through the training data. More epochs will take longer, but may improve your adapter’s quality.
    • learning-rate is a floating-point number indicating how much to adjust the model’s parameters at each step. Adjustments should be tailored to the specific use case.
    • batch-size is the number of examples in a single training step. Choose batch size based on the machine you’re running the training process on.
    • checkpoint-dir is a folder you create so that the training process can save checkpoints of your adapter as it trains.
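
    As a quick back-of-the-envelope example of how these arguments interact (illustrative numbers):

    import math

    num_samples, batch_size, epochs = 1000, 4, 5           # hypothetical dataset and settings
    steps_per_epoch = math.ceil(num_samples / batch_size)  # 250 training steps per epoch
    total_steps = steps_per_epoch * epochs                 # 1,250 optimizer updates overall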

    During and after training, you can compare your adapter’s checkpoints to pick the one that best meets your quality goals. Checkpoints are also handy for resuming training in case the process fails midway, or you decide to train again for a few more epochs.

    7. Optionally train the draft model

    After training an adapter, you can train a matching draft model. Each toolkit includes assets for the system draft model, which is a small version of the system base model that can speed up inference via a technique called speculative decoding. Training the draft model is very similar to training an adapter, with some additional metrics so that you can measure how much your draft model speeds up inference. This step is optional. If you choose not to train the draft model, speculative decoding will not be available for your adapter’s use case. For more details on how draft models work, please refer to the papers Leviathan et al., 2022 (arXiv:2211.17192) and Chen et al., 2023 (arXiv:2302.01318).
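
    For intuition, here's a toy sketch of greedy speculative decoding. It assumes hypothetical draft_next and target_next functions, and simplifies away the probabilistic acceptance rule described in the papers:

    def speculative_decode(prompt, draft_next, target_next, k=4, max_new_tokens=32):
        """Toy greedy speculative decoding.

        draft_next(tokens) -> next token from the small draft model
        target_next(tokens) -> next token from the large target model
        """
        tokens = list(prompt)
        while len(tokens) < len(prompt) + max_new_tokens:
            # 1. The cheap draft model proposes k tokens ahead.
            proposal = []
            for _ in range(k):
                proposal.append(draft_next(tokens + proposal))
            # 2. The target model verifies the proposals. A real implementation
            #    scores all k positions in a single batched forward pass.
            accepted = 0
            for i in range(k):
                if target_next(tokens + proposal[:i]) == proposal[i]:
                    accepted += 1
                else:
                    break
            tokens += proposal[:accepted]
            # 3. On a mismatch (or after accepting all k), take one token
            #    from the target model, so progress is always made.
            tokens.append(target_next(tokens))
        return tokens

    The better the draft model predicts the target model's outputs, the more proposals are accepted per verification pass, and the larger the speedup.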

    Just like adapter training, you can train using the examples Jupyter notebook, or by running the sample code in train_draft_model.py from the command line:

    python -m examples.train_draft_model \
    --checkpoint /path/to/my_checkpoints/adapter-final.pt \
    --train-data /path/to/train.jsonl \
    --eval-data /path/to/valid.jsonl \
    --epochs 5 \
    --learning-rate 1e-3 \
    --batch-size 4 \
    --checkpoint-dir /path/to/my_checkpoints/

    Training arguments are the same as training an adapter, except for:

    • checkpoint is the adapter checkpoint, produced during adapter training, that the draft model is trained against. Choose the checkpoint you intend to export for your adapter.
    • checkpoint-dir is the folder where you’d like your draft model checkpoints saved.

    After you train the draft model, if you’re not seeing much inference speedup, try experimenting with retraining the draft model using different hyper-parameters, more epochs, or alternative data to improve performance.

    8. Evaluate adapter quality

    Congratulations, you’ve trained an adapter! After training, you will need to evaluate how well your adapter has improved the system model’s behavior for your specific use case. Since each adapter is specialized, evaluation needs to be a custom process that makes sense for your specific use case. Typically, adapters are evaluated by both quantitative metrics, such as match to a target dataset, and qualitative metrics, such as human grading or auto-grading by a larger server-based LLM. You will want to come up with a standardized eval process, so that you can evaluate each of your adapters for each model version, and ensure they all meet your performance goals. Be sure to also evaluate your adapter for AI safety.
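
    As a starting point, a simple quantitative check might compare adapter outputs against the reference responses in your eval set. This sketch assumes a hypothetical generate_response function wrapping the toolkit's inference code; note that exact match is often too strict for free-form text, so treat it as a baseline, not a complete eval:

    import json

    def exact_match_rate(eval_path, generate_response):
        """Fraction of eval samples where the adapter reproduces the reference exactly."""
        total = matches = 0
        with open(eval_path) as f:
            for line in f:
                messages = json.loads(line)
                prompt = messages[0]["content"]     # basic schema: [user, assistant]
                reference = messages[1]["content"]
                if generate_response(prompt).strip() == reference.strip():
                    matches += 1
                total += 1
        return matches / total if total else 0.0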

    To start running inference with your new adapter, see the walkthrough Jupyter notebook, or call the sample code examples/generate.py from the command line:

    python -m examples.generate \
    --prompt "Your prompt here" \
    --checkpoint /path/to/my_checkpoints/adapter-final.pt \
    --draft-checkpoint /path/to/my_checkpoints/draft-model-final.pt

    Include the --draft-checkpoint argument only if you trained a draft model.

    9. Export adapter

    When you’re ready to export, the toolkit includes utility functions to export your adapter in the .fmadapter package format that Xcode and the Foundation Models framework expect. Unlike the customizable sample code for training, the code in the export folder should not be modified, since the export logic must match exactly to keep your adapter compatible with the system model and Xcode.

    Export is covered in the walkthrough Jupyter notebook in examples, and the export utility can be run from the command line:

    python -m export.export_fmadapter \
    --adapter-name my_adapter \
    --checkpoint /path/to/my_checkpoints/adapter-final.pt \
    --draft-checkpoint /path/to/my_checkpoints/draft-model-final.pt \
    --output-dir /path/to/my_exports/

    If you trained the draft model, the --draft-checkpoint argument bundles your draft model checkpoint as part of the .fmadapter package. Exclude this argument otherwise.

    Now that you have my_adapter.fmadapter, you’re ready to start using your custom adapter with the Foundation Models framework. For next steps, check out the framework documentation guide Loading and using a custom adapter with Foundation Models.