Context window of adapter model 90% full after single user prompt

I have been able to train an adapter on Google's Colaboratory.

I am able to start a LanguageModelSession and load it with my adapter.

The problem is that after one simple prompt, the context window is 90% full.

If I start the session without the adapter, the same simple prompt consumes only 1% of the context window.

Has anyone encountered this? I asked Claude AI and it seems to think that my training script needs adjusting. Grok, on the other hand, is convinced (wrongly; I tried) that I just need to tweak some parameters of LanguageModelSession or SystemLanguageModel.

Thanks for any tips.

Hey Michael! To better understand your issue, what parameters did you use when training your adapter (max seq length, pack sequence, batch size, etc)? Do you mind copying your AdapterTrainingConfiguration?

Are you using tools in your training data system message? Do you mind sharing a training sample entry?

Hi! Just wanted to check whether this is still an issue now that tool calling has succeeded in your other thread post.

Thanks for checking in on me. I've been building a comprehensive comparison test. I'll post it here when I have something for you.

Hi, I have put together a pair of unit tests that run the same scenario against two separate language models:

  1. the Apple Foundation base model,
  2. my fine-tuned adapter model.

While both are able to successfully complete a first prompt/reply turn, the LanguageModelSession that is running against the adapter model runs out of context window in turn 2.
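
In test form, the failing scenario boils down to roughly the following (a simplified sketch with placeholder prompts; the real prompt text and assertions are in the test file linked later in this thread):

// Simplified shape of each comparison test. The same function is run once
// against the base-model session and once against the adapter-model session.
func runTwoTurnScenario(session: LanguageModelSession) async throws {
    // Turn 1: succeeds for both the base-model and the adapter-model session.
    let first = try await session.respond(to: "Create a simple clock widget.")
    print("Turn 1 reply: \(first.content)")

    // Turn 2: the adapter-backed session throws
    // LanguageModelSession.GenerationError.exceededContextWindowSize here,
    // while the base-model session completes normally.
    let second = try await session.respond(to: "Now make the clock show seconds as well.")
    print("Turn 2 reply: \(second.content)")
}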

A very important nuance is that while both models operate with the same system prompt:

a) for the unit test running against the base model, the system prompt is passed as "instructions" when instantiating the LanguageModelSession

b) in the unit test running against the adapter model, the system prompt is baked into the training data.

Here's the link to the analysis of the behaviour of the two tests and how they differ (compiled by Claude, as you'll no doubt detect from the superb over-confidence on display that is typical of AI agents):

https://github.com/MAOShea/Hello-World-Tools-Adapter-SwiftUI/blob/main/SUPPORT_REQUEST_TranscriptStorageDifference.md

The two log files that it is analysing are:

  1. the base model: https://github.com/MAOShea/Hello-World-Tools-Adapter-SwiftUI/blob/7a7016f7a90c4606fd834a37dd58da11d0f9419e/TestRuns/baseModel_DiagnosticInspectTranscriptEntries.log

  2. the adapter model: https://github.com/MAOShea/Hello-World-Tools-Adapter-SwiftUI/blob/7a7016f7a90c4606fd834a37dd58da11d0f9419e/TestRuns/adapterModel_DiagnosticInspectTranscriptEntries.log

I'll be happy to share any code or training data that you'd request as we investigate this problem together.

Kind regards,

Michael O'Shea

Hi Carina,

here's the requested information:

1) Sample of training data:

See attached sample.json

2) AdapterTrainingConfiguration:

config3 = AdapterTrainingConfiguration(
    epochs=6,
    learning_rate=1e-4,
    batch_size=2,  # Try increasing if memory allows
    gradient_accumulation_steps=4,  # Reduce accordingly
    enable_activation_checkpointing=True,
    precision='bf16-mixed',
    max_sequence_length=4095,
    compile_model=False
)

train_adapter(
    train_data=TRAIN_FILE,
    eval_data=VALID_FILE,
    config=config3,
    checkpoint_dir='/content/drive/MyDrive/checkpoints'
)

3) At inference time (from my unit tests):

https://github.com/MAOShea/Hello-World-Tools-Adapter-SwiftUI/blob/main/Hello%20World%20ToolsTests/LanguageModelComparisonTests.swift

struct SessionFactory {
    static func createSession(
        modelType: ModelType,
        systemPrompt: SystemPromptVersion
    ) throws -> LanguageModelSession {
        let tools = [WriteUbersichtWidgetToFileSystem()]
        
        switch modelType {
        case .base:
            let instructions = systemPrompt.prompt
            return LanguageModelSession(
                tools: tools,
                instructions: instructions
            )
            
        case .adapter(let adapterURL):
            let adapter = try SystemLanguageModel.Adapter(fileURL: adapterURL)
            let customAdapterModel = SystemLanguageModel(adapter: adapter)

            return LanguageModelSession(
                model: customAdapterModel,
                tools: tools
            )
        }
    }
}

Note the switch and the two cases.
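
For context, the tests drive the factory roughly like this (a simplified sketch: the resource name, the bundle lookup, and the `.v1` prompt version are placeholders, not the exact identifiers from my test file):

// Simplified sketch of how the tests obtain the two sessions.
// "adapter.fmadapter" and .v1 are placeholder names.
let adapterURL = Bundle.main.url(forResource: "adapter", withExtension: "fmadapter")!

let baseSession = try SessionFactory.createSession(
    modelType: .base,
    systemPrompt: .v1          // system prompt passed as instructions
)

let adapterSession = try SessionFactory.createSession(
    modelType: .adapter(adapterURL),
    systemPrompt: .v1          // ignored in this case: the prompt is baked into the training data
)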

Hi Carina. Any news for me?

Thanks.

Michael

bump.

Can I please have an update?

Thank you!

Michael O.

An update would be appreciated.

Hey Michael! Your system prompt (in the sample.json and in your post https://developer.apple.com/forums/thread/805970) seems to be extremely verbose. Instead of embedding all instructions and examples in the system message, we want to let the model learn the patterns from the training examples themselves; in other words, we don't want the model to learn how to call tools from a verbose system prompt.

Is that something you can change? We want to follow a format like so:

  [
    {
      "role": "system",
      "content": "A conversation between a user and a helpful assistant. You are an Übersicht widget designer.",
      "tools": [/* tool definition */]
    },
    {
      "role": "user",
      "content": "Create a music player widget"
    },
    {
      "role": "assistant",
      "content": "",
      "tool_calls": [/* tool call */]
    }
  ]

Thanks for the suggestion! I'll deal with it when I get back from Christmas. Merry Christmas!!!

A short update to say that I'm now dealing with Python package dependency issues on Google Colab. I trudge on.

Finally, an update. Sorry for taking so long.

I ran my test suite with the system prompt that you suggested.

Sadly, even with the stripped-down prompt, the context window fills up immediately.

I track this by analysing session._transcript. I can see the tool call in the transcript, and the argument values are included.
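
For what it's worth, the inspection itself is nothing fancy; roughly the following (a sketch using the public transcript property, which appears to expose the same entries as the underscored one):

// Dump every transcript entry after a turn, so the tool call and its
// argument values show up in the test log.
func dumpTranscript(of session: LanguageModelSession, label: String) {
    let entries = Array(session.transcript)
    print("[\(label)] transcript contains \(entries.count) entries")
    for (index, entry) in entries.enumerated() {
        print("[\(label)] entry \(index): \(entry)")
    }
}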

I am starting to think that this is in fact working as designed and not even a bug.

I am starting to doubt that it's actually possible to have a multi-turn conversation that generates and then iteratively modifies a block of code.

Can you suggest alternative strategies towards getting this to work?

Thanks!

Michael

Hi @MichaelOShea,

I've reviewed this thread and your logs. There are two aspects I want to mention:

  1. We suggest using the on-device model for simple tasks with a small context window. You can find a list of capabilities (and what to avoid) under "Understand model capabilities." If you can pare down your instructions and prompt to a very basic single task that can be described in a sentence or two, then that may be a good candidate for Foundation Models. For large tasks, especially those with multiple steps to process complex information or more than a few paragraphs, it may be better to utilize a server-side frontier model. The system prompt you linked to likely falls into this category.

  2. Depending on how your adapter was trained, it may not be tuned for the on-device environment, which is further contributing to why it exceeds the context window. If this is the case then we suggest using the standard SystemLanguageModel and implementing your custom functionality via instructions and prompting instead.

We also released a Technote recently that specifically addresses working with the context window, and I highly suggest reading it: Managing the on-device foundation model’s context window
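
The core pattern from that Technote, catching the context-window error and continuing in a fresh session seeded with a condensed transcript, looks roughly like this (a sketch in the spirit of the Technote rather than its exact code; `prompt` stands for whatever you were about to send, and you can adjust which entries you carry over):

var session = LanguageModelSession(instructions: "You are an Übersicht widget designer.")

do {
    let reply = try await session.respond(to: prompt)
    print(reply.content)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // Carry over only the first entry (the instructions) and the most
    // recent entry, then retry the same prompt in a fresh session.
    let entries = Array(session.transcript)
    var condensed = [Transcript.Entry]()
    if let first = entries.first { condensed.append(first) }
    if entries.count > 1, let last = entries.last { condensed.append(last) }
    session = LanguageModelSession(transcript: Transcript(entries: condensed))
    let reply = try await session.respond(to: prompt)
    print(reply.content)
}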

Best,

-J
