Context window 90% of adapter model full after single user prompt

I have been able to train an adapter on Google's Colaboratory.

I am able to start a LanguageModelSession and load it with my adapter.

The problem is that after one simple prompt, the context window is 90% full.

If I start the session without the adapter, the same simple prompt consumes only 1% of the context window.
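As a back-of-the-envelope sanity check, the gap between 1% and 90% is far too large to be explained by prompt length alone. This sketch uses two assumptions not stated in the thread: a ~4096-token context window (consistent with the max_sequence_length=4095 used at training time) and the common ~4-characters-per-token heuristic.

```python
# Rough sanity check: estimate what fraction of the context window a prompt
# "should" occupy. Assumes a ~4096-token window and the crude
# ~4-characters-per-token heuristic; both are approximations.

def estimated_window_usage(text: str, window_tokens: int = 4096) -> float:
    """Return the estimated fraction of the context window consumed by text."""
    estimated_tokens = len(text) / 4  # crude chars-per-token heuristic
    return estimated_tokens / window_tokens

# A short prompt of ~160 characters is roughly 40 tokens, i.e. ~1% of the
# window -- consistent with the base-model behaviour described above:
usage = estimated_window_usage("x" * 160)
print(f"{usage:.0%}")
```

By this estimate, reaching 90% after one simple prompt would require something on the order of tens of thousands of characters in the transcript, which points at something other than the user prompt filling the window.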

Has anyone encountered this? I asked Claude AI and it seems to think that my training script needs adjusting. Grok on the other hand is (wrongly, I tried) convinced that I just need to tweak some parameters of LanguageModelSession or SystemLanguageModel.

Thanks for any tips.

Hey Michael! To better understand your issue, what parameters did you use when training your adapter (max sequence length, sequence packing, batch size, etc.)? Do you mind copying your AdapterTrainingConfiguration?

Are you using tools in your training data system message? Do you mind sharing a training sample entry?

Hi! Just checking in: is this still an issue now that tool calling is succeeding, as described in your other thread?

Thanks for checking in on me. I've been building a comprehensive comparison test. I'll post it here when I have something for you.

Hi, I have put together a pair of unit tests that run the same scenario against two separate language models:

  1. the Apple Foundation base model,
  2. my fine-tuned adapter model.

While both are able to successfully complete a first prompt/reply turn, the LanguageModelSession running against the adapter model exhausts its context window on turn 2.

A very important nuance is that while both models operate with the same system prompt:

a) for the unit test running against the base model, the system prompt is passed as "instructions" when instantiating the LanguageModelSession

b) in the unit test running against the adapter model, the system prompt is baked into the training data.
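To make the a/b distinction concrete, here is an illustrative sketch. The field names and sample format are hypothetical (not Apple's actual training schema): the point is only that baking the system prompt into the training data repeats it in every sample, whereas passing it as instructions includes it once per session.

```python
# Hypothetical chat-style training sample builder, for illustration only.
SYSTEM_PROMPT = "You are a widget-writing assistant."  # placeholder text

def make_sample(user_msg: str, reply: str, bake_system_prompt: bool) -> list:
    """Build one hypothetical training sample as a list of messages."""
    messages = []
    if bake_system_prompt:
        # Case b): the system prompt is carried inside every training sample.
        messages.append({"role": "system", "content": SYSTEM_PROMPT})
    messages.append({"role": "user", "content": user_msg})
    messages.append({"role": "assistant", "content": reply})
    return messages

baked = make_sample("Make a clock widget", "Done.", bake_system_prompt=True)
plain = make_sample("Make a clock widget", "Done.", bake_system_prompt=False)

# Every baked sample carries the full system prompt; the plain one does not.
print(len(baked), len(plain))  # 3 2
```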

Here's the link to the analysis of how the two tests behave and where they differ (compiled by Claude, as you'll no doubt detect from the superb over-confidence on display that is typical of AI agents):

https://github.com/MAOShea/Hello-World-Tools-Adapter-SwiftUI/blob/main/SUPPORT_REQUEST_TranscriptStorageDifference.md

The two log files that it is analysing are:

  1. the base model : https://github.com/MAOShea/Hello-World-Tools-Adapter-SwiftUI/blob/7a7016f7a90c4606fd834a37dd58da11d0f9419e/TestRuns/baseModel_DiagnosticInspectTranscriptEntries.log

  2. the adapter model: https://github.com/MAOShea/Hello-World-Tools-Adapter-SwiftUI/blob/7a7016f7a90c4606fd834a37dd58da11d0f9419e/TestRuns/adapterModel_DiagnosticInspectTranscriptEntries.log

I'll be happy to share any code or training data that you'd request as we investigate this problem together.

Kind regards,

Michael O'Shea

Hi Carina,

here's the requested information:

1) Sample of training data:

See attached sample.json

**2) AdapterTrainingConfiguration**

config3 = AdapterTrainingConfiguration(
    epochs=6,
    learning_rate=1e-4,
    batch_size=2,  # Try increasing if memory allows
    gradient_accumulation_steps=4,  # Reduce accordingly
    enable_activation_checkpointing=True,
    precision='bf16-mixed',
    max_sequence_length=4095,
    compile_model=False
)

train_adapter(
    train_data=TRAIN_FILE,
    eval_data=VALID_FILE,
    config=config3,
    checkpoint_dir='/content/drive/MyDrive/checkpoints'
)
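The comments in the config above hint at the usual trade-off: the effective batch size is batch_size × gradient_accumulation_steps, so the two knobs are normally adjusted in opposite directions to keep it constant. A small sketch with the values from the config:

```python
# Effective batch size for the configuration above:
batch_size = 2
gradient_accumulation_steps = 4
effective_batch_size = batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 8

# e.g. doubling batch_size to 4 while halving accumulation to 2 keeps the
# effective batch size (and thus the training dynamics) roughly unchanged:
assert 4 * 2 == effective_batch_size
```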

3) At inference time (from my unit tests):

https://github.com/MAOShea/Hello-World-Tools-Adapter-SwiftUI/blob/main/Hello%20World%20ToolsTests/LanguageModelComparisonTests.swift

struct SessionFactory {
    static func createSession(
        modelType: ModelType,
        systemPrompt: SystemPromptVersion
    ) throws -> LanguageModelSession {
        let tools = [WriteUbersichtWidgetToFileSystem()]
        
        switch modelType {
        case .base:
            let instructions = systemPrompt.prompt
            return LanguageModelSession(
                tools: tools,
                instructions: instructions
            )
            
        case .adapter(let adapterURL):
            let adapter = try SystemLanguageModel.Adapter(fileURL: adapterURL)
            let customAdapterModel = SystemLanguageModel(adapter: adapter)

            return LanguageModelSession(
                model: customAdapterModel,
                tools: tools
            )
        }
    }
}

Note the switch and the two cases.

Hi Carina. Any news for me?

Thanks.

Michael
