I am working on an app using FoundationModels to process web pages.
I am looking to find ways to filter the input to fit within the token limits.
I have unit tests, UI tests, and the app itself running on an iPad in the simulator. The configuration of the test environment appears to affect the token limit.
That is, the same input in a unit test and UI test will hit different token limits.
Is this correct? Or is this an artifact of my test tooling?
The token limit on SystemLanguageModel is currently 4096 tokens. That limit is fixed; it does not change from one run or configuration to another.
See this tech note for more discussion of the context window: TN3193: Managing the on-device foundation model’s context window (Apple Developer Documentation)
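Since the limit is fixed, one way to filter web page text is to trim it to a conservative budget before sending it. The following is only a rough sketch: `trimmedToBudget`, `reservedTokens`, and the 3-characters-per-token ratio are assumptions made for illustration, not part of FoundationModels, and the ratio is a guess rather than a property of the real tokenizer. The reservation is there because instructions and the generated response also count against the context window.

```swift
// Fixed context window size, as stated above.
let contextWindowTokens = 4096

/// Hypothetical helper: trims `text` to an estimated token budget.
/// `charactersPerToken` is an assumed rough average, not a tokenizer guarantee.
func trimmedToBudget(
    _ text: String,
    reservedTokens: Int,
    charactersPerToken: Int = 3
) -> String {
    // Leave room for instructions and the model's response.
    let availableTokens = max(0, contextWindowTokens - reservedTokens)
    let characterBudget = availableTokens * charactersPerToken
    guard text.count > characterBudget else { return text }
    return String(text.prefix(characterBudget))
}
```

For example, reserving 1096 tokens leaves an estimated 3000 tokens, so the input is cut to at most 9000 characters. Because this is an estimate, the request can still exceed the window for token-dense text.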
The tokenizer will always produce the same number of tokens for the same input, so you shouldn't see any variation.
... One source of confusion might be that the error message for GenerationError.exceededContextWindowSize currently prints a token count as soon as the token length of your content trips over the 4096 limit. So it might report the token size of your content as 4090, or maybe 4100. That number is the count at the moment the error was triggered; the actual limit is still always 4096.
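If it helps, here is a minimal sketch of handling that error in code rather than reading the printed count. The wrapper name `respond(toPageText:)` and the halve-and-retry policy are hypothetical choices made for illustration; only `LanguageModelSession`, `respond(to:)`, and `GenerationError.exceededContextWindowSize` come from the framework.

```swift
import FoundationModels

// Hypothetical wrapper; the name and the retry policy are illustrative only.
@available(iOS 26.0, macOS 26.0, *)
func respond(toPageText pageText: String) async throws -> String {
    let session = LanguageModelSession()
    do {
        return try await session.respond(to: pageText).content
    } catch LanguageModelSession.GenerationError.exceededContextWindowSize {
        // The count printed with this error is where generation stopped,
        // not a different limit. The limit is still 4096 tokens.
        // Assumption: halving the input is an arbitrary fallback, not a recommendation.
        let shorter = String(pageText.prefix(pageText.count / 2))
        // Use a fresh session so the failed transcript does not count against the window.
        let retrySession = LanguageModelSession()
        return try await retrySession.respond(to: shorter).content
    }
}
```

Running the same input through this in a unit test and a UI test should hit the error path in both or in neither, since the token count for a given input does not vary.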