A good starting point is to profile your app with Instruments.app, as discussed in the WWDC25 code-along session (starting at 24:32). How to set up the Foundation Models instrument is detailed here.

The Foundation Models instrument reports the number of tokens the model generates. Dividing that count by the time the response took gives you tokens per second. The number can vary a lot, but if it is consistently much worse than 20 to 30 tokens per second, I'd suggest that you file a feedback report and share your report ID here.

The WWDC25 session also discusses how to use prewarm and includesSchemaInInstructions to improve performance in cases where they are appropriate. You can check whether either applies to your app.

Best,
Ziqiao Chen
Worldwide Developer Relations
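
To make the tokens-per-second estimate concrete, here is a minimal Swift sketch of the arithmetic. It does not call the Foundation Models framework; it assumes you copy the token count and the response duration from the Instruments trace by hand. The function names and the sample numbers are hypothetical, and the default floor of 20 is just the low end of the range mentioned above, so adjust it to whatever "much worse" means for your case.

```swift
/// Throughput in tokens per second, or nil when the inputs are not usable.
/// Both values are assumed to be read off the Instruments trace manually.
func tokensPerSecond(tokenCount: Int, seconds: Double) -> Double? {
    guard tokenCount >= 0, seconds > 0 else { return nil }
    return Double(tokenCount) / seconds
}

/// Hypothetical helper: true when every measured run is below `floor`.
/// A single slow run is not enough, since the number can vary a lot.
func isConsistentlySlow(_ rates: [Double], floor: Double = 20.0) -> Bool {
    return !rates.isEmpty && rates.allSatisfy { $0 < floor }
}

// Made-up sample measurements: (tokens generated, seconds taken).
let sampleRuns: [(tokens: Int, seconds: Double)] = [
    (tokens: 240, seconds: 10.0),
    (tokens: 90, seconds: 6.0),
    (tokens: 150, seconds: 5.0),
]

// Drop any run whose rate could not be computed.
let sampleRates = sampleRuns.compactMap {
    tokensPerSecond(tokenCount: $0.tokens, seconds: $0.seconds)
}

print(sampleRates)
print(isConsistentlySlow(sampleRates))
```

Measuring several runs before drawing a conclusion matters here, because one slow response on its own does not show a consistent problem.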