Hi all, I'm interested in building unique applications with the new foundation models, and I have a few questions about the availability of the following features:
- Image input: The June 2025 update (https://machinelearning.apple.com/research/apple-foundation-models-2025-updates) mentions "image" 44 times, but I can't find any information about passing an image as input (part of the prompt) to the foundation models. When will this be available? I know about the existing Vision ML APIs, but for image understanding I want to give the image to a multimodal on-device LLM (a VLM) instead, so I can support questions like "Which player is holding the ball in this image?"
- Cloud foundation model: When will this be available?
Thanks!
Clement :)
