Hi all, I'm interested in unlocking unique applications with the new foundation models. I have a few questions about the availability of the following features:
- Image input: The June 2025 update mentions "image" 44 times (https://machinelearning.apple.com/research/apple-foundation-models-2025-updates), but I can't find any information about using images as the input/prompt for the foundation models. When will this be available? I understand there are existing Vision ML APIs, but I want image input to a multimodal on-device LLM (a VLM) instead, for image-understanding features like "Which player is holding the ball in the image?"
- Cloud foundation model: when will this be available?
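For context, here's roughly the kind of call I can make today with the text-only FoundationModels API on iOS 26, and (commented out) the kind of image-input call I'm hoping for. The `attachments` parameter is purely hypothetical on my part — nothing like it exists in the framework today, as far as I can tell:

```swift
import FoundationModels

// Text-only prompting works today (iOS 26 FoundationModels framework).
let session = LanguageModelSession()
let response = try await session.respond(
    to: "Summarize the rules of basketball in one sentence."
)
print(response.content)

// What I'd like to do (hypothetical API sketch, NOT a real call):
// let answer = try await session.respond(
//     to: "Which player is holding the ball?",
//     attachments: [.image(ballGameImage)]  // hypothetical image-input parameter
// )
```

Right now the only workaround I see is running a separate Vision request and stuffing its text output into the prompt, which loses most of the image-understanding capability a true VLM would have.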
Thanks!
Clement :)