Visual Intelligence and screen/camera understanding for third-party apps

Visual Intelligence lets users ask Siri about what the camera or screen shows, and the screenshot tool can extract structured data into system apps.

  • Can a third-party app contribute results or actions when the user invokes Visual Intelligence over the app's own content or a screenshot of it (analogous to how a schedule becomes calendar events), and which API exposes this?
  • For the Image Playground API, what are the content, rate, and style constraints, and can generated assets be used in commercial app contexts?
  • Is there a supported way for an app to provide its own visual understanding to the system, rather than relying solely on Apple's model, for domain-specific imagery that the on-device model may not recognize?

Thanks for your question. This Q&A session focuses on the Foundation Models framework. Your question relates to Visual Intelligence and Image Playground, so we suggest asking in the main forums, where folks with expertise in those areas can comment.
