Best practices for integrating visual intelligence in your app
Gain insight on how visual intelligence can transform content discovery in your app. Explore how to define entities, process images, and handle multiple result types effectively. Learn best practices for optimizing speed and relevance, and discover how intents enable direct actions like opening or playing content with a single tap.
Chapters
- 0:07 - Introduction
- 2:02 - Defining your content
- 5:03 - Implementing a query
- 8:18 - Opening results
- 10:03 - Mac and iPad adoption
- 12:27 - Returning multiple result types
- 12:56 - Continuing search in your app
- 14:27 - System store integrations
- 17:16 - Next steps
Hi, I'm David, an ML engineer on System Experience.
Let's build something with Visual Intelligence.
In this session, I'll take you step by step through integrating your app with Visual Intelligence and share some best practices along the way. Since Visual Intelligence was introduced, people have been using it to quickly learn more about what's around them whether in their physical surroundings or on their iPhone screen.
This year, we're adding new capabilities like adding to contacts, saving multiple calendar events, and medical device logging, as well as bringing Visual Intelligence to iPad and macOS.
So how do you bring your app into this experience? I'll show you by building one. I love listening to and discovering new music. So I want to create an app that helps me discover albums and find upcoming concerts.
Here's what we'll build today. This is my music app. I can browse albums, check out upcoming concerts, and start listening to anything with a tap. If I take a picture or screenshot of some album artwork, and highlight to search, my app shows matching albums and concerts right in Visual Intelligence. I can even capture a post about an upcoming concert, use Visual Intelligence to add the event to my calendar and the concert shows up automatically in my app. By the end of this session, you'll know how to build all of this.
There are a few steps to making the most of your integration with Visual Intelligence, which I'll go over today. First, we'll define the content we want to return from our app using App entities.
Next, we'll implement a query so Visual Intelligence can find and return our app's content.
Then, we'll take our integration beyond iOS — bringing it to Mac and iPad, covering some considerations for each platform along the way.
And to wrap things up, we'll explore system store integrations where information extracted by Visual Intelligence can be read by your app automatically, from common data stores you may have already adopted. Let's start with the basics of Image Search. Integrating Image Search leverages both the App Intents and Visual Intelligence frameworks. If you're new to App Intents, I'd recommend checking out these sessions from WWDC25.
The first step to integrating Image Search is defining the content we want to return. We'll use App entities from the App Intents framework for this. App entities are the nouns within your app. In our app, I want our Image Search to first return visually similar albums, so I'll define an album entity.
Let's see how this looks in code.
We'll start by defining an AlbumEntity that Visual Intelligence can display in its search results. First I'll add a default EntityQuery and a typeDisplayRepresentation, which are standard for any App entity.
Then I'll define the content of the entity. Each AlbumEntity has an identifier, name, artistName, and thumbnail data for the album artwork.
And I'll add a displayRepresentation which tells Visual Intelligence how to present each result.
Let's talk about the display representation we just defined. This is the first thing people will see in the Image Search results. And there's not a lot of room — you get about three lines of text for a title and subtitle, as well as a thumbnail image.
It's good practice to put the most important identifying information here. In my case, the album name and the artist.
And if you initialize a display representation with an image URL, I'd recommend serving a thumbnail-sized image when appropriate, rather than pointing to your full-resolution asset.
For example, if you always expect to return multiple results, using smaller images can help your results load faster and still look good in a two column layout.
However, if you only return one result, keep in mind that this image will take up the full width of the results sheet.
Now that I have my entity defined, how does Visual Intelligence actually query my app for results? That's where the Intent value query comes in. An Intent value query is a lightweight query protocol that provides entity values to the system.
You might already have one if you've adopted App Intents to make your app work with Siri.
For Visual Intelligence, the key difference is the input: the system passes a SemanticContentDescriptor containing information about the captured image. Let's jump back to the code to build this.
I'll adopt the IntentValueQuery protocol and implement its values(for:) requirement with a SemanticContentDescriptor as input.
In the body, I'll grab the pixelBuffer from the input and pass it to my catalog.search method, which returns matching albums.
But how does that search actually work? Let's look at that next.
For this app, I'll search on device using a local catalog of saved albums. I'll use the Vision framework for this, which provides pre-trained machine learning models for computer vision tasks.
Each entry in our catalog will have a featurePrint, a compact numerical representation of the image, which we can use to compare image similarity.
I'll define a function to compute feature prints using GenerateImageFeaturePrintRequest.
I'll make sure to pre-compute these for albums in my catalog, so we don't need to do this computation at query time.
For our query, I'll first convert the pixelBuffer to a CGImage using VideoToolbox. Then, I'll generate a new feature print for this image.
I'll compare that against the pre-computed feature prints in my catalog, applying a maximum distance threshold to filter out dissimilar results.
Finally, I sort by similarity and return the top results.
A few things to note. I pre-compute feature prints for my album catalog, to keep the query fast. And I sort results by similarity so the best match appears first. Whether you're searching on device or hitting a server, the same principles apply — return results fast and ranked.
I'd also recommend limiting the number of results returned to ensure they're relevant. If you don't find any good matches, you can return an empty array. The system will handle displaying an empty response.
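The filter, rank, and limit steps described above can be sketched without any framework types, which also makes the ranking logic easy to unit test. This is a minimal sketch, not the session's implementation: `Candidate`, `topMatches`, and the default threshold and limit are illustrative names and values chosen here.

```swift
// Framework-free sketch of the "filter, rank, limit" step.
// `Candidate` and `topMatches` are illustrative names, not system API.
struct Candidate<Item> {
    let item: Item
    let distance: Double   // smaller means more similar
}

func topMatches<Item>(
    _ candidates: [Candidate<Item>],
    maxDistance: Double = 1.0,
    limit: Int = 10
) -> [Item] {
    candidates
        .filter { $0.distance <= maxDistance }   // drop dissimilar results
        .sorted { $0.distance < $1.distance }    // best match first
        .prefix(max(0, limit))                   // cap the number of results
        .map(\.item)
}

let ranked = topMatches([
    Candidate(item: "Album A", distance: 0.82),
    Candidate(item: "Album B", distance: 0.31),
    Candidate(item: "Album C", distance: 1.40),
])
// "Album B" comes first, then "Album A"; "Album C" exceeds the threshold.

let none = topMatches([Candidate(item: "Album C", distance: 1.40)])
// No candidate passes the threshold, so the result is an empty array.
```

Returning an empty array when nothing passes the threshold matches the guidance above: the system handles the empty response, so there is no need to pad the list with weak matches.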
And I encourage you to check out the Vision framework APIs to learn more about image processing techniques you can use in your app. We just scratched the surface with feature prints, but you can do so much more like extract text, scan barcodes, detect faces, and classify images, just to name a few. These can be incredibly useful techniques for extending the capabilities of your app's visual search.
Now, how can we land people on the right screen of the app when they tap on a result? For that, we need an OpenIntent. When someone taps an album in the Image Search results, the system calls this intent with the selected entity.
My perform method navigates to the album detail page.
Your OpenIntent should take people straight to the content they selected.
If you already have an OpenIntent for your entity from adopting App Intents to power other features, you can reuse it here too. You don't need a separate one just for Visual Intelligence.
And it's recommended to keep this lightweight. This method runs as the app comes to the foreground, so do your navigation and save any heavy loading for after the view appears.
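One way to keep the intent lightweight is to have perform() record only the destination and let the destination view trigger its own loading once it appears. The sketch below uses plain Swift stand-ins; `Route` and `NavigationModel` are hypothetical names, not the session's AppState, and the "loading" is a placeholder.

```swift
// Sketch: the intent records where to go; the detail view loads data later.
// `Route` and `NavigationModel` are hypothetical stand-ins for app state.
enum Route: Equatable {
    case library
    case album(id: String)
}

final class NavigationModel {
    private(set) var route: Route = .library
    private(set) var loadedAlbumIDs: [String] = []

    // Cheap: safe to call from the intent's perform() while the app foregrounds.
    func openAlbum(id: String) {
        route = .album(id: id)
    }

    // Heavier work: call this once the detail view has appeared.
    func loadDetailsIfNeeded() {
        guard case .album(let id) = route, !loadedAlbumIDs.contains(id) else { return }
        loadedAlbumIDs.append(id)   // placeholder for fetching tracks, artwork, and so on
    }
}

let nav = NavigationModel()
nav.openAlbum(id: "42")      // navigation only, nothing is loaded yet
nav.loadDetailsIfNeeded()    // deferred until the view is on screen
```

In a SwiftUI app, the second call would typically live in a `.task` modifier on the detail view, so the expensive work starts after navigation has already happened.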
And that's everything you need for a basic Image Search integration. Let's take a look at what we've built so far.
My friend sent me this recommendation, so let's use Visual Intelligence to start listening to it in our app. I'll take a screenshot, highlight to search, and choose our app from the available providers.
Our query worked, and we were able to find the album and return it as the top result.
It's worth mentioning that your app appears here alongside other adopting apps. The system decides the ordering based on which Image Search providers are available on the device. If I tap on this result, it takes me right to the album page in my app. That's our entity, our query, and our OpenIntent all working together.
Now, let's bring this to more platforms. This year, Visual Intelligence is also available on iPadOS and macOS. The same APIs are available on these new platforms as well, with minimal changes needed to your app.
Your IntentValueQuery, your entities, and your OpenIntent all work across iOS, iPadOS, and macOS. That's the same code we just wrote.
That said, there are a few platform differences worth keeping in mind. On iOS, people often use Visual Intelligence through the camera — capturing physical objects like vinyl records or concert posters.
On macOS and iPad, the primary entry point is screenshots — capturing digital media. Make sure your search handles both kinds of content well.
Also keep in mind that on Mac, the input pixel buffer can be much larger than what you'd encounter on iPhone. Consider if resizing is necessary for your use case.
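If resizing does make sense for your search, the first step is choosing a target size that preserves the aspect ratio and never upscales. The helper below is a sketch of only that calculation; `scaledSize` and the 1024-pixel cap are assumptions made for illustration, and the actual resampling would depend on your image pipeline.

```swift
// Sketch: compute a downscaled size that preserves aspect ratio.
// `scaledSize` and the default cap are illustrative assumptions.
func scaledSize(width: Int, height: Int, maxDimension: Int = 1024) -> (width: Int, height: Int) {
    let longest = max(width, height)
    // Leave inputs that already fit untouched; never upscale.
    guard maxDimension > 0, longest > maxDimension else { return (width, height) }
    let scale = Double(maxDimension) / Double(longest)
    return (
        width: max(1, Int((Double(width) * scale).rounded())),
        height: max(1, Int((Double(height) * scale).rounded()))
    )
}

let macCapture = scaledSize(width: 5120, height: 2880)   // scaled down to 1024 x 576
let phoneCapture = scaledSize(width: 800, height: 600)   // already small, left unchanged
```

Whatever cap you choose, apply the same preprocessing to the catalog images you precompute feature prints for, so query and catalog images are compared on similar terms.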
Let's build our app for macOS and see how it looks.
I'll take a screenshot of that same image, and with no changes to our query or entity code, our app's Image Search works on macOS. The result looks great.
Now that we've covered the basics, I want to add even more capabilities to our app.
What if we could search not only for visually similar albums, but also for upcoming concerts by artists of those albums? For that, we can use UnionValue. Since our app can only have one IntentValueQuery that accepts a SemanticContentDescriptor, I'll define a @UnionValue enum with a case for each entity type — album and concert.
And since I have two entity types now, I'll need an OpenIntent for each one.
Then I'll update my query to return this union type.
I'll search for the top matching albums first, then use the artists from those albums to find nearby concerts, and combine them into a single results list. Consider if it makes sense for your app to return multiple types of results. And it's worth thinking about the different types of content your app can return beyond simply matching pixels. I found albums through image similarity, then used those artist names to surface nearby concerts, a completely different kind of result.
Feel free to be creative about the type of content you return based on the context.
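When one result type is derived from another, it can help to deduplicate the intermediate values and cap the derived results so they do not crowd out the direct matches. The following is a sketch with plain stand-in types; `Album`, `Concert`, `SearchResult`, `uniqueArtists`, and `combine` are hypothetical names that only mirror the shape of the entities in this session.

```swift
// Sketch with plain stand-in types; none of these names are system API.
struct Album { let name: String; let artistName: String }
struct Concert { let artistName: String; let venue: String }

enum SearchResult {
    case album(Album)
    case concert(Concert)
}

// Unique artist names in ranked order, so each artist is looked up once.
func uniqueArtists(in albums: [Album]) -> [String] {
    var seen = Set<String>()
    return albums.map(\.artistName).filter { seen.insert($0).inserted }
}

// Albums first (the direct visual matches), then a capped list of derived concerts.
func combine(albums: [Album], concerts: [Concert], maxConcerts: Int = 5) -> [SearchResult] {
    albums.map(SearchResult.album)
        + concerts.prefix(max(0, maxConcerts)).map(SearchResult.concert)
}

let matchedAlbums = [
    Album(name: "First", artistName: "Artist X"),
    Album(name: "Second", artistName: "Artist X"),
    Album(name: "Third", artistName: "Artist Y"),
]
let artists = uniqueArtists(in: matchedAlbums)   // "Artist X" appears only once
let combined = combine(
    albums: matchedAlbums,
    concerts: [Concert(artistName: "Artist X", venue: "Hall")]
)
```

Keeping the albums ahead of the concerts preserves the similarity ranking for the results that actually matched the image.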
As a final touch, if people don't find the result they're looking for immediately, I want to provide an easy way for them to continue the search inside the app. We can use the semanticContentSearch schema to do that. I'll create an intent conforming to the semanticContentSearch schema. The system provides the semanticContent property automatically. That's the same SemanticContentDescriptor we saw before with the pixel buffer. In perform, I'll navigate to an in-app search view with some pre-populated search results.
Now when someone taps More results, they'll land in my app's full search experience. It's good practice to use semantic content search to give people a way to continue into your full search experience.
And you can pre-populate your search view based on the input context, rather than starting from scratch.
Your app can show much more than the Visual Intelligence results view — filters, categories, the full depth of your content. Take advantage of that.
Let's see everything we've built in action. I'll take another screenshot of this album, and my app returns matching albums and concerts right in Visual Intelligence.
And if I want to browse more, tapping the More results button takes me into my app's full search.
We've talked about your app providing results to Visual Intelligence. But there's another side to this story. Your app can also receive data from Visual Intelligence through system store integrations. Providing results to Visual Intelligence is done through the Image Search integration, which is everything we've built so far.
Other Visual Intelligence actions write data to system stores, which provide developers with a bridge to shared system data. Events can be read with EventKit, contact information with Contacts, and medical device readings with HealthKit. If your app already reads from the data stores in these frameworks, Visual Intelligence becomes a new source of input automatically.
For our app, I want to know upcoming concerts people are interested in so we can suggest songs for them to listen to beforehand. So let's add an EventKit integration to access these events.
This is my UpcomingConcertManager, which uses EKEventStore.
I'll request read access to calendar, then query for upcoming events.
For our app, I'll simply filter for events in the near future that match artists in my catalog.
I'll also add a notification observer so new events, including ones created by Visual Intelligence, appear automatically.
Now let's see the final piece.
When I capture this social media post about an upcoming concert, Visual Intelligence detects the event so I can add it to my calendar.
When I open my app, it's already there in Upcoming Concerts, with a suggestion to start listening. The same pattern applies to other system stores. Contacts added through Visual Intelligence, for example from a business card, can be accessed through CNContactStore.
And medical device readings captured by Visual Intelligence from displays on blood pressure monitors, glucose meters or weight scales can be queried using HKHealthStore. If your health or fitness app reads from HealthKit, Visual Intelligence becomes another way for people to log data without manual entry.
We've covered a lot today. To recap, Visual Intelligence offers two powerful integration points for your app.
You can provide results to Visual Intelligence through Image Search, and you can receive data from Visual Intelligence through system store integrations. With Visual Intelligence now available on iOS, iPadOS, and macOS, your integration can reach people across their devices. If you want to learn more, check out the documentation available on the developer website.
You can also view these related sessions to explore further capabilities in App Intents and the Vision framework. Thanks for watching. I can't wait to see what you build with Visual Intelligence.
3:21 - Define the content you want to return as an App Entity
// Define the content you want to return as an App Entity
import AppIntents

struct AlbumEntity: AppEntity {
    var id: String

    @Property var name: String
    @Property var artistName: String
    var coverArtData: Data

    var displayRepresentation: DisplayRepresentation {
        DisplayRepresentation(
            title: "\(name)",
            subtitle: "\(artistName)",
            image: .init(data: coverArtData)
        )
    }

    static let defaultQuery = AlbumEntityQuery()

    static var typeDisplayRepresentation: TypeDisplayRepresentation {
        "Album"
    }
}

struct AlbumEntityQuery: EntityQuery {
    @Dependency var catalog: AlbumCatalog

    func entities(for identifiers: [String]) async throws -> [AlbumEntity] {
        catalog.albums(for: identifiers)
    }
}
5:39 - Adopt IntentValueQuery to return results
// Adopt IntentValueQuery to return visual search results
// Note: VisualSearchResult is the @UnionValue enum defined in the 12:05 snippet.
import AppIntents
import VisualIntelligence

struct SearchHandler: IntentValueQuery {
    @Dependency var catalog: AlbumCatalog
    @Dependency var concertFinder: ConcertFinder

    func values(for input: SemanticContentDescriptor) async throws -> [VisualSearchResult] {
        // Without a pixel buffer there is nothing to search on.
        guard let pixelBuffer = input.pixelBuffer else {
            return []
        }

        let albums = try await catalog.search(matching: pixelBuffer)
        return albums.map { VisualSearchResult.album($0) }
    }
}
6:24 - Build a catalog of albums with precomputed feature prints
// Build a catalog of albums with precomputed feature prints
import Observation
import Vision

@Observable
class AlbumCatalog {
    static let shared = AlbumCatalog()

    struct CatalogEntry: Sendable {
        let album: AlbumEntity
        let featurePrint: FeaturePrintObservation
    }

    private(set) var entries: [CatalogEntry] = []

    // Compute a feature print for one image; run this ahead of time for catalog entries.
    private func generateFeaturePrint(
        for image: CGImage
    ) async throws -> FeaturePrintObservation {
        let request = GenerateImageFeaturePrintRequest()
        let result = try await request.perform(on: image)
        return result
    }
}
6:45 - Search the catalog for albums matching the captured image
// Search the catalog for albums matching the captured image
// (a method of AlbumCatalog; uses generateFeaturePrint(for:) from the previous snippet)
import VideoToolbox

func search(matching pixelBuffer: CVReadOnlyPixelBuffer,
            limit: Int = 10,
            maxDistance: Double = 1.0) async throws -> [AlbumEntity] {
    // Convert the captured pixel buffer to a CGImage.
    var cgImage: CGImage?
    _ = pixelBuffer.withUnsafeBuffer {
        VTCreateCGImageFromCVPixelBuffer($0, options: nil, imageOut: &cgImage)
    }
    guard let cgImage else { return [] }

    let queryPrint = try await generateFeaturePrint(for: cgImage)

    // Keep entries within the distance threshold, closest first, capped at `limit`.
    return try entries.compactMap { entry -> (album: AlbumEntity, distance: Double)? in
        let distance = try queryPrint.distance(to: entry.featurePrint)
        guard distance <= maxDistance else { return nil }
        return (entry.album, distance)
    }
    .sorted { $0.distance < $1.distance }
    .prefix(limit)
    .map { $0.album }
}
8:27 - Create an open intent to land users on the right screen
// Create an open intent to land users on the right screen
import AppIntents

struct OpenAlbumIntent: OpenIntent {
    static let title: LocalizedStringResource = "Open Album"

    @Parameter(title: "Album")
    var target: AlbumEntity

    @Dependency var appState: AppState

    func perform() async throws -> some IntentResult {
        await appState.openAlbum(id: target.id)
        return .result()
    }
}
12:05 - Use UnionValue to return multiple visual search result types
// Use UnionValue to return multiple visual search result types
@UnionValue
enum VisualSearchResult {
    case album(AlbumEntity)
    case concert(ConcertEntity)
}

struct OpenConcertIntent: OpenIntent {
    static let title: LocalizedStringResource = "Open Concert"

    @Parameter(title: "Concert")
    var target: ConcertEntity

    @Dependency var appState: AppState

    func perform() async throws -> some IntentResult {
        await appState.openConcert(id: target.id)
        return .result()
    }
}
12:18 - Expand the IntentValueQuery to return the UnionValue
// Expand the IntentValueQuery to return the UnionValue
struct SearchHandler: IntentValueQuery {
    @Dependency var catalog: AlbumCatalog
    @Dependency var concertFinder: ConcertFinder

    func values(for input: SemanticContentDescriptor) async throws -> [VisualSearchResult] {
        guard let pixelBuffer = input.pixelBuffer else {
            return []
        }

        let albums = try await catalog.search(matching: pixelBuffer)
        let artists = albums.map { $0.artistName }
        let concerts = await concertFinder.findNearby(byArtists: artists)

        return albums.map { VisualSearchResult.album($0) }
            + concerts.map { VisualSearchResult.concert($0) }
    }
}
13:13 - Provide a link to in-app search
// Provide a link to in-app search
@AppIntent(schema: .visualIntelligence.semanticContentSearch)
struct SemanticContentSearchIntent: AppIntent {
    static let title: LocalizedStringResource = "Search in app"
    static let openAppWhenRun: Bool = true

    var semanticContent: SemanticContentDescriptor

    @Dependency var catalog: AlbumCatalog
    @Dependency var concertFinder: ConcertFinder
    @Dependency var appState: AppState

    func perform() async throws -> some IntentResult {
        guard let pixelBuffer = semanticContent.pixelBuffer else {
            return .result()
        }

        let albums = try await catalog.search(matching: pixelBuffer)
        let artists = albums.map { $0.artistName }
        let concerts = await concertFinder.findNearby(byArtists: artists)

        await appState.openSearch(albums: albums, concerts: concerts)
        return .result()
    }
}
15:24 - Request calendar access and fetch upcoming concerts
// Request calendar access and fetch upcoming concerts
import EventKit

@Observable
class UpcomingConcertManager {
    private let eventStore = EKEventStore()

    var upcomingConcerts: [EKEvent] = []
    var authorizationStatus: EKAuthorizationStatus = .notDetermined

    func requestAccessAndFetch() async throws {
        let granted = try await eventStore.requestFullAccessToEvents()
        guard granted else {
            authorizationStatus = .denied
            return
        }

        authorizationStatus = .fullAccess
        await fetchUpcomingConcerts()
        // ...
    }
}
15:42 - Filter for upcoming events that match known artists in our catalog
// Filter for upcoming events that match known artists in our catalog
class UpcomingConcertManager {
    func fetchUpcomingConcerts() async {
        // Look at events from now through the next 90 days, across all calendars.
        let predicate = eventStore.predicateForEvents(
            withStart: .now,
            end: .now.addingTimeInterval(90 * 24 * 60 * 60),
            calendars: nil
        )
        let events = eventStore.events(matching: predicate)

        // Keep events whose title mentions an artist from the catalog.
        upcomingConcerts = events.filter { event in
            AlbumCatalog.shared.entries.contains { entry in
                event.title?.localizedCaseInsensitiveContains(entry.album.artistName) == true
            }
        }
    }
}
15:44 - Observe newly created events
// Observe newly created events
@Observable
class UpcomingConcertManager {
    // ...

    func requestAccessAndFetch() async throws {
        // ...

        // Refetch whenever the event store changes, including events added by Visual Intelligence.
        for await _ in NotificationCenter.default
            .notifications(named: .EKEventStoreChanged) {
            await fetchUpcomingConcerts()
        }
    }
}
- 0:07 - Introduction
Visual Intelligence integration and what's new in iOS 26, iPadOS, and macOS, using a sample music-discovery app built throughout the session. Outlines the agenda: defining content, implementing a query, cross-platform adoption, and system store integrations.
- 2:02 - Defining your content
Model your app's content as an AppEntity so Visual Intelligence can display it in search results. Covers the entity's DisplayRepresentation (title, subtitle, thumbnail) and best practices around concise identifying text and thumbnail-sized images.
- 5:03 - Implementing a query
IntentValueQuery returns results from a SemanticContentDescriptor's pixel buffer — using the Vision framework's GenerateImageFeaturePrintRequest for on-device image similarity, with pre-computed feature prints and distance thresholds to keep results fast.
- 8:18 - Opening results
Implement an OpenIntent to take people straight to the selected content. Keep it lightweight since it runs as the app foregrounds, and reuse an existing OpenIntent rather than creating one specific to Visual Intelligence.
- 10:03 - Mac and iPad adoption
The same entities, query, and OpenIntent carry over to iPadOS and macOS with minimal changes. Account for platform differences such as camera versus screenshot input and the much larger pixel buffers on Mac that may need resizing.
- 12:27 - Returning multiple result types
The @UnionValue type returns more than one entity type from a single query — here albums plus nearby concerts — encouraging you to derive related content rather than only matching pixels.
- 12:56 - Continuing search in your app
The semanticContentSearch schema lets people continue into your full in-app search — pre-populating results from the captured context so they land on filters, categories, and deeper content.
- 14:27 - System store integrations
Visual Intelligence can also write data your app reads back via system stores: events through EventKit (EKEventStore), contacts via CNContactStore, and medical-device readings via HealthKit (HKHealthStore). Observe store-change notifications so captured data appears automatically.
- 17:16 - Next steps
Recaps the two integration points, Image Search and system stores, across iOS, iPadOS, and macOS. Points to documentation and related App Intents and Vision sessions.