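What's new in image understanding
Use the latest updates to the Vision and Foundation Models frameworks to deliver high-quality image understanding. The new tap-to-segment request lets you segment images in new ways, and Vision now supports watchOS. By combining the new image support in Apple's Foundation Models with OCR, barcode scanning, and your own tools, you can offer advanced, LLM-powered visual understanding in your app.
Chapters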
- 0:00 - Introduction
- 1:36 - Segment images with tap-to-segment
- 5:50 - Image inputs for Foundation Models
- 7:57 - Image-based tool calling
- 13:09 - Vision on watchOS
- 14:39 - Next steps
Resources
- Segmenting objects using taps, scribbles or rectangles
- Implementing saliency-based image cropping in iOS and watchOS
4:15 - Segment images (tap-to-segment)
// Generate a segmentation mask of an object with a seed point
let handler = ImageRequestHandler(image)
let request = GenerateIterativeSegmentationRequest(seed: point)
let observation = try await handler.perform(request)
let mask = observation?.pixelBuffer

// Refine the mask with a new point
request.addIncludedPoint(newPoint)
let refinedObservation = try await handler.perform(request)
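The tap-to-segment chapter mentions a normalized coordinate system, while touch locations usually arrive in view points. The following is a minimal sketch of that conversion, not the session's own code. It assumes the image fills the view exactly (no letterboxing) and that normalized coordinates run from 0 to 1 with the origin at the lower-left corner, which is the convention Vision has used historically and should be confirmed for this request. `TapPoint`, `ViewSize`, and `normalizedPoint(forTap:in:)` are hypothetical names introduced for illustration and are not part of Vision.

```swift
// Hypothetical stand-in types; these are NOT Vision types.
struct TapPoint: Equatable {
    var x: Double
    var y: Double
}

struct ViewSize {
    var width: Double
    var height: Double
}

/// Converts a tap in view coordinates (origin top-left, y grows downward)
/// into normalized coordinates (0...1, origin bottom-left, y grows upward).
/// Assumption: the image is drawn edge to edge in the view.
func normalizedPoint(forTap tap: TapPoint, in viewSize: ViewSize) -> TapPoint? {
    // A zero-sized view cannot be mapped to the unit square.
    guard viewSize.width > 0, viewSize.height > 0 else { return nil }

    // Scale to 0...1 and clamp taps that land slightly outside the view.
    let x = min(max(tap.x / viewSize.width, 0), 1)
    // Flip the vertical axis to move the origin from top-left to bottom-left.
    let y = min(max(1 - tap.y / viewSize.height, 0), 1)
    return TapPoint(x: x, y: y)
}
```

If the image is shown with aspect-fit scaling, the letterbox offset would need to be subtracted from the tap location before calling this helper.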
6:41 - Generate an image caption with Foundation Models
// Generate an image caption with Foundation Models
import FoundationModels

let prompt = Prompt {
    "Generate a caption for this image"
    Attachment(image)
}
let response = try await session.respond(to: prompt)
let caption = response.content
9:55 - Create an image-based tool
// Create an image-based tool
struct PlantIdentifierTool: Tool {
    @SessionProperty(\.history) var history

    @Generable
    struct Arguments {
        var image: ImageReference
    }

    func call(arguments: Arguments) async throws -> String {
        let imageReference = arguments.image
        let transcript = Transcript(history)
        guard let imageAttachment = imageReference.resolve(in: transcript) else {
            throw AppError.imageNotFound
        }
        let image = try imageAttachment.pixelBuffer()
        return classifyPlant(image)
    }
}
12:09 - Use Vision tools
// Use Vision tools
import FoundationModels
import Vision

let session = LanguageModelSession(model: model, tools: [BarcodeReaderTool()])
let response = try await session.respond(generating: EventInfo.self) {
    "Get the date, location, and website from this flyer"
    Attachment(image)
        .label("flyer")
}
13:54 - Create a crop that highlights a prominent subject (watchOS / saliency)
// Create a crop that highlights a prominent subject
func generateImageCrop(in image: CGImage) async throws -> NormalizedRect? {
    let request = GenerateObjectnessBasedSaliencyImageRequest()
    let observation = try await request.perform(on: image)
    let prominentObjects = observation.salientObjects
    return prominentObjects.first
}
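The saliency snippet returns a normalized rectangle, which still has to be mapped to pixels before an image can be cropped. The sketch below shows one way to do that arithmetic in plain Swift. It assumes the normalized rectangle uses a lower-left origin and that pixel rows count down from the top of the image. `NormalizedCrop`, `PixelCrop`, and `pixelCropRect(for:imageWidth:imageHeight:)` are hypothetical names for illustration; they are not Vision or Core Graphics types.

```swift
// Hypothetical stand-ins; a real app would likely use the framework's own rect types.
struct NormalizedCrop {
    var x: Double       // left edge, 0...1
    var y: Double       // bottom edge, 0...1 (lower-left origin assumed)
    var width: Double
    var height: Double
}

struct PixelCrop: Equatable {
    var x: Int          // left edge in pixels
    var y: Int          // top edge in pixels (top-left origin)
    var width: Int
    var height: Int
}

/// Maps a normalized rectangle to a pixel rectangle inside an image.
/// Returns nil when the image or the clamped rectangle is empty.
func pixelCropRect(for rect: NormalizedCrop, imageWidth: Int, imageHeight: Int) -> PixelCrop? {
    guard imageWidth > 0, imageHeight > 0 else { return nil }
    let w = Double(imageWidth)
    let h = Double(imageHeight)

    // Clamp every edge to the unit square so the crop stays inside the image.
    let left = min(max(rect.x, 0), 1)
    let right = min(max(rect.x + rect.width, 0), 1)
    let bottom = min(max(rect.y, 0), 1)
    let top = min(max(rect.y + rect.height, 0), 1)
    guard right > left, top > bottom else { return nil }

    // Flip the vertical axis and round outward so the subject is not clipped.
    let minX = Int((left * w).rounded(.down))
    let minY = Int(((1 - top) * h).rounded(.down))
    let maxX = Int((right * w).rounded(.up))
    let maxY = Int(((1 - bottom) * h).rounded(.up))
    return PixelCrop(x: minX, y: minY, width: maxX - minX, height: maxY - minY)
}
```

Rounding outward trades a slightly larger crop for the guarantee that the salient region is fully contained, which seems like the safer default on a small watch display.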
- 0:00 - Introduction
An overview of the new image understanding capabilities in Vision and Foundation Models this year: the tap-to-segment API, image inputs for large language models, image-based tool calling, and Vision on watchOS.
- 1:36 - Segment images with tap-to-segment
How to use Vision's new tap-to-segment API to interactively isolate any object in an image using point taps, lasso strokes, or a combination of the two. Covers the ImageRequestHandler setup, the normalized coordinate system, best practices for lasso stroke width, and the on-device model download requirement.
- 5:50 - Image inputs for Foundation Models
How to pass images directly to large language models using the Foundation Models framework for tasks like caption generation, scene understanding, recipe creation, and interior design suggestions. Includes a comparison of when to use Vision versus Foundation Models for image analysis.
- 7:57 - Image-based tool calling
How to extend LLM capabilities with tool calling that accepts image arguments. Covers defining tools that conform to the Tool protocol with image parameters, accessing image references through session history transcripts, and using built-in Vision tools, including the barcode reader and saliency tool, to give models capabilities they lack on their own.
- 13:09 - Vision on watchOS
How to use Vision on watchOS to enhance watch apps. Demonstrates using saliency analysis to automatically identify and crop the subject of interest from wildlife photos, so the most relevant part of an image is always displayed in the compact watch UI.
- 14:39 - Next steps
A recap of all four new image understanding capabilities and links to downloadable sample apps for tap-to-segment and watchOS Vision from the Apple Developer website.