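What's new in image understanding

Take advantage of the latest updates to the Vision and Foundation Models frameworks to deliver high-quality image understanding. A new tap-to-segment request lets you segment images in new ways, and Vision now supports watchOS. Combine the new image support in Apple's Foundation Models with OCR, barcode scanning, and your own tools to bring advanced, LLM-powered visual understanding to your app.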

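// Generate a segmentation mask of an object with a seed point
import Vision

// `image` and `point` are assumed to be defined elsewhere.
let handler = ImageRequestHandler(image)
// Declared with `var` so the request can be refined with additional points below.
var request = GenerateIterativeSegmentationRequest(seed: point)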
let observation = try await handler.perform(request)
let mask = observation?.pixelBuffer

// Refine the mask with a new point
request.addIncludedPoint(newPoint)
let refinedObservation = try await handler.perform(request)
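
The tap-to-segment chapter summary mentions a normalized coordinate system for points. As a rough sketch of what preparing a seed point might involve, the helper below maps a tap given in top-left-origin pixel coordinates to the 0...1 range with a lower-left origin, the convention Vision has traditionally used. Whether the seed point expects exactly this convention is an assumption, and `normalizedSeedPoint` with its parameters is a hypothetical name, not part of Vision.

```swift
import Foundation

/// Hypothetical helper (not part of Vision): converts a tap location given in
/// pixel coordinates with a top-left origin into normalized coordinates
/// (0...1 on both axes) with a lower-left origin.
/// Returns nil when the image size is empty or the tap falls outside the image.
func normalizedSeedPoint(tap: CGPoint, imageSize: CGSize) -> CGPoint? {
    guard imageSize.width > 0, imageSize.height > 0 else { return nil }
    guard tap.x >= 0, tap.x <= imageSize.width,
          tap.y >= 0, tap.y <= imageSize.height else { return nil }

    let x = tap.x / imageSize.width
    // Flip the y-axis: top-left origin -> lower-left origin (assumed convention).
    let y = 1 - tap.y / imageSize.height
    return CGPoint(x: x, y: y)
}
```

For example, a tap at (100, 50) in a 400 x 200 image maps to (0.25, 0.75) under these assumptions. Returning nil for out-of-bounds taps keeps invalid seeds from reaching the request.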

6:41 - Generate an image caption with Foundation Models

// Generate an image caption with Foundation Models
import FoundationModels

let prompt = Prompt {
    "Generate a caption for this image"
    Attachment(image)
}
// `session` is assumed to be an existing LanguageModelSession; `image` is defined elsewhere.
let response = try await session.respond(to: prompt)
let caption = response.content

9:55 - Create an image-based tool

// Create an image-based tool
// `classifyPlant(_:)` and `AppError` are app-defined and not shown here.
import FoundationModels

struct PlantIdentifierTool: Tool {
    @SessionProperty(\.history) var history

    @Generable
    struct Arguments {
        var image: ImageReference
    }

    func call(arguments: Arguments) async throws -> String {
        let imageReference = arguments.image
        let transcript = Transcript(history)
        guard let imageAttachment = imageReference.resolve(in: transcript) else {
            throw AppError.imageNotFound
        }
        let image = try imageAttachment.pixelBuffer()
        return classifyPlant(image)
    }
}

12:09 - Use Vision tools

// Use Vision tools
import FoundationModels
import Vision

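// `model` and `image` are assumed to be defined elsewhere;
// `EventInfo` is assumed to be an app-defined @Generable type.
let session = LanguageModelSession(model: model, tools: [BarcodeReaderTool()])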
let response = try await session.respond(generating: EventInfo.self) {
    "Get the date, location, and website from this flyer"
    Attachment(image)
        .label("flyer")
}

13:54 - Create a crop that highlights a prominent subject (watchOS / saliency)

// Create a crop that highlights a prominent subject
import Vision

func generateImageCrop(in image: CGImage) async throws -> NormalizedRect? {
    let request = GenerateObjectnessBasedSaliencyImageRequest()
    let observation = try await request.perform(on: image)
    let prominentObjects = observation.salientObjects
    return prominentObjects.first
}
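
The function above returns a normalized rectangle, which still has to be mapped to pixels before an image can be cropped for the compact watch UI. The following is a minimal sketch of that step, assuming the rectangle uses 0...1 coordinates with a lower-left origin and that the crop is applied in top-left-origin pixel space. `pixelCropRect` and its `padding` parameter are hypothetical names for illustration, not Vision API.

```swift
import Foundation

/// Hypothetical helper (not part of Vision): maps a normalized rectangle
/// (0...1, lower-left origin) to a pixel rectangle with a top-left origin,
/// expands it by `padding` (a fraction of the rectangle's own size on each side),
/// and clamps the result to the image bounds.
func pixelCropRect(normalized: CGRect, imageSize: CGSize, padding: CGFloat = 0.1) -> CGRect {
    // Scale the normalized rectangle to pixel units.
    let width = normalized.size.width * imageSize.width
    let height = normalized.size.height * imageSize.height
    let x = normalized.origin.x * imageSize.width
    // Flip the y-axis: the top edge in lower-left coordinates is origin.y + height.
    let top = normalized.origin.y + normalized.size.height
    let y = (1 - top) * imageSize.height

    // Add breathing room around the subject, then clamp to the image.
    let padX = width * padding
    let padY = height * padding
    let minX = max(0, x - padX)
    let minY = max(0, y - padY)
    let maxX = min(imageSize.width, x + width + padX)
    let maxY = min(imageSize.height, y + height + padY)

    return CGRect(x: minX, y: minY,
                  width: max(0, maxX - minX),
                  height: max(0, maxY - minY))
}
```

The padding is a design choice rather than a requirement: a tight crop can clip the subject, so a small margin often reads better on a small display.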

0:00 - Introduction
An overview of the new image understanding capabilities in Vision and Foundation Models this year: the tap-to-segment API, image inputs for large language models, image-based tool calling, and Vision on watchOS.
1:36 - Segment images with tap-to-segment
How to use Vision's new tap-to-segment API to interactively isolate any object in an image using point taps, lasso strokes, or combinations. Covers the ImageRequestHandler setup, normalized coordinate system, lasso stroke width best practices, and the on-device model download requirement.
5:50 - Image inputs for Foundation Models
How to pass images directly to large language models using the Foundation Models framework for tasks like caption generation, scene understanding, recipe creation, and interior design suggestions. Includes a comparison of when to use Vision versus Foundation Models for image analysis.
7:57 - Image-based tool calling
How to extend LLM capabilities with tool calling that accepts image arguments. Covers defining tools conforming to the Tool protocol with image parameters, accessing image references via session history transcripts, and using built-in Vision tools — including the barcode reader and saliency tool — to give models capabilities they cannot perform on their own.
13:09 - Vision on watchOS
How to use Vision on watchOS to enhance watch apps. Demonstrates using saliency analysis to automatically identify and crop the subject of interest from wildlife photos, so the most relevant part of an image is always displayed in the compact watch UI.
14:39 - Next steps
A recap of all four new image understanding capabilities and links to downloadable sample apps for tap-to-segment and watchOS Vision from the Apple Developer website.
