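  • What's new in image understanding

    Unlock powerful image understanding with the latest updates to the Vision and Foundation Models frameworks. The new tap-to-segment request gives you a new way to segment images, and the Vision framework now supports watchOS. Combine the new image support in Apple Foundation Models with OCR, barcode scanning, and your own tools to build LLM-powered visual understanding into your app.

    Chapters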

    • 0:00 - Introduction
    • 1:36 - Segment images with tap-to-segment
    • 5:50 - Image inputs for Foundation Models
    • 7:57 - Image-based tool calling
    • 13:09 - Vision on watchOS
    • 14:39 - Next steps

    Resources

    • Segmenting objects using taps, scribbles or rectangles
    • Implementing saliency-based image cropping in iOS and watchOS
      • HD Video
      • SD Video

    Related Videos

    WWDC26

    • What's new in the Foundation Models framework

    WWDC25

    • Deep dive into the Foundation Models framework

    WWDC24

    • Discover Swift enhancements in the Vision framework

    Code

    • 4:15 - Segment images (tap-to-segment)

      // Generate a segmentation mask of an object with a seed point.
      // Assumes `image`, `point`, and `newPoint` are provided by the caller.
      import Vision

      let handler = ImageRequestHandler(image)
      // Declared with `var` because the request is modified below.
      var request = GenerateIterativeSegmentationRequest(seed: point)
      let observation = try await handler.perform(request)
      let mask = observation?.pixelBuffer
      
      // Refine the mask with a new point
      request.addIncludedPoint(newPoint)
      let refinedObservation = try await handler.perform(request)
    • 6:41 - Generate an image caption with Foundation Models

      // Generate an image caption with Foundation Models.
      // Assumes `session` is an existing LanguageModelSession and `image` is the input image.
      import FoundationModels
      
      let prompt = Prompt {
          "Generate a caption for this image"
          Attachment(image)
      }
      let response = try await session.respond(to: prompt)
      let caption = response.content
    • 9:55 - Create an image-based tool

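      // Create an image-based tool.
      // `AppError` and `classifyPlant(_:)` are app-defined and not shown here.
      struct PlantIdentifierTool: Tool {
          // The name and description strings are illustrative placeholders.
          let name = "identifyPlant"
          let description = "Identifies the plant shown in an image."

          @SessionProperty(\.history) var history
      
          @Generable
          struct Arguments {
              var image: ImageReference
          }
      
          func call(arguments: Arguments) async throws -> String {
              // Resolve the image reference against the session transcript.
              let imageReference = arguments.image
              let transcript = Transcript(history)
              guard let imageAttachment = imageReference.resolve(in: transcript) else {
                  throw AppError.imageNotFound
              }
              let image = try imageAttachment.pixelBuffer()
              return classifyPlant(image)
          }
      }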
    • 12:09 - Use Vision tools

      // Use Vision tools.
      // Assumes `model`, `image`, and a @Generable `EventInfo` type are defined elsewhere.
      import FoundationModels
      import Vision
      
      let session = LanguageModelSession(model: model, tools: [BarcodeReaderTool()])
      let response = try await session.respond(generating: EventInfo.self) {
          "Get the date, location, and website from this flyer"
          Attachment(image)
              .label("flyer")
      }
    • 13:54 - Create a crop that highlights a prominent subject (watchOS / saliency)

      // Create a crop that highlights a prominent subject
      func generateImageCrop(in image: CGImage) async throws -> NormalizedRect? {
          let request = GenerateObjectnessBasedSaliencyImageRequest()
          let observation = try await request.perform(on: image)
          let prominentObjects = observation.salientObjects
          return prominentObjects.first
      }
    • 0:00 - Introduction
    • An overview of the new image understanding capabilities in Vision and Foundation Models this year: the tap-to-segment API, image inputs for large language models, image-based tool calling, and Vision on watchOS.

    • 1:36 - Segment images with tap-to-segment
    • How to use Vision's new tap-to-segment API to interactively isolate any object in an image using point taps, lasso strokes, or a combination of both. Covers the ImageRequestHandler setup, the normalized coordinate system, best practices for lasso stroke width, and the on-device model download requirement.
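
The normalized coordinate system mentioned above can be illustrated with a small, framework-free sketch. It maps a tap given in image pixel coordinates (origin assumed at the upper left, as in most UI frameworks) into the 0...1 range. The types `PixelPoint`, `NormalizedXY`, and `Origin`, and the function `normalizedPoint(fromPixel:...)`, are hypothetical helpers written for this example and are not part of Vision. Which origin the segmentation request expects is an assumption to check against the Vision documentation, so it is left as a parameter here.

```swift
// Hypothetical helper types for illustration; not part of the Vision framework.
struct PixelPoint: Equatable { var x: Double; var y: Double }
struct NormalizedXY: Equatable { var x: Double; var y: Double }

// Where the y axis starts in the target normalized space.
enum Origin { case upperLeft, lowerLeft }

/// Maps a pixel-space point (upper-left origin) into normalized 0...1 coordinates.
/// Returns nil when the image has no area. Points outside the image are clamped.
func normalizedPoint(fromPixel p: PixelPoint,
                     imageWidth: Double,
                     imageHeight: Double,
                     targetOrigin: Origin) -> NormalizedXY? {
    guard imageWidth > 0, imageHeight > 0 else { return nil }
    let x = min(max(p.x / imageWidth, 0), 1)
    var y = min(max(p.y / imageHeight, 0), 1)
    // Flip the vertical axis when the target space measures y from the bottom.
    if targetOrigin == .lowerLeft { y = 1 - y }
    return NormalizedXY(x: x, y: y)
}

// Example: a tap at pixel (100, 50) in a 400 x 200 image.
let tap = normalizedPoint(fromPixel: PixelPoint(x: 100, y: 50),
                          imageWidth: 400, imageHeight: 200,
                          targetOrigin: .lowerLeft)
```

A value produced this way would still need to be wrapped in whatever point type the request's seed parameter accepts.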

    • 5:50 - Image inputs for Foundation Models
    • How to pass images directly to large language models using the Foundation Models framework for tasks like caption generation, scene understanding, recipe creation, and interior design suggestions. Includes a comparison of when to use Vision versus Foundation Models for image analysis.

    • 7:57 - Image-based tool calling
    • How to extend LLM capabilities with tool calling that accepts image arguments. Covers defining tools that conform to the Tool protocol with image parameters, accessing image references via session history transcripts, and using built-in Vision tools (including the barcode reader and saliency tool) to give models capabilities they do not have on their own.

    • 13:09 - Vision on watchOS
    • How to use Vision on watchOS to enhance watch apps. Demonstrates using saliency analysis to automatically identify and crop the subject of interest from wildlife photos, so the most relevant part of an image is always displayed in the compact watch UI.
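
Once a saliency request has returned a normalized rectangle for the subject, the app still has to turn it into a pixel crop. The following is a minimal sketch of that step under stated assumptions: the rectangle uses an upper-left origin, and a small margin around the subject is desirable on a compact watch screen. `UnitRect`, `PixelRect`, and `paddedCrop(for:imageWidth:imageHeight:padding:)` are hypothetical names introduced for this example and are not Vision API.

```swift
// Hypothetical helper types for illustration; not part of the Vision framework.
struct UnitRect { var x, y, width, height: Double }           // normalized 0...1, upper-left origin assumed
struct PixelRect: Equatable { var x, y, width, height: Int }  // pixel coordinates

/// Expands a normalized subject rectangle by `padding` (a fraction of its own size),
/// clamps it to the image bounds, and converts it to whole pixels.
/// Returns nil for an empty image or an empty subject rectangle.
func paddedCrop(for subject: UnitRect,
                imageWidth: Int,
                imageHeight: Int,
                padding: Double = 0.1) -> PixelRect? {
    guard imageWidth > 0, imageHeight > 0,
          subject.width > 0, subject.height > 0 else { return nil }
    let padX = subject.width * padding
    let padY = subject.height * padding
    // Clamp the padded edges so the crop never leaves the image.
    let minX = max(subject.x - padX, 0)
    let minY = max(subject.y - padY, 0)
    let maxX = min(subject.x + subject.width + padX, 1)
    let maxY = min(subject.y + subject.height + padY, 1)
    // Round outward so the subject is never clipped by rounding.
    let left = Int((minX * Double(imageWidth)).rounded(.down))
    let top = Int((minY * Double(imageHeight)).rounded(.down))
    let right = Int((maxX * Double(imageWidth)).rounded(.up))
    let bottom = Int((maxY * Double(imageHeight)).rounded(.up))
    return PixelRect(x: left, y: top, width: right - left, height: bottom - top)
}

// Example: a subject covering the center of a 400 x 200 image, with no padding.
let crop = paddedCrop(for: UnitRect(x: 0.25, y: 0.25, width: 0.5, height: 0.5),
                      imageWidth: 400, imageHeight: 200, padding: 0)
```

Rounding outward trades a pixel or so of extra background for the guarantee that the subject stays fully inside the crop.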

    • 14:39 - Next steps
    • A recap of all four new image understanding capabilities and links to downloadable sample apps for tap-to-segment and watchOS Vision from the Apple Developer website.

    Copyright © 2026 Apple Inc. All rights reserved.