What's new in image understanding

Take advantage of powerful image understanding capabilities with the latest updates to the Vision framework and the Foundation Models framework. The new tap-to-segment request lets you segment images in new ways, and Vision now supports watchOS. Combine the Apple Foundation Model's new image support with OCR, barcode scanning, and your own tools to bring LLM-powered visual understanding to your app.

Chapters

0:00 - Introduction
1:36 - Segment images with tap-to-segment
5:50 - Image inputs for Foundation Models
7:57 - Image-based tool calling
13:09 - Vision on watchOS
14:39 - Next steps

// Generate a segmentation mask of an object with a seed point
import Vision

let handler = ImageRequestHandler(image)
// Declared with `var`: Vision's Swift request types are structs, so adding
// refinement points below mutates the request.
var request = GenerateIterativeSegmentationRequest(seed: point)
let observation = try await handler.perform(request)
let mask = observation?.pixelBuffer

// Refine the mask with a new point
request.addIncludedPoint(newPoint)
let refinedObservation = try await handler.perform(request)
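
Tap-to-segment starts from a point the user touches, but a touch arrives in view coordinates. The sketch below shows one way to turn a tap into a seed point. It assumes, without confirmation from this session, that the seed is expressed in Vision's usual normalized space (0 to 1 on each axis, origin at the lower-left) and that the image exactly fills the view with no aspect-fit letterboxing. `normalizedSeedPoint(forTap:in:)` is a hypothetical helper, not part of Vision.

```swift
import Foundation

/// Hypothetical helper: converts a tap in view coordinates (origin at the
/// top-left, as in UIKit and SwiftUI) into a normalized point (0...1 on
/// each axis, origin at the lower-left, as Vision normally uses).
/// Assumes the image exactly fills the view (no letterboxing).
func normalizedSeedPoint(forTap tap: CGPoint, in viewSize: CGSize) -> CGPoint? {
    guard viewSize.width > 0, viewSize.height > 0 else { return nil }
    let x = tap.x / viewSize.width
    // Flip the y axis: normalized y grows upward, view y grows downward.
    let y = 1 - tap.y / viewSize.height
    // Reject taps that land outside the image area.
    guard (0...1).contains(x), (0...1).contains(y) else { return nil }
    return CGPoint(x: x, y: y)
}
```

Rejecting out-of-range taps keeps an invalid seed from ever reaching the request. With aspect-fit display, you would first subtract the letterbox offset and divide by the displayed image size instead of the view size.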

6:41 - Generate an image caption with Foundation Models

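// Generate an image caption with Foundation Models
import FoundationModels

// A session backed by the default on-device model; `image` is the image to describe.
let session = LanguageModelSession()
let prompt = Prompt {
    "Generate a caption for this image"
    Attachment(image)
}
let response = try await session.respond(to: prompt)
let caption = response.content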

9:55 - Create an image-based tool

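// Create an image-based tool
import FoundationModels

struct PlantIdentifierTool: Tool {
    // Every tool exposes a name and a description the model uses to decide when to call it.
    let name = "identifyPlant"
    let description = "Identifies the plant species shown in an image."

    @SessionProperty(\.history) var history

    @Generable
    struct Arguments {
        var image: ImageReference
    }

    func call(arguments: Arguments) async throws -> String {
        // The model passes a reference; resolve it against the session transcript.
        let imageReference = arguments.image
        let transcript = Transcript(history)
        guard let imageAttachment = imageReference.resolve(in: transcript) else {
            throw AppError.imageNotFound  // app-defined error
        }
        let image = try imageAttachment.pixelBuffer()
        return classifyPlant(image)  // app-defined classifier
    }
}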
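
`PlantIdentifierTool` never receives image data directly: the model passes an `ImageReference`, and the tool resolves it against the session history. The sketch below mimics that lookup in plain Swift so the idea can run anywhere. `ImageAttachmentEntry`, `ToolError`, and `resolveImage(labeled:in:)` are hypothetical stand-ins, not Foundation Models types, and matching by label (in the spirit of `.label("flyer")`) is an assumption about how a reference might identify an image.

```swift
import Foundation

// Hypothetical stand-in for an image attached to a prompt earlier in the session.
struct ImageAttachmentEntry {
    let label: String
    let pixels: [UInt8]   // placeholder for real image data
}

// Hypothetical error, playing the role of AppError.imageNotFound.
enum ToolError: Error {
    case imageNotFound
}

/// Resolves a label against the session history, searching newest-first so
/// that a re-attached image with the same label wins over older copies.
func resolveImage(labeled label: String, in history: [ImageAttachmentEntry]) throws -> ImageAttachmentEntry {
    guard let entry = history.last(where: { $0.label == label }) else {
        throw ToolError.imageNotFound
    }
    return entry
}
```

Throwing when nothing matches, rather than guessing, mirrors the `guard ... else { throw }` in the tool above and lets the session surface the failure instead of classifying the wrong image.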

12:09 - Use Vision tools

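// Use Vision tools
import FoundationModels
import Vision

// Defined elsewhere: `model` is the SystemLanguageModel to use, BarcodeReaderTool is a
// Tool that wraps Vision barcode scanning, and EventInfo is a @Generable struct.
let session = LanguageModelSession(model: model, tools: [BarcodeReaderTool()])
let response = try await session.respond(generating: EventInfo.self) {
    "Get the date, location, and website from this flyer"
    Attachment(image)
        .label("flyer")
}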
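
The flyer prompt asks for a website, and a QR code is one likely place for it. How `BarcodeReaderTool` works internally isn't shown, so this is only a sketch of a post-processing step such a tool could run before returning text to the model: keep the decoded payloads that parse as http(s) URLs with a host. `websiteURLs(fromPayloads:)` is a hypothetical name, and the input is assumed to be the raw payload strings already decoded from the barcodes.

```swift
import Foundation

/// Hypothetical filter for a barcode-reading tool: keeps only payloads that
/// parse as http or https URLs with a host, preserving their original order.
func websiteURLs(fromPayloads payloads: [String]) -> [URL] {
    payloads.compactMap { (payload: String) -> URL? in
        let trimmed = payload.trimmingCharacters(in: .whitespacesAndNewlines)
        guard let url = URL(string: trimmed),
              let scheme = url.scheme?.lowercased(),
              scheme == "http" || scheme == "https",
              url.host != nil else {
            // Product codes, Wi-Fi configs, and other payloads are dropped.
            return nil
        }
        return url
    }
}
```

Filtering in the tool keeps unrelated barcodes (a product EAN, a Wi-Fi QR code) from competing with the real website when the model fills in `EventInfo`.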

13:54 - Create a crop that highlights a prominent subject (watchOS / saliency)

// Create a crop that highlights a prominent subject
import Vision

func generateImageCrop(in image: CGImage) async throws -> NormalizedRect? {
    let request = GenerateObjectnessBasedSaliencyImageRequest()
    let observation = try await request.perform(on: image)
    // Each salient object is a rectangular observation; return the box of the first one.
    let prominentObjects = observation.salientObjects
    return prominentObjects.first?.boundingBox
}
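
Saliency returns a normalized bounding box, but cropping works in pixels. This sketch assumes the box has already been read out as a `CGRect` in Vision's normalized, lower-left-origin space (the exact accessor on `NormalizedRect` is not relied on here), and converts it to a top-left-origin pixel rectangle of the kind passed to `CGImage.cropping(to:)`, with optional padding and clamping. `pixelCropRect(normalized:imageWidth:imageHeight:padding:)` is a hypothetical helper.

```swift
import Foundation

/// Hypothetical helper: maps a normalized box (0...1, origin at lower-left)
/// to a pixel rectangle with a top-left origin. `padding` grows the box on
/// every side by that fraction of its size; the result is clamped to the image.
func pixelCropRect(normalized box: CGRect, imageWidth: Int, imageHeight: Int, padding: CGFloat = 0) -> CGRect {
    let w = CGFloat(imageWidth), h = CGFloat(imageHeight)
    // Scale to pixels and flip y: the box's top edge in lower-left space
    // (origin.y + height) becomes its distance from the top of the image.
    var x = box.origin.x * w
    var y = (1 - box.origin.y - box.size.height) * h
    var width = box.size.width * w
    var height = box.size.height * h
    // Pad on all four sides.
    let padX = width * padding, padY = height * padding
    x -= padX
    y -= padY
    width += 2 * padX
    height += 2 * padY
    // Clamp so the crop never extends past the image edges.
    let minX = max(0, x), minY = max(0, y)
    let maxX = min(w, x + width), maxY = min(h, y + height)
    return CGRect(x: minX, y: minY, width: max(0, maxX - minX), height: max(0, maxY - minY))
}
```

A little padding usually makes a subject crop look less cramped; clamping afterward means a subject touching the edge simply gets a smaller margin on that side.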
