
Videos

  • Create robust evaluations for agentic apps

    Learn how to use the advanced capabilities of the Evaluations framework to build robust evaluations for your app. Explore how to evaluate flows that involve tool calls and dynamic conditions, and find out how to define what "correct behavior" means for your use case. Discover how to generate synthetic data, use judges effectively, and validate your datasets to get reliable results.

    Chapters

    • 0:00 - Introduction
    • 2:21 - The dataset problem in BookTracker
    • 3:46 - Generating synthetic data with makeSamples
    • 6:27 - Customizing generation with SampleGenerator
    • 8:38 - Sampling strategies
    • 10:11 - Validating synthetic samples
    • 13:04 - Comparing evaluation results
    • 15:09 - Tool calling and tool evaluations
    • 18:54 - Trajectory expectations
    • 21:26 - Building a tool-calling evaluation
    • 22:02 - Synthetic data for tool evaluations
    • 23:49 - Next steps

    Resources

    • Book Tracker: Using Evaluations to evaluate an intelligent feature
    • Generating synthetic datasets
    • Evaluating tool-calling behavior
    • Scoring with model-as-judge evaluators
    • 5:16 - Generate synthetic data with makeSamples

      // Synthetic data
        let prompt = Prompt("""
            Generate a diverse range of book reviews and corresponding tags.
            Cover a wide range of genres, time periods, cultures, and
            reader personas. Do not repeat books already in the dataset.
            """)
        
        let dataset = Book.sampleBooks.map { book in
            ModelSample(prompt: book.review, expected: BookTags(tags: book.tags))
        }
        
        let targetCount = 100
        var expandedDataset = dataset
      
        for try await sample in dataset.makeSamples(prompt, targetCount: targetCount) {
            expandedDataset.append(sample)
            print("Generated \(expandedDataset.count) samples so far.")
        }
    • 5:53 - Configure a custom SampleGenerator

      // Define your own configuration
        let generator = SampleGenerator<ModelSample<BookTags>>(
            prompt,
            samples: dataset,
            targetCount: targetCount,
            sessionProvider: {
                LanguageModelSession( 
                    model: PrivateCloudComputeLanguageModel(),
                    instructions: """
                        You are a synthetic data generator for a book-tracking app's evaluation suite.
                        Your job is to produce realistic, diverse book entries that will stress-test
                        a tagging system.
      
                        Rules:
                        - Review must be at least 100 characters long.
                        - Review should cover a mix of genre, mood/tone, and themes.
                        - Reviews should vary in length.
                        - Create between 3 and 8 tags.
                        - Tags must be lowercase.
                        """ 
                )
            }
        )
    • 10:37 - Validate generated samples

      // Define validation metrics
        validator: { sample in
            guard let book = sample.expected else { return false }
      
            // Review must be at least 100 characters
            guard sample.promptDescription.count >= 100 else { return false }
      
            // Must have between 3 and 8 tags
            guard (3...8).contains(book.tags.count) else { return false }
      
            // All tags must be lowercase
            guard book.tags.allSatisfy({ $0 == $0.lowercased() }) else { return false }
      
            return true
        }
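
The three rules in the validator above can be tried out in isolation. What follows is a minimal sketch, not the framework's API: `ReviewSample` and `isValid` are hypothetical stand-ins introduced here so the checks run on plain Swift values without the Evaluations framework.

```swift
// Hypothetical stand-in for a generated sample; not a framework type.
struct ReviewSample {
    let review: String
    let tags: [String]
}

// Applies the same three rules as the validator closure above.
func isValid(_ sample: ReviewSample) -> Bool {
    // Review must be at least 100 characters
    guard sample.review.count >= 100 else { return false }
    // Must have between 3 and 8 tags
    guard (3...8).contains(sample.tags.count) else { return false }
    // All tags must be lowercase
    guard sample.tags.allSatisfy({ $0 == $0.lowercased() }) else { return false }
    return true
}

// A 140-character review built by repeating a 28-character sentence.
let longReview = String(repeating: "A slow, eerie gothic novel. ", count: 5)

let good = ReviewSample(review: longReview, tags: ["gothic", "slow-burn", "eerie"])
let tooShort = ReviewSample(review: "Too short.", tags: ["gothic", "slow-burn", "eerie"])
let mixedCase = ReviewSample(review: longReview, tags: ["Gothic", "slow-burn", "eerie"])
let tooFewTags = ReviewSample(review: longReview, tags: ["gothic"])

print(isValid(good), isValid(tooShort), isValid(mixedCase), isValid(tooFewTags))
```

Keeping the rules in a small pure function like this makes it easy to unit-test them before wiring them into a generator.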
    • 10:58 - Access valid and invalid results

      // Accessing results
        for try await sample in generator.run() {
            // During iteration
            expandedDataset.append(sample)
        }
      
        // After iteration
        let allSamples = await generator.samples
        let invalidSamples = await generator.invalidSamples
        
        print("Generated \(allSamples.count) new samples. Total: \(expandedDataset.count)")
    • 15:30 - Define a tool's Generable argument

      @Generable
        struct SearchBooksArguments {
            @Guide(description: "A freeform search term to match against titles, reviews, or tags")
            var query: String?
        
            @Guide(description: "Filter results to books with this specific tag")
            var tag: String?
      
            @Guide(description: "Filter results by mood")
            var mood: String?
      
            @Guide(description: "Filter results by genre")
            var genre: String?
      
            @Guide(description: "Maximum number of results to return. Defaults to 5.")
            var limit: Int? 
        }
    • 16:37 - A basic trajectory expectation

      // "Find books tagged gothic"
        TrajectoryExpectation(
            unordered: [
                ToolExpectation(
                    "searchBooks",
                    arguments: [
                        .exact(argumentName: "tag", value: .string("gothic"))
                    ]
                )
            ]
        )
    • 17:07 - Match arguments by intent (naturalLanguage)

      // "Find something cheerful"
        TrajectoryExpectation(
            "searchBooks",
            arguments: [
                .naturalLanguage(
                    argumentName: "mood",
                    criteria: "Should relate to uplifting, hopeful, or positive feelings"
                )
            ]
        )
        // Other matchers available: .contains, .oneOf, .pattern, .range, and more.
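
To make the matcher names above more concrete, here is a rough, self-contained sketch of what such matchers might check. `ArgumentMatcher` is a hypothetical enum defined only for illustration. It assumes arguments arrive as optional strings, and it leaves out `.naturalLanguage` and `.pattern`, whose behavior in the framework is not described in the snippets here.

```swift
import Foundation

// Hypothetical matcher model. The case names echo the matchers listed above,
// but this enum is not the framework's type.
enum ArgumentMatcher {
    case exact(String)
    case contains(String)
    case oneOf([String])
    case range(ClosedRange<Int>)
    case keyOnly

    // `value` is the argument as supplied by the model, or nil if it was omitted.
    func matches(_ value: String?) -> Bool {
        // Assumption: every matcher requires the argument to be present.
        guard let value = value else { return false }
        switch self {
        case .exact(let expected):
            return value == expected
        case .contains(let fragment):
            return value.contains(fragment)
        case .oneOf(let options):
            return options.contains(value)
        case .range(let bounds):
            // Numeric arguments are parsed from their string form in this sketch.
            guard let number = Int(value) else { return false }
            return bounds.contains(number)
        case .keyOnly:
            // Any value is acceptable as long as the key is present.
            return true
        }
    }
}

print(ArgumentMatcher.exact("gothic").matches("gothic"))
print(ArgumentMatcher.range(1...10).matches("25"))
```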
    • 17:34 - Expect tool calls in order

      // "Find gothic books and show details on the first"
        TrajectoryExpectation(
            ordered: [
                ToolExpectation(
                    "searchBooks",
                    arguments: [
                        .exact(argumentName: "tag", value: .string("gothic"))
                    ]
                ),
                ToolExpectation(
                    "getBookDetails",
                    arguments: [
                        .keyOnly(argumentName: "bookId")
                    ]
                )
            ]
        )
    • 17:55 - Disallow specific tool calls

      // "Show only sci-fi books. Don't look for similar ones."
        TrajectoryExpectation(
            unordered: [
                ToolExpectation(
                    "searchBooks",
                    arguments: [
                        .naturalLanguage(
                            argumentName: "genre",
                            criteria: "Should refer to science fiction")
                    ]
                )
            ],
            disallowed: [
                ToolExpectation("findSimilarBooks")
            ]
        )
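
The ordered, unordered, and disallowed lists above suggest a simple matching model, sketched below under stated assumptions. `ToolCall`, `ExpectedCall`, and `satisfies` are hypothetical names, not framework API. The sketch assumes that ordered expectations must appear in relative order (other calls may come in between), that each unordered expectation must match at least one call, and that no call may match a disallowed expectation. The framework's actual semantics may differ.

```swift
// Hypothetical record of a tool call made by the model; not a framework type.
struct ToolCall {
    let name: String
    let arguments: [String: String]
}

// Hypothetical expectation: a tool name plus argument values that must match exactly.
struct ExpectedCall {
    let name: String
    var exactArguments: [String: String] = [:]

    func matches(_ call: ToolCall) -> Bool {
        guard call.name == name else { return false }
        return exactArguments.allSatisfy { call.arguments[$0.key] == $0.value }
    }
}

func satisfies(
    calls: [ToolCall],
    ordered: [ExpectedCall] = [],
    unordered: [ExpectedCall] = [],
    disallowed: [ExpectedCall] = []
) -> Bool {
    // Assumption: any call matching a disallowed expectation fails the trajectory.
    for banned in disallowed where calls.contains(where: banned.matches) {
        return false
    }
    // Assumption: each unordered expectation must match at least one call.
    for expected in unordered where !calls.contains(where: expected.matches) {
        return false
    }
    // Assumption: ordered expectations form a subsequence of the recorded calls.
    var searchStart = calls.startIndex
    for expected in ordered {
        guard let found = calls[searchStart...].firstIndex(where: expected.matches) else {
            return false
        }
        searchStart = calls.index(after: found)
    }
    return true
}

// "Find gothic books and show details on the first"
let recorded = [
    ToolCall(name: "searchBooks", arguments: ["tag": "gothic"]),
    ToolCall(name: "getBookDetails", arguments: ["bookId": "42"])
]
let searchGothic = ExpectedCall(name: "searchBooks", exactArguments: ["tag": "gothic"])
let details = ExpectedCall(name: "getBookDetails")

print(satisfies(calls: recorded, ordered: [searchGothic, details]))
print(satisfies(calls: recorded, ordered: [details, searchGothic]))
```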
    • 18:14 - Build a tool call evaluation

      // Tool call evaluations
        let samples = SampleArrayLoader(samples: [
            ModelSample(
                prompt: "Find all the books tagged with 'gothic'.",
                instructions: "Help the user explore their book collection.",
                expectations: TrajectoryExpectation(  )
            )
        ])
      
        struct BookLibraryToolCallEval: Evaluation {
            var dataset = samples
      
            let pass = Metric("All Passed")
            let percent = Metric("Percentage Passed")
      
            var evaluators: Evaluators { 
                ToolCallEvaluator(allPass: pass, percentagePass: percent)
            }
        }
    • 19:20 - Synthesize tool-evaluation samples

      // Tool call evaluations
        let prompt = Prompt("""
            Generate diverse user queries for a personal book library assistant.
            Each sample needs a prompt (what the user says), and a trajectory
            expectation describing which tools should be called and in what order.
            """)
      
        let instructions = """
            AVAILABLE TOOLS:
            - searchBooks(query?, tag?, mood?, genre?, limit?): search the library
            - getBookDetails(bookId): full details for one book
            - findSimilarBooks(bookId, maxResults?): find books sharing tags
            ORDER REQUIREMENTS:
            - searchBooks must come before getBookDetails or findSimilarBooks
            - Use TrajectoryExpectation(ordered:) when sequence matters, else (unordered:)
            USE THESE ARGUMENT MATCHERS:
            - .exact for precise values, .naturalLanguage for fuzzy matching
            - .keyOnly when any value is acceptable, .range for numeric constraints
            - .contains/.hasPrefix/.hasSuffix for partial string matching
            """
    • 19:51 - Validate tool-evaluation samples

      // Tool call evaluations
        validator: { sample in
            // Must have expectations defined
            guard let expectations = sample.output.expectations else { return false }
      
            // Must reference at least one tool
            let totalExpectations = expectations.ordered.count + expectations.unordered.count
            guard totalExpectations > 0 else { return false }
      
            // All tool names must be from the valid set
            let validTools: Set<String> = ["searchBooks", "getBookDetails", "findSimilarBooks"]
            let allExpectations = expectations.ordered + expectations.unordered + expectations.disallowed
            for expectation in allExpectations {
                guard validTools.contains(expectation.name) else { return false }
            }
        
            return true
        }

WWDC26. Copyright © 2026 Apple Inc. All rights reserved.