  • Discover the Evaluations framework

    Learn how to evaluate model-based experiences with the Evaluations framework. In a probabilistic world, unit tests alone are not enough. Find out how to define metrics, score results automatically, and aggregate statistics so that your AI features work reliably on Apple platforms.

    Chapters

    • 0:00 - Introduction
    • 3:10 - Book Tracker demo app: a manual evaluation
    • 4:31 - Create your first evaluation
    • 8:06 - Run the evaluation and analyze the report
    • 10:57 - Build robust datasets
    • 14:20 - Refine metrics and evaluators
    • 15:41 - Evaluation-driven development and hill-climbing
    • 16:12 - Model judges: qualitative metrics
    • 18:42 - Create a model judge
    • 21:19 - Refine with score dimensions
    • 23:45 - Analyze results by dimension
    • 24:20 - Best practices
    • 25:38 - Next steps

    Resources

    • Book Tracker: Using Evaluations to evaluate an intelligent feature
    • Designing datasets to test your feature
    • Designing effective evaluations
    • Evaluating language model responses

    Code

    • 4:54 - Define an Evaluation

      // Evaluations
      import Evaluations

      struct BookTaggingEvaluation: Evaluation {

      }
    • 8:02 - Run with Swift Testing and an optimization target

      // Optimization Target
      @Test("Book Tag Evaluations", .evaluates(evaluation, info: evaluationInfo))
      func evaluateBookTagging() async throws {
          // Result produced by the evaluation run
          let result = EvaluationContext.current.result

          // Expect the mean TagCount value across samples to reach the 0.8 target
          let rangeMetric = BookTagEvaluationTests.evaluation.tagCount
          #expect(result.aggregateValue(.mean(of: rangeMetric)) >= 0.8)
      }
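The `#expect` above compares a mean metric value against 0.8. As a rough illustration of what such a target means, the sketch below computes a pass rate in plain Swift, without the Evaluations framework. It assumes a passing sample counts as 1.0 and a failing sample as 0.0, which the source does not confirm. The `passRate(of:)` helper and the sample outcomes are hypothetical.

```swift
// Plain-Swift illustration; does not use the Evaluations framework.
// Assumption: a passing sample counts as 1.0 and a failing sample as 0.0.
func passRate(of outcomes: [Bool]) -> Double? {
    // An empty run has no meaningful pass rate.
    guard !outcomes.isEmpty else { return nil }
    let passed = outcomes.filter { $0 }.count
    return Double(passed) / Double(outcomes.count)
}

// Hypothetical outcomes for ten samples: nine pass, one fails.
let outcomes = Array(repeating: true, count: 9) + [false]
let rate = passRate(of: outcomes)
let meetsTarget = (rate ?? 0) >= 0.8
print("pass rate:", rate ?? 0, "meets target:", meetsTarget)
```

Under that assumption, the mean of a pass/fail metric is simply the fraction of samples that passed, so a 0.8 target tolerates up to one failure in five.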
    • 10:09 - Constrain output with a Generable @Guide

      // BookTags.swift
      @Generable
      struct BookTags: Codable {
          @Guide(description: "Descriptive tags capturing themes, genres, moods, and topics from the summary", .count(3...8))
          var tags: [String]
      }
    • 11:15 - Define the dataset with ModelSample

      // BookTaggingEvaluation
      var dataset = ArrayLoader(samples: [
          ModelSample(prompt: "okay I am OBSESSED and I need everyone to read this RIGHT NOW...",
                      expected: BookTags(tags: ["classic", "romance", "wit", "regency"])),

          ModelSample(prompt: "Read this in one sitting between midnight and 4am and I cannot...",
                      expected: BookTags(tags: ["classic", "gothic", "horror", "vampire", "suspense"])),
      ])

      // Or load your whole library:
      var dataset = ArrayLoader(samples:
          Book.sampleBooks.map { book in
              ModelSample(prompt: book.review, expected: BookTags(tags: book.tags))
          }
      )
    • 12:53 - Synthesize more samples with a SampleGenerator

      // Synthesizing more inputs
      // Declared with `var` so the synthesized samples can be appended below
      var samples: [ModelSample<String>] = [
          ModelSample(prompt: "The largest planet in our solar system...", expected: "Jupiter."),
          ModelSample(prompt: "The capital of Thailand...", expected: "Bangkok."),
          ModelSample(prompt: "Swift is...", expected: "a powerful programming language."),
          ModelSample(prompt: "All those moments will be lost in time...", expected: "Like tears in rain.")
      ]

      for try await sample in samples.makeSamples(
          """
          Generate diverse sentence completions about the listed topics:
            - The Solar System
            - World Capitals
          """,
          targetCount: 1000
      ) {
          samples.append(sample)
      }
    • 14:02 - More evaluators: word count and genre

      let wordCount = Metric("WordCount")

      Evaluator { _, subject in
          for tag in subject.value.tags {
              if tag.contains(" ") {
                  return wordCount.failing(rationale: "Tag \(tag) contains multiple words")
              }
          }
          return wordCount.passing()
      }

      let hasGenreTag = Metric("HasGenreTag")

      Evaluator { _, subject in
          let tags = subject.value.tags.map { $0.lowercased() }
          let knownGenres = await BookTaggingService.knownGenres
          for tag in tags {
              if knownGenres.contains(tag) {
                  return hasGenreTag.passing(rationale: "Matched \(tag)")
              }
          }
          return hasGenreTag.failing()
      }
    • 14:03 - Define a Metric and Evaluator

      let tagCount = Metric("TagCount")

      var evaluators: Evaluators {

          // Tag count is within the required 3–8 range
          Evaluator { _, subject in
              let count = subject.value.tags.count
              if (3...8).contains(count) {
                  return tagCount.passing(rationale: "\(count) tags")
              }
              return tagCount.failing(rationale: "Got \(count) tags, expected 3–8")
          }
      }
    • 14:27 - Aggregate metrics across samples

      let tagCount = Metric("TagCount")
      let tagTotal = Metric("TagTotal")

      func aggregateMetrics(using aggregator: inout MetricsAggregator) {
          aggregator.computeMean(of: tagCount)
          aggregator.group("Distribution of Tag Totals") { aggregator in
              aggregator.computeStandardDeviation(of: tagTotal)
              aggregator.computeMean(of: tagTotal)
              aggregator.computeVariance(of: tagTotal)
          }
      }
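To make the three aggregates named above concrete, here is a minimal plain-Swift sketch of mean, variance, and standard deviation over a list of values. It is not the framework's implementation. `SummaryStatistics`, `summarize(_:)`, and the sample tag totals are hypothetical, and the sketch assumes population variance (dividing by n), whereas the framework may use sample variance (dividing by n - 1).

```swift
// Plain-Swift illustration; not the Evaluations framework's implementation.
struct SummaryStatistics {
    let mean: Double
    let variance: Double
    let standardDeviation: Double
}

// Assumption: population variance (divide by n).
// The framework may divide by n - 1 instead; the source does not say.
func summarize(_ values: [Double]) -> SummaryStatistics? {
    // No statistics are defined for an empty list.
    guard !values.isEmpty else { return nil }
    let n = Double(values.count)
    let mean = values.reduce(0, +) / n
    let squaredDeviations = values.map { ($0 - mean) * ($0 - mean) }
    let variance = squaredDeviations.reduce(0, +) / n
    return SummaryStatistics(
        mean: mean,
        variance: variance,
        standardDeviation: variance.squareRoot()
    )
}

// Hypothetical tag totals from eight samples.
let tagTotals: [Double] = [2, 4, 4, 4, 5, 5, 7, 9]
let stats = summarize(tagTotals)
```

For these eight values the mean is 5, the population variance is 4, and the standard deviation is 2. A wide spread in tag totals would suggest the feature is inconsistent even when its mean looks acceptable.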
    • 15:33 - Iterate the feature's instructions (hill-climbing)

      // BookTaggingService.swift
      let instructions = Instructions {
          """
          You are a librarian and literary analyst. Given a reader's
          freeform summary of a book they read — describing their
          thoughts, feelings, and what stood out — generate a set of
          descriptive tags reflected in the summary.

          Rules:
           - Return between 3 and 8 tags.
           - Tags should be lowercase, concise (single word or hyphenated), and descriptive.
           - Tags should include the book's genre, chosen from the included list of known genres.

          Known Genres:
           - \(Self.knownGenres.joined(separator: ", "))
          """
      }
    • 18:53 - Build a model judge

      ModelJudgeEvaluator(
          "TagQuality",
          scale: .numeric([
              4: "Tags are relevant and helpful for browsing",
              3: "Mostly relevant, one tag too vague or generic",
              2: "Several tags are wrong or generic",
              1: "Unhelpful or irrelevant"
          ]),
          judge: PrivateCloudComputeLanguageModel()
      )
    • 22:17 - Split into score dimensions

      // BookTaggingEvaluation.swift
      // Bound to `relevance` so the judge can list it in its `dimensions` array
      let relevance = ScoreDimension(
          "Relevance",
          description: """
              Whether each tag describes a quality, theme, or tone
              of the book itself rather than incidental details or
              the reader's personal reactions.
              """,
          scale: .numeric([
              4: "Every tag describes the book itself",
              3: "Most tags describe the book",
              2: "Some tags describe personal reactions",
              1: "Tags don't meaningfully describe the book"
          ])
      )
      // Define `usefulness` the same way as a second ScoreDimension.
    • 22:32 - Add dimensions to the judge

      // BookTaggingEvaluation.swift
      var evaluators: Evaluators {

          Evaluator {  }

          Evaluator {  }

          Evaluator {  }

          ModelJudgeEvaluator(
              judge: PrivateCloudComputeLanguageModel(),
              dimensions: [relevance, usefulness]
          )
      }
    • 23:17 - Add app context with a ModelJudgePrompt

      // BookTaggingEvaluation.swift
      ModelJudgeEvaluator(
          judge: PrivateCloudComputeLanguageModel(),
          dimensions: [relevance, usefulness],
          prompt: ModelJudgePrompt(
              instructions: """
                  You are evaluating tags generated for a personal book-tracking app where users
                  organize their library by browsing and filtering tags.
                  """,
              evaluationTarget: { value in
                  "\(value.tags.count) Generated tags: " + value.tags.joined(separator: ", ")
              },
              reference: { input, _ in
                  let expectedTags = input.expected?.tags.joined(separator: ", ")
                  return ["Expected Tags": expectedTags ?? "No expected tags defined"]
              }
          )
      )
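The chapter list mentions analyzing results by dimension, but the page shows no snippet for it. As a rough sketch of the idea only, the plain-Swift code below groups judge scores by dimension name and averages each group. `JudgedScore`, `meanScoreByDimension(_:)`, and the score values are hypothetical. None of them come from the Evaluations framework, whose own reporting API is not shown in the source.

```swift
// Plain-Swift illustration; `JudgedScore` is a hypothetical type,
// not part of the Evaluations framework.
struct JudgedScore {
    let dimension: String
    let score: Int   // position on the 1–4 scale used in the snippets above
}

func meanScoreByDimension(_ scores: [JudgedScore]) -> [String: Double] {
    // Group scores by dimension name, then average each group.
    // Groups built by Dictionary(grouping:by:) are never empty,
    // so the division below cannot divide by zero.
    Dictionary(grouping: scores, by: { $0.dimension }).mapValues { group in
        Double(group.reduce(0) { $0 + $1.score }) / Double(group.count)
    }
}

// Hypothetical judge output for two samples and two dimensions.
let judged = [
    JudgedScore(dimension: "Relevance", score: 4),
    JudgedScore(dimension: "Relevance", score: 3),
    JudgedScore(dimension: "Usefulness", score: 2),
    JudgedScore(dimension: "Usefulness", score: 2),
]
let means = meanScoreByDimension(judged)
```

Here the mean is 3.5 for "Relevance" and 2.0 for "Usefulness". Separate means like these show which quality is dragging the overall result down, something a single combined score would hide.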

WWDC26. Copyright © 2026 Apple Inc. All rights reserved.