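Bring an LLM provider to the Foundation Models framework
Extend the Foundation Models framework by implementing LanguageModelExecutor for a new model. Learn how to connect to the LanguageModelSession transcript, manage session state effectively, and optimize your use of the KV cache. Also see how to support custom segment types and take advantage of advanced generative AI capabilities.
Chapters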
- 0:00 - Introduction
- 3:37 - Packaging
- 4:48 - Protocol
- 14:50 - Authentication
- 15:51 - Customization
- 19:47 - Next steps
Resources
Related videos
WWDC26
2:00 - Choose a language model
import FoundationModels
import MLXFoundationModels

// On-device Apple Foundation Model
let model = SystemLanguageModel()

// Private Cloud Compute model
// let model = PrivateCloudComputeLanguageModel()

// Custom Core AI model
// let model = try await CoreAILanguageModel(resourcesAt: modelURL)

// Open-source MLX model from HuggingFace
// let model = MLXLanguageModel(modelID: "mlx-community/my-model")

let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "...")
print(response.content)
3:46 - Configure Package.swift for your model package
// Package.swift
import PackageDescription

let package = Package(
    name: "MyModel",
    platforms: [
        .macOS(.v27),
        .iOS(.v27),
        .visionOS(.v27),
        .watchOS(.v27)
    ],
    products: [
        .library(name: "MyModel", targets: ["MyModel"])
    ],
    dependencies: [
        .package(url: "...", .upToNextMinor(from: "1.0.0"))
    ],
    targets: [
        .target(name: "MyModelRuntime"),
        // public: LanguageModel conformance
        .target(name: "MyModel", dependencies: ["MyModelRuntime"]),
        .testTarget(name: "MyModelTests", dependencies: ["MyModel"])
    ]
)
4:56 - LanguageModel and LanguageModelExecutor protocols
// LanguageModel protocol
public protocol LanguageModel: Sendable {
    var capabilities: LanguageModelCapabilities { get }
    var executorConfiguration: Executor.Configuration { get }
}

// LanguageModelExecutor protocol
public protocol LanguageModelExecutor: Sendable {
    init(configuration: Configuration) throws

    func prewarm(model: Model, transcript: Transcript)

    func respond(
        to request: LanguageModelExecutorGenerationRequest,
        model: Model,
        streamingInto channel: LanguageModelExecutorGenerationChannel
    ) async throws
}
6:25 - Implement LanguageModel and Executor conformances
// LanguageModel conformance
public struct MyLanguageModel: LanguageModel {
    // Must be public because it satisfies a requirement of a public protocol.
    public typealias Executor = MyLanguageModelExecutor

    public var capabilities: LanguageModelCapabilities {
        LanguageModelCapabilities(capabilities: [
            .toolCalling,
            .guidedGeneration,
            .reasoning
        ])
    }

    public var executorConfiguration: Executor.Configuration {
        Executor.Configuration(/* ... */)
    }
}

// Executor conformance
public struct MyLanguageModelExecutor: LanguageModelExecutor {
    public typealias Model = MyLanguageModel

    public struct Configuration: Hashable, Sendable { /* ... */ }

    public init(configuration: Configuration) throws { /* ... */ }

    public func respond(
        to request: LanguageModelExecutorGenerationRequest,
        model: MyLanguageModel,
        streamingInto channel: LanguageModelExecutorGenerationChannel
    ) async throws { /* ... */ }
}
7:28 - Manage model resources with prewarm and respond
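// One approach to managing resources
struct MyLanguageModelExecutor: LanguageModelExecutor {
    // Assumes 'loadedModel' is backed by reference-type, lock-protected storage
    // (not shown), so the non-mutating protocol methods can update it.
    private func loadModelIfNeeded() throws -> LoadedWeights {
        let weights = try loadedModel ?? loadWeights()
        loadedModel = weights
        return weights
    }

    func prewarm(model: MyLanguageModel, transcript: Transcript) {
        // Load ahead of the first request; a failure here resurfaces in respond.
        _ = try? loadModelIfNeeded()
    }

    func respond( ... ) async throws {
        let weights = try loadModelIfNeeded()
        // ...generate with 'weights'...
    }
}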
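The load-once pattern in this snippet needs storage that a struct executor can update from non-mutating methods. A minimal sketch of one way to get that, assuming a small reference-type cache guarded by a lock: WeightsCache, LoadedWeights, and the load closure are hypothetical stand-ins for illustration, not framework API.

```swift
import Foundation

// Hypothetical stand-in for whatever your runtime loads (weights, tokenizer, ...).
struct LoadedWeights {
    let parameterCount: Int
}

// Reference-type cache: a struct executor can hold one in a `let` and still
// share the loaded weights between prewarm and respond.
final class WeightsCache: @unchecked Sendable {
    private let lock = NSLock()
    private var loaded: LoadedWeights?
    private let load: () throws -> LoadedWeights

    init(load: @escaping () throws -> LoadedWeights) {
        self.load = load
    }

    // Returns the cached weights, loading them on first use.
    // The lock is held during the load, so concurrent callers wait
    // instead of loading a second copy.
    func loadIfNeeded() throws -> LoadedWeights {
        lock.lock()
        defer { lock.unlock() }
        if let loaded { return loaded }
        let weights = try load()
        loaded = weights
        return weights
    }
}

// Usage: count how many times the expensive load actually runs.
var loadCount = 0
let weightsCache = WeightsCache {
    loadCount += 1
    return LoadedWeights(parameterCount: 1_000)
}
_ = try? weightsCache.loadIfNeeded()           // what prewarm would do
let weights = try weightsCache.loadIfNeeded()  // what respond would do
```

Holding the lock for the whole load keeps the sketch simple; a real runtime might prefer an actor or a task-based cache so that waiting callers do not block a thread.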
9:00 - Map Transcript entries to model messages
// Transcript entries
let transcript = Transcript(entries: [
    .instructions( ... ), // "You are a helpful assistant"
    .prompt( ... ),       // "What's the weather in Pittsburgh?"
    .toolCalls( ... ),    // getWeather(location: "Pittsburgh")
    .toolOutput( ... ),   // 65°F, sunny
    .response( ... ),     // "It's 65°F and sunny in Pittsburgh"
    .prompt( ... ),       // "What's the address of Apple Park?"
    .response( ... ),     // "One Apple Park Way, Cupertino, CA 95014"
])
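An executor typically has to translate entries like these into whatever message format its inference engine expects. The following is a rough sketch of that mapping under stated assumptions: Entry is a simplified stand-in for the framework's transcript entries (the real ones carry richer payloads), and the system/user/assistant/tool roles are a common chat convention assumed here, not something the framework prescribes.

```swift
// Hypothetical, simplified stand-in for transcript entries.
enum Entry {
    case instructions(String)
    case prompt(String)
    case toolCalls([String])
    case toolOutput(String)
    case response(String)
}

// Assumed chat-style message format for the target model.
struct ChatMessage: Equatable {
    enum Role: String { case system, user, assistant, tool }
    let role: Role
    let content: String
}

// Maps each entry to one message, preserving transcript order.
func messages(from entries: [Entry]) -> [ChatMessage] {
    entries.map { entry -> ChatMessage in
        switch entry {
        case .instructions(let text):
            return ChatMessage(role: .system, content: text)
        case .prompt(let text):
            return ChatMessage(role: .user, content: text)
        case .toolCalls(let calls):
            // Tool calls are issued by the model, so they take the assistant role.
            return ChatMessage(role: .assistant, content: calls.joined(separator: "\n"))
        case .toolOutput(let text):
            return ChatMessage(role: .tool, content: text)
        case .response(let text):
            return ChatMessage(role: .assistant, content: text)
        }
    }
}

let entries: [Entry] = [
    .instructions("You are a helpful assistant"),
    .prompt("What's the weather in Pittsburgh?"),
    .toolCalls(["getWeather(location: \"Pittsburgh\")"]),
    .toolOutput("65°F, sunny"),
    .response("It's 65°F and sunny in Pittsburgh"),
]
let chat = messages(from: entries)
```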
10:42 - Read generation and context options from the request
// Parse generation and context options
func respond(
    to request: LanguageModelExecutorGenerationRequest,
    model: MyLanguageModel,
    streamingInto channel: LanguageModelExecutorGenerationChannel
) async throws {
    let reasoningLevel = request.contextOptions.reasoningLevel
    let temperature = request.generationOptions.temperature
    let maxTokens = request.generationOptions.maximumResponseTokens
}
11:47 - Stream tokens and metadata through the channel
// Streaming text tokens
func respond( ... ) async throws {
    // 1. Report metadata
    await channel.send(.response(action: .updateMetadata([
        "modelID": "my-model-2026-06-08",
        "requestID": request.id.uuidString
    ])))

    // 2. Report prompt token usage before generating
    await channel.send(.response(action: .updateUsage(
        input: .init(totalTokenCount: promptTokens, cachedTokenCount: cachedTokens),
        output: .init(totalTokenCount: 0, reasoningTokenCount: 0)
    )))

    // 3. Stream text deltas as the model generates
    for try await token in tokens {
        await channel.send(.response(action: .appendText(token)))
    }
}
13:33 - Honor the developer's intent or throw
// Honor the developer's intention where possible
// The developer set sampling: .greedy, but our service only takes temperature
if request.generationOptions.sampling?.kind == .greedy {
    serviceRequest.temperature = 0
}

// Otherwise, throw an error
// The token budget is too small to satisfy the schema
if let schema = request.schema,
   let budget = request.generationOptions.maximumResponseTokens,
   budget < minimumTokens(for: schema) {
    throw LanguageModelError.unsupportedCapability(
        .init(
            capability: .guidedGeneration,
            debugDescription: "Token budget too small to satisfy this schema."
        )
    )
}
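The same approximate-or-throw decision can be pulled into one small, testable function. This is a sketch only: Sampling, GenerationOptions, ServiceRequest, and OptionError are hypothetical stand-ins defined here so the example runs on its own, and the real framework types differ.

```swift
// Hypothetical stand-ins for the framework's option types.
enum Sampling { case greedy, random }

struct GenerationOptions {
    var sampling: Sampling? = nil
    var temperature: Double? = nil
    var maximumResponseTokens: Int? = nil
}

// Hypothetical request shape for a service that only understands temperature.
struct ServiceRequest: Equatable {
    var temperature: Double?   // nil means "use the service default"
    var maxTokens: Int?
}

enum OptionError: Error, Equatable {
    case tokenBudgetTooSmall(required: Int, budget: Int)
}

func makeServiceRequest(
    options: GenerationOptions,
    minimumTokensForSchema: Int?
) throws -> ServiceRequest {
    // Throw when an explicit budget cannot satisfy the schema.
    if let budget = options.maximumResponseTokens,
       let required = minimumTokensForSchema,
       budget < required {
        throw OptionError.tokenBudgetTooSmall(required: required, budget: budget)
    }
    // Approximate greedy sampling with temperature 0.
    let temperature: Double? = options.sampling == .greedy ? 0 : options.temperature
    return ServiceRequest(temperature: temperature, maxTokens: options.maximumResponseTokens)
}

// Greedy sampling wins over an explicit temperature.
let greedyRequest = try makeServiceRequest(
    options: GenerationOptions(sampling: .greedy, temperature: 0.8),
    minimumTokensForSchema: nil
)

// A budget of 8 tokens cannot satisfy a schema that needs 32.
var rejection: OptionError? = nil
do {
    _ = try makeServiceRequest(
        options: GenerationOptions(maximumResponseTokens: 8),
        minimumTokensForSchema: 32
    )
} catch let error as OptionError {
    rejection = error
} catch {
    // No other error type is thrown in this sketch.
}
```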
13:57 - Built-in errors that any model can throw
// Built-in errors that any model can throw
public enum LanguageModelError: LocalizedError, CustomDebugStringConvertible {
    // Transcript grew past the model's context window. Trim entries and retry.
    case contextSizeExceeded( )

    // Too many requests in a short window. Space them out or reduce load.
    case rateLimited( )

    // Model declined to answer. Fall back to a message of your choosing.
    case refusal( )

    // Safety guardrails tripped on the prompt or the response.
    case guardrailViolation( )

    // Model lacks a feature you used, such as guided generation or tools.
    case unsupportedCapability( )

    // Prompt contains content the model can't process (bad files, unknown formats).
    case unsupportedTranscriptContent( )

    // A generation guide (e.g., a regex pattern) isn't supported by this model.
    case unsupportedGenerationGuide( )

    // Prompt asked for output in a language or locale the model doesn't support.
    case unsupportedLanguageOrLocale( )

    // Request timed out before the model produced a response.
    case timeout( )
}
14:14 - Handle errors from your model executor
// Custom errors
public enum MyModelError: Error, LocalizedError {
    // User hit monthly token limit. Prompt upgrade or wait for reset.
    case exceededSubscriptionTierLimit

    // Model variant isn't enabled on this account.
    case modelNotProvisioned

    // Billing or policy review locked this account.
    case accountSuspended

    public var errorDescription: String? {
        switch self {
        case .exceededSubscriptionTierLimit:
            String(localized: "Your plan limit has been reached.")
        // ...
        }
    }
}
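On the app side, a developer using your package might turn these custom errors into recovery actions. A minimal sketch, assuming a hypothetical Recovery enum and placeholder wording for the two descriptions the snippet above leaves out:

```swift
import Foundation

enum MyModelError: Error, LocalizedError {
    case exceededSubscriptionTierLimit
    case modelNotProvisioned
    case accountSuspended

    var errorDescription: String? {
        switch self {
        case .exceededSubscriptionTierLimit:
            return "Your plan limit has been reached."
        case .modelNotProvisioned:
            return "This model isn't enabled for your account."  // assumed wording
        case .accountSuspended:
            return "Your account is suspended."                  // assumed wording
        }
    }
}

// Hypothetical recovery actions an app might offer.
enum Recovery: Equatable {
    case offerUpgrade
    case pickAnotherModel
    case contactSupport
    case showMessage(String)
}

func recovery(for error: Error) -> Recovery {
    // Anything that isn't a package-specific error falls back to its description.
    guard let modelError = error as? MyModelError else {
        return .showMessage(error.localizedDescription)
    }
    switch modelError {
    case .exceededSubscriptionTierLimit: return .offerUpgrade
    case .modelNotProvisioned: return .pickAnotherModel
    case .accountSuspended: return .contactSupport
    }
}

let suspendedRecovery = recovery(for: MyModelError.accountSuspended)
```

Keeping the switch exhaustive over MyModelError means the compiler flags every call site when the package adds a new case.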
16:08 - Attach custom metadata to responses
// Attach service-specific performance metadata
let elapsed = Date().timeIntervalSince(startTime)
let tokensPerSecond = Double(tokenCount) / elapsed
let timeToFirstToken = firstTokenTime?.timeIntervalSince(startTime) ?? 0

await channel.send(.metadataUpdate([
    "tokensPerSecond": tokensPerSecond,
    "timeToFirstToken": timeToFirstToken
]))
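The arithmetic behind these two metrics is easy to isolate and test. The sketch below is an illustration, with ThroughputMetrics and throughputMetrics(...) as hypothetical names; it also guards against a zero elapsed time, which would otherwise yield an infinite or NaN rate.

```swift
import Foundation

// Hypothetical container for the two metrics.
struct ThroughputMetrics: Equatable {
    let tokensPerSecond: Double
    let timeToFirstToken: TimeInterval
}

func throughputMetrics(
    tokenCount: Int,
    startTime: Date,
    firstTokenTime: Date?,
    endTime: Date
) -> ThroughputMetrics {
    let elapsed = endTime.timeIntervalSince(startTime)
    // Avoid dividing by zero when nothing was measured.
    let tokensPerSecond = elapsed > 0 ? Double(tokenCount) / elapsed : 0
    let timeToFirstToken = firstTokenTime?.timeIntervalSince(startTime) ?? 0
    return ThroughputMetrics(
        tokensPerSecond: tokensPerSecond,
        timeToFirstToken: timeToFirstToken
    )
}

// 120 tokens over 4 seconds, first token after a quarter of a second.
let start = Date(timeIntervalSinceReferenceDate: 0)
let metrics = throughputMetrics(
    tokenCount: 120,
    startTime: start,
    firstTokenTime: start.addingTimeInterval(0.25),
    endTime: start.addingTimeInterval(4)
)
let emptyMetrics = throughputMetrics(
    tokenCount: 0,
    startTime: start,
    firstTokenTime: nil,
    endTime: start
)
```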
17:05 - Define and use custom Transcript segments
// Define a custom segment
public struct AudioSegment: Transcript.CustomSegment {
    public var id: String
    public var content: URL
}

// Pass it in a prompt
let recording = AudioSegment(id: UUID().uuidString, content: URL(filePath: "/path/to/recording.m4a"))
let response = try await session.respond {
    "Where was Frank Lloyd Wright's original architecture school located?"
    recording
}

// Emit a custom segment from the executor
for try await event in stream {
    switch event {
    case .audioFileGenerated(let file):
        await channel.send(.response(action: .updateCustomSegment(
            AudioSegment(id: file.id, content: file.url)
        )))
    }
}
18:09 - Implement server-side tools in your model
// Configure server-side tools
public struct MyLanguageModel: LanguageModel {
    public struct ServerTool: Sendable {
        public static let webSearch: ServerTool = ...
    }

    public init(serverTools: [ServerTool] = []) { }
}

// Surface tool results through the channel
let client = MyServerClient(serverTools: model.serverTools)
let response = try await client.send(prompt: .init(request))

for try await chunk in response {
    switch chunk {
    case .webSearch(let webSearch):
        await channel.send(.response(action: .updateCustomSegment(
            WebSearchSegment(url: webSearch.url, content: webSearch.html)
        )))
    case .textDelta(let textDelta):
        await channel.send(.response(action: .appendText(
            textDelta.text,
            tokenCount: textDelta.tokenCount
        )))
    }
}
- 0:00 - Introduction
Overview of the Foundation Models framework opening to nearly any LLM. Covers improvements to the on-device System Language Model, three new model options (Private Cloud Compute, Core AI, and MLX), upcoming Anthropic and Google partner integrations, and a code preview showing how any model can be swapped into a LanguageModelSession using the same Swift API.
- 3:37 - Packaging
How to package your LLM provider as a Swift package — configuring Package.swift with the right platform targets (iOS, macOS, visionOS, watchOS, and Linux), being deliberate about dependencies to minimize shipped bytes, and publishing a release via a git tag that developers can paste directly into Xcode.
- 4:48 - Protocol
The two core protocol types bridging your model to the framework: LanguageModel (declares capabilities and provides a Configuration) and LanguageModelExecutor (handles prewarm, translates Transcript entries to your inference engine's native format, applies ContextOptions and GenerationOptions, and streams responses with metadata-first ordering). Covers executor caching by configuration and KV cache state reuse across calls, plus how to approximate unsupported options or throw LanguageModelError when needed.
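Executor caching by configuration can be pictured as a dictionary keyed by the Hashable configuration value, which is presumably why Configuration conforms to Hashable. The sketch below illustrates the idea only and is not the framework's implementation: Configuration, Executor, and ExecutorCache are hypothetical stand-ins.

```swift
import Foundation

// Hypothetical configuration: equal values should share one executor.
struct Configuration: Hashable {
    let modelID: String
    let endpoint: String
}

// Hypothetical executor that would own expensive state (weights, KV cache, ...).
final class Executor {
    let configuration: Configuration
    init(configuration: Configuration) {
        self.configuration = configuration
    }
}

final class ExecutorCache {
    private let lock = NSLock()
    private var executors: [Configuration: Executor] = [:]

    // Returns the existing executor for an equal configuration, or creates one.
    func executor(for configuration: Configuration) -> Executor {
        lock.lock()
        defer { lock.unlock() }
        if let existing = executors[configuration] { return existing }
        let created = Executor(configuration: configuration)
        executors[configuration] = created
        return created
    }
}

let executorCache = ExecutorCache()
let first = executorCache.executor(
    for: Configuration(modelID: "my-model", endpoint: "https://example.com")
)
let second = executorCache.executor(
    for: Configuration(modelID: "my-model", endpoint: "https://example.com")
)
let other = executorCache.executor(
    for: Configuration(modelID: "my-other-model", endpoint: "https://example.com")
)
```

Because two sessions with equal configurations land on the same executor, state such as loaded weights or a KV cache can be reused across calls instead of being rebuilt.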
- 14:50 - Authentication
Best practices for credential handling — designing initializers that guide developers toward secure usage rather than plain API key strings, persisting tokens securely via Keychain, and using App Attest for device attestation to verify devices, catch tampered builds, and protect cloud-based language model services.
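One way to steer developers away from plain API key strings is to offer no string-based initializer at all. The sketch below assumes a hypothetical ServiceCredential type that takes a token-provider closure and redacts itself when printed; the Bearer header scheme is likewise an assumption about the service.

```swift
// Hypothetical credential type: there is deliberately no init(apiKey: String).
struct ServiceCredential: CustomStringConvertible {
    private let fetchToken: () throws -> String

    // The token is fetched on demand, for example from the Keychain
    // or from a sign-in flow, instead of being stored in source code.
    init(tokenProvider: @escaping () throws -> String) {
        self.fetchToken = tokenProvider
    }

    // Builds the header value at request time (assumed Bearer scheme).
    func authorizationHeader() throws -> String {
        let token = try fetchToken()
        return "Bearer \(token)"
    }

    // Keep secrets out of logs and debug output.
    var description: String { "ServiceCredential(<redacted>)" }
}

// Usage: the closure stands in for a secure lookup.
let credential = ServiceCredential { "token-from-secure-storage" }
let header = try credential.authorizationHeader()
let printed = "\(credential)"
```

Since the token is resolved per request, a refreshed or revoked token takes effect without recreating the model or the session.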
- 15:51 - Customization
How to differentiate your model package beyond the protocol fundamentals — attaching custom response metadata (e.g., tokensPerSecond, timeToFirstToken), defining custom segment types for new input and output modalities (audio, video, and beyond), and implementing server-side tools (web search, code execution, image generation) at three levels of visibility: privately grounded, metadata-enriched, or fully surfaced through custom segments.
- 19:47 - Next steps
Privacy considerations when choosing or shipping a model package — on-device versus cloud-based models have very different characteristics and users deserve to know which they're getting. Pointers to companion sessions on Core AI model integration, Private Cloud Compute, and building agentic app experiences on top of the new model ecosystem.