What is the context window of each on-device model (AFM 3 Core Advanced and the 3B Core), and how should developers handle prompts that exceed it: automatic truncation, an error, or developer-managed chunking?
For guided/structured generation into typed Swift values, what are the limits on schema complexity (nesting depth, enums, arrays, optionals), and what is the failure mode when the model cannot satisfy the schema?
How deterministic and reliable is on-device tool calling under the Tool protocol? Are there guarantees on argument validity, and is there a recommended pattern for validating or repairing tool arguments before execution?
For the new image input, what are the constraints on resolution, image count per prompt, and formats? Does passing images change which device tiers are supported or which model, on-device or Private Cloud Compute (PCC), services the request?
Since the on-device model ships and updates with the OS, how should developers detect the active model version at runtime and guard against behavioral drift between OS releases? Is there a pinning or capability-query API?
What are the realistic latency and concurrency expectations on supported hardware, and is there a supported way to run multiple sessions or background inference without thermal/throttling penalties?