In this WWDC25 session, it is explictely mentioned that apps should support AttributedString for text parameters to their App Intents.
However, I have not gotten this to work. Whenever I pass rich text (either generated by the new "Use Model" intent or generated manually for example using "Make Rich Text from Markdown"), my Intent gets an AttributedString with the correct characters, but with all attributes stripped (so in effect just plain text).
struct TestIntent: AppIntent {
static var title = LocalizedStringResource(stringLiteral: "Test Intent")
static var description = IntentDescription("Tests Attributed Strings in Intent Parameters.")
@Parameter
var text: AttributedString
func perform() async throws -> some IntentResult & ReturnsValue<AttributedString> {
return .result(value: text)
}
}
Is there anything else I am missing?
General
RSS for tagExplore the power of machine learning within apps. Discuss integrating machine learning features, share best practices, and explore the possibilities for your app.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
I have a question. In China, long pressing a picture in the album can segment the target. Is this model a local model? Is there any information? Can developers use it?
Hey Devs,
I'm trying to create my own Real Time Text detection like this Apple project. https://developer.apple.com/documentation/vision/extracting-phone-numbers-from-text-in-images
I want to use the new iOS18 RecognizeTextRequest instead of the old VNRecognizeTextRequest in my SwiftUI project.
This is my delegate code with the camera setup. I removed region of interest for debugging but I'm trying to scan English words in books. The idea is to get one word in the ROI in the future. But I can't even get proper words so testing without ROI incase my math is wrong.
@Observable
class CameraManager: NSObject, AVCapturePhotoCaptureDelegate
...
override init() {
super.init()
setUpVisionRequest()
}
private func setUpVisionRequest() {
textRequest = RecognizeTextRequest(.revision3)
}
...
func setup() -> Bool {
captureSession.beginConfiguration()
guard
let captureDevice = AVCaptureDevice.default(
.builtInWideAngleCamera, for: .video, position: .back)
else {
return false
}
self.captureDevice = captureDevice
guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice)
else {
return false
}
/// Check whether the session can add input.
guard captureSession.canAddInput(deviceInput) else {
print("Unable to add device input to the capture session.")
return false
}
/// Add the input and output to session
captureSession.addInput(deviceInput)
/// Configure the video data output
videoDataOutput.setSampleBufferDelegate(
self, queue: videoDataOutputQueue)
if captureSession.canAddOutput(videoDataOutput) {
captureSession.addOutput(videoDataOutput)
videoDataOutput.connection(with: .video)?
.preferredVideoStabilizationMode = .off
} else {
return false
}
// Set zoom and autofocus to help focus on very small text
do {
try captureDevice.lockForConfiguration()
captureDevice.videoZoomFactor = 2
captureDevice.autoFocusRangeRestriction = .near
captureDevice.unlockForConfiguration()
} catch {
print("Could not set zoom level due to error: \(error)")
return false
}
captureSession.commitConfiguration()
// potential issue with background vs dispatchqueue ??
Task(priority: .background) {
captureSession.startRunning()
}
return true
}
}
// Issue here ???
extension CameraManager: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(
_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer,
from connection: AVCaptureConnection
) {
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
Task {
textRequest.recognitionLevel = .fast
textRequest.recognitionLanguages = [Locale.Language(identifier: "en-US")]
do {
let observations = try await textRequest.perform(on: pixelBuffer)
for observation in observations {
let recognizedText = observation.topCandidates(1).first
print("recognized text \(recognizedText)")
}
} catch {
print("Recognition error: \(error.localizedDescription)")
}
}
}
}
The results I get look like this ( full page of English from a any book)
recognized text Optional(RecognizedText(string: e bnUI W4, confidence: 0.5))
recognized text Optional(RecognizedText(string: ?'U, confidence: 0.3))
recognized text Optional(RecognizedText(string: traQt4, confidence: 0.3))
recognized text Optional(RecognizedText(string: li, confidence: 0.3))
recognized text Optional(RecognizedText(string: 15,1,#, confidence: 0.3))
recognized text Optional(RecognizedText(string: jllÈ, confidence: 0.3))
recognized text Optional(RecognizedText(string: vtrll, confidence: 0.3))
recognized text Optional(RecognizedText(string: 5,1,: 11, confidence: 0.5))
recognized text Optional(RecognizedText(string: 1141, confidence: 0.3))
recognized text Optional(RecognizedText(string: jllll ljiiilij41, confidence: 0.3))
recognized text Optional(RecognizedText(string: 2f4, confidence: 0.3))
recognized text Optional(RecognizedText(string: ktril, confidence: 0.3))
recognized text Optional(RecognizedText(string: ¥LLI, confidence: 0.3))
recognized text Optional(RecognizedText(string: 11[Itl,, confidence: 0.3))
recognized text Optional(RecognizedText(string: 'rtlÈ131, confidence: 0.3))
Even with ROI set to a specific rectangle Normalized to Vision, I get the same results with single characters returning gibberish.
Any help would be amazing thank you.
Am I using the buffer right ?
Am I using the new perform(on: CVPixelBuffer) right ?
Maybe I didn't set up my camera properly? I can provide code
During testing the “Bringing advanced speech-to-text capabilities to your app” sample app demonstrating the use of iOS 26 SpeechAnalyzer, I noticed that the language model for the English locale was presumably already downloaded. Upon checking the documentation of AssetInventory, I found out that indeed, the language model can be preinstalled on the system.
Can someone from the dev team share more info about what assets are preinstalled by the system? For example, can we safely assume that the English language model will almost certainly be already preinstalled by the OS if the phone has the English locale?
Following WWDC24 video "Discover Swift enhancements in the Vision framework" recommendations (cfr video at 10'41"), I used the following code to perform multiple new iOS 18 `RecognizedTextRequest' in parallel.
Problem: if more than 2 request are run in parallel, the request will hang, leaving the app in a state where no more requests can be started. -> deadlock
I tried other ways to run the requests, but no matter the method employed, or what device I use: no more than 2 requests can ever be run in parallel.
func triggerDeadlock() {}
try await withThrowingTaskGroup(of: Void.self) { group in
// See: WWDC 2024 Discover Siwft enhancements in the Vision framework at 10:41
// ############## THIS IS KEY
let maxOCRTasks = 5 // On a real-device, if more than 2 RecognizeTextRequest are launched in parallel using tasks, the request hangs
// ############## THIS IS KEY
for idx in 0..<maxOCRTasks {
let url = ... // URL to some image
group.addTask {
// Perform OCR
let _ = await performOCRRequest(on: url: url)
}
}
var nextIndex = maxOCRTasks
for try await _ in group { // Wait for the result of the next child task that finished
if nextIndex < pageCount {
group.addTask {
let url = ... // URL to some image
// Perform OCR
let _ = await performOCRRequest(on: url: url)
}
nextIndex += 1
}
}
}
}
// MARK: - ASYNC/AWAIT version with iOS 18
@available(iOS 18, *)
func performOCRRequest(on url: URL) async throws -> [RecognizedText] {
// Create request
var request = RecognizeTextRequest() // Single request: no need for ImageRequestHandler
// Configure request
request.recognitionLevel = .accurate
request.automaticallyDetectsLanguage = true
request.usesLanguageCorrection = true
request.minimumTextHeightFraction = 0.016
// Perform request
let textObservations: [RecognizedTextObservation] = try await request.perform(on: url)
// Convert [RecognizedTextObservation] to [RecognizedText]
return textObservations.compactMap { observation in
observation.topCandidates(1).first
}
}
I also found this Swift forums post mentioning something very similar.
I also opened a feedback: FB17240843
Environment
MacOC 26
Xcode Version 26.0 beta 7 (17A5305k)
simulator: iPhone 16 pro
iOS: iOS 26
Problem
NLContextualEmbedding.load() fails with the following error
In simulator
Failed to load embedding from MIL representation: filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"]
filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"]
Failed to load embedding model 'mul_Latn' - '5C45D94E-BAB4-4927-94B6-8B5745C46289'
assetRequestFailed(Optional(Error Domain=NLNaturalLanguageErrorDomain Code=7 "Embedding model requires compilation" UserInfo={NSLocalizedDescription=Embedding model requires compilation}))
in #Playground
I'm new to this embedding model. Not sure if it's caused by my code or environment.
Code snippet
import Foundation
import NaturalLanguage
import Playgrounds
#Playground {
// Prefer initializing by script for broader coverage; returns NLContextualEmbedding?
guard let embeddingModel = NLContextualEmbedding(script: .latin) else {
print("Failed to create NLContextualEmbedding")
return
}
print(embeddingModel.hasAvailableAssets)
do {
try embeddingModel.load()
print("Model loaded")
} catch {
print("Failed to load model: \(error)")
}
}
Hi,
I'm trying to use the new RecognizeDocumentsRequest from the Vision Framework to read a receipt. It looks very promising by being able to read paragraphs, lines and detect data. So far it unfortunately seems to read every line on the receipt as a paragraph and when there is more space on one line it creates two paragraphs.
Is there perhaps an Apple Engineer who knows if this is expected behaviour or if I should file a Feedback for this?
Code setup:
let request = RecognizeDocumentsRequest()
let observations = try await request.perform(on: image)
guard let document = observations.first?.document else {
return
}
for paragraph in document.paragraphs {
print(paragraph.transcript)
for data in paragraph.detectedData {
switch data.match.details {
case .phoneNumber(let data):
print("Phone: \(data)")
case .postalAddress(let data):
print("Postal: \(data)")
case .calendarEvent(let data):
print("Calendar: \(data)")
case .moneyAmount(let data):
print("Money: \(data)")
case .measurement(let data):
print("Measurement: \(data)")
default:
continue
}
}
}
See attached image as an example of a receipt I'd like to parse. The top 3 lines are the name, street, and postal code + city. These are all separate paragraphs. Checking on detectedData does see the street (2nd line) as PostalAddress, but not the complete address. Might that be a location thing since it's a Dutch address.
And lower on the receipt it sees the block with "Pomp 1 95 Ongelood" and the things below also as separate paragraphs. First picking up the left side and after that the right side. So it's something like this:
*
Pomp 1
Volume
Prijs
€
TOTAAL
*
BTW
Netto
21.00 %
95 Ongelood
41,90 l
1.949/ 1
81.66
€
14.17
67.49
Hello, I'm using videotoolbox superresolution API in MACOS 26: https://developer.apple.com/documentation/videotoolbox/vtsuperresolutionscalerconfiguration/downloadconfigurationmodel(completionhandler:)?language=objc, when using swift, it's ok, when using objective-c, I get error when downloading model with downloadConfigurationModelWithCompletionHandler:
[Auto] MA-auto{_failedLockContent} | failure reported by server | error:[com.apple.MobileAssetError.AutoAsset:MissingReference(6111)]
[Auto] MA-auto{_failedLockContent} | failure reported by server | error:[com.apple.MobileAssetError.AutoAsset:UnderlyingError(6107)_1_com.apple.MobileAssetError.Download:47]
Download completion handler called with error: The operation couldnxe2x80x99t be completed. (VTFrameProcessorErrorDomain error -19743.)
I'm playing with the new Vision API for iOS18, specifically with the new CalculateImageAestheticsScoresRequest API.
When I try to perform the image observation request I get this error:
internalError("Error Domain=NSOSStatusErrorDomain Code=-1 \"Failed to create espresso context.\" UserInfo={NSLocalizedDescription=Failed to create espresso context.}")
The code is pretty straightforward:
if let image = image {
let request = CalculateImageAestheticsScoresRequest()
Task {
do {
let cgImg = image.cgImage!
let observations = try await request.perform(on: cgImg)
let description = observations.description
let score = observations.overallScore
print(description)
print(score)
} catch {
print(error)
}
}
}
I'm running it on a M2 using the simulator.
Is it a bug? What's wrong?
Using Tensorflow for Silicon gives inaccurate results when compared to Google Colab GPU (9-15% differences). Here are my install versions for 4 anaconda env's. I understand the Floating point precision can be an issue, batch size, activation functions but how do you rectify this issue for the past 3 years?
1.) Version TF: 2.12.0, Python 3.10.13, tensorflow-deps: 2.9.0, tensorflow-metal: 1.2.0, h5py: 3.6.0, keras: 2.12.0
2.) Version TF: 2.19.0, Python 3.11.0, tensorflow-metal: 1.2.0, h5py: 3.13.0, keras: 3.9.2, jax: 0.6.0, jax-metal: 0.1.1,jaxlib: 0.6.0, ml_dtypes: 0.5.1
3.) python: 3.10.13,tensorflow: 2.19.0,tensorflow-metal: 1.2.0, h5py: 3.13.0, keras: 3.9.2, ml_dtypes: 0.5.1
4.) Version TF: 2.16.2, tensorflow-deps:2.9.0,Python: 3.10.16, tensorflow-macos 2.16.2, tensorflow-metal: 1.2.0, h5py:3.13.0, keras: 3.9.2, ml_dtypes: 0.3.2
Install of Each ENV with common example:
Create ENV: conda create --name TF_Env_V2 --no-default-packages
start env: source TF_Env_Name
ENV_1.) conda install -c apple tensorflow-deps , conda install tensorflow,pip install tensorflow-metal,conda install ipykernel
ENV_2.) conda install pip python==3.11, pip install tensorflow,pip install tensorflow-metal,conda install ipykernel
ENV_3) conda install pip python 3.10.13,pip install tensorflow, pip install tensorflow-metal,conda install ipykernel
ENV_4) conda install -c apple tensorflow-deps, pip install tensorflow-macos, pip install tensor-metal, conda install ipykernel
Example used on all 4 env:
import tensorflow as tf
cifar = tf.keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()
model = tf.keras.applications.ResNet50(
include_top=True,
weights=None,
input_shape=(32, 32, 3),
classes=100,)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=64)