At WWDC25, Metal 4 introduced some exciting new features for machine learning optimization. As we all know, PyTorch's Metal Performance Shaders (MPS) backend is one of the most important tools for machine learning on the Mac, but the MPS introduction page shows no information about Metal 4 support.
WWDC25: Combine Metal 4 machine learning and graphics
The session demonstrated a way to run a neural network directly in the graphics pipeline through shaders, using texture compression as the example. However, it does not mention which ML technique is used to compress the textures.
Can anyone point me to some well-known models for the use case shown in this WWDC25 session?
In this WWDC25 session, it is explicitly mentioned that apps should support AttributedString for text parameters to their App Intents.
However, I have not gotten this to work. Whenever I pass rich text (either generated by the new "Use Model" intent or generated manually for example using "Make Rich Text from Markdown"), my Intent gets an AttributedString with the correct characters, but with all attributes stripped (so in effect just plain text).
struct TestIntent: AppIntent {
    static var title = LocalizedStringResource(stringLiteral: "Test Intent")
    static var description = IntentDescription("Tests Attributed Strings in Intent Parameters.")

    @Parameter
    var text: AttributedString

    func perform() async throws -> some IntentResult & ReturnsValue<AttributedString> {
        return .result(value: text)
    }
}
Is there anything else I am missing?
I have a question. In China, long-pressing a picture in the photo album segments the subject from the rest of the image. Does this feature use a local (on-device) model? Is there any documentation about it, and can developers use it?
Hey Devs,
I'm trying to create my own real-time text detection like this Apple project: https://developer.apple.com/documentation/vision/extracting-phone-numbers-from-text-in-images
I want to use the new iOS 18 RecognizeTextRequest instead of the old VNRecognizeTextRequest in my SwiftUI project.
This is my delegate code with the camera setup. I removed the region of interest (ROI) for debugging, but I'm trying to scan English words in books. The idea is to eventually get a single word inside the ROI. But I can't even get proper words, so I'm testing without the ROI in case my math is wrong.
@Observable
class CameraManager: NSObject, AVCapturePhotoCaptureDelegate
...
    override init() {
        super.init()
        setUpVisionRequest()
    }

    private func setUpVisionRequest() {
        textRequest = RecognizeTextRequest(.revision3)
    }
...
    func setup() -> Bool {
        captureSession.beginConfiguration()
        guard
            let captureDevice = AVCaptureDevice.default(
                .builtInWideAngleCamera, for: .video, position: .back)
        else {
            return false
        }
        self.captureDevice = captureDevice
        guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice)
        else {
            return false
        }
        /// Check whether the session can add input.
        guard captureSession.canAddInput(deviceInput) else {
            print("Unable to add device input to the capture session.")
            return false
        }
        /// Add the input and output to session
        captureSession.addInput(deviceInput)
        /// Configure the video data output
        videoDataOutput.setSampleBufferDelegate(
            self, queue: videoDataOutputQueue)
        if captureSession.canAddOutput(videoDataOutput) {
            captureSession.addOutput(videoDataOutput)
            videoDataOutput.connection(with: .video)?
                .preferredVideoStabilizationMode = .off
        } else {
            return false
        }
        // Set zoom and autofocus to help focus on very small text
        do {
            try captureDevice.lockForConfiguration()
            captureDevice.videoZoomFactor = 2
            captureDevice.autoFocusRangeRestriction = .near
            captureDevice.unlockForConfiguration()
        } catch {
            print("Could not set zoom level due to error: \(error)")
            return false
        }
        captureSession.commitConfiguration()
        // potential issue with background vs dispatchqueue ??
        Task(priority: .background) {
            captureSession.startRunning()
        }
        return true
    }
}
// Issue here ???
extension CameraManager: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(
        _ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer,
        from connection: AVCaptureConnection
    ) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        Task {
            textRequest.recognitionLevel = .fast
            textRequest.recognitionLanguages = [Locale.Language(identifier: "en-US")]
            do {
                let observations = try await textRequest.perform(on: pixelBuffer)
                for observation in observations {
                    let recognizedText = observation.topCandidates(1).first
                    print("recognized text \(recognizedText)")
                }
            } catch {
                print("Recognition error: \(error.localizedDescription)")
            }
        }
    }
}
The results I get look like this (a full page of English from any book):
recognized text Optional(RecognizedText(string: e bnUI W4, confidence: 0.5))
recognized text Optional(RecognizedText(string: ?'U, confidence: 0.3))
recognized text Optional(RecognizedText(string: traQt4, confidence: 0.3))
recognized text Optional(RecognizedText(string: li, confidence: 0.3))
recognized text Optional(RecognizedText(string: 15,1,#, confidence: 0.3))
recognized text Optional(RecognizedText(string: jllÈ, confidence: 0.3))
recognized text Optional(RecognizedText(string: vtrll, confidence: 0.3))
recognized text Optional(RecognizedText(string: 5,1,: 11, confidence: 0.5))
recognized text Optional(RecognizedText(string: 1141, confidence: 0.3))
recognized text Optional(RecognizedText(string: jllll ljiiilij41, confidence: 0.3))
recognized text Optional(RecognizedText(string: 2f4, confidence: 0.3))
recognized text Optional(RecognizedText(string: ktril, confidence: 0.3))
recognized text Optional(RecognizedText(string: ¥LLI, confidence: 0.3))
recognized text Optional(RecognizedText(string: 11[Itl,, confidence: 0.3))
recognized text Optional(RecognizedText(string: 'rtlÈ131, confidence: 0.3))
Even with the ROI set to a specific rectangle normalized for Vision, I get the same results, with single characters coming back as gibberish.
Any help would be amazing, thank you.
Am I using the buffer right?
Am I using the new perform(on: CVPixelBuffer) right?
Maybe I didn't set up my camera properly? I can provide code.
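Since the post mentions an ROI "normalized to Vision" and possible math errors, here is a minimal sketch of that conversion, not the poster's actual code. It assumes the ROI starts as a pixel rectangle with a top-left origin and that the target convention is a unit-square rectangle with a lower-left origin. The function name normalizedROI and its parameters are hypothetical, and any rotation between the camera buffer and the on-screen preview is not handled here.

```swift
import Foundation

/// Converts a pixel-space rectangle with a top-left origin into a
/// normalized rectangle (0...1 on both axes) with a lower-left origin.
/// `normalizedROI`, `pixelRect`, and `imageSize` are hypothetical names
/// used only for this sketch.
func normalizedROI(for pixelRect: CGRect, imageSize: CGSize) -> CGRect? {
    guard imageSize.width > 0, imageSize.height > 0 else { return nil }

    // Clamp the rectangle to the image bounds so the result stays
    // inside the unit square.
    let left = max(0, pixelRect.minX)
    let top = max(0, pixelRect.minY)
    let right = min(imageSize.width, pixelRect.maxX)
    let bottom = min(imageSize.height, pixelRect.maxY)
    guard right > left, bottom > top else { return nil }

    // Scale to 0...1 and flip the vertical axis: the bottom edge in
    // top-left coordinates becomes the origin in lower-left coordinates.
    return CGRect(
        x: left / imageSize.width,
        y: 1 - bottom / imageSize.height,
        width: (right - left) / imageSize.width,
        height: (bottom - top) / imageSize.height
    )
}

// Example: the top-left quarter-height strip of a 1920x1080 frame.
let exampleROI = normalizedROI(
    for: CGRect(x: 0, y: 0, width: 960, height: 270),
    imageSize: CGSize(width: 1920, height: 1080)
)
// exampleROI is (x: 0.0, y: 0.75, width: 0.5, height: 0.25)
```

A rectangle near the top of the image ends up with a large y value after the flip, which is an easy thing to get backwards when the ROI math is done by hand.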
Environment
macOS 26
Xcode Version 26.0 beta 7 (17A5305k)
Simulator: iPhone 16 Pro
iOS: iOS 26
Problem
NLContextualEmbedding.load() fails with the following error.
In the Simulator:
Failed to load embedding from MIL representation: filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"]
filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"]
Failed to load embedding model 'mul_Latn' - '5C45D94E-BAB4-4927-94B6-8B5745C46289'
assetRequestFailed(Optional(Error Domain=NLNaturalLanguageErrorDomain Code=7 "Embedding model requires compilation" UserInfo={NSLocalizedDescription=Embedding model requires compilation}))
In #Playground
I'm new to this embedding model, so I'm not sure whether this is caused by my code or my environment.
Code snippet
import Foundation
import NaturalLanguage
import Playgrounds

#Playground {
    // Prefer initializing by script for broader coverage; returns NLContextualEmbedding?
    guard let embeddingModel = NLContextualEmbedding(script: .latin) else {
        print("Failed to create NLContextualEmbedding")
        return
    }
    print(embeddingModel.hasAvailableAssets)
    do {
        try embeddingModel.load()
        print("Model loaded")
    } catch {
        print("Failed to load model: \(error)")
    }
}