Hey everyone,
I've been updating my code to take advantage of the new Vision API for text recognition in macOS 15. I'm noticing some very odd behavior, though: in general, the new Vision API consistently produces worse results than the old API. For reference, here is how I'm setting up my request.
var request = RecognizeTextRequest()
request.recognitionLevel = getOCRMode() // generally accurate
request.usesLanguageCorrection = !disableLanguageCorrection // generally true
request.recognitionLanguages = language.split(separator: ",").map { Locale.Language(identifier: String($0)) } // generally 'en'
let observations = try? await request.perform(on: image) as [RecognizedTextObservation]
Then I process the results and just take the top candidate which, as mentioned above, is typically of worse quality than the same request made with the old API.
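Concretely, the processing step is roughly this (a minimal sketch of what I mean by taking the top candidate; observations comes from the perform call above):

// Minimal sketch: keep only the single best candidate string per observation.
let recognizedLines = (observations ?? []).compactMap { observation in
    observation.topCandidates(1).first?.string
}
let recognizedText = recognizedLines.joined(separator: "\n")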
Am I doing something wrong here?
I was installing tensorflow-metal in an Anaconda environment called "arm64_tf" using the command "python -m pip install tensorflow-metal" in Terminal, and it shows:
ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none)
ERROR: No matching distribution found for tensorflow-metal
I have already tried "conda install -c anaconda libffi", but it still doesn't work. Is there a solution? Thanks.
Apologies for my bad English.
Submitted as: FB16052050
I am looking to adopt machine learning in a more granular manner, going beyond just using the pre-built Metal, Core ML, or Create ML approaches. Specifically, I want to train models using the open-source Python PyTorch libraries, as these offer greater flexibility compared to Apple's native tools. However, these PyTorch APIs are primarily optimised for NVIDIA GPUs (or TPUs), not Apple's M3 or the Apple Neural Engine (ANE).
My goal is to train the models locally without resorting to cloud-based solutions for training or inference, and to then convert the models into Core ML format for deployment on Apple hardware. This would allow me to leverage Apple's hardware acceleration (via ANE, Metal, and MPS) while maintaining control over the training process in PyTorch.
I want to know:
What are my options for training models in PyTorch on local hardware (Apple M3 or equivalent), and how can I ensure that the PyTorch model can eventually be converted to Core ML without losing flexibility in model training and customisation?
How can I perform training in PyTorch and avoid being restricted to the inference-only workflows that Core ML typically allows? Is it possible to use the training capabilities of PyTorch and still get the performance benefits of Apple's hardware for both training and inference?
What are the best practices or tools to ensure that my training pipeline in PyTorch is compatible with Apple's hardware constraints and optimised for local execution?
I'm seeking a practical, cloud-free approach on Apple Hardware only that allows me to train models in PyTorch (keeping control over the training process) while ensuring that they can be deployed efficiently using Core ML on Apple hardware.
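For the deployment half, this is the kind of loading code I have in mind once a converted model is bundled with the app (a minimal sketch; the model name MyConvertedModel is hypothetical, and the PyTorch-to-Core ML conversion itself would happen separately with coremltools):

import CoreML

// Sketch: load a Core ML model converted from PyTorch and let Core ML
// schedule work across CPU, GPU, and the Neural Engine.
func loadConvertedModel() throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = .all   // allow ANE/GPU acceleration at inference time
    guard let url = Bundle.main.url(forResource: "MyConvertedModel", withExtension: "mlmodelc") else {
        throw CocoaError(.fileNoSuchFile)   // hypothetical bundled model name
    }
    return try MLModel(contentsOf: url, configuration: config)
}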
Hello all,
I'm working on a project that involves listening to a person speak from a script, and I want to stop and then restart the recognitionTask between sections so I don't run afoul of keeping the recognitionTask running for longer than it needs to. I'd also like to be able to flush the current input between sections so the input from the previous section doesn't roll over into the next one.
This is based on the sample code for SFSpeechRecognizer so there's a chance I might be misunderstanding something.
private func restartRecording() {
    let inputNode = audioEngine.inputNode
    audioEngine.stop()
    inputNode.removeTap(onBus: 0)
    recognitionRequest?.endAudio()
    recordingStarted = false
    recognitionTask?.cancel()
    do {
        try startRecording()
    } catch {
        print("Oopsie.")
    }
}
Here's my code. When I run it, the recognition task doesn't restart. Any ideas?
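For what it's worth, one ordering I have been experimenting with is to cancel and nil out both the task and the request before starting again (a sketch only, assuming recognitionRequest and recognitionTask are the optional properties from the SFSpeechRecognizer sample code and startRecording() is my existing setup method):

// Sketch: tear the previous session down completely before starting a new one.
private func restartRecordingSketch() {
    recognitionTask?.cancel()            // stop delivering results for the old section
    recognitionTask = nil
    recognitionRequest?.endAudio()       // signal that no more audio is coming
    recognitionRequest = nil
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0)
    recordingStarted = false
    do {
        try startRecording()             // creates a fresh request, task, and tap
    } catch {
        print("Failed to restart recording: \(error)")
    }
}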
I’m trying to use a Decimal as a @Property in my AppEntity, but using the following code shows me a compiler error. I’m using Xcode 16.1.
The documentation notes the following:
You can use the @Parameter property wrapper with common Swift and Foundation types:
Primitives such as Bool, Int, Double, String, Duration, Date, Decimal, Measurement, and URL.
Collections such as Array and Set. Make sure the collection’s elements are of a type that’s compatible with IntentParameter.
Everything works fine for other primitives such as bools, strings, and integers. How do I use Decimal, though?
Code
struct MyEntity: AppEntity {
    var id: UUID

    @Property(title: "Amount")
    var amount: Decimal

    // …
}
Compiler Error
This error appears at the line of the @Property definition:
Generic class 'EntityProperty' requires that 'Decimal' conform to '_IntentValue'
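In the meantime, the workaround I am considering looks like this (a sketch only; backing the property with a Double and exposing a computed Decimal is my own idea, not something from the documentation, and the remaining AppEntity requirements are elided just as in the snippet above):

import AppIntents
import Foundation

// Sketch of a possible workaround: store the value as a Double (listed as a
// supported primitive) and expose a Decimal to the rest of the app.
struct MyEntityWorkaround: AppEntity {
    var id: UUID

    @Property(title: "Amount")
    var amountValue: Double

    var amount: Decimal {
        Decimal(amountValue)
    }

    // … (typeDisplayRepresentation, displayRepresentation, defaultQuery as before)
}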
I'm trying to build llama.cpp, a popular tool for running LLMs locally, on macOS 15.1.1 (24B91) using CMake, but am encountering errors. Here is the Stack Overflow post regarding the issue:
https://stackoverflow.com/questions/79304015/cmake-unable-to-find-foundation-framework-on-macos-15-1-1-24b91?noredirect=1#comment139853319_79304015
iOS 18 App Intents while supporting iOS 17
Hello,
I have an existing app that supports iOS 17. I already have three App Intents but would like to add some of the new iOS 18 app intents like ShowInAppSearchResultsIntent.
However, I am having a hard time using #available or @available to limit this ShowInAppSearchResultsIntent to iOS 18 only while still supporting iOS 17.
Obviously, the ShowInAppSearchResultsIntent needs to use @AssistantIntent, which is iOS 18 only, so I mark that struct as @available(iOS 18, *). That works as expected. It is when I need to add this "SearchSnippetIntent" intent to the AppShortcutsProvider that I begin to have trouble. See the code below:
struct SnippetsShortcutsAppShortcutsProvider: AppShortcutsProvider {
    @AppShortcutsBuilder
    static var appShortcuts: [AppShortcut] {
        // iOS 17+
        AppShortcut(intent: SnippetsNewSnippetShortcutsAppIntent(), phrases: [
            "Create a New Snippet in \(.applicationName) Studio",
        ], shortTitle: "New Snippet", systemImageName: "rectangle.fill.on.rectangle.angled.fill")

        AppShortcut(intent: SnippetsNewLanguageShortcutsAppIntent(), phrases: [
            "Create a New Language in \(.applicationName) Studio",
        ], shortTitle: "New Language", systemImageName: "curlybraces")

        AppShortcut(intent: SnippetsNewTagShortcutsAppIntent(), phrases: [
            "Create a New Tag in \(.applicationName) Studio",
        ], shortTitle: "New Tag", systemImageName: "tag.fill")

        // iOS 18 only
        AppShortcut(intent: SearchSnippetIntent(), phrases: [
            "Search \(.applicationName) Studio",
            "Search \(.applicationName)"
        ], shortTitle: "Search", systemImageName: "magnifyingglass")
    }

    let shortcutTileColor: ShortcutTileColor = .blue
}
The iOS 18-only AppShortcut shows the following error, and none of the suggested fix-its seem to work. Maybe I am going about it the wrong way.
'SearchSnippetIntent' is only available in iOS 18 or newer
Add 'if #available' version check
Add @available attribute to enclosing static property
Add @available attribute to enclosing struct
Thanks in advance for your help.
I am attempting to install TensorFlow on my M1, and I seem to be unable to find the correct matching versions of jax, jaxlib, and numpy to make it all work.
I am in Bash, because the default shell gave me issues.
I downgraded to Python 3.10, because with 3.13 I could not do anything right.
Current actions:
bash-3.2$ python3.10 -m venv ~/venv-metal
bash-3.2$ python --version
Python 3.10.16
python3.10 -m venv ~/venv-metal
source ~/venv-metal/bin/activate
python -m pip install -U pip
python -m pip install tensorflow-macos
And here, I keep running into errors like:
(venv-metal):~$ pip install tensorflow-macos tensorflow-metal
ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
ERROR: No matching distribution found for tensorflow-macos
What is wrong here?
How can I fix that?
It seems like the system wants to use the x86 version of Python ... which can't be right.
I'm trying to create an NLModel within a MessageFilterExtension handler.
The code works fine in the main app, but when I try to use it in the extension it fails to initialize. Even just the single line below fails with the error shown underneath.
Single line that fails
SMS_Classifier is the class Xcode generated for my model. This line works fine in the main app.
let mlModel = try SMS_Classifier(configuration: MLModelConfiguration()).model
Error
Unable to locate Asset for contextual word embedding model for local en.
MLModelAsset: load failed with error Error Domain=com.apple.CoreML Code=0 "initialization of text classifier model with model data failed" UserInfo={NSLocalizedDescription=initialization of text classifier model with model data failed}
Any ideas?
Hi,
I'm working with the Vision framework to detect barcodes. I tested both EAN-13 and Data Matrix detection, and both are working fine except for the QuadrilateralProviding values in the returned BarcodeObservation. The topLeft, topRight, bottomRight, and bottomLeft coordinates are rotated 90° counter-clockwise (the physical bottom-left corner of the Data Matrix, the corner of the "L", is returned as the topLeft point of the observation). The same behaviour happens with the EAN-13 barcode.
Has anyone else experienced the same issue with orientation? Is it normal behaviour, or should we expect a fix in an upcoming release of the Vision framework?
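As a stopgap, I am remapping the corners myself before using them (a sketch only; QuadCorners is my own helper type working on plain CGPoints converted from the observation's normalized points, and the direction of the remapping is my assumption based on the behaviour described above):

import CoreGraphics

// Hypothetical helper (my own type, not part of Vision): shifts each corner
// label one position to compensate for the 90° counter-clockwise labelling,
// assuming the reported topLeft is physically the bottom-left corner, etc.
struct QuadCorners {
    var topLeft: CGPoint
    var topRight: CGPoint
    var bottomRight: CGPoint
    var bottomLeft: CGPoint

    func compensatingForCCWRotation() -> QuadCorners {
        QuadCorners(topLeft: topRight,
                    topRight: bottomRight,
                    bottomRight: bottomLeft,
                    bottomLeft: topLeft)
    }
}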
Hi
I'm having a problem with DataScannerViewController. I'm using the volume barcode scanning feature in my app; prior to that I was using an AVCaptureDevice with the ultra-wide angle set. After discovering DataScannerViewController, we planned to replace the previous, now obsolete, code with it. Altogether that was fine, but when I want to set the ultra-wide angle, I don't know where to start.
I tried to get the minZoomFactor and realized that I get 0.0.
I tried to set zoomFactor to 1.0 and found that it is not valid.
Note: in func dataScannerDidZoom(_ dataScanner: DataScannerViewController), when I get the minZoomFactor and set the zoomFactor inside this delegate method, I find that it is valid!
What should I do next? I want to use only DataScannerViewController and implement the ultra-wide angle.
Thanks a lot.
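For reference, the only place I have found where applying the zoom works is the delegate callback, so for now I am doing it there once (a sketch only; ScannerZoomDelegate is my own helper, and deferring the zoom to dataScannerDidZoom is a workaround assumption, not documented behaviour):

import VisionKit

// Sketch: apply the ultra-wide zoom once, inside the callback where the
// zoom values are observed to be valid.
@MainActor
final class ScannerZoomDelegate: DataScannerViewControllerDelegate {
    private var didApplyInitialZoom = false

    func dataScannerDidZoom(_ dataScanner: DataScannerViewController) {
        guard !didApplyInitialZoom, dataScanner.minZoomFactor > 0 else { return }
        didApplyInitialZoom = true
        dataScanner.zoomFactor = dataScanner.minZoomFactor   // ultra-wide when below 1.0
    }
}

// Usage: keep a strong reference to the delegate, assign it before starting:
// scanner.delegate = zoomDelegate
// try scanner.startScanning()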
We are using VNRecognizeTextRequest to detect text in documents, and we have noticed that even in some very clear and well-formatted documents there are still instances where text blocks are missed. Live Text has the same issue.
I've checked on pypi.org and it appears to only have arm64 packages; has x86 with AMD GPUs been deprecated?
We are building an app which reads text. It can read English and normal (horizontal) Japanese text successfully, but in some cases we need to read Japanese tategaki (vertically aligned text). In those cases, the same code gives no output. Is there any configuration change needed to read Japanese tategaki? Or is it even possible to read Japanese tategaki using the Vision framework?
lazy var detectTextRequest = VNRecognizeTextRequest { request, error in
    self.resStr = "\n"
    self.words = [:]
    // Get OCR result
    guard let res = request.results as? [VNRecognizedTextObservation] else { return }
    // Separate the words by spaces
    let text = res.compactMap { $0.topCandidates(1).first?.string }.joined(separator: " ")
    var n = 0
    self.wordArr = [[]]
    self.xs = 1
    self.ys = 1
    var hs = 0.0 // To compare the heights of the words
    // To get the original axis (top-most word's axis), only once
    for r in res {
        let word = r.topCandidates(1).first?.string
        self.words[word ?? ""] = [r.topLeft.x, r.topLeft.y]
        if self.cartLabelType == 1 {
            if (word?.components(separatedBy: CharacterSet(charactersIn: "//")).count ?? 0) > 2 {
                self.xs = r.topLeft.x
                self.ys = r.topLeft.y
            }
        }
    }
}
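For reference, this is the configuration I would expect to need for Japanese in general (a sketch only; explicitly requesting Japanese with accurate recognition is my assumption, and I am not sure whether it helps with tategaki specifically):

import Vision

// Sketch: explicitly request Japanese with accurate recognition.
func makeJapaneseTextRequest(handler: @escaping (VNRequest, Error?) -> Void) -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest(completionHandler: handler)
    request.recognitionLevel = .accurate
    request.recognitionLanguages = ["ja-JP"]   // assumption: Japanese listed explicitly
    request.usesLanguageCorrection = true
    return request
}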
Hi everyone 😊, I want to implement facial recognition in my app. I was planning to use Create ML's image classification, but there seems to be a lot of hassle to implement it (the JSON file, etc.). Are there other easy-to-implement options that don't involve advanced coding? Thanks, Oliver
Hi, the most recent version of tensorflow-metal is only available for macOS 12.0 and Python up to version 3.11. Is there any chance it could be updated with wheels for macOS 15 and Python 3.12 (which is the default version supported for TensorFlow 2.17+)? I'd note that even downgrading to Python 3.11 would not be sufficient, as the wheels only work for macOS 12.
Thanks.
Issue type: Bug
TensorFlow metal version: 1.1.1
TensorFlow version: 2.18
OS platform and distribution: macOS 15.2
Python version: 3.11.11
GPU model and memory: Apple M2 Max, 38-core GPU
Standalone code to reproduce the issue:
import tensorflow as tf
if __name__ == '__main__':
    gpus = tf.config.experimental.list_physical_devices('GPU')
    print(gpus)
Current behavior
The Apple silicon GPU with tensorflow-metal==1.1.0 and Python 3.11 works fine with tensorboard==2.17.0.
This is normal output:
/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/bin/python /Users/mspanchenko/VSCode/cryptoNN/ml/core_second_window/test_tensorflow_gpus.py
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Process finished with exit code 0
But if I upgrade TensorFlow to 2.18, I get this error:
/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/bin/python /Users/mspanchenko/VSCode/cryptoNN/ml/core_second_window/test_tensorflow_gpus.py
Traceback (most recent call last):
File "/Users/mspanchenko/VSCode/cryptoNN/ml/core_second_window/test_tensorflow_gpus.py", line 1, in <module>
import tensorflow as tf
File "/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow/__init__.py", line 437, in <module>
_ll.load_library(_plugin_dir)
File "/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library
py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Symbol not found: __ZN3tsl8internal10LogMessageC1EPKcii
Referenced from: <D2EF42E3-3A7F-39DD-9982-FB6BCDC2853C> /Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib
Expected in: <2814A58E-D752-317B-8040-131217E2F9AA> /Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
Process finished with exit code 1
I am an app designer and I am curious about which specific ML or AI technologies Apple used to develop certain features in the system.
As far as I know, Apple's hand-raising detection, destination recommendations in Maps, and exercise types in Fitness all use ML.
Are there more specific application examples of ML or AI?
Does Apple have a document specifically introducing examples of how ML or AI technology is applied in the system?
Hello,
I am developing an app for the Swift Student Challenge; however, I keep encountering an error when using ClassifyImageRequest from the Vision framework in Xcode:
VTEST: error: perform(_:): inside 'for await result in resultStream' error: internalError("Error Domain=NSOSStatusErrorDomain Code=-1 \"Failed to create espresso context.\" UserInfo={NSLocalizedDescription=Failed to create espresso context.}")
It works perfectly when testing it on a physical device, and I saw on another thread that ClassifyImageRequest doesn't work on simulators. Will this cause problems with my submission to the challenge?
Thanks
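In case it is useful, this is how I am currently guarding the call so the project still runs in the simulator (a sketch only; skipping classification in the simulator is my own workaround, not an official recommendation, and it assumes the new Vision API's ClassifyImageRequest/ClassificationObservation types as in my code):

import Vision
import CoreGraphics

// Sketch: skip classification when running in the simulator, where the
// request reportedly fails to create its espresso context.
func classify(_ image: CGImage) async throws -> [ClassificationObservation] {
    #if targetEnvironment(simulator)
    return []   // assumption: an empty result is an acceptable fallback here
    #else
    let request = ClassifyImageRequest()
    return try await request.perform(on: image)
    #endif
}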
Hi everyone,
I'm working with VNFeaturePrintObservation in Swift to compute the similarity between images. The computeDistance function allows me to calculate the distance between two images, and I want to cluster similar images based on these distances.
Current Approach
Right now, I'm using a brute-force approach where I compare every image against every other image in the dataset. This results in an O(n^2) complexity, which quickly becomes a bottleneck. With 5000 images, it takes around 10 seconds to complete, which is too slow for my use case.
Question
Are there any efficient algorithms or data structures I can use to improve performance?
If anyone has experience with optimizing feature vector clustering or has suggestions on how to scale this efficiently, I'd really appreciate your insights. Thanks!
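For what it's worth, one direction I have been sketching is a greedy, representative-based pass that only compares each image against existing cluster representatives instead of against every other image (a sketch only; the threshold value and the single-representative-per-cluster simplification are my assumptions):

import Vision

// Sketch: greedy clustering against cluster representatives.
// Worst case is O(n * k) distance calls (k = number of clusters),
// usually far fewer than the O(n^2) all-pairs comparison.
func clusterFeaturePrints(_ prints: [VNFeaturePrintObservation],
                          threshold: Float = 0.5) -> [[Int]] {
    var representatives: [VNFeaturePrintObservation] = []
    var clusters: [[Int]] = []

    for (index, featurePrint) in prints.enumerated() {
        var assigned = false
        for (clusterIndex, representative) in representatives.enumerated() {
            var distance: Float = 0
            do {
                // Same computeDistance API as in the brute-force version.
                try featurePrint.computeDistance(&distance, to: representative)
            } catch {
                continue   // skip representatives we cannot compare against
            }
            if distance < threshold {
                clusters[clusterIndex].append(index)
                assigned = true
                break
            }
        }
        if !assigned {
            representatives.append(featurePrint)
            clusters.append([index])
        }
    }
    return clusters
}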