Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

Posts under Machine Learning & AI topic

Post · Replies · Boosts · Views · Activity

Do we need *both* associateAppEntity and to implement attributeSet when indexing App Entities?
I am working on adding indexing to my App Entities via IndexedEntity. I already index my content separately via Spotlight. Watching 'What's New in App Intents', this is covered well, but I have a question: do I need to implement both CSSearchableItem's associateAppEntity AND a custom implementation of attributeSet in my IndexedEntity conformance? It seems duplicative, but I can't tell from the video whether you're supposed to do both or just one or the other.
Replies: 1 · Boosts: 1 · Views: 602 · Activity: Nov ’24

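For context on the APIs being asked about, below is a minimal sketch of the attributeSet side of an IndexedEntity conformance, assuming you already have an App Entity. BookEntity, BookQuery, title, and author are hypothetical names, not Apple API.

import AppIntents
import CoreSpotlight
import UniformTypeIdentifiers

// Minimal sketch of an indexed App Entity. All type and property names here
// (BookEntity, BookQuery, title, author) are hypothetical placeholders.
struct BookEntity: AppEntity, IndexedEntity {
    static let typeDisplayRepresentation = TypeDisplayRepresentation(name: "Book")
    static let defaultQuery = BookQuery()

    let id: UUID
    let title: String
    let author: String

    var displayRepresentation: DisplayRepresentation {
        DisplayRepresentation(title: "\(title)", subtitle: "\(author)")
    }

    // Customizing attributeSet enriches what Spotlight stores for the entity;
    // IndexedEntity also supplies a default implementation if this is omitted.
    var attributeSet: CSSearchableItemAttributeSet {
        let attributes = CSSearchableItemAttributeSet(contentType: UTType.content)
        attributes.displayName = title
        attributes.contentDescription = author
        return attributes
    }
}

struct BookQuery: EntityQuery {
    func entities(for identifiers: [UUID]) async throws -> [BookEntity] { [] }
    func suggestedEntities() async throws -> [BookEntity] { [] }
}

As I understand the session, associateAppEntity(_:priority:) is the hook for CSSearchableItems you already create yourself, while attributeSet feeds the entity-driven indexing path; treat that distinction as one reading of the video rather than a definitive answer.
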
Core ML Stable Diffusion
Attempting to set up ComfyUI-CoreMLSuite on my Mac Studio. ComfyUI starts, but no Core ML nodes appear in the add-node list. I cloned both ComfyUI-CoreMLSuite and ml-stable-diffusion into custom_nodes and bounced the ComfyUI server. The startup complains that ml-stable-diffusion has no __init__.py:
FileNotFoundError: [Errno 2] No such file or directory: '.../ComfyUI/custom_nodes/ml-stable-diffusion/__init__.py'
It appears to be a showstopper. What to do?
Replies: 0 · Boosts: 0 · Views: 647 · Activity: Nov ’24

DepthAnything v2
I'm finding the model gives very jagged edges. This may have to do with the output resolution: Grayscale16Half 518 × 392. I have tried to re-convert this model on Colab but have not had much luck, as this is very much out of my comfort zone. Has anyone else dealt with this? The model would be perfect if I could just overcome this issue.
Replies: 2 · Boosts: 0 · Views: 649 · Activity: Dec ’24

Running out of memory analyzing images with ImageRequestHandler
Hi, I'm trying to analyze images in my Photos library with the following code:

func analyzeImages(_ inputIDs: [String]) {
    let manager = PHImageManager.default()
    let option = PHImageRequestOptions()
    option.isSynchronous = true
    option.isNetworkAccessAllowed = true
    option.resizeMode = .none
    option.deliveryMode = .highQualityFormat
    let concurrentTasks = 1
    let clock = ContinuousClock()
    let duration = clock.measure {
        let group = DispatchGroup()
        let sema = DispatchSemaphore(value: concurrentTasks)
        for entry in inputIDs {
            if let asset = PHAsset.fetchAssets(withLocalIdentifiers: [entry], options: nil).firstObject {
                print("analyzing asset: \(entry)")
                group.enter()
                sema.wait()
                manager.requestImage(for: asset, targetSize: PHImageManagerMaximumSize, contentMode: .aspectFit, options: option) { (result, info) in
                    if let result = result {
                        Task {
                            print("retrieved asset: \(entry)")
                            let aestheticsRequest = CalculateImageAestheticsScoresRequest()
                            let fingerprintRequest = GenerateImageFeaturePrintRequest()
                            let inputImage = result.cgImage!
                            let handler = ImageRequestHandler(inputImage)
                            let (aesthetics, fingerprint) = try await handler.perform(aestheticsRequest, fingerprintRequest)
                            // save results
                            print("finished asset: \(entry)")
                            sema.signal()
                            group.leave()
                        }
                    } else {
                        group.leave()
                    }
                }
            }
        }
        group.wait()
    }
    print("analyzeImages: Duration \(duration)")
}

When running this code, only two requests are being processed simultaneously (due to the semaphore). However, if I call the function with a large list of images (>100), memory usage balloons over 1.6GB and the app crashes. If I call it with a smaller number of images, the loop completes and the memory is freed. When I use Instruments to look for memory leaks, it indicates no memory leaks are found, but there are 150+ VM:IOSurfaces allocated by CMPhoto, CoreVideo and CoreGraphics @ 35MB each. Shouldn't each surface be released when the task is complete?
Replies: 2 · Boosts: 0 · Views: 593 · Activity: Dec ’24

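One mitigation worth trying for the memory growth described above (offered as a guess, not a confirmed fix): request a bounded target size instead of PHImageManagerMaximumSize, so every in-flight request holds a much smaller decoded bitmap while the Vision work runs. The helper below is a hypothetical sketch; maxSide is an arbitrary cap.

import Photos
import UIKit

// Hypothetical helper: cap the longer side of the decoded image so each pending
// request (and the IOSurfaces backing it) stays small while Vision runs.
func requestBoundedImage(for asset: PHAsset,
                         using manager: PHImageManager = .default(),
                         maxSide: CGFloat = 2048,
                         options: PHImageRequestOptions,
                         completion: @escaping (UIImage?) -> Void) {
    let scale = min(maxSide / CGFloat(max(asset.pixelWidth, asset.pixelHeight)), 1)
    let target = CGSize(width: CGFloat(asset.pixelWidth) * scale,
                        height: CGFloat(asset.pixelHeight) * scale)
    manager.requestImage(for: asset,
                         targetSize: target,
                         contentMode: .aspectFit,
                         options: options) { image, _ in
        completion(image)
    }
}

Wrapping the per-image work in an autoreleasepool, or capping how many Tasks are started before earlier ones finish, are other common suggestions, but neither has been verified against this exact crash.
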
Source Files from the Session number 424 WWDC2019
In the 2019 WWDC session Training Object Detection Models in Create ML, a JSON file named annotations_832_newdice_copy.json was shown alongside the images folder named Dice Training Images Two Sets. Are these resources made available to devs? I am trying to understand whether the 6,000 annotations needed to be done manually. That is, did they manually annotate around 1,000 images with 6 labels each to produce this dataset? The video shows around 1,000 images. Can someone please clarify?
Replies: 2 · Boosts: 0 · Views: 651 · Activity: Dec ’24

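For readers wondering what those 6,000 annotations look like in practice: the Create ML object detection annotations file is a JSON array with one entry per image and one center-based bounding box per labeled object. The Codable types below are a sketch of that shape (the type names are made up for illustration), not source material from the session.

import Foundation

// Sketch of the Create ML object detection annotation layout. Type names are
// illustrative; the JSON keys (image, annotations, label, coordinates, x, y,
// width, height) follow the documented format, with x/y as the box center in pixels.
struct ImageAnnotation: Codable {
    let image: String                  // file name inside the training images folder
    let annotations: [ObjectAnnotation]
}

struct ObjectAnnotation: Codable {
    let label: String                  // e.g. a dice-face class
    let coordinates: BoundingBox
}

struct BoundingBox: Codable {
    let x: Double                      // box center x, in pixels
    let y: Double                      // box center y, in pixels
    let width: Double
    let height: Double
}

// Reading a file such as annotations_832_newdice_copy.json would then be:
// let entries = try JSONDecoder().decode([ImageAnnotation].self, from: data)
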
unable to run tensorflow on my machine
Hello! I've been trying to run TensorFlow on my MBA M3. I previously had an Intel Mac and was able to run TensorFlow without any problem. I've been working on a personal project in a directory I made on my previous Mac, which I was running through Jupyter Notebook. Now every time I try to run the code, the kernel dies and I'm unsure what to do. I tried following tutorials, but every tutorial I've seen has made me create a new environment to access Jupyter Notebook, without letting me access notebooks and files that have already been created. I tried to run the following command in Terminal and received the error below:
python -m pip install tensorflow-metal
ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none)
ERROR: No matching distribution found for tensorflow-metal
I've already installed Miniforge, Xcode, and Anaconda on my computer and wanted some assistance.
Replies: 2 · Boosts: 0 · Views: 853 · Activity: Dec ’24

Inference with non-square Images
I'm trying to set up Facebook AI's "Segment Anything" MLModel to compare its performance and efficacy on-device against the Vision library's Foreground Instance Mask Request. The Vision request accepts any reasonably sized image for processing and has a method to produce an output at the same resolution as the input image. Conversely, the MLModel for Segment Anything accepts a 1024x1024 image for inference and outputs a 1024x1024 image. What is the best way to work with non-square images, such as 4:3 camera photos? I can think of three methods for accomplishing this:
1. Scale the image to 1024x1024, ignoring aspect ratio, then inversely scale the output back to the original size. However, I have a big concern that squashing the content will result in poor inference results.
2. Scale the image, preserving its aspect ratio so its minimum dimension is 1024, then run the model multiple times on a sliding 1024x1024 window and aggregate the results. My main concern here is the complexity of de-duplicating the output, since each run could produce different outputs based on how objects are cropped.
3. Fit the image within 1024x1024 and pad with black pixels to make a square. I'm not sure whether the border will muck up the inference.
Anyway, this seems like it must be a well-solved problem in ML, but I'm having difficulty finding an authoritative best practice.
Replies: 0 · Boosts: 0 · Views: 422 · Activity: Dec ’24

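A minimal sketch of option 3 from the post above (letterboxing): fit the image inside a 1024x1024 square, pad with black, and keep the scale and offset so the model's square output can be mapped back onto the original photo. Function and parameter names are illustrative, not from any Apple or Meta sample.

import CoreGraphics

// Fit `image` into a side x side square, preserving aspect ratio and padding
// the remainder with black. Returns the padded image plus the scale/offset
// needed to map the model's square output back to original coordinates.
func letterbox(_ image: CGImage, side: Int = 1024) -> (padded: CGImage?, scale: CGFloat, offset: CGPoint) {
    let scale = CGFloat(side) / CGFloat(max(image.width, image.height))
    let drawSize = CGSize(width: CGFloat(image.width) * scale,
                          height: CGFloat(image.height) * scale)
    let offset = CGPoint(x: (CGFloat(side) - drawSize.width) / 2,
                         y: (CGFloat(side) - drawSize.height) / 2)
    guard let context = CGContext(data: nil, width: side, height: side,
                                  bitsPerComponent: 8, bytesPerRow: 0,
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue) else {
        return (nil, scale, offset)
    }
    context.setFillColor(red: 0, green: 0, blue: 0, alpha: 1)
    context.fill(CGRect(x: 0, y: 0, width: side, height: side))
    context.draw(image, in: CGRect(origin: offset, size: drawSize))
    return (context.makeImage(), scale, offset)
}

Whether the black border hurts the segmentation is model-specific; as far as I know SAM's reference preprocessing also pads to a square, but that is worth verifying rather than taking on faith.
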
Urgent Issue with SoundAnalysis in iOS 18 - Critical Background Permissions Error
We are experiencing a major issue with the native .version1 of the SoundAnalysis framework in iOS 18, which has left all of our users without recordings. Our core feature relies heavily on sound analysis in the background, and it previously worked flawlessly in prior iOS versions. However, on iOS 18, sound analysis stops working in the background and triggers a critical warning.
Details of the issue:
We are using SoundAnalysis to analyze background sounds and have enabled the necessary background permissions.
We are using the latest Xcode.
A warning now appears, and sound analysis fails in the background.
Below is the warning message we are encountering:
Execution of the command buffer was aborted due to an error during execution. Insufficient Permission (to submit GPU work from background)
[Espresso::handle_ex_plan] exception=Espresso exception: "Generic error": Insufficient Permission (to submit GPU work from background) (00000006:kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted); code=7 status=-1
Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).
CoreML prediction failed with Error Domain=com.apple.CoreML Code=0 "Failed to evaluate model 0 in pipeline" UserInfo={NSLocalizedDescription=Failed to evaluate model 0 in pipeline, NSUnderlyingError=0x30330e910 {Error Domain=com.apple.CoreML Code=0 "Failed to evaluate model 1 in pipeline" UserInfo={NSLocalizedDescription=Failed to evaluate model 1 in pipeline, NSUnderlyingError=0x303307840 {Error Domain=com.apple.CoreML Code=0 "Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1)." UserInfo={NSLocalizedDescription=Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).}}}}}
We urgently need guidance or a fix for this, as our application's main functionality is severely impacted by this background permission error. Please let us know the next steps or whether this is a known issue with iOS 18.
Replies: 12 · Boosts: 12 · Views: 2.2k · Activity: Dec ’24

New Vision API
Hey everyone, I've been updating my code to take advantage of the new Vision API for text recognition in macOS 15. I'm noticing some very odd behavior, though: it seems like the new Vision API consistently produces worse results than the old API. For reference, here is how I'm setting up my request:
var request = RecognizeTextRequest()
request.recognitionLevel = getOCRMode() // generally accurate
request.usesLanguageCorrection = !disableLanguageCorrection // generally true
request.recognitionLanguages = language.split(separator: ",").map { Locale.Language(identifier: String($0)) } // generally 'en'
let observations = try? await request.perform(on: image) as [RecognizedTextObservation]
Then I process the results and just take the top candidate, which, as mentioned above, is typically of worse quality than the same request formed with the old API. Am I doing something wrong here?
Replies: 3 · Boosts: 0 · Views: 685 · Activity: Dec ’24

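For completeness, pulling the top candidate from the new-API results looks roughly like the line below, as a continuation of the snippet in the post; this assumes RecognizedTextObservation still exposes a topCandidates(_:) method the way VNRecognizedTextObservation did, which is worth confirming against the current Vision documentation.

// Assumption: the new observation type mirrors the old topCandidates(_:) API,
// returning candidates with .string (and .confidence) properties.
let recognizedLines = (observations ?? []).compactMap { $0.topCandidates(1).first?.string }
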
Image Search Apple Intelligence 18.2 Beta - Can’t Find It
Hi! I recently updated to the latest 18.2 Beta version of iOS on my iPhone 15 Pro Max. Could you please guide me on how to locate and utilize the Image Search feature powered by Apple Intelligence? Just a little detail: I went on YouTube and the instruction was to hold the camera action button on the iPhone 16 and image search appears. So far, I haven’t been able to replicate these results on my iPhone 15 Pro Max. This is a great capability and I’d really like to try it out. “Live long and prosper.” -Spock -Jordan
Replies: 1 · Boosts: 0 · Views: 943 · Activity: Dec ’24

How to Fine-Tune the SNSoundClassifier for Custom Sound Classification in iOS?
Hi Apple Developer Community, I'm exploring ways to fine-tune the SNSoundClassifier so that users of my iOS app can personalize the model by adding custom sounds or adjusting predictions. While Apple's WWDC session on sound classification explains how to train from scratch, I'm specifically interested in using SNSoundClassifier as the base model and building/fine-tuning on top of it. Here are a few questions I have:
1. Fine-tuning SNSoundClassifier: Is there a way to fine-tune this model programmatically through APIs? The manual approach using macOS, as shown in the documentation, is clear, but how can it be done dynamically, either within the app for users or in a cloud backend (AWS/iCloud)? Are there APIs or classes that support such on-device or cloud-based fine-tuning or incremental learning? If not directly, can the classifier's embeddings be used to train a lightweight custom layer? Training is likely computationally intensive and drains too much battery; doing it in the cloud may be the right approach, but I need the right APIs to get this done. Sample code would help.
2. Recommended approach for in-app model customization: If SNSoundClassifier doesn't support fine-tuning, would transfer learning on models like MobileNetV2, YAMNet, OpenL3, or FastViT be more suitable? Given these models (SNSoundClassifier, MobileNetV2, YAMNet, OpenL3, FastViT), which one would be best for accuracy and performance/efficiency on iOS? I aim to maintain real-time performance without sacrificing battery life. It is also important to see architecture retention and accuracy after conversion to a Core ML model.
3. Cost-effective backend setup for training: Mac EC2 instances on AWS have a 24-hour minimum billing period, which can become expensive for limited user requests. Are there better alternatives for deploying and training models on request, when a user uploads files (training data)?
4. TensorFlow vs. PyTorch: Which framework would you recommend for iOS Core ML integration? TensorFlow Lite offers mobile-optimized models, but I'm also curious about PyTorch's performance when converted to Core ML.
5. Metrics: The metrics I have in mind while picking a model are: publisher, accuracy, fine-tuning capability, real-time/live use, suitability for iPhone 16, architectural retention after Core ML conversion, reasons for unsuitability, and recommended use case.
Any insights or recommended approaches would be greatly appreciated. Thanks in advance!
Replies: 6 · Boosts: 1 · Views: 1.3k · Activity: Dec ’24

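As background for the question above: the built-in classifier is consumed through SNClassifySoundRequest, and there is no public API that I'm aware of for fine-tuning it, which is why the question matters. A minimal sketch of how the stock .version1 classifier is used on an audio file today:

import Foundation
import SoundAnalysis

// Observer that receives classification results from the analyzer.
final class ResultsObserver: NSObject, SNResultsObserving {
    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult,
              let best = result.classifications.first else { return }
        print("\(best.identifier): \(best.confidence)")
    }
}

// Run the built-in .version1 classifier over an audio file (synchronously).
func classifySounds(in url: URL) throws {
    let request = try SNClassifySoundRequest(classifierIdentifier: .version1)
    let analyzer = try SNAudioFileAnalyzer(url: url)
    let observer = ResultsObserver()
    try analyzer.add(request, withObserver: observer)
    analyzer.analyze()
}

Any per-user customization today would apparently have to live on top of this, for example a separate small model trained on embeddings or with Create ML, rather than modifying the built-in classifier itself.
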
Feasibility of Real-Time Object Detection in Live Video with Core ML on M1 Pro and A-Series Chips
Hello, I am exploring real-time object detection, and its replacement/overlay with another shape, on live video streams for an iOS app using the Core ML and Vision frameworks. My target is to achieve high-speed, real-time detection without noticeable latency, similar to what's possible with page-fault handling and associative caching in an OS, but applied to video processing. Given that this requires consistent, real-time model inference, I'm curious how well the Neural Engine or GPU can handle such tasks on A-series chips in iPhones versus M-series chips (specifically M1 Pro and possibly M4) in MacBooks. Here are a few specific points I'd like insight on:
1. Hardware suitability: How feasible is it to perform real-time object detection with Core ML on the Neural Engine (i.e., can it maintain low latency)? Would the M-series chips (e.g., M1 Pro or newer) offer a tangible benefit for this type of task compared to the A-series in mobile devices? Which A- and M-series chips would be the minimum feasible recommendation for such a task?
2. Performance expectations: For continuous, live video object detection, what would be the expected frame rate or latency using an optimized Core ML model? Has anyone benchmarked such applications, and is the M-series required to achieve smooth, real-time processing?
3. Differences across Apple hardware: How does performance scale between the A-series Neural Engine and the M-series GPU and Neural Engine? Is the M-series vastly superior for real-time Core ML tasks like object detection on live video feeds?
If anyone has attempted live object detection on these chips, any insights on real-time performance, limitations, or optimizations would be highly appreciated. Please refer to: Apple APIs. Thank you in advance for your help!
Replies: 3 · Boosts: 0 · Views: 785 · Activity: Dec ’24

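To ground the latency discussion above, the per-frame inference path with Vision and Core ML is short; the sketch below assumes a bundled, compiled detection model (the modelURL parameter and the .right orientation are placeholders). The same code runs on A-series and M-series devices, and Core ML chooses among ANE, GPU, and CPU based on computeUnits and the model's layers.

import CoreML
import Vision
import CoreVideo

// Per-frame object detection sketch. Model location and orientation are placeholders.
final class FrameDetector {
    private let request: VNCoreMLRequest

    init(modelURL: URL) throws {
        let config = MLModelConfiguration()
        config.computeUnits = .all   // let Core ML use the Neural Engine where supported
        let coreMLModel = try MLModel(contentsOf: modelURL, configuration: config)
        let visionModel = try VNCoreMLModel(for: coreMLModel)
        request = VNCoreMLRequest(model: visionModel)
        request.imageCropAndScaleOption = .scaleFill
    }

    // Call once per camera frame, e.g. from AVCaptureVideoDataOutput's
    // captureOutput(_:didOutput:from:) delegate callback.
    func detections(in pixelBuffer: CVPixelBuffer) throws -> [VNRecognizedObjectObservation] {
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right)
        try handler.perform([request])
        return request.results as? [VNRecognizedObjectObservation] ?? []
    }
}

Timing the handler.perform([request]) call on the target devices is the most direct way to answer the frame-rate question, since throughput depends heavily on the specific model.
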
can't install tensorflow-metal
I was installing tensorflow-metal in an Anaconda environment called "arm64_tf" using the command python -m pip install tensorflow-metal in Terminal, and it shows:
ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none)
ERROR: No matching distribution found for tensorflow-metal
I have already tried conda install -c anaconda libffi, but it still doesn't work. Is there a solution? Thanks, and apologies for my bad English.
Replies: 3 · Boosts: 1 · Views: 770 · Activity: Dec ’24

How to Train and Deploy PyTorch Models on Apple Hardware: A Unified Path for Deep ML Practice on Core ML?
Submitted as: FB16052050
I am looking to adopt machine learning in a more granular manner, going beyond just using pre-built Metal, Core ML, or Create ML approaches. Specifically, I want to train models using open Python PyTorch libraries, as these offer greater flexibility compared to Apple's native tools. However, these PyTorch APIs are primarily optimised for NVIDIA GPUs (or TPUs), not Apple's M3 or the Apple Neural Engine (ANE). My goal is to train the models locally without resorting to cloud-based solutions for training or inference, and to then convert the models into Core ML format for deployment on Apple hardware. This would allow me to leverage Apple's hardware acceleration (via ANE, Metal, and MPS) while maintaining control over the training process in PyTorch. I want to know:
1. What are my options for training models in PyTorch on local hardware (Apple M3 or equivalent), and how can I ensure that the PyTorch model can eventually be converted to Core ML without losing flexibility in model training and customisation?
2. How can I perform training in PyTorch and avoid being restricted to the inference-only workflows that Core ML typically allows?
3. Is it possible to use the training capabilities of PyTorch and still get the performance benefits of Apple's hardware for both training and inference?
4. What are the best practices or tools to ensure that my training pipeline in PyTorch is compatible with Apple's hardware constraints and optimised for local execution?
I'm seeking a practical, cloud-free approach on Apple hardware only that allows me to train models in PyTorch (keeping control over the training process) while ensuring they can be deployed efficiently using Core ML on Apple hardware.
Replies: 1 · Boosts: 0 · Views: 973 · Activity: Dec ’24