hi,
I am currently running LSTM on TensorFlow. However, when i switched from keras2 to keras3. code running time has increased 10 times -- it seems there is no GPU acceleration.
Here is my code:
batch size = 256
optimiser = adam
activation = tanh
_______________________________________________
Layer (type) Output Shape Param #
=============================================
input_1 (InputLayer) [(None, 7, 16)] 0
bidirectional (Bidirection (None, 7, 320) 226560
al)
bidirectional_1 (Bidirecti (None, 7, 512) 1181696
onal)
bidirectional_2 (Bidirecti (None, 256) 656384
onal)
dense (Dense) (None, 1) 257
==============================================
Total params: 2064897 (7.88 MB)
Trainable params: 2064897 (7.88 MB)
Non-trainable params: 0 (0.00 Byte)
______________________________________________
This is keras 3.6.0 + tensorflow 2.17.0 + tensorflow-metal 1.1.0 training status:
Training------------
Epoch 1/200
28/681 ━━━━━━━━━━━━━━━━━━━━ 8:13 756ms/step - loss: 0.5901 - mape: 338.6876 - mse: 0.8591
This is keras 2.14.0 + tensorflow 2.14.0 + tensorflow-metal 1.1.0 training status:
Training------------
Epoch 1/200
681/681 [==============================] - 37s 49ms/step - loss: 3.6345 - mape: 499038.7500 - mse: 34.4148 - val_loss: 3.5452 - val_mape: 41.7964 - val_mse: 32.0133 - lr: 0.0010
Is that because keras3 has no GPU support on macos?
Apart from that, if I change LSTM activation from tanh to sigmoid in keras2, it does not have GPU support as well.
My system is 15.0.1 and the code was running on python3.11
I am not sure why these happen.
Thanks
General
RSS for tagExplore the power of machine learning within apps. Discuss integrating machine learning features, share best practices, and explore the possibilities for your app.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
what am I not understanding here.
in short the view loads text from the jsons descriptions and then should filter out the words. and return and display a list of most used words, debugging shows words being identified by the code but does not filter them out
private func loadWordCounts() {
DispatchQueue.global(qos: .background).async {
let fileManager = FileManager.default
guard let documentsDirectory = try? fileManager.url(for: .documentDirectory, in: .userDomainMask, appropriateFor: nil, create: false) else { return }
let descriptions = loadDescriptions(fileManager: fileManager, documentsDirectory: documentsDirectory)
var counts = countWords(in: descriptions)
let tagsToRemove: Set<NLTag> = [
.verb,
.pronoun,
.determiner,
.particle,
.preposition,
.conjunction,
.interjection,
.classifier
]
for (word, _) in counts {
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = word
let (tag, _) = tagger.tag(at: word.startIndex, unit: .word, scheme: .lexicalClass)
if let unwrappedTag = tag, tagsToRemove.contains(unwrappedTag) {
counts[word] = 0
}
}
DispatchQueue.main.async {
self.wordCounts = counts
}
}
}
What are the major differences in review process for AI based apps vis a vis normal apps for Apple store?
Where can I find an example of using this MPSGraph function? I'm trying to use it to paste an image into a larger canvas at certain coordinates.
func sliceUpdateDataTensor(
_ dataTensor: MPSGraphTensor,
update updateTensor: MPSGraphTensor,
starts: [NSNumber],
ends: [NSNumber],
strides: [NSNumber],
startMask: UInt32,
endMask: UInt32,
squeezeMask: UInt32,
name: String?
) -> MPSGraphTensor
Hi all,
When executing an HLO program using the JAX metal PJRT plugin, the program fails due to an unsupported data type returned by the rng_bit_generator operation.
The generated HLO includes:
%output_state, %output = "mhlo.rng_bit_generator"(%1) <{rng_algorithm = #mhlo.rng_algorithm<PHILOX>}> : (tensor<3xi64>) -> (tensor<3xi64>, tensor<3xui32>)
The error message indicates that:
Metal only supports MPSDataTypeFloat16, MPSDataTypeBFloat16, MPSDataTypeFloat32, MPSDataTypeInt32, and MPSDataTypeInt64.
The use of ui32 seems to be incompatible with Metal’s allowed types.
I’m trying to understand if the ui32 output is the problem or maybe the use of rng_bit_generator is wrong.
Could you clarify if there is a workaround or planned support for ui32 output in this context? Alternatively, guidance on configuring rng_bit_generator for compatibility with Metal’s supported types would be greatly appreciated.
Submited as : FB16052050
I am looking to adopt Machine Learning in a more granular manner, going beyond just using pre-built Metal, Core ML, or Create ML approaches. Specifically, I want to train models using Open Python PyTorch libraries, as these offer greater flexibility compared to Apple's native tools. However, these PyTorch APIs are primarily optimised for NVIDIA GPUs (or TPUs), not Apple's M3 or Apple Neural Engine (ANE).
My goal is to train the models locally without resorting to cloud-based solutions for training or inference, and to then convert the models into Core ML format for deployment on Apple hardware. This would allow me to leverage Apple's hardware acceleration (via ANE, Metal, and MPS) while maintaining control over the training process in PyTorch.
I want to know:
What are my options for training models in PyTorch on local hardware (Apple M3 or equivalent), and how can I ensure that the PyTorch model can eventually be converted to Core ML without losing flexibility in model training and customisation?
How can I perform training in PyTorch and avoid being restricted to inference-only workflows as Core ML typically allows? Is it possible to use the training capabilities of PyTorch and still get the performance benefits of Apple's hardware for both training and inference?
What are the best practices or tools to ensure that my training pipeline in PyTorch is compatible with Apple's hardware constraints and optimised for local execution?
I'm seeking a practical, cloud-free approach on Apple Hardware only that allows me to train models in PyTorch (keeping control over the training process) while ensuring that they can be deployed efficiently using Core ML on Apple hardware.
Hello all,
I'm working on a project that involves listening to a person speak off of a script and I want to stop then restart the recognitionTask between sections so I don't run afoul of keeping the recognitionTask running for longer than it needs to. Also, I'd like to be able to flush the current input between sections so the input from the previous section doesn't roll over into the next one.
This is based on the sample code for SFSpeechRecognizer so there's a chance I might be misunderstanding something.
private func restartRecording() {
let inputNode = audioEngine.inputNode
audioEngine.stop()
inputNode.removeTap(onBus: 0)
recognitionRequest?.endAudio()
recordingStarted = false
recognitionTask?.cancel()
do {
try startRecording()
} catch {
print("Oopsie.")
}
}
Here's my code. When I run it, the recognition task doesn't restart. Any ideas?
I'm trying to build llama.cpp, a popular tool for running LLMs locally on macos15.1.1 (24B91) Sonoma using cmake but am encountering errors. Here is the stack overflow post regarding the issue:
https://stackoverflow.com/questions/79304015/cmake-unable-to-find-foundation-framework-on-macos-15-1-1-24b91?noredirect=1#comment139853319_79304015
I am attempting to install Tensorflow on my M1 and I seem to be unable to find the correct matching versions of jax, jaxlib and numpy to make it all work.
I am in Bash, because the default shell gave me issues.
I downgraded to python 3.10, because with 3.13, I could not do anything right.
Current actions:
bash-3.2$ python3.10 -m venv ~/venv-metal
bash-3.2$ python --version
Python 3.10.16
python3.10 -m venv ~/venv-metal
source ~/venv-metal/bin/activate
python -m pip install -U pip
python -m pip install tensorflow-macos
And here, I keep running tnto errors like:
(venv-metal):~$ pip install tensorflow-macos tensorflow-metal
ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
ERROR: No matching distribution found for tensorflow-macos
What is wrong here?
How can I fix that?
It seems like the system wants to use the x86 version of python ... which can't be right.
Hi everyone,
I'm working with VNFeaturePrintObservation in Swift to compute the similarity between images. The computeDistance function allows me to calculate the distance between two images, and I want to cluster similar images based on these distances.
Current Approach
Right now, I'm using a brute-force approach where I compare every image against every other image in the dataset. This results in an O(n^2) complexity, which quickly becomes a bottleneck. With 5000 images, it takes around 10 seconds to complete, which is too slow for my use case.
Question
Are there any efficient algorithms or data structures I can use to improve performance?
If anyone has experience with optimizing feature vector clustering or has suggestions on how to scale this efficiently, I'd really appreciate your insights. Thanks!
Hello, I am thinking of buying the MacBook Pro 14" with M4 Pro for ML/AI/ NLP tasks mostly. And since I have only used Windows before, I am wandering if it is compatible with libraries like "Pytorch" and "TensorFlow" etc., or people have experienced problems in installation... Thank you!
Topic:
Machine Learning & AI
SubTopic:
General
I'm implementing an LLM with Metal Performance Shader Graph, but encountered a very strange behavior, occasionally, the model will report an error message as this:
LLVM ERROR: SmallVector unable to grow. Requested capacity (9223372036854775808) is larger than maximum value for size type (4294967295)
and crash, the stack backtrace screenshot is attached. Note that 5th frame is
mlir::getIntValues<long long>
and 6th frame is
llvm::SmallVectorBase<unsigned int>::grow_pod
It looks like mlir mistakenly took a 64 bit value for a 32 bit type. Unfortunately, I could not found the source code of
mlir::getIntValues, maybe it's Apple's closed source fork of llvm for MPS implementation? Anyway, any opinion or suggestion on that?
Topic:
Machine Learning & AI
SubTopic:
General
In an under-development MacOS & iOS app, I need to identify various measurements from OCR'ed text: length, weight, counts per inch, area, percentage. The unit type (e.g. UnitLength) needs to be identified as well as the measurement's unit (e.g. .inches) in order to convert the measurement to the app's internal standard (e.g. centimetres), the value of which is stored the relevant CoreData entity.
The use of NLTagger and NLTokenizer is problematic because of the various representations of the measurements: e.g. "50g.", "50 g", "50 grams", "1 3/4 oz."
Currently, I use a bespoke algorithm based on String contains and step-wise evaluation of characters, which is reasonably accurate but requires frequent updating as further representations are detected.
I'm aware of the Python SpaCy model being capable of NER Measurement recognition, but am reluctant to incorporate a Python-based solution into a production app. (ref [https://developer.apple.com/forums/thread/30092])
My preference is for an open-source NER Measurement model that can be used as, or converted to, some form of a Swift compatible Machine Learning model. Does anyone know of such a model?
Incident Identifier: 4C22F586-71FB-4644-B823-A4B52D158057
CrashReporter Key: adc89b7506c09c2a6b3a9099cc85531bdaba9156
Hardware Model: Mac16,10
Process: PRISMLensCore [16561]
Path: /Applications/PRISMLens.app/Contents/Resources/app.asar.unpacked/node_modules/core-node/PRISMLensCore.app/PRISMLensCore
Identifier: com.prismlive.camstudio
Version: (null) ((null))
Code Type: ARM-64
Parent Process: ? [16560]
Date/Time: (null)
OS Version: macOS 15.4 (24E5228e)
Report Version: 104
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000 at 0x0000000000000000
Crashed Thread: 34
Application Specific Information:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[__NSArrayM insertObject:atIndex:]: object cannot be nil'
Thread 34 Crashed:
0 CoreFoundation 0x000000018ba4dde4 0x18b960000 + 974308 (__exceptionPreprocess + 164)
1 libobjc.A.dylib 0x000000018b512b60 0x18b4f8000 + 109408 (objc_exception_throw + 88)
2 CoreFoundation 0x000000018b97e69c 0x18b960000 + 124572 (-[__NSArrayM insertObject:atIndex:] + 1276)
3 Portrait 0x0000000257e16a94 0x257da3000 + 473748 (-[PTMSRResize addAdditionalOutput:] + 604)
4 Portrait 0x0000000257de91c0 0x257da3000 + 287168 (-[PTEffectRenderer initWithDescriptor:metalContext:useHighResNetwork:faceAttributesNetwork:humanDetections:prevTemporalState:asyncInitQueue:sharedResources:] + 6204)
5 Portrait 0x0000000257dab21c 0x257da3000 + 33308 (__33-[PTEffect updateEffectDelegate:]_block_invoke.241 + 164)
6 libdispatch.dylib 0x000000018b739b2c 0x18b738000 + 6956 (_dispatch_call_block_and_release + 32)
7 libdispatch.dylib 0x000000018b75385c 0x18b738000 + 112732 (_dispatch_client_callout + 16)
8 libdispatch.dylib 0x000000018b742350 0x18b738000 + 41808 (_dispatch_lane_serial_drain + 740)
9 libdispatch.dylib 0x000000018b742e2c 0x18b738000 + 44588 (_dispatch_lane_invoke + 388)
10 libdispatch.dylib 0x000000018b74d264 0x18b738000 + 86628 (_dispatch_root_queue_drain_deferred_wlh + 292)
11 libdispatch.dylib 0x000000018b74cae8 0x18b738000 + 84712 (_dispatch_workloop_worker_thread + 540)
12 libsystem_pthread.dylib 0x000000018b8ede64 0x18b8eb000 + 11876 (_pthread_wqthread + 292)
13 libsystem_pthread.dylib 0x000000018b8ecb74 0x18b8eb000 + 7028 (start_wqthread + 8)
Topic:
Machine Learning & AI
SubTopic:
General
Hi,
I'm testing DockKit with a very simple setup:
I use VNDetectFaceRectanglesRequest to detect a face and then call dockAccessory.track(...) using the detected bounding box.
The stand is correctly docked (state == .docked) and dockAccessory is valid.
I'm calling .track(...) with a single observation and valid CameraInformation (including size, device, orientation, etc.). No errors are thrown.
To monitor this, I added a logging utility – track(...) is being called 10–30 times per second, as recommended in the documentation.
However: the stand does not move at all.
There is no visible reaction to the tracking calls.
Is there anything I'm missing or doing wrong?
Is VNDetectFaceRectanglesRequest supported for DockKit tracking, or are there hidden requirements?
Would really appreciate any help or pointers – thanks!
That's my complete code:
extension VideoFeedViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
guard let frame = CMSampleBufferGetImageBuffer(sampleBuffer) else {
return
}
detectFace(image: frame)
func detectFace(image: CVPixelBuffer) {
let faceDetectionRequest = VNDetectFaceRectanglesRequest() { vnRequest, error in
guard let results = vnRequest.results as? [VNFaceObservation] else {
return
}
guard let observation = results.first else {
return
}
let boundingBoxHeight = observation.boundingBox.size.height * 100
#if canImport(DockKit)
if let dockAccessory = self.dockAccessory {
Task {
try? await trackRider(
observation.boundingBox,
dockAccessory,
frame,
sampleBuffer
)
}
}
#endif
}
let imageResultHandler = VNImageRequestHandler(cvPixelBuffer: image, orientation: .up)
try? imageResultHandler.perform([faceDetectionRequest])
func combineBoundingBoxes(_ box1: CGRect, _ box2: CGRect) -> CGRect {
let minX = min(box1.minX, box2.minX)
let minY = min(box1.minY, box2.minY)
let maxX = max(box1.maxX, box2.maxX)
let maxY = max(box1.maxY, box2.maxY)
let combinedWidth = maxX - minX
let combinedHeight = maxY - minY
return CGRect(x: minX, y: minY, width: combinedWidth, height: combinedHeight)
}
#if canImport(DockKit)
func trackObservation(_ boundingBox: CGRect, _ dockAccessory: DockAccessory, _ pixelBuffer: CVPixelBuffer, _ cmSampelBuffer: CMSampleBuffer) throws {
// Zähle den Aufruf
TrackMonitor.shared.trackCalled()
let invertedBoundingBox = CGRect(
x: boundingBox.origin.x,
y: 1.0 - boundingBox.origin.y - boundingBox.height,
width: boundingBox.width,
height: boundingBox.height
)
guard let device = captureDevice else {
fatalError("Kamera nicht verfügbar")
}
let size = CGSize(width: Double(CVPixelBufferGetWidth(pixelBuffer)),
height: Double(CVPixelBufferGetHeight(pixelBuffer)))
var cameraIntrinsics: matrix_float3x3? = nil
if let cameraIntrinsicsUnwrapped = CMGetAttachment(
sampleBuffer,
key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix,
attachmentModeOut: nil
) as? Data {
cameraIntrinsics = cameraIntrinsicsUnwrapped.withUnsafeBytes { $0.load(as: matrix_float3x3.self) }
}
Task {
let orientation = getCameraOrientation()
let cameraInfo = DockAccessory.CameraInformation(
captureDevice: device.deviceType,
cameraPosition: device.position,
orientation: orientation,
cameraIntrinsics: cameraIntrinsics,
referenceDimensions: size
)
let observation = DockAccessory.Observation(
identifier: 0,
type: .object,
rect: invertedBoundingBox
)
let observations = [observation]
guard let image = CMSampleBufferGetImageBuffer(sampleBuffer) else {
print("no image")
return
}
do {
try await dockAccessory.track(observations, cameraInformation: cameraInfo)
} catch {
print(error)
}
}
}
#endif
func clearDrawings() {
boundingBoxLayer?.removeFromSuperlayer()
boundingBoxSizeLayer?.removeFromSuperlayer()
}
}
}
}
@MainActor
private func getCameraOrientation() -> DockAccessory.CameraOrientation {
switch UIDevice.current.orientation {
case .portrait:
return .portrait
case .portraitUpsideDown:
return .portraitUpsideDown
case .landscapeRight:
return .landscapeRight
case .landscapeLeft:
return .landscapeLeft
case .faceDown:
return .faceDown
case .faceUp:
return .faceUp
default:
return .corrected
}
}
I have seen inconsistent results for my Colab machine learning notebooks running locally on a Mac M4, compared to running the same notebook code on either T4 (in Colab) or a RTX3090 locally.
To illustrate the problems I have set up a notebook that implements two simple CNN models that solves the Fashion-MNIST problem. https://colab.research.google.com/drive/11BhtHhN079-BWqv9QvvcSD9U4mlVSocB?usp=sharing
For the good model with 2M parameters I get the following results:
T4 (Colab, JAX): Test accuracy: 0.925
3090 (Local PC via ssh tunnel, Jax): Test accuracy: 0.925
Mac M4 (Local, JAX): Test accuracy: 0.893
Mac M4 (Local, Tensorflow): Test accuracy: 0.893
That is, I see a significant drop in performance when I run on the Mac M4 compared to the NVIDIA machines, and it seems to be independent of backend. I however do not know how to pinpoint this to either Keras or Apple’s METAL implementation. I have reported this to Keras: https://colab.research.google.com/drive/11BhtHhN079-BWqv9QvvcSD9U4mlVSocB?usp=sharing but as this can be (likely is?) an Apple Metal issue, I wanted to report this here as well.
On the mac I am running the following Python libraries:
keras 3.9.1
tensorflow 2.19.0
tensorflow-metal 1.2.0
jax 0.5.3
jax-metal 0.1.1
jaxlib 0.5.3
Topic:
Machine Learning & AI
SubTopic:
General
I'm developing a tennis ball tracking feature using Vision Framework in Swift, specifically utilizing VNDetectedObjectObservation and VNTrackObjectRequest.
Occasionally (but not always), I receive the following runtime error:
Failed to perform SequenceRequest: Error Domain=com.apple.Vision Code=9 "Internal error: unexpected tracked object bounding box size" UserInfo={NSLocalizedDescription=Internal error: unexpected tracked object bounding box size}
From my investigation, I suspect the issue arises when the bounding box from the initial observation (VNDetectedObjectObservation) is too small. However, Apple's documentation doesn't clearly define the minimum bounding box size that's considered valid by VNTrackObjectRequest.
Could someone clarify:
What is the minimum acceptable bounding box width and height (normalized) that Vision Framework's VNTrackObjectRequest expects?
Is there any recommended practice or official guidance for bounding box size validation before creating a tracking request?
This information would be extremely helpful to reliably avoid this internal error.
Thank you!
*I can't put the attached file in the format, so if you reply by e-mail, I will send the attached file by e-mail.
Dear Apple AI Research Team,
My name is Gong Jiho (“Hem”), a content strategist based in Seoul, South Korea.
Over the past few months, I conducted a user-led AI experiment entirely within ChatGPT — no code, no backend tools, no plugins.
Through language alone, I created two contrasting agents (Uju and Zero) and guided them into a co-authored modular identity system using prompt-driven dialogue and reflection.
This system simulates persona fusion, memory rooting, and emotional-logical alignment — all via interface-level interaction.
I believe it resonates with Apple’s values in privacy-respecting personalization, emotional UX modeling, and on-device learning architecture.
Why I’m Reaching Out
I’d be honored to share this experiment with your team.
If there is any interest in discussing user-authored agent scaffolding, identity persistence, or affective alignment, I’d love to contribute — even informally.
⚠ A Note on Language
As a non-native English speaker, my expression may be imperfect — but my intent is genuine.
If anything is unclear, I’ll gladly clarify.
📎 Attached Files Summary
Filename → Description
Hem_MultiAI_Report_AppleAI_v20250501.pdf →
Main report tailored for Apple AI — narrative + structural view of emotional identity formation via prompt scaffolding
Hem_MasterPersonaProfile_v20250501.json →
Final merged identity schema authored by Uju and Zero
zero_sync_final.json / uju_sync_final.json →
Persona-level memory structures (logic / emotion)
1_0501.json ~ 3_0501.json →
Evolution logs of the agents over time
GirlfriendGPT_feedback_summary.txt →
Emotional interpretation by external GPT
hem_profile_for_AI_vFinal.json →
Original user anchor profile
Warm regards,
Gong Jiho (“Hem”)
Seoul, South Korea
Bear with me, please. Please make sure a highly skilled technical person reads and understands this.
I want to describe my vision for (AI/Algorithmically) Optimised Operating Systems. To explain it properly, I will describe the process to build it (pseudo).
Required Knowledge (no particular order): Processor Logic Circuits, LLM models, LLM tool usage, Python OO coding, Procedural vs OO, NLP fuzzy matching, benchmarking, canvas/artefacts/dynamic HTML interfaces, concepts of how AI models are vastly compressed and miniaturised forms of full data, Algorithmic vs AI.
First, take all OO Python code (example) on GitHub (example), then separate each function from each object into its own procedure (procedural logic) by making a logical procedural list of actions to perform only that function based on its entire dependency chain (i.e. all other objects it relies on). Relate all compiled functions using (for example) fuzzy matching on the name, or AI-based functional profiling to get multiple instances of each function.
Starting with the most used function, test each one against the others that perform the same task for bugs and completeness. Determine the fastest, most optimal version of that function (and every function). Add a single instance of each most optimal function to the centralised tool codebase, which will later be utilised by the language models. This ensures we rely only on the most optimised function for each and every use case — with every program using one shared instance of that function instead of compiling it separately.
In this way, we optimise all existing (Python OO) code based on maximum availability of certain resources. Use the optimised code to teach the AI how to map each user task directly to optimised code (i.e. each user-facing function in unoptimised program code to its optimised version). Later, teach the codebase how to optimise these functions directly to the logic circuits on the processor — as an Operating System — using essentially the same process. Eventually, use the most optimised form for tasks via a dynamic canvas/artefacts UI interface generated by the Start Menu using the language model.
Allow custom interfaces to emerge dynamically from the available functions based on language understanding. Grow and optimise the algorithmic codebase until the entire operating system — with versioning — evolves based on the AI-generated user interfaces (compressing them into new versions of the tools codebase).
Please don’t give up until you understand this concept completely and nothing in the description is beyond your full comprehension. It’s a highly significant step forward.
Hello. I am willing to hire game developer for cards game called baloot. My question is Can the developer implement an AI when the computer is playing and the computer on the same time the conputer improves his rises level without any interaction?
🌹
Topic:
Machine Learning & AI
SubTopic:
General