Accelerate

Make large-scale mathematical computations and image calculations with high-performance, energy-efficient computation using Accelerate.

Posts under Accelerate tag

84 Posts

Columns: Post, Replies, Boosts, Views, Activity

ILP64 BLAS Interface from Accelerate?
Up until now, we have been using an optimized BLAS library for Intel processors. Now we are looking for replacements for Apple Silicon. I know that Accelerate provides such an interface. However, I haven't been able to find out whether it provides an ILP64 (rather than LP64) interface, which is what we use on all 64-bit platforms. If it does, how do I access it? Thanks.
Replies: 2 · Boosts: 0 · Views: 2.3k · Activity: Aug ’23
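A minimal sketch of what the ILP64 route looks like, assuming an Xcode SDK for macOS 13.3 or later: in those SDKs, defining ACCELERATE_NEW_LAPACK together with ACCELERATE_LAPACK_ILP64 before including Accelerate selects the updated BLAS/LAPACK headers, whose integer arguments use the 64-bit `__LAPACK_int` type. The helper name `dot_ilp64` is hypothetical, and the `#else` loop exists only so the snippet also builds where Accelerate is unavailable.

```cpp
// Select Accelerate's updated BLAS headers with the ILP64 (64-bit integer) interface.
// Assumes an Xcode SDK for macOS 13.3 or later; the macros are ignored elsewhere.
#define ACCELERATE_NEW_LAPACK
#define ACCELERATE_LAPACK_ILP64
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#endif

// Hypothetical helper: double-precision dot product whose length and strides
// are passed to BLAS as 64-bit integers on Apple platforms.
double dot_ilp64(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
#if defined(__APPLE__)
    // Under ACCELERATE_LAPACK_ILP64, __LAPACK_int is a 64-bit integer type.
    static_assert(sizeof(__LAPACK_int) == 8, "expected the ILP64 BLAS interface");
    const __LAPACK_int n = static_cast<__LAPACK_int>(x.size());
    return cblas_ddot(n, x.data(), 1, y.data(), 1);
#else
    // Portable reference loop so the sketch also builds without Accelerate.
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
    return sum;
#endif
}
```

Because the interface is chosen by preprocessor macros, every translation unit that calls BLAS needs the same definitions, for example via `-DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64` on the compiler command line.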
Implementation of some core functions of jax-metal
It appears that some of the JAX core functions (in pjit, mlir) are not supported. Is this something that will be supported in the future? For example, when I tested a diffrax example:

from diffrax import diffeqsolve, ODETerm, Dopri5
import jax.numpy as jnp

def f(t, y, args):
    return -y

term = ODETerm(f)
solver = Dopri5()
y0 = jnp.array([2., 3.])
solution = diffeqsolve(term, solver, t0=0, t1=1, dt0=0.1, y0=y0)

it generates an error saying EmitPythonCallback is not supported on Metal:

File ~/anaconda3/envs/jax-metal-0410/lib/python3.10/site-packages/jax/_src/interpreters/mlir.py:1787 in emit_python_callback
    raise ValueError(
ValueError: `EmitPythonCallback` not supported on METAL backend.

I understand that, currently, no M1 or M2 chip has multiple devices or can be arranged that way. Therefore, it may not be necessary to fully implement the p*** functions (pmap, pjit, etc.). But some powerful libraries use them, so it would be great if at least some workaround for the core functions were implemented. Or is there an easy fix for this?
Replies: 0 · Boosts: 1 · Views: 1k · Activity: Jul ’23
When to use vImage, Metal Performance Shaders, or Core Image?
I've looked in multiple places online, including here in the forums, where a somewhat similar question was asked (and never answered :( ), but I'm going to ask anyway: vImage, Metal Performance Shaders, and Core Image all overlap heavily in the kinds of operations they perform on image data. But none of the supporting materials (documentation, WWDC session videos, help) seem to pay much heed to the existence of the others when describing themselves. For example, Core Image talks about how efficient and fast it is. MPS talks about everything being "hand rolled" to be optimized for the hardware it's running on, which means, yes, fast and efficient. And vImage talks about being fast and... yup, energy-saving. But others and I have very little to go on as to when vImage makes sense over MPS, or over Core Image. For example, if I have a large set of images and I want to get the mean color value of each image, equalize or adjust the histogram of each, or perform some other color operation on each image in the set, which is best? I hope someone from Apple (preferably multiple people from the multiple teams that work on these technologies) can help clear some of this up.
Replies: 3 · Boosts: 0 · Views: 1.9k · Activity: May ’23
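For the "mean color value" part of the question, a CPU-side sketch using vDSP (part of Accelerate, alongside vImage) shows how little code a per-channel mean needs when the pixels are already interleaved 32-bit float RGBA. The pixel layout and the function name are assumptions for illustration; vImage additionally provides histogram calculation and equalization routines (for example vImageEqualization_ARGB8888) for the other operations mentioned. This does not settle which framework is best overall.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#endif

// Hypothetical helper: mean of each channel of an interleaved RGBA image
// stored as 32-bit floats (4 values per pixel: R, G, B, A).
std::array<float, 4> channel_means_rgba(const std::vector<float>& rgba) {
    assert(!rgba.empty() && rgba.size() % 4 == 0);
    const std::size_t pixels = rgba.size() / 4;
    std::array<float, 4> means{};
    for (std::size_t c = 0; c < 4; ++c) {
#if defined(__APPLE__)
        // Start at channel c and step 4 floats at a time to visit only that channel.
        vDSP_meanv(rgba.data() + c, 4, &means[c], pixels);
#else
        float sum = 0.0f;  // portable reference loop
        for (std::size_t p = 0; p < pixels; ++p) sum += rgba[p * 4 + c];
        means[c] = sum / static_cast<float>(pixels);
#endif
    }
    return means;
}
```

The stride argument is what lets one contiguous buffer be treated as four separate channel vectors without copying.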
Are any of the ML frameworks real-time safe for audio processing?
I'm working on an audio processing app and am creating an AVAudioUnit extension as part of it. I need to train a small neural network in the app and use it to process audio in real time in the AudioUnit. The network is mostly convolutions and is ideal for running on the GPU, but it should run in real time on the CPU. The problem I'm currently facing is that none of the ML frameworks seem to be safe to use for inference within custom AVAudioUnit kernels. My understanding is that only C and C++ should be used in these kernels (in addition to the other rules of real-time computing); Objective-C and Swift are discouraged per the documentation. My background is primarily in ML, so I'm newer to Apple development and especially new to real-time development in this ecosystem. I've investigated Core ML, MPS, BNNS/Accelerate, and MLCompute so far, but I'm not certain that any of them are safe to use. Any feedback would be greatly appreciated!
Replies: 0 · Boosts: 1 · Views: 1.7k · Activity: Feb ’23
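One option sometimes used when framework-level inference isn't known to be real-time safe is to run the (small) convolution layers directly with vDSP's C API on buffers allocated before rendering starts. The sketch below shows a single streaming 1-D convolution stage in that style. The class name, the history-buffer layout, and the assumption that vDSP_conv itself neither allocates nor blocks are illustrative assumptions to verify, not documented guarantees.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>
#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#endif

// Hypothetical streaming 1-D convolution stage. All storage is allocated in the
// constructor, so process() does no heap allocation, locking, or Objective-C/Swift
// messaging. (Assumption: vDSP_conv itself neither allocates nor blocks.)
class ConvStage {
public:
    ConvStage(std::vector<float> kernel, std::size_t maxFrames)
        : kernel_(std::move(kernel)),
          history_(kernel_.size() + maxFrames, 0.0f),
          maxFrames_(maxFrames) {
        assert(!kernel_.empty());
    }

    // Filters `frames` new input samples into `output`:
    // output[n] = sum over p of x[n - p] * kernel[p], with x continuing across calls.
    void process(const float* input, float* output, std::size_t frames) {
        assert(frames <= maxFrames_);
        if (frames == 0) return;
        const std::size_t taps = kernel_.size();
        // history_ = [last (taps - 1) samples of the previous call | new block]
        std::copy(input, input + frames, history_.begin() + (taps - 1));
#if defined(__APPLE__)
        // Pointing at the last tap with a stride of -1 makes vDSP_conv
        // convolve rather than correlate.
        vDSP_conv(history_.data(), 1, kernel_.data() + (taps - 1), -1,
                  output, 1, frames, taps);
#else
        for (std::size_t n = 0; n < frames; ++n) {  // portable reference loop
            float acc = 0.0f;
            for (std::size_t p = 0; p < taps; ++p) acc += history_[n + p] * kernel_[taps - 1 - p];
            output[n] = acc;
        }
#endif
        // Keep the newest (taps - 1) samples for the next call.
        std::copy(history_.begin() + frames, history_.begin() + frames + (taps - 1),
                  history_.begin());
    }

private:
    std::vector<float> kernel_;
    std::vector<float> history_;
    std::size_t maxFrames_;
};
```

The stage would be constructed (with maxFrames sized to the largest render quantum expected) outside the render callback; only process() would be called from the audio thread.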
cmake fails to generate executable file under macOS Monterey (v12.6.1), but manages under macOS v10.15.7
Hi, I have problems building an executable for a simple disease-transmission model implemented in C++, using cmake under macOS Monterey (v12.6.1). The build itself succeeds, but when I try to run the executable, I get the following error:
dyld[5281]: symbol not found in flat namespace (_cblas_caxpy)
Abort trap: 6
The problem persists when I use Xcode (v14.0.1) instead, resulting in the same error message. Interestingly, my friend is able to build (& run) the executable under macOS v10.15.7 without any problems. Does anybody know what is going on here and how this issue can be resolved? The C++ project is publicly available on GitHub: https://github.com/AnnaMariaL/DengueSim Any help would be very much appreciated. Thanks! Anna
Replies: 12 · Boosts: 0 · Views: 5.4k · Activity: Jan ’23
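The dyld message says the executable expected `_cblas_caxpy` to be resolved at load time, but nothing it is linked against provides it. On macOS the CBLAS symbols come from Accelerate, so one likely remedy (an assumption, since it depends on how the project's CMakeLists.txt is set up) is to link the framework explicitly, e.g. `target_link_libraries(<your_target> PRIVATE "-framework Accelerate")`. The snippet below is a minimal call that needs exactly that link flag on macOS; the wrapper name and the portable fallback are illustrative.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>
#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>  // declares cblas_caxpy; link with -framework Accelerate
#endif

// Hypothetical wrapper: y <- alpha * x + y for single-precision complex vectors.
void caxpy(std::complex<float> alpha,
           const std::vector<std::complex<float>>& x,
           std::vector<std::complex<float>>& y) {
    assert(x.size() == y.size());
#if defined(__APPLE__)
    // Legacy CBLAS prototype: complex scalars and vectors are passed as void pointers.
    cblas_caxpy(static_cast<int>(x.size()), &alpha, x.data(), 1, y.data(), 1);
#else
    // Portable reference loop so the sketch also builds without Accelerate.
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
#endif
}
```

If this builds and runs with the framework flag but reproduces the same dyld error without it, the missing link step is the likely culprit in the project as well.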
Crash in Accelerate framework when using Cholesky factorization
Hi. We are moving from MKL to Accelerate in order to accommodate the transition to Apple silicon. However, we are occasionally getting crashes inside the Accelerate framework when factoring sparse matrices with the Cholesky decomposition. At the following link you can find an Xcode project with a minimal reproducible example: https://drive.google.com/file/d/1rHmJZbA5yc4-68Z1-vm7g3IvIPRgbN_c/view?usp=share_link I also pasted the code below. First, we load a sparse matrix from the hard drive. Then we factor it. When factoring the matrix with LDLT, everything works fine. When factoring it with the Cholesky decomposition, SparseFactor sometimes returns a status of -3, sometimes -1, and sometimes it crashes. The code below tries the factorization 100 times and always crashes. The matrix is positive definite; I checked it in another tool. Its smallest (algebraic) eigenvalue is around 23000. Other matrices, such as the one in the 'good' file in the link above, don't cause crashes. Can someone please shed light on what's causing the crash? What exactly in this matrix causes it, and is this a known issue in the Accelerate framework?
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <libgen.h>
#include <Accelerate/Accelerate.h>

int main(int argc, const char * argv[]) {
    std::string filename = __FILE__;
    std::string folder = dirname(const_cast<char*>(filename.c_str()));
    std::vector<std::string> files = {
        folder + "/m_good.bin",
        folder + "/m_bad.bin"
    };

    int GOOD = 0;  // index of the matrix that factors fine
    int BAD = 1;   // index of the matrix that crashes
    std::string datafilename = files[BAD];
    bool cholesky = true;

    std::ifstream file(datafilename, std::ios::binary);
    if (!file.is_open())
    {
        std::cout << " can't find file: " << datafilename << "\n";
        return 1;
    }

    SparseMatrix_Double _A;
    // Read the matrix header (structure and attributes) into _A;
    // its pointer fields are overwritten below.
    file.read((char*)&_A, sizeof(_A));

    std::vector<double> _values;
    auto size = _values.size();
    file.read((char*)&size, sizeof(size));
    _values.resize(size);
    file.read((char*)&_values[0], size * sizeof(double));

    std::vector<long> _column_starts;
    file.read((char*)&size, sizeof(size));
    _column_starts.resize(size);
    file.read((char*)&_column_starts[0], size * sizeof(long));

    std::vector<int> _row_indices;
    file.read((char*)&size, sizeof(size));
    _row_indices.resize(size);
    file.read((char*)&_row_indices[0], size * sizeof(int));
    file.close();  // Close the file

    _A.data = &_values[0];
    _A.structure.columnStarts = &_column_starts[0];
    _A.structure.rowIndices = &_row_indices[0];

    for (int i = 0; i < 100; i++)
    {
        std::cout << "\nRun: " << i << " ";
        auto _LLT = SparseFactor(cholesky ? SparseFactorizationCholesky : SparseFactorizationLDLT, _A);
        std::cout << _LLT.status;
        SparseCleanup(_LLT);

        if (_LLT.status != SparseStatusOK)
        {
            std::cout << " Failed!!";
        }
        std::cout << std::endl;
    }

    return 0;
}
Replies: 0 · Boosts: 1 · Views: 1.2k · Activity: Jan ’23
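One detail worth double-checking when the matrix header is read as raw SparseMatrix_Double bytes from disk is that its attributes still describe the data: for a Cholesky factorization, Accelerate's Sparse Solvers expect a symmetric matrix (kind set to SparseSymmetric) whose stored triangle matches attributes.triangle. The sketch below builds those arrays explicitly for a lower-triangle, compressed-sparse-column matrix and checks the factorization status before solving. The names `LowerCSC`, `packLowerCSC`, and `choleskySolve` are hypothetical, and this is not a diagnosis of the crash.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#endif

// Hypothetical container for the lower triangle of a symmetric n x n matrix in
// compressed sparse column (CSC) form, the layout SparseMatrix_Double points to.
struct LowerCSC {
    int n = 0;
    std::vector<long> columnStarts;  // n + 1 offsets into rowIndices/values
    std::vector<int> rowIndices;     // row index of each stored entry
    std::vector<double> values;      // stored entries, column by column
};

// Packs the lower triangle (row >= column) of a dense row-major matrix,
// skipping exact zeros.
LowerCSC packLowerCSC(const std::vector<double>& dense, int n) {
    assert(n > 0 && dense.size() == static_cast<std::size_t>(n) * n);
    LowerCSC m;
    m.n = n;
    m.columnStarts.push_back(0);
    for (int col = 0; col < n; ++col) {
        for (int row = col; row < n; ++row) {
            const double v = dense[static_cast<std::size_t>(row) * n + col];
            if (v != 0.0) {
                m.rowIndices.push_back(row);
                m.values.push_back(v);
            }
        }
        m.columnStarts.push_back(static_cast<long>(m.values.size()));
    }
    return m;
}

#if defined(__APPLE__)
// Factors the matrix with Cholesky and, on success, solves A x = b in place.
// Returns the factorization status so callers can check it before using results.
SparseStatus_t choleskySolve(LowerCSC& m, std::vector<double>& b) {
    assert(b.size() == static_cast<std::size_t>(m.n));
    SparseAttributes_t attributes{};
    attributes.transpose = false;
    attributes.triangle = SparseLowerTriangle;  // must match the stored triangle
    attributes.kind = SparseSymmetric;          // symmetric factorizations need this

    SparseMatrix_Double A{};
    A.structure.rowCount = m.n;
    A.structure.columnCount = m.n;
    A.structure.columnStarts = m.columnStarts.data();
    A.structure.rowIndices = m.rowIndices.data();
    A.structure.attributes = attributes;
    A.structure.blockSize = 1;
    A.data = m.values.data();

    SparseOpaqueFactorization_Double factor = SparseFactor(SparseFactorizationCholesky, A);
    const SparseStatus_t status = factor.status;
    if (status == SparseStatusOK) {
        DenseVector_Double xb{};
        xb.count = m.n;
        xb.data = b.data();
        SparseSolve(factor, xb);  // overwrites b with the solution x
    }
    SparseCleanup(factor);
    return status;
}
#endif
```

For the symmetric positive definite matrix [[4, 1, 0], [1, 3, 1], [0, 1, 2]], the packed arrays are columnStarts {0, 2, 4, 5}, rowIndices {0, 1, 1, 2, 2}, and values {4, 1, 3, 1, 2}; comparing a dump of the 'bad' file's header against this kind of expectation is a cheap first check.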
How to use scipy odeint function in Swift
Hi, I have Python code that I need to migrate to Swift. I'm stuck on the odeint function, which integrates a system of ordinary differential equations. In the code, t is an array of data representing timeIntervalSince1970.

from scipy.integrate import odeint

def solve(self):
    y0 = [0., 0., 0., 0.]
    wsol = odeint(self.f, y0, t)
    return wsol

def f(self, y, t):
    a, b, c, d = y
    # f returns an array of Doubles, after a lot of mathematical calculation
    f = [calculateValue0, calculateValue1, calculateValue2, calculateValue3]
    return f

The recommendation is to use the Accelerate framework, but it is my first time using it, and I don't see anything similar to odeint...
Replies: 0 · Boosts: 0 · Views: 1.5k · Activity: Dec ’22
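Accelerate's vDSP, BLAS/LAPACK, and BNNS APIs don't appear to include a general ODE integrator, so porting odeint usually means writing (or pulling in) one. The sketch below uses a fixed-step classical Runge-Kutta (RK4) method. This is a swapped-in, simpler technique: SciPy's odeint wraps the adaptive LSODA solver, so results will differ slightly and stiff systems may need a different method. It is written in C++ like the other examples here, but the arithmetic maps one-to-one onto Swift arrays of Double; the function name and the `substeps` parameter are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical fixed-step classical Runge-Kutta (RK4) integration of y' = f(y, t).
// Returns the state at every entry of `times`, like the rows of odeint's result.
// Each interval between consecutive times is split into `substeps` RK4 steps.
template <std::size_t N, typename F>
std::vector<std::array<double, N>> integrate_rk4(F f, std::array<double, N> y,
                                                 const std::vector<double>& times,
                                                 int substeps = 100) {
    assert(!times.empty() && substeps > 0);
    using State = std::array<double, N>;
    // r = a + c * b, element-wise
    auto axpy = [](const State& a, const State& b, double c) {
        State r{};
        for (std::size_t i = 0; i < N; ++i) r[i] = a[i] + c * b[i];
        return r;
    };
    std::vector<State> out{y};
    for (std::size_t k = 1; k < times.size(); ++k) {
        const double h = (times[k] - times[k - 1]) / substeps;
        double t = times[k - 1];
        for (int s = 0; s < substeps; ++s) {
            const State k1 = f(y, t);
            const State k2 = f(axpy(y, k1, h / 2), t + h / 2);
            const State k3 = f(axpy(y, k2, h / 2), t + h / 2);
            const State k4 = f(axpy(y, k3, h), t + h);
            for (std::size_t i = 0; i < N; ++i)
                y[i] += h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]);
            t += h;
        }
        out.push_back(y);
    }
    return out;
}
```

For the four-variable system in the question, the state would be `std::array<double, 4>`, `f` would return the four computed values, and `times` plays the role of `t`.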
In SwiftUI, how to manipulate sampling rate of a gesture in time?
If increasing the sampling rate isn't possible, is the only option for "continuous" drawing to repeatedly add a Bézier curve from n-2 to n-1 that takes into account n-3 and n? Are there other (easy) options to interpolate between locations n-2 and n-1, not only visually, but also knowing and storing all intermediate points? Think more bitmap, less vector. Below is my code (the current result is "uneven"):

import SwiftUI

struct ContentView: View {
    @State var points: [CGPoint] = []

    var body: some View {
        Canvas { context, size in
            for point in points {
                context.draw(Image(systemName: "circle"), at: point)
            }
        }
        .gesture(
            DragGesture().onChanged { value in
                points += [value.location]
            }
        )
    }
}
Replies: 0 · Boosts: 0 · Views: 1.1k · Activity: Dec ’22
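A uniform Catmull-Rom spline is a common way to do what the question describes: a curve segment from point n-2 to n-1 whose shape also takes n-3 and n into account (each such segment can equivalently be written as a cubic Bézier). The sketch below samples that segment at roughly even spacing so the intermediate points can be stored, bitmap-style, rather than only drawn. It is written in C++; the `Point` type, the function names, and the chord-length spacing rule are illustrative assumptions that would translate directly to CGPoint in Swift.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Point {
    double x;
    double y;
};

// Uniform Catmull-Rom spline: the curve from p1 (t = 0) to p2 (t = 1),
// with its tangents shaped by the neighbouring points p0 and p3.
Point catmull_rom(Point p0, Point p1, Point p2, Point p3, double t) {
    auto blend = [t](double a, double b, double c, double d) {
        return 0.5 * (2 * b + (-a + c) * t + (2 * a - 5 * b + 4 * c - d) * t * t +
                      (-a + 3 * b - 3 * c + d) * t * t * t);
    };
    return {blend(p0.x, p1.x, p2.x, p3.x), blend(p0.y, p1.y, p2.y, p3.y)};
}

// Hypothetical helper: fills the gap between two consecutive drag samples p1 and p2
// with points about `spacing` apart (by straight-line distance), starting at p1 and
// stopping before p2 so consecutive segments don't duplicate their endpoints.
std::vector<Point> fill_segment(Point p0, Point p1, Point p2, Point p3, double spacing) {
    assert(spacing > 0);
    const double chord = std::hypot(p2.x - p1.x, p2.y - p1.y);
    const int steps = std::max(1, static_cast<int>(std::ceil(chord / spacing)));
    std::vector<Point> points;
    points.reserve(steps);
    for (int i = 0; i < steps; ++i)
        points.push_back(catmull_rom(p0, p1, p2, p3, static_cast<double>(i) / steps));
    return points;
}
```

At the very first and last drag samples there is no n-3 or n yet; repeating the endpoint (p0 = p1 or p3 = p2) is a simple way to handle that.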
Swift performance with Accelerate when evaluating random binary trees
I am trying to move a program I currently have implemented in Python to Swift. Performance is critical, as it is a randomness-based search algorithm. What I need to do is evaluate random binary trees where each non-leaf node represents a basic arithmetic or logical operation and each leaf node represents a (large) vector of numbers. I managed to get Accelerate's BNNS functions to do the calculations, but they are still slower than my (much simpler, pandas-based) Python approach, which takes less than half the time on average under similar circumstances. It would be great if someone could review my code and tell me whether there is any further potential for optimisation and/or a better approach. In the code below I only cover the add (addition) operation, but the others are very similar in structure. I also left out the part that generates the trees (and ensures they are "legit" in terms of consecutive operations), as I don't think it is particularly relevant here; I'm happy to add it, however, if you think it would help.

import Accelerate

enum NodeValue {
    case add // Addition
    case sub // Subtraction
    case mul // Multiplication
    case div // Division
    case sml // Smaller
    case lrg // Larger
    case met(columnIndex: Int) // An index into return_data; used for the leaves of the tree
}

final class Node {
    var value: NodeValue
    var lhs: Node?
    var rhs: Node?

    init(value: NodeValue, lhs: Node?, rhs: Node?) {
        self.value = value
        self.lhs = lhs
        self.rhs = rhs
    }

    func evaluate_signal(return_data: [Int: [Float16]]) -> [Any] {
        // Determine output
        switch self.value {
        case .add:
            let eval_left = lhs!.evaluate_signal(return_data: return_data) as! [Float16]
            let eval_right = rhs!.evaluate_signal(return_data: return_data) as! [Float16]
            let leftDescriptor = BNNSNDArrayDescriptor.allocate(initializingFrom: eval_left,
                                                                shape: .vector(eval_left.count))
            let rightDescriptor = BNNSNDArrayDescriptor.allocate(initializingFrom: eval_right,
                                                                 shape: .vector(eval_left.count))
            let resultDescriptor = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float16.self,
                                                                               shape: .vector(eval_left.count))
            let layer = BNNS.BinaryArithmeticLayer(inputA: leftDescriptor,
                                                   inputADescriptorType: BNNS.DescriptorType.sample,
                                                   inputB: rightDescriptor,
                                                   inputBDescriptorType: BNNS.DescriptorType.sample,
                                                   output: resultDescriptor,
                                                   outputDescriptorType: BNNS.DescriptorType.sample,
                                                   function: BNNS.ArithmeticBinaryFunction.add)
            try! layer!.apply(batchSize: 1,
                              inputA: leftDescriptor,
                              inputB: rightDescriptor,
                              output: resultDescriptor)
            let resultVector: [Float16] = resultDescriptor.makeArray(of: Float16.self)!
            leftDescriptor.deallocate()
            rightDescriptor.deallocate()
            resultDescriptor.deallocate()
            return resultVector
        case .sub:
            // Similar code for subtraction
        case .mul:
            // Similar code for multiplication
        case .div:
            // Similar code for division
        case .sml:
            // Similar code for comparison if smaller
        case .lrg:
            // Similar code for comparison if larger
        case .met(let columnIndex):
            return return_data[columnIndex]!
        }
    }
}
Replies: 2 · Boosts: 0 · Views: 1.3k · Activity: Dec ’22
R painfully slow on Air M1 - Big Sur
I bought a new MacBook Air with the M1 chip last week. It runs Big Sur version 11.2.3. My code in RStudio is extremely slow: it takes around 7 minutes on this new laptop. I have tried using R (rather than RStudio), and the same happens. I've checked it on my sister's Air (macOS Mojave 10.14.6), and it takes only seconds to run the same code. What could be the reason that my one-week-old laptop is so slow to run this R code? And what would be the solution? Any help is much appreciated!
Replies: 7 · Boosts: 0 · Views: 6.2k · Activity: Dec ’22
Optimising initialisation of big arrays with random data
I will be filling audio and video buffers with randomly distributed data for each frame in real time. Initializing these arrays with Floats inside a basic for loop seems naive. Are there any optimised methods for this task in the iOS libraries? I looked for a data-science-oriented framework from Apple but did not find one; maybe Accelerate, Metal, or Core ML are good candidates to research? Is my thinking correct, and if so, can you guide me?
Replies: 1 · Boosts: 0 · Views: 1.4k · Activity: Dec ’22
Swift performance - efficient calculation of boolean logical operations
I need to work with large Double and Boolean arrays/vectors and apply simple operations to them, e.g. addition, subtraction, multiplication, smaller, larger, etc. on the Doubles and AND, OR, NOT, etc. on the Bools. While I have found vDSP from Accelerate quite performant for the simple arithmetic operations, the larger/smaller and logical ones are very slow, which probably has to do with the fact that I use the map function to apply them. Is there a more efficient way to do this? Based on some things I have been reading, pointers might help, but I'm not really sure how I would apply them here in the context of zip and map. See below for some code examples:

import Accelerate

let myDoubleArray1: [Double] = Array<Double>(repeating: 1.123, count: 1000000)
let myDoubleArray2: [Double] = Array<Double>(repeating: 2.123, count: 1000000)
let myBoolArray1: [Bool] = Array<Bool>(repeating: false, count: 1000000)
let myBoolArray2: [Bool] = Array<Bool>(repeating: true, count: 1000000)

_ = vDSP.multiply(myDoubleArray1, myDoubleArray2)        // Takes about 0.5 sec - very good
_ = zip(myDoubleArray1, myDoubleArray2).map { $0 > $1 }  // Takes about 7 sec - too slow
_ = zip(myBoolArray1, myBoolArray2).map { $0 && $1 }     // Takes about 7 sec - too slow
_ = zip(myBoolArray1, myBoolArray2).map { $0 == $1 }     // Takes about 7 sec - too slow
_ = myBoolArray1.map { !$0 }                             // Takes about 7 sec - too slow
Replies: 4 · Boosts: 0 · Views: 2.1k · Activity: Nov ’22
Floating point exception trapping on M1
I have written a simple test C++ program (below) that takes the square root of a negative number and then tries to print it. I would like to trap the floating-point exception caused by taking the square root of a negative number (i.e., I'd like the program to halt with an error after the floating-point exception). On Intel Macs, I know how to do this. Is this possible on an Apple Silicon Mac?

#include <cmath>
#include <iostream>

int main() {
    const double x = -1.0;
    double y = x;
    y = std::sqrt(y); // floating-point exception... possible to build the program so it terminates here?
    std::cout << y << "\n";
    return 0;
}
Replies: 6 · Boosts: 0 · Views: 4.7k · Activity: Oct ’22
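On Apple silicon Macs, one approach commonly reported to work is to set the trap-enable bit in the FPCR through <fenv.h>: Apple's arm64 version of that header exposes the register as the `__fpcr` member of `fenv_t`, with bit masks such as `__fpcr_trap_invalid`. The sketch below assumes that header layout; its glibc branch uses the GNU feenableexcept extension instead, and the function name is hypothetical. Reportedly, the resulting trap on Apple silicon arrives as SIGILL rather than the SIGFPE seen on Intel Macs, so a signal handler written for Intel may need adjusting.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: turn trapping of the IEEE "invalid operation"
// exception (e.g. sqrt(-1.0), 0.0/0.0) on or off for the calling thread.
// Returns false if this platform branch has no way to do it.
bool set_invalid_trap(bool enable) {
#if defined(__APPLE__) && defined(__aarch64__)
    // Assumes Apple's arm64 <fenv.h>, where fenv_t holds the FPCR in __fpcr
    // and __fpcr_trap_invalid is the trap-enable bit for FE_INVALID.
    std::fenv_t env;
    if (std::fegetenv(&env) != 0) return false;
    if (enable) {
        env.__fpcr |= __fpcr_trap_invalid;
    } else {
        env.__fpcr &= ~static_cast<decltype(env.__fpcr)>(__fpcr_trap_invalid);
    }
    return std::fesetenv(&env) == 0;
#elif defined(__GLIBC__)
    // glibc extension; returns -1 where the hardware cannot trap.
    const int result = enable ? feenableexcept(FE_INVALID) : fedisableexcept(FE_INVALID);
    return result != -1;
#else
    (void)enable;
    return false;
#endif
}
```

Calling `set_invalid_trap(true)` at the start of main() in the program above, before the square root, is intended to make the process stop at the invalid operation instead of printing nan.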
Apparent errors in single precision BLAS in -framework accelerate
There appear to be errors in the return types of some single-precision BLAS functions in Apple's -framework accelerate library. These errors exist on both Intel and arm64 hardware. Here is a small Fortran program that demonstrates them:

program sblas
   ! test some single-precision blas results.
   implicit none
   real :: x(2)=[3.,4.], y(2)=[1.,1.]
   complex :: w(2)=[(4.,3.),(3.,4.)], z(2)=[(5.,6.),(7.,8.)]
   real, external :: sdot, sdsdot, snrm2, scnrm2, sasum, scasum
   complex, external :: cdotu, cdotc
   character(*), parameter :: cfmt='(*(g0.4,1x))'
   write(*,cfmt) 'sdot=',   sdot(2,x,1,y,1),       'should be 7.000'
   write(*,cfmt) 'sdsdot=', sdsdot(2,0.0,x,1,y,1), 'should be 7.000'
   write(*,cfmt) 'snrm2=',  snrm2(2,x,1),          'should be 5.000'
   write(*,cfmt) 'scnrm2=', scnrm2(2,w,1),         'should be 7.071'
   write(*,cfmt) 'sasum=',  sasum(2,x,1),          'should be 7.000'
   write(*,cfmt) 'scasum=', scasum(2,w,1),         'should be 14.00'
   write(*,cfmt) 'cdotu=',  cdotu(2,w,1,z,1),      'should be -9.000 91.00'
   write(*,cfmt) 'cdotc=',  cdotc(2,w,1,z,1),      'should be 91.00 5.000'
end program sblas

The correct output is:

$ ifort -L${MKLROOT}/lib -Wl,-rpath,${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread sblas.f90 && a.out
sdot= 7.000 should be 7.000
sdsdot= 7.000 should be 7.000
snrm2= 5.000 should be 5.000
scnrm2= 7.071 should be 7.071
sasum= 7.000 should be 7.000
scasum= 14.00 should be 14.00
cdotu= -9.000 91.00 should be -9.000 91.00
cdotc= 91.00 5.000 should be 91.00 5.000

With the Apple -framework accelerate library, I get:

$ ifort -framework accelerate sblas.f90 && a.out
sdot= .000 should be 7.000
sdsdot= .000 should be 7.000
snrm2= .000 should be 5.000
scnrm2= .000 should be 7.071
sasum= .000 should be 7.000
scasum= .000 should be 14.00
cdotu= -9.000 91.00 should be -9.000 91.00
cdotc= 91.00 5.000 should be 91.00 5.000

The REAL results are incorrect, while the single-precision COMPLEX results are correct. Some experimentation reveals that the problem is that the functions return REAL(8) values rather than the correct REAL. If I try gfortran instead of ifort, I get:

$ gfortran -framework accelerate sblas.f90 && a.out
sdot= 0.000 should be 7.000
sdsdot= 0.000 should be 7.000
snrm2= 0.000 should be 5.000
scnrm2= 0.000 should be 7.071
sasum= 0.000 should be 7.000
scasum= 0.000 should be 14.00

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x10af498be
#1  0x10af48a9d
#2  0x7fff207ced7c
#3  0x7fff2105dfc8
#4  0x10af37bc1
#5  0x10af37d7e
Segmentation fault: 11

Here, not even the single-precision COMPLEX results are returned correctly. Presumably, the Accelerate library passes its regression tests. This implies that the regression tests declare incorrect return types for these functions. Thus, to correct this error, both the library and its regression tests must be corrected together.
Replies: 1 · Boosts: 0 · Views: 1.1k · Activity: Oct ’22
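The zeroed REAL results are consistent with a calling-convention mismatch rather than wrong arithmetic: Accelerate's legacy Fortran BLAS entry points are reportedly built with the f2c convention, in which REAL functions return a C double and COMPLEX functions return through a hidden first argument (gfortran's -ff2c option targets that convention). Whichever explanation holds, calling the CBLAS entry points from C or C++ sidesteps the Fortran return-value ABI, because their prototypes return float directly. A minimal sketch with hypothetical wrapper names, plus a portable fallback so it builds without Accelerate:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>
#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#endif

// Hypothetical wrapper: single-precision dot product via CBLAS (returns float).
float sdot_cblas(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
#if defined(__APPLE__)
    return cblas_sdot(static_cast<int>(x.size()), x.data(), 1, y.data(), 1);
#else
    float s = 0.0f;  // portable reference loop
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
#endif
}

// Hypothetical wrapper: single-precision Euclidean norm via CBLAS (returns float).
float snrm2_cblas(const std::vector<float>& x) {
#if defined(__APPLE__)
    return cblas_snrm2(static_cast<int>(x.size()), x.data(), 1);
#else
    float s = 0.0f;  // portable reference loop (no overflow scaling)
    for (float v : x) s += v * v;
    return std::sqrt(s);
#endif
}
```

With the vectors from the Fortran program, x = (3, 4) and y = (1, 1), these should give 7 and 5, matching the "should be" column.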
vImageScale_Planar8 crashing on orientation change in iOS 16
We are calling the function vImageScale_Planar8 to downsample an image. On iOS 16, this function crashes when the camera orientation changes (something is changing the underlying memory representation of the CVPixelBuffer object). On an orientation change, we set the orientation property of the output's AVCaptureConnection objects. On iOS 15, the same code works perfectly.
Replies: 1 · Boosts: 1 · Views: 1.5k · Activity: Oct ’22
Help with showing a spectrogram of an audio file
Hello everyone! I am trying to create a spectrogram like the one in the attached image for a macOS app. I am using Cocoa/AppKit but also have some SwiftUI views, so I can use either. I have found the sample app Visualizing Sound as an Audio Spectrogram that Apple provides, but I do not want a real-time spectrogram; I want a spectrogram of the whole audio file. I have been trying to convert the sample app to do what I need, but I have been unsuccessful so far. Here is how I changed the code in the delegate:

    public func captureBuffer() {
        let asset = AVAsset(url: audioFileUrl)
        let reader = try! AVAssetReader(asset: asset)
        let track = asset.tracks(withMediaType: AVMediaType.audio)[0]
        let settings = [
            AVFormatIDKey: kAudioFormatLinearPCM
        ]
        let readerOutput = AVAssetReaderTrackOutput(track: track, outputSettings: settings)
        reader.add(readerOutput)
        reader.startReading()

        while let buffer = readerOutput.copyNextSampleBuffer() {
            var audioBufferList = AudioBufferList(mNumberBuffers: 1, mBuffers: AudioBuffer(mNumberChannels: 0, mDataByteSize: 0, mData: nil))
            var blockBuffer: CMBlockBuffer?

            CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
                buffer,
                bufferListSizeNeededOut: nil,
                bufferListOut: &audioBufferList,
                bufferListSize: MemoryLayout<AudioBufferList>.size,
                blockBufferAllocator: nil,
                blockBufferMemoryAllocator: nil,
                flags: kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
                blockBufferOut: &blockBuffer
            )

            let buffers = UnsafeBufferPointer<AudioBuffer>(start: &audioBufferList.mBuffers, count: Int(audioBufferList.mNumberBuffers))

            for buffer in buffers {
                let samplesCount = Int(buffer.mDataByteSize) / MemoryLayout<Int16>.size
                let samplesPointer = audioBufferList.mBuffers.mData!.bindMemory(to: Int16.self, capacity: samplesCount)
                let samples = UnsafeMutableBufferPointer<Int16>(start: samplesPointer, count: samplesCount)
            }

            guard let data = audioBufferList.mBuffers.mData else {
                return
            }

            /// The _Nyquist frequency_ is the highest frequency that a sampled system can properly
            /// reproduce and is half the sampling rate of such a system. Although this app doesn't use
            /// `nyquistFrequency` you may find this code useful to add an overlay to the user interface.
            if nyquistFrequency == nil {
                let duration = Float(CMSampleBufferGetDuration(buffer).value)
                let timescale = Float(CMSampleBufferGetDuration(buffer).timescale)
                let numsamples = Float(CMSampleBufferGetNumSamples(buffer))
                nyquistFrequency = 0.5 / (duration / timescale / numsamples)
            }

            if self.rawAudioData.count < AudioSpectrogram.sampleCount * 2 {
                let actualSampleCount = CMSampleBufferGetNumSamples(buffer)
                let ptr = data.bindMemory(to: Int16.self, capacity: actualSampleCount)
                let buf = UnsafeBufferPointer(start: ptr, count: actualSampleCount)
                rawAudioData.append(contentsOf: Array(buf))
            }

            while self.rawAudioData.count >= AudioSpectrogram.sampleCount {
                let dataToProcess = Array(self.rawAudioData[0 ..< AudioSpectrogram.sampleCount])
                self.rawAudioData.removeFirst(AudioSpectrogram.hopCount)
                self.processData(values: dataToProcess)
            }

            createAudioSpectrogram()
        }
    }
}

I am sure there are different or better ways to go about this, but the only examples I can find are for iOS and use UIKit, while I am building for macOS. Does anyone know how to display a spectrogram for an audio file without having to play the audio file? I don't mind using sox or ffmpeg if that is easier. Greatly appreciated!
Replies: 2 · Boosts: 0 · Views: 1.7k · Activity: Oct ’22
FPCR on Apple Silicon
Hi all, I would like to know whether there are any C APIs to control the Floating-Point Control Register (FPCR) on Apple Silicon. The ARM documentation does not show any C APIs for doing this; the only example code looks like VHDL, so I was wondering if any developers here know of any. Thanks
Replies: 1 · Boosts: 0 · Views: 2k · Activity: Aug ’23
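Besides the <fenv.h> route (on Apple's arm64 headers, `fenv_t` carries the register in its `__fpcr` member), FPCR can be read and written directly with the AArch64 MRS/MSR instructions through GCC/Clang inline assembly. A minimal sketch with hypothetical function names; FZ (flush-to-zero) is bit 24 of FPCR in the Arm architecture. Note that the compiler is not told that FPCR affects floating-point arithmetic, so surrounding FP operations are not guaranteed to stay on the intended side of a write.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: reads the AArch64 Floating-point Control Register,
// or returns nullopt when not compiling for AArch64.
std::optional<std::uint64_t> read_fpcr() {
#if defined(__aarch64__)
    std::uint64_t value;
    __asm__ volatile("mrs %0, fpcr" : "=r"(value));
    return value;
#else
    return std::nullopt;
#endif
}

// Hypothetical helper: writes FPCR; returns false where there is no FPCR.
bool write_fpcr(std::uint64_t value) {
#if defined(__aarch64__)
    __asm__ volatile("msr fpcr, %0" : : "r"(value));
    return true;
#else
    (void)value;
    return false;
#endif
}

// Example use: set or clear flush-to-zero (FPCR.FZ, bit 24).
// Returns the previous FZ state, or nullopt where FPCR is unavailable.
std::optional<bool> set_flush_to_zero(bool on) {
    const auto fpcr = read_fpcr();
    if (!fpcr) return std::nullopt;
    constexpr std::uint64_t kFZ = std::uint64_t{1} << 24;
    const bool was_on = (*fpcr & kFZ) != 0;
    write_fpcr(on ? (*fpcr | kFZ) : (*fpcr & ~kFZ));
    return was_on;
}
```

The trap-enable bits used for floating-point exception trapping live in the same register, so the same read-modify-write pattern applies to them.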
aarch64 intrinsics
Many useful ARM intrinsics (such as fma, rng, ld64b, etc.) are described in the Arm C Language Extensions (ACLE). But the arm_acle.h header file shipped with Xcode does not include them. Are these intrinsics supported by Apple Silicon chips?
Replies: 1 · Boosts: 0 · Views: 2.1k · Activity: May ’23
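A sketch of two points that may help. First, some operations described in ACLE don't need an intrinsic at all: std::fma normally compiles to a single FMADD instruction on AArch64. Second, ACLE's feature-test macros such as __ARM_FEATURE_FMA and __ARM_FEATURE_RNG report whether the compiler is targeting an extension; intrinsics like __rndr are only usable when the corresponding macro is defined for the chosen -arch/-mcpu settings. Whether a particular Apple chip implements an optional extension such as FEAT_RNG is a separate question this snippet doesn't answer, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: a * b + c with a single rounding (fused multiply-add).
double fused_multiply_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// ACLE feature-test macros reflect the compilation target, not the machine
// the program later runs on.
constexpr bool target_has_fma() {
#if defined(__ARM_FEATURE_FMA)
    return true;
#else
    return false;
#endif
}

constexpr bool target_has_rng() {
#if defined(__ARM_FEATURE_RNG)
    return true;   // __rndr/__rndrrs from <arm_acle.h> may be used
#else
    return false;
#endif
}
```

`fused_multiply_add(0.1, 10.0, -1.0)` returns exactly 2^-54, whereas evaluating 0.1 * 10.0 - 1.0 with the product rounded first gives 0.0; that difference is the practical point of a fused operation.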
When to use vImage, Metal Performance Shaders, or Core Image?
I've looked in multiple places online, including here in the forums where a somewhat similar question is asked (and never answered :( ) but i'm going to ask anyway: vImage, Metal Performance Shaders, and Core Image all have a big overlap in the kinds of operations they perform on image data. But none of supporting materials (documentation, WWDC session videos, help) ever seem to bother with paying much heed to even the existence of the others when talking about themselves. For example, Core Image talks about how efficient and fast it is. MPS talks about everything being "hand rolled" to be optimized for the hardware its running on. Which means yes, fast and efficient. and vImage talks about being fast and..yup, energy-saving. But I and other have very little to go on as to when vImage makes sense over MPS. Or Core Image. If I have a large set of images and I want to get the mean color value of each image and i want to equalize or adjust the histogram of each, or perform some other color operation on each in the set, for example, which is best? I hope someone from Apple -- preferably multiple people from the multiple teams that work on these multiple technologies -- can help clear some of this up?
Replies
3
Boosts
0
Views
1.9k
Activity
May ’23
Are any of the ML frameworks real-time safe for audio processing?
I'm working on an audio processing app and am creating an AVAudioUnit extension as a part of it. I need to train a small neural network in the app and use it to process audio in real-time in the AudioUnit. The network is mostly convolutions and is ideal for running on the GPU but it should run in real-time on the CPU. The problem that I'm currently facing is that none of the ML frameworks seem to be safe to use for inference within custom AVAudioUnit kernels. My understanding is that only C and C++ should be used in these kernels (in addition to the other rules of real-time computing). Objective-C and Swift are discouraged per the documentation. My background is primarily in ML so I'm newer to Apple development and especially new to real-time development in this ecosystem. I've investigated CoreML, MPS, BNNS/Accelerate, and MLCompute so far but I'm not certain that any of them are safe to use. Any feedback would be greatly appreciated!
Replies
0
Boosts
1
Views
1.7k
Activity
Feb ’23
cmake fails to generate executable file under macOS Monterey (v12.6.1), but manages under macOS v10.15.7
Hi, I have problems building an executable file for a simple disease-transmission model implemented in C++, using cmake under macOS Monterey (v12.6.1). When I build the executable file, I obtain the following error when I try running it: dyld[5281]: symbol not found in flat namespace (_cblas_caxpy) Abort trap: 6 The problem persists when I try to use XCode (v14.0.1) instead, resulting in the same error message. Interestingly, my friend is able to build (&amp; run) the executable file under macOS v10.15.7 without any problems. Does anybody know what is going on here and how this issue can be resolved? The C++ project is publicly available on GitHub: https://github.com/AnnaMariaL/DengueSim Any help would be very much appreciated. Thanks! Anna
Replies
12
Boosts
0
Views
5.4k
Activity
Jan ’23
Crash in Accelerate framework when using Cholesky factorization
Hi. We are moving from MKL to Accelerate in order to accommodate the transition to apple silicon. However, we are occasionally getting crashes inside the Accelerate framework when factoring sparse matrices with the Cholesky decomposition. In the following link you can find an Xcode project with a minimal reproducible example: https://drive.google.com/file/d/1rHmJZbA5yc4-68Z1-vm7g3IvIPRgbN_c/view?usp=share_link I also pasted the code below. First, we load a sparse matrix from the hard drive. Then we factor it. When factoring the matrix with LDLT, everything works fine. When factoring it with Cholesky decomposition, the SparseFactor sometimes returns -3, sometimes it returns -1, and sometimes it crashes. The code below tries the factoring 100 times and always crashes. The matrix is positive definite, I checked it in another tool. It's smallest (algebraically) eigenvalue is at around 23000. Other matrices, such as the one in the 'good' file attached in the link above, doesn't cause crashes. Can someone please shed light onto what's causing the crash? What exactly in this matrix causes the crash, and is this a known issue in the Accelerate framework? 
#include <iostream> #include <fstream> #include <vector> #include <libgen.h> #include <Accelerate/Accelerate.h> int main(int argc, const char * argv[]) {          std::string filename = __FILE__;     std::string folder = dirname(const_cast<char*>(filename.c_str()));;     std::vector<std::string> files = {         folder  + "/m_good.bin",         folder  + "/m_bad.bin"     };               int GOOD = 0;     int BAD = 1;     std::string datafilename = files[BAD];     bool cholesky = true;          std::ifstream file(datafilename, std::ios::binary);     if (!file.is_open())     {         std::cout << " can't find file: " << datafilename  << "\n";         return 1;     }              SparseMatrix_Double _A;      // Read the data into a buffer      file.read((char*)&_A, sizeof(_A));          std::vector<double> _values;     auto size = _values.size();     file.read((char*)&size, sizeof(size));          _values.resize(size);     file.read((char*)&_values[0], size * sizeof(double));          std::vector<long> _column_starts;     file.read((char*)&size, sizeof(size));     _column_starts.resize(size);     file.read((char*)&_column_starts[0], size * sizeof(long));          std::vector<int> _row_indices;     file.read((char*)&size, sizeof(size));     _row_indices.resize(size);     file.read((char*)&_row_indices[0], size * sizeof(int));     file.close();     _A.data      = const_cast<double*>(&_values[0]);     _A.structure.columnStarts = &_column_starts[0];     _A.structure.rowIndices   = &_row_indices[0];          for (int i = 0; i < 100; i++)     {         std::cout << "\nRun: " << i << " ";         auto _LLT = SparseFactor(cholesky ? SparseFactorizationCholesky : SparseFactorizationLDLT, _A);                  std::cout << _LLT.status;         SparseCleanup(_LLT);                  if (_LLT.status != SparseStatusOK)         {             std::cout << " Failed!!";         }         std::cout << std::endl;          // Close the file     }          return 0; }
Replies
0
Boosts
1
Views
1.2k
Activity
Jan ’23
How to use scipy odeint function in Swift
Hi, I have Python code that I need to migrate to Swift. I'm stuck on the odeint function, which integrates a system of ordinary differential equations. In the code, t is an array of timestamps (timeIntervalSince1970 values).

from scipy.integrate import odeint

def solve(self):
    y0 = [0., 0., 0., 0.]
    wsol = odeint(self.f, y0, t)
    return wsol

def f(self, y, t):
    a, b, c, d = y
    # f returns an array of Doubles, after a lot of mathematical calculations
    f = [calculateValue0, calculateValue1, calculateValue2, calculateValue3]
    return f

The recommendation is to use the Accelerate framework, but this is my first time using it, and I don't see anything similar to odeint...
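As far as I know, Accelerate doesn't include a general ODE integrator like odeint, so one option is to port a small solver yourself. The sketch below is written in C++, but it maps almost line for line onto Swift arrays and closures. It advances the 4-component state with the classical fourth-order Runge-Kutta method between consecutive entries of t and returns one state per time point, starting with y0, similar in shape to odeint's result. This is a swapped-in technique: SciPy's odeint wraps the adaptive LSODA solver, whereas this sketch takes fixed steps. The `substeps` parameter therefore has to be large enough that each step is small relative to how fast your system changes. `rk4_solve`, `State`, `Rhs`, and `substeps` are all hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <vector>

// State of the 4-equation system from the question (a, b, c, d).
using State = std::array<double, 4>;
// Right-hand side dy/dt = f(y, t), same argument order as odeint's func(y, t).
using Rhs = std::function<State(const State&, double)>;

// Returns y + h * k, element-wise.
static State axpy(const State& y, double h, const State& k) {
    State out{};
    for (std::size_t i = 0; i < y.size(); ++i) out[i] = y[i] + h * k[i];
    return out;
}

// Hypothetical odeint-like driver: integrates from t[0] through each t[i]
// with classical RK4, taking `substeps` equal steps per interval.
// result[i] is the state at t[i]; result[0] is y0.
std::vector<State> rk4_solve(const Rhs& f, State y0,
                             const std::vector<double>& t, int substeps = 10) {
    std::vector<State> result;
    if (t.empty() || substeps <= 0) return result;
    result.push_back(y0);
    State y = y0;
    for (std::size_t i = 1; i < t.size(); ++i) {
        const double h = (t[i] - t[i - 1]) / substeps;
        double tc = t[i - 1];
        for (int s = 0; s < substeps; ++s) {
            const State k1 = f(y, tc);
            const State k2 = f(axpy(y, h / 2, k1), tc + h / 2);
            const State k3 = f(axpy(y, h / 2, k2), tc + h / 2);
            const State k4 = f(axpy(y, h, k3), tc + h);
            for (std::size_t j = 0; j < y.size(); ++j)
                y[j] += h / 6 * (k1[j] + 2 * k2[j] + 2 * k3[j] + k4[j]);
            tc += h;
        }
        result.push_back(y);
    }
    return result;
}
```

With timeIntervalSince1970 values, the step sizes are in seconds. If the gaps between timestamps vary widely, an adaptive method (for example, an embedded Runge-Kutta pair with error control) would track odeint more closely than a fixed substep count.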
Replies
0
Boosts
0
Views
1.5k
Activity
Dec ’22
In SwiftUI, how to manipulate sampling rate of a gesture in time?
If increasing the sampling rate isn't possible, is the only option for "continuous" drawing to repeatedly add a Bézier curve from point n-2 to n-1 that takes points n-3 and n into account? Are there other (easy) options to interpolate between locations n-2 and n-1, not only visually but also computing and storing all intermediate points? Think more bitmap, less vector. Below are my code and the current "uneven" result:

import SwiftUI

struct ContentView: View {
    @State var points: [CGPoint] = []

    var body: some View {
        Canvas { context, size in
            for point in points {
                context.draw(Image(systemName: "circle"), at: point)
            }
        }
        .gesture(
            DragGesture().onChanged { value in
                points += [value.location]
            }
        )
    }
}
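I'm not aware of a way to configure how often DragGesture delivers updates, so a common workaround is the second option you describe: fill the gap between consecutive samples yourself. The sketch below is in C++, but the same loop works in Swift with CGPoint. It uses plain linear interpolation rather than a Bézier curve, emitting points at a fixed spacing so that every intermediate location is actually stored, which suits the "more bitmap, less vector" goal. The `Point` type and `densify` function are hypothetical, and `spacing` (in points) is an assumption you would tune to your brush size.

```cpp
#include <cmath>
#include <vector>

struct Point {
    double x;
    double y;
};

// Hypothetical helper: appends `to` to `path`, first inserting evenly spaced
// points on the straight segment from the path's last point, so consecutive
// stored points are at most `spacing` apart.
void densify(std::vector<Point>& path, Point to, double spacing) {
    if (!path.empty() && spacing > 0) {
        const Point from = path.back();  // copy: push_back may reallocate
        const double dx = to.x - from.x;
        const double dy = to.y - from.y;
        const int steps = static_cast<int>(std::ceil(std::hypot(dx, dy) / spacing));
        for (int s = 1; s < steps; ++s) {  // intermediate points only
            const double f = static_cast<double>(s) / steps;
            path.push_back({from.x + f * dx, from.y + f * dy});
        }
    }
    path.push_back(to);
}
```

In the SwiftUI view, the equivalent call would replace `points += [value.location]` inside onChanged. Linear segments can look faceted on fast curved strokes; a Catmull-Rom or Bézier segment sampled at the same spacing is a drop-in upgrade if that matters.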
Replies
0
Boosts
0
Views
1.1k
Activity
Dec ’22
Swift performance with Accelerate when evaluating random binary trees
I am trying to move a program I currently have implemented in Python to Swift. Performance is critical because it is a randomness-based search algorithm. What I need to do is evaluate random binary trees in which each non-leaf node represents a basic arithmetic or logical operation and each leaf node represents a (large) vector of numbers. I managed to get Accelerate's BNNS functions to do the calculations, but they are still slower than my (much simpler, pandas-based) Python approach, which takes less than half the time on average under similar conditions. It would be great if someone could review my code and tell me whether there is any further potential for optimization and/or a better approach. In the code below I only cover the add (addition) operation, but the others are very similar in structure. I also left out the part that generates the trees (and ensures they are "legit" in terms of consecutive operations), as I don't think it is particularly relevant here; I'm happy to add it, however, if you think it would help.

import Accelerate

enum NodeValue {
  case add // Addition
  case sub // Subtraction
  case mul // Multiplication
  case div // Division
  case sml // Smaller
  case lrg // Larger
  case met(columnIndex: Int) // An index into return_data, used for the leaves of the tree
}

final class Node {
  var value: NodeValue
  var lhs: Node?
  var rhs: Node?

  init(value: NodeValue, lhs: Node?, rhs: Node?) {
    self.value = value
    self.lhs = lhs
    self.rhs = rhs
  }

  func evaluate_signal(return_data: [Int: [Float16]]) -> [Any] {
    // Determine output
    switch self.value {
    case .add:
      let eval_left = lhs!.evaluate_signal(return_data: return_data) as! [Float16]
      let eval_right = rhs!.evaluate_signal(return_data: return_data) as! [Float16]
      let leftDescriptor = BNNSNDArrayDescriptor.allocate(initializingFrom: eval_left,
                                                          shape: .vector(eval_left.count))
      let rightDescriptor = BNNSNDArrayDescriptor.allocate(initializingFrom: eval_right,
                                                           shape: .vector(eval_left.count))
      let resultDescriptor = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float16.self,
                                                                         shape: .vector(eval_left.count))
      let layer = BNNS.BinaryArithmeticLayer(inputA: leftDescriptor,
                                             inputADescriptorType: BNNS.DescriptorType.sample,
                                             inputB: rightDescriptor,
                                             inputBDescriptorType: BNNS.DescriptorType.sample,
                                             output: resultDescriptor,
                                             outputDescriptorType: BNNS.DescriptorType.sample,
                                             function: BNNS.ArithmeticBinaryFunction.add)
      try! layer!.apply(batchSize: 1,
                        inputA: leftDescriptor,
                        inputB: rightDescriptor,
                        output: resultDescriptor)
      let resultVector: [Float16] = resultDescriptor.makeArray(of: Float16.self)!
      leftDescriptor.deallocate()
      rightDescriptor.deallocate()
      resultDescriptor.deallocate()
      return resultVector
    case .sub:
      // Similar code for subtraction
    case .mul:
      // Similar code for multiplication
    case .div:
      // Similar code for division
    case .sml:
      // Similar code for comparison if smaller
    case .lrg:
      // Similar code for comparison if larger
    case .met(let columnIndex):
      return return_data[columnIndex]!
    }
  }
}
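Much of the cost in the code above probably comes from work done per node rather than from the arithmetic itself: every .add allocates three BNNSNDArrayDescriptor buffers, copies both operands into them, builds a new BinaryArithmeticLayer, and copies the result back into a fresh array, on top of the [Any] casts. If you can work in Float rather than Float16 (vDSP operates on Float and Double), vDSP.add on the two arrays is likely a simpler fit than a BNNS layer for plain element-wise math. The broader idea is to evaluate each node straight into a reused output buffer with one tight element-wise loop per operator. It is sketched below in C++. `Op`, `Node`, `leaf`, `binary`, and `evaluate` are hypothetical stand-ins for NodeValue/Node, and float replaces Float16.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

enum class Op { Add, Sub, Mul, Div, Smaller, Larger, Leaf };

// Hypothetical stand-in for NodeValue/Node; leaves index into `columns`.
struct Node {
    Op op = Op::Leaf;
    std::size_t column = 0;  // used only by leaves
    std::unique_ptr<Node> lhs, rhs;
};

std::unique_ptr<Node> leaf(std::size_t column) {
    auto node = std::make_unique<Node>();
    node->column = column;
    return node;
}

std::unique_ptr<Node> binary(Op op, std::unique_ptr<Node> lhs, std::unique_ptr<Node> rhs) {
    auto node = std::make_unique<Node>();
    node->op = op;
    node->lhs = std::move(lhs);
    node->rhs = std::move(rhs);
    return node;
}

// Evaluates a well-formed tree over the first n elements of each column.
// The left operand is computed directly into `out` and combined in place,
// so each operator node needs only one extra buffer (for its right operand).
void evaluate(const Node& node, const std::vector<std::vector<float>>& columns,
              std::size_t n, std::vector<float>& out) {
    if (node.op == Op::Leaf) {
        const std::vector<float>& col = columns[node.column];
        out.assign(col.begin(), col.begin() + n);  // assumes col.size() >= n
        return;
    }
    std::vector<float> right;
    evaluate(*node.lhs, columns, n, out);
    evaluate(*node.rhs, columns, n, right);
    float* a = out.data();
    const float* b = right.data();
    // One simple loop per operator, which compilers can usually auto-vectorize.
    switch (node.op) {
    case Op::Add: for (std::size_t i = 0; i < n; ++i) a[i] += b[i]; break;
    case Op::Sub: for (std::size_t i = 0; i < n; ++i) a[i] -= b[i]; break;
    case Op::Mul: for (std::size_t i = 0; i < n; ++i) a[i] *= b[i]; break;
    case Op::Div: for (std::size_t i = 0; i < n; ++i) a[i] /= b[i]; break;
    // Comparisons produce 1.0f / 0.0f masks (an assumption) so that their
    // results can feed further arithmetic nodes.
    case Op::Smaller: for (std::size_t i = 0; i < n; ++i) a[i] = a[i] < b[i] ? 1.0f : 0.0f; break;
    case Op::Larger:  for (std::size_t i = 0; i < n; ++i) a[i] = a[i] > b[i] ? 1.0f : 0.0f; break;
    case Op::Leaf: break;  // handled above
    }
}
```

Because the search evaluates many random trees, keeping `out` alive between evaluations (or pooling the `right` buffers) removes most remaining allocations. Profiling a Release build in Instruments would confirm where the time actually goes.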
Replies
2
Boosts
0
Views
1.3k
Activity
Dec ’22
R painfully slow on Air M1 - Big Sur
I bought a new MacBook Air with the M1 chip last week. It runs Big Sur 11.2.3. My code in RStudio is extremely slow; it takes around 7 minutes on this new laptop. I have tried using R directly (rather than RStudio), and the same thing happens. I checked on my sister's Air (macOS Mojave 10.14.6), and the same code runs in only seconds. Why would my one-week-old laptop be so slow to run this R code, and what are the possible solutions? Any help is much appreciated!
Replies
7
Boosts
0
Views
6.2k
Activity
Dec ’22
Optimising initialisation of big arrays with random data
I will be filling audio and video buffers with randomly distributed data for each frame in real time. Initializing these arrays with Floats inside a basic for loop seems naive. Are there any optimized methods for this task in iOS libraries? I was looking for a data-science-oriented framework from Apple and didn't find one, but maybe Accelerate, Metal, or Core ML are good candidates to research? Is my thinking correct, and if so, can you guide me?
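For noise in real-time audio and video, a small, fast, non-cryptographic generator that fills the buffer in one tight loop is often sufficient. The sketch below uses Marsaglia's xorshift32 generator (shift constants 13, 17, 5) to fill a float buffer with uniform values in [0, 1). It is a swapped-in technique rather than an Accelerate API, and `XorShift32` and `fill_uniform` are hypothetical names. Keeping the generator state between frames avoids reseeding, and the loop has no allocations or locks. The quality is fine for noise but not for statistics or security. BNNS also provides random-fill functions (BNNSRandomFillUniformFloat, if I remember the name correctly) that may be worth benchmarking against this.

```cpp
#include <cstdint>
#include <vector>

// Marsaglia's xorshift32 generator. The state must never be zero.
struct XorShift32 {
    std::uint32_t state;

    std::uint32_t next() {
        std::uint32_t x = state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        state = x;
        return x;
    }
};

// Hypothetical helper: overwrites `buffer` with uniform floats in [0, 1).
// Keeping the top 24 bits makes every value exactly representable as a float,
// so the result can never round up to 1.0f.
void fill_uniform(std::vector<float>& buffer, XorShift32& rng) {
    const float scale = 1.0f / 16777216.0f;  // 2^-24
    for (float& v : buffer) v = static_cast<float>(rng.next() >> 8) * scale;
}
```

Use one generator per thread (or per buffer) with distinct nonzero seeds. If you need Gaussian rather than uniform noise, a transform such as Box-Muller on pairs of these values is the usual next step.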
Replies
1
Boosts
0
Views
1.4k
Activity
Dec ’22
Swift performance - efficient calculation of boolean logical operations
I need to work with large Double and Boolean arrays/vectors and apply simple operations to them, e.g. addition, subtraction, multiplication, and smaller/larger comparisons on the Doubles, and AND, OR, NOT, etc. on the Bools. While I have found vDSP from Accelerate quite performant for the simple arithmetic operations, the comparisons and the logical operations are very slow, which probably has to do with the fact that I use the map function to apply them. Is there a more efficient way to do this? Based on some things I have read, pointers might help, but I'm not sure how I would apply them here in the context of zip and map. See below for some code examples:

import Accelerate

let myDoubleArray1: [Double] = Array<Double>(repeating: 1.123, count: 1000000)
let myDoubleArray2: [Double] = Array<Double>(repeating: 2.123, count: 1000000)
let myBoolArray1: [Bool] = Array<Bool>(repeating: false, count: 1000000)
let myBoolArray2: [Bool] = Array<Bool>(repeating: true, count: 1000000)

_ = vDSP.multiply(myDoubleArray1, myDoubleArray2)        // Takes about 0.5 sec - very good
_ = zip(myDoubleArray1, myDoubleArray2).map { $0 > $1 }  // Takes about 7 sec - too slow
_ = zip(myBoolArray1, myBoolArray2).map { $0 && $1 }     // Takes about 7 sec - too slow
_ = zip(myBoolArray1, myBoolArray2).map { $0 == $1 }     // Takes about 7 sec - too slow
_ = myBoolArray1.map { !$0 }                             // Takes about 7 sec - too slow
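Part of the cost is likely that [Bool] stores one byte per element and zip/map produces the result one element at a time. Timings of several seconds for a million elements may also indicate a Debug build, so it is worth measuring in Release too. One common alternative is to pack the booleans into 64-bit words, so that AND, OR, NOT, and equality each process 64 flags per operation, and comparisons of the Double arrays write straight into that packed form. The sketch below shows the idea in C++; the same layout works in Swift with [UInt64]. `BitVector`, `bit_and`, `bit_or`, `bit_not`, `bit_eq`, and `greater_than` are hypothetical names, not an Accelerate API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed boolean vector: element i is bit (i % 64) of words[i / 64].
struct BitVector {
    std::size_t count;
    std::vector<std::uint64_t> words;

    explicit BitVector(std::size_t n) : count(n), words((n + 63) / 64, 0) {}

    bool get(std::size_t i) const { return (words[i / 64] >> (i % 64)) & 1u; }

    void set(std::size_t i, bool value) {
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        if (value) words[i / 64] |= mask;
        else       words[i / 64] &= ~mask;
    }

    // Clears the unused bits past `count` in the last word, so operations
    // such as NOT never set bits for elements that don't exist.
    void trim() {
        if (count % 64 != 0) words.back() &= (std::uint64_t{1} << (count % 64)) - 1;
    }
};

// Each loop handles 64 elements per iteration; a and b must have equal counts.
BitVector bit_and(const BitVector& a, const BitVector& b) {
    BitVector r(a.count);
    for (std::size_t w = 0; w < r.words.size(); ++w) r.words[w] = a.words[w] & b.words[w];
    return r;
}

BitVector bit_or(const BitVector& a, const BitVector& b) {
    BitVector r(a.count);
    for (std::size_t w = 0; w < r.words.size(); ++w) r.words[w] = a.words[w] | b.words[w];
    return r;
}

BitVector bit_not(const BitVector& a) {
    BitVector r(a.count);
    for (std::size_t w = 0; w < r.words.size(); ++w) r.words[w] = ~a.words[w];
    r.trim();
    return r;
}

// Element-wise equality (XNOR).
BitVector bit_eq(const BitVector& a, const BitVector& b) {
    BitVector r(a.count);
    for (std::size_t w = 0; w < r.words.size(); ++w) r.words[w] = ~(a.words[w] ^ b.words[w]);
    r.trim();
    return r;
}

// Element-wise x[i] > y[i], packed 64 results per word; x and y must be equally sized.
BitVector greater_than(const std::vector<double>& x, const std::vector<double>& y) {
    BitVector r(x.size());
    for (std::size_t w = 0; w < r.words.size(); ++w) {
        const std::size_t base = w * 64;
        const std::size_t end = std::min(base + 64, x.size());
        std::uint64_t bits = 0;
        for (std::size_t i = base; i < end; ++i)
            bits |= static_cast<std::uint64_t>(x[i] > y[i]) << (i - base);
        r.words[w] = bits;
    }
    return r;
}
```

A side benefit of this layout is that counting true values becomes a population count per word rather than a loop over a million elements.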
Replies
4
Boosts
0
Views
2.1k
Activity
Nov ’22
Is there a way to encode/decode a portion of a huge image?
If I have a huge image (10000x10000), loading it into memory causes a crash every time. Can I load a portion of the image into memory, process it, and write that part back to the image file?
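A quick calculation shows why the full image is a problem. At 4 bytes per pixel (8-bit RGBA, an assumption), a 10000x10000 image needs 10000 x 10000 x 4 = 400,000,000 bytes, about 400 MB, once decoded, regardless of how small the file is on disk. Working in horizontal strips bounds that. The sketch below only does the bookkeeping, computing strip rectangles for a given memory budget. Actually decoding and re-encoding just one region depends on the image format and the API you use (ImageIO, for example), which this sketch doesn't cover. `Strip` and `plan_strips` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A full-width band of rows [y, y + height).
struct Strip {
    std::int64_t y;
    std::int64_t height;
};

// Hypothetical helper: splits a width x height image into horizontal strips
// whose decoded size (width * rows * bytes_per_pixel) fits in budget_bytes.
// Each strip gets at least one row, even if a single row exceeds the budget.
std::vector<Strip> plan_strips(std::int64_t width, std::int64_t height,
                               std::int64_t bytes_per_pixel, std::int64_t budget_bytes) {
    std::vector<Strip> strips;
    const std::int64_t row_bytes = width * bytes_per_pixel;
    if (row_bytes <= 0 || height <= 0) return strips;
    const std::int64_t rows = std::max<std::int64_t>(1, budget_bytes / row_bytes);
    for (std::int64_t y = 0; y < height; y += rows)
        strips.push_back({y, std::min(rows, height - y)});
    return strips;
}
```

With a 64 MiB budget, the 10000x10000 RGBA case splits into six strips of at most 1677 rows each.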
Replies
0
Boosts
0
Views
1.2k
Activity
Oct ’22
Floating point exception trapping on M1
I have written a simple C++ test program (below) that takes the square root of a negative number and then tries to print the result. I would like to trap the floating-point exception caused by taking the square root of a negative number (i.e., I'd like the program to halt with an error right after the floating-point exception occurs). On Intel Macs, I know how to do this. Is this possible on an Apple silicon Mac?

#include <cmath>
#include <iostream>

int main() {
    const double x = -1.0;
    double y = x;
    y = std::sqrt(y); // floating point exception...possible to build program so it terminates here?
    std::cout << y << "\n";
    return 0;
}
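One portable alternative that doesn't depend on hardware trap support is to clear the IEEE exception flags, do the operation, and then test the sticky FE_INVALID flag, terminating the program yourself if it is set. The sketch below uses only the standard `<cfenv>` functions std::feclearexcept and std::fetestexcept. `raises_invalid` and `sqrt_or_abort` are hypothetical names. This is a swapped-in technique, not trapping: the program stops at the check you write rather than at the faulting instruction, so you have to place checks where you care.

```cpp
#include <cfenv>
#include <cmath>
#include <cstdio>
#include <cstdlib>

// Clears the IEEE exception flags, evaluates `op`, and reports whether the
// evaluation raised FE_INVALID (for example, sqrt of a negative number).
// This polls the sticky flags after the fact; it does not install a trap.
template <typename F>
bool raises_invalid(F op) {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double result = op();  // volatile: force evaluation at run time
    (void)result;
    return std::fetestexcept(FE_INVALID) != 0;
}

// Hypothetical helper: like std::sqrt, but terminates the program instead of
// returning NaN when the operation is invalid.
double sqrt_or_abort(double x) {
    double y = 0.0;
    if (raises_invalid([&] { return y = std::sqrt(x); })) {
        std::fprintf(stderr, "invalid floating-point operation: sqrt(%g)\n", x);
        std::abort();
    }
    return y;
}
```

In the test program, `y = sqrt_or_abort(y);` would replace the plain sqrt call. Compilers may assume the floating-point environment is never inspected and fold or reorder operations. The volatile above guards this simple case, and for larger code I believe clang's `-ffp-exception-behavior=strict` option is intended for programs that read these flags.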
Replies
6
Boosts
0
Views
4.7k
Activity
Oct ’22
Apparent errors in single precision BLAS in -framework accelerate
There appear to be errors in the return types of some single-precision BLAS functions in Apple's Accelerate library (-framework accelerate). These errors exist on both Intel and arm64 hardware. Here is a small Fortran program that demonstrates them:

program sblas
   ! test some single-precision blas results.
   implicit none
   real :: x(2)=[3.,4.], y(2)=[1.,1.]
   complex :: w(2)=[(4.,3.),(3.,4.)], z(2)=[(5.,6.),(7.,8.)]
   real, external :: sdot, sdsdot, snrm2, scnrm2, sasum, scasum
   complex, external :: cdotu, cdotc
   character(*), parameter :: cfmt='(*(g0.4,1x))'
   write(*,cfmt) 'sdot=',   sdot(2,x,1,y,1),       'should be 7.000'
   write(*,cfmt) 'sdsdot=', sdsdot(2,0.0,x,1,y,1), 'should be 7.000'
   write(*,cfmt) 'snrm2=',  snrm2(2,x,1),          'should be 5.000'
   write(*,cfmt) 'scnrm2=', scnrm2(2,w,1),         'should be 7.071'
   write(*,cfmt) 'sasum=',  sasum(2,x,1),          'should be 7.000'
   write(*,cfmt) 'scasum=', scasum(2,w,1),         'should be 14.00'
   write(*,cfmt) 'cdotu=',  cdotu(2,w,1,z,1),      'should be -9.000 91.00'
   write(*,cfmt) 'cdotc=',  cdotc(2,w,1,z,1),      'should be 91.00 5.000'
end program sblas

The correct output is:

$ ifort -L${MKLROOT}/lib -Wl,-rpath,${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread sblas.f90 && a.out
sdot= 7.000 should be 7.000
sdsdot= 7.000 should be 7.000
snrm2= 5.000 should be 5.000
scnrm2= 7.071 should be 7.071
sasum= 7.000 should be 7.000
scasum= 14.00 should be 14.00
cdotu= -9.000 91.00 should be -9.000 91.00
cdotc= 91.00 5.000 should be 91.00 5.000

With the Apple -framework accelerate library, I get:

$ ifort -framework accelerate sblas.f90 && a.out
sdot= .000 should be 7.000
sdsdot= .000 should be 7.000
snrm2= .000 should be 5.000
scnrm2= .000 should be 7.071
sasum= .000 should be 7.000
scasum= .000 should be 14.00
cdotu= -9.000 91.00 should be -9.000 91.00
cdotc= 91.00 5.000 should be 91.00 5.000

The REAL results are incorrect, while the single-precision COMPLEX results are correct. Some experimentation reveals that the problem is that the functions return REAL(8) values rather than the correct REAL. If I use gfortran instead of ifort, I get:

$ gfortran -framework accelerate sblas.f90 && a.out
sdot= 0.000 should be 7.000
sdsdot= 0.000 should be 7.000
snrm2= 0.000 should be 5.000
scnrm2= 0.000 should be 7.071
sasum= 0.000 should be 7.000
scasum= 0.000 should be 14.00

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x10af498be
#1  0x10af48a9d
#2  0x7fff207ced7c
#3  0x7fff2105dfc8
#4  0x10af37bc1
#5  0x10af37d7e
Segmentation fault: 11

Here, not even the single-precision COMPLEX results are returned correctly. Presumably, the Accelerate library passes its regression tests. This implies that the regression tests declare the wrong return types for these functions. Thus, to correct this error, both the library and its regression tests must be fixed together.
Replies
1
Boosts
0
Views
1.1k
Activity
Oct ’22
vImageScale_Planar8 crashing on orientation change in iOS 16
We call vImageScale_Planar8 to downsample an image. On iOS 16, this function crashes when the camera orientation changes (something changes the underlying memory layout of the CVPixelBuffer object). On orientation change, we set the orientation property of the output AVCaptureConnection objects. On iOS 15, the same code works perfectly.
Replies
1
Boosts
1
Views
1.5k
Activity
Oct ’22
Accelerate: "vImageFloodFill_" functions missing from iOS 16 SDK
The various "flood fill" functions from the vImage library (e.g. vImageFloodFill_Planar8, listed here) are missing from Xcode 14.0.1. Is there a way to access them? Example:

import Accelerate

// --> Cannot find 'vImageFloodFill_Planar8' in scope
let err = vImageFloodFill_Planar8(&buffer, nil, 100, 100, 128, 8, 0)
Replies
1
Boosts
0
Views
1.1k
Activity
Oct ’22
Help with showing a spectrogram of an audio file
Hello everyone! I am trying to create a spectrogram like the one in the attached image for a macOS app. I am using Cocoa/AppKit but also have some SwiftUI views, so I can use either. I have found the sample app Visualizing Sound as an Audio Spectrogram that Apple provides, but I do not want a real-time spectrogram. I want a spectrogram of the whole audio file. I have been trying to adapt the sample app to what I need, but so far I have been unsuccessful. Here is how I changed the code in the delegate:

public func captureBuffer() {
    let asset = AVAsset(url: audioFileUrl)
    let reader = try! AVAssetReader(asset: asset)
    let track = asset.tracks(withMediaType: AVMediaType.audio)[0]
    let settings = [
        AVFormatIDKey : kAudioFormatLinearPCM
    ]
    let readerOutput = AVAssetReaderTrackOutput(track: track, outputSettings: settings)
    reader.add(readerOutput)
    reader.startReading()

    while let buffer = readerOutput.copyNextSampleBuffer() {
        var audioBufferList = AudioBufferList(mNumberBuffers: 1,
                                              mBuffers: AudioBuffer(mNumberChannels: 0, mDataByteSize: 0, mData: nil))
        var blockBuffer: CMBlockBuffer?
        CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
            buffer,
            bufferListSizeNeededOut: nil,
            bufferListOut: &audioBufferList,
            bufferListSize: MemoryLayout<AudioBufferList>.size,
            blockBufferAllocator: nil,
            blockBufferMemoryAllocator: nil,
            flags: kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
            blockBufferOut: &blockBuffer
        )

        let buffers = UnsafeBufferPointer<AudioBuffer>(start: &audioBufferList.mBuffers,
                                                       count: Int(audioBufferList.mNumberBuffers))
        for buffer in buffers {
            let samplesCount = Int(buffer.mDataByteSize) / MemoryLayout<Int16>.size
            let samplesPointer = audioBufferList.mBuffers.mData!.bindMemory(to: Int16.self, capacity: samplesCount)
            let samples = UnsafeMutableBufferPointer<Int16>(start: samplesPointer, count: samplesCount)
        }

        guard let data = audioBufferList.mBuffers.mData else {
            return
        }

        /// The _Nyquist frequency_ is the highest frequency that a sampled system can properly
        /// reproduce and is half the sampling rate of such a system. Although this app doesn't use
        /// `nyquistFrequency` you may find this code useful to add an overlay to the user interface.
        if nyquistFrequency == nil {
            let duration = Float(CMSampleBufferGetDuration(buffer).value)
            let timescale = Float(CMSampleBufferGetDuration(buffer).timescale)
            let numsamples = Float(CMSampleBufferGetNumSamples(buffer))
            nyquistFrequency = 0.5 / (duration / timescale / numsamples)
        }

        if self.rawAudioData.count < AudioSpectrogram.sampleCount * 2 {
            let actualSampleCount = CMSampleBufferGetNumSamples(buffer)
            let ptr = data.bindMemory(to: Int16.self, capacity: actualSampleCount)
            let buf = UnsafeBufferPointer(start: ptr, count: actualSampleCount)
            rawAudioData.append(contentsOf: Array(buf))
        }

        while self.rawAudioData.count >= AudioSpectrogram.sampleCount {
            let dataToProcess = Array(self.rawAudioData[0 ..< AudioSpectrogram.sampleCount])
            self.rawAudioData.removeFirst(AudioSpectrogram.hopCount)
            self.processData(values: dataToProcess)
        }

        createAudioSpectrogram()
    }
}

I am sure there are different or better ways to go about this, but the only examples I can find are for iOS and use UIKit, and I am building for macOS. Does anyone know how to display a spectrogram for an audio file without having to play the file? I don't mind using sox or ffmpeg if that is easier. Greatly appreciated!
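For a whole-file spectrogram, one approach is to drop the streaming structure entirely. Read every sample first, then step through the array with a fixed hop, window each frame, and compute its magnitude spectrum, producing one spectrogram column per frame that you can draw all at once. The sketch below shows that framing in C++ with a periodic Hann window and a naive O(N^2) DFT so that it stays self-contained. In the app you would use Accelerate's FFT for each frame, as the sample app does. `spectrogram`, `frame_size`, and `hop` are hypothetical names, with `frame_size` and `hop` playing the roles of AudioSpectrogram.sampleCount and hopCount.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical offline spectrogram: one row per frame, frame_size / 2 + 1
// magnitude bins per row. Frames start every `hop` samples; a trailing
// partial frame is dropped.
std::vector<std::vector<double>> spectrogram(const std::vector<std::int16_t>& samples,
                                             std::size_t frame_size, std::size_t hop) {
    const double pi = std::acos(-1.0);
    std::vector<std::vector<double>> rows;
    if (frame_size == 0 || hop == 0) return rows;

    // Periodic Hann window, applied to each frame to reduce spectral leakage.
    std::vector<double> window(frame_size);
    for (std::size_t n = 0; n < frame_size; ++n)
        window[n] = 0.5 - 0.5 * std::cos(2 * pi * n / frame_size);

    std::vector<double> frame(frame_size);
    for (std::size_t start = 0; start + frame_size <= samples.size(); start += hop) {
        for (std::size_t n = 0; n < frame_size; ++n)
            frame[n] = window[n] * samples[start + n] / 32768.0;  // Int16 -> [-1, 1)

        // Naive DFT magnitudes for bins 0...frame_size / 2; use an FFT in real code.
        std::vector<double> mags(frame_size / 2 + 1);
        for (std::size_t k = 0; k < mags.size(); ++k) {
            double re = 0.0, im = 0.0;
            for (std::size_t n = 0; n < frame_size; ++n) {
                re += frame[n] * std::cos(2 * pi * k * n / frame_size);
                im -= frame[n] * std::sin(2 * pi * k * n / frame_size);
            }
            mags[k] = std::hypot(re, im);
        }
        rows.push_back(std::move(mags));
    }
    return rows;
}
```

Each row maps to one column of the image, with bin k at frequency k x sampleRate / frame_size. Converting magnitudes to decibels before mapping them to colors usually gives a more readable picture.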
Replies
2
Boosts
0
Views
1.7k
Activity
Oct ’22