Accelerate


Perform large-scale mathematical computations and image calculations with high-performance, energy-efficient code using Accelerate.

Accelerate Documentation

Posts under Accelerate tag

13 Posts
Post not yet marked as solved
1 Reply
266 Views
Hi! We're trying to calculate the delay between two audio inputs, represented by float arrays, by finding their maximum correlation using vDSP_conv. Our solution is very similar to the one in the first answer here, except that we search a 0..5000 radius to find the delay in ms: https://stackoverflow.com/questions/65571299/swift-read-two-audio-files-and-calculate-their-cross-correlation The problem is that we get mixed results: sometimes the calculated delay is correct, but other times it isn't. Our best guess is that some overflow error is happening, since the arrays we're working with can be pretty large (around 4-5 million values). If we use a simple foreach to calculate the correlations we get good results, but that is obviously quite slow. Has anyone had similar problems?
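Not the poster's vDSP code, but a small NumPy sketch of the same peak-picking idea. A slow-but-exact reference like this (analogous to the foreach mentioned above) is useful for checking the vDSP_conv output on large inputs; the signal length and the 37-sample delay here are made up for illustration.

```python
import numpy as np

# Build a reference signal and a delayed copy (delay chosen arbitrarily).
rng = np.random.default_rng(0)
delay = 37
x = rng.standard_normal(2000)
y = np.concatenate([np.zeros(delay), x])[: len(x)]  # y is x delayed by 37 samples

# Full cross-correlation; the argmax gives the lag of y relative to x.
corr = np.correlate(y, x, mode="full")
lag = int(corr.argmax()) - (len(x) - 1)
print(lag)  # 37
```

Comparing a result like this against the vDSP path on the same (down-sampled) data would quickly confirm or rule out the overflow hypothesis.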
Posted. Last updated.
Post not yet marked as solved
4 Replies
258 Views
Hi. I want to implement the code below using vDSP.

for i in a.indices {
    a[i] = n[i] == 0.0 ? 0.0 : b[i] / n[i]
}

This code is slow. Is there a good implementation using the Accelerate framework?
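The usual vectorized approach is to divide only where the divisor is nonzero and leave zeros elsewhere, rather than branching per element. Here is the same idea sketched in NumPy (array values are made up):

```python
import numpy as np

b = np.array([1.0, 2.0, 3.0, 4.0])
n = np.array([2.0, 0.0, 4.0, 0.0])

# Divide only where n is nonzero; untouched lanes keep the 0.0 from `out`.
a = np.zeros_like(b)
np.divide(b, n, out=a, where=(n != 0.0))
print(a.tolist())  # [0.5, 0.0, 0.75, 0.0]
```

In vDSP terms, one equivalent shape is an unconditional elementwise divide followed by zeroing the lanes where the divisor was zero.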
Posted by y_ich. Last updated.
Post not yet marked as solved
0 Replies
493 Views
I am asking this more in hope than expectation, but would greatly appreciate any help or suggestions (with apologies for a rather lengthy post). The problem is, quite simply, that I am unable to get my existing OpenCL code to build in Xcode (which I have always used without problems in the past). So my question is: can anyone advise how to configure and use Xcode in order to successfully build OpenCL code for Apple Silicon?

Background: having just received a shiny new M3 MacBook Pro, I would really like to try out one or two of my GPU programs. They were all written several years ago using OpenCL, before Apple decided to give up on it in favour of Metal. (I have since converted one of them to use CUDA, but that is not useful here.) I completely understand that the right thing to do is to convert them to use Metal directly, and I will do this when I have time, but I suspect it will take me several days, if not weeks (I have never had reason to use Metal until now, so I will also have to learn how to convert my code; there are quite a few kernels). Meanwhile, I would very much like to run the programs right now, using OpenCL, simply to find out how they perform on Apple Silicon (I have previously only used them on older Intel Macs with AMD GPUs). It would be great to see my code running on the M3's GPU!

The reasons I think this must still be possible are (a) there are plenty of Geekbench OpenCL results for the M3 chips; and (b) I have managed to compile and run a really trivial OpenCL program, but only using clang from the command line; I have been unable to work out how to compile individual .cl files containing OpenCL kernels.

The failure I'm getting, having cloned one of my sets of programs into Xcode on my new M3 Mac, is that Xcode tries to run a version of openclc in /System/Library/Frameworks/OpenCL.framework/Libraries/, which produces the error Bad CPU type in executable; it seems to be an x86_64 binary. There is a universal binary version in /System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/Libraries/, but I have been unable to find a way to configure (or force) Xcode to use that one. It may well be, of course, that if I get past this problem another one will present itself. Nonetheless, if any of you can suggest anything that I might try, I would be most grateful.

One secondary question, if I may: using openclc to compile a .cl file (containing a kernel) from the command line, is there a parameter (e.g. a value to specify with -arch), or combination of parameters, that will cause it to produce a .bc file for an Apple Silicon GPU, along with the .cl.h header file that has to be #included in the C or C++ code that dispatches the kernel?

Thanks. Andrew

PS. I've also posted this question on MacRumors, because there seem to be quite a number of people there who understand Apple Silicon, but I rather suspect there's a better chance of getting the help I need here.
Posted by Kronsteen. Last updated.
Post not yet marked as solved
2 Replies
489 Views
In Instruments' CPU Profiling tool I've noticed that a significant portion (22.5%) of the CPU-side overhead of MPS matrix multiplication (GEMM) is spent in a call to getenv(). Please see the attached screenshot. It seems unnecessary to perform the same check over and over; whatever needs this value should be able to call getenv() once and cache the result for future use.
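A cached lookup is indeed the standard fix for hot-path getenv-style checks. A minimal sketch of the pattern in Python (the variable name MY_DEBUG_FLAG is hypothetical):

```python
import os
from functools import lru_cache

@lru_cache(maxsize=None)
def debug_flag() -> bool:
    # The environment is consulted once; every later call is a cache hit.
    return os.environ.get("MY_DEBUG_FLAG", "0") != "0"
```

Changing the variable after the first call no longer affects the result, which is exactly the read-once behavior the post asks MPS to adopt.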
Posted by jacobgorm. Last updated.
Post not yet marked as solved
3 Replies
541 Views
Hello. I'm using functions from the Accelerate framework in my app, as described in this developer documentation: https://developer.apple.com/documentation/accelerate/solving_systems_of_linear_equations_with_lapack I've built and tested the app without errors, but when I try to upload it to App Store Connect I get the error:

The app references non-public symbols in Payload/***.app/Frameworks/Matrix.framework/Matrix: _dgeev$NEWLAPACK$ILP64, _dposv$NEWLAPACK$ILP64, _dsyev$NEWLAPACK$ILP64, _dsysv$NEWLAPACK$ILP64

Please advise on how to resolve this issue. Thank you.
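While sorting out the linking issue, it can help to have a reference for what the rejected symbols compute. For example, dposv solves A·x = b for a symmetric positive definite A via a Cholesky factorization; a NumPy sketch with a made-up 2×2 system:

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])

L = np.linalg.cholesky(A)     # A = L @ L.T, the factorization dposv uses
y = np.linalg.solve(L, b)     # forward substitution
x = np.linalg.solve(L.T, y)   # back substitution
print(np.allclose(A @ x, b))  # True
```

A reference solution like this makes it easy to validate the LAPACK path once the app links and uploads cleanly.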
Posted by Arvand. Last updated.
Post not yet marked as solved
1 Reply
433 Views
I have CVPixelBuffers in YCbCr (422 or 420 biplanar, 10-bit video range) coming from the camera. I see that the vImage framework is sophisticated enough to handle a variety of image formats, including pixel buffers in various YCbCr formats. I want to compute histograms for both Y (luma) and RGB. For 8-bit YCbCr samples, I can use the following code to compute the histogram of the Y component:

CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
let baseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)
let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)
let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)
CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)

var buffer = vImage_Buffer(data: baseAddress, height: vImagePixelCount(height), width: vImagePixelCount(width), rowBytes: bytesPerRow)

alphaBin.withUnsafeMutableBufferPointer { alphaPtr in
    let error = vImageHistogramCalculation_Planar8(&buffer, alphaPtr.baseAddress!, UInt32(kvImageNoFlags))
    guard error == kvImageNoError else {
        fatalError("Error calculating histogram luma: \(error)")
    }
}

How does one implement the same for 10-bit HDR pixel buffers, preferably using the new iOS 16 vImage APIs that provide a lot more flexibility (for instance, getting an RGB histogram from a YCbCr sample without explicitly performing a pixel format conversion)?
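Whichever vImage entry point ends up being used, the bucket counting itself is simple: one bin per 10-bit code value. A NumPy sketch over a small simulated 10-bit video-range luma plane (the plane size and value range are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# 10-bit video-range luma (codes 64..940) stored in uint16, as in a 10-bit biplanar plane
luma = rng.integers(64, 941, size=(480, 640), dtype=np.uint16)

# One bin per 10-bit code value: a 1024-entry histogram
hist = np.bincount(luma.ravel(), minlength=1024)
print(hist.sum() == luma.size)  # True
```

This serves as an exact reference to validate a vImage-based histogram of the 10-bit plane.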
Posted. Last updated.
Post not yet marked as solved
1 Reply
1.7k Views
My project is based on Python 3.8 and 3.9 and contains some C and C++ source. How can I do parallel computing on the CPU and GPU of the M1 Max? Indeed, I bought the M1 Max for its strong GPU to do quantitative finance, where speed is extremely important. Unfortunately, CUDA is not compatible with the Mac. Show me how to do it, thanks.

1. Can Accelerate (for the CPU) and Metal (for the GPU) speed up any source by building like this?
Step 1: download the source from GitHub.
Step 2: create a file named "site.cfg" in the source directory, with the content:
[accelerate]
libraries=Metal, Acelerate, vecLib
Step 3: in Terminal: NPY_LAPACK_Order=accelerate python3 setup.py build
Step 4: pip3 install . or python3 setup.py install (I am not sure which method to apply)
2. How is the compatibility of such a method? I need to speed up numpy, pandas, and even an open source project such as https://github.com/microsoft/qlib
3. Just show me the code.
4. When compiling the C and C++ source, a lot of errors were reported. Which gcc and g++ should I choose? The default gcc installed by brew is 4.2.1, which cannot work, and I even tried to download gcc from the official website of ARM, which still cannot work. Give me a hint. Thanks so much. Urgent.
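Whichever build route is used, it is worth verifying afterwards which BLAS NumPy actually linked against: np.show_config() prints that, and a quick matmul timing shows whether the fast path is active (the matrix size below is chosen arbitrarily):

```python
import time
import numpy as np

np.show_config()  # look for Accelerate / vecLib in the BLAS/LAPACK sections

n = 512
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
c = a @ b                      # dispatched to the linked BLAS (dgemm)
dt = time.perf_counter() - t0
print(f"{n}x{n} matmul took {dt * 1e3:.1f} ms")
```

On an Accelerate-linked build this matmul should be dramatically faster than a pure-Python triple loop; if it is not, NumPy is likely falling back to an unoptimized BLAS.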
Posted by jefftang. Last updated.
Post not yet marked as solved
1 Reply
888 Views
Hello. In one of my apps, I'm trying to modify the pixel buffer from a ProRAW capture and then write the modified DNG. This is what I do. After capturing a ProRAW photo, I work in the delegate function:

func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) { ... }

In here I can access photo.pixelBuffer and get its base address:

guard let buffer = photo.pixelBuffer else { return }
CVPixelBufferLockBaseAddress(buffer, [])
let pixelFormat = CVPixelBufferGetPixelFormatType(buffer)

// I check that the pixel format corresponds with ProRAW. This is successful; the code enters the if block
if (pixelFormat == kCVPixelFormatType_64RGBALE) {
    guard let pointer = CVPixelBufferGetBaseAddress(buffer) else { return }

    // We have 16 bits per component, 4 components
    let count = CVPixelBufferGetWidth(buffer) * CVPixelBufferGetHeight(buffer) * 4
    let mutable = pointer.bindMemory(to: UInt16.self, capacity: count)

    // As a test, I want to replace all pixels with 65000 to get a white image
    let finalBufferArray : [Float] = Array.init(repeating: 65000, count: count)
    vDSP_vfixu16(finalBufferArray, 1, mutable, 1, vDSP_Length(finalBufferArray.count))

    // I create a vImage pixel buffer. Note that I'm referencing photo.pixelBuffer to be sure that I modified the underlying pixel buffer of the AVCapturePhoto object
    let imageBuffer = vImage.PixelBuffer<vImage.Interleaved16Ux4>(referencing: photo.pixelBuffer!, planeIndex: 0)

    // Inspect the CGImage
    let cgImageFormat = vImage_CGImageFormat(bitsPerComponent: 16, bitsPerPixel: 64, colorSpace: CGColorSpace(name: CGColorSpace.displayP3)!, bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.last.rawValue | CGBitmapInfo.byteOrder16Little.rawValue))!
    let cgImage = imageBuffer.makeCGImage(cgImageFormat: cgImageFormat)!

    // I send the CGImage to the main view controller. This is successful: I can see a white image when rendering the CGImage into a UIImage. This lets me think that I successfully modified photo.pixelBuffer
    firingFrameDelegate?.didSendCGImage(image: cgImage)
}

// Now I try to write the data. Unfortunately, this does not work: photo.fileDataRepresentation() writes the data corresponding to the original, unmodified pixel buffer
if let photoData = photo.fileDataRepresentation() {
    // Sending the data to the view controller and rendering it in a UIImage displays the original photo, not the modified pixel buffer
    firingFrameDelegate?.didSendData(data: photoData)
    thisPhotoData = photoData
}

CVPixelBufferUnlockBaseAddress(buffer, [])

The same happens if I write the data to disk: the DNG file displays the original photo, not the data corresponding to the modified photo.pixelBuffer. Do you know why this code does not work? Do you have any ideas on how I can modify the ProRAW pixel buffer so that I can write the modified buffer into a DNG file? My goal is to write a modified file, so I'm not sure I can use Core Image or vImage to output a ProRAW file.
Posted by salvo_89. Last updated.
Post marked as solved
2 Replies
1.3k Views
Up until now, we have been using an optimized BLAS library for Intel processors. Now we are looking for a replacement for Apple Silicon. I know that Accelerate provides such an interface; however, I haven't been able to find out whether it provides an ILP64 interface rather than LP64, which is what we use on all 64-bit platforms. If it does, how do I access it? Thanks.
Posted by iseggev. Last updated.
Post not yet marked as solved
1 Reply
1.5k Views
Hi All, I would like to know if there are any C APIs to control the Floating-Point Control Register (FPCR) on Apple Silicon? The ARM documentation does not show any C APIs for doing this. The only example code looks like VHDL, so I was wondering if any developers here knew of any. Thanks
Posted by jamescjoy. Last updated.
Post not yet marked as solved
0 Replies
590 Views
It appears that some of the jax core functions (in pjit, mlir) are not supported. Is this something to be supported in the future? For example, when I tested a diffrax example:

from diffrax import diffeqsolve, ODETerm, Dopri5
import jax.numpy as jnp

def f(t, y, args):
    return -y

term = ODETerm(f)
solver = Dopri5()
y0 = jnp.array([2., 3.])
solution = diffeqsolve(term, solver, t0=0, t1=1, dt0=0.1, y0=y0)

it generated an error saying EmitPythonCallback is not supported on Metal:

File ~/anaconda3/envs/jax-metal-0410/lib/python3.10/site-packages/jax/_src/interpreters/mlir.py:1787 in emit_python_callback
    raise ValueError(
ValueError: `EmitPythonCallback` not supported on METAL backend.

I understand that, currently, no M1 or M2 chips have multiple devices or can be arranged like that, so it may not be necessary to fully implement the p*** functions (pmap, pjit, etc.). But some powerful libraries use them, so it would be great if at least some workaround for the core functions were implemented. Or is there any easy fix for this?
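Until the Metal backend supports the callback, one workaround is to run the ODE solve on the CPU. As a self-contained illustration of what the diffrax snippet computes, here is a plain-Python fixed-step forward Euler integration of dy/dt = -y with the same initial condition (Dopri5 is a far higher-order method; this is only a sketch of the problem, not of diffrax):

```python
import math

def euler(f, y0, t0, t1, dt):
    """Fixed-step forward Euler for dy/dt = f(t, y), with y a list of floats."""
    t, y = t0, list(y0)
    while t < t1 - 1e-12:
        dy = f(t, y)
        y = [yi + dt * di for yi, di in zip(y, dy)]
        t += dt
    return y

# Same problem as the diffrax example: f(t, y) = -y, y0 = [2, 3], t in [0, 1]
y = euler(lambda t, y: [-yi for yi in y], [2.0, 3.0], 0.0, 1.0, 1e-4)
# Exact solution is y0 * exp(-1) componentwise
print(y)
```

With dt = 1e-4 the Euler result agrees with the exact solution to about four decimal places, enough to sanity-check a GPU run.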
Posted by sungsoo. Last updated.
Post not yet marked as solved
1 Reply
1.1k Views
Many useful ARM intrinsics (such as fma, rng, ld64b, etc.) are described in the Arm C Language Extensions, but the arm_acle.h header file shipped with Xcode does not include them. Are these intrinsics supported by the Apple Silicon chip?
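As a side note on what the fma intrinsic buys: it computes a*b + c with a single rounding, and the difference from a separate multiply and add is observable. A Python sketch using exact rational arithmetic as the reference (the operand values are chosen specifically to expose the double rounding):

```python
from fractions import Fraction

a, b, c = 1e16, 1.0 + 2**-52, -1e16

unfused = a * b + c  # two roundings: the tiny tail of a*b is lost
# One rounding of the exact value, which is what a fused multiply-add returns
fused = float(Fraction(a) * Fraction(b) + Fraction(c))
print(unfused, fused)  # the two results differ
```

In C, the portable spelling fma() from <math.h> is required to produce the single-rounding result, independent of which intrinsic spellings the header exposes.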
Posted. Last updated.
Post not yet marked as solved
3 Replies
1.1k Views
I've looked in multiple places online, including here in the forums where a somewhat similar question was asked (and never answered), but I'm going to ask anyway. vImage, Metal Performance Shaders, and Core Image all have a big overlap in the kinds of operations they perform on image data, but none of the supporting materials (documentation, WWDC session videos, help) ever seem to acknowledge the existence of the others. For example, Core Image talks about how efficient and fast it is. MPS talks about everything being "hand rolled" to be optimized for the hardware it's running on, which also means fast and efficient. And vImage talks about being fast and, yes, energy-saving. But I and others have very little to go on as to when vImage makes sense over MPS, or over Core Image. If I have a large set of images and I want to get the mean color value of each image, equalize or adjust each one's histogram, or perform some other color operation on each image in the set, which is best? I hope someone from Apple (preferably people from the several teams that work on these technologies) can help clear some of this up.
Posted. Last updated.