Accelerate

How to make AppleArchive + ZLIB compatible with non-Apple systems?

I very much love the performance of AppleArchive and how approachable it is, and believe it to be one of the most underrated frameworks in the SDK. In a scenario quite typical, I need to compress files and submit them to a back end, where the server handling the files is not an Apple platform. Obviously, individual files compressed with AA will not be compatible with other systems out of the box, but there are compatible compression algorithms. ZLIB is recommended for cases where cross-platform compatibility is necessary. As I understand it, AA adds additional headers to files in order to support preservation of file attributes, ownership and other data. Following the steps outlined in the docs, I've written code to compress single files. I can easily compress and decompress using AA without issue. To create a proof-of-concept, I've written some code in python using its zlib module. In order to get to the compressed data, it's necessary to handle the AA header fields. The first 64 bytes of a compressed file appear as follows: AA documentation states that ZLIB Level 5 compression is used, and comes in the form of raw DEFLATE data prefixed with two header bytes. In this case, these bytes are 78 5e, which begin at the 28th byte and appear as x^ above. My hope was that seeking to the start of the compressed data, then passing what remains to a decompressor object initialized with the correct WBITS would work. It works fantastically for files 1MB or less in size. Files which are larger only decompress the first megabyte. The decompressor object is reaching EOF, and I've tried various ways of attempting to seek to and concatenate the other blocks, but to no avail. Using the older Compression framework and the method specified here, with the same algorithm, yields different results. I can decompress files of any size using python's zlib module. My assumption is that AppleArchive is doing something differently in order to support its multithreading capabilities, perhaps even with asymmetric encoding where the blocks are not ordered. Is there a solution to this problem? If not, why would one ever use ZLIB versus the much more efficient LZFSE? I could use the older Compression API, but it is significantly slower compressing synchronously, and performance is critical with the application I am adding this feature to.

Posted

by

bpmodesitt.

Last updated

.

How to get the position of dominant colors in CGImage?

so, my app needs to find the dominant palette and the position in the image of the k-most dominant colors. I followed the very useful sample project from the vImage documentation https://developer.apple.com/documentation/accelerate/bnns/calculating_the_dominant_colors_in_an_image and the algorithm works fine although I can't wrap my head around how should I go on about and linking said colors with a point in the image. Since the algorithm works by filling storages first, I tried also filling an array of CGPoints called LocationStorage and working with that //filling the array for i in 0...width { for j in 0...height { locationStorage.append( CGPoint(x: i, y: j)) } . . . //working with the array let randomIndex = Int.random(in: 0 ..< width * height) centroids.append(Centroid(red: redStorage[randomIndex], green: greenStorage[randomIndex], blue: blueStorage[randomIndex], position: locationStorage[randomIndex])) } struct Centroid { /// The red channel value. var red: Float /// The green channel value. var green: Float /// The blue channel value. var blue: Float /// The number of pixels assigned to this cluster center. var pixelCount: Int = 0 var position: CGPoint = CGPointZero init(red: Float, green: Float, blue: Float, position: CGPoint) { self.red = red self.green = green self.blue = blue self.position = position } } although it's not accurate. I also tried force trying every pixel in the image to get as close to each color but I think it's too slow. What do you think my approach should be? Let me know if you need additional info Please be kind I'm learning Swift.

Posted

by

RichardDiste.

Last updated

.

vDSP_conv returning wrong results

Hi! We're trying to calculate the delay between two audio inputs, represented by float arrays, by getting their maximum correlation, using vDSP_conv. Our solution is very similar to the one in the first answer here, only we are looking at a 0..5000 radius to find the delay in ms: https://stackoverflow.com/questions/65571299/swift-read-two-audio-files-and-calculate-their-cross-correlation The problem is that we had mixed results, sometimes the calculated delay is ok, but other times it isn't. Our best guess is that there is some overflow error happening, since the arrays we're working with can be pretty large (they can have around 4-5 million values). If we use a simple foreach to calculate these correlations we get good results, but obviously the is quite slow. Did anyone have similar problems?

Posted

by

sandor_kolumban.

Last updated

.

Are there any Conditional operators in vDSP?

Hi. I want to implement the code below using vDSP. for i in a.indices { a[i] = n[i] == 0.0 ? 0.0 : b[i] / n[i] } This code is slow. Are there any good implementation using Accelerate framework?

Accelerate

Posted

by

y_ich.

Last updated

.

Building OpenCL code for Apple Silicon

I am asking this more in hope than expectation, but would greatly appreciate any help or suggestions (with apologies for a rather lengthy post). The problem I have with my existing OpenCL code is, quite simply, that I am unable to get it to build in Xcode (I have always used Xcode without problems in the past). So my question, quite simply, is: Can anyone advise how to configure and use Xcode in order to successfully build OpenCL code for Apple Silicon? Background: Having just received a shiny new M3 MacBook Pro, I would really like to try out one or two of my GPU programs. They were all written several years using OpenCL, before Apple decided to give up on it in favour of Metal. (In fact, I have since converted one of them to use CUDA, but that is not useful here.) Now, I completely understand that the right thing to do is to convert them to use Metal directly, and will do this when I have time, but I suspect that it will take me several days, if not weeks (I have never had reason to use Metal until now, so I will also have to learn how to convert my code; there are quite a few kernels). I don’t have time to do that at the moment. Meanwhile, I would very much like to try the programs right now, using OpenCL, simply to find out how they run on Apple Silicon (I have previously only used them on older, Intel Macs with AMD GPUs). It would be great to see my code running on the M3’s GPU! The reasons I think this must still be possible are (a) there are plenty of Geekbench OpenCL results for the M3 chips; and (b) I have managed to compile and run a really trivial OpenCL program (but only using clang from the command line; I have been unable to work out how to compile individual .cl files containing OpenCL kernels). The problem I am getting is that, having cloned one of my sets of programs into Xcode on my new M3 Mac, I am unable to get any of the kernels even to build. The failure I’m getting is that Xcode is trying to run a version of openclc in the directory /System/Library/Frameworks/OpenCL.framework/Libraries/, which gives the error condition Bad CPU type in executable when Xcode tries to use it. It seems that this is an x86_64 version of openclc. There is a universal binary version in /System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/Libraries/, but I have been unable to find a way to configure (or force ….) Xcode to use that one. It may well be, of course, that if I manage to get past this problem, another one will present itself. Nonetheless, if any of you can suggest anything that I might try, I would be most grateful. One secondary question, if I may: Using openclc to compile a .cl file (containing a kernel) from the command line, is there a parameter (e.g. a value to specify with -arch) or combination of parameters that will cause it to produce a .bc file for an Apple Silicon GPU and also the .cl.h header file that has to be #included in the C or C++ code that will dispatch the kernel? Thanks …. Andrew PS. I’ve also posted this question on MacRumors, because there seem to be quite a number of people there who understand Apple Silicon, but I rather suspect there’s a better chance of getting getting the help I need here ….

Posted

by

Kronsteen.

Last updated

.

MPSMatrix wastes time calling getenv() over and over

I Instrument's CPU Profiling tool I've noticed that a significant portion (22.5%) of the CPU-side overhead related to MPS matrix multiplication (GEMM) is in a call to getenv(). Please see attached screenshot. It seems unnecessary to perform this same check over and over, as whatever hack that needs this should be able to perform the getenv() only once and cache the result for future use.

Posted

by

jacobgorm.

Last updated

.

Error when uploading app that uses Accelerate LAPACK functions to App Store Connect

Hello I'm using functions from the Accelerate framework in my app as mentioned in this developer documentation: https://developer.apple.com/documentation/accelerate/solving_systems_of_linear_equations_with_lapack I've built the app and tested it and I get no errors, but when I try to upload to the App Store Connect I get the error: he app references non-public symbols in Payload/***.app/Frameworks/Matrix.framework/Matrix: _dgeev$NEWLAPACK$ILP64, _dposv$NEWLAPACK$ILP64, _dsyev$NEWLAPACK$ILP64, _dsysv$NEWLAPACK$ILP64 Please advise on how to resolve this issue. Thank you.

Posted

by

Arvand.

Last updated

.

Compute histogram for 10 bit YCbCr samples

I have CVPixelBuffers in YCbCr (422 or 420 biplanar 10 bit video range) coming from camera. I see vImage framework is sophisticated enough to handle a variety of image formats (including pixel buffers in various YCbCr formats). I was looking to compute histograms for both Y (luma) and RGB. For the 8 bit YCbCr samples, I could use this code to compute histogram of Y component. CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly) let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0) let baseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0) let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0) let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0) CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) var buffer = vImage_Buffer(data: baseAddress, height: vImagePixelCount( height), width: vImagePixelCount(width), rowBytes: bytesPerRow) alphaBin.withUnsafeMutableBufferPointer { alphaPtr in let error = vImageHistogramCalculation_Planar8(&buffer, alphaPtr.baseAddress!, UInt32(kvImageNoFlags)) guard error == kvImageNoError else { fatalError("Error calculating histogram luma: \(error)") } } How does one implement the same for 10 bit HDR pixel buffers, preferably using new iOS 16 vImage APIs that provide lot more flexibility (for instance, getting RGB histogram as well from YCbCr sample without explicitly performing pixel format conversion)?

Accelerate

Posted

by

testinstadev.

Last updated

.

How to fully apply parallel computing on CPU and GPU of M1max

Project is based on python3.8 and 3.9, containing some C and C++ source How can I do parallel computing on CPU and GPU of M1max In deed, I buy Mac m1max for the strong GPU to do quantitative finance, for which the speed is extremely important. Unfortunately, cuda is not compatible with Mac. Show me how to do it, thx. Are Accelerate(for CPU) and Metal(for GPU) can speed up any source by building like this: Step 1: download source from github Step 2: create a file named "site.cfg"in this souce file, and add content: [accelerate] libraries=Metal, Acelerate, vecLib Step 3: Terminal: NPY_LAPACK_Order=accelerate python3 setup.py build Step 4: pip3 install . or python3 setup.py install ? (I am not sure which method to apply) 2、how is the compatibility of such method? I need speed up numpy, pandas and even a open souce project, such as https://github.com/microsoft/qlib 3、just show me the code 4、when compiling C++, C source, a lot of errors were reported, which gcc and g++ to choose? the default gcc installed by brew is 4.2.1, which cannot work. and I even tried to download gcc from the offical website of ARM， still cannot work. give me a hint. thx so much urgent

Posted

by

jefftang.

Last updated

.

Modify the ProRAW pixel buffer to write a modified DNG

Hello, In one of my apps, I'm trying to modify the pixel buffer from a ProRAW capture to then write the modified DNG. This is what I try to do: After capturing a ProRAW photo, I work in the delegate function func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) { ... } In here I can access the photo.pixelBuffer and get its base address: guard let buffer = photo.pixelBuffer else { return } CVPixelBufferLockBaseAddress(buffer, []) let pixelFormat = CVPixelBufferGetPixelFormatType(buffer) // I check that the pixel format corresponds with ProRAW . This is successful, the code enters the if block if (pixelFormat == kCVPixelFormatType_64RGBALE) { guard let pointer = CVPixelBufferGetBaseAddress(buffer) else { return } // We have 16bits per component, 4 components let count = CVPixelBufferGetWidth(buffer) * CVPixelBufferGetHeight(buffer) * 4 let mutable = pointer.bindMemory(to: UInt16.self, capacity: count) // As a test, I want to replace all pixels with 65000 to get a white image let finalBufferArray : [Float] = Array.init(repeating: 65000, count: count) vDSP_vfixu16(finalBufferArray, 1, mutable, 1, vDSP_Length(finalBufferArray.count)) // I create an vImage Pixel buffer. Note that I'm referencing the photo.pixelBuffer to be sure that I modified the underlying pixelBuffer of the AVCapturePhoto object let imageBuffer = vImage.PixelBuffer<vImage.Interleaved16Ux4>(referencing: photo.pixelBuffer!, planeIndex: 0) // Inspect the CGImage let cgImageFormat = vImage_CGImageFormat(bitsPerComponent: 16, bitsPerPixel: 64, colorSpace: CGColorSpace(name: CGColorSpace.displayP3)!, bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.last.rawValue | CGBitmapInfo.byteOrder16Little.rawValue))! let cgImage = imageBuffer.makeCGImage(cgImageFormat: cgImageFormat)! // I send the CGImage to the main view controller. This is successful, I can see a white image when rendering the CGImage into a UIImage. This lets me think that I successfully modified the photo.pixelBuffer firingFrameDelegate?.didSendCGImage(image: cgImage) } // Now I try to write data. Unfortunately, this does not work. The photo.fileDataRepresentation() writes the data corresponding to the original, unmodified pixelBuffer `if let photoData = photo.fileDataRepresentation() { // Sending the data to the view controller and rendering it in the UIImage displays the original photo, not the modified pixelBuffer firingFrameDelegate?.didSendData(data: photoData) thisPhotoData = photoData }` CVPixelBufferUnlockBaseAddress(buffer, []) The same happens if I try to write the data to disk. The DNG file displays the original photo and not the data corresponding to the modified photo.pixelBuffer. Do you know why this code should not work? Do you have any ideas on how I can modify the ProRAW pixel buffer so that I can write the modified buffer into a DNG file? My goal is to write a modified file, so, I'm not sure I can use CoreImage of vImage to output a ProRAW file.

Posted

by

salvo_89.

Last updated

.

ILP64 BLAS Interface from Accelerate?

Up until now, we have been using an optimized BLAS library for Intel processors. Now we are looking for replacements for Apple Silicon. I know that Accelerate provides such an interface. However, I haven't been able to find if it provides an ILP64, rather than LP64, interface, which is what we use on all 64-bit platforms. If it does, how do I access it? Thanks.

Posted

by

iseggev.

Last updated

.

FCPR on Apple Silicon

Hi All, I would like to know if there are any C APIs to control the Floating-Point Control Register (FPCR) on Apple Silicon? The ARM documentation does not show any C APIs for doing this. The only example code looks like VHDL, so I was wondering if any developers here knew of any. Thanks

Posted

by

jamescjoy.

Last updated

.

Implementation of some core functions of jax-metal

It appears that some of the jax core functions (in pjit, mlir) are not supported. Is this something to be supported in the future? For example, when I tested a diffrax example, from diffrax import diffeqsolve, ODETerm, Dopri5 import jax.numpy as jnp def f(t, y, args): return -y term = ODETerm(f) solver = Dopri5() y0 = jnp.array([2., 3.]) solution = diffeqsolve(term, solver, t0=0, t1=1, dt0=0.1, y0=y0) It generates an error saying EmitPythonCallback is not supported in metal. File ~/anaconda3/envs/jax-metal-0410/lib/python3.10/site-packages/jax/_src/interpreters/mlir.py:1787 in emit_python_callback raise ValueError( ValueError: `EmitPythonCallback` not supported on METAL backend. I uderstand that, currently, no M1 or M2 chips have multiple devices or can be arranged like that. Therefore, it may not be necessary to fully implement p*** functions (pmap, pjit, etc). But some powerful libraries use them. So, it would be great if at least some workaround for core functions are implemented. Or is there any easy fix for this?

Posted

by

sungsoo.

Last updated

.

aarch64 intrinsics

Many useful ARM intrinsics (such as fma, rng, ld64b, etc.) are described in Arm C Language Extensions. But the arm_acle.h header file shipped with Xcode not include them. Are these intrinsics supported by Apple Silicon chip?

Posted

by

Cheney-Shue.

Last updated

.

When to use vImage, Metal Performance Shaders, or Core Image?

I've looked in multiple places online, including here in the forums where a somewhat similar question is asked (and never answered :( ) but i'm going to ask anyway: vImage, Metal Performance Shaders, and Core Image all have a big overlap in the kinds of operations they perform on image data. But none of supporting materials (documentation, WWDC session videos, help) ever seem to bother with paying much heed to even the existence of the others when talking about themselves. For example, Core Image talks about how efficient and fast it is. MPS talks about everything being "hand rolled" to be optimized for the hardware its running on. Which means yes, fast and efficient. and vImage talks about being fast and..yup, energy-saving. But I and other have very little to go on as to when vImage makes sense over MPS. Or Core Image. If I have a large set of images and I want to get the mean color value of each image and i want to equalize or adjust the histogram of each, or perform some other color operation on each in the set, for example, which is best? I hope someone from Apple -- preferably multiple people from the multiple teams that work on these multiple technologies -- can help clear some of this up?

Posted

by

godofbiscuits.

Last updated

.

Posts under Accelerate tag