Accelerate

RSS for tag

Make large-scale mathematical computations and image calculations with high-performance, energy-efficient computation using Accelerate.

Posts under Accelerate tag

6 Posts

Post

Replies

Boosts

Views

Activity

Apple Accelerate libSparse performance
I've created a Julia interface for Apple Accelerate's libSparse, via calling the library functions as if they were C (@ccall). I'm interested in using this in the context of power systems, where the sparse matrix is the Jacobian or the ABA matrix from a sparse grid network. However, I'm puzzled by the performance. I ran a sampling profiler on repeated in-place solves of Ax = b for a large sparse matrix A and random dense vectors b. (A is size 30k, positive definite so Cholesky factorization.) The 2 functions with the largest impact are _SparseConvertFromCoordinate_Double from libSparse.dylib, and BLASStateRelease from libBLAS.dylib. That strikes me as bizarre. This is an in-place solve: there should be minimal overheard from allocating/deallocating memory. Also, it seems strange that the library would repeatedly convert from coordinate form. Is this expected behavior? Thinking it might be an artifact of the Julia-C interface, I wrote up a similar program in C/Objective-C. I didn't profile it, but timing the same operation (repeated in-place solves of Ax = b for random vectors b, with the same matrix A as in the Julia) gave the same duration. I've attached the C/Objective-C below.profiling-comparison.m.txt If you're familiar with Julia, the following will give you the matrix I was working with: using PowerSystems, PowerNetworkMatrices sys = System("pglib_opf_case30000_goc.m") A = PowerNetworkMatrices.ABA_Matrix(sys).data where you can find the .m file here. (As a crude way to transfer A from Julia to C, I wrote the 3 arrays A.nzval, A.colptr, and A.rowval to .txt files as space-separated lists of numbers: the above C/objective-C reads in those files.) To duplicate my Julia profiling, do pkg> add AppleAccelerate#libSparse Profile--note the #libSparse part, these features aren't on the main branch--then run using AppleAccelerate, Profile # run previous code snippet to define A M, N = 10000, size(A)[1] bs = [rand(N) for _ in 1:M] aa_fact = AAFactorization(A) factor!(aa_fact) solve!(aa_fact, bs[1]) # pre-compile before we profile. Profile.init(n = 10^6, delay = 0.0003) @profile (for i in 1:M; solve!(aa_fact, bs[i]); end;) Profile.print(C = true, format = :flat, sortedby = :count)
3
0
883
1d
vImageBuffer_InitWithCGImage fails with Xcode 26 but succeeds with Xcode 15I am
I am attempting to load a jpeg image into a vImage_Buffer. I am just trying to get the data in an ARGB format. This code works fine in the Xcode 15 build , but fails with kvImageInvalidParameter error from vImageBuffer_InitWithCGImage in the Xcode 26 build. This code is written in ObjectiveC++. Here is a code fragment: int CDib_ARGB::Load(LPCSTR pFilename) { int rc = 0; if (NULL == m_pRaw_vImage_Buffer) { NSString *pNS_filename = [[NSString alloc]initWithUTF8String:pFilename]; NSImage *pNSImage = [[NSImage alloc] initWithContentsOfFile:pNS_filename]; if (nil == pNSImage) rc = -1; else { int width = pNSImage.size.width; int height = pNSImage.size.height; if (pNSImage.representations) { NSImageRep *imageRep; int jj; width = 0; height = 0; for (jj = 0; jj < pNSImage.representations.count; jj++) { imageRep = pNSImage.representations[jj]; if (imageRep.pixelsWide > width) width = imageRep.pixelsWide; if (imageRep.pixelsHigh > height) height = imageRep.pixelsHigh; } } NSSize imageSize = NSMakeSize(width, height); NSRect imageRect = NSMakeRect(0, 0, width, height); pNSImage.size = imageSize; CGImageRef cgImage = [pNSImage CGImageForProposedRect:&imageRect context:NULL hints:nil]; if (nil == cgImage) rc = -1; else { //Alloc and load vImage_Buffer. vImage_Buffer *pvImage_Buffer = new vImage_Buffer; if (NULL == pvImage_Buffer) rc = -1; else { vImage_CGImageFormat format; format.bitsPerComponent = 8; format.bitsPerPixel = 32; format.colorSpace = nil; format.bitmapInfo = kCGImageAlphaFirst | kCGBitmapByteOrderDefault;//ARGB8888 format.version = 0; format.decode = nil; format.renderingIntent = kCGRenderingIntentDefault; memset(pvImage_Buffer, 0, sizeof(vImage_Buffer)); long status = vImageBuffer_InitWithCGImage(pvImage_Buffer, &format, nil, cgImage, kvImagePrintDiagnosticsToConsole); if (kvImageNoError != status) { //This is where Xcode 26 sends me. delete pvImage_Buffer; rc = -1; } =========================
13
0
680
Feb ’26
vDSP.DiscreteFourierTransform failed to initialize with 5 * 5 * 2^n count
I am implementing the FFT using vDSP.DiscreteFourierTransform. According to the official documentation, the count parameter has requirements as outlined below: /// The `count` parameter must be: /// * For split-complex real-to-complex: `2ⁿ` or `f * 2ⁿ`, where `f` is `3`, `5`, or `15` and `n >= 4`. /// * For split-complex complex-to-complex: `2ⁿ` or `f * 2ⁿ`, where `f` is `3`, `5`, or `15` and `n >= 3`. /// * For interleaved: `f * 2ⁿ`, where `f` is `2`, `3`, `5`, `3x3`, `3x5`, or `5x5`, and `n>=2`. Despite adhering to these specifications in theory, my attempt to initialize an interleaved DFT with count = 2 * 2 * 5 * 5 (equivalent to 5×5 × 2²) resulted in a failure. Below is the code snippet I used for the initialization: do { let dft = try vDSP.DiscreteFourierTransform( previous: nil, count: 2 * 2 * 5 * 5, direction: .forward, transformType: .complexReal, ofType: DSPComplex.self ) print(dft) } catch { print("DFT init failed:", error) } Could somebody more knowledgeable with these APIs have a look? Thanks!
1
0
815
Jan ’26
vImageConverter_CreateWithCGImageFormat Fails with kvImageInvalidImageFormat When Trying to Convert CMYK to RGB
So I get JPEG data in my app. Previously I was using the higher level NSBitmapImageRep API and just feeding the JPEG data to it. But now I've noticed on Sonoma If I get a JPEG in the CMYK color space the NSBitmapImageRep renders mostly black and is corrupted. So I'm trying to drop down to the lower level APIs. Specifically I grab a CGImageRef and and trying to use the Accelerate API to convert it to another format (to hopefully workaround the issue... CGImageRef sourceCGImage = `CGImageCreateWithJPEGDataProvider(jpegDataProvider,` NULL, shouldInterpolate, kCGRenderingIntentDefault); Now I use vImageConverter_CreateWithCGImageFormat... with the following values for source and destination formats: Source format: (derived from sourceCGImage) bitsPerComponent = 8 bitsPerPixel = 32 colorSpace = (kCGColorSpaceICCBased; kCGColorSpaceModelCMYK; Generic CMYK Profile) bitmapInfo = kCGBitmapByteOrderDefault version = 0 decode = 0x000060000147f780 renderingIntent = kCGRenderingIntentDefault Destination format: bitsPerComponent = 8 bitsPerPixel = 24 colorSpace = (DeviceRBG) bitmapInfo = 8197 version = 0 decode = 0x0000000000000000 renderingIntent = kCGRenderingIntentDefault But vImageConverter_CreateWithCGImageFormat fails with kvImageInvalidImageFormat. Now if I change the destination format to use 32 bitsPerpixel and use alpha in the bitmap info the vImageConverter_CreateWithCGImageFormat does not return an error but I get a black image just like NSBitmapImageRep
14
0
1.6k
Aug ’25
Horrendous Swift overlay of vDSP Fourier Transform functions
I'm very unpleased to have found out the hard way, that vDSP.FFT&amp;lt;T&amp;gt; where T : vDSP_FourierTransformable and its associated vDSP_FourierTransformFunctions, only does real-complex conversion!!! This is horribly misleading - the only hint that it calls vDSP_fft_zrop() (split-complex real-complex out-of-place) under the hood, not vDSP_fft_zop()[1], is "A 1D single- and double-precision fast Fourier transform." - instead of "a single- and double-precision complex fast Fourier transform". Holy ******* ****. Just look at how people miss-call this routine. Poor him, realizing he had to log2n+1 lest only the first half of the array was transformed, not understanding why [2]. And poor me, having taken days investigating why a simple Swift overlay vDSP.FFT.transform may execute at 1.4x the speed of vDSP_fft_zopt(out-of-place with temporary buffer)! [3] [1]: or better, vDSP_fft_zopt with the temp buffer alloc/dealloc taken care of, for us [2]: for real-complex conversion, say real signal of length 16. log2n is 4 no problem, but then the real and imaginary vectors are both length 8. Also, vDSP_fft only works on integer powers of 2, so he had to choose next integer power of 2 (i.e. 1&amp;lt;&amp;lt;(log2n-1)) instead of plain length for his internal arrays. [3]: you guessed it. fft_zrop(log2n-1, ...) vs. fft_zop(log2n, ...). Half the problem size lol. Now we have vDSP.DiscreteFourierTransform, which wraps vDSP_DFT routines and "calls FFT routines when the size allows it", and works too for interleaved complexes. Just go all DFT, right? if let setup_f = try? vDSP.DiscreteFourierTransform(previous: nil, count: 8, direction: .forward, transformType: .complexComplex, ofType: DSPComplex.self) { // Done forward transformation // and scaled the results with 1/N // How about going reverse? if let setup_r = try? vDSP.DiscreteFourierTransform(previous: setup_f, count: 8, direction: .inverse, transformType: .complexComplex, ofType: DSPComplex.self) { // Error: cannot convert vDSP.DiscreteFourierTransform&amp;lt;DSPComplex&amp;gt; to vDSP.DiscreteFourierTransform&amp;lt;Float&amp;gt; // lolz } } This API appeared in macOS 12. 3 years later nobody have ever noticed. I'm at a loss for words.
4
0
351
Jun ’25
BNNS random number generator for Double value types
I generate an array of random floats using the code shown below. However, I would like to do this with Double instead of Float. Are there any BNNS random number generators for double values, something like BNNSRandomFillUniformDouble? If not, is there a way I can convert BNNSNDArrayDescriptor from float to double? import Accelerate let n = 100_000_000 let result = Array<Float>(unsafeUninitializedCapacity: n) { buffer, initCount in var descriptor = BNNSNDArrayDescriptor(data: buffer, shape: .vector(n))! let randomGenerator = BNNSCreateRandomGenerator(BNNSRandomGeneratorMethodAES_CTR, nil) BNNSRandomFillUniformFloat(randomGenerator, &descriptor, 0, 1) initCount = n }
3
0
289
Jun ’25
Apple Accelerate libSparse performance
I've created a Julia interface for Apple Accelerate's libSparse, via calling the library functions as if they were C (@ccall). I'm interested in using this in the context of power systems, where the sparse matrix is the Jacobian or the ABA matrix from a sparse grid network. However, I'm puzzled by the performance. I ran a sampling profiler on repeated in-place solves of Ax = b for a large sparse matrix A and random dense vectors b. (A is size 30k, positive definite so Cholesky factorization.) The 2 functions with the largest impact are _SparseConvertFromCoordinate_Double from libSparse.dylib, and BLASStateRelease from libBLAS.dylib. That strikes me as bizarre. This is an in-place solve: there should be minimal overheard from allocating/deallocating memory. Also, it seems strange that the library would repeatedly convert from coordinate form. Is this expected behavior? Thinking it might be an artifact of the Julia-C interface, I wrote up a similar program in C/Objective-C. I didn't profile it, but timing the same operation (repeated in-place solves of Ax = b for random vectors b, with the same matrix A as in the Julia) gave the same duration. I've attached the C/Objective-C below.profiling-comparison.m.txt If you're familiar with Julia, the following will give you the matrix I was working with: using PowerSystems, PowerNetworkMatrices sys = System("pglib_opf_case30000_goc.m") A = PowerNetworkMatrices.ABA_Matrix(sys).data where you can find the .m file here. (As a crude way to transfer A from Julia to C, I wrote the 3 arrays A.nzval, A.colptr, and A.rowval to .txt files as space-separated lists of numbers: the above C/objective-C reads in those files.) To duplicate my Julia profiling, do pkg> add AppleAccelerate#libSparse Profile--note the #libSparse part, these features aren't on the main branch--then run using AppleAccelerate, Profile # run previous code snippet to define A M, N = 10000, size(A)[1] bs = [rand(N) for _ in 1:M] aa_fact = AAFactorization(A) factor!(aa_fact) solve!(aa_fact, bs[1]) # pre-compile before we profile. Profile.init(n = 10^6, delay = 0.0003) @profile (for i in 1:M; solve!(aa_fact, bs[i]); end;) Profile.print(C = true, format = :flat, sortedby = :count)
Replies
3
Boosts
0
Views
883
Activity
1d
vImageBuffer_InitWithCGImage fails with Xcode 26 but succeeds with Xcode 15I am
I am attempting to load a jpeg image into a vImage_Buffer. I am just trying to get the data in an ARGB format. This code works fine in the Xcode 15 build , but fails with kvImageInvalidParameter error from vImageBuffer_InitWithCGImage in the Xcode 26 build. This code is written in ObjectiveC++. Here is a code fragment: int CDib_ARGB::Load(LPCSTR pFilename) { int rc = 0; if (NULL == m_pRaw_vImage_Buffer) { NSString *pNS_filename = [[NSString alloc]initWithUTF8String:pFilename]; NSImage *pNSImage = [[NSImage alloc] initWithContentsOfFile:pNS_filename]; if (nil == pNSImage) rc = -1; else { int width = pNSImage.size.width; int height = pNSImage.size.height; if (pNSImage.representations) { NSImageRep *imageRep; int jj; width = 0; height = 0; for (jj = 0; jj < pNSImage.representations.count; jj++) { imageRep = pNSImage.representations[jj]; if (imageRep.pixelsWide > width) width = imageRep.pixelsWide; if (imageRep.pixelsHigh > height) height = imageRep.pixelsHigh; } } NSSize imageSize = NSMakeSize(width, height); NSRect imageRect = NSMakeRect(0, 0, width, height); pNSImage.size = imageSize; CGImageRef cgImage = [pNSImage CGImageForProposedRect:&imageRect context:NULL hints:nil]; if (nil == cgImage) rc = -1; else { //Alloc and load vImage_Buffer. vImage_Buffer *pvImage_Buffer = new vImage_Buffer; if (NULL == pvImage_Buffer) rc = -1; else { vImage_CGImageFormat format; format.bitsPerComponent = 8; format.bitsPerPixel = 32; format.colorSpace = nil; format.bitmapInfo = kCGImageAlphaFirst | kCGBitmapByteOrderDefault;//ARGB8888 format.version = 0; format.decode = nil; format.renderingIntent = kCGRenderingIntentDefault; memset(pvImage_Buffer, 0, sizeof(vImage_Buffer)); long status = vImageBuffer_InitWithCGImage(pvImage_Buffer, &format, nil, cgImage, kvImagePrintDiagnosticsToConsole); if (kvImageNoError != status) { //This is where Xcode 26 sends me. delete pvImage_Buffer; rc = -1; } =========================
Replies
13
Boosts
0
Views
680
Activity
Feb ’26
vDSP.DiscreteFourierTransform failed to initialize with 5 * 5 * 2^n count
I am implementing the FFT using vDSP.DiscreteFourierTransform. According to the official documentation, the count parameter has requirements as outlined below: /// The `count` parameter must be: /// * For split-complex real-to-complex: `2ⁿ` or `f * 2ⁿ`, where `f` is `3`, `5`, or `15` and `n >= 4`. /// * For split-complex complex-to-complex: `2ⁿ` or `f * 2ⁿ`, where `f` is `3`, `5`, or `15` and `n >= 3`. /// * For interleaved: `f * 2ⁿ`, where `f` is `2`, `3`, `5`, `3x3`, `3x5`, or `5x5`, and `n>=2`. Despite adhering to these specifications in theory, my attempt to initialize an interleaved DFT with count = 2 * 2 * 5 * 5 (equivalent to 5×5 × 2²) resulted in a failure. Below is the code snippet I used for the initialization: do { let dft = try vDSP.DiscreteFourierTransform( previous: nil, count: 2 * 2 * 5 * 5, direction: .forward, transformType: .complexReal, ofType: DSPComplex.self ) print(dft) } catch { print("DFT init failed:", error) } Could somebody more knowledgeable with these APIs have a look? Thanks!
Replies
1
Boosts
0
Views
815
Activity
Jan ’26
vImageConverter_CreateWithCGImageFormat Fails with kvImageInvalidImageFormat When Trying to Convert CMYK to RGB
So I get JPEG data in my app. Previously I was using the higher level NSBitmapImageRep API and just feeding the JPEG data to it. But now I've noticed on Sonoma If I get a JPEG in the CMYK color space the NSBitmapImageRep renders mostly black and is corrupted. So I'm trying to drop down to the lower level APIs. Specifically I grab a CGImageRef and and trying to use the Accelerate API to convert it to another format (to hopefully workaround the issue... CGImageRef sourceCGImage = `CGImageCreateWithJPEGDataProvider(jpegDataProvider,` NULL, shouldInterpolate, kCGRenderingIntentDefault); Now I use vImageConverter_CreateWithCGImageFormat... with the following values for source and destination formats: Source format: (derived from sourceCGImage) bitsPerComponent = 8 bitsPerPixel = 32 colorSpace = (kCGColorSpaceICCBased; kCGColorSpaceModelCMYK; Generic CMYK Profile) bitmapInfo = kCGBitmapByteOrderDefault version = 0 decode = 0x000060000147f780 renderingIntent = kCGRenderingIntentDefault Destination format: bitsPerComponent = 8 bitsPerPixel = 24 colorSpace = (DeviceRBG) bitmapInfo = 8197 version = 0 decode = 0x0000000000000000 renderingIntent = kCGRenderingIntentDefault But vImageConverter_CreateWithCGImageFormat fails with kvImageInvalidImageFormat. Now if I change the destination format to use 32 bitsPerpixel and use alpha in the bitmap info the vImageConverter_CreateWithCGImageFormat does not return an error but I get a black image just like NSBitmapImageRep
Replies
14
Boosts
0
Views
1.6k
Activity
Aug ’25
Horrendous Swift overlay of vDSP Fourier Transform functions
I'm very unpleased to have found out the hard way, that vDSP.FFT&amp;lt;T&amp;gt; where T : vDSP_FourierTransformable and its associated vDSP_FourierTransformFunctions, only does real-complex conversion!!! This is horribly misleading - the only hint that it calls vDSP_fft_zrop() (split-complex real-complex out-of-place) under the hood, not vDSP_fft_zop()[1], is "A 1D single- and double-precision fast Fourier transform." - instead of "a single- and double-precision complex fast Fourier transform". Holy ******* ****. Just look at how people miss-call this routine. Poor him, realizing he had to log2n+1 lest only the first half of the array was transformed, not understanding why [2]. And poor me, having taken days investigating why a simple Swift overlay vDSP.FFT.transform may execute at 1.4x the speed of vDSP_fft_zopt(out-of-place with temporary buffer)! [3] [1]: or better, vDSP_fft_zopt with the temp buffer alloc/dealloc taken care of, for us [2]: for real-complex conversion, say real signal of length 16. log2n is 4 no problem, but then the real and imaginary vectors are both length 8. Also, vDSP_fft only works on integer powers of 2, so he had to choose next integer power of 2 (i.e. 1&amp;lt;&amp;lt;(log2n-1)) instead of plain length for his internal arrays. [3]: you guessed it. fft_zrop(log2n-1, ...) vs. fft_zop(log2n, ...). Half the problem size lol. Now we have vDSP.DiscreteFourierTransform, which wraps vDSP_DFT routines and "calls FFT routines when the size allows it", and works too for interleaved complexes. Just go all DFT, right? if let setup_f = try? vDSP.DiscreteFourierTransform(previous: nil, count: 8, direction: .forward, transformType: .complexComplex, ofType: DSPComplex.self) { // Done forward transformation // and scaled the results with 1/N // How about going reverse? if let setup_r = try? vDSP.DiscreteFourierTransform(previous: setup_f, count: 8, direction: .inverse, transformType: .complexComplex, ofType: DSPComplex.self) { // Error: cannot convert vDSP.DiscreteFourierTransform&amp;lt;DSPComplex&amp;gt; to vDSP.DiscreteFourierTransform&amp;lt;Float&amp;gt; // lolz } } This API appeared in macOS 12. 3 years later nobody have ever noticed. I'm at a loss for words.
Replies
4
Boosts
0
Views
351
Activity
Jun ’25
BNNS random number generator for Double value types
I generate an array of random floats using the code shown below. However, I would like to do this with Double instead of Float. Are there any BNNS random number generators for double values, something like BNNSRandomFillUniformDouble? If not, is there a way I can convert BNNSNDArrayDescriptor from float to double? import Accelerate let n = 100_000_000 let result = Array<Float>(unsafeUninitializedCapacity: n) { buffer, initCount in var descriptor = BNNSNDArrayDescriptor(data: buffer, shape: .vector(n))! let randomGenerator = BNNSCreateRandomGenerator(BNNSRandomGeneratorMethodAES_CTR, nil) BNNSRandomFillUniformFloat(randomGenerator, &descriptor, 0, 1) initCount = n }
Replies
3
Boosts
0
Views
289
Activity
Jun ’25