Accelerate


Perform large-scale mathematical computations and image calculations with high performance and low energy consumption using Accelerate.

Accelerate Documentation

Posts under Accelerate tag

33 results found
Post not yet marked as solved
45 Views

vImage vs CoreImage vs MetalPerformanceShaders strengths and weaknesses

While the three frameworks above (viz. vImage, Core Image, and Metal Performance Shaders) serve different overall purposes, what are the strengths and weaknesses of each of them in terms of image-processing performance? It seems that any of the three is highly performant, but where does each framework shine?
Asked Last updated
.
Post not yet marked as solved
115 Views

How to speed up open-source packages on the M1 Max with Accelerate and Metal

Please help me, this is really urgent. Compatibility issues with the M1 Max chip have cost me hundreds of hours.
1. Please show me how to speed up source downloaded from GitHub, such as numpy, pandas, or any other project, by fully using the CPU and GPU (Python 3.8 and 3.9). Can I do it like this? Step 1: download the source from GitHub. Step 2: create a file named "site.cfg" in the source directory and add the content: [accelerate] libraries = Metal, Accelerate, vecLib. Step 3: in Terminal, run NPY_LAPACK_ORDER=accelerate python3 setup.py build. Step 4: pip3 install . or python3 setup.py install? (I am not sure which method to apply.)
2. How good is the compatibility of Accelerate and Metal? Do they work with most source packages? Any tips, for example for https://github.com/microsoft/qlib?
3. Which gcc should I install? Show me the commands. When I try, errors happen; gcc 4.2.1 (installed by brew) cannot compile some packages, such as "ecos". Moreover, I cannot compile many packages directly with python3 setup.py install (without Accelerate). How should gcc be configured, and which version should I use on the M1 Max?
4. Sometimes I can build a package with brew, but that is extremely inconvenient because I need to install packages into a virtual environment (e.g. a conda env) rather than into the base environment. What should I do? Can I install brew inside a virtual environment? Or should I just use brew to build the source and then install it with pip in the virtual environment? Or can I configure brew to install only into the virtual environment? Just show me the commands.
5. To compile, do I also need to install g++? Which version? Show me the commands.
6. Show me how to speed up a Python program with the GPU and parallel computing via Accelerate.
Asked
by jefftang.
Last updated
.
Post not yet marked as solved
144 Views

How to fully apply parallel computing on the CPU and GPU of the M1 Max

The project is based on Python 3.8 and 3.9 and contains some C and C++ source. How can I do parallel computing on the CPU and GPU of the M1 Max? In fact, I bought the M1 Max Mac for its strong GPU to do quantitative finance, where speed is extremely important. Unfortunately, CUDA is not compatible with the Mac. Please show me how to do it, thanks. Can Accelerate (for the CPU) and Metal (for the GPU) speed up any source package by building like this? Step 1: download the source from GitHub. Step 2: create a file named "site.cfg" in the source directory and add the content: [accelerate] libraries = Metal, Accelerate, vecLib. Step 3: in Terminal, run NPY_LAPACK_ORDER=accelerate python3 setup.py build. Step 4: pip3 install . or python3 setup.py install? (I am not sure which method to apply.)
2. How good is the compatibility of such a method? I need to speed up numpy, pandas, and even an open-source project such as https://github.com/microsoft/qlib.
3. Just show me the code.
4. When compiling C and C++ source, a lot of errors are reported. Which gcc and g++ should I choose? The default gcc installed by brew is 4.2.1, which does not work, and I even tried downloading gcc from the official ARM website, which still does not work. Please give me a hint. Thanks so much, this is urgent.
Asked
by jefftang.
Last updated
.
Post not yet marked as solved
129 Views

BNNSLayerParametersLSTM with hiddenSize != inputSize

Hi all, I've spent some time experimenting with the BNNS (Accelerate) LSTM-related APIs lately, and despite a distinct lack of documentation (even though the headers have quite a few comments) I got most things to a point where I think I know what's going on and I get the expected results. However, one thing I have not been able to do is get this working if inputSize != hiddenSize. I am currently only concerned with a simple unidirectional LSTM with a single layer, but none of my permutations of the gate "iw_desc" matrices with various 2D layouts and input-size/hidden-size orderings made any difference; ultimately BNNSDirectApplyLSTMBatchTrainingCaching always returns -1 as an indication of error. Any help would be greatly appreciated. PS: The bnns.h framework header claims that "When a parameter is invalid or an internal error occurs, an error message will be logged. Some combinations of parameters may not be supported. In that case, an info message will be logged.", and yet I have not been able to find any such messages logged via NSLog() or to stderr or Console. Is there a magic environment variable that I need to set to get more verbose logging?
Asked
by andi.
Last updated
.
Post not yet marked as solved
1.1k Views

R painfully slow on Air M1 - Big Sur

I bought a new Air with the M1 chip last week. It is on Big Sur version 11.2.3. My code in RStudio is extremely slow; it takes around 7 minutes on this new laptop. I have tried using R directly (rather than RStudio), and the same thing happens. I checked with my sister's Air (macOS Mojave 10.14.6), and it takes only seconds to run the same code. What could be the reason that my one-week-old laptop is so slow running this R code, and what would the solutions be? Any help is greatly appreciated!
Asked
by bngzdmr.
Last updated
.
Post not yet marked as solved
280 Views

Why does the execution of vDSP operations sometimes take longer in M1 native code than through Rosetta translation?

Hi, I am porting some applications to M1 that make extensive use of vDSP. In many cases I found only a minimal speed-up, which I put down to Rosetta doing a good job of translating SSE instructions into equivalent Neon instructions in the vDSP library. To understand this better I started profiling various areas of code and have found situations where translated code runs faster than native code. Often native code is similar in speed or faster, as expected, but there is a notable number of cases where it is not. This is not what I expected. I include a sample below showing a somewhat contrived and trivial routine that exhibits the effect. I built it with Xcode 12.5.1 in Release with an 11.3 deployment target. The Mac is running macOS 11.6. On my M1 Mac mini the Rosetta build takes around 900-1000 µs to run to completion; switching to native code it takes around 1500-1600 µs. I can make various adjustments to the data size or the types of vDSP operations used to find scenarios where native builds are faster, that is not difficult, but it shouldn't be necessary. I can understand why vDSP might perform similarly across native and translated runs, but surely translated code should never beat native code by a margin like this. What is going on, and is it expected? Thanks, Matt

#include <iostream>
#include <cstdlib>
#include <cstring>
#include <cerrno>
#include <ctime>
#include <sys/types.h>
#include <sys/sysctl.h>
#include <Accelerate/Accelerate.h>

// determine if the process is running through Rosetta translation
int processIsTranslated() {
  int ret = 0;
  size_t size = sizeof(ret);
  if (sysctlbyname("sysctl.proc_translated", &ret, &size, NULL, 0) == -1) {
    if (errno == ENOENT)
      return 0;
    return -1;
  }
  return ret;
}

int main(int argc, const char * argv[]) {
  // print translation status
  if (processIsTranslated() == 1)
    std::cout << "Rosetta" << std::endl;
  else
    std::cout << "Native" << std::endl;

  // size of test
  vDSP_Length array_len = 512;
  const int iterations = 10000;

  // allocate and clear memory
  float* buf1_ptr = (float*)malloc(array_len * sizeof(float));
  float* buf2_ptr = (float*)malloc(array_len * sizeof(float));
  float* buf3_ptr = (float*)malloc(array_len * sizeof(float));
  float* buf4_ptr = (float*)malloc(array_len * sizeof(float));
  if (!buf1_ptr) return EXIT_FAILURE;
  if (!buf2_ptr) return EXIT_FAILURE;
  if (!buf3_ptr) return EXIT_FAILURE;
  if (!buf4_ptr) return EXIT_FAILURE;
  memset(buf1_ptr, 0, array_len * sizeof(float));
  memset(buf2_ptr, 0, array_len * sizeof(float));
  memset(buf3_ptr, 0, array_len * sizeof(float));
  memset(buf4_ptr, 0, array_len * sizeof(float));

  // start timer
  __uint64_t start_ns = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);

  // scalar constants
  const float scalar1 = 10;
  const float scalar2 = 11;

  // loop test: two vector-scalar adds and one vector add per iteration
  for (int i = 0; i < iterations; i++) {
    vDSP_vsadd(buf1_ptr, 1, &scalar1, buf2_ptr, 1, array_len);
    vDSP_vsadd(buf1_ptr, 1, &scalar2, buf3_ptr, 1, array_len);
    vDSP_vadd(buf2_ptr, 1, buf3_ptr, 1, buf4_ptr, 1, array_len);
  }

  // report test time
  __uint64_t end_ns = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);
  double time_us = (end_ns - start_ns) / 1000.f;
  std::cout << time_us << " us" << std::endl;

  // clean up
  if (buf1_ptr) free(buf1_ptr);
  if (buf2_ptr) free(buf2_ptr);
  if (buf3_ptr) free(buf3_ptr);
  if (buf4_ptr) free(buf4_ptr);

  return 0;
}
Asked Last updated
.
Post not yet marked as solved
322 Views

Floating point exception trapping on M1

I have written a simple C++ test program (below) that takes the square root of a negative number and then tries to print it. I would like to trap the floating-point exception caused by taking the square root of a negative number (e.g., I'd like the program to halt with an error after the floating-point exception). On Intel Macs I know how to do this. Is this possible on an Apple Silicon Mac?

#include <cmath>
#include <iostream>

int main() {
  const double x = -1.0;
  double y = x;
  y = sqrt(y); // floating point exception...possible to build the program so it terminates here?
  std::cout << y << "\n";
  return 0;
}
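A possible direction, sketched below: on Apple Silicon the C library does not provide feenableexcept(), but the FPCR trap-enable bits can be set through the environment returned by fegetenv(). This is a minimal sketch, assuming the __fpcr field and the __fpcr_trap_invalid macro exposed by macOS's arm64 <fenv.h>; with the trap enabled, the process should be terminated by a signal at the invalid operation instead of continuing with a NaN.

#include <fenv.h>
#include <math.h>
#include <stdio.h>

/* Enable trapping of invalid floating-point operations on Apple Silicon.
   Assumes macOS's arm64 fenv_t exposes a __fpcr field and that <fenv.h>
   defines __fpcr_trap_invalid; both are assumptions to verify locally. */
static void enable_invalid_trap(void) {
#if defined(__arm64__)
    fenv_t env;
    fegetenv(&env);
    env.__fpcr |= __fpcr_trap_invalid;  /* trap instead of producing a quiet NaN */
    fesetenv(&env);
#endif
}

int main(void) {
    enable_invalid_trap();
    double y = sqrt(-1.0);  /* with the trap enabled, the program should stop here */
    printf("%f\n", y);
    return 0;
}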
Asked Last updated
.
Post not yet marked as solved
855 Views

Tensorflow acceleration on macOS

Would it be possible to use GPU acceleration when training a TensorFlow model on macOS? And how is the performance when training the same model on an Apple silicon platform?
Asked Last updated
.
Post not yet marked as solved
274 Views

vDSP.convolve incorrectly reverses kernel?

vDSP.convolve() reverses the kernel before applying it. For example, the following uses a kernel of 10 elements where the first element is 1.0 and the rest of the elements are 0.0. Applying this kernel to a vector should return the same vector.

let values = (0 ..< 30).map { Double($0) }
var kernel = Array.init(repeating: 0.0, count: 10)
kernel[0] = 1.0
let result = vDSP.convolve(values, withKernel: kernel)
print("kernel: \(kernel)")
print("values: \(values)")
print("result: \(result)")

Applied to a values array containing elements 0.0, 1.0, 2.0, etc., the first results should be 0.0, 1.0, 2.0, etc., but instead the results start at 9.0 and increase from there:

kernel: [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
values: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0]
result: [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0]

If instead the kernel is reversed, placing the 1.0 at the end of the kernel:

let values = (0 ..< 30).map { Double($0) }
var kernel = Array.init(repeating: 0.0, count: 10)
kernel[9] = 1.0
let result = vDSP.convolve(values, withKernel: kernel)
print("kernel: \(kernel)")
print("values: \(values)")
print("result: \(result)")

The results are now correct:

kernel: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
values: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0]
result: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
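For comparison, the same distinction is documented for the C-level routine vDSP_conv, which the Swift vDSP.convolve appears to wrap: a positive filter stride performs correlation (kernel applied as given), while a negative filter stride, with the filter pointer aimed at its last element, performs convolution (kernel reversed). A minimal sketch, using the same 1.0-followed-by-zeros kernel as above:

#include <Accelerate/Accelerate.h>
#include <stdio.h>

enum { P = 10, N = 20 };   /* kernel length and output length */

int main(void) {
    float signal[N + P - 1];          /* input must hold N + P - 1 samples */
    float kernel[P] = { 1.0f };       /* 1.0 followed by nine zeros */
    float correlated[N], convolved[N];

    for (int i = 0; i < N + P - 1; i++) signal[i] = (float)i;

    /* Correlation: positive filter stride, kernel used in the order given. */
    vDSP_conv(signal, 1, kernel, 1, correlated, 1, N, P);

    /* Convolution: negative filter stride, pointer at the kernel's last element. */
    vDSP_conv(signal, 1, kernel + P - 1, -1, convolved, 1, N, P);

    /* With this kernel, correlated[0] should be 0.0 and convolved[0] should be
       9.0, matching the offset observed with vDSP.convolve above. */
    printf("correlated[0] = %f, convolved[0] = %f\n", correlated[0], convolved[0]);
    return 0;
}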
Asked
by jolonf.
Last updated
.
Post not yet marked as solved
257 Views

vDSP.convolve returns wrong sized array?

vDSP.convolve() returns an array with length values.count - kernel.count, but shouldn't the result array have length values.count - kernel.count + 1? I ran the following, which prints the size of the results array for various combinations of values and kernel lengths:

for i in 0 ..< 10 {
  let values = Array.init(repeating: 1.0, count: 1000 + i)
  for j in 0 ..< 10 {
    let kernel = Array.init(repeating: 1.0, count: 100 + j)
    let result = vDSP.convolve(values, withKernel: kernel)
    print("values[\(values.count)], kernel[\(kernel.count)], result[\(result.count)], result[\(result.count - 1)] = \(result[result.count - 1])")
  }
}

As you can see, the results array always has length values.count - kernel.count:

values[1000], kernel[100], result[900], result[899] = 100.0
values[1000], kernel[101], result[899], result[898] = 101.0
values[1000], kernel[102], result[898], result[897] = 102.0
values[1000], kernel[103], result[897], result[896] = 103.0
values[1000], kernel[104], result[896], result[895] = 104.0
values[1000], kernel[105], result[895], result[894] = 105.0
values[1000], kernel[106], result[894], result[893] = 106.0
values[1000], kernel[107], result[893], result[892] = 107.0
values[1000], kernel[108], result[892], result[891] = 108.0
values[1000], kernel[109], result[891], result[890] = 109.0
values[1001], kernel[100], result[901], result[900] = 100.0
values[1001], kernel[101], result[900], result[899] = 101.0
values[1001], kernel[102], result[899], result[898] = 102.0
values[1001], kernel[103], result[898], result[897] = 103.0
values[1001], kernel[104], result[897], result[896] = 104.0
values[1001], kernel[105], result[896], result[895] = 105.0
...

However, the result array should have length values.count - kernel.count + 1. For example, if instead of using the returned result array, a result array of length values.count - kernel.count + 1 is passed to vDSP.convolve, the last value holds a valid result:

for i in 0 ..< 10 {
  let values = Array.init(repeating: 1.0, count: 1000 + i)
  for j in 0 ..< 10 {
    let kernel = Array.init(repeating: 1.0, count: 100 + j)
    var result = Array.init(repeating: 0.0, count: values.count - kernel.count + 1)
    vDSP.convolve(values, withKernel: kernel, result: &result)
    print("values[\(values.count)], kernel[\(kernel.count)], result[\(result.count)], result[\(result.count - 1)] = \(result[result.count - 1])")
  }
}

values[1000], kernel[100], result[901], result[900] = 100.0
values[1000], kernel[101], result[900], result[899] = 101.0
values[1000], kernel[102], result[899], result[898] = 102.0
values[1000], kernel[103], result[898], result[897] = 103.0
values[1000], kernel[104], result[897], result[896] = 104.0
values[1000], kernel[105], result[896], result[895] = 105.0
values[1000], kernel[106], result[895], result[894] = 106.0
values[1000], kernel[107], result[894], result[893] = 107.0
values[1000], kernel[108], result[893], result[892] = 108.0
values[1000], kernel[109], result[892], result[891] = 109.0
values[1001], kernel[100], result[902], result[901] = 100.0
values[1001], kernel[101], result[901], result[900] = 101.0
values[1001], kernel[102], result[900], result[899] = 102.0
values[1001], kernel[103], result[899], result[898] = 103.0
values[1001], kernel[104], result[898], result[897] = 104.0
values[1001], kernel[105], result[897], result[896] = 105.0

If the result array is created with length values.count - kernel.count + 2, we get the following runtime error:

error: Execution was interrupted, reason: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0).
The process has been left at the point where it was interrupted, use "thread return -x" to return to the state before expression evaluation.

This indicates that the extra element in the result array is valid and that vDSP.convolve() is returning a result array which is one element too short.
Asked
by jolonf.
Last updated
.
Post not yet marked as solved
401 Views

Xcode SSE problem

I was doing SSE performance work on an Intel Mac. I found that the SSE4.1 build's performance under Xcode 12.4 is not as good as under Xcode 10.1, so I checked the assembly generated from my code. A single _mm_mul_epi32() call was translated into three pmuludq instructions (pmuludq is an SSE2 instruction). This was normal when compiling with Xcode 10.1, where _mm_mul_epi32() was translated into pmuldq. Does anyone know how to fix this issue?
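For context, a minimal standalone snippet of the intrinsic in question (assuming the post refers to _mm_mul_epi32 from <smmintrin.h>, since that is the intrinsic that normally lowers to pmuldq); compiling it with SSE4.1 enabled, e.g. clang -O2 -msse4.1, and inspecting the assembly is a quick way to check which instruction a given compiler emits:

#include <smmintrin.h>   /* SSE4.1 intrinsics; Intel Macs only */
#include <stdio.h>

int main(void) {
    /* _mm_mul_epi32 multiplies the low signed 32-bit element of each 64-bit
       lane, producing two signed 64-bit products; with SSE4.1 enabled it is
       expected to compile to a single pmuldq. */
    __m128i a = _mm_set_epi32(0, -3, 0, 7);
    __m128i b = _mm_set_epi32(0,  5, 0, 9);
    __m128i p = _mm_mul_epi32(a, b);

    long long out[2];
    _mm_storeu_si128((__m128i *)out, p);
    printf("%lld %lld\n", out[0], out[1]);   /* expected: 63 -15 */
    return 0;
}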
Asked
by wadewang.
Last updated
.
Post not yet marked as solved
1.3k Views

Why? eGPU support and Big Sur

If you are going to sell your customers on eGPUs by Blackmagic in store, why would you not plan on integrating said eGPU processors in your update to Big Sur? I spent over $1000 on what is now a useless brick of a processing unit. The only solution that Blackmagic and Apple have to offer is to revert my system to Catalina and Studio 16. Big Sur has been out how long now, and still no support for items you are schlepping in store, Apple? Shame on you. Get the fix quick. Your business practices are showing, and I find them offensively grotesque. Keep nickel-and-diming your customers into oblivion as your quality degrades. I haven't considered going back to Windows in decades, but now I am busting out the user manuals. Abandon ship. Apple, what a letdown you have become. You used to be the pride and joy of design. Now you are the bane and boon.
Asked Last updated
.
Post not yet marked as solved
603 Views

How can I perform audio noise reduction like the Voice Memos app?

Recently, the Voice Memos app from Apple got a new feature: a magic wand that performs noise reduction. This noise reduction seems to process live while the recorded audio is playing, since it doesn't pause the played audio. In the Apple documentation there is a single reference to noise reduction (https://developer.apple.com/documentation/accelerate/signal_extraction_from_noise), which works by performing a discrete cosine transform (https://en.wikipedia.org/wiki/Discrete_cosine_transform), removing the unwanted frequencies below a threshold, and then performing the inverse transform. My question is: is this a viable approach for live processing? If yes, how can I perform it? By calling installTap, or maybe by creating a custom AudioUnit?
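For reference, a minimal offline sketch of the technique the linked article describes: forward DCT, zero the coefficients whose magnitude falls below a threshold, then inverse DCT. The transform length, threshold, and final rescaling here are illustrative assumptions, and this sketch says nothing about whether the approach keeps up in real time; that would need to be measured inside an installTap block or a custom AudioUnit.

#include <Accelerate/Accelerate.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024   /* vDSP DCT lengths must be of the form f * 2^n (f = 1, 3, 5, or 15) */

int main(void) {
    static float noisy[N], coeffs[N], cleaned[N];

    /* Synthetic input: a sine wave plus small pseudo-random noise. */
    for (int i = 0; i < N; i++)
        noisy[i] = sinf(2.0f * (float)M_PI * 8.0f * i / N)
                 + 0.05f * ((float)arc4random_uniform(1000) / 500.0f - 1.0f);

    vDSP_DFT_Setup forward = vDSP_DCT_CreateSetup(NULL, N, vDSP_DCT_II);
    vDSP_DFT_Setup inverse = vDSP_DCT_CreateSetup(NULL, N, vDSP_DCT_III);
    if (!forward || !inverse) return 1;

    /* Forward DCT, then zero low-magnitude coefficients (assumed to be noise). */
    vDSP_DCT_Execute(forward, noisy, coeffs);
    const float threshold = 4.0f;
    for (int i = 0; i < N; i++)
        if (fabsf(coeffs[i]) < threshold) coeffs[i] = 0.0f;

    /* Inverse DCT and rescale; 2/N matches the usual unnormalized DCT-II/III
       pair, but the exact factor should be checked against the vDSP docs. */
    vDSP_DCT_Execute(inverse, coeffs, cleaned);
    float scale = 2.0f / N;
    vDSP_vsmul(cleaned, 1, &scale, cleaned, 1, N);

    printf("cleaned[0] = %f\n", cleaned[0]);
    vDSP_DFT_DestroySetup(forward);
    vDSP_DFT_DestroySetup(inverse);
    return 0;
}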
Asked
by miguelfs.
Last updated
.
Post not yet marked as solved
830 Views

Accelerate framework uses only one core on Mac M1

The following C program (dgesv_ex.c)

#include <stdlib.h>
#include <stdio.h>

/* DGESV prototype */
extern void dgesv( int* n, int* nrhs, double* a, int* lda, int* ipiv,
                   double* b, int* ldb, int* info );

/* Main program */
int main() {
    /* Locals */
    int n = 10000, info;
    /* Local arrays */
    /* Initialization */
    double *a = malloc(n*n*sizeof(double));
    double *b = malloc(n*n*sizeof(double));
    int *ipiv = malloc(n*sizeof(int));
    for (int i = 0; i < n*n; i++) {
        a[i] = ((double) rand()) / ((double) RAND_MAX) - 0.5;
    }
    for (int i = 0; i < n*n; i++) {
        b[i] = ((double) rand()) / ((double) RAND_MAX) - 0.5;
    }
    /* Solve the equations A*X = B */
    dgesv( &n, &n, a, &n, ipiv, b, &n, &info );
    free(a);
    free(b);
    free(ipiv);
    exit( 0 );
}
/* End of DGESV Example */

compiled on a Mac mini M1 with the command

clang -o dgesv_ex dgesv_ex.c -framework accelerate

uses only one core of the processor (as also shown by Activity Monitor):

me@macmini-M1 ~ % time ./dgesv_ex
./dgesv_ex  35,54s user 0,27s system 100% cpu 35,758 total

I checked that the binary is of the right type:

me@macmini-M1 ~ % lipo -info dgesv
Non-fat file: dgesv is architecture: arm64

As a comparison, on my Intel MacBook Pro I get the following output:

me@macbook-intel ~ % time ./dgesv_ex
./dgesv_ex  142.69s user 0,51s system 718% cpu 19.925 total

Is this a known problem? Maybe a compilation flag or something else?
Asked
by mottelet.
Last updated
.
Post not yet marked as solved
1.2k Views

DFT vs FFT in Accelerate Framework vDSP

Hi all, I'm implementing an FFT using the Accelerate framework's vDSP functions. I noticed the comment on the FFT that says "DFT should be used instead where possible". Does anyone know the reasoning for this? Traditionally, the FFT is the fast implementation of the DFT. Just wondering whether I should use the DFT functions if they are faster than the FFT. Cheers!
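For reference, a minimal sketch of the DFT interface the vDSP headers point to (vDSP_DFT_zop_CreateSetup / vDSP_DFT_Execute); the length and input used here are illustrative. The recommendation in the headers is about the newer, more flexible setup API rather than the underlying math: the DFT routines are free to pick a fast FFT-class algorithm internally where one exists.

#include <Accelerate/Accelerate.h>
#include <math.h>
#include <stdio.h>

#define N 1024   /* supported lengths are f * 2^n for small odd f */

int main(void) {
    static float inReal[N], inImag[N], outReal[N], outImag[N];

    /* Simple test input: a single real-valued cosine at bin 16. */
    for (int i = 0; i < N; i++) {
        inReal[i] = cosf(2.0f * (float)M_PI * 16.0f * i / N);
        inImag[i] = 0.0f;
    }

    /* Create a forward complex-to-complex DFT setup and execute it. */
    vDSP_DFT_Setup setup = vDSP_DFT_zop_CreateSetup(NULL, N, vDSP_DFT_FORWARD);
    if (!setup) return 1;
    vDSP_DFT_Execute(setup, inReal, inImag, outReal, outImag);

    /* Bin 16 should dominate the spectrum for the cosine above. */
    printf("bin 16 magnitude: %f\n", hypotf(outReal[16], outImag[16]));

    vDSP_DFT_DestroySetup(setup);
    return 0;
}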
Asked
by vesap.
Last updated
.