Accelerate

Perform large-scale mathematical computations and image calculations with high performance and energy efficiency using Accelerate.

Posts under Accelerate tag

84 Posts

Post

Replies

Boosts

Views

Activity

How to sort the indices of Slice<UnsafeMutableBufferPointer<Double>> using vDSP?
I’m using buffer in many other vDSP operations before the code below. To those operations it is naturally a contiguous piece of memory, which is handy because it vastly reduces the number of calls to vDSP.

    // […] many operations with buffer here
    var buffer = UnsafeMutableBufferPointer<Double>.allocate(capacity: 10000000)
    var sortedIndices: UnsafeMutablePointer<UInt> = UnsafeMutablePointer<UInt>.allocate(capacity: 10000000)
    for (indexIndices, indexC) in buffer.indices.enumerated() {
        sortedIndices[indexIndices] = UInt(indexC)
    }
    vDSP_vsortiD(
        UnsafePointer<Double>(buffer.baseAddress)!,
        sortedIndices,
        nil,
        UInt(buffer.count),
        vDSP.SortOrder.descending.rawValue
    )

buffer needs to be sorted in slices (a grouped sort, where each Slice corresponds to some special attribute of the value and needs to be sorted separately because the resulting data is used separately later). Copying the buffer’s contents into smaller buffers before every sort has to be avoided, as performance is critical (the sort runs as many times as possible per second).

vDSP_vsortiD() and vDSP_vsorti() do not support Slice. For example, this code is not possible, since .baseAddress doesn’t exist on a Slice and I guess UnsafePointer doesn’t understand Slice either.

    var sliceOfBuffer: Slice<UnsafeMutableBufferPointer<Double>> = buffer[0...30000]
    vDSP_vsortiD(
        UnsafePointer<Double>(sliceOfBuffer.baseAddress)!, // This fails
        sortedIndices[0...30000], // This also fails
        nil,
        UInt(30000),
        vDSP.SortOrder.descending.rawValue
    )
    // Is there another sort indices function which accepts those types as params?

Many operations under vDSP accept Slice; however, the above sort functions don’t. Swift v5.6
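One way to think about the grouped sort: an index sort only needs a pointer to the first element of a group and a count, so a group inside a contiguous buffer can be addressed with pointer arithmetic instead of a Slice, and nothing is copied. The sketch below is plain standard C++ rather than vDSP; the function name `sortGroupIndicesDescending` is hypothetical, and it assumes the resulting indices should be relative to the start of the group. Whether passing an offset base address to vDSP_vsortiD behaves the same way would need to be checked against the vDSP documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>

// Hypothetical helper (not a vDSP function): index-sorts one group of a
// contiguous buffer in descending order of value, without copying the values.
// The group is data[offset] ... data[offset + count - 1].
// `indices` must have room for `count` entries; on return, indices[k] is the
// position (relative to `offset`) of the k-th largest value in the group.
void sortGroupIndicesDescending(const double* data, std::size_t offset,
                                std::size_t count, std::size_t* indices) {
    assert(data != nullptr && indices != nullptr);
    const double* group = data + offset;  // pointer arithmetic only, no copy
    std::iota(indices, indices + count, std::size_t{0});
    std::stable_sort(indices, indices + count,
                     [group](std::size_t a, std::size_t b) {
                         return group[a] > group[b];
                     });
}
```

Calling this once per group, with each group's own offset and count, sorts every group independently while the values stay in the single large buffer.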
1
1
1.3k
Sep ’22
Random hangs on M1 in Apple Accelerate when performing sparse Cholesky factorization
Hello, I'm trying to use Accelerate's sparse Cholesky solver but ran into an issue on M1. Is anyone aware of this issue or similar ones?

The sparse factorization class _SparseNumericFactorSymmetric in Apple Accelerate hangs randomly for some large symmetric positive definite matrices with a specific sparsity pattern when created for Cholesky factorization. The hang occurs with a probability of about 1/1000. It can be reproduced with the following steps:

1. Read and load the attached sparse matrix (in MatrixMarket format).
2. In a for-loop, repeatedly factorize the same matrix through _SparseNumericFactorSymmetric 1000~2000 times.

Matrix: https://www.dropbox.com/s/2pyl0cpmgy1qdrh/mat.mtx.zip?dl=0

The following code calls Apple Accelerate through a wrapper from Eigen 3.4.9 (https://eigen.tuxfamily.org/dox/group__AccelerateSupport__Module.html):

    #include <unsupported/Eigen/SparseExtra>
    #include <Eigen/AccelerateSupport>
    #include <iostream>

    int main() {
        Eigen::SparseMatrix<double> A;
        Eigen::loadMarket(A, "mat.mtx");
        for (int i = 0; i < 2000; ++i) {
            Eigen::AccelerateLLT<Eigen::SparseMatrix<double>> solver;
            solver.compute(A);
            std::cout << i << std::endl;
        }
        return 0;
    }

The factorizations should complete for the number of iterations specified in the for-loop. Instead, the app hangs and stays unresponsive indefinitely after performing the factorization hundreds of times (sometimes ~500+ loops, sometimes ~900+ loops).

I'm using Xcode Version 13.1 (13A1030d) on macOS Monterey 12.5.1 (21G83), with an Apple M1 Max. The issue seems to be related to multithreading in vecLib, since it only happens when the environment variable is left unset (i.e., VECLIB_MAXIMUM_THREADS > 1). Setting VECLIB_MAXIMUM_THREADS = 1 eliminates the issue at the cost of performance. The issue only happens on M1 Macs, not on Intel-based ones.
0
0
1.1k
Aug ’22
CVPixelBufferGetBytesPerRowOfPlane strange result
Hello everyone. I'm decoding an H264 elementary stream with VideoToolbox. The VTDecompressionSession's output callback returns a CVImageBuffer successfully. While attempting to convert this YUV image to RGB, I prepare a luma vImage_Buffer and chroma vImage_Buffer, but I get strange values from CVPixelBufferGetBytesPerRowOfPlane:

    CVPixelBufferGetBaseAddressOfPlane(0) 0x10e98c000
    CVPixelBufferGetHeightOfPlane(0) 1080
    CVPixelBufferGetWidthOfPlane(0) 1440
    CVPixelBufferGetBytesPerRowOfPlane(0) 46080
    CVPixelBufferGetBaseAddressOfPlane(1) 0x10eb2b000
    CVPixelBufferGetHeightOfPlane(1) 540
    CVPixelBufferGetWidthOfPlane(1) 720
    CVPixelBufferGetBytesPerRowOfPlane(1) 23040

If I use this value (46080) in initializing the vImage_Buffer's rowBytes field, vImageConvert_420Yp8_CbCr8ToARGB8888 crashes. If I simply set rowBytes=width, the conversion is successful. Has anyone seen this before? Notice the value returned is 32 × width. Why?
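For context on how a rowBytes value is normally used: it is the distance in bytes from the start of one row to the start of the next, and it may be larger than the visible width when rows are padded. The sketch below is plain C++ with a hypothetical helper name (`copyPlaneTightly`), not Core Video or vImage API; it copies a padded 8-bit plane into a tightly packed one. It only illustrates stride addressing and does not explain why the value reported here is 32 × width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies an 8-bit plane whose rows start `rowBytes`
// apart into a packed buffer whose rows are exactly `width` bytes apart.
std::vector<std::uint8_t> copyPlaneTightly(const std::uint8_t* base,
                                           std::size_t width,
                                           std::size_t height,
                                           std::size_t rowBytes) {
    assert(base != nullptr);
    assert(rowBytes >= width);  // a smaller stride would make rows overlap
    std::vector<std::uint8_t> packed(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        // Row y begins y * rowBytes bytes after the base address;
        // only the first `width` bytes of each row are pixel data.
        std::memcpy(packed.data() + y * width, base + y * rowBytes, width);
    }
    return packed;
}
```

With a correct stride, reading row y at base + y * rowBytes never runs past the allocation; a stride that is too large for the actual allocation would, which is one way a conversion routine can crash.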
2
0
1.3k
Aug ’22
Symbol not found vDSP.dot
Hi there, I get the following crash during application start on iOS below 15.0, even in the simulator:

    dyld: Symbol not found: _$s10Accelerate4vDSPO3dotySfx_q_tAA0A6BufferRzAaER_Sf7ElementRtzSfAFRt_r0_lFZ
      Referenced from: /Users/Nikita/Library/Developer/CoreSimulator/Devices/21071B75-CC1F-46E5-94B3-5E01072B90C6/data/Containers/Bundle/Application/73D74763-449A-4E20-B7EB-8807008B335B/vDSP test.app/vDSP test
      Expected in: /usr/lib/swift/libswiftAccelerate.dylib

The sample code is very simple and looks like this:

    let vector1: [Float] = [0, 1, 2, 3, 4, 5, 6, 7]
    let vector2: [Float] = [0, 1, 2, 3, 4, 5, 6, 7]

    // Causes crash on application start
    let result = vDSP.dot(vector1, vector2)

    // Works fine
    let resultMul = vDSP.multiply(vector1, vector2)
    let result1 = vDSP.sum(resultMul)

According to the docs, vDSP.dot is available since iOS 13.0.
3
0
1.3k
Aug ’22
How to update coefficients of vDSP_biquad_SetupD?
Hi there, I'm writing some audio plug-ins that use biquad filtering of incoming audio. The audio is supplied to me as vectors of doubles. I am using the Accelerate functions vDSP_biquad_CreateSetupD, vDSP_biquad_DestroySetupD and vDSP_biquadD on a vDSP_biquad_SetupD struct. When the user changes the filter parameters, I want to update the coefficients of the biquad filter. I assumed that I would be able to use the new vDSP_biquad_SetCoefficientsDouble function, but that requires a vDSP_biquad_Setup rather than a vDSP_biquad_SetupD (i.e. a single-precision object, rather than the double-precision object that I would have thought it would require). Is that an error? How do I update the coefficients of a double-precision object? Thanks in advance, Michael
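For reference, a single biquad section computes y[n] = b0·x[n] + b1·x[n-1] + b2·x[n-2] - a1·y[n-1] - a2·y[n-2]. The sketch below is a plain C++ double-precision section in direct form I; the `BiquadD` type and its members are hypothetical and are not the vDSP setup object. It shows the behavior being asked for: the coefficients are replaced while the delay state is kept, so the filter does not have to be recreated when a parameter changes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical double-precision biquad section (direct form I), not a vDSP type.
struct BiquadD {
    double b0 = 1.0, b1 = 0.0, b2 = 0.0, a1 = 0.0, a2 = 0.0;  // coefficients
    double x1 = 0.0, x2 = 0.0, y1 = 0.0, y2 = 0.0;            // delay state

    // Replaces the coefficients without clearing the delay state.
    void setCoefficients(double nb0, double nb1, double nb2,
                         double na1, double na2) {
        b0 = nb0; b1 = nb1; b2 = nb2; a1 = na1; a2 = na2;
    }

    // Filters `count` samples from `in` into `out`.
    void process(const double* in, double* out, std::size_t count) {
        assert(in != nullptr && out != nullptr);
        for (std::size_t n = 0; n < count; ++n) {
            const double x0 = in[n];
            const double y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
            x2 = x1; x1 = x0;  // shift the input history
            y2 = y1; y1 = y0;  // shift the output history
            out[n] = y0;
        }
    }
};
```

A default-constructed `BiquadD` passes audio through unchanged; calling `setCoefficients` between two `process` calls changes the response from the next sample on, using the history left by the previous block.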
1
0
1.1k
Jun ’22
reading ldoor matrix and incomplete factorization?
I tried to read in the ldoor matrix and attempted the LLT factorization, but it gives me: "parseLdoor[55178:5595352] Factored does not hold a completed matrix factorization. (lldb)" Because the ldoor matrix is large, I have not been able to discover the issue. I am unsure whether the matrix data was converted correctly by the SparseConvertFromCoordinate function. On the other hand, I was able to use the same code to get the correct answers for the simple 4x4 example used in the Sparse Solver documentation. Any help would be appreciated. Here is my code ... without the ldoor matrix
2
0
1.1k
Apr ’22
Why does Accelerate appear so out of place in terms of naming style?
A book I'm reading gives a solution for adding the elements of an input array of doubles, with this example using Accelerate:

    func challenge52c(numbers: [Double]) -> Double {
        var result: Double = 0.0
        vDSP_sveD(numbers, 1, &result, vDSP_Length(numbers.count))
        return result
    }

I can understand why Accelerate APIs don't adhere to the Swift API design guidelines, but why is it that they don't seem to follow the Cocoa guidelines either? Are there other conventions or precedents that I'm missing?
2
0
935
Apr ’22
vDSP.correlate(_:withKernel:) — meaning of output
My clients are medical researchers studying methods for characterizing patients' gait from raw accelerometry by matching the data stream against a set of "templates" for the various characteristics. Their (labyrinthine) pseudocode appears to slide a snippet (the "template"; a kernel?) across a data stream, looking for goodness-of-fit by correlation coefficient. This is done for each of several templates, so performance is at a premium.

As I read the name and the 13-word description, vDSP.correlate(_:withKernel:) does this, in some way. However, the numbers that emerge from my playground experiments don't make sense:

- Identical segments are scored 0.0 (should be 1.0, right?).
- Merely similar matches show values barely distinguishable from the rest, and are often well outside the range -1.0 ... 1.0.

Clearly I'm doing it wrong. Web searches don't tell me anything, but I'm naïve on the subject. Am I mistaken in hoping this vDSP function does what I want? "Yes, you're mistaken" is an acceptable answer. Bonus if you can point me to a correct solution. If I'm on the right track, how can I generate inputs so I can interpret the output as fits my needs?

Note: Both streams are normalized to µ = 0.0 and σ = 1.0 by vDSP, and validated by all the unit tests I've done so far.
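As background for reading such output: a raw sliding correlation is a sum of products, result[n] = Σ signal[n + p] · kernel[p], so its values scale with the window length and amplitude and are not confined to -1 ... 1. A correlation coefficient additionally subtracts each window's own mean and divides by the standard deviations of the window and the kernel; normalizing the whole stream once does not do that per window. The sketch below is plain C++ with a hypothetical function name (`slidingPearson`), not a vDSP call, and it assumes the Pearson coefficient is the goodness-of-fit measure the pseudocode intends.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: Pearson correlation coefficient between `kernel` and
// every full-overlap window of `signal`.
// Returns signal.size() - kernel.size() + 1 values, each within [-1, 1].
// A window (or kernel) with zero variance yields 0 by convention here.
std::vector<double> slidingPearson(const std::vector<double>& signal,
                                   const std::vector<double>& kernel) {
    assert(!kernel.empty() && kernel.size() <= signal.size());
    const std::size_t p = kernel.size();
    const std::size_t outCount = signal.size() - p + 1;

    // Mean and sum of squared deviations of the kernel (computed once).
    double kMean = 0.0;
    for (double k : kernel) kMean += k;
    kMean /= static_cast<double>(p);
    double kVar = 0.0;
    for (double k : kernel) kVar += (k - kMean) * (k - kMean);

    std::vector<double> result(outCount, 0.0);
    for (std::size_t n = 0; n < outCount; ++n) {
        // Mean of the current window.
        double wMean = 0.0;
        for (std::size_t i = 0; i < p; ++i) wMean += signal[n + i];
        wMean /= static_cast<double>(p);

        // Covariance with the kernel and squared deviations of the window.
        double cov = 0.0, wVar = 0.0;
        for (std::size_t i = 0; i < p; ++i) {
            const double dw = signal[n + i] - wMean;
            cov += dw * (kernel[i] - kMean);
            wVar += dw * dw;
        }
        const double denom = std::sqrt(wVar * kVar);
        result[n] = denom > 0.0 ? cov / denom : 0.0;
    }
    return result;
}
```

This direct version costs on the order of signal length × kernel length per template; it is meant to pin down the expected numbers (1.0 for an identical segment, -1.0 for an inverted one), not to be the fast path.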
2
0
2.0k
Mar ’22
vImage vs CoreImage vs MetalPerformanceShaders strengths and weaknesses
While the above three frameworks (viz. vImage, CoreImage, and MetalPerformanceShaders) serve different overall purposes, what are the strengths and weaknesses of each of the three in terms of performance with respect to image processing? It seems that any of the three frameworks is highly performant; but where does each framework shine?
1
3
1.4k
Mar ’22
Undefined symbols for architecture arm64: "(extension in Accelerate)
Hi, Xcode fails to build the very simple code shown below ONLY IF the build configuration is Debug, and produces:

    "Undefined symbols for architecture arm64:  "(extension in Accelerate):Accelerate.AccelerateMutableBuffer< where A: Swift.MutableCollection>.withUnsafeMutableBufferPointer((inout Swift.UnsafeMutableBufferPointer<A.Accelerate.AccelerateBuffer.Element>) throws -> A1) throws -> A1", referenced "

If the build configuration is Release, the build succeeds and the app runs just fine. Note: The code below is just a sample to reproduce the issue easily. I don't want to use the Accelerate framework or those pointer functions in viewDidLoad() for the actual project.

    import UIKit
    import Accelerate

    class ViewController: UIViewController {
        override func viewDidLoad() {
            super.viewDidLoad()
            // Do any additional setup after loading the view.

            let b = UnsafeMutablePointer<Float>.allocate(capacity: 10)
            var c = UnsafeMutableBufferPointer(start: b, count: 10)
            c.withUnsafeMutableBufferPointer{ buf in
                let base = buf.baseAddress
                print("test ",base!)
            }
        }
    }

The Xcode version is 13.2.1 and the target iOS version is 15.2. The only workaround I know for now is to build for Release, but I really need the debugger for the real project I'm working on. Any help, advice or comment would be appreciated. Best Regards, Hikaru
1
0
1.3k
Feb ’22
Accelerate FFT: Intermittent crashes
I've implemented an FFT using the Accelerate framework and I'm not sure I've done it correctly. For starters, I don't like that the imaginary array is filled with zeroes; this seems to be a waste of memory. However, the more serious issue is that I get intermittent crashes (malloc errors) when using the vDSP API. I've read the online docs and tried to follow several online samples. Could somebody more knowledgeable with these APIs have a look? See attached file: FFT.swift
3
0
1.2k
Feb ’22
How to update destinationBuffer's width and height during a video buffer stream
I am using the Accelerate framework to convert YUV data to ARGB data for a video call app. The framework works great. However, when I hold calls I use a placeholder image sent from the server. That image sometimes causes issues because of its size. Accelerate is telling me that its region of interest is larger than the input buffer (roiLargerThanInputBuffer). I am not sure exactly how to address this issue. Any thoughts or suggestions would be greatly appreciated.

The problem was that my video buffer stream's pixel buffer width and height changed on the server side. That being said, all that needed to be done was to check for when it changes, then remove the current vImage_Buffer from memory and initialize a new one with the correct size. Is it proper to tell the Accelerate framework to change the vImage_Buffer width and height this way? It seems to work well.

    if myBuffer.height != destinationBuffer.height ||
        myBuffer.width != destinationBuffer.width {
        free(destinationBuffer.data)
        // vImageBuffer_Init takes the height first, then the width.
        error = vImageBuffer_Init(&destinationBuffer,
                                  vImagePixelCount(myBuffer.height),
                                  vImagePixelCount(myBuffer.width),
                                  cgImageFormat.bitsPerPixel,
                                  vImage_Flags(kvImageNoFlags))
        guard error == kvImageNoError else {
            return nil
        }
    }

Thanks
3
0
1.1k
Feb ’22
Using BLAS and LAPACK
I am really lost and very much a newbie to programming. I am trying to use the BLAS and LAPACK libraries. I am programming in VS Code, and when I try to run my code at the terminal I use this command: g++ MV_mult_sequential.cpp -I/usr/local/include -L/usr/local/lib -llapack -lblas. I think there is some way I could be using Accelerate to do this, but again I have no clue. I have no idea if I'm doing anything correctly. I could use some guidance.
2
0
3.1k
Jan ’22
How to speed up open source packages on M1 Max with Accelerate and Metal
Please help me, this is really urgent. Compatibility problems with the M1 Max chip have cost me hundreds of hours.

1. Please show me how to speed up source downloaded from GitHub, such as numpy, pandas or any other source, by fully using the CPU and GPU (python3.8 and 3.9). Can I do it like this?
   Step 1: download the source from GitHub
   Step 2: create a file named "site.cfg" in the source directory, and add the content: [accelerate] libraries=Metal, Acelerate, vecLib
   Step 3: in Terminal: NPY_LAPACK_Order=accelerate python3 setup.py build
   Step 4: pip3 install . or python3 setup.py install? (I am not sure which method to apply)
2. How is the compatibility of Accelerate and Metal? Do they work with most source? Any tips? For example https://github.com/microsoft/qlib
3. Which gcc should I install? Show me the code. When I do it, some errors happen: gcc (version 4.2.1 installed by brew) cannot compile some source, such as "ecos". Moreover, I cannot compile many sources directly with python3 setup.py install (without accelerate). How do I configure gcc? Which version should I use on M1 Max?
4. Sometimes I can compile source with brew, but it is extremely inconvenient, because I need to install packages in a virtual environment (e.g. a conda env) rather than in the base path. What should I do? Can I install brew in a virtual environment? Or just use brew to build the source and then install it with pip in the virtual env? Or can I configure brew to install only in the virtual environment? Just show me the code.
5. To compile, do I also need to install g++? Which version? Show me the code.
6. Show me how to speed up a Python program with the GPU and parallel computing on Accelerate.
2
0
2k
Jan ’22
BNNSLayerParametersLSTM with hiddenSize != inputSize
Hi all, I've spent some time experimenting with the BNNS (Accelerate) LSTM-related APIs lately, and despite a distinct lack of documentation (even though the headers have quite a bit) I got most things to a point where I think I know what's going on and I get the expected results. However, one thing I have not been able to do is to get this working if inputSize != hiddenSize. I am currently only concerned with a simple unidirectional LSTM with a single layer, but none of my permutations of gate "iw_desc" matrices with various 2D layouts and reordered input-size/hidden-size made any difference; ultimately BNNSDirectApplyLSTMBatchTrainingCaching always returns -1 as an indication of error. Any help would be greatly appreciated. PS: The bnns.h framework header file claims that "When a parameter is invalid or an internal error occurs, an error message will be logged. Some combinations of parameters may not be supported. In that case, an info message will be logged.", and yet I've not been able to find any such messages logged to NSLog() or stderr or Console. Is there a magic environment variable that I need to set to get more verbose logging?
0
0
793
Dec ’21
Why does the execution of vDSP operations sometimes take longer in M1 native code than through Rosetta translation?
Hi, I am porting some applications to M1 that make extensive use of vDSP. I found in many cases there to be a minimal speed-up, which I put down to Rosetta doing a good job translating SSE instructions into equivalent Neon instructions in the vDSP library. To try to understand this better, I started profiling various areas of code and have found situations where translated code runs faster than native code. Often native code speed is similar or faster, as expected, but there is a notable number of cases where it is not. This is not what I expected.

I include a sample below to show a somewhat contrived and trivial routine exhibiting the effect. I have built it using Xcode 12.5.1 in Release with an 11.3 deployment target. The Mac is running macOS 11.6. On my M1 Mac mini the Rosetta build takes around 900-1000 µs to run to completion; switching to native code, it takes around 1500-1600 µs.

I can make various adjustments to the data size or types of vDSP operations used to find scenarios where native builds are faster; that is not difficult, but it shouldn't be necessary. I can understand why vDSP could perhaps perform similarly across native vs translated runs, but surely it should never be the case that translated code could beat native code by a margin like this. What is going on, and is it expected?

Thanks, Matt

    #include <Accelerate/Accelerate.h>
    #include <cerrno>
    #include <cstdlib>
    #include <cstring>
    #include <iostream>
    #include <time.h>
    #include <sys/types.h>
    #include <sys/sysctl.h>

    // determine if process is running through Rosetta translation
    int processIsTranslated()
    {
      int ret = 0;
      size_t size = sizeof(ret);
      if (sysctlbyname("sysctl.proc_translated", &ret, &size, NULL, 0) == -1)
      {
        if (errno == ENOENT)
          return 0;
        return -1;
      }
      return ret;
    }

    int main(int argc, const char * argv[])
    {
      // print translation status
      if(processIsTranslated() == 1)
        std::cout << "Rosetta" << std::endl;
      else
        std::cout << "Native" << std::endl;

      // size of test
      vDSP_Length array_len = 512;
      const int iterations = 10000;

      // allocate and clear memory
      float* buf1_ptr = (float*)malloc(array_len * sizeof(float));
      float* buf2_ptr = (float*)malloc(array_len * sizeof(float));
      float* buf3_ptr = (float*)malloc(array_len * sizeof(float));
      float* buf4_ptr = (float*)malloc(array_len * sizeof(float));
      if(!buf1_ptr) return EXIT_FAILURE;
      if(!buf2_ptr) return EXIT_FAILURE;
      if(!buf3_ptr) return EXIT_FAILURE;
      if(!buf4_ptr) return EXIT_FAILURE;
      memset(buf1_ptr, 0, array_len * sizeof(float));
      memset(buf2_ptr, 0, array_len * sizeof(float));
      memset(buf3_ptr, 0, array_len * sizeof(float));
      memset(buf4_ptr, 0, array_len * sizeof(float));

      // start timer
      __uint64_t start_ns = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);

      // scalar constants
      const float scalar1 = 10;
      const float scalar2 = 11;

      // loop test
      for(int i = 0; i < iterations; i++)
      {
        vDSP_vsadd(buf1_ptr, 1, &scalar1, buf2_ptr, 1, array_len);
        vDSP_vsadd(buf1_ptr, 1, &scalar2, buf3_ptr, 1, array_len);
        vDSP_vadd(buf2_ptr, 1, buf3_ptr, 1, buf4_ptr, 1, array_len);
      }

      // report test time
      __uint64_t end_ns = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);
      double time_us = (end_ns - start_ns) / 1000.0;
      std::cout << time_us << " us" << std::endl;

      // clean up
      if(buf1_ptr) free(buf1_ptr);
      if(buf2_ptr) free(buf2_ptr);
      if(buf3_ptr) free(buf3_ptr);
      if(buf4_ptr) free(buf4_ptr);

      return 0;
    }
1
0
1.1k
Oct ’21
vDSP.convolve incorrectly reverses kernel?
vDSP.convolve() reverses the kernel before applying it. For example, the following uses a kernel of 10 elements where the first element is 1.0 and the rest of the elements are 0.0. Applying this kernel to a vector should return the same vector.

    let values = (0 ..< 30).map { Double($0) }
    var kernel = Array.init(repeating: 0.0, count: 10)
    kernel[0] = 1.0
    let result = vDSP.convolve(values, withKernel: kernel)
    print("kernel: \(kernel)")
    print("values: \(values)")
    print("result: \(result)")

Applied to a values array containing elements 0.0, 1.0, 2.0, etc. the first results should be 0.0, 1.0, 2.0, etc, but instead the results start at 9.0 and increase from there:

    kernel: [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    values: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0]
    result: [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0]

If instead the kernel is reversed, placing the 1.0 at the end of the kernel:

    let values = (0 ..< 30).map { Double($0) }
    var kernel = Array.init(repeating: 0.0, count: 10)
    kernel[9] = 1.0
    let result = vDSP.convolve(values, withKernel: kernel)
    print("kernel: \(kernel)")
    print("values: \(values)")
    print("result: \(result)")

The results are now correct:

    kernel: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    values: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0]
    result: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
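For comparison, the textbook definition of convolution reverses the kernel, while correlation applies it as written. The sketch below is plain C++ with hypothetical function names (`correlateValid`, `convolveValid`); it is not the vDSP implementation. Applied to the example above (values 0 through 29, a 10-element kernel with 1.0 in the first position), the convolution starts at 9.0 and the correlation starts at 0.0, which is consistent with the printed output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Correlation over full-overlap positions:
//   result[n] = sum over p of values[n + p] * kernel[p]
std::vector<double> correlateValid(const std::vector<double>& values,
                                   const std::vector<double>& kernel) {
    assert(!kernel.empty() && kernel.size() <= values.size());
    const std::size_t p = kernel.size();
    std::vector<double> result(values.size() - p + 1, 0.0);
    for (std::size_t n = 0; n < result.size(); ++n)
        for (std::size_t i = 0; i < p; ++i)
            result[n] += values[n + i] * kernel[i];
    return result;
}

// Convolution over the same positions: the same sum with the kernel reversed,
//   result[n] = sum over p of values[n + P - 1 - p] * kernel[p]
std::vector<double> convolveValid(const std::vector<double>& values,
                                  const std::vector<double>& kernel) {
    const std::vector<double> reversed(kernel.rbegin(), kernel.rend());
    return correlateValid(values, reversed);
}
```

Note that these helpers return 21 values for the 30-element input, one more than the 20 values printed above.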
2
0
881
Aug ’21
vDSP.convolve returns wrong sized array?
vDSP.convolve() returns an array with length:

    values.count - kernel.count

But shouldn't the result array have length:

    values.count - kernel.count + 1

I ran the following, which prints out the size of the results array for various combinations of values and kernel lengths:

    for i in 0 ..< 10 {
        let values = Array.init(repeating: 1.0, count: 1000 + i)
        for j in 0 ..< 10 {
            let kernel = Array.init(repeating: 1.0, count: 100 + j)
            let result = vDSP.convolve(values, withKernel: kernel)

            print("values[\(values.count)], kernel[\(kernel.count)], result[\(result.count)], result[\(result.count - 1)] = \(result[result.count - 1])")
        }
    }

As you can see, the results array always has length values.count - kernel.count:

    values[1000], kernel[100], result[900], result[899] = 100.0
    values[1000], kernel[101], result[899], result[898] = 101.0
    values[1000], kernel[102], result[898], result[897] = 102.0
    values[1000], kernel[103], result[897], result[896] = 103.0
    values[1000], kernel[104], result[896], result[895] = 104.0
    values[1000], kernel[105], result[895], result[894] = 105.0
    values[1000], kernel[106], result[894], result[893] = 106.0
    values[1000], kernel[107], result[893], result[892] = 107.0
    values[1000], kernel[108], result[892], result[891] = 108.0
    values[1000], kernel[109], result[891], result[890] = 109.0
    values[1001], kernel[100], result[901], result[900] = 100.0
    values[1001], kernel[101], result[900], result[899] = 101.0
    values[1001], kernel[102], result[899], result[898] = 102.0
    values[1001], kernel[103], result[898], result[897] = 103.0
    values[1001], kernel[104], result[897], result[896] = 104.0
    values[1001], kernel[105], result[896], result[895] = 105.0
    ...

However, the result array should have length values.count - kernel.count + 1. For example, if instead of using the returned result array, a result array with length values.count - kernel.count + 1 is passed to vDSP.convolve, the last value has a valid result:

    for i in 0 ..< 10 {
        let values = Array.init(repeating: 1.0, count: 1000 + i)
        for j in 0 ..< 10 {
            let kernel = Array.init(repeating: 1.0, count: 100 + j)
            var result = Array.init(repeating: 0.0, count: values.count - kernel.count + 1)
            vDSP.convolve(values, withKernel: kernel, result: &result)

            print("values[\(values.count)], kernel[\(kernel.count)], result[\(result.count)], result[\(result.count - 1)] = \(result[result.count - 1])")
        }
    }

    values[1000], kernel[100], result[901], result[900] = 100.0
    values[1000], kernel[101], result[900], result[899] = 101.0
    values[1000], kernel[102], result[899], result[898] = 102.0
    values[1000], kernel[103], result[898], result[897] = 103.0
    values[1000], kernel[104], result[897], result[896] = 104.0
    values[1000], kernel[105], result[896], result[895] = 105.0
    values[1000], kernel[106], result[895], result[894] = 106.0
    values[1000], kernel[107], result[894], result[893] = 107.0
    values[1000], kernel[108], result[893], result[892] = 108.0
    values[1000], kernel[109], result[892], result[891] = 109.0
    values[1001], kernel[100], result[902], result[901] = 100.0
    values[1001], kernel[101], result[901], result[900] = 101.0
    values[1001], kernel[102], result[900], result[899] = 102.0
    values[1001], kernel[103], result[899], result[898] = 103.0
    values[1001], kernel[104], result[898], result[897] = 104.0
    values[1001], kernel[105], result[897], result[896] = 105.0

If the result array is created with length values.count - kernel.count + 2, then we get the following runtime error:

    error: Execution was interrupted, reason: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0). The process has been left at the point where it was interrupted, use "thread return -x" to return to the state before expression evaluation.

This indicates that the extra element in the result array is valid and that vDSP.convolve() is returning a result array which is one element too short.
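The length argument can be checked by counting positions: a kernel of P elements fits entirely inside N values at offsets 0 through N - P, which is N - P + 1 positions. A minimal plain C++ sketch for the all-ones case used above follows; the function name `slidingSum` is hypothetical and this is not vDSP code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum of every full-overlap window of `windowLength`
// consecutive values. For all-ones input this matches correlating or
// convolving with an all-ones kernel of the same length.
std::vector<double> slidingSum(const std::vector<double>& values,
                               std::size_t windowLength) {
    assert(windowLength > 0 && windowLength <= values.size());
    // Window offsets run from 0 to values.size() - windowLength inclusive.
    const std::size_t outCount = values.size() - windowLength + 1;
    std::vector<double> result(outCount, 0.0);
    for (std::size_t n = 0; n < outCount; ++n)
        for (std::size_t i = 0; i < windowLength; ++i)
            result[n] += values[n + i];
    return result;
}
```

For 1000 values and a window of 100 this yields 901 sums, each 100.0; the last window covers elements 900 through 999, so the final output element is backed entirely by valid input.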
1
0
850
Aug ’21
How to sort the indices of Slice<UnsafeMutableBufferPointer<Double>> using vDSP?
I’m using buffer in many other vDSP operations before the code below. To those operations it is naturally a contiguous piece of memory, and that’s handy as there are vastly reduced number of calls to vDSP. // […] many operations with buffer here var buffer = UnsafeMutableBufferPointer<Double>.allocate(capacity: 10000000) var sortedIndices: UnsafeMutablePointer<UInt> = UnsafeMutablePointer<UInt>.allocate(capacity: 10000000) for (indexIndices, indexC) in buffer.indices.enumerated() { sortedIndices[indexIndices] = UInt(indexC) } vDSP_vsortiD( UnsafePointer<Double>(buffer.baseAddress)!, sortedIndices, nil, UInt(buffer.count), vDSP.SortOrder.descending.rawValue ) buffer needs to be sorted sliced up (a grouped sort, where each Slice corresponds to some special attribute of the value and it needs to be sorted separately because the resulting data is used separately later). Also, copying the buffer’s contents into smaller buffers before every sort has to be avoided as performance is critical (as many times as possible per second). vDSP_vsortiD() and vDSP_vsorti() do not support Slice. For example this code is not possible since .baseAddress doesn’t exist on a Slice and I guess UnsafePointer doesn’t understand Slice either. var sliceOfBuffer: Slice<UnsafeMutableBufferPointer<Double>> = buffer[0...30000] vDSP_vsortiD( UnsafePointer<Double>(sliceOfBuffer.baseAddress)!, // This fails sortedIndices[0...30000], // This also fails nil, UInt(30000), vDSP.SortOrder.descending.rawValue ) // Is there another sort indices function which accepts those types as params? Many operations under vDSP accept Slice, however the above sort functions don’t. Swift v5.6
Replies
1
Boosts
1
Views
1.3k
Activity
Sep ’22
Random hangs on M1 in Apple Accelerate when performing sparse Cholesky factorization
Hello, I'm trying to use Accelerate's sparse Cholesky solver but met an issue on M1. Is anyone aware of this issue or similar ones? The sparse factorization class _SparseNumericFactorSymmetric in Apple Accelerate will hang randomly for some large symmetric positive definite matrices with a specific sparsity pattern when created for Cholesky factorization. This issue happens in probability about 1/1000. It can be reproduced through following steps Read and load the attached sparse matrix (in MatrixMarket format). In a for-loop, repeatedly factorize the same matrix through _SparseNumericFactorSymmetric for 1000~2000 times. Matrix: https://www.dropbox.com/s/2pyl0cpmgy1qdrh/mat.mtx.zip?dl=0 The following code calls Apple Accelerate through a wrapper from Eigen 3.4.9 (https://eigen.tuxfamily.org/dox/group__AccelerateSupport__Module.html): #include <unsupported/Eigen/SparseExtra> #include <Eigen/AccelerateSupport> #include <iostream> int main() { Eigen::SparseMatrix<double> A; Eigen::loadMarket(A, "mat.mtx"); for (int i = 0; i < 2000; ++i) { Eigen::AccelerateLLT<Eigen::SparseMatrix<double>> solver; solver.compute(A); std::cout << i << std::endl; } return 0; } The factorizations should perform smoothly for the times specified in the for-loop. But the app will hang and stop being responsive infinitely after performing the factorization for hundreds of times (sometimes ~500+ loops, or sometimes ~900+ loops). I'm using a Xcode Version 13.1 (13A1030d) on MacOS Monterey 12.5.1 (21G83), with Apple M1 Max. It seems this issue is related to multithreading in VecLib since it only happens when we leave the environment variable unset (i.e., VECLIB_MAXIMUM_THREADS > 1). Setting VECLIB_MAXIMUM_THREADS = 1 will eliminate the issue at the cost of losing performance. And this issue only happens on M1 Macs, not on Intel-based ones.
Replies
0
Boosts
0
Views
1.1k
Activity
Aug ’22
CVPixelBufferGetBytesPerRowOfPlane strange result
Hello everyone. I'm decoding an H264 elementary stream with VideoToolbox. The VTDecompressionSession's output callback returns a CVImageBuffer successfully. While attempting to convert this YUV image to RGB, I prepare a luma vImage_Buffer and chroma vImage_Buffer, but I get strange values from CVPixelBufferGetBytesPerRowOfPlane: CVPixelBufferGetBaseAddressOfPlane(0) 0x10e98c000 CVPixelBufferGetHeightOfPlane(0) 1080 CVPixelBufferGetWidthOfPlane(0) 1440 CVPixelBufferGetBytesPerRowOfPlane(0) 46080 CVPixelBufferGetBaseAddressOfPlane(1) 0x10eb2b000 CVPixelBufferGetHeightOfPlane(1) 540 CVPixelBufferGetWidthOfPlane(1) 720 CVPixelBufferGetBytesPerRowOfPlane(1) 23040 If I use this value (46080) in initializing the vImage_Buffer's rowBytes field, vImageConvert_420Yp8_CbCr8ToARGB8888 crashes. If I simply set rowBytes=width, the conversion is successful. Has anyone seen this before? Notice the value returned is 32 × width. Why?
Replies
2
Boosts
0
Views
1.3k
Activity
Aug ’22
Symbol not found vDSP.dot
Hi there, I have following crash during application start on iOS below 15.0, even in simulator dyld: Symbol not found: _$s10Accelerate4vDSPO3dotySfx_q_tAA0A6BufferRzAaER_Sf7ElementRtzSfAFRt_r0_lFZ Referenced from: /Users/Nikita/Library/Developer/CoreSimulator/Devices/21071B75-CC1F-46E5-94B3-5E01072B90C6/data/Containers/Bundle/Application/73D74763-449A-4E20-B7EB-8807008B335B/vDSP test.app/vDSP test Expected in: /usr/lib/swift/libswiftAccelerate.dylib Sample code is very simple and looks like this:         let vector1: [Float] = [0, 1, 2, 3, 4, 5, 6, 7]         let vector2: [Float] = [0, 1, 2, 3, 4, 5, 6, 7]         // Causes crash on application start         let result = vDSP.dot(vector1, vector2)         // Works fine         let resultMul = vDSP.multiply(vector1, vector2)         let result1 = vDSP.sum(resultMul) According to docs, vDSP.dot is available since iOS 13.0.
Replies
3
Boosts
0
Views
1.3k
Activity
Aug ’22
How to update cofficients of vDSP_biquad_SetupD???
Hi there I'm writing some audio plug-ins that use biquad filtering of incoming audio. The audio is supplied to me as vectors of doubles. I am using the Accelerate callbacks of vDSP_biquad_CreateSetupD, vDSP_biquad_DestroySetupD and vDSP_biquadD on a vDSP_biquad_SetupD struct. When the user changes the filter parameters, I want to update the coefficients of the biquad filter. I assumed that I would be able to use the new vDSP_biquad_SetCoefficientsDouble callback, but that requires a vDSP_biquad_Setup rater than a vDSP_biquad_SetupD — i.e. a single-precision vector, rather than the double-precision vector that I would have thought it would require. Is that an error? How do I update the coefficients of a double-precision object? Thanks in advance, Michael
Replies
1
Boosts
0
Views
1.1k
Activity
Jun ’22
reading ldoor matrix and incomplete factorization?
I tried to read in the ldoor matrix and attempted the LLT factorization, but it gives me: "parseLdoor[55178:5595352] Factored does not hold a completed matrix factorization. (lldb)" Because the ldoor matrix is large, I have not been able to find the cause. I am unsure whether the matrix data was converted correctly by the SparseConvertFromCoordinate function. On the other hand, I was able to use the same code to get the correct answers for the simple 4x4 example used in the Sparse Solver documentation. Any help would be appreciated. Here is my code ... without the ldoor matrix
Replies
2
Boosts
0
Views
1.1k
Activity
Apr ’22
Why does Accelerate appear so out of place in terms of naming style?
Reading a solution given in a book to adding the elements of an input array of doubles, an example is given with Accelerate as func challenge52c(numbers: [Double]) -> Double { var result: Double = 0.0 vDSP_sveD(numbers, 1, &result, vDSP_Length(numbers.count)) return result } I can understand why Accelerate APIs don't adhere to the Swift API design guidelines, but why is it that they don't seem to use Cocoa guidelines either? Are there other conventions or precedents that I'm missing?
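For comparison, the Swift overlay for Accelerate (iOS 13 / macOS 10.15 and later) exposes the same reduction under a Swift-style name, vDSP.sum. A minimal sketch follows; the function name challenge52cSwiftStyle is hypothetical, and the plain reduce branch is only a fallback so the snippet compiles without Accelerate.

```swift
#if canImport(Accelerate)
import Accelerate
#endif

/// Sums the elements of `numbers`, preferring the Swift-style vDSP overlay.
func challenge52cSwiftStyle(numbers: [Double]) -> Double {
    #if canImport(Accelerate)
    // Overlay API; wrap in an availability check if you deploy to older OS versions.
    return vDSP.sum(numbers)
    #else
    return numbers.reduce(0, +)
    #endif
}

print(challenge52cSwiftStyle(numbers: [1.5, 2.5, 4.0]))
```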
Replies
2
Boosts
0
Views
935
Activity
Apr ’22
vDSP.correlate(_:withKernel:) — meaning of output
My clients are medical researchers researching methods for characterizing patients' gait from raw accelerometry by matching the data stream against a set of "templates" for the various characteristics. Their (labyrinthine) pseudocode appears to slide a snippet (a "template"; the kernel?) across a data stream, looking for goodness-of-fit by correlation coefficient. This is done for each of several templates, so performance is at a premium. As I read the name and the 13-word description, vDSP.correlate(_:withKernel:) does this in some way. However, the numbers that emerge from my playground experiments don't make sense: identical segments are scored 0.0 (should be 1.0, right?). Merely similar matches show values barely distinguishable from the rest, and are often well outside the range -1.0 ... 1.0. Clearly I'm doing it wrong. Web searches don't tell me anything, but I'm naïve on the subject. Am I mistaken in hoping this vDSP function does what I want? "Yes, you're mistaken" is an acceptable answer. Bonus if you can point me to a correct solution. If I'm on the right track, how can I generate inputs so I can interpret the output as fits my needs? Note: Both streams are normalized to µ = 0.0 and σ = 1.0, by vDSP, and validated by all the unit tests I've done so far.
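To make the distinction concrete, here is a plain-Swift sketch (no Accelerate calls) that contrasts a raw sliding dot product with a per-window Pearson correlation coefficient. It rests on the assumption that vDSP.correlate behaves like the former; the function names slidingDotProduct and slidingPearson and the sample data are hypothetical. Only the second function is bounded to -1.0 ... 1.0, because it re-centres and re-scales every window rather than the stream as a whole.

```swift
/// Raw sliding dot product: out[n] = sum over p of signal[n + p] * template[p].
func slidingDotProduct(_ signal: [Double], _ template: [Double]) -> [Double] {
    let k = template.count
    guard k > 0, signal.count >= k else { return [] }
    return (0...(signal.count - k)).map { n in
        var sum = 0.0
        for p in 0..<k { sum += signal[n + p] * template[p] }
        return sum
    }
}

/// Pearson correlation coefficient of every window against the template.
func slidingPearson(_ signal: [Double], _ template: [Double]) -> [Double] {
    let k = template.count
    guard k > 1, signal.count >= k else { return [] }
    let tMean = template.reduce(0, +) / Double(k)
    let tDev = template.map { $0 - tMean }
    let tNorm = tDev.reduce(0.0) { $0 + $1 * $1 }.squareRoot()
    return (0...(signal.count - k)).map { n in
        let wMean = signal[n ..< n + k].reduce(0, +) / Double(k)
        var dot = 0.0
        var wSquares = 0.0
        for p in 0..<k {
            let d = signal[n + p] - wMean   // centre this window only
            dot += d * tDev[p]
            wSquares += d * d
        }
        let denom = wSquares.squareRoot() * tNorm
        return denom > 0 ? dot / denom : 0  // flat windows have no defined r
    }
}

let signal: [Double] = [0, 1, 2, 1, 0, 5, 3, 1, 2, 1, 0, 4]
let template: [Double] = [1, 2, 1]
let raw = slidingDotProduct(signal, template)
let r = slidingPearson(signal, template)
print(raw[1], r[1])  // window 1 is identical to the template
```

If performance matters, the inner sums could be moved to vDSP calls; the loops here are kept explicit so the arithmetic is easy to check.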
Replies
2
Boosts
0
Views
2.0k
Activity
Mar ’22
vImage vs CoreImage vs MetalPerformanceShaders strengths and weaknesses
While the above three frameworks (viz. vImage, CoreImage, and MetalPerformanceShaders) serve different overall purposes, what are the strengths and weaknesses of each of them in terms of image-processing performance? It seems that all three frameworks are highly performant, but where does each one shine?
Replies
1
Boosts
3
Views
1.4k
Activity
Mar ’22
Undefined symbols for architecture arm64: "(extension in Accelerate)
Hi, Xcode fails to build the very simple code shown below ONLY IF the build configuration is Debug, and produces "Undefined symbols for architecture arm64:  "(extension in Accelerate):Accelerate.AccelerateMutableBuffer< where A: Swift.MutableCollection>.withUnsafeMutableBufferPointer((inout Swift.UnsafeMutableBufferPointer<A.Accelerate.AccelerateBuffer.Element>) throws -> A1) throws -> A1", referenced " If the build configuration is Release, the build succeeds and the app runs just fine. Note: The code below is just a sample to reproduce the issue easily. I don't want to use the Accelerate framework or those pointer functions in viewDidLoad() in the actual project.
import UIKit
import Accelerate
class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
        let b = UnsafeMutablePointer<Float>.allocate(capacity: 10)
        var c = UnsafeMutableBufferPointer(start: b, count: 10)
        c.withUnsafeMutableBufferPointer{ buf in
            let base = buf.baseAddress
            print("test ",base!)
        }
    }
}
The Xcode version is 13.2.1 and the target iOS version is 15.2. The only workaround I know for now is to build for Release, but I really need the debugger for the real project I'm working on. Any help, advice or comment would be appreciated. Best Regards, Hikaru
Replies
1
Boosts
0
Views
1.3k
Activity
Feb ’22
Accelerate FFT: Intermittent crashes
I've implemented an FFT using the Accelerate framework and I'm not sure I've done it correctly. For starters, I don't like that the imaginary array is filled with zeroes; this seems to be a waste of memory. However, the more serious issue is that I get intermittent crashes (malloc errors) when using the vDSP API. I've read the online docs and tried to follow several online samples. Could somebody more knowledgeable with these APIs have a look? See attached file: FFT.swift
Replies
3
Boosts
0
Views
1.2k
Activity
Feb ’22
C++ API for Accelerate Framework (SparseSolve)
Hi, I am a Julia/C++ developer and a total novice with the Apple ecosystem. I am looking for examples and documentation for the SparseSolve C++ API from the Accelerate framework. So far, I have only found Swift and Objective-C documentation. Any hints from the community?
Replies
3
Boosts
0
Views
2.6k
Activity
Feb ’22
How to update destinationBuffer's width and height during a video buffer stream
I am using the Accelerate framework to convert YUV data to ARGB data for a video call app. The framework works great. However, when I hold calls I use a placeholder image sent from the server. That image sometimes causes issues because of its size: Accelerate tells me that its region of interest is larger than the input buffer (roiLargerThanInputBuffer). I am not sure exactly how to address this issue. Any thoughts or suggestions would be greatly appreciated.
The problem was that the width and height of my video stream's pixel buffer changed on the server side. That being said, all that needed to be done was to check for when the size changes, then free the current vImage_Buffer and initialize a new one with the correct size. Is it proper to tell the Accelerate framework to change the vImage_Buffer width and height this way? It seems to work well.
    // Reallocate when either dimension changes; vImageBuffer_Init takes height first, then width.
    if myBuffer.height != destinationBuffer.height || myBuffer.width != destinationBuffer.width {
        free(destinationBuffer.data)
        error = vImageBuffer_Init(&destinationBuffer,
                                  vImagePixelCount(myBuffer.height),
                                  vImagePixelCount(myBuffer.width),
                                  cgImageFormat.bitsPerPixel,
                                  vImage_Flags(kvImageNoFlags))
        guard error == kvImageNoError else {
            return nil
        }
    }
Thanks
Replies
3
Boosts
0
Views
1.1k
Activity
Feb ’22
Using BLAS and LAPACK
I am really lost and very much a newbie to programming. I am trying to use the BLAS and LAPACK libraries. I am programming in VS Code, and in the terminal I am trying to build my code with this command: g++ MV_mult_sequential.cpp -I/usr/local/include -L/usr/local/lib -llapack -lblas. I think there is some way I could be using Accelerate to do this, but again I have no clue. I have no idea if I'm doing anything correctly. I could use some guidance.
Replies
2
Boosts
0
Views
3.1k
Activity
Jan ’22
How to speed up open source packages on M1 Max with Accelerate and Metal
Please help me, this is really urgent. Compatibility problems with the M1 Max chip have cost me hundreds of hours.
1. Please show me how to speed up source packages downloaded from GitHub, such as numpy, pandas or any others, by fully using the CPU and GPU (Python 3.8 and 3.9). Can I do it like this?
Step 1: Download the source from GitHub.
Step 2: Create a file named "site.cfg" in the source directory with the content: [accelerate] libraries=Metal, Accelerate, vecLib
Step 3: In Terminal: NPY_LAPACK_ORDER=accelerate python3 setup.py build
Step 4: pip3 install . or python3 setup.py install? (I am not sure which method to apply.)
2. How compatible are Accelerate and Metal? Do they work with most source packages? Any tips? For example, https://github.com/microsoft/qlib
3. Which gcc should I install? Please show me the commands. When I try, errors occur: gcc (version 4.2.1, installed by brew) cannot compile some sources, such as "ecos". Moreover, I cannot compile many sources directly with python3 setup.py install (without Accelerate). How should I configure gcc, and which version should I use on the M1 Max?
4. Sometimes I can build a package with brew, but that is extremely inconvenient because I need to install packages into a virtual environment (e.g. a conda env) rather than the base path. What should I do? Can I install brew inside a virtual environment? Or should I use brew only to build the source and then install it with pip in the virtual env? Or can I configure brew to install only into the virtual environment? Please show me the commands.
5. To compile, do I also need to install g++? Which version? Please show me the commands.
6. Please show me how to speed up a Python program with the GPU and parallel computing on Accelerate.
Replies
2
Boosts
0
Views
2k
Activity
Jan ’22
BNNSLayerParametersLSTM with hiddenSize != inputSize
Hi all, I've spent some time experimenting with the BNNS (Accelerate) LSTM-related APIs lately, and despite a distinct lack of documentation (even though the headers have quite a bit) I got most things to a point where I think I know what's going on and I get the expected results. However, one thing I have not been able to do is to get this working if inputSize != hiddenSize. I am currently only concerned with a simple unidirectional LSTM with a single layer, but none of my permutations of gate "iw_desc" matrices with various 2D layouts and reorderings of input-size/hidden-size made any difference; ultimately BNNSDirectApplyLSTMBatchTrainingCaching always returns -1 as an indication of error. Any help would be greatly appreciated. PS: The bnns.h framework header file claims that "When a parameter is invalid or an internal error occurs, an error message will be logged. Some combinations of parameters may not be supported. In that case, an info message will be logged.", and yet I've not been able to find any such messages logged to NSLog(), stderr or Console. Is there a magic environment variable that I need to set to get more verbose logging?
Replies
0
Boosts
0
Views
793
Activity
Dec ’21
Why does the execution of vDSP operations sometimes take longer in M1 native code than through Rosetta translation?
Hi, I am porting some applications to M1 that make extensive use of vDSP. I found in many cases there to be a minimal speed-up, which I put down to Rosetta doing a good job translating SSE instructions into equivalent Neon instructions in the vDSP library. To try and understand this more, I started profiling various areas of code and have found situations where translated code runs faster than native code. Often native code speed is similar or faster, as expected, but there are a notable number of cases where it is not. This is not what I expected. I include a sample below to show a somewhat contrived and trivial routine exhibiting the effect. I have built it using Xcode 12.5.1 in Release with an 11.3 deployment target. The Mac is running macOS 11.6. On my M1 Mac mini the Rosetta build takes around 900-1000 µs to run to completion; switching to native code, it takes around 1500-1600 µs. I can make various adjustments to the data size or the types of vDSP operations used to find scenarios where native builds are faster; that is not difficult, but it shouldn't be necessary. I can understand why vDSP could perhaps perform similarly across native and translated runs, but surely it should never be the case that translated code beats native code by a margin like this. What is going on, and is it expected?
Thanks, Matt
#include <Accelerate/Accelerate.h>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iostream>
#include <sys/types.h>
#include <sys/sysctl.h>

// determine if process is running through Rosetta translation
int processIsTranslated() {
  int ret = 0;
  size_t size = sizeof(ret);
  if (sysctlbyname("sysctl.proc_translated", &ret, &size, NULL, 0) == -1)
  {
    if (errno == ENOENT)
      return 0;
    return -1;
  }
  return ret;
}

int main(int argc, const char * argv[]) {
  // print translation status
  if(processIsTranslated() == 1)
    std::cout << "Rosetta" << std::endl;
  else
    std::cout << "Native" << std::endl;

  // size of test
  vDSP_Length array_len = 512;
  const int iterations = 10000;

  // allocate and clear memory
  float* buf1_ptr = (float*)malloc(array_len * sizeof(float));
  float* buf2_ptr = (float*)malloc(array_len * sizeof(float));
  float* buf3_ptr = (float*)malloc(array_len * sizeof(float));
  float* buf4_ptr = (float*)malloc(array_len * sizeof(float));
  if(!buf1_ptr) return EXIT_FAILURE;
  if(!buf2_ptr) return EXIT_FAILURE;
  if(!buf3_ptr) return EXIT_FAILURE;
  if(!buf4_ptr) return EXIT_FAILURE;
  memset(buf1_ptr, 0, array_len * sizeof(float));
  memset(buf2_ptr, 0, array_len * sizeof(float));
  memset(buf3_ptr, 0, array_len * sizeof(float));
  memset(buf4_ptr, 0, array_len * sizeof(float));

  // start timer
  __uint64_t start_ns = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);

  // scalar constants
  const float scalar1 = 10;
  const float scalar2 = 11;

  // loop test
  for(int i = 0; i < iterations; i++)
  {
    vDSP_vsadd(buf1_ptr, 1, &scalar1, buf2_ptr, 1, array_len);
    vDSP_vsadd(buf1_ptr, 1, &scalar2, buf3_ptr, 1, array_len);
    vDSP_vadd(buf2_ptr, 1, buf3_ptr, 1, buf4_ptr, 1, array_len);
  }

  // report test time (divide in double precision to keep the nanosecond count exact)
  __uint64_t end_ns = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);
  double time_us = (end_ns - start_ns) / 1000.0;
  std::cout << time_us << " us" << std::endl;

  // clean up all four buffers
  free(buf1_ptr);
  free(buf2_ptr);
  free(buf3_ptr);
  free(buf4_ptr);

  return 0;
}
Replies
1
Boosts
0
Views
1.1k
Activity
Oct ’21
Tensorflow acceleration on macOS
Would it be possible to use GPU acceleration when training a TensorFlow model on macOS? How is the performance when training the same model on an Apple silicon platform?
Replies
4
Boosts
0
Views
1.6k
Activity
Aug ’21
vDSP.convolve incorrectly reverses kernel?
vDSP.convolve() reverses the kernel before applying it. For example, the following uses a kernel of 10 elements where the first element is 1.0 and the rest of the elements are 0.0. Applying this kernel to a vector should return the same vector. let values = (0 ..< 30).map { Double($0) } var kernel = Array.init(repeating: 0.0, count: 10) kernel[0] = 1.0 let result = vDSP.convolve(values, withKernel: kernel) print("kernel: \(kernel)") print("values: \(values)") print("result: \(result)") Applied to a values array containing elements 0.0, 1.0, 2.0, etc. the first results should be 0.0, 1.0, 2.0, etc, but instead the results start at 9.0 and increase from there: kernel: [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] values: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0] result: [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0] If instead the kernel is reversed, placing the 1.0 at the end of the kernel: let values = (0 ..< 30).map { Double($0) } var kernel = Array.init(repeating: 0.0, count: 10) kernel[9] = 1.0 let result = vDSP.convolve(values, withKernel: kernel) print("kernel: \(kernel)") print("values: \(values)") print("result: \(result)") The results are now correct: kernel: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0] values: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0] result: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
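For reference, flipping the kernel is what the textbook definition of convolution prescribes; the unflipped sliding sum is correlation. The plain-Swift sketch below (no Accelerate calls; the names validCorrelation and validConvolution are hypothetical) reproduces the leading values printed above. It emits every full-overlap position, so it returns 21 elements rather than the 20 shown in the post.

```swift
/// Correlation: the kernel slides over the values without being flipped.
func validCorrelation(_ values: [Double], _ kernel: [Double]) -> [Double] {
    let k = kernel.count
    guard k > 0, values.count >= k else { return [] }
    return (0...(values.count - k)).map { n in
        (0..<k).reduce(0.0) { $0 + values[n + $1] * kernel[$1] }
    }
}

/// Convolution: the same sum, but the kernel is read back to front.
func validConvolution(_ values: [Double], _ kernel: [Double]) -> [Double] {
    validCorrelation(values, Array(kernel.reversed()))
}

let values = (0 ..< 30).map { Double($0) }
var kernel = [Double](repeating: 0.0, count: 10)
kernel[0] = 1.0

let convolved = validConvolution(values, kernel)
let correlated = validCorrelation(values, kernel)
print("convolution starts at \(convolved[0]), correlation starts at \(correlated[0])")
```

Under this reading, an identity kernel for convolution in this "valid" form has its 1.0 in the last slot, which matches the second experiment in the post.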
Replies
2
Boosts
0
Views
881
Activity
Aug ’21
vDSP.convolve returns wrong sized array?
vDSP.convolve() returns an array with length: values.count - kernel.count But shouldn't the result array have length: values.count - kernel.count + 1 I ran the following which prints out the size of the results array with various combinations of values and kernel lengths: for i in 0 ..< 10 {   let values = Array.init(repeating: 1.0, count: 1000 + i)   for j in 0 ..< 10 {     let kernel = Array.init(repeating: 1.0, count: 100 + j)     let result = vDSP.convolve(values, withKernel: kernel)           print("values[\(values.count)], kernel[\(kernel.count)], result[\(result.count)], result[\(result.count - 1)] = \(result[result.count - 1])")   } } As you can see the results array always has length values.count - kernel.count: values[1000], kernel[100], result[900], result[899] = 100.0 values[1000], kernel[101], result[899], result[898] = 101.0 values[1000], kernel[102], result[898], result[897] = 102.0 values[1000], kernel[103], result[897], result[896] = 103.0 values[1000], kernel[104], result[896], result[895] = 104.0 values[1000], kernel[105], result[895], result[894] = 105.0 values[1000], kernel[106], result[894], result[893] = 106.0 values[1000], kernel[107], result[893], result[892] = 107.0 values[1000], kernel[108], result[892], result[891] = 108.0 values[1000], kernel[109], result[891], result[890] = 109.0 values[1001], kernel[100], result[901], result[900] = 100.0 values[1001], kernel[101], result[900], result[899] = 101.0 values[1001], kernel[102], result[899], result[898] = 102.0 values[1001], kernel[103], result[898], result[897] = 103.0 values[1001], kernel[104], result[897], result[896] = 104.0 values[1001], kernel[105], result[896], result[895] = 105.0 ... However, the result array should have length values.count - kernel.count + 1. 
For example, if instead of using the returned result array, a result array is passed to vDSP.convolve, with length values.count - kernel.count + 1 the last value has a valid result: for i in 0 ..< 10 {   let values = Array.init(repeating: 1.0, count: 1000 + i)   for j in 0 ..< 10 {     let kernel = Array.init(repeating: 1.0, count: 100 + j)     var result = Array.init(repeating: 0.0, count: values.count - kernel.count + 1)     vDSP.convolve(values, withKernel: kernel, result: &result)           print("values[\(values.count)], kernel[\(kernel.count)], result[\(result.count)], result[\(result.count - 1)] = \(result[result.count - 1])")   } } values[1000], kernel[100], result[901], result[900] = 100.0 values[1000], kernel[101], result[900], result[899] = 101.0 values[1000], kernel[102], result[899], result[898] = 102.0 values[1000], kernel[103], result[898], result[897] = 103.0 values[1000], kernel[104], result[897], result[896] = 104.0 values[1000], kernel[105], result[896], result[895] = 105.0 values[1000], kernel[106], result[895], result[894] = 106.0 values[1000], kernel[107], result[894], result[893] = 107.0 values[1000], kernel[108], result[893], result[892] = 108.0 values[1000], kernel[109], result[892], result[891] = 109.0 values[1001], kernel[100], result[902], result[901] = 100.0 values[1001], kernel[101], result[901], result[900] = 101.0 values[1001], kernel[102], result[900], result[899] = 102.0 values[1001], kernel[103], result[899], result[898] = 103.0 values[1001], kernel[104], result[898], result[897] = 104.0 values[1001], kernel[105], result[897], result[896] = 105.0 If the result array is created with length values.count - kernel.count + 2 then we get the following runtime error: error: Execution was interrupted, reason: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0). The process has been left at the point where it was interrupted, use "thread return -x" to return to the state before expression evaluation. 
Indicating the extra element in the result array is valid and vDSP.convolve() is returning a result array which is one element too short.
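The arithmetic behind the expected length can be checked without vDSP. This is a small plain-Swift sketch; the function name fullOverlapCount is hypothetical. It counts the start positions at which a kernel fits entirely inside the vector and cross-checks the closed form by brute-force enumeration.

```swift
/// Number of positions where a kernel of `kernelCount` elements lies
/// entirely inside a vector of `valuesCount` elements.
func fullOverlapCount(valuesCount: Int, kernelCount: Int) -> Int {
    guard kernelCount > 0, valuesCount >= kernelCount else { return 0 }
    return valuesCount - kernelCount + 1
}

// Cross-check by enumerating every start index whose window stays in bounds.
let n = 1000
let k = 100
let enumerated = (0..<n).filter { $0 + k <= n }.count
print(fullOverlapCount(valuesCount: n, kernelCount: k), enumerated)
```

The valid start indices run from 0 through n - k inclusive, which is where the extra +1 comes from.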
Replies
1
Boosts
0
Views
850
Activity
Aug ’21