Horrendous Swift overlay of vDSP Fourier Transform functions

I'm very unpleased to have found out the hard way, that vDSP.FFT<T> where T : vDSP_FourierTransformable and its associated vDSP_FourierTransformFunctions, only does real-complex conversion!!!

This is horribly misleading - the only hint that it calls vDSP_fft_zrop() (split-complex real-complex out-of-place) under the hood, not vDSP_fft_zop()[1], is "A 1D single- and double-precision fast Fourier transform." - instead of "a single- and double-precision complex fast Fourier transform". Holy ******* ****.

Just look at how people miss-call this routine. Poor him, realizing he had to log2n+1 lest only the first half of the array was transformed, not understanding why [2].

And poor me, having taken days investigating why a simple Swift overlay vDSP.FFT.transform may execute at 1.4x the speed of vDSP_fft_zopt(out-of-place with temporary buffer)! [3]


[1]: or better, vDSP_fft_zopt with the temp buffer alloc/dealloc taken care of, for us

[2]: for real-complex conversion, say real signal of length 16. log2n is 4 no problem, but then the real and imaginary vectors are both length 8. Also, vDSP_fft only works on integer powers of 2, so he had to choose next integer power of 2 (i.e. 1<<(log2n-1)) instead of plain length for his internal arrays.

[3]: you guessed it. fft_zrop(log2n-1, ...) vs. fft_zop(log2n, ...). Half the problem size lol.


Now we have vDSP.DiscreteFourierTransform, which wraps vDSP_DFT routines and "calls FFT routines when the size allows it", and works too for interleaved complexes. Just go all DFT, right?

if let setup_f = try? vDSP.DiscreteFourierTransform(previous: nil, count: 8, direction: .forward, transformType: .complexComplex, ofType: DSPComplex.self) {

    // Done forward transformation
    // and scaled the results with 1/N
    // How about going reverse?
    if let setup_r = try? vDSP.DiscreteFourierTransform(previous: setup_f, count: 8, direction: .inverse, transformType: .complexComplex, ofType: DSPComplex.self) {
    // Error: cannot convert vDSP.DiscreteFourierTransform<DSPComplex> to vDSP.DiscreteFourierTransform<Float>
    // lolz
    }
}

This API appeared in macOS 12. 3 years later nobody have ever noticed. I'm at a loss for words.

I had to refresh my mind on FFT. Found this interesting reference, may be of interest for those of us who need as well.

https://www.analog.com/media/en/technical-documentation/dsp-book/dsp_book_Ch31.pdf

Accepted Answer

Hi,

Many thanks for your post.

Regarding the vDSP.DiscreteFourierTransform initializer API, This is an issue we're aware of, and you may find using the vDSP.DFT a suitable alternative. The vDSP.DFT explicitly supports complexComplex and complexReal.

Summary of related routines

Getting data to expected form/type

// 1. idiomatic Swift, but pyramid of doom
let a = [1,2,3,4]
a.withUnsafeBufferPointer { pa in
    var one : Int32 = 1
    withUnsafeMutablePointer(to: &one) { pone in
        // LAPACK routines pass ALL parameters by reference.
        // Luckily most vDSP functions don't do this
        sgemm_(pa.baseAddress!,pone, ...)
    }
}

// 2. Manual memory management. Always remember to deallocate. At least we have type checks
let b = UnsafeMutableBufferPointer<Float>
        .allocate(capacity: 16)
let c = UnsafeMutableBufferPointer<Float>
        .allocate(capacity: 16)
defer { b.deallocate(); c.deallocate() }

Let's setup some data: 16 complex values in split complex form.

_ = b.initialize(from:[0,0.1,0.3,0.5,
                       0.7,0.9,1,1,
                       1,0.9,0.7,0.5,
                       0.3,0.1,0,0])
c[0..<12] = b[4..<16]
c[12..<16] = b[0..<4]
print(Array(c)) // Remember, array has value semantics, and Array(c) won't follow c's subsequent updates
var d = UnsafeMutableBufferPointer<Float>.allocate(capacity: 16)
var e = UnsafeMutableBufferPointer<Float>.allocate(capacity: 16)
var z = UnsafeMutableBufferPointer<DSPSplitComplex>.allocate(capacity: 2)
z[0] = DSPSplitComplex(realp: b.baseAddress!, imagp: c.baseAddress!)
z[1] = DSPSplitComplex(realp: d.baseAddress!, imagp: e.baseAddress!)

Now, routines by API sets. First of all, those bridged from C

var pi = z.baseAddress!.advanced(by: 0) // All operands are pass-by-reference
var po = z.baseAddress!.advanced(by: 1) // Even though DSPSplitComplex itself contains only references
let log2n : vDSP_Length = 4             // 16 = 2**4

Raw: vDSP

Complex FFT

// we have *complexes of length 16*, thus log2n=4
if let vsetup = vDSP_create_fftsetup( log2n, FFTDirection(FFT_FORWARD) ) {
    // complex out of place. No surprises
    vDSP_fft_zop(vsetup, pi, 1, po, 1, log2n, FFTDirection(FFT_FORWARD))
    print("FFT of 16 complexes:",Array(d),Array(e),separator: "\n")
    // while the header may state otherwise, none of the reverse routines actually do that 1/N scale
    var scale = Float(1) / Float(16)
    vDSP_vsmul(po.pointee.realp, 1, &scale, po.pointee.realp, 1, 16)
    vDSP_vsmul(z[1].imagp, 1, &scale, z[1].imagp, 1, 16)
    vDSP_fft_zip(vsetup, po, 1, log2n, FFTDirection(FFT_INVERSE)) // zip is in-place
    print("Reverse FFT, back to where we started:",Array(d),Array(e),separator: "\n")
    vDSP_destroy_fftsetup(vsetup)
}

Raw: vDSP

Real FFT

let tmpB = Array(b); let tmpC = Array(c) // Take snapshot
c.update(repeating: 0)


// Path1: just complex FFT with 50% wasted space. 
if let vsetup = vDSP_create_fftsetup( log2n, FFTDirection(FFT_FORWARD) ) {
    vDSP_fft_zop(vsetup, pi, 1, po, 1, log2n, FFTDirection(FFT_FORWARD))
    print("Complex FFT of Real signal, as reference:",Array(d),Array(e),"Note the symmetry.",separator: "\n")
    vDSP_destroy_fftsetup(vsetup)
}
d.update(repeating: 0); e.update(repeating: 0)
// Path2: dedicated real-complex transformation. zrop
b.withMemoryRebound(to:DSPComplex.self) {
    vDSP_ctoz(b.baseAddress!, 2, pi, 1, 8)
    // 16 reals, viewed as 8 complexes ^^^
}
b[8..<16].update(repeating:0)
print("Real signal interleaved",Array(b),Array(c),separator: "\n") // Complexes of length 8

// Still, we have *reals of length 16*, thus log2n=4
if let vsetup = vDSP_create_fftsetup( log2n, FFTDirection(FFT_FORWARD) ) {
    vDSP_fft_zrop(vsetup, pi, 1, po, 1, log2n, FFTDirection(FFT_FORWARD))
    print("Faster FFT of Real signal, packed:",Array(d),Array(e),"Everything's 2x the numbers above.",separator: "\n")
    var scale = Float(0.5) / Float(16) // scale is 1/2N this time.
    vDSP_vsmul(z[1].realp, 1, &scale, z[1].realp, 1, 8)
    vDSP_vsmul(z[1].imagp, 1, &scale, z[1].imagp, 1, 8)
    vDSP_fft_zrip(vsetup, po, 1, log2n, FFTDirection(FFT_INVERSE)) // zrip is in-place
    print("Reverse FFT, back to where we started:",Array(d),Array(e),separator: "\n")
    vDSP_destroy_fftsetup(vsetup)
}
d.update(repeating: 0); e.update(repeating: 0)

Very thin overlay: vDSP.FourierTransformable & co.

Real FFT ONLY

// 1. note it's still *reals of length 16*, thus log2n=4
if let setup = DSPSplitComplex.FFTFunctions.makeFFTSetup(log2n: log2n, radix: .radix2) {
    DSPSplitComplex.FFTFunctions.transform(fftSetup: setup, log2n: log2n, source: pi, destination: po, direction: .forward)
    DSPSplitComplex.FFTFunctions.destroySetup(setup)
}
print("Look! Packed real-complex FFT!"Array(d),Array(e),separator: "\n"); d.update(repeating: 0); e.update(repeating: 0)
// 2. So thin that the setup is interchangeable with C-side
if var setup = vDSP_SplitComplexFloat.makeFFTSetup(log2n: log2n, radix: .radix2) {
    vDSP_SplitComplexFloat.transform(fftSetup: setup, log2n: log2n, source: pi, destination: po, direction: .forward)
    print(Array(d),Array(e))
    withUnsafeBytes(of: &setup) { setupp in
        let p = setupp.bindMemory(to: uintptr_t.self)
        let setup = OpaquePointer(bitPattern: p[0])!
        vDSP_fft_zrip(setup, po, 1, log2n, FFTDirection(FFT_INVERSE))
        print("Forgot to scale with 1/2N before transforming back:Array(d),Array(e),separator: "\n"")
        vDSP_destroy_fftsetup(setup)
    }
}
d.update(repeating: 0); e.update(repeating: 0)

Idiomatic, thicker overlay: vDSP.FFT

Real FFT ONLY

// Of Type SplitComplex. Who could have thought this is a zrop.
if let setup = vDSP.FFT(log2n: log2n, radix: .radix2, ofType: DSPSplitComplex.self) {
    // Idiomatic as in, setup is typed,
    // setup is destroyed automatically at deinit(),
    // and the operands (which are already references) are not pass-by-reference
    setup.transform(input: z[0], output: &z[1], direction: .forward)
    print("See? Packed results:",Array(d),Array(e),separator: "\n")
    let scale = Float(0.5) / Float(16) 
    // Vectors are length 8, but log2n needs to be 4,
    // and scale needs to be 1/32.
    // Signs of a packed real-complex transformation.
    vDSP.multiply(scale, d, result:&d)
    vDSP.multiply(scale, e, result:&e)
    print("Expect:",Array(b),Array(c))
    setup.transform(input: z[1], output: &z[0], direction: .inverse)
    print("Actual:",Array(b),Array(c))
}

Idiomatic, thicker overlay: vDSP.DiscreteFourierTransform

real-complex and complex-complex

// Demonstrating real-complex here
// Much simpler interface. I wonder if there's a lock underneath
// because we shouldn't modify (create or destroy) a setup
// while a routine using [it or one sharing memory with it] is running.
// Count 16 for split complexes
if let setup = try? vDSP.DiscreteFourierTransform(previous: nil, count: 16, direction: .forward, transformType: .complexReal, ofType: Float.self) {
    setup.transform(inputReal: b, inputImaginary: c, outputReal: &d, outputImaginary: &c)
    print("See? Packed results just like above:",Array(d),Array(e))
}

Actually, DFT can operate on interleaved complexes, so we don't have to convert the real signal anymore - just act as if it were interleaved complex:

_ = b.update(fromContentsOf: tmpB)
b.withMemoryRebound(to: DSPComplex.self) { b in
d.withMemoryRebound(to: DSPComplex.self) { d in
var d = d
// Count 8 for interleaved complexes???
if let setup = try? vDSP.DiscreteFourierTransform(previous: nil, count: 8, direction: .forward, transformType: .complexReal, ofType: DSPComplex.self) {
    setup.transform(input:b, output: &d)
    vDSP.convert(interleavedComplexVector: Array(d),toSplitComplexVector: &z[1])
}
}
}
print("See? Packed results just like above:",Array(d),Array(e))

Have to fight with the type checker though, and probably messes codegen once we have pointer authentication in user space.

Horrendous Swift overlay of vDSP Fourier Transform functions
 
 
Q