Swift performance - efficient calculation of boolean logical operations

I need to work with large Double and Bool arrays/vectors and apply simple operations to them: e.g. addition, subtraction, multiplication, and less-than/greater-than comparisons on the Doubles, and AND, OR, NOT etc. on the Bools.

While I have found vDSP from Accelerate quite performant for the simple arithmetic operations, the greater/less comparisons as well as the logical operations are very slow, which probably has to do with the fact that I apply them via map. Is there a more efficient way to do this? From what I have been reading, pointers might help, but I am not sure how I would apply them here in place of zip and map; I have added a rough sketch of what I mean after the examples below.

See below some code examples:

import Accelerate

let myDoubleArray1: [Double] = Array<Double>(repeating: 1.123, count: 1000000)
let myDoubleArray2: [Double] = Array<Double>(repeating: 2.123, count: 1000000)
let myBoolArray1: [Bool] = Array<Bool>(repeating: false, count: 1000000)
let myBoolArray2: [Bool] = Array<Bool>(repeating: true, count: 1000000)

_ = vDSP.multiply(myDoubleArray1, myDoubleArray2) // Takes about 0.5sec - very good

_ = zip(myDoubleArray1, myDoubleArray2).map {$0 > $1} // Takes about 7sec - too slow

_ = zip(myBoolArray1, myBoolArray2).map {$0 && $1} // Takes about 7sec - too slow

_ = zip(myBoolArray1, myBoolArray2).map {$0 == $1} // Takes about 7sec - too slow

_ = myBoolArray1.map {!$0} // Takes about 7sec - too slow
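
For context, here is a rough sketch of the pointer-based direction I mean (I am not sure it is correct or idiomatic): pre-allocating the result and looping over unsafe buffer pointers instead of going through zip and map:

func andBuffers(_ a: [Bool], _ b: [Bool]) -> [Bool] {
    precondition(a.count == b.count)
    return [Bool](unsafeUninitializedCapacity: a.count) { out, initializedCount in
        a.withUnsafeBufferPointer { pa in
            b.withUnsafeBufferPointer { pb in
                // Element-wise AND without the intermediate zip sequence
                // or per-element closure dispatch.
                for i in 0 ..< a.count { out[i] = pa[i] && pb[i] }
            }
        }
        initializedCount = a.count
    }
}
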

I tested the last one in a playground (which is not a race horse):

let myBoolArray1: [Bool] = Array<Bool>(repeating: false, count: 1000000)

print(Date())
_ = myBoolArray1.map {!$0} // Takes about 7sec - too slow
print(Date())

Result: 14 s

  • 2022-11-02 15:44:59 +0000
  • 2022-11-02 15:45:13 +0000

Then I tested in an app in the Simulator.

Result: less than 1 s (in fact around 0.1 s), and that is without using Accelerate.

  • 2022-11-02 15:48:04 +0000
  • 2022-11-02 15:48:04 +0000

Then I tested with 10_000_000. Result: about 1 s.

  • 2022-11-02 15:49:52 +0000
  • 2022-11-02 15:49:53 +0000

With 100_000_000: Result: 12 s.

  • 2022-11-02 15:50:51 +0000
  • 2022-11-02 15:51:03 +0000

So there is something weird in your test.

Accepted Answer

Hi,

Yes, you can use BNNS for this and it's significantly faster than map:

import Accelerate

let clock = ContinuousClock()
let n = 1_000_000

let a: [Bool] = (0 ..< n).map { _ in .random() }
let b: [Bool] = (0 ..< n).map { _ in .random() }

let aDescriptor = BNNSNDArrayDescriptor.allocate(initializingFrom: a, shape: .vector(n))
let bDescriptor = BNNSNDArrayDescriptor.allocate(initializingFrom: b, shape: .vector(n))

let bnnsResult = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Bool.self, shape: .vector(n))

let bnnsTime = clock.measure {
    try! BNNS.compare(aDescriptor, bDescriptor, using: .equal, output: bnnsResult)
}

var mapResult = [Bool]()

let mapTime = clock.measure {
    mapResult = zip(a, b).map {$0 == $1}
}

print("bnns time", bnnsTime)
print(" map time", mapTime)
print(bnnsResult.makeArray(of: Bool.self)!.elementsEqual(mapResult))
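
One thing worth noting: the descriptors allocate memory, so free them once you are done with them (a minimal cleanup sketch):

aDescriptor.deallocate()
bDescriptor.deallocate()
bnnsResult.deallocate()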

This is magnificent, exactly the nudge I needed! Thanks a lot!

I have a follow-up question. I have now implemented the BNNS approach proposed above and it works. However, the runtime is still about 3 times as long as with a similar approach in Python, without any special packages or tricks on the Python side. I was hoping that Swift would generally be faster, so I wanted to get your view on whether my approach is flawed, and to hear about potential alternatives. I would love to continue coding the project in Swift, but with the runtime at 300% it is just too compelling to use Python instead.

Let me explain what I am doing: I create a lot of (random) binary trees whose non-leaf nodes contain arithmetic or logic operations and whose leaf nodes contain numeric arrays (all of the same length) taken from return_data. In other words, my trees represent (simple) mathematical formulas. I have tens of thousands of such trees and need to evaluate them efficiently, which I currently do via a recursive function evaluate_signal. As said above, the same approach in Python takes only a third of the time in a comparable setting. There is obviously a lot of redundant code in evaluate_signal, but that shouldn't matter for efficiency (I would clean that up later, but first I need to tackle the runtime).

Please see my code (snippet) below. I am happy to explain anything or provide more code/background if required. It would be really great to hear your thoughts on whether this could somehow be improved.

import Accelerate

enum NodeValue {
  case add
  case sub
  case mul
  case div
  case sml
  case lrg
  case opp
  case met(columnIndex: Int) // This is a column index of return_data
  case cst(constant: Float) // This is a constant value
}

final class Node {
  var value: NodeValue
  var lhs: Node?
  var rhs: Node?
         
  init(value: NodeValue, lhs: Node?, rhs: Node?) {
    self.value = value
    self.lhs = lhs
    self.rhs = rhs
  }
   
  func evaluate_signal(return_data: [Int: [Float]]) -> [Any] {
    switch self.value {
    case .add:
      let eval_left = lhs!.evaluate_signal(return_data: return_data) as! [Float]
      let eval_right = rhs!.evaluate_signal(return_data: return_data) as! [Float]
      return vDSP.add(eval_left, eval_right)
    case .sub:
      let eval_left = lhs!.evaluate_signal(return_data: return_data) as! [Float]
      let eval_right = rhs!.evaluate_signal(return_data: return_data) as! [Float]
      return vDSP.subtract(eval_left, eval_right)
    case .mul:
      let eval_left = lhs!.evaluate_signal(return_data: return_data) as! [Float]
      let eval_right = rhs!.evaluate_signal(return_data: return_data) as! [Float]
      return vDSP.multiply(eval_left, eval_right)
    case .div:
      let eval_left = lhs!.evaluate_signal(return_data: return_data) as! [Float]
      let eval_right = rhs!.evaluate_signal(return_data: return_data) as! [Float]
      return vDSP.divide(eval_left, eval_right)
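    // The comparison cases below wrap the operand arrays in BNNS descriptors,
    // run the vectorized compare, copy the result out, and free the descriptors.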
    case .sml:
      let eval_left = lhs!.evaluate_signal(return_data: return_data) as! [Float]
      let eval_right = rhs!.evaluate_signal(return_data: return_data) as! [Float]
      let leftDescriptor = BNNSNDArrayDescriptor.allocate(initializingFrom: eval_left, shape: .vector(eval_left.count))
      let rightDescriptor = BNNSNDArrayDescriptor.allocate(initializingFrom: eval_right, shape: .vector(eval_left.count))
      let resultDescriptor = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Bool.self, shape: .vector(eval_left.count))
      try! BNNS.compare(leftDescriptor, rightDescriptor, using: .less, output: resultDescriptor)
      let resultVector: [Bool] = resultDescriptor.makeArray(of: Bool.self)!
      leftDescriptor.deallocate()
      rightDescriptor.deallocate()
      resultDescriptor.deallocate()
      return resultVector
    case .lrg:
      let eval_left = lhs!.evaluate_signal(return_data: return_data) as! [Float]
      let eval_right = rhs!.evaluate_signal(return_data: return_data) as! [Float]
      let leftDescriptor = BNNSNDArrayDescriptor.allocate(initializingFrom: eval_left, shape: .vector(eval_left.count))
      let rightDescriptor = BNNSNDArrayDescriptor.allocate(initializingFrom: eval_right, shape: .vector(eval_left.count))
      let resultDescriptor = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Bool.self, shape: .vector(eval_left.count))
      try! BNNS.compare(leftDescriptor, rightDescriptor, using: .greater, output: resultDescriptor)
      let resultVector: [Bool] = resultDescriptor.makeArray(of: Bool.self)!
      leftDescriptor.deallocate()
      rightDescriptor.deallocate()
      resultDescriptor.deallocate()
      return resultVector
    case .opp:
      let eval_left = lhs!.evaluate_signal(return_data: return_data) as! [Bool]
      let leftDescriptor = BNNSNDArrayDescriptor.allocate(initializingFrom: eval_left, shape: .vector(eval_left.count))
      let resultDescriptor = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Bool.self, shape: .vector(eval_left.count))
      try! BNNS.compare(leftDescriptor, leftDescriptor, using: .not, output: resultDescriptor)
      let resultVector: [Bool] = resultDescriptor.makeArray(of: Bool.self)!
      leftDescriptor.deallocate()
      resultDescriptor.deallocate()
      return resultVector
    case .met(let columnIndex):
      return return_data[columnIndex]!
    case .cst(let constant):
      return Array<Float>(repeating: constant, count: return_data[0]!.count)
    }
  }
}
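
One direction that might reduce overhead here (only a sketch, assuming the [Any] boxing and as! casts contribute to the runtime; the names are illustrative): give the evaluation a typed result so the arrays are never boxed.

import Accelerate

enum SignalValue {
  case floats([Float])
  case bools([Bool])
}

extension Node {
  func evaluateTyped(return_data: [Int: [Float]]) -> SignalValue {
    switch value {
    case .add:
      // Unwrap both children as Float arrays; a mismatch means the tree is malformed.
      guard case let .floats(l) = lhs!.evaluateTyped(return_data: return_data),
            case let .floats(r) = rhs!.evaluateTyped(return_data: return_data) else {
        fatalError("type mismatch in tree")
      }
      return .floats(vDSP.add(l, r))
    case .met(let columnIndex):
      return .floats(return_data[columnIndex]!)
    // The remaining cases follow the same pattern as evaluate_signal.
    default:
      fatalError("cases not shown in this sketch")
    }
  }
}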
