Accelerate Sparse Solution? ...

I'm attempting a sparse BLAS solution for linear equations, based on the 2015 WWDC Session 712 presentation. I have run solutions with dgetrf and dgetrs, and also dgesv, for dense arrays, but the sparse solutions from my previous posts didn't work. I now think I've gotten things to work after rewriting the routine, though it is not at all elegant; for example, it uses too many arrays at this point, and I still need a sparse factorization instead of dgetrf. What I really need is an interface to SuperLU, or at least dgstrf.
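For anyone following along, here is roughly what the dgetrf-plus-two-triangular-solves pipeline computes, as a small pure-Swift sketch (no Accelerate, written in current Swift syntax rather than the 2015 syntax of the listing below; the matrix and expected answers are example 1 from that listing):

```swift
// Pure-Swift sketch of the LU pipeline built with Accelerate below:
// factor PA = LU with partial pivoting, then solve Ly = Pb and Ux = y.
func luSolve(_ a: [[Double]], _ b: [Double]) -> [Double] {
    let n = b.count
    var lu = a                 // holds L below the diagonal, U on/above it
    var piv = Array(0..<n)     // row permutation, like dgetrf's ipiv
    // Factor with partial pivoting (dense form of what dgetrf does)
    for k in 0..<n {
        var p = k
        for i in k+1..<n where abs(lu[i][k]) > abs(lu[p][k]) { p = i }
        if p != k { lu.swapAt(k, p); piv.swapAt(k, p) }
        for i in k+1..<n {
            lu[i][k] /= lu[k][k]
            for j in k+1..<n { lu[i][j] -= lu[i][k] * lu[k][j] }
        }
    }
    // Apply the pivots to b, then forward and back substitution
    var x = piv.map { b[$0] }
    for i in 1..<n { for j in 0..<i { x[i] -= lu[i][j] * x[j] } }   // Ly = Pb
    for i in stride(from: n-1, through: 0, by: -1) {                // Ux = y
        for j in i+1..<n { x[i] -= lu[i][j] * x[j] }
        x[i] /= lu[i][i]
    }
    return x
}

let a1: [[Double]] = [[1, -3, 1], [2, -8, 8], [-6, 3, -15]]
let b1: [Double] = [4, -2, 9]
print(luSolve(a1, b1)) // expect approximately [3, -1, -2]
```

The sparse version in the listing does exactly these three steps, except the L and U triangles live in sparse matrices and the two substitutions are handled by sparse_vector_triangular_solve_dense_double.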


Swift code:


// SparseBLAS linear equation solver coded in Swift

// Original input for AO is in a linear array (of sequential rows)

// AO is rearranged from a row linear array form into a column linear array AI

// Lapack dgetrf factorization is performed for the LU decomposition

// Solution is found from the sparse solver functions:

// sparse_vector_triangular_solve_dense_double is called twice: first with the lower triangle, then with the upper


import Accelerate


// Original row major order of example arrays AO and rhs vectors BO


// dense example 1: answers from A^-1 * B are given in the example as : 3, -1, -2

//var AO: [Double] = [ 1.0, -3.0, 1.0, 2.0, -8.0, 8.0, -6.0, 3.0, -15.0 ]

//var BO: [Double] = [ 4.0, -2.0, 9.0 ]


// sparse example 2: answers from A^-1 * B are given in the example as : 5, 18, 26

//var AO: [Double] = [ 0, 1, -1, 2, -2, 1, 1, -3, 2 ]

//var BO: [Double] = [ -8, 0, 3 ]


// example 3: answers: ac = 1/2, ab = -0.7071, ba = -0.7071, bc = -0.7071, ca = 1/2, cb = -0.7071

// simple triangle truss example x and y equilibrium equations written for each panel point

// AO matrix represents the direction cosines of truss members

let eacx = 1.0000, eabx = 0.7071 ; let eacy = 0.0000, eaby = 0.7071

let ebax = -0.7071, ebcx = 0.7071 ; let ebay = -0.7071, ebcy = -0.7071

let ecax = -1.0000, ecbx = -0.7071 ; let ecay = 0.0000, ecby = 0.7071


let AO = [ eacx, eabx, 0, 0, 0, 0,

eacy, eaby, 0, 0, 0, 0,

0, 0, ebax, ebcx, 0, 0,

0, 0, ebay, ebcy, 0, 0,

0, 0, 0, 0, ecax, ecbx,

0, 0, 0, 0, ecay, ecby ]


var BO = [ 0, -0.5, 0, 1, 0, -0.5 ] // vertical loads and reactions


//print("Row major order of matrix AO ", AO)


var n: Int = 6 // number of rows or columns

var n64 = Int64(n)

var tmp: Double = 0.0

var nr = __CLPK_integer(n) // number of rows in dgetrf

var ipiv = [__CLPK_integer](count: n, repeatedValue: 0) // pivot indices, filled in by dgetrf

var err: __CLPK_integer = 0

var incx: sparse_stride = 1

var m = sparse_dimension(n) // number of rows in sparse array

var na: sparse_dimension = m*m // number of elements in sparse array


// Rearrange AO into column major order for use in dgetrf, output from dgetrf will be in column order

// Column major order of AO is necessary for Fortran Lapack routines

var AI = [Double](count: n*n, repeatedValue: 0.0) // initialize column array

for j in 0...n-1 { for i in 0...n-1 { AI[i+j*n] = AO[j+n*i]} }


//print("Column major order matrix AI ", AI)


// Matrix factorization via dgetrf into LU where ipiv output is the row pivot order

dgetrf_(&nr, &nr, &AI, &nr, &ipiv, &err)

//print("LU Factorization of matrix A ", AI)

print("LU pivots ", ipiv)


// Number of non-zero elements

var nz = sparse_get_vector_nonzero_count_double(na, &AI, 1)

var nzz = sparse_dimension(nz)


// Pack the nonzero values of AI into A, with indy holding their linear indices

var A = [Double](count: nz, repeatedValue: 0.0) // initialize packed array

var indy = [sparse_index](count: nz, repeatedValue: 0) // initialize packed index

var nsz = sparse_pack_vector_double(na, nzz, &AI, incx, &A, &indy)

print("Number of nonzeros in A ", nsz)

//print("Nonzero values of A ", A)

print("Index of nonzero values ", indy)


// Create sparse matrix T

var T = sparse_matrix_create_double(m,m) ; var t = UnsafeMutablePointer<Void>(T)


// Using L triangle and solving forward

sparse_set_matrix_property(t, SPARSE_LOWER_TRIANGULAR)


// T matrix is filled by dgetrf results

// with dgetrf diagonal replaced by unit diagonal

// Lower triangular data is used


var ii: sparse_index = 0


var incc = [sparse_index](count: nz, repeatedValue: 0) ; var jncc = incc


for i in 0..<nz { jncc[i] = indy[i]/n64 ; incc[i] = indy[i] - jncc[i]*n64 }

//for i in 0..<nz { print("Nonzero row and column indices ", incc[i],jncc[i])}


for i in 0..<nz { if incc[i] != jncc[i] {sparse_insert_entry_double(T, A[i], incc[i], jncc[i])}}

//print("Inserted A values ", A)


ii = 0 ; for _ in 0..<n { sparse_insert_entry_double(T, 1.0, ii, ii) ; ii = ii + 1 } // unit diagonal for L


// Reorder B from original BO based on pivots from dgetrf

var B = BO

for i in 0..<n { tmp = B[i] ; B[i] = B[ipiv[i]-1] ; B[ipiv[i]-1] = tmp }

//print("Original B ", BO)

print("Pivoted B ", B)


sparse_commit(t)


// Solve for vector y using the L triangle and dense RHS vector B

sparse_vector_triangular_solve_dense_double(CblasNoTrans, 1.0 , T, &B, 1)

//print("Solution y from lower T^-1 * B ", B)


sparse_matrix_destroy(t)


// Create sparse matrix T1

var T1 = sparse_matrix_create_double(m,m) ; var t1 = UnsafeMutablePointer<Void>(T1)


// Using U triangle and output of y written to B and solving backwards

sparse_set_matrix_property(t1, SPARSE_UPPER_TRIANGULAR)


// T1 matrix is filled by dgetrf results

// Upper triangular data is used and original diagonal from dgetrf


for i in 0..<nz { sparse_insert_entry_double(T1, A[i], incc[i], jncc[i]) }


sparse_commit(t1)


// Solve for vector x using the sparse T matrix and dense RHS vector B

sparse_vector_triangular_solve_dense_double(CblasNoTrans, 1.0 , T1, &B, 1)

print("Solution x from upper T^-1 * y ", B)


sparse_matrix_destroy(t1)


print("----------------")
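One sanity check I'd suggest adding after the solve (not in the listing above): multiply the row-major AO by the computed x and compare against the original BO. A minimal standalone sketch in current Swift syntax, with the truss numbers and the expected solution hard-coded so it runs without Accelerate:

```swift
// Residual check: AO (row major) times the computed x should reproduce BO.
func matVec(_ a: [Double], _ x: [Double], n: Int) -> [Double] {
    var y = [Double](repeating: 0.0, count: n)
    for i in 0..<n {
        for j in 0..<n { y[i] += a[i*n + j] * x[j] }
    }
    return y
}

let n = 6
let r2 = 0.7071  // cos 45 degrees, as in the truss data
let AO: [Double] = [ 1,  r2,   0,   0,  0,   0,
                     0,  r2,   0,   0,  0,   0,
                     0,  0,  -r2,  r2,  0,   0,
                     0,  0,  -r2, -r2,  0,   0,
                     0,  0,   0,   0, -1, -r2,
                     0,  0,   0,   0,  0,  r2 ]
let x: [Double] = [0.5, -0.70711, -0.70711, -0.70711, 0.5, -0.70711] // solver output
let BO: [Double] = [0, -0.5, 0, 1, 0, -0.5]
for (bi, yi) in zip(BO, matVec(AO, x, n: n)) {
    print(bi, yi, separator: "\t")  // each pair should agree to about four decimals
}
```

The agreement is only as good as the four-digit direction cosines, but it catches ordering mistakes (row versus column major, pivot application) immediately.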

Solved a dense Successive Over-Relaxation (SOR) and a sparse SOR to compare with the direct sparse BLAS solutions from above. Everything checks except where SOR diverges for diagonally weak problems (example 1 above). I'd post the code, but it looks terrible, even for simple SOR! Basically, I use sparse_pack_vector_double to form the reduced A array and then perform the iteration on the x vector. I haven't tried anything but small arrays, so I don't expect to see any advantage, and I haven't thought about any Metal/GPU work to accelerate the sweeps. I don't think it has any advantage over the sparse BLAS direct solution for problems of this limited size; I just wanted to see what I could do with Swift and the sparse methods.
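For reference, the basic SOR sweep is only a few lines. A minimal dense sketch in current Swift syntax (the 2x2 system at the bottom is a made-up diagonally dominant example of mine, not one from this post):

```swift
// Minimal dense SOR: x[i] is updated in place with relaxation factor w.
// Converges only when A is suitably conditioned (e.g. diagonally dominant
// or symmetric positive definite), which is why example 1 above diverges.
func sor(_ a: [[Double]], _ b: [Double], w: Double = 1.25,
         tol: Double = 1e-10, maxIter: Int = 1000) -> [Double] {
    let n = b.count
    var x = [Double](repeating: 0.0, count: n)
    for _ in 0..<maxIter {
        var maxDelta = 0.0
        for i in 0..<n {
            var s = b[i]
            for j in 0..<n where j != i { s -= a[i][j] * x[j] }
            let xNew = (1 - w) * x[i] + w * s / a[i][i]
            maxDelta = max(maxDelta, abs(xNew - x[i]))
            x[i] = xNew
        }
        if maxDelta < tol { break }
    }
    return x
}

// Hypothetical test system; exact solution is x = [1/11, 7/11]
let sol = sor([[4, 1], [1, 3]], [1, 2])
print(sol)
```

A sparse version replaces the inner loop with a sweep over the packed nonzero values and their row/column indices, which is essentially what the sparse_pack_vector_double output provides.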
