I've created a Julia interface for Apple Accelerate's libSparse, via calling the library functions as if they were C (@ccall). I'm interested in using this in the context of power systems, where the sparse matrix is the Jacobian or the ABA matrix from a sparse grid network. However, I'm puzzled by the performance.
I ran a sampling profiler on repeated in-place solves of Ax = b for a large sparse matrix A and random dense vectors b. (A is size 30k, positive definite so Cholesky factorization.) The 2 functions with the largest impact are _SparseConvertFromCoordinate_Double from libSparse.dylib, and BLASStateRelease from libBLAS.dylib. That strikes me as bizarre. This is an in-place solve: there should be minimal overheard from allocating/deallocating memory. Also, it seems strange that the library would repeatedly convert from coordinate form. Is this expected behavior?
Thinking it might be an artifact of the Julia-C interface, I wrote up a similar program in C/Objective-C. I didn't profile it, but timing the same operation (repeated in-place solves of Ax = b for random vectors b, with the same matrix A as in the Julia) gave the same duration. I've attached the C/Objective-C below.profiling-comparison.m.txt
If you're familiar with Julia, the following will give you the matrix I was working with:
using PowerSystems, PowerNetworkMatrices
sys = System("pglib_opf_case30000_goc.m")
A = PowerNetworkMatrices.ABA_Matrix(sys).data
where you can find the .m file here. (As a crude way to transfer A from Julia to C, I wrote the 3 arrays A.nzval, A.colptr, and A.rowval to .txt files as space-separated lists of numbers: the above C/objective-C reads in those files.) To duplicate my Julia profiling, do pkg> add AppleAccelerate#libSparse Profile--note the #libSparse part, these features aren't on the main branch--then run
using AppleAccelerate, Profile
# run previous code snippet to define A
M, N = 10000, size(A)[1]
bs = [rand(N) for _ in 1:M]
aa_fact = AAFactorization(A)
factor!(aa_fact)
solve!(aa_fact, bs[1]) # pre-compile before we profile.
Profile.init(n = 10^6, delay = 0.0003)
@profile (for i in 1:M; solve!(aa_fact, bs[i]); end;)
Profile.print(C = true, format = :flat, sortedby = :count)