Post

Replies

Boosts

Views

Activity

Apple Accelerate libSparse performance
I've created a Julia interface for Apple Accelerate's libSparse, via calling the library functions as if they were C (@ccall). I'm interested in using this in the context of power systems, where the sparse matrix is the Jacobian or the ABA matrix from a sparse grid network. However, I'm puzzled by the performance. I ran a sampling profiler on repeated in-place solves of Ax = b for a large sparse matrix A and random dense vectors b. (A is size 30k, positive definite so Cholesky factorization.) The 2 functions with the largest impact are _SparseConvertFromCoordinate_Double from libSparse.dylib, and BLASStateRelease from libBLAS.dylib. That strikes me as bizarre. This is an in-place solve: there should be minimal overheard from allocating/deallocating memory. Also, it seems strange that the library would repeatedly convert from coordinate form. Is this expected behavior? Thinking it might be an artifact of the Julia-C interface, I wrote up a similar program in C/Objective-C. I didn't profile it, but timing the same operation (repeated in-place solves of Ax = b for random vectors b, with the same matrix A as in the Julia) gave the same duration. I've attached the C/Objective-C below.profiling-comparison.m.txt If you're familiar with Julia, the following will give you the matrix I was working with: using PowerSystems, PowerNetworkMatrices sys = System("pglib_opf_case30000_goc.m") A = PowerNetworkMatrices.ABA_Matrix(sys).data where you can find the .m file here. (As a crude way to transfer A from Julia to C, I wrote the 3 arrays A.nzval, A.colptr, and A.rowval to .txt files as space-separated lists of numbers: the above C/objective-C reads in those files.) To duplicate my Julia profiling, do pkg> add AppleAccelerate#libSparse Profile--note the #libSparse part, these features aren't on the main branch--then run using AppleAccelerate, Profile # run previous code snippet to define A M, N = 10000, size(A)[1] bs = [rand(N) for _ in 1:M] aa_fact = AAFactorization(A) factor!(aa_fact) solve!(aa_fact, bs[1]) # pre-compile before we profile. Profile.init(n = 10^6, delay = 0.0003) @profile (for i in 1:M; solve!(aa_fact, bs[i]); end;) Profile.print(C = true, format = :flat, sortedby = :count)
2
0
336
Jan ’25