Data Science on Apple Silicon: new distros and builds for R, Python, Julia?

This question does not come from a developer working on any of these languages. I am a data scientist working *in* these languages. But I'd like some clarity on how these ecosystems will transition from Intel to Apple Silicon.

Intel has lately built tools specifically for Python. R became much more efficient when Revolution (now Microsoft) bundled Intel's Math Kernel Library (and more) into R. R can also be much faster on the Mac with the Accelerate framework (esp. BLAS and LAPACK from vecLib, though these are not the officially supported default for the Mac build).

As we are investing in these platforms (both Apple hardware and our own codebase, not to mention human capital), it would be great to get more advance guidance on what performance we can expect on which front. Data scientists are more than just pro consumers needing an Adobe update for the new architecture (though for Matlab or Stata, the situation is similar), but less than full-blown developers who will use Swift anyway.

Converters from coremltools can save some models (say, scikit-learn under Python) to use in apps. Does this promise any further optimization and support for Python on Apple Silicon?

Accepted Reply

Apple has announced that we'll be submitting patches to enable Python3 to build natively for Apple Silicon. Otherwise we’re unable to comment on any future plans or features.

For R and Julia, you would need to ask the maintainers of those projects as we cannot comment on their behalf.

Replies

It looks like the R developers are already testing on Apple Silicon and are confident that they can provide a native R version for it. See the R Blog post "Will R Work on Apple Silicon?" for details.
What they say is:
"It turns out there is hope that R will work on Apple silicon. A usable Fortran 90 compiler for Apple silicon will hopefully be available relatively soon, since the development version of GFortran already seems to be working (check-all passed for R including reference LAPACK/BLAS) and there is a strong need for such compiler not only for R, but any scientific computing on that platform."

The "there is hope" part tells me that if you have to use R in the near future, or even every day as in my case, you should not buy an Apple silicon Mac yet.

What do you think?
Hi Guys,

Here are some benchmark results leman and I just did:

forums.macrumors.com/threads/data-science-r-and-spss-26-etc-under-rosetta-2-apple-silicon-m1.2269302/?post=29326680#post-29326680

R under Rosetta 2 is on average about 70% faster than on my late-2017 MacBook Pro (i5, 16 GB RAM).
The biggest hurdle for Data Science on Apple Silicon is GCC (the GNU Compiler Collection). The compiler hasn’t supported Apple’s ARM architecture (instruction set, calling convention, object format, etc.) since an ancient version of iOS. Work on that is in progress, but as with all open-source efforts, there is “no timeline” since contributions are made on a “time-available” basis.

Lack of GCC implies lack of Fortran support, since gfortran is part of GCC. The other notable Fortran compiler is Intel’s, and it is very unlikely to be ported to Apple Silicon. No Fortran also means that a lot of numerical libraries are being held back (e.g. SciPy, BLAS, LAPACK).

In any case, you could probably run your data science workloads under Rosetta 2 (i.e. Intel emulation/translation). Geekbench results show that M1 Macs are faster than many of the Mac portables that came before them, even when running Intel apps. Search the web for “How to Run Legacy Command Line Apps on Apple Silicon” to set up your Terminal sessions to prefer running Intel applications.
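
To confirm which mode a given Python session is actually running in, something like the following might help. It is only a sketch: `describe_process_arch` is a name made up for this example, and it relies on the `sysctl.proc_translated` key, which macOS on Apple Silicon reports as 1 for a process translated by Rosetta 2 and 0 for a native one. On Intel Macs and other systems the key is absent, in which case the sketch reports `None`.

```python
import platform
import subprocess


def describe_process_arch():
    """Report the architecture this Python process runs as (hypothetical helper)."""
    info = {
        "system": platform.system(),    # "Darwin" on macOS
        "machine": platform.machine(),  # "arm64" when native, "x86_64" when translated
        "rosetta": None,                # True/False on Apple Silicon, None if unknown
    }
    if info["system"] == "Darwin":
        try:
            out = subprocess.run(
                ["sysctl", "-n", "sysctl.proc_translated"],
                capture_output=True,
                text=True,
                check=True,
            ).stdout.strip()
            info["rosetta"] = out == "1"
        except (OSError, subprocess.CalledProcessError):
            # Key not available (e.g. an Intel Mac): leave the answer as None.
            pass
    return info


if __name__ == "__main__":
    print(describe_process_arch())
```

Running it once in each Terminal setup shows whether the interpreter you launched is the Intel build or a native ARM64 one.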


For the Python ecosystem, the conda-forge distribution already supports Apple Silicon in native ARM64 mode (without Rosetta 2).

https://github.com/conda-forge/miniforge#miniforge3 (choose the osx-arm64 variant).

It ships up-to-date compilers (clang for C/C++, including llvm-openmp, and gfortran for Fortran):

Code Block
c-compiler 1.1.3 h27ca646_0 conda-forge/osx-arm64 Cached
cctools 949.0.1 ha9384d2_18 conda-forge/osx-arm64 Cached
cctools_osx-arm64 949.0.1 h1c8944f_18 conda-forge/osx-arm64 Cached
clang 11.0.0 hce30654_2 conda-forge/osx-arm64 Cached
clang-11 11.0.0 default_h87665d4_2 conda-forge/osx-arm64 Cached
clang_osx-arm64 11.0.0 h54d7cd3_8 conda-forge/osx-arm64 Cached
clangxx 11.0.0 default_hbe4449c_2 conda-forge/osx-arm64 Cached
clangxx_osx-arm64 11.0.0 hb84c830_8 conda-forge/osx-arm64 Cached
compiler-rt 11.0.0 h9316cab_2 conda-forge/osx-arm64 Cached
compiler-rt_osx-arm64 11.0.0 hd64e075_2 conda-forge/noarch Cached
compilers 1.1.3 hce30654_0 conda-forge/osx-arm64 Cached
cxx-compiler 1.1.3 h260d524_0 conda-forge/osx-arm64 Cached
fortran-compiler 1.1.3 h630c574_0 conda-forge/osx-arm64 Cached
gfortran_impl_osx-arm64 11.0.0.dev0 h2cdbfd1_13 conda-forge/osx-arm64 Cached
gfortran_osx-arm64 11.0.0.dev0 h617dd65_10 conda-forge/osx-arm64 Cached
gmp 6.2.1 h9f76cd9_0 conda-forge/osx-arm64 Cached
isl 0.22.1 hb904e53_2 conda-forge/osx-arm64 Cached
ld64 530 h08716b2_18 conda-forge/osx-arm64 Cached
ld64_osx-arm64 530 h8a2aa15_18 conda-forge/osx-arm64 Cached
ldid 2.1.2 h34db0f2_2 conda-forge/osx-arm64 Cached
libclang-cpp11 11.0.0 default_h87665d4_2 conda-forge/osx-arm64 Cached
libcxx 11.0.0 h7cf67bf_1 conda-forge/osx-arm64 Cached
libgfortran 5.0.0.dev0 h181927c_13 conda-forge/osx-arm64 Cached
libgfortran-devel_osx-arm64 11.0.0.dev0 h181927c_13 conda-forge/noarch Cached
libgfortran5 11.0.0.dev0 h181927c_13 conda-forge/osx-arm64 Cached
libiconv 1.16 h642e427_0 conda-forge/osx-arm64 Cached
libllvm11 11.0.0 h8522ed7_0 conda-forge/osx-arm64 Cached
llvm-openmp 11.0.0 hdb94862_1 conda-forge/osx-arm64 Cached
llvm-tools 11.0.0 h8522ed7_0 conda-forge/osx-arm64 Cached
mpc 1.1.0 hb760245_1009 conda-forge/osx-arm64 Cached
mpfr 4.0.2 hbc63f68_1 conda-forge/osx-arm64 Cached
tapi 1100.0.11 he4954df_0 conda-forge/osx-arm64 Cached
zlib 1.2.11 h31e879b_1009 conda-forge/osx-arm64 Cached


numpy and scipy link against OpenBLAS 0.3.12 with OpenMP. I am not sure it can use all the SIMD bells and whistles of the M1 instruction set, but the SGEMM performance seems good enough on the MacBook Air with M1:

Code Block
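import numpy as np

# float32 input, so the product goes through single-precision BLAS
X = np.random.randn(4096, 4096).astype(np.float32)
out = np.empty_like(X)  # preallocated output buffer
# %timeit is an IPython/Jupyter magic; run this in IPython, not plain python
%timeit np.dot(X.T, X, out=out)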

Code Block
275 ms ± 26.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
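
To put that timing in more familiar units, here is a rough conversion to GFLOP/s. It is a sketch under one stated assumption: that the product is a full n x n GEMM costing 2·n³ floating-point operations. The helper name `gemm_gflops` is made up for this example. NumPy may dispatch `X.T @ X` to a symmetric rank-k routine that does roughly half that work, so read the result as a nominal figure rather than a measured hardware rate.

```python
def gemm_gflops(n, seconds):
    """Nominal throughput of an n x n by n x n matrix product, in GFLOP/s.

    Assumes a full GEMM: n**3 multiplies plus n**3 additions.
    """
    flops = 2.0 * n ** 3
    return flops / seconds / 1e9


# Mean timing reported above: 275 ms for n = 4096
print(f"{gemm_gflops(4096, 0.275):.0f} GFLOP/s (nominal)")
```

With the numbers above this comes out at roughly 500 GFLOP/s nominal, or about half that if only the symmetric half of the result is actually computed.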


It would appear that a commercial Fortran compiler for the M1 is available from NAG in Oxford, UK. Is there any good reason it cannot be used in place of LLVM, at least to build the Fortran libraries delivered with R and Python?