Faster linear algebra for Apple Silicon users in the R2025a pre-release (available now!)

Mike Croucher on 16 Jan 2025
Latest activity Reply by Mike Croucher on 27 Jan 2025 at 13:34

So you've downloaded the R2025a pre-release, tried Dark mode and are wondering what else is new. A lot! A lot is new!
One thing I am particularly happy about is the fact that Apple Accelerate is now the default BLAS on Apple Silicon machines. Check it out by running
>> version -blas
ans =
'Apple Accelerate BLAS (ILP64)'
If you compare this to R2024b, which uses OpenBLAS, you'll see some dramatic speed-ups in some areas. For example, I saw up to 3.7x speed-up for matrix-matrix multiplication on my M2 MacBook Pro and 2x faster LU factorisation.
Details regarding my experiments are in the blog post "Life in the fast lane: Making MATLAB even faster on Apple Silicon with Apple Accelerate" on The MATLAB Blog. Back then you had to do some trickery to switch to Apple Accelerate; now it's the default.
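For anyone who wants to try this kind of comparison themselves, here is a rough sketch. It is not the benchmark from the blog post: the matrix size n is an arbitrary assumption, and the numbers will vary with machine and release. Run it in both R2024b and the R2025a pre-release and compare the output.

```matlab
% Rough timing sketch. n is an arbitrary choice, not the size used in the blog post.
n = 2000;
A = rand(n);
B = rand(n);

blasInfo = version('-blas');   % which BLAS this MATLAB session is using
tMul = timeit(@() A*B);        % typical time for matrix-matrix multiplication
tLU  = timeit(@() lu(A));      % typical time for LU factorisation

fprintf('%s\n', blasInfo);
fprintf('A*B  : %.4f s (about %.1f GFLOPS)\n', tMul, 2*n^3/tMul/1e9);
fprintf('lu(A): %.4f s\n', tLU);
```

timeit calls each function several times and reports a typical time, which is less noisy than a single tic/toc pair.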
John D'Errico on 20 Jan 2025 at 16:20
YEAY!!!!! Sort of. Maybe not. So, a qualified yeay.
You should realize this may cost me, and possibly dearly. It now encourages me to replace my older Intel-based iMac. At least it adds fuel to that fire. The Studio model seems interesting.
Does anyone have bench results for an M4 Mac? (Which is not yet available on that platform.)
Steve Eddins on 20 Jan 2025 at 16:37
I know what you mean, John. I have a 2019 Intel-based iMac. I'm thinking that an M4 Mac mini might be in my future.
Mike Croucher on 27 Jan 2025 at 12:59
Fortunately for me, my wife is a Mac fan and in need of a new machine. So, I bought her an M4 MacBook Pro for Christmas. I get to play with a new toy AND it all appears to be her idea :)
Here are the results for bench(5) on the R2025a pre-release for the M4 MacBook Pro. This machine has 4 performance cores and 6 efficiency cores and is, I think, the weakest M4 available in a MacBook Pro. I'm not crazy-happy about the fact that there are more efficiency cores than performance cores, but what can you do?
Mike Croucher on 27 Jan 2025 at 13:34
And here are the results from the R2025a pre-release on a MacBook Pro M4 Max with 64GB RAM. This is from the personal machine of a colleague. Some nice Dark Mode action going on here too.
Steve Eddins on 20 Jan 2025 at 13:09
I don't see anything in the R2025a Prerelease release notes about this. It would be nice to see some examples in the Performance section of the release notes.
Royi Avital on 20 Jan 2025 at 11:57
This is great!
Any updates regarding AMD-based CPUs?
Summer on 18 Jan 2025 at 16:07 (Edited on 20 Jan 2025 at 17:30)
Looking at the comments from your blog post, I'm wondering if I will see any benefit if I'm running code in parallel. Do you know if Apple Accelerate still only uses one thread?
Mike Croucher on 18 Jan 2025 at 16:39
Not sure. I've asked development if they can comment. In the meantime, why not try your code on the pre-release, see what happens and report back?
Mike Croucher on 27 Jan 2025 at 12:18
So the answer is 'It's complicated'. Apple Accelerate uses multiple threads but doesn't use OpenMP; it uses Apple's own thing. Some details are on the BLAS_THREADING page of the Apple Developer Documentation.
So far so simple. However, Apple Silicon also has what I think of as the 'magic matrix units', which are controlled by Apple's AMX instructions. I'm never sure how many of these each version of the processor has. These are used by Apple Accelerate when appropriate, for example for matrix-matrix multiplication.
The difference is that the magic matrix unit runs independently of the rest of the CPU, and one thread/core is in charge of controlling it. So if you call a large matrix-matrix multiplication, Apple Accelerate will appear to use one thread. An alternative BLAS, e.g. OpenBLAS, would use all CPU threads but not the magic matrix unit. Apple Accelerate performance tends to be better at the present time.
The situation seems to have changed again for M4 silicon; see the r/apple thread "Apple appears to have replaced AMX with ARM's SME in M4".
This is all just what I've figured out from a combination of informal internal chats and web searches.
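If you want to poke at the threading question on your own machine, a minimal experiment might look like the sketch below. It times the same multiplication with MATLAB's computational thread limit set to 1 and then to its original value. Two loud assumptions: the matrix size is arbitrary, and nothing in this thread confirms that Apple Accelerate honours maxNumCompThreads at all, so similar timings would be consistent with the explanation above but would not prove it.

```matlab
% Sketch only: n is arbitrary, and whether the BLAS respects
% maxNumCompThreads is an assumption this experiment is meant to probe.
n = 2000;
A = rand(n);
B = rand(n);

nMax = maxNumCompThreads;              % remember the current setting
threadCounts = unique([1, nMax]);      % 1 thread vs. the current maximum
times = zeros(size(threadCounts));

for k = 1:numel(threadCounts)
    maxNumCompThreads(threadCounts(k));   % change the thread limit
    times(k) = timeit(@() A*B);           % time the multiplication
end

maxNumCompThreads(nMax);               % restore the original setting

for k = 1:numel(threadCounts)
    fprintf('%2d thread(s): %.4f s\n', threadCounts(k), times(k));
end
```

Watching Activity Monitor while this runs gives a second view of how many cores are actually busy.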