GPU programming for Mac M1
Matilda
on 19 Nov 2024
Hello!
I was wondering whether anyone has found a way to use functions such as gpuArray with the M1 GPU in MATLAB.
Alternatively, are there other ways to perform computations on the GPU under macOS? I would appreciate any ideas or information.
Does MathWorks have any plans to extend its GPU programming support to non-NVIDIA GPUs?
Apologies if this has been asked before; all the posts I found were quite dated.
3 Comments
John D'Errico
on 19 Nov 2024
It is something I am sorry to have said, since I use only Macs for my own work. As Mac users, we'll all keep hoping for the best, though.
Accepted Answer
Mike Croucher
on 13 Dec 2024
I have an M2 Mac and I have just bought an M4 Mac for my wife. I love the hardware on these machines, it's superb! I wrote the blog posts that announced the betas for Apple Silicon and also showed how to switch to Apple Accelerate around this time last year: Apple » The MATLAB Blog - MATLAB & Simulink. I, along with many other MathWorkers, am invested in this platform.
I would love to see MATLAB support Apple GPUs, and I help development keep track of requests from users.
First off, I disagree with @Walter Roberson: our GPU support is not primarily aimed at Deep Learning. We have over 1,200 gpuArray-enabled functions spread across 14 toolboxes, and more are being added with every release: MATLAB now has over 1,000 functions that Just Work on NVIDIA GPUs » The MATLAB Blog - MATLAB & Simulink. At least one more toolbox that I know of will be getting GPU support in R2025a.
So, let's think about gpuArray support first. Most uses of technical computing outside of deep learning use double precision. MATLAB's default data type is double. MATLAB users expect double. Apple silicon GPUs do not support double. This is a problem!
OK, so when you have this conversation with people, there will be a subset who will say 'I'd be happy with single'. OK, great. To do what? What workflow do you have right now that you need this support for? What functions would you need to see supported? Have you ever run this on an NVIDIA GPU and got a speed-up? Do you have any evidence that the Apple silicon GPU would actually help here? By how much?
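For context, this is what that single-precision workflow looks like with today's gpuArray support, a minimal sketch assuming an NVIDIA GPU and Parallel Computing Toolbox; any hypothetical Apple silicon support would need the same explicit step away from MATLAB's double default:
% MATLAB defaults to double precision, which Apple silicon GPUs lack.
x = rand(4096);                   % class(x) is 'double' -- MATLAB's default
xs = single(x);                   % the explicit downcast an FP64-less GPU needs
G = gpuArray(xs);                 % works today on supported NVIDIA GPUs
y = gather(mean(G * G.', 'all')); % compute on the GPU, copy the result back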
The answers to all of these questions help drive conversations internally. Providing support for Apple Silicon GPUs would be a major undertaking and doing it would mean that something else wouldn't get done. More likely it would be a lot of 'something else's'!
Of course I can't say if this support will ever come or not, but I know that detailed conversations on what is wanted and why are the way to help out.
4 Comments
Walter Roberson
on 17 Mar 2025 at 18:15
The M3 and M4 GPUs are single precision, with FP16 apparently being recommended (according to https://forums.appleinsider.com/discussion/234279/what-apples-three-gpu-enhancements-in-a17-pro-and-m3-actually-do )
The Pro chips are single precision too.
The chances that an Apple device will get an NVIDIA GPU are very small. Apple was refusing to approve NVIDIA drivers; the word repeatedly was that the Apple engineering staff were happy with the proposed NVIDIA drivers, but that someone very high level at Apple blocked approval. Apple also changed its policy so that third-party drivers would have to be included in every separate application, claiming that the new policy would cut down on the possibility of security bugs in a single application being used to infect other applications.
Apple has long avoided directly purchasing Nvidia's chips and is now developing its own AI server chip with Broadcom, aiming for production by 2026, The Information reported Tuesday, shedding broader light on why the two companies don't get along so well.
The relationship deteriorated after a 2001 meeting where Steve Jobs accused Nvidia of copying technology from Pixar, which he then controlled. Relations worsened in 2008 when Nvidia's faulty graphics chips forced Apple to extend MacBook warranties without full compensation.
Federico Manfrin
on 18 Mar 2025 at 14:46
Edited: Federico Manfrin
on 18 Mar 2025 at 14:57
Thanks Walter, I had some fun diving into this Apple-Nvidia story.
I also found interesting information about the AMX co-processor that the Apple Silicon chips have, and I can answer the on-topic question about the BLAS implementation: it supports FP64, of course!
It would be nice to have a performance benchmark comparing several gpuArray operations against the same operations on Apple silicon (using Apple Accelerate's BLAS, of course). Since AMX supports fixed matrix dimensions (e.g., 4×4 or 8×8), I expect the performance to drop with big matrices, but I also expect it to be very fast with small matrices, gaining the benefit of memory shared between the CPU and co-processor on Apple Silicon that a discrete GPU can't have (it loses time moving data from CPU memory to GPU memory before the calculation, and back again to return the result to the CPU). At what matrix dimension would the GPU outperform AMX?
Thanks
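A minimal sketch of such a benchmark, assuming a machine with an NVIDIA GPU and Parallel Computing Toolbox for the gpuArray timings; on an Apple Silicon machine, dropping the gpuArray lines leaves a timing of the system BLAS (Apple Accelerate) via timeit:
% Time an n-by-n single-precision matrix multiply on the CPU (dispatched
% to the system BLAS) and on an NVIDIA GPU via gpuArray.
sizes = [64 128 256 512 1024 2048 4096];
for n = sizes
    A = rand(n, 'single');
    tCPU  = timeit(@() A * A.');                % CPU/BLAS compute time
    G     = gpuArray(A);
    tGPU  = gputimeit(@() G * G.');             % on-GPU compute time only
    tXfer = gputimeit(@() gather(gpuArray(A))); % host<->device round trip
    fprintf('n = %4d:  CPU %8.5f s   GPU %8.5f s   transfer %8.5f s\n', ...
            n, tCPU, tGPU, tXfer);
end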
More Answers (1)
Walter Roberson
on 19 Nov 2024
If I recall correctly, someone posted indicating that they had written MEX C++ code that calls into Apple's GPU routines, and invoked that code from within MATLAB. This is not the same as using gpuArray() with automatic dispatch to the GPU as needed. As far as I recall, the person had not made the interface code publicly available.
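For the curious, a minimal sketch of what that approach might look like from the MATLAB side, assuming a hand-written Objective-C++ MEX source wrapping Apple's Metal API; the file and function names (metal_matmul) are hypothetical, and no such interface ships with MATLAB:
% Hypothetical: build a hand-written MEX bridge to Metal and call it
% explicitly. metal_matmul.mm / metal_matmul are invented names; you
% would have to write the Objective-C++ source yourself.
mex('metal_matmul.mm', 'LDFLAGS=$LDFLAGS -framework Metal -framework Foundation')
A = rand(2048, 'single');   % Apple GPUs: single precision only
B = rand(2048, 'single');
C = metal_matmul(A, B);     % explicit call; no gpuArray-style automatic dispatch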
Does Matlab have any plans to extend their GPU programming to non NVIDIA GPUs?
The last time I asked Mathworks about this, the answer was that they had no plans to extend GPU programming to Apple Silicon.
It is difficult to get straight answers from Apple about the best way to use the GPU.
Apple has a history of leaving pieces of technology undocumented, and letting ecosystems of best-effort grow up, only to later deliberately break the best-effort code, saying, "We never said to do it that way so any problems are your fault!"
But also, Mathworks GPU support is primarily aimed at Deep Learning. Mathworks chases the Deep Learning market. The current research work that uses the Apple Silicon GPU is a comparatively small portion of research work. The majority of research work is on NVIDIA GPUs; the second largest group of Deep Learning research work is on IBM equipment; other groups are far behind in market share. By market-share measures, Mathworks would be better off going after support for IBM equipment.
0 Comments