testing SVD Performance on M1

Question

Hello Community,

Here is a script for testing the M1's performance on an SVD problem (the Parallel Computing Toolbox is required).

% This script evaluates the Singular Value Decomposition (SVD) of matrices of size 1 to N.
% It also detects the maximum number of threads and determines the parallel
% efficiency.
%% clear and define environment
clear all;delete(gcp('nocreate'));clc;
%% detect max threads
core_info = evalc('feature(''numcores'')');
maxThreads = str2num(core_info(53)); % fragile: reads a single character of the printed text, so it assumes single-digit core counts
disp(['maximum number of simultaneous threads: ',num2str(maxThreads)])
%% define lowest possible problem size N without a remainder for every thread
N = smallest_multiple(max(maxThreads,8));
% increase N to around 1000 for differen
%if N < 1000
%    N = ceil(1000/N)*N;
%end
disp(['problem size N = ',num2str(N)])
%% benchmark
result = kron(1:maxThreads,[1 0 0]')';
for k = 1:maxThreads
    y = zeros(N,1);
    myCluster = parcluster('local');
    myCluster.NumWorkers = k;  % 'Modified' property now TRUE
    saveProfile(myCluster);
%    evalc('parpool(''local'',k)');
    tic
    parfor n = 1:N
        y(n) = max(svd(randn(n)));
    end
    result(k,2)=toc;
    disp(['    Problem solved with ',num2str(k),' of '...
        ,num2str(maxThreads),' threads has finished in ',num2str(result(k,2)),'s'])
    evalc('delete(gcp(''nocreate''))');
end
%% present the results
clc;
result(:,3) = result(1,2)./(result(:,2).*result(:,1));
disp('    #threads |time in s|Efficiency')
disp(result)
%% alternative solution to smallest multiple
%  https://de.mathworks.com/matlabcentral/answers/386271-write-a-function-called-smallest_multiple#answer_319994
function r = smallest_multiple(k)
r = 1;
for n = 1:k
    r = r * (n / gcd(r,n));
end
end

Could anybody run the script and post the output?

I can't understand the bad performance:

    #threads |time in s|Efficiency
    1.0000   93.1989    1.0000
    2.0000   80.0887    0.5818
    3.0000   76.6610    0.4052
    4.0000   87.3170    0.2668

5 Comments

Benny Hartwig on 31 Jul 2021

Dear Marko,

I ran your script on a Mac mini M1 with 8 GB of RAM and observed numbers similar to yours. However, after altering a few lines of code I observed better efficiency for this particular exercise. Specifically, I changed:

maxThreads = str2num(core_info(53))-4; % deduct number of low performance cores
N = smallest_multiple(max(maxThreads,8))*2; % increase the number of iterations
y(n) = max(svd(randn(500))); % fix the size of the random matrix

The most important change is the third one: fixing the size of the randomly generated matrix. I think the reason is that, in the original script, the work is distributed unequally across the batches because the size of the random matrix grows with the loop index.
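To put a rough number on that imbalance, here is a minimal sketch. It assumes that the cost of svd on an n-by-n matrix grows roughly like n^3 and that the iterations are split into equal contiguous blocks. parfor actually hands out iterations dynamically, so this only illustrates how skewed the work is, not how MATLAB schedules it. N = 840 is what smallest_multiple(8) returns in the original script; nBlocks, cost and share are names made up for this illustration.

```matlab
% Rough illustration of how unevenly the work is spread over n = 1..N
N       = 840;                 % value of smallest_multiple(8) in the original script
nBlocks = 4;                   % assumed: four equal contiguous blocks of iterations
cost    = (1:N).^3;            % assumed cost model: svd(randn(n)) ~ n^3
blocks  = reshape(cost, N/nBlocks, nBlocks);   % one column per block
share   = sum(blocks, 1) / sum(cost);          % fraction of total work per block
disp(share)
```

Under these assumptions the last quarter of the iterations carries about 68% of the total work, while the first quarter carries well under 1%.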

based on N = 1680

    #threads |time in s|Efficiency
    1.0000   61.9659    1.0000
    2.0000   34.3137    0.9029
    3.0000   26.9083    0.7676
    4.0000   23.4078    0.6618

based on N = 3360

    #threads |time in s|Efficiency
    1.0000  111.3407    1.0000
    2.0000   60.9948    0.9127
    3.0000   45.1030    0.8229
    4.0000   37.0086    0.7521

based on N = 8400

    #threads |time in s|Efficiency
    1.0000  263.5285    1.0000
    2.0000  140.6530    0.9368
    3.0000  105.1190    0.8357
    4.0000   82.8724    0.7950

Best,

Benny

Marko on 2 Aug 2021

Edited: Marko on 2 Aug 2021

Hi Benny,

I have an Intel i5-5250U (2C/4T), so I can only guess at your problem.

Could you enter this at the MATLAB command line:

feature('numcores')

Here is the fix from https://www.mathworks.com/matlabcentral/answers/268978-use-all-cores-of-cpu#answer_367651:

To use all logical processors (the number of threads), you need to change NumWorkers in the MATLAB settings. In MATLAB 2018, follow this menu path: Preferences >> Parallel Computing Toolbox >> Cluster Profile Manager >> click "Edit" at the bottom right >> set "NumWorkers" to the number of logical processors, 8 in your case >> Done >> close and apply.
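The same setting can also be made from the command line, which is what the script in the question already does inside its loop. A minimal sketch, assuming the Parallel Computing Toolbox is installed and the profile is still named 'local' (newer releases may use a different name). nWorkers = 8 is just the value from the quoted example.

```matlab
% Command-line equivalent of the menu steps (sketch, not tested on an M1)
nWorkers = 8;                    % assumed: number of logical cores, as in the quote
c = parcluster('local');         % load the local cluster profile
c.NumWorkers = nWorkers;         % raise the worker limit of the profile
saveProfile(c);                  % persist the change
delete(gcp('nocreate'));         % close any pool that is still open
pool = parpool(c, nWorkers);     % start the pool before any timing begins
disp(pool.NumWorkers)
```

Starting the pool explicitly like this also keeps the pool start-up time out of a later tic/toc measurement.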

I am excited to see what benchmark values you will post.

Benny Hartwig on 2 Aug 2021

Hi Marko,

Thank you very much for the hint. I managed to get MATLAB on the M1 Mini (8 GB) to use all eight cores and ran the benchmark again. However, I needed to make some changes to the exercise to get a better understanding of the performance:

N = smallest_multiple(max(maxThreads,8))*1000*5 % increase the size of the loop
evalc('parpool(''local'',k)'); % activate before tic toc (otherwise dilutes time keeping)
parfor n = 1:N*k % scale the loop by the number of threads s.t. every worker has to finish N jobs
        y(n) = max(svd(randn(10))); % reduce size of the random matrix to contain memory pressure
end
result(k,2)=toc/k; % divide total time by the number of workers

    #threads |time in s|Efficiency
    1.0000   33.1795    1.0000
    2.0000   17.1001    0.9702
    3.0000   11.8776    0.9312
    4.0000    9.2032    0.9013
    5.0000    8.4971    0.7810
    6.0000    7.7836    0.7105
    7.0000    7.2541    0.6534
    8.0000    7.1321    0.5815

So it seems that the efficiency cores also improve the performance but are a bit slower than the performance cores. Moreover, the memory pressure chart indicates that these efficiency numbers might be biased downward, because the memory pressure turned yellow during the runs with 5 to 8 threads. With 1 to 4 threads, it was always green.

So it probably pays off to get the model with 16 GB of RAM, as the RAM consumption of the parfor loop increases quite strongly when you add more workers. Maybe this problem will be solved once MATLAB runs natively on the M1.
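For anyone who wants to check the efficiency column: because the loop runs N*k iterations and the time is stored as toc/k, the expression result(1,2)./(result(:,2).*result(:,1)) is the single-worker time divided by the full wall-clock time with k workers, i.e. a weak-scaling efficiency. A small sketch that recomputes it from the times posted above (the variable names are made up for illustration):

```matlab
% Recompute the efficiency column from the posted per-worker times
k          = (1:8)';
tPerWorker = [33.1795 17.1001 11.8776 9.2032 8.4971 7.7836 7.2541 7.1321]';
wall       = tPerWorker .* k;    % undo the toc/k normalisation: full wall-clock time
eff        = wall(1) ./ wall;    % weak-scaling efficiency relative to one worker
disp([k wall eff])
```

The recomputed values agree with the posted column up to rounding. The wall-clock time for eight workers doing eight times the work is about 57 s, compared with about 33 s for one worker doing the base workload.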
