vecnorm seems slower than it should be

In the speed comparison below, I would expect vecnorm() to be faster than the first version, because the first version creates an A-sized temporary matrix, abs(A). However, vecnorm() is significantly slower. Any thoughts as to why?
A=rand(1e4,1000)-0.5;
tic
x1=mean(abs(A),2);
toc;
Elapsed time is 0.026976 seconds.
tic;
x2=vecnorm(A,1,2)./size(A,2);
toc
Elapsed time is 0.045126 seconds.
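A single tic/toc pair can be dominated by warm-up effects and timer noise. The same comparison can be sketched with timeit, which calls a function handle several times and returns a typical run time. The handle names f1 and f2 are only illustrative, and the numbers will depend on hardware and release:

```matlab
A = rand(1e4, 1000) - 0.5;

% Wrap each variant in a zero-argument function handle (names are illustrative).
f1 = @() mean(abs(A), 2);
f2 = @() vecnorm(A, 1, 2) ./ size(A, 2);

% timeit runs each handle repeatedly and reports a representative time in seconds.
t1 = timeit(f1);
t2 = timeit(f2);

fprintf('mean(abs(A),2):         %.6f s\n', t1);
fprintf('vecnorm(A,1,2)./size:   %.6f s\n', t2);
```

This only makes the measurement more repeatable; it says nothing about why one variant is slower.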

Accepted Answer

I can confirm your observation.
A = rand(1e4, 1000) - 0.5;
tic
for k = 1:50
x1 = mean(abs(A),2);
end
toc; % Overhead of the function MEAN is measurable:
Elapsed time is 1.222890 seconds.
tic
for k = 1:50
x1 = sum(abs(A), 2) / size(A, 2);
end
toc; % A direct SUM(X)/LENGTH is faster than MEAN:
Elapsed time is 1.157796 seconds.
tic;
for k = 1:50
x3 = vecnorm(A, 1, 2) / size(A, 2);
end
toc % VECNORM(X, 1) is slower:
Elapsed time is 1.471223 seconds.
tic;
for k = 1:50
x3 = vecnorm(A, 2, 2) / size(A, 2);
end
toc % Obviously VECNORM(X, 2) is optimized: it should be slower due to the SQRT:
Elapsed time is 0.437623 seconds.
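Before comparing timings, it is worth checking which of the variants actually compute the same quantity. A minimal sketch on a smaller matrix, where the variable names and the tolerance are my own choices: the first three variants agree up to rounding, while the p = 2 call computes the Euclidean norm, so its timing compares cost only, not results.

```matlab
A = rand(200, 50) - 0.5;
n = size(A, 2);
tol = 1e-12;   % loose tolerance for floating-point rounding (assumed, not tuned)

% Three ways to get the mean absolute value of each row.
x_mean = mean(abs(A), 2);
x_sum  = sum(abs(A), 2) / n;
x_vn1  = vecnorm(A, 1, 2) / n;

assert(max(abs(x_mean - x_sum)) < tol);
assert(max(abs(x_mean - x_vn1)) < tol);

% The 2-norm is a different quantity; its reference is sqrt(sum(A.^2, 2)).
x_vn2  = vecnorm(A, 2, 2);
x_ref2 = sqrt(sum(A.^2, 2));
assert(max(abs(x_vn2 - x_ref2)) < tol);
```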

6 Comments

I guess they took to heart your comment that p=2 would be the main use case.
They didn't, since the same code shows vecnorm is still twice as slow in R2021b, three years later.
I just don't use it because of that.
Maybe it's hardware-related? I ran the test from the page you linked and am seeing vecnorm with p=2 doing better.
N = 10000;
A = rand(N,2);
tic
B = sqrt(sum((permute(A,[1,3,2])-permute(A,[3,1,2])).^2,3));
toc % Elapsed time is 0.914213 seconds.
tic
C = vecnorm(permute(A,[1,3,2]) - permute(A,[3,1,2]), 2, 3);
toc %Elapsed time is 0.862503 seconds.
On my poor old laptop:
sqrt: Elapsed time is 0.744571 seconds.
vecnorm: Elapsed time is 1.417963 seconds.
See the FEX submission DNorm2 (mex). Timings on my i5 mobile, R2018b:
A = rand(1e4, 1000) - 0.5;
tic;
for k = 1:50
x3 = vecnorm(A, 2, 2);
end
toc
% Elapsed time is 3.662537 seconds.
tic;
for k = 1:50
x4 = DNorm2(A, 2);
end
toc
% Elapsed time is 0.427130 seconds.
This is a single-threaded C-mex. The only trick is that columnwise operations are preferred, up to some heuristic limits. Calling optimized BLAS routines should be slightly faster.
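To illustrate what "columnwise" means here: MATLAB stores matrices in column-major order, so a row norm can be accumulated one contiguous column at a time instead of striding along each row. The following is only a plain-MATLAB sketch of that idea, not the DNorm2 source, and it is not expected to be fast as an interpreted loop:

```matlab
A = rand(2000, 300) - 0.5;

% Accumulate the squared entries column by column.
acc = zeros(size(A, 1), 1);
for j = 1:size(A, 2)
    acc = acc + A(:, j).^2;   % A(:, j) is a contiguous block in memory
end
x_col = sqrt(acc);            % Euclidean norm of each row

% Cross-check against the built-in.
x_ref = vecnorm(A, 2, 2);
assert(max(abs(x_col - x_ref)) < 1e-12);
```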
vecnorm was improved since R2018b. For a fair comparison:
N = 10000;
A = rand(N,2);
D = permute(A,[1,3,2])-permute(A,[3,1,2]);
tic
B = sqrt(sum(D.^2,3));
toc % Elapsed time is 1.817327 seconds.
tic
C = DNorm2(D, 3);
toc % Elapsed time is 0.829278 seconds.

Release

R2021b

Asked:

on 7 Mar 2022

Commented:

Jan
on 7 Mar 2022