Asked by Eike
on 8 May 2011

Hey there,

I have a vector t (nx1) and a matrix A (nxm). I need to multiply t with each column of A element-wise.

Example: t=[ 1; 2 ]; A=[ 1 2; 3 4 ] => Res=[ 1*1 1*2; 2*3 2*4 ]

So far I have tried 4 versions:

% 1. Looping

for j = 1 : m

res( j ) = t .* A( :, j );

end

% 2. Repmat

tmat = repmat( t, 1, m )

res = tmat .* A;

% 3. diag

res = diag( t ) * A;

% 4. Indexing

A = A( : );

t = t( :, ones( 1, m ) );

t = t( : );

res = A .* t;

res = reshape( res, n, m );

I did some testing on these and found #1 to be (by far) the fastest one, followed by #2 and #4 (about 2 to 3 times slower) and then followed by #3 (8 to 9 times slower). This kind of surprised me, since #1 is the only version not taking advantage of matrix operations in some way.

Does anyone know a faster way to do this? Or could anybody give some reasons for the result I found?

Answer by Matt Tearle
on 9 May 2011

Accepted Answer

The first question has been addressed, so I'll just throw out a couple of points about the second question.

- The newer the version you're running, the more benefit you'll see from JIT enhancements, so loops aren't as slow and generally evil as they used to be.
- Methods 2 & 4 both require a lot more memory and related memory operations. Assuming the variables are large, that's a lot of overhead on top of the calculations. (Also remember that MATLAB requires contiguous memory addresses for arrays.)
- Method 3 not only incurs the same memory-monkeying penalty, but also needs more calculations.

Teja Muppirala
on 20 Jun 2011

Loops are fast, but trying this problem out on R2011a, BSXFUN seems to be a better choice, running consistently about 50% faster than a for loop.

For size 2000 matrices,

Loops: 0.0384 sec

BSXFUN: 00251 sec

MATLAB does tend to get more efficient as the releases go on.

Teja Muppirala
on 20 Jun 2011

Oops. That should be: 0.0251 sec

Teja Muppirala
on 20 Jun 2011

It should also be mentioned that while using DIAG is very bad, using a sparse diagonal matrix is faster than the loop method, but still a little bit slower than BSXFUN.

sparse(1:m,1:m,t)*A

0.0304 sec

Sign in to comment.

Answer by Matt Fig
on 9 May 2011

I find the looping option fastest on my Win Vista 32 bit. Using r2007b.

>> more_loop_timings

Elapsed time is 0.017479 seconds.

Elapsed time is 0.037864 seconds.

Elapsed time is 0.376887 seconds.

Elapsed time is 0.030641 seconds.

Elapsed time is 0.022277 seconds.

%

%

%

%

function [] = more_loop_timings

% Yet more of these....

t = (1:1000)';

A = rand(length(t),1000); % Large and square

[n,m] = size(A);

%%1. Looping

tic

for jj = m:-1:1

res(:,jj) = t.*A(:,jj); % dynamically pre-allocate

end

toc

%%2. Repmat

tic

res2 = repmat(t,1,m) .* A;

toc

%%3. diag

tic

res3 = diag(t)*A; % Yikes!

toc

% The obvious choice...

tic

res4 = bsxfun(@times,A,t);

toc

%%4. Indexing

tic

t = t(:,ones( 1, m,'single'));

res5 = reshape(A(:).*t(:),n,m);

toc

% isequal(res,res2,res3,res4,res5)

Andrew Newell
on 9 May 2011

That pre-allocation is sneaky!

James Tursa
on 9 May 2011

Another fast alternative to the ZEROS function for pre-allocating is my UNINIT function from the FEX:

http://www.mathworks.com/matlabcentral/fileexchange/31362-uninit-create-an-uninitialized-variable-like-zeros-but-faster

Eike
on 9 May 2011

I also fancy that prealloc :-P

And thanks for the bsxfun hint, didn't even know that function!

Since I found the timing results to be of pretty high variance, I did some 500 runs (of your code which I copied shamelessly) and normalized the results.

The output is as follows:

Looping - 1.0 (best)

Repmat - 1.9152

diag - 60.6682

bsxfun - 1.4830

indexing - 2.2796

Indeed, looping seems to be the best solution for this problem. Very interesting! :-)

Sign in to comment.

Answer by David
on 21 Jun 2017

I just encountered this problem myself, and found all these answers to be very helpful. However, after poking a little bit more, I discovered that MATLAB 2016b and later allow implicit multiplication to do exactly the same thing as bsxfun, slightly faster.

res = A .* t or res = t .* A are equivalent and solve the problem, as long as t is a vector with the same number of rows OR columns as A.

The other answers were of course correct at the time the question was asked. This page is still being viewed ~1000 times/month, so I wanted to get the current best answer out there.

clearvars;

A = rand(1e4,2e4);

[Nr,Nc] = size(A);

v = rand(Nr,1);

tic;

% answer = zeros(size(A));

for i=Nc:-1:1

answer(:,i) = A(:,i).*v;

end

toc

tic;

answer = A.*(v*ones(1,Nc));

toc

tic;

answer = A.*(repmat(v,1,Nc));

toc

tic;

answer = bsxfun(@times,v,A);

toc

tic;

answer = A.*v;

toc

tic;

answer = v.*A;

toc

Yields the following times

Elapsed time is 0.915789 seconds.

Elapsed time is 0.863713 seconds.

Elapsed time is 0.911458 seconds.

Elapsed time is 0.685757 seconds.

Elapsed time is 0.302117 seconds.

Elapsed time is 0.285025 seconds.

For the last two, I was checking if order matters. It does not; whichever is first is always slightly faster, but moving "answer = v.*A" earlier makes it the slightly slower one.

Jeffrey Daniels
on 7 Dec 2017

I found the same as David on a larger matrix using Ver 2016b.

A=rand(100000,1000);

[Nr,Nc] = size(A);

v = rand(Nr,1);

tic

answer = A.*v(:,ones(1,Nc));

toc

tic

answer = bsxfun(@times,A,v);

toc

tic

answer = v.*A;

toc

Elapsed time is 2.679808 seconds.

Elapsed time is 2.653772 seconds.

Elapsed time is 1.280892 seconds.

Jonathan
on 26 Feb 2019

Sign in to comment.

Answer by Jan
on 9 May 2011

Simplification:

% 4. Indexing

res = t(:, ones(1, m)) .* A;

This is exactly the same effect as performed inside REPMAT. On my Matlab 2009a on a single core BSXFUN is reproducibly the fastest. It is very near to a naively implemented C-Mex version (single threaded, no SSE).

EDITED: Matt's pre-allocation is fast. Matlab 2009a, WinXP 32, 1.5GHz Pentium-M (single core)

t = (1:1000)';

A = rand(1000, 200);

tic; for rep = 1:100; R = func1(t, A); end; toc

function R = func1(t, A) % 0.78 sec

[n, m] = size(A);

R = zeros(n, m);

for i = 1:m

R(:, i) = t .* A(:, i);

end

function R = func2(t, A) % 0.54 sec

[n, m] = size(A);

for i = m:-1:1

R(:, i) = t .* A(:, i);

end

function R = func3(t, A) % 0.54 sec

R = bsxfun(@times, t, A);

function A = func4(t, A) % 0.75 sec

[n, m] = size(A);

for i = 1:m

A(:, i) = t .* A(:, i);

end

James Tursa
on 9 May 2011

James Tursa
on 9 May 2011

@Jan: The mxFastZeros function signature appears to be this based on some quick testing:

mxArray *mxFastZeros(mxComplexity ComplexFlag, mwSize m, mwSize n);

It returns a double matrix presumably filled with zeros. And it is indeed fast (as fast as my UNINIT function). It does have a drawback in that the returned mxArray is a NORMAL array and not a TEMPORARY array, so you can't return it as-is back to MATLAB without first hacking into it and changing the VariableType field to 4 (TEMPORARY). I got strange results unless I changed the VariableType first. Also, it appears to only return a double matrix. I tried to add a mxClassID argument at the end but it didn't change anything. mxFastZeros appears in all of the MATLAB versions that I have (R2006b through R2011a PC WinXP 32-bit). Don't know about other versions.

Jan
on 10 May 2011

Sign in to comment.

Answer by Andrew Newell
on 8 May 2011

I don't find that. If my input is

clear

t=(1:1000)'; A=rand(length(t),200);

[n,m] = size(A);

%%1. Looping

tic

for j = 1 : m

res(:,j) = t .* A( :, j ); % You had an error in this line

end

toc

%%2. Repmat

tic

tmat = repmat( t, 1, m );

res = tmat .* A;

toc

%%3. diag

tic

res = diag( t ) * A;

toc

%%4. Indexing

tic

A = A( : );

t = t( :, ones( 1, m ) );

t = t( : );

res = A .* t;

res = reshape( res, n, m );

toc

The output is

Elapsed time is 0.300359 seconds.

Elapsed time is 0.003952 seconds.

Elapsed time is 0.035276 seconds.

Elapsed time is 0.004409 seconds.

Andrei Bobrov
on 8 May 2011

my research and the result

clear

a = zeros(5,100);

for jj = 1:100

t=rand(1000,1); A=rand(length(t),200);

[n,m] = size(A);

% 1. Looping

tic

for j = 1 : m

res(:,j) = t .* A( :, j );

end

a(1,jj)=toc;

% 2. Repmat

tic

tmat = repmat( t, 1, m );

res = tmat .* A;

a(2,jj)=toc;

% 3. diag

tic

res = diag( t ) * A;

a(3,jj)=toc;

% 4. Indexing

tic

A1 = A( : );

t1 = t( :, ones( 1, m ) );

t1 = t1( : );

res = A1 .* t1;

res = reshape( res, n, m );

a(4,jj)=toc;

% 5. bsxfun

tic

res = bsxfun(@times,t,A );

a(5,jj)=toc;

end

AA = mean(a,2)

AA =

0.0006

0.0023

0.0413

0.0031

0.0015

Sign in to comment.

Answer by Anders Munk-Nielsen
on 23 Mar 2012

I'm having similar problems in that I have an NxM matrix, where N is extremely large (5 million) and M is fairly small (below 100) and I have a 1xM vector that I want to multiply onto all N rows. I'm trying to improve on an implementation that uses repmat which is incredibly slow and in my intuition this is due to the huge memory footprint.

My question: Why isn't bsxfun any faster than it is? I would expect it to be waaay faster but it's only somewhat faster. If I were to implement a mex function to do this (a multiplyVectorToMatrix function), what is the crucial feature to get built in? Multithreaded'ness?

Anders Munk-Nielsen
on 23 Mar 2012

Btw, here is some code to run what I have in mind (By the way, M is K below):

clc; clear all;

N = 300000;

S = 1000;

K = 10;

x = rand(N,K);

B = rand(K,1);

eps = randn(N,1);

Y = x*B + eps;

Bhat = zeros(K,S);

C = randn(N,S);

%% repmat

tic;

px = (x'*x)\x';

Bhat = px*(repmat(Y,1,S)-C);

fprintf('repmat took %1.3g seconds.\n',toc);

%% for loop

tic;

px = (x'*x)\x';

for s=1:S;

%YmC = Y-C(:,s);

Bhat(:,s) = px*(Y-C(:,s));

end;

fprintf('for took %1.3g seconds.\n',toc);

%% parfor loop

tic;

px = (x'*x)\x';

parfor s=1:S;

Bhat(:,s) = px*(Y-C(:,s));

end;

fprintf('Parfor took %1.3g seconds.\n',toc);

%% bsxfun

tic;

px = (x'*x)\x';

Bhat = px*bsxfun(@minus,Y,C);

fprintf('Bsxfun took %1.3g seconds.\n',toc);

Sean de Wolski
on 23 Mar 2012

bsxfun(@times,yourNxM,yourMx1)

doc bsxfun %for more info.

Sean de Wolski
on 23 Mar 2012

Your setup for timign this isn't really fair. Look at Matt's framework for a more fair comparison.

Sign in to comment.

Opportunities for recent engineering grads.

Apply Today
## 3 Comments

## Andrew Newell (view profile)

## Direct link to this comment

https://ch.mathworks.com/matlabcentral/answers/7039-how-to-multiply-a-vector-with-each-column-of-a-matrix-most-efficiently#comment_14636

## Eike (view profile)

## Direct link to this comment

https://ch.mathworks.com/matlabcentral/answers/7039-how-to-multiply-a-vector-with-each-column-of-a-matrix-most-efficiently#comment_14812

## Mehrdad Isfandbod (view profile)

## Direct link to this comment

https://ch.mathworks.com/matlabcentral/answers/7039-how-to-multiply-a-vector-with-each-column-of-a-matrix-most-efficiently#comment_551529

Sign in to comment.