I want to multiply each 2D matrix slice of a 4D array, X, with one of a number of vectors stored in Beta. If I were doing this on the GPU, a function like pagefun would be perfect, since it applies an operation to each 2D slice (page) of a high-dimensional array so the slices can be processed in parallel. But since that function does not exist on the CPU, I'm really struggling to find an efficient way of writing this. Right now I have this horrible nested for-loop:
```matlab
for a = 1 : 10
    for b = 1 : 245
        Y(:,a,b) = X(:,:,a,b) * Beta(:,b,a);
    end
end
```
It is as ugly as it is slow. Can anyone give me some tips on how this can be done in a better and faster way without nested loops? (The Y variable is, of course, pre-allocated before I run this.)
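A minimal sketch of two loop-free alternatives, checked against the loop above. It assumes the shapes implied by the indexing (X is n-by-m-by-10-by-245, Beta is m-by-245-by-10, Y is n-by-10-by-245); the sizes n and m and the variable names Yloop, Ypage, Ysum, Bp, Bt are hypothetical. Option 1 uses pagemtimes, which is the CPU page-wise matrix multiply and requires MATLAB R2020b or later. Option 2 relies on implicit expansion (R2016b or later; use bsxfun(@times, ...) on older releases) and builds a temporary array the same size as X.

```matlab
% Hypothetical sizes for illustration: X is n-by-m-by-A-by-B,
% Beta is m-by-B-by-A, Y is n-by-A-by-B (A = 10, B = 245 in the question).
n = 4; m = 3; A = 10; B = 245;
X    = rand(n, m, A, B);
Beta = rand(m, B, A);

% Reference result: the original double loop.
Yloop = zeros(n, A, B);
for a = 1:A
    for b = 1:B
        Yloop(:,a,b) = X(:,:,a,b) * Beta(:,b,a);
    end
end

% Option 1 (R2020b or later): pagemtimes multiplies matching pages of two
% N-D arrays. Reorder Beta to m-by-1-by-A-by-B so page (a,b) is Beta(:,b,a).
Bp    = permute(Beta, [1 4 3 2]);
Ypage = reshape(pagemtimes(X, Bp), n, A, B);   % n-by-1-by-A-by-B -> n-by-A-by-B

% Option 2 (R2016b or later, implicit expansion): lay each vector along
% dimension 2 of a 1-by-m-by-A-by-B array, multiply elementwise, sum over m.
Bt   = permute(Beta, [4 1 3 2]);
Ysum = reshape(sum(X .* Bt, 2), n, A, B);

% Both should agree with the loop up to floating-point rounding.
fprintf('max difference, pagemtimes: %g\n', max(abs(Ypage(:) - Yloop(:))));
fprintf('max difference, sum:        %g\n', max(abs(Ysum(:)  - Yloop(:))));
```

The reshape at the end only drops the singleton second dimension, so the column-major element order already matches Y(:,a,b) and no extra permute is needed.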